Changes since v5:
* Restructure the docs and add some more explanations
* PMMIR.WIDTH -> PMMIR.THWIDTH in one comment
* Don't write EVTYPER.TC if TH is 0. Doesn't have any functional
effect but it might be a bit easier to understand the code.
* Expand the format field #define names
Changes since v4:
* Rebase onto v6.7-rc1, it no longer depends on kvmarm/next
* Remove change that moved ARMV8_PMU_EVTYPE_MASK to the asm files.
This actually depended on those files being included in a certain
order with arm_pmuv3.h to avoid circular includes. Now the
definition is done programmatically in arm_pmuv3.c instead.
Changes since v3:
* Drop #include changes to KVM source files because since
commit bc512d6a9b92 ("KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if
EL2 isn't advertised"), KVM doesn't use ARMV8_PMU_EVTYPE_MASK
anymore
Changes since v2:
* Split threshold_control attribute into two, threshold_compare and
threshold_count so that it's easier to use
* Add some notes to the first commit message and the cover letter
about the behavior in KVM
* Update the docs commit with regards to the split attribute
Changes since v1:
* Fix build on aarch32 by disabling FEAT_PMUv3_TH and splitting event
type mask between the platforms
* Change armv8pmu_write_evtype() to take unsigned long instead of u64
so it isn't unnecessarily wide on aarch32
* Add UL suffix to aarch64 event type mask definition
----
FEAT_PMUv3_TH (Armv8.8) is a new feature that allows conditional
counting of PMU events depending on how much the event increments on
a single cycle. Two new config fields for perf_event_open have been
added, and a PMU cap file for reading the max_threshold. See the second
commit message and the docs in the last commit for more details.
The feature is not currently supported on KVM guests, and PMMIR is set
to read as zero, so it's not advertised as available. But it can be
added at a later time. Writes to PMEVTYPER.TC and TH from guests are
already RES0.
The change has been validated on the Arm FVP model:
# Zero values, works as expected (as before).
$ perf stat -e dtlb_walk/threshold=0,threshold_compare=0/ -- true
5962 dtlb_walk/threshold=0,threshold_compare=0/
# Threshold >= 255 causes count to be 0 because dtlb_walk doesn't
# increase by more than 1 per cycle.
$ perf stat -e dtlb_walk/threshold=255,threshold_compare=2/ -- true
0 dtlb_walk/threshold=255,threshold_compare=2/
# Keeping comparison as >= but lowering the threshold to 1 makes the
# count return.
$ perf stat -e dtlb_walk/threshold=1,threshold_compare=2/ -- true
6329 dtlb_walk/threshold=1,threshold_compare=2/
James Clark (3):
arm64: perf: Include threshold control fields in PMEVTYPER mask
arm64: perf: Add support for event counting threshold
Documentation: arm64: Document the PMU event counting threshold
feature
Documentation/arch/arm64/perf.rst | 72 ++++++++++++++++++++++++
drivers/perf/arm_pmuv3.c | 93 ++++++++++++++++++++++++++++++-
include/linux/perf/arm_pmuv3.h | 4 +-
3 files changed, 166 insertions(+), 3 deletions(-)
--
2.34.1
Add documentation for the new Perf event open parameters and
the threshold_max capability file.
Signed-off-by: James Clark <[email protected]>
---
Documentation/arch/arm64/perf.rst | 72 +++++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)
diff --git a/Documentation/arch/arm64/perf.rst b/Documentation/arch/arm64/perf.rst
index 1f87b57c2332..41eee68951ff 100644
--- a/Documentation/arch/arm64/perf.rst
+++ b/Documentation/arch/arm64/perf.rst
@@ -164,3 +164,75 @@ and should be used to mask the upper bits as needed.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
.. _tools/lib/perf/tests/test-evsel.c:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
+
+Event Counting Threshold
+==========================================
+
+Overview
+--------
+
+FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
+events whose count meets a specified threshold condition. For example if
+threshold_compare is set to 2 ('Greater than or equal'), and the
+threshold is set to 2, then the PMU counter will now only increment by
+when an event would have previously incremented the PMU counter by 2 or
+more on a single processor cycle.
+
+To increment by 1 after passing the threshold condition instead of the
+number of events on that cycle, add the 'threshold_count' option to the
+commandline.
+
+How-to
+------
+
+These are the parameters for controlling the feature:
+
+.. list-table::
+ :header-rows: 1
+
+ * - Parameter
+ - Description
+ * - threshold
+ - Value to threshold the event by. A value of 0 means that
+ thresholding is disabled and the other parameters have no effect.
+ * - threshold_compare
+ - | Comparison function to use, with the following values supported:
+ |
+ | 0: Not-equal
+ | 1: Equals
+ | 2: Greater-than-or-equal
+ | 3: Less-than
+ * - threshold_count
+ - If this is set, count by 1 after passing the threshold condition
+ instead of the value of the event on this cycle.
+
+The threshold, threshold_compare and threshold_count values can be
+provided per event, for example:
+
+.. code-block:: sh
+
+ perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
+ -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
+
+In this example the stall_slot event will count by 2 or more on every
+cycle where 2 or more stalls happen. And dtlb_walk will count by 1 on
+every cycle where the number of dtlb walks were less than 10.
+
+The maximum supported threshold value can be read from the caps of each
+PMU, for example:
+
+.. code-block:: sh
+
+ cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
+
+ 0x000000ff
+
+If a value higher than this is given, then it will be silently clamped
+to the maximum. The highest possible maximum is 4095, as the config
+field for threshold is limited to 12 bits, and the Perf tool will refuse
+to parse higher values.
+
+If the PMU doesn't support FEAT_PMUv3_TH, then threshold_max will read
+0, and both threshold and threshold_compare will be silently ignored.
+threshold_max will also read as 0 on aarch32 guests, even if the host
+is running on hardware with the feature.
--
2.34.1
FEAT_PMUv3_TH (Armv8.8) adds two new fields to PMEVTYPER, so include
them in the mask. These aren't writable on 32 bit kernels as they are in
the high part of the register, so only include them for arm64.
It would be difficult to do this statically in the asm header files for
each platform without resulting in circular includes or #ifdefs inline
in the code. For that reason the ARMV8_PMU_EVTYPE_MASK definition has
been removed and the mask is constructed programmatically.
Reviewed-by: Suzuki K Poulose <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
drivers/perf/arm_pmuv3.c | 9 ++++++++-
include/linux/perf/arm_pmuv3.h | 3 ++-
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 6ca7be05229c..1d40d794f5e4 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -555,8 +555,15 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
static inline void armv8pmu_write_evtype(int idx, u32 val)
{
u32 counter = ARMV8_IDX_TO_COUNTER(idx);
+ unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
+ ARMV8_PMU_INCLUDE_EL2 |
+ ARMV8_PMU_EXCLUDE_EL0 |
+ ARMV8_PMU_EXCLUDE_EL1;
- val &= ARMV8_PMU_EVTYPE_MASK;
+ if (IS_ENABLED(CONFIG_ARM64))
+ mask |= ARMV8_PMU_EVTYPE_TC | ARMV8_PMU_EVTYPE_TH;
+
+ val &= mask;
write_pmevtypern(counter, val);
}
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index 9c226adf938a..ddd1fec86739 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -228,7 +228,8 @@
/*
* PMXEVTYPER: Event selection reg
*/
-#define ARMV8_PMU_EVTYPE_MASK 0xc800ffff /* Mask for writable bits */
+#define ARMV8_PMU_EVTYPE_TH GENMASK(43, 32)
+#define ARMV8_PMU_EVTYPE_TC GENMASK(63, 61)
#define ARMV8_PMU_EVTYPE_EVENT 0xffff /* Mask for EVENT bits */
/*
--
2.34.1
FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
events whose count meets a specified threshold condition. For example if
PMEVTYPERn.TC (Threshold Control) is set to 0b101 (Greater than or
equal, count), and the threshold is set to 2, then the PMU counter will
now only increment by 1 when an event would have previously incremented
the PMU counter by 2 or more on a single processor cycle.
Three new Perf event config fields, 'threshold', 'threshold_compare' and
'threshold_count' have been added to control the feature.
threshold_compare maps to the upper two bits of PMEVTYPERn.TC and
threshold_count maps to the first bit of TC. These separate attributes
have been picked rather than enumerating all the possible combinations
of the TC field as in the Arm ARM. The attributes would be used on a
Perf command line like this:
$ perf stat -e stall_slot/threshold=2,threshold_compare=2/
A new capability for reading out the maximum supported threshold value
has also been added:
$ cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
0x000000ff
If a threshold higher than threshold_max is provided, then no error is
generated but the threshold is clamped to the max value. If
FEAT_PMUv3_TH isn't implemented or a 32 bit kernel is running, then
threshold_max reads zero, and neither the 'threshold' nor
'threshold_control' parameters will be used.
The threshold is per PMU counter, and there are potentially different
threshold_max values per PMU type on heterogeneous systems.
Bits higher than 32 now need to be written into PMEVTYPER, so
armv8pmu_write_evtype() has to be updated to take an unsigned long value
rather than u32 which gives the correct behavior on both aarch32 and 64.
Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: James Clark <[email protected]>
---
drivers/perf/arm_pmuv3.c | 84 +++++++++++++++++++++++++++++++++-
include/linux/perf/arm_pmuv3.h | 1 +
2 files changed, 84 insertions(+), 1 deletion(-)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 1d40d794f5e4..eb1ef84e1dbb 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -15,6 +15,7 @@
#include <clocksource/arm_arch_timer.h>
#include <linux/acpi.h>
+#include <linux/bitfield.h>
#include <linux/clocksource.h>
#include <linux/of.h>
#include <linux/perf/arm_pmu.h>
@@ -294,9 +295,20 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = {
.is_visible = armv8pmu_event_attr_is_visible,
};
+#define THRESHOLD_LOW 2
+#define THRESHOLD_HIGH 13
+#define THRESHOLD_CNT 14
+#define THRESHOLD_CMP_LO 15
+#define THRESHOLD_CMP_HI 16
+
PMU_FORMAT_ATTR(event, "config:0-15");
PMU_FORMAT_ATTR(long, "config1:0");
PMU_FORMAT_ATTR(rdpmc, "config1:1");
+PMU_FORMAT_ATTR(threshold, "config1:" __stringify(THRESHOLD_LOW) "-"
+ __stringify(THRESHOLD_HIGH));
+PMU_FORMAT_ATTR(threshold_compare, "config1:" __stringify(THRESHOLD_CMP_LO) "-"
+ __stringify(THRESHOLD_CMP_HI));
+PMU_FORMAT_ATTR(threshold_count, "config1:" __stringify(THRESHOLD_CNT));
static int sysctl_perf_user_access __read_mostly;
@@ -310,10 +322,33 @@ static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
return event->attr.config1 & 0x2;
}
+static inline u32 armv8pmu_event_threshold(struct perf_event_attr *attr)
+{
+ return FIELD_GET(GENMASK(THRESHOLD_HIGH, THRESHOLD_LOW), attr->config1);
+}
+
+static inline u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
+{
+ u8 th_compare = FIELD_GET(GENMASK(THRESHOLD_CMP_HI, THRESHOLD_CMP_LO),
+ attr->config1);
+ u8 th_count = FIELD_GET(BIT(THRESHOLD_CNT), attr->config1);
+
+ /*
+ * The count bit is always the bottom bit of the full control field, and
+ * the comparison is the upper two bits, but it's not explicitly
+ * labelled in the Arm ARM. For the Perf interface we split it into two
+ * fields, so reconstruct it here.
+ */
+ return (th_compare << 1) | th_count;
+}
+
static struct attribute *armv8_pmuv3_format_attrs[] = {
&format_attr_event.attr,
&format_attr_long.attr,
&format_attr_rdpmc.attr,
+ &format_attr_threshold.attr,
+ &format_attr_threshold_compare.attr,
+ &format_attr_threshold_count.attr,
NULL,
};
@@ -365,10 +400,38 @@ static ssize_t bus_width_show(struct device *dev, struct device_attribute *attr,
static DEVICE_ATTR_RO(bus_width);
+static u32 threshold_max(struct arm_pmu *cpu_pmu)
+{
+ /*
+ * PMMIR.THWIDTH is readable and non-zero on aarch32, but it would be
+ * impossible to write the threshold in the upper 32 bits of PMEVTYPER.
+ */
+ if (IS_ENABLED(CONFIG_ARM))
+ return 0;
+
+ /*
+ * The largest value that can be written to PMEVTYPER<n>_EL0.TH is
+ * (2 ^ PMMIR.THWIDTH) - 1.
+ */
+ return (1 << FIELD_GET(ARMV8_PMU_THWIDTH, cpu_pmu->reg_pmmir)) - 1;
+}
+
+static ssize_t threshold_max_show(struct device *dev,
+ struct device_attribute *attr, char *page)
+{
+ struct pmu *pmu = dev_get_drvdata(dev);
+ struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
+
+ return sysfs_emit(page, "0x%08x\n", threshold_max(cpu_pmu));
+}
+
+static DEVICE_ATTR_RO(threshold_max);
+
static struct attribute *armv8_pmuv3_caps_attrs[] = {
&dev_attr_slots.attr,
&dev_attr_bus_slots.attr,
&dev_attr_bus_width.attr,
+ &dev_attr_threshold_max.attr,
NULL,
};
@@ -552,7 +615,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
armv8pmu_write_hw_counter(event, value);
}
-static inline void armv8pmu_write_evtype(int idx, u32 val)
+static inline void armv8pmu_write_evtype(int idx, unsigned long val)
{
u32 counter = ARMV8_IDX_TO_COUNTER(idx);
unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
@@ -921,6 +984,10 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
struct perf_event_attr *attr)
{
unsigned long config_base = 0;
+ struct perf_event *perf_event = container_of(attr, struct perf_event,
+ attr);
+ struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
+ u32 th, th_max;
if (attr->exclude_idle)
return -EPERM;
@@ -952,6 +1019,21 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
if (attr->exclude_user)
config_base |= ARMV8_PMU_EXCLUDE_EL0;
+ /*
+ * Insert event counting threshold (FEAT_PMUv3_TH) values. If
+ * FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will be
+ * 0 and no values will be written.
+ */
+ th_max = threshold_max(cpu_pmu);
+ if (IS_ENABLED(CONFIG_ARM64) && th_max) {
+ th = min(armv8pmu_event_threshold(attr), th_max);
+ if (th) {
+ config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
+ config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
+ armv8pmu_event_threshold_control(attr));
+ }
+ }
+
/*
* Install the filter into config_base as this is used to
* construct the event type.
diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
index ddd1fec86739..ccbc0f9a74d8 100644
--- a/include/linux/perf/arm_pmuv3.h
+++ b/include/linux/perf/arm_pmuv3.h
@@ -258,6 +258,7 @@
#define ARMV8_PMU_BUS_SLOTS_MASK 0xff
#define ARMV8_PMU_BUS_WIDTH_SHIFT 16
#define ARMV8_PMU_BUS_WIDTH_MASK 0xf
+#define ARMV8_PMU_THWIDTH GENMASK(23, 20)
/*
* This code is really good
--
2.34.1
On 11/24/23 15:58, James Clark wrote:
> FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
> events whose count meets a specified threshold condition. For example if
> PMEVTYPERn.TC (Threshold Control) is set to 0b101 (Greater than or
> equal, count), and the threshold is set to 2, then the PMU counter will
> now only increment by 1 when an event would have previously incremented
> the PMU counter by 2 or more on a single processor cycle.
>
> Three new Perf event config fields, 'threshold', 'threshold_compare' and
> 'threshold_count' have been added to control the feature.
> threshold_compare maps to the upper two bits of PMEVTYPERn.TC and
> threshold_count maps to the first bit of TC. These separate attributes
> have been picked rather than enumerating all the possible combinations
> of the TC field as in the Arm ARM. The attributes would be used on a
> Perf command line like this:
>
> $ perf stat -e stall_slot/threshold=2,threshold_compare=2/
>
> A new capability for reading out the maximum supported threshold value
> has also been added:
>
> $ cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
>
> 0x000000ff
>
> If a threshold higher than threshold_max is provided, then no error is
> generated but the threshold is clamped to the max value. If
> FEAT_PMUv3_TH isn't implemented or a 32 bit kernel is running, then
> threshold_max reads zero, and neither the 'threshold' nor
> 'threshold_control' parameters will be used.
>
> The threshold is per PMU counter, and there are potentially different
> threshold_max values per PMU type on heterogeneous systems.
>
> Bits higher than 32 now need to be written into PMEVTYPER, so
> armv8pmu_write_evtype() has to be updated to take an unsigned long value
> rather than u32 which gives the correct behavior on both aarch32 and 64.
>
> Reviewed-by: Suzuki K Poulose <[email protected]>
> Signed-off-by: James Clark <[email protected]>
> ---
> drivers/perf/arm_pmuv3.c | 84 +++++++++++++++++++++++++++++++++-
> include/linux/perf/arm_pmuv3.h | 1 +
> 2 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 1d40d794f5e4..eb1ef84e1dbb 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -15,6 +15,7 @@
> #include <clocksource/arm_arch_timer.h>
>
> #include <linux/acpi.h>
> +#include <linux/bitfield.h>
> #include <linux/clocksource.h>
> #include <linux/of.h>
> #include <linux/perf/arm_pmu.h>
> @@ -294,9 +295,20 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = {
> .is_visible = armv8pmu_event_attr_is_visible,
> };
>
> +#define THRESHOLD_LOW 2
> +#define THRESHOLD_HIGH 13
> +#define THRESHOLD_CNT 14
> +#define THRESHOLD_CMP_LO 15
> +#define THRESHOLD_CMP_HI 16
> +
> PMU_FORMAT_ATTR(event, "config:0-15");
> PMU_FORMAT_ATTR(long, "config1:0");
> PMU_FORMAT_ATTR(rdpmc, "config1:1");
> +PMU_FORMAT_ATTR(threshold, "config1:" __stringify(THRESHOLD_LOW) "-"
> + __stringify(THRESHOLD_HIGH));
> +PMU_FORMAT_ATTR(threshold_compare, "config1:" __stringify(THRESHOLD_CMP_LO) "-"
> + __stringify(THRESHOLD_CMP_HI));
> +PMU_FORMAT_ATTR(threshold_count, "config1:" __stringify(THRESHOLD_CNT));
Small nit - could this be formatted better ? Is not that the column could go
upto 100 without setting off checkpatch.pl warning these days ?
>
> static int sysctl_perf_user_access __read_mostly;
>
> @@ -310,10 +322,33 @@ static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
> return event->attr.config1 & 0x2;
> }
>
> +static inline u32 armv8pmu_event_threshold(struct perf_event_attr *attr)
> +{
> + return FIELD_GET(GENMASK(THRESHOLD_HIGH, THRESHOLD_LOW), attr->config1);
> +}
> +
> +static inline u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
> +{
> + u8 th_compare = FIELD_GET(GENMASK(THRESHOLD_CMP_HI, THRESHOLD_CMP_LO),
> + attr->config1);
Ditto
> + u8 th_count = FIELD_GET(BIT(THRESHOLD_CNT), attr->config1);
> +
> + /*
> + * The count bit is always the bottom bit of the full control field, and
> + * the comparison is the upper two bits, but it's not explicitly
> + * labelled in the Arm ARM. For the Perf interface we split it into two
> + * fields, so reconstruct it here.
> + */
> + return (th_compare << 1) | th_count;
> +}
> +
> static struct attribute *armv8_pmuv3_format_attrs[] = {
> &format_attr_event.attr,
> &format_attr_long.attr,
> &format_attr_rdpmc.attr,
> + &format_attr_threshold.attr,
> + &format_attr_threshold_compare.attr,
> + &format_attr_threshold_count.attr,
> NULL,
> };
>
> @@ -365,10 +400,38 @@ static ssize_t bus_width_show(struct device *dev, struct device_attribute *attr,
>
> static DEVICE_ATTR_RO(bus_width);
>
> +static u32 threshold_max(struct arm_pmu *cpu_pmu)
> +{
> + /*
> + * PMMIR.THWIDTH is readable and non-zero on aarch32, but it would be
> + * impossible to write the threshold in the upper 32 bits of PMEVTYPER.
> + */
> + if (IS_ENABLED(CONFIG_ARM))
> + return 0;
> +
> + /*
> + * The largest value that can be written to PMEVTYPER<n>_EL0.TH is
> + * (2 ^ PMMIR.THWIDTH) - 1.
> + */
> + return (1 << FIELD_GET(ARMV8_PMU_THWIDTH, cpu_pmu->reg_pmmir)) - 1;
> +}
> +
> +static ssize_t threshold_max_show(struct device *dev,
> + struct device_attribute *attr, char *page)
> +{
> + struct pmu *pmu = dev_get_drvdata(dev);
> + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
> +
> + return sysfs_emit(page, "0x%08x\n", threshold_max(cpu_pmu));
> +}
> +
> +static DEVICE_ATTR_RO(threshold_max);
> +
> static struct attribute *armv8_pmuv3_caps_attrs[] = {
> &dev_attr_slots.attr,
> &dev_attr_bus_slots.attr,
> &dev_attr_bus_width.attr,
> + &dev_attr_threshold_max.attr,
> NULL,
> };
>
> @@ -552,7 +615,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
> armv8pmu_write_hw_counter(event, value);
> }
>
> -static inline void armv8pmu_write_evtype(int idx, u32 val)
> +static inline void armv8pmu_write_evtype(int idx, unsigned long val)
> {
> u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
> @@ -921,6 +984,10 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> struct perf_event_attr *attr)
> {
> unsigned long config_base = 0;
> + struct perf_event *perf_event = container_of(attr, struct perf_event,
> + attr);
Ditto
> + struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
> + u32 th, th_max;
>
> if (attr->exclude_idle)
> return -EPERM;
> @@ -952,6 +1019,21 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> if (attr->exclude_user)
> config_base |= ARMV8_PMU_EXCLUDE_EL0;
>
> + /*
> + * Insert event counting threshold (FEAT_PMUv3_TH) values. If
> + * FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will be
> + * 0 and no values will be written.
> + */
> + th_max = threshold_max(cpu_pmu);
> + if (IS_ENABLED(CONFIG_ARM64) && th_max) {
> + th = min(armv8pmu_event_threshold(attr), th_max);
> + if (th) {
> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
> + armv8pmu_event_threshold_control(attr));
Ditto. As mentioned earlier this could have been avoided using a local variable.
> + }
> + }
> +
> /*
> * Install the filter into config_base as this is used to
> * construct the event type.
> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index ddd1fec86739..ccbc0f9a74d8 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -258,6 +258,7 @@
> #define ARMV8_PMU_BUS_SLOTS_MASK 0xff
> #define ARMV8_PMU_BUS_WIDTH_SHIFT 16
> #define ARMV8_PMU_BUS_WIDTH_MASK 0xf
> +#define ARMV8_PMU_THWIDTH GENMASK(23, 20)
>
> /*
> * This code is really good
Otherwise LGTM
Reviewed-by: Anshuman Khandual <[email protected]>
On 11/24/23 15:58, James Clark wrote:
> Add documentation for the new Perf event open parameters and
> the threshold_max capability file.
>
> Signed-off-by: James Clark <[email protected]>
> ---
> Documentation/arch/arm64/perf.rst | 72 +++++++++++++++++++++++++++++++
> 1 file changed, 72 insertions(+)
>
> diff --git a/Documentation/arch/arm64/perf.rst b/Documentation/arch/arm64/perf.rst
> index 1f87b57c2332..41eee68951ff 100644
> --- a/Documentation/arch/arm64/perf.rst
> +++ b/Documentation/arch/arm64/perf.rst
> @@ -164,3 +164,75 @@ and should be used to mask the upper bits as needed.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
> .. _tools/lib/perf/tests/test-evsel.c:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
> +
> +Event Counting Threshold
> +==========================================
> +
> +Overview
> +--------
> +
> +FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
> +events whose count meets a specified threshold condition. For example if
> +threshold_compare is set to 2 ('Greater than or equal'), and the
> +threshold is set to 2, then the PMU counter will now only increment by
> +when an event would have previously incremented the PMU counter by 2 or
> +more on a single processor cycle.
> +
> +To increment by 1 after passing the threshold condition instead of the
> +number of events on that cycle, add the 'threshold_count' option to the
> +commandline.
> +
> +How-to
> +------
> +
> +These are the parameters for controlling the feature:
> +
> +.. list-table::
> + :header-rows: 1
> +
> + * - Parameter
> + - Description
> + * - threshold
> + - Value to threshold the event by. A value of 0 means that
> + thresholding is disabled and the other parameters have no effect.
> + * - threshold_compare
> + - | Comparison function to use, with the following values supported:
> + |
> + | 0: Not-equal
> + | 1: Equals
> + | 2: Greater-than-or-equal
> + | 3: Less-than
> + * - threshold_count
> + - If this is set, count by 1 after passing the threshold condition
> + instead of the value of the event on this cycle.
> +
> +The threshold, threshold_compare and threshold_count values can be
> +provided per event, for example:
> +
> +.. code-block:: sh
> +
> + perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
> + -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
> +
> +In this example the stall_slot event will count by 2 or more on every
> +cycle where 2 or more stalls happen. And dtlb_walk will count by 1 on
> +every cycle where the number of dtlb walks were less than 10.
> +
> +The maximum supported threshold value can be read from the caps of each
> +PMU, for example:
> +
> +.. code-block:: sh
> +
> + cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
> +
> + 0x000000ff
> +
> +If a value higher than this is given, then it will be silently clamped
> +to the maximum. The highest possible maximum is 4095, as the config
> +field for threshold is limited to 12 bits, and the Perf tool will refuse
> +to parse higher values.
> +
> +If the PMU doesn't support FEAT_PMUv3_TH, then threshold_max will read
> +0, and both threshold and threshold_compare will be silently ignored.
> +threshold_max will also read as 0 on aarch32 guests, even if the host
> +is running on hardware with the feature.
Reviewed-by: Anshuman Khandual <[email protected]>
On 27/11/2023 05:32, Anshuman Khandual wrote:
>
>
> On 11/24/23 15:58, James Clark wrote:
>> FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
>> events whose count meets a specified threshold condition. For example if
>> PMEVTYPERn.TC (Threshold Control) is set to 0b101 (Greater than or
>> equal, count), and the threshold is set to 2, then the PMU counter will
>> now only increment by 1 when an event would have previously incremented
>> the PMU counter by 2 or more on a single processor cycle.
>>
>> Three new Perf event config fields, 'threshold', 'threshold_compare' and
>> 'threshold_count' have been added to control the feature.
>> threshold_compare maps to the upper two bits of PMEVTYPERn.TC and
>> threshold_count maps to the first bit of TC. These separate attributes
>> have been picked rather than enumerating all the possible combinations
>> of the TC field as in the Arm ARM. The attributes would be used on a
>> Perf command line like this:
>>
>> $ perf stat -e stall_slot/threshold=2,threshold_compare=2/
>>
>> A new capability for reading out the maximum supported threshold value
>> has also been added:
>>
>> $ cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
>>
>> 0x000000ff
>>
>> If a threshold higher than threshold_max is provided, then no error is
>> generated but the threshold is clamped to the max value. If
>> FEAT_PMUv3_TH isn't implemented or a 32 bit kernel is running, then
>> threshold_max reads zero, and neither the 'threshold' nor
>> 'threshold_control' parameters will be used.
>>
>> The threshold is per PMU counter, and there are potentially different
>> threshold_max values per PMU type on heterogeneous systems.
>>
>> Bits higher than 32 now need to be written into PMEVTYPER, so
>> armv8pmu_write_evtype() has to be updated to take an unsigned long value
>> rather than u32 which gives the correct behavior on both aarch32 and 64.
>>
>> Reviewed-by: Suzuki K Poulose <[email protected]>
>> Signed-off-by: James Clark <[email protected]>
>> ---
>> drivers/perf/arm_pmuv3.c | 84 +++++++++++++++++++++++++++++++++-
>> include/linux/perf/arm_pmuv3.h | 1 +
>> 2 files changed, 84 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>> index 1d40d794f5e4..eb1ef84e1dbb 100644
>> --- a/drivers/perf/arm_pmuv3.c
>> +++ b/drivers/perf/arm_pmuv3.c
>> @@ -15,6 +15,7 @@
>> #include <clocksource/arm_arch_timer.h>
>>
>> #include <linux/acpi.h>
>> +#include <linux/bitfield.h>
>> #include <linux/clocksource.h>
>> #include <linux/of.h>
>> #include <linux/perf/arm_pmu.h>
>> @@ -294,9 +295,20 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = {
>> .is_visible = armv8pmu_event_attr_is_visible,
>> };
>>
>> +#define THRESHOLD_LOW 2
>> +#define THRESHOLD_HIGH 13
>> +#define THRESHOLD_CNT 14
>> +#define THRESHOLD_CMP_LO 15
>> +#define THRESHOLD_CMP_HI 16
>> +
>> PMU_FORMAT_ATTR(event, "config:0-15");
>> PMU_FORMAT_ATTR(long, "config1:0");
>> PMU_FORMAT_ATTR(rdpmc, "config1:1");
>> +PMU_FORMAT_ATTR(threshold, "config1:" __stringify(THRESHOLD_LOW) "-"
>> + __stringify(THRESHOLD_HIGH));
>> +PMU_FORMAT_ATTR(threshold_compare, "config1:" __stringify(THRESHOLD_CMP_LO) "-"
>> + __stringify(THRESHOLD_CMP_HI));
>> +PMU_FORMAT_ATTR(threshold_count, "config1:" __stringify(THRESHOLD_CNT));
>
> Small nit - could this be formatted better ? Is not that the column could go
> upto 100 without setting off checkpatch.pl warning these days ?
I think it looks perfectly readable to me, is there a specific
formatting rule that's been broken? And no, it can't be unindented
without exceeding the 100 char limit.
>
>>
>> static int sysctl_perf_user_access __read_mostly;
>>
>> @@ -310,10 +322,33 @@ static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
>> return event->attr.config1 & 0x2;
>> }
>>
>> +static inline u32 armv8pmu_event_threshold(struct perf_event_attr *attr)
>> +{
>> + return FIELD_GET(GENMASK(THRESHOLD_HIGH, THRESHOLD_LOW), attr->config1);
>> +}
>> +
>> +static inline u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
>> +{
>> + u8 th_compare = FIELD_GET(GENMASK(THRESHOLD_CMP_HI, THRESHOLD_CMP_LO),
>> + attr->config1);
>
> Ditto
>
There's no rule saying that you can't indent _before_ 100 chars. Most of
the code in this file is indented at 80 chars, and consistency is
usually valued above other things.
>> + u8 th_count = FIELD_GET(BIT(THRESHOLD_CNT), attr->config1);
>> +
>> + /*
>> + * The count bit is always the bottom bit of the full control field, and
>> + * the comparison is the upper two bits, but it's not explicitly
>> + * labelled in the Arm ARM. For the Perf interface we split it into two
>> + * fields, so reconstruct it here.
>> + */
>> + return (th_compare << 1) | th_count;
>> +}
>> +
>> static struct attribute *armv8_pmuv3_format_attrs[] = {
>> &format_attr_event.attr,
>> &format_attr_long.attr,
>> &format_attr_rdpmc.attr,
>> + &format_attr_threshold.attr,
>> + &format_attr_threshold_compare.attr,
>> + &format_attr_threshold_count.attr,
>> NULL,
>> };
>>
>> @@ -365,10 +400,38 @@ static ssize_t bus_width_show(struct device *dev, struct device_attribute *attr,
>>
>> static DEVICE_ATTR_RO(bus_width);
>>
>> +static u32 threshold_max(struct arm_pmu *cpu_pmu)
>> +{
>> + /*
>> + * PMMIR.THWIDTH is readable and non-zero on aarch32, but it would be
>> + * impossible to write the threshold in the upper 32 bits of PMEVTYPER.
>> + */
>> + if (IS_ENABLED(CONFIG_ARM))
>> + return 0;
>> +
>> + /*
>> + * The largest value that can be written to PMEVTYPER<n>_EL0.TH is
>> + * (2 ^ PMMIR.THWIDTH) - 1.
>> + */
>> + return (1 << FIELD_GET(ARMV8_PMU_THWIDTH, cpu_pmu->reg_pmmir)) - 1;
>> +}
>> +
>> +static ssize_t threshold_max_show(struct device *dev,
>> + struct device_attribute *attr, char *page)
>> +{
>> + struct pmu *pmu = dev_get_drvdata(dev);
>> + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
>> +
>> + return sysfs_emit(page, "0x%08x\n", threshold_max(cpu_pmu));
>> +}
>> +
>> +static DEVICE_ATTR_RO(threshold_max);
>> +
>> static struct attribute *armv8_pmuv3_caps_attrs[] = {
>> &dev_attr_slots.attr,
>> &dev_attr_bus_slots.attr,
>> &dev_attr_bus_width.attr,
>> + &dev_attr_threshold_max.attr,
>> NULL,
>> };
>>
>> @@ -552,7 +615,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
>> armv8pmu_write_hw_counter(event, value);
>> }
>>
>> -static inline void armv8pmu_write_evtype(int idx, u32 val)
>> +static inline void armv8pmu_write_evtype(int idx, unsigned long val)
>> {
>> u32 counter = ARMV8_IDX_TO_COUNTER(idx);
>> unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
>> @@ -921,6 +984,10 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>> struct perf_event_attr *attr)
>> {
>> unsigned long config_base = 0;
>> + struct perf_event *perf_event = container_of(attr, struct perf_event,
>> + attr);
>
> Ditto
>
>> + struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
>> + u32 th, th_max;
>>
>> if (attr->exclude_idle)
>> return -EPERM;
>> @@ -952,6 +1019,21 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>> if (attr->exclude_user)
>> config_base |= ARMV8_PMU_EXCLUDE_EL0;
>>
>> + /*
>> + * Insert event counting threshold (FEAT_PMUv3_TH) values. If
>> + * FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will be
>> + * 0 and no values will be written.
>> + */
>> + th_max = threshold_max(cpu_pmu);
>> + if (IS_ENABLED(CONFIG_ARM64) && th_max) {
>> + th = min(armv8pmu_event_threshold(attr), th_max);
>> + if (th) {
>> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
>> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
>> + armv8pmu_event_threshold_control(attr));
>
> Ditto. As mentioned earlier this could have been avoided using a local variable.
>
But explained why it's like that on the previous review. It's completely
down to personal preference whether a local variable is used or not. I
don't see why this is a review comment, it doesn't affect how the code
behaves or the readability in any way.
And what could be avoided? No formatting rules have been broken.
>> + }
>> + }
>> +
>> /*
>> * Install the filter into config_base as this is used to
>> * construct the event type.
>> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
>> index ddd1fec86739..ccbc0f9a74d8 100644
>> --- a/include/linux/perf/arm_pmuv3.h
>> +++ b/include/linux/perf/arm_pmuv3.h
>> @@ -258,6 +258,7 @@
>> #define ARMV8_PMU_BUS_SLOTS_MASK 0xff
>> #define ARMV8_PMU_BUS_WIDTH_SHIFT 16
>> #define ARMV8_PMU_BUS_WIDTH_MASK 0xf
>> +#define ARMV8_PMU_THWIDTH GENMASK(23, 20)
>>
>> /*
>> * This code is really good
>
> Otherwise LGTM
>
> Reviewed-by: Anshuman Khandual <[email protected]>
>
Hello,
On Sun, Nov 26, 2023 at 9:50 PM Anshuman Khandual
<[email protected]> wrote:
>
>
>
> On 11/24/23 15:58, James Clark wrote:
> > Add documentation for the new Perf event open parameters and
> > the threshold_max capability file.
> >
> > Signed-off-by: James Clark <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Thanks,
Namhyung
On 11/27/23 15:54, James Clark wrote:
>
>
> On 27/11/2023 05:32, Anshuman Khandual wrote:
>>
>>
>> On 11/24/23 15:58, James Clark wrote:
>>> FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
>>> events whose count meets a specified threshold condition. For example if
>>> PMEVTYPERn.TC (Threshold Control) is set to 0b101 (Greater than or
>>> equal, count), and the threshold is set to 2, then the PMU counter will
>>> now only increment by 1 when an event would have previously incremented
>>> the PMU counter by 2 or more on a single processor cycle.
>>>
>>> Three new Perf event config fields, 'threshold', 'threshold_compare' and
>>> 'threshold_count' have been added to control the feature.
>>> threshold_compare maps to the upper two bits of PMEVTYPERn.TC and
>>> threshold_count maps to the first bit of TC. These separate attributes
>>> have been picked rather than enumerating all the possible combinations
>>> of the TC field as in the Arm ARM. The attributes would be used on a
>>> Perf command line like this:
>>>
>>> $ perf stat -e stall_slot/threshold=2,threshold_compare=2/
>>>
>>> A new capability for reading out the maximum supported threshold value
>>> has also been added:
>>>
>>> $ cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
>>>
>>> 0x000000ff
>>>
>>> If a threshold higher than threshold_max is provided, then no error is
>>> generated but the threshold is clamped to the max value. If
>>> FEAT_PMUv3_TH isn't implemented or a 32 bit kernel is running, then
>>> threshold_max reads zero, and neither the 'threshold' nor
>>> 'threshold_control' parameters will be used.
>>>
>>> The threshold is per PMU counter, and there are potentially different
>>> threshold_max values per PMU type on heterogeneous systems.
>>>
>>> Bits higher than 32 now need to be written into PMEVTYPER, so
>>> armv8pmu_write_evtype() has to be updated to take an unsigned long value
>>> rather than u32 which gives the correct behavior on both aarch32 and 64.
>>>
>>> Reviewed-by: Suzuki K Poulose <[email protected]>
>>> Signed-off-by: James Clark <[email protected]>
>>> ---
>>> drivers/perf/arm_pmuv3.c | 84 +++++++++++++++++++++++++++++++++-
>>> include/linux/perf/arm_pmuv3.h | 1 +
>>> 2 files changed, 84 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>>> index 1d40d794f5e4..eb1ef84e1dbb 100644
>>> --- a/drivers/perf/arm_pmuv3.c
>>> +++ b/drivers/perf/arm_pmuv3.c
>>> @@ -15,6 +15,7 @@
>>> #include <clocksource/arm_arch_timer.h>
>>>
>>> #include <linux/acpi.h>
>>> +#include <linux/bitfield.h>
>>> #include <linux/clocksource.h>
>>> #include <linux/of.h>
>>> #include <linux/perf/arm_pmu.h>
>>> @@ -294,9 +295,20 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = {
>>> .is_visible = armv8pmu_event_attr_is_visible,
>>> };
>>>
>>> +#define THRESHOLD_LOW 2
>>> +#define THRESHOLD_HIGH 13
>>> +#define THRESHOLD_CNT 14
>>> +#define THRESHOLD_CMP_LO 15
>>> +#define THRESHOLD_CMP_HI 16
>>> +
>>> PMU_FORMAT_ATTR(event, "config:0-15");
>>> PMU_FORMAT_ATTR(long, "config1:0");
>>> PMU_FORMAT_ATTR(rdpmc, "config1:1");
>>> +PMU_FORMAT_ATTR(threshold, "config1:" __stringify(THRESHOLD_LOW) "-"
>>> + __stringify(THRESHOLD_HIGH));
>>> +PMU_FORMAT_ATTR(threshold_compare, "config1:" __stringify(THRESHOLD_CMP_LO) "-"
>>> + __stringify(THRESHOLD_CMP_HI));
>>> +PMU_FORMAT_ATTR(threshold_count, "config1:" __stringify(THRESHOLD_CNT));
>>
>> Small nit - could this be formatted better ? Is not that the column could go
>> upto 100 without setting off checkpatch.pl warning these days ?
>
> I think it looks perfectly readable to me, is there a specific
> formatting rule that's been broken? And no, it can't be unindented
> without exceeding the 100 char limit.
Fair enough. There is nothing broken in here, otherwise checkpatch.pl
would have warned. I was just wondering if the indentation could have
been avoided. But as you mentioned, it cannot be without going beyond
the 100 char limit.
>
>>
>>>
>>> static int sysctl_perf_user_access __read_mostly;
>>>
>>> @@ -310,10 +322,33 @@ static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
>>> return event->attr.config1 & 0x2;
>>> }
>>>
>>> +static inline u32 armv8pmu_event_threshold(struct perf_event_attr *attr)
>>> +{
>>> + return FIELD_GET(GENMASK(THRESHOLD_HIGH, THRESHOLD_LOW), attr->config1);
>>> +}
>>> +
>>> +static inline u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
>>> +{
>>> + u8 th_compare = FIELD_GET(GENMASK(THRESHOLD_CMP_HI, THRESHOLD_CMP_LO),
>>> + attr->config1);
>>
>> Ditto
>>
>
> There's no rule saying that you can't indent _before_ 100 chars. Most of
> the code in this file is indented at 80 chars, and consistency is
> usually valued above other things.
Fair enough.
>
>>> + u8 th_count = FIELD_GET(BIT(THRESHOLD_CNT), attr->config1);
>>> +
>>> + /*
>>> + * The count bit is always the bottom bit of the full control field, and
>>> + * the comparison is the upper two bits, but it's not explicitly
>>> + * labelled in the Arm ARM. For the Perf interface we split it into two
>>> + * fields, so reconstruct it here.
>>> + */
>>> + return (th_compare << 1) | th_count;
>>> +}
>>> +
>>> static struct attribute *armv8_pmuv3_format_attrs[] = {
>>> &format_attr_event.attr,
>>> &format_attr_long.attr,
>>> &format_attr_rdpmc.attr,
>>> + &format_attr_threshold.attr,
>>> + &format_attr_threshold_compare.attr,
>>> + &format_attr_threshold_count.attr,
>>> NULL,
>>> };
>>>
>>> @@ -365,10 +400,38 @@ static ssize_t bus_width_show(struct device *dev, struct device_attribute *attr,
>>>
>>> static DEVICE_ATTR_RO(bus_width);
>>>
>>> +static u32 threshold_max(struct arm_pmu *cpu_pmu)
>>> +{
>>> + /*
>>> + * PMMIR.THWIDTH is readable and non-zero on aarch32, but it would be
>>> + * impossible to write the threshold in the upper 32 bits of PMEVTYPER.
>>> + */
>>> + if (IS_ENABLED(CONFIG_ARM))
>>> + return 0;
>>> +
>>> + /*
>>> + * The largest value that can be written to PMEVTYPER<n>_EL0.TH is
>>> + * (2 ^ PMMIR.THWIDTH) - 1.
>>> + */
>>> + return (1 << FIELD_GET(ARMV8_PMU_THWIDTH, cpu_pmu->reg_pmmir)) - 1;
>>> +}
>>> +
>>> +static ssize_t threshold_max_show(struct device *dev,
>>> + struct device_attribute *attr, char *page)
>>> +{
>>> + struct pmu *pmu = dev_get_drvdata(dev);
>>> + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
>>> +
>>> + return sysfs_emit(page, "0x%08x\n", threshold_max(cpu_pmu));
>>> +}
>>> +
>>> +static DEVICE_ATTR_RO(threshold_max);
>>> +
>>> static struct attribute *armv8_pmuv3_caps_attrs[] = {
>>> &dev_attr_slots.attr,
>>> &dev_attr_bus_slots.attr,
>>> &dev_attr_bus_width.attr,
>>> + &dev_attr_threshold_max.attr,
>>> NULL,
>>> };
>>>
>>> @@ -552,7 +615,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
>>> armv8pmu_write_hw_counter(event, value);
>>> }
>>>
>>> -static inline void armv8pmu_write_evtype(int idx, u32 val)
>>> +static inline void armv8pmu_write_evtype(int idx, unsigned long val)
>>> {
>>> u32 counter = ARMV8_IDX_TO_COUNTER(idx);
>>> unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
>>> @@ -921,6 +984,10 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>>> struct perf_event_attr *attr)
>>> {
>>> unsigned long config_base = 0;
>>> + struct perf_event *perf_event = container_of(attr, struct perf_event,
>>> + attr);
>>
>> Ditto
>>
>>> + struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
>>> + u32 th, th_max;
>>>
>>> if (attr->exclude_idle)
>>> return -EPERM;
>>> @@ -952,6 +1019,21 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>>> if (attr->exclude_user)
>>> config_base |= ARMV8_PMU_EXCLUDE_EL0;
>>>
>>> + /*
>>> + * Insert event counting threshold (FEAT_PMUv3_TH) values. If
>>> + * FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will be
>>> + * 0 and no values will be written.
>>> + */
>>> + th_max = threshold_max(cpu_pmu);
>>> + if (IS_ENABLED(CONFIG_ARM64) && th_max) {
>>> + th = min(armv8pmu_event_threshold(attr), th_max);
>>> + if (th) {
>>> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
>>> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
>>> + armv8pmu_event_threshold_control(attr));
>>
>> Ditto. As mentioned earlier this could have been avoided using a local variable.
>>
>
> But explained why it's like that on the previous review. It's completely
> down to personal preference whether a local variable is used or not. I
> don't see why this is a review comment, it doesn't affect how the code
> behaves or the readability in any way.
>
> And what could be avoided? No formatting rules have been broken.
Just an indentation could have been avoided, but again nothing is broken
here to be clear, neither functionality nor the formatting. It was just
a suggestion, which you could very well ignore.
>
>>> + }
>>> + }
>>> +
>>> /*
>>> * Install the filter into config_base as this is used to
>>> * construct the event type.
>>> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
>>> index ddd1fec86739..ccbc0f9a74d8 100644
>>> --- a/include/linux/perf/arm_pmuv3.h
>>> +++ b/include/linux/perf/arm_pmuv3.h
>>> @@ -258,6 +258,7 @@
>>> #define ARMV8_PMU_BUS_SLOTS_MASK 0xff
>>> #define ARMV8_PMU_BUS_WIDTH_SHIFT 16
>>> #define ARMV8_PMU_BUS_WIDTH_MASK 0xf
>>> +#define ARMV8_PMU_THWIDTH GENMASK(23, 20)
>>>
>>> /*
>>> * This code is really good
>>
>> Otherwise LGTM
>>
>> Reviewed-by: Anshuman Khandual <[email protected]>
>>
On 24/11/2023 10:28, James Clark wrote:
> Add documentation for the new Perf event open parameters and
> the threshold_max capability file.
>
> Signed-off-by: James Clark <[email protected]>
Reviewed-by: Suzuki K Poulose <[email protected]>
> ---
> Documentation/arch/arm64/perf.rst | 72 +++++++++++++++++++++++++++++++
> 1 file changed, 72 insertions(+)
>
> diff --git a/Documentation/arch/arm64/perf.rst b/Documentation/arch/arm64/perf.rst
> index 1f87b57c2332..41eee68951ff 100644
> --- a/Documentation/arch/arm64/perf.rst
> +++ b/Documentation/arch/arm64/perf.rst
> @@ -164,3 +164,75 @@ and should be used to mask the upper bits as needed.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
> .. _tools/lib/perf/tests/test-evsel.c:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
> +
> +Event Counting Threshold
> +==========================================
> +
> +Overview
> +--------
> +
> +FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
> +events whose count meets a specified threshold condition. For example if
> +threshold_compare is set to 2 ('Greater than or equal'), and the
> +threshold is set to 2, then the PMU counter will now only increment by
> +when an event would have previously incremented the PMU counter by 2 or
> +more on a single processor cycle.
> +
> +To increment by 1 after passing the threshold condition instead of the
> +number of events on that cycle, add the 'threshold_count' option to the
> +commandline.
> +
> +How-to
> +------
> +
> +These are the parameters for controlling the feature:
> +
> +.. list-table::
> + :header-rows: 1
> +
> + * - Parameter
> + - Description
> + * - threshold
> + - Value to threshold the event by. A value of 0 means that
> + thresholding is disabled and the other parameters have no effect.
> + * - threshold_compare
> + - | Comparison function to use, with the following values supported:
> + |
> + | 0: Not-equal
> + | 1: Equals
> + | 2: Greater-than-or-equal
> + | 3: Less-than
> + * - threshold_count
> + - If this is set, count by 1 after passing the threshold condition
> + instead of the value of the event on this cycle.
> +
> +The threshold, threshold_compare and threshold_count values can be
> +provided per event, for example:
> +
> +.. code-block:: sh
> +
> + perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
> + -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/
> +
> +In this example the stall_slot event will count by 2 or more on every
> +cycle where 2 or more stalls happen. And dtlb_walk will count by 1 on
> +every cycle where the number of dtlb walks were less than 10.
> +
> +The maximum supported threshold value can be read from the caps of each
> +PMU, for example:
> +
> +.. code-block:: sh
> +
> + cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
> +
> + 0x000000ff
> +
> +If a value higher than this is given, then it will be silently clamped
> +to the maximum. The highest possible maximum is 4095, as the config
> +field for threshold is limited to 12 bits, and the Perf tool will refuse
> +to parse higher values.
> +
> +If the PMU doesn't support FEAT_PMUv3_TH, then threshold_max will read
> +0, and both threshold and threshold_compare will be silently ignored.
> +threshold_max will also read as 0 on aarch32 guests, even if the host
> +is running on hardware with the feature.
On Fri, Nov 24, 2023 at 10:28:56AM +0000, James Clark wrote:
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 1d40d794f5e4..eb1ef84e1dbb 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -15,6 +15,7 @@
> #include <clocksource/arm_arch_timer.h>
>
> #include <linux/acpi.h>
> +#include <linux/bitfield.h>
> #include <linux/clocksource.h>
> #include <linux/of.h>
> #include <linux/perf/arm_pmu.h>
> @@ -294,9 +295,20 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = {
> .is_visible = armv8pmu_event_attr_is_visible,
> };
>
> +#define THRESHOLD_LOW 2
> +#define THRESHOLD_HIGH 13
> +#define THRESHOLD_CNT 14
> +#define THRESHOLD_CMP_LO 15
> +#define THRESHOLD_CMP_HI 16
Do you think THWIDTH might extend beyond 12 bits in future? If so, it might
be worth juggling these bits around a bit so it's not sandwiched between
'rdpmc' and 'threshold_compare'. I defer to your judgement, however.
> PMU_FORMAT_ATTR(event, "config:0-15");
> PMU_FORMAT_ATTR(long, "config1:0");
> PMU_FORMAT_ATTR(rdpmc, "config1:1");
> +PMU_FORMAT_ATTR(threshold, "config1:" __stringify(THRESHOLD_LOW) "-"
> + __stringify(THRESHOLD_HIGH));
> +PMU_FORMAT_ATTR(threshold_compare, "config1:" __stringify(THRESHOLD_CMP_LO) "-"
> + __stringify(THRESHOLD_CMP_HI));
> +PMU_FORMAT_ATTR(threshold_count, "config1:" __stringify(THRESHOLD_CNT));
>
> static int sysctl_perf_user_access __read_mostly;
>
> @@ -310,10 +322,33 @@ static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
> return event->attr.config1 & 0x2;
> }
>
> +static inline u32 armv8pmu_event_threshold(struct perf_event_attr *attr)
> +{
> + return FIELD_GET(GENMASK(THRESHOLD_HIGH, THRESHOLD_LOW), attr->config1);
> +}
> +
> +static inline u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
You can drop the 'inline's for these functions (and, in fact, this whole
file could do with that cleanup :)
> +{
> + u8 th_compare = FIELD_GET(GENMASK(THRESHOLD_CMP_HI, THRESHOLD_CMP_LO),
> + attr->config1);
> + u8 th_count = FIELD_GET(BIT(THRESHOLD_CNT), attr->config1);
I think this is correct, but you might want to look at how we handle this
in the SPE driver as I think it ends up looking cleaner and makes it pretty
obvious which bits correspond to the user ABI (i.e. config fields) and which
bits are part of architectural registers. I'm not saying you have to do it
that way, but please take a look if you haven't already.
> + /*
> + * The count bit is always the bottom bit of the full control field, and
> + * the comparison is the upper two bits, but it's not explicitly
> + * labelled in the Arm ARM. For the Perf interface we split it into two
> + * fields, so reconstruct it here.
> + */
> + return (th_compare << 1) | th_count;
> +}
> +
> static struct attribute *armv8_pmuv3_format_attrs[] = {
> &format_attr_event.attr,
> &format_attr_long.attr,
> &format_attr_rdpmc.attr,
> + &format_attr_threshold.attr,
> + &format_attr_threshold_compare.attr,
> + &format_attr_threshold_count.attr,
> NULL,
> };
>
> @@ -365,10 +400,38 @@ static ssize_t bus_width_show(struct device *dev, struct device_attribute *attr,
>
> static DEVICE_ATTR_RO(bus_width);
>
> +static u32 threshold_max(struct arm_pmu *cpu_pmu)
> +{
> + /*
> + * PMMIR.THWIDTH is readable and non-zero on aarch32, but it would be
> + * impossible to write the threshold in the upper 32 bits of PMEVTYPER.
> + */
> + if (IS_ENABLED(CONFIG_ARM))
> + return 0;
> +
> + /*
> + * The largest value that can be written to PMEVTYPER<n>_EL0.TH is
> + * (2 ^ PMMIR.THWIDTH) - 1.
> + */
> + return (1 << FIELD_GET(ARMV8_PMU_THWIDTH, cpu_pmu->reg_pmmir)) - 1;
> +}
> +
> +static ssize_t threshold_max_show(struct device *dev,
> + struct device_attribute *attr, char *page)
> +{
> + struct pmu *pmu = dev_get_drvdata(dev);
> + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
> +
> + return sysfs_emit(page, "0x%08x\n", threshold_max(cpu_pmu));
> +}
> +
> +static DEVICE_ATTR_RO(threshold_max);
> +
> static struct attribute *armv8_pmuv3_caps_attrs[] = {
> &dev_attr_slots.attr,
> &dev_attr_bus_slots.attr,
> &dev_attr_bus_width.attr,
> + &dev_attr_threshold_max.attr,
> NULL,
> };
>
> @@ -552,7 +615,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
> armv8pmu_write_hw_counter(event, value);
> }
>
> -static inline void armv8pmu_write_evtype(int idx, u32 val)
> +static inline void armv8pmu_write_evtype(int idx, unsigned long val)
> {
> u32 counter = ARMV8_IDX_TO_COUNTER(idx);
> unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
> @@ -921,6 +984,10 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> struct perf_event_attr *attr)
> {
> unsigned long config_base = 0;
> + struct perf_event *perf_event = container_of(attr, struct perf_event,
> + attr);
> + struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
> + u32 th, th_max;
>
> if (attr->exclude_idle)
> return -EPERM;
> @@ -952,6 +1019,21 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
> if (attr->exclude_user)
> config_base |= ARMV8_PMU_EXCLUDE_EL0;
>
> + /*
> + * Insert event counting threshold (FEAT_PMUv3_TH) values. If
> + * FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will be
> + * 0 and no values will be written.
> + */
> + th_max = threshold_max(cpu_pmu);
> + if (IS_ENABLED(CONFIG_ARM64) && th_max) {
Why is the IS_ENABLED() check needed here?
> + th = min(armv8pmu_event_threshold(attr), th_max);
> + if (th) {
Why is it useful to take the minimum here? If userspace asks for a value
bigger than the maximum support threshold, shouldn't we return an error
rather than silently clamp it?
> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
> + armv8pmu_event_threshold_control(attr));
> + }
> + }
> +
> /*
> * Install the filter into config_base as this is used to
> * construct the event type.
> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
> index ddd1fec86739..ccbc0f9a74d8 100644
> --- a/include/linux/perf/arm_pmuv3.h
> +++ b/include/linux/perf/arm_pmuv3.h
> @@ -258,6 +258,7 @@
> #define ARMV8_PMU_BUS_SLOTS_MASK 0xff
> #define ARMV8_PMU_BUS_WIDTH_SHIFT 16
> #define ARMV8_PMU_BUS_WIDTH_MASK 0xf
> +#define ARMV8_PMU_THWIDTH GENMASK(23, 20)
It's a bit messy having a mixture of GENMASK and MASK/SHIFT pairs. Please
can you either update what's there to use GENMASK, or use SHIFT/MASK for the
new addition?
Will
On 05/12/2023 13:14, Will Deacon wrote:
> On Fri, Nov 24, 2023 at 10:28:56AM +0000, James Clark wrote:
>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>> index 1d40d794f5e4..eb1ef84e1dbb 100644
>> --- a/drivers/perf/arm_pmuv3.c
>> +++ b/drivers/perf/arm_pmuv3.c
>> @@ -15,6 +15,7 @@
>> #include <clocksource/arm_arch_timer.h>
>>
>> #include <linux/acpi.h>
>> +#include <linux/bitfield.h>
>> #include <linux/clocksource.h>
>> #include <linux/of.h>
>> #include <linux/perf/arm_pmu.h>
>> @@ -294,9 +295,20 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = {
>> .is_visible = armv8pmu_event_attr_is_visible,
>> };
>>
>> +#define THRESHOLD_LOW 2
>> +#define THRESHOLD_HIGH 13
>> +#define THRESHOLD_CNT 14
>> +#define THRESHOLD_CMP_LO 15
>> +#define THRESHOLD_CMP_HI 16
>
> Do you think THWIDTH might extend beyond 12 bits in future? If so, it might
> be worth juggling these bits around a bit so it's not sandwiched between
> 'rdpmc' and 'threshold_compare'. I defer to your judgement, however.
>
It's possible, both PMMIR.THWIDTH and PMEVTYPER.TH currently have unused
bits above them. I can easily move threshold to the end, I suppose that
covers us at least until someone adds a new field above that.
>> PMU_FORMAT_ATTR(event, "config:0-15");
>> PMU_FORMAT_ATTR(long, "config1:0");
>> PMU_FORMAT_ATTR(rdpmc, "config1:1");
>> +PMU_FORMAT_ATTR(threshold, "config1:" __stringify(THRESHOLD_LOW) "-"
>> + __stringify(THRESHOLD_HIGH));
>> +PMU_FORMAT_ATTR(threshold_compare, "config1:" __stringify(THRESHOLD_CMP_LO) "-"
>> + __stringify(THRESHOLD_CMP_HI));
>> +PMU_FORMAT_ATTR(threshold_count, "config1:" __stringify(THRESHOLD_CNT));
>>
>> static int sysctl_perf_user_access __read_mostly;
>>
>> @@ -310,10 +322,33 @@ static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
>> return event->attr.config1 & 0x2;
>> }
>>
>> +static inline u32 armv8pmu_event_threshold(struct perf_event_attr *attr)
>> +{
>> + return FIELD_GET(GENMASK(THRESHOLD_HIGH, THRESHOLD_LOW), attr->config1);
>> +}
>> +
>> +static inline u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
>
> You can drop the 'inline's for these functions (and, in fact, this whole
> file could do with that cleanup :)
>
Will do.
>> +{
>> + u8 th_compare = FIELD_GET(GENMASK(THRESHOLD_CMP_HI, THRESHOLD_CMP_LO),
>> + attr->config1);
>> + u8 th_count = FIELD_GET(BIT(THRESHOLD_CNT), attr->config1);
>
> I think this is correct, but you might want to look at how we handle this
> in the SPE driver as I think it ends up looking cleaner and makes it pretty
> obvious which bits correspond to the user ABI (i.e. config fields) and which
> bits are part of architectural registers. I'm not saying you have to do it
> that way, but please take a look if you haven't already.
>
Yeah I could take the GEN_PMU_FORMAT_ATTR() etc macros out of there and
re-use them here too. And also for the other existing configs in
arm_pmuv3.c.
>> + /*
>> + * The count bit is always the bottom bit of the full control field, and
>> + * the comparison is the upper two bits, but it's not explicitly
>> + * labelled in the Arm ARM. For the Perf interface we split it into two
>> + * fields, so reconstruct it here.
>> + */
>> + return (th_compare << 1) | th_count;
>> +}
>> +
>> static struct attribute *armv8_pmuv3_format_attrs[] = {
>> &format_attr_event.attr,
>> &format_attr_long.attr,
>> &format_attr_rdpmc.attr,
>> + &format_attr_threshold.attr,
>> + &format_attr_threshold_compare.attr,
>> + &format_attr_threshold_count.attr,
>> NULL,
>> };
>>
>> @@ -365,10 +400,38 @@ static ssize_t bus_width_show(struct device *dev, struct device_attribute *attr,
>>
>> static DEVICE_ATTR_RO(bus_width);
>>
>> +static u32 threshold_max(struct arm_pmu *cpu_pmu)
>> +{
>> + /*
>> + * PMMIR.THWIDTH is readable and non-zero on aarch32, but it would be
>> + * impossible to write the threshold in the upper 32 bits of PMEVTYPER.
>> + */
>> + if (IS_ENABLED(CONFIG_ARM))
>> + return 0;
>> +
>> + /*
>> + * The largest value that can be written to PMEVTYPER<n>_EL0.TH is
>> + * (2 ^ PMMIR.THWIDTH) - 1.
>> + */
>> + return (1 << FIELD_GET(ARMV8_PMU_THWIDTH, cpu_pmu->reg_pmmir)) - 1;
>> +}
>> +
>> +static ssize_t threshold_max_show(struct device *dev,
>> + struct device_attribute *attr, char *page)
>> +{
>> + struct pmu *pmu = dev_get_drvdata(dev);
>> + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
>> +
>> + return sysfs_emit(page, "0x%08x\n", threshold_max(cpu_pmu));
>> +}
>> +
>> +static DEVICE_ATTR_RO(threshold_max);
>> +
>> static struct attribute *armv8_pmuv3_caps_attrs[] = {
>> &dev_attr_slots.attr,
>> &dev_attr_bus_slots.attr,
>> &dev_attr_bus_width.attr,
>> + &dev_attr_threshold_max.attr,
>> NULL,
>> };
>>
>> @@ -552,7 +615,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
>> armv8pmu_write_hw_counter(event, value);
>> }
>>
>> -static inline void armv8pmu_write_evtype(int idx, u32 val)
>> +static inline void armv8pmu_write_evtype(int idx, unsigned long val)
>> {
>> u32 counter = ARMV8_IDX_TO_COUNTER(idx);
>> unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
>> @@ -921,6 +984,10 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>> struct perf_event_attr *attr)
>> {
>> unsigned long config_base = 0;
>> + struct perf_event *perf_event = container_of(attr, struct perf_event,
>> + attr);
>> + struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
>> + u32 th, th_max;
>>
>> if (attr->exclude_idle)
>> return -EPERM;
>> @@ -952,6 +1019,21 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>> if (attr->exclude_user)
>> config_base |= ARMV8_PMU_EXCLUDE_EL0;
>>
>> + /*
>> + * Insert event counting threshold (FEAT_PMUv3_TH) values. If
>> + * FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will be
>> + * 0 and no values will be written.
>> + */
>> + th_max = threshold_max(cpu_pmu);
>> + if (IS_ENABLED(CONFIG_ARM64) && th_max) {
>
> Why is the IS_ENABLED() check needed here?
>
The FIELD_PREP() below would cause a compilation error on arm32 because
TH and TC are above 32 bits. I can add a comment.
>> + th = min(armv8pmu_event_threshold(attr), th_max);
>> + if (th) {
>
> Why is it useful to take the minimum here? If userspace asks for a value
> bigger than the maximum support threshold, shouldn't we return an error
> rather than silently clamp it?
>
I think it probably was just so I didn't have to think about what would
happen when the value varied between cores.
But you're right, it looks like I can add a validation function to
struct arm_pmu and call it from armpmu_event_init(). armpmu->map_event()
and armpmu->set_event_filter() are already called from there so I think
the validation could technically be added to one of those, but that's
probably a bit hacky. I don't know if you have any preference for where
the threshold validation should happen?
>> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
>> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
>> + armv8pmu_event_threshold_control(attr));
>> + }
>> + }
>> +
>> /*
>> * Install the filter into config_base as this is used to
>> * construct the event type.
>> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
>> index ddd1fec86739..ccbc0f9a74d8 100644
>> --- a/include/linux/perf/arm_pmuv3.h
>> +++ b/include/linux/perf/arm_pmuv3.h
>> @@ -258,6 +258,7 @@
>> #define ARMV8_PMU_BUS_SLOTS_MASK 0xff
>> #define ARMV8_PMU_BUS_WIDTH_SHIFT 16
>> #define ARMV8_PMU_BUS_WIDTH_MASK 0xf
>> +#define ARMV8_PMU_THWIDTH GENMASK(23, 20)
>
> It's a bit messy having a mixture of GENMASK and MASK/SHIFT pairs. Please
> can you either update what's there to use GENMASK, or use SHIFT/MASK for the
> new addition?
>
> Will
>
Yep will do. Thanks for the review.
James
On 05/12/2023 15:05, James Clark wrote:
>
>
> On 05/12/2023 13:14, Will Deacon wrote:
>> On Fri, Nov 24, 2023 at 10:28:56AM +0000, James Clark wrote:
>>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>>> index 1d40d794f5e4..eb1ef84e1dbb 100644
>>> --- a/drivers/perf/arm_pmuv3.c
>>> +++ b/drivers/perf/arm_pmuv3.c
>>> @@ -15,6 +15,7 @@
>>> #include <clocksource/arm_arch_timer.h>
>>>
>>> #include <linux/acpi.h>
>>> +#include <linux/bitfield.h>
>>> #include <linux/clocksource.h>
>>> #include <linux/of.h>
>>> #include <linux/perf/arm_pmu.h>
>>> @@ -294,9 +295,20 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = {
>>> .is_visible = armv8pmu_event_attr_is_visible,
>>> };
>>>
>>> +#define THRESHOLD_LOW 2
>>> +#define THRESHOLD_HIGH 13
>>> +#define THRESHOLD_CNT 14
>>> +#define THRESHOLD_CMP_LO 15
>>> +#define THRESHOLD_CMP_HI 16
>>
>> Do you think THWIDTH might extend beyond 12 bits in future? If so, it might
>> be worth juggling these bits around a bit so it's not sandwiched between
>> 'rdpmc' and 'threshold_compare'. I defer to your judgement, however.
>>
>
> It's possible, both PMMIR.THWIDTH and PMEVTYPER.TH currently have unused
> bits above them. I can easily move threshold to the end, I suppose that
> covers us at least until someone adds a new field above that.
>
>>> PMU_FORMAT_ATTR(event, "config:0-15");
>>> PMU_FORMAT_ATTR(long, "config1:0");
>>> PMU_FORMAT_ATTR(rdpmc, "config1:1");
>>> +PMU_FORMAT_ATTR(threshold, "config1:" __stringify(THRESHOLD_LOW) "-"
>>> + __stringify(THRESHOLD_HIGH));
>>> +PMU_FORMAT_ATTR(threshold_compare, "config1:" __stringify(THRESHOLD_CMP_LO) "-"
>>> + __stringify(THRESHOLD_CMP_HI));
>>> +PMU_FORMAT_ATTR(threshold_count, "config1:" __stringify(THRESHOLD_CNT));
>>>
>>> static int sysctl_perf_user_access __read_mostly;
>>>
>>> @@ -310,10 +322,33 @@ static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
>>> return event->attr.config1 & 0x2;
>>> }
>>>
>>> +static inline u32 armv8pmu_event_threshold(struct perf_event_attr *attr)
>>> +{
>>> + return FIELD_GET(GENMASK(THRESHOLD_HIGH, THRESHOLD_LOW), attr->config1);
>>> +}
>>> +
>>> +static inline u8 armv8pmu_event_threshold_control(struct perf_event_attr *attr)
>>
>> You can drop the 'inline's for these functions (and, in fact, this whole
>> file could do with that cleanup :)
>>
>
> Will do.
>
>>> +{
>>> + u8 th_compare = FIELD_GET(GENMASK(THRESHOLD_CMP_HI, THRESHOLD_CMP_LO),
>>> + attr->config1);
>>> + u8 th_count = FIELD_GET(BIT(THRESHOLD_CNT), attr->config1);
>>
>> I think this is correct, but you might want to look at how we handle this
>> in the SPE driver as I think it ends up looking cleaner and makes it pretty
>> obvious which bits correspond to the user ABI (i.e. config fields) and which
>> bits are part of architectural registers. I'm not saying you have to do it
>> that way, but please take a look if you haven't already.
>>
>
> Yeah I could take the GEN_PMU_FORMAT_ATTR() etc macros out of there and
> re-use them here too. And also for the other existing configs in
> arm_pmuv3.c.
>
>>> + /*
>>> + * The count bit is always the bottom bit of the full control field, and
>>> + * the comparison is the upper two bits, but it's not explicitly
>>> + * labelled in the Arm ARM. For the Perf interface we split it into two
>>> + * fields, so reconstruct it here.
>>> + */
>>> + return (th_compare << 1) | th_count;
>>> +}
>>> +
>>> static struct attribute *armv8_pmuv3_format_attrs[] = {
>>> &format_attr_event.attr,
>>> &format_attr_long.attr,
>>> &format_attr_rdpmc.attr,
>>> + &format_attr_threshold.attr,
>>> + &format_attr_threshold_compare.attr,
>>> + &format_attr_threshold_count.attr,
>>> NULL,
>>> };
>>>
>>> @@ -365,10 +400,38 @@ static ssize_t bus_width_show(struct device *dev, struct device_attribute *attr,
>>>
>>> static DEVICE_ATTR_RO(bus_width);
>>>
>>> +static u32 threshold_max(struct arm_pmu *cpu_pmu)
>>> +{
>>> + /*
>>> + * PMMIR.THWIDTH is readable and non-zero on aarch32, but it would be
>>> + * impossible to write the threshold in the upper 32 bits of PMEVTYPER.
>>> + */
>>> + if (IS_ENABLED(CONFIG_ARM))
>>> + return 0;
>>> +
>>> + /*
>>> + * The largest value that can be written to PMEVTYPER<n>_EL0.TH is
>>> + * (2 ^ PMMIR.THWIDTH) - 1.
>>> + */
>>> + return (1 << FIELD_GET(ARMV8_PMU_THWIDTH, cpu_pmu->reg_pmmir)) - 1;
>>> +}
>>> +
>>> +static ssize_t threshold_max_show(struct device *dev,
>>> + struct device_attribute *attr, char *page)
>>> +{
>>> + struct pmu *pmu = dev_get_drvdata(dev);
>>> + struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
>>> +
>>> + return sysfs_emit(page, "0x%08x\n", threshold_max(cpu_pmu));
>>> +}
>>> +
>>> +static DEVICE_ATTR_RO(threshold_max);
>>> +
>>> static struct attribute *armv8_pmuv3_caps_attrs[] = {
>>> &dev_attr_slots.attr,
>>> &dev_attr_bus_slots.attr,
>>> &dev_attr_bus_width.attr,
>>> + &dev_attr_threshold_max.attr,
>>> NULL,
>>> };
>>>
>>> @@ -552,7 +615,7 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value)
>>> armv8pmu_write_hw_counter(event, value);
>>> }
>>>
>>> -static inline void armv8pmu_write_evtype(int idx, u32 val)
>>> +static inline void armv8pmu_write_evtype(int idx, unsigned long val)
>>> {
>>> u32 counter = ARMV8_IDX_TO_COUNTER(idx);
>>> unsigned long mask = ARMV8_PMU_EVTYPE_EVENT |
>>> @@ -921,6 +984,10 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>>> struct perf_event_attr *attr)
>>> {
>>> unsigned long config_base = 0;
>>> + struct perf_event *perf_event = container_of(attr, struct perf_event,
>>> + attr);
>>> + struct arm_pmu *cpu_pmu = to_arm_pmu(perf_event->pmu);
>>> + u32 th, th_max;
>>>
>>> if (attr->exclude_idle)
>>> return -EPERM;
>>> @@ -952,6 +1019,21 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
>>> if (attr->exclude_user)
>>> config_base |= ARMV8_PMU_EXCLUDE_EL0;
>>>
>>> + /*
>>> + * Insert event counting threshold (FEAT_PMUv3_TH) values. If
>>> + * FEAT_PMUv3_TH isn't implemented, then THWIDTH (threshold_max) will be
>>> + * 0 and no values will be written.
>>> + */
>>> + th_max = threshold_max(cpu_pmu);
>>> + if (IS_ENABLED(CONFIG_ARM64) && th_max) {
>>
>> Why is the IS_ENABLED() check needed here?
>>
>
> The FIELD_PREP() below would cause a compilation error on arm32 because
> TH and TC are above 32 bits. I can add a comment.
>
>>> + th = min(armv8pmu_event_threshold(attr), th_max);
>>> + if (th) {
>>
>> Why is it useful to take the minimum here? If userspace asks for a value
>> bigger than the maximum support threshold, shouldn't we return an error
>> rather than silently clamp it?
>>
>
> I think it probably was just so I didn't have to think about what would
> happen when the value varied between cores.
>
> But you're right, it looks like I can add a validation function to
> struct arm_pmu and call it from armpmu_event_init(). armpmu->map_event()
> and armpmu->set_event_filter() are already called from there so I think
> the validation could technically be added to one of those, but that's
> probably a bit hacky. I don't know if you have any preference for where
> the threshold validation should happen?
I think the other reason was to make sure we never set those fields for
something that doesn't support the feature ? (Given th_max is set to 0
in that case). But I agree that validation is a better approach than
silently masking it.
Suzuki
>
>>> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TH, th);
>>> + config_base |= FIELD_PREP(ARMV8_PMU_EVTYPE_TC,
>>> + armv8pmu_event_threshold_control(attr));
>>> + }
>>> + }
>>> +
>>> /*
>>> * Install the filter into config_base as this is used to
>>> * construct the event type.
>>> diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h
>>> index ddd1fec86739..ccbc0f9a74d8 100644
>>> --- a/include/linux/perf/arm_pmuv3.h
>>> +++ b/include/linux/perf/arm_pmuv3.h
>>> @@ -258,6 +258,7 @@
>>> #define ARMV8_PMU_BUS_SLOTS_MASK 0xff
>>> #define ARMV8_PMU_BUS_WIDTH_SHIFT 16
>>> #define ARMV8_PMU_BUS_WIDTH_MASK 0xf
>>> +#define ARMV8_PMU_THWIDTH GENMASK(23, 20)
>>
>> It's a bit messy having a mixture of GENMASK and MASK/SHIFT pairs. Please
>> can you either update what's there to use GENMASK, or use SHIFT/MASK for the
>> new addition?
>>
>> Will
>>
>
> Yep will do. Thanks for the review.
>
> James