2022-09-29 08:41:03

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V3 0/7] arm64/perf: Enable branch stack sampling

This series enables perf branch stack sampling support on arm64 platform
via a new arch feature called Branch Record Buffer Extension (BRBE). All
relevant register definitions could be accessed here.

https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers

This series applies on v6.0-rc5 after the BRBE related perf ABI changes series
(V7) that was posted earlier, and a branch sample filter helper patch.

https://lore.kernel.org/all/[email protected]/

https://lore.kernel.org/all/[email protected]/

Changes in V3:

- Moved brbe_stack from the stack and now dynamically allocated
- Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv()
- Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg
- Created dummy BRBINF_EL1 field definitions in tools/sysreg
- Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable()
- Both exception and exception return branche records are now captured
only if the event has PERF_SAMPLE_BRANCH_KERNEL which would already
been checked in generic perf via perf_allow_kernel()

Changes in V2:

https://lore.kernel.org/all/[email protected]/

- Dropped branch sample filter helpers consolidation patch from this series
- Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
- Use cached perfmon_capable() while configuring BRBE branch record filters

Changes in V1:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
- Process new perf branch types via PERF_BR_EXTEND_ABI

Changes in RFC V2:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Added branch_sample_priv() while consolidating other branch sample filter helpers
- Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
- Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
- Added documentation for struct arm_pmu changes, updated commit message
- Updated commit message for BRBE detection infrastructure patch
- PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
- Branch privilege state capture mechanism has now moved inside the driver

Changes in RFC V1:

https://lore.kernel.org/all/[email protected]/

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: James Clark <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Suzuki Poulose <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]


Anshuman Khandual (7):
arm64/perf: Add BRBE registers and fields
arm64/perf: Update struct arm_pmu for BRBE
arm64/perf: Update struct pmu_hw_events for BRBE
driver/perf/arm_pmu_platform: Add support for BRBE attributes detection
arm64/perf: Drive BRBE from perf event states
arm64/perf: Add BRBE driver
arm64/perf: Enable branch stack sampling

arch/arm64/include/asm/sysreg.h | 107 ++++++++
arch/arm64/kernel/perf_event.c | 48 ++++
arch/arm64/tools/sysreg | 159 ++++++++++++
drivers/perf/Kconfig | 11 +
drivers/perf/Makefile | 1 +
drivers/perf/arm_pmu.c | 73 +++++-
drivers/perf/arm_pmu_brbe.c | 447 ++++++++++++++++++++++++++++++++
drivers/perf/arm_pmu_brbe.h | 259 ++++++++++++++++++
drivers/perf/arm_pmu_platform.c | 34 +++
include/linux/perf/arm_pmu.h | 67 +++++
10 files changed, 1203 insertions(+), 3 deletions(-)
create mode 100644 drivers/perf/arm_pmu_brbe.c
create mode 100644 drivers/perf/arm_pmu_brbe.h

--
2.25.1


2022-09-29 08:48:57

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection

This adds arm pmu infrastrure to probe BRBE implementation's attributes via
driver exported callbacks later. The actual BRBE feature detection will be
added by the driver itself.

CPU specific BRBE entries, cycle count, format support gets detected during
PMU init. This information gets saved in per-cpu struct pmu_hw_events which
later helps in operating BRBE during a perf event context.

Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
index 933b96e243b8..acdc445081aa 100644
--- a/drivers/perf/arm_pmu_platform.c
+++ b/drivers/perf/arm_pmu_platform.c
@@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
return err;
}

+static void arm_brbe_probe_cpu(void *info)
+{
+ struct pmu_hw_events *hw_events;
+ struct arm_pmu *armpmu = info;
+
+ /*
+ * Return from here, if BRBE driver has not been
+ * implemented for this PMU. This helps prevent
+ * kernel crash later when brbe_probe() will be
+ * called on the PMU.
+ */
+ if (!armpmu->brbe_probe)
+ return;
+
+ hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
+ armpmu->brbe_probe(hw_events);
+}
+
+static int armpmu_request_brbe(struct arm_pmu *armpmu)
+{
+ int cpu, err = 0;
+
+ for_each_cpu(cpu, &armpmu->supported_cpus) {
+ err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
+ if (err)
+ return err;
+ }
+ return err;
+}
+
static void armpmu_free_irqs(struct arm_pmu *armpmu)
{
int cpu;
@@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
if (ret)
goto out_free_irqs;

+ ret = armpmu_request_brbe(pmu);
+ if (ret)
+ goto out_free_irqs;
+
ret = armpmu_register(pmu);
if (ret) {
dev_err(dev, "failed to register PMU devices!\n");
--
2.25.1

2022-09-29 08:51:17

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields

This adds BRBE related register definitions and various other related field
macros there in. These will be used subsequently in a BRBE driver which is
being added later on.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
arch/arm64/include/asm/sysreg.h | 107 +++++++++++++++++++++
arch/arm64/tools/sysreg | 159 ++++++++++++++++++++++++++++++++
2 files changed, 266 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 818df938a7ad..1cf9345730f0 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -161,6 +161,109 @@
#define SYS_DBGDTRTX_EL0 sys_reg(2, 3, 0, 5, 0)
#define SYS_DBGVCR32_EL2 sys_reg(2, 4, 0, 7, 0)

+#define __SYS_BRBINFO(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 0))
+#define __SYS_BRBSRC(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 1))
+#define __SYS_BRBTGT(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 2))
+
+#define SYS_BRBINF0_EL1 __SYS_BRBINFO(0)
+#define SYS_BRBINF1_EL1 __SYS_BRBINFO(1)
+#define SYS_BRBINF2_EL1 __SYS_BRBINFO(2)
+#define SYS_BRBINF3_EL1 __SYS_BRBINFO(3)
+#define SYS_BRBINF4_EL1 __SYS_BRBINFO(4)
+#define SYS_BRBINF5_EL1 __SYS_BRBINFO(5)
+#define SYS_BRBINF6_EL1 __SYS_BRBINFO(6)
+#define SYS_BRBINF7_EL1 __SYS_BRBINFO(7)
+#define SYS_BRBINF8_EL1 __SYS_BRBINFO(8)
+#define SYS_BRBINF9_EL1 __SYS_BRBINFO(9)
+#define SYS_BRBINF10_EL1 __SYS_BRBINFO(10)
+#define SYS_BRBINF11_EL1 __SYS_BRBINFO(11)
+#define SYS_BRBINF12_EL1 __SYS_BRBINFO(12)
+#define SYS_BRBINF13_EL1 __SYS_BRBINFO(13)
+#define SYS_BRBINF14_EL1 __SYS_BRBINFO(14)
+#define SYS_BRBINF15_EL1 __SYS_BRBINFO(15)
+#define SYS_BRBINF16_EL1 __SYS_BRBINFO(16)
+#define SYS_BRBINF17_EL1 __SYS_BRBINFO(17)
+#define SYS_BRBINF18_EL1 __SYS_BRBINFO(18)
+#define SYS_BRBINF19_EL1 __SYS_BRBINFO(19)
+#define SYS_BRBINF20_EL1 __SYS_BRBINFO(20)
+#define SYS_BRBINF21_EL1 __SYS_BRBINFO(21)
+#define SYS_BRBINF22_EL1 __SYS_BRBINFO(22)
+#define SYS_BRBINF23_EL1 __SYS_BRBINFO(23)
+#define SYS_BRBINF24_EL1 __SYS_BRBINFO(24)
+#define SYS_BRBINF25_EL1 __SYS_BRBINFO(25)
+#define SYS_BRBINF26_EL1 __SYS_BRBINFO(26)
+#define SYS_BRBINF27_EL1 __SYS_BRBINFO(27)
+#define SYS_BRBINF28_EL1 __SYS_BRBINFO(28)
+#define SYS_BRBINF29_EL1 __SYS_BRBINFO(29)
+#define SYS_BRBINF30_EL1 __SYS_BRBINFO(30)
+#define SYS_BRBINF31_EL1 __SYS_BRBINFO(31)
+
+#define SYS_BRBSRC0_EL1 __SYS_BRBSRC(0)
+#define SYS_BRBSRC1_EL1 __SYS_BRBSRC(1)
+#define SYS_BRBSRC2_EL1 __SYS_BRBSRC(2)
+#define SYS_BRBSRC3_EL1 __SYS_BRBSRC(3)
+#define SYS_BRBSRC4_EL1 __SYS_BRBSRC(4)
+#define SYS_BRBSRC5_EL1 __SYS_BRBSRC(5)
+#define SYS_BRBSRC6_EL1 __SYS_BRBSRC(6)
+#define SYS_BRBSRC7_EL1 __SYS_BRBSRC(7)
+#define SYS_BRBSRC8_EL1 __SYS_BRBSRC(8)
+#define SYS_BRBSRC9_EL1 __SYS_BRBSRC(9)
+#define SYS_BRBSRC10_EL1 __SYS_BRBSRC(10)
+#define SYS_BRBSRC11_EL1 __SYS_BRBSRC(11)
+#define SYS_BRBSRC12_EL1 __SYS_BRBSRC(12)
+#define SYS_BRBSRC13_EL1 __SYS_BRBSRC(13)
+#define SYS_BRBSRC14_EL1 __SYS_BRBSRC(14)
+#define SYS_BRBSRC15_EL1 __SYS_BRBSRC(15)
+#define SYS_BRBSRC16_EL1 __SYS_BRBSRC(16)
+#define SYS_BRBSRC17_EL1 __SYS_BRBSRC(17)
+#define SYS_BRBSRC18_EL1 __SYS_BRBSRC(18)
+#define SYS_BRBSRC19_EL1 __SYS_BRBSRC(19)
+#define SYS_BRBSRC20_EL1 __SYS_BRBSRC(20)
+#define SYS_BRBSRC21_EL1 __SYS_BRBSRC(21)
+#define SYS_BRBSRC22_EL1 __SYS_BRBSRC(22)
+#define SYS_BRBSRC23_EL1 __SYS_BRBSRC(23)
+#define SYS_BRBSRC24_EL1 __SYS_BRBSRC(24)
+#define SYS_BRBSRC25_EL1 __SYS_BRBSRC(25)
+#define SYS_BRBSRC26_EL1 __SYS_BRBSRC(26)
+#define SYS_BRBSRC27_EL1 __SYS_BRBSRC(27)
+#define SYS_BRBSRC28_EL1 __SYS_BRBSRC(28)
+#define SYS_BRBSRC29_EL1 __SYS_BRBSRC(29)
+#define SYS_BRBSRC30_EL1 __SYS_BRBSRC(30)
+#define SYS_BRBSRC31_EL1 __SYS_BRBSRC(31)
+
+#define SYS_BRBTGT0_EL1 __SYS_BRBTGT(0)
+#define SYS_BRBTGT1_EL1 __SYS_BRBTGT(1)
+#define SYS_BRBTGT2_EL1 __SYS_BRBTGT(2)
+#define SYS_BRBTGT3_EL1 __SYS_BRBTGT(3)
+#define SYS_BRBTGT4_EL1 __SYS_BRBTGT(4)
+#define SYS_BRBTGT5_EL1 __SYS_BRBTGT(5)
+#define SYS_BRBTGT6_EL1 __SYS_BRBTGT(6)
+#define SYS_BRBTGT7_EL1 __SYS_BRBTGT(7)
+#define SYS_BRBTGT8_EL1 __SYS_BRBTGT(8)
+#define SYS_BRBTGT9_EL1 __SYS_BRBTGT(9)
+#define SYS_BRBTGT10_EL1 __SYS_BRBTGT(10)
+#define SYS_BRBTGT11_EL1 __SYS_BRBTGT(11)
+#define SYS_BRBTGT12_EL1 __SYS_BRBTGT(12)
+#define SYS_BRBTGT13_EL1 __SYS_BRBTGT(13)
+#define SYS_BRBTGT14_EL1 __SYS_BRBTGT(14)
+#define SYS_BRBTGT15_EL1 __SYS_BRBTGT(15)
+#define SYS_BRBTGT16_EL1 __SYS_BRBTGT(16)
+#define SYS_BRBTGT17_EL1 __SYS_BRBTGT(17)
+#define SYS_BRBTGT18_EL1 __SYS_BRBTGT(18)
+#define SYS_BRBTGT19_EL1 __SYS_BRBTGT(19)
+#define SYS_BRBTGT20_EL1 __SYS_BRBTGT(20)
+#define SYS_BRBTGT21_EL1 __SYS_BRBTGT(21)
+#define SYS_BRBTGT22_EL1 __SYS_BRBTGT(22)
+#define SYS_BRBTGT23_EL1 __SYS_BRBTGT(23)
+#define SYS_BRBTGT24_EL1 __SYS_BRBTGT(24)
+#define SYS_BRBTGT25_EL1 __SYS_BRBTGT(25)
+#define SYS_BRBTGT26_EL1 __SYS_BRBTGT(26)
+#define SYS_BRBTGT27_EL1 __SYS_BRBTGT(27)
+#define SYS_BRBTGT28_EL1 __SYS_BRBTGT(28)
+#define SYS_BRBTGT29_EL1 __SYS_BRBTGT(29)
+#define SYS_BRBTGT30_EL1 __SYS_BRBTGT(30)
+#define SYS_BRBTGT31_EL1 __SYS_BRBTGT(31)
+
#define SYS_MIDR_EL1 sys_reg(3, 0, 0, 0, 0)
#define SYS_MPIDR_EL1 sys_reg(3, 0, 0, 0, 5)
#define SYS_REVIDR_EL1 sys_reg(3, 0, 0, 0, 6)
@@ -826,6 +929,7 @@
#define ID_AA64MMFR2_CNP_SHIFT 0

/* id_aa64dfr0 */
+#define ID_AA64DFR0_BRBE_SHIFT 52
#define ID_AA64DFR0_MTPMU_SHIFT 48
#define ID_AA64DFR0_TRBE_SHIFT 44
#define ID_AA64DFR0_TRACE_FILT_SHIFT 40
@@ -848,6 +952,9 @@
#define ID_AA64DFR0_PMSVER_8_2 0x1
#define ID_AA64DFR0_PMSVER_8_3 0x2

+#define ID_AA64DFR0_BRBE 0x1
+#define ID_AA64DFR0_BRBE_V1P1 0x2
+
#define ID_DFR0_PERFMON_SHIFT 24

#define ID_DFR0_PERFMON_8_0 0x3
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 9ae483ec1e56..b7e945e95f05 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -46,6 +46,165 @@
# feature that introduces them (eg, FEAT_LS64_ACCDATA introduces enumeration
# item ACCDATA) though it may be more taseful to do something else.

+
+# This is just a dummy register declaration to get all common field masks and
+# shifts for accessing given BRBINF contents.
+Sysreg BRBINF_EL1 2 1 8 0 0
+Res0 63:47
+Field 46 CCU
+Field 45:32 CC
+Res0 31:18
+Field 17 LASTFAILED
+Field 16 TX
+Res0 15:14
+Enum 13:8 TYPE
+ 0b000000 UNCOND_DIR
+ 0b000001 INDIR
+ 0b000010 DIR_LINK
+ 0b000011 INDIR_LINK
+ 0b000101 RET_SUB
+ 0b000111 RET_EXCPT
+ 0b001000 COND_DIR
+ 0b100001 DEBUG_HALT
+ 0b100010 CALL
+ 0b100011 TRAP
+ 0b100100 SERROR
+ 0b100110 INST_DEBUG
+ 0b100111 DATA_DEBUG
+ 0b101010 ALGN_FAULT
+ 0b101011 INST_FAULT
+ 0b101100 DATA_FAULT
+ 0b101110 IRQ
+ 0b101111 FIQ
+ 0b111001 DEBUG_EXIT
+EndEnum
+Enum 7:6 EL
+ 0b00 EL0
+ 0b01 EL1
+ 0b10 EL2
+EndEnum
+Field 5 MPRED
+Res0 4:2
+Enum 1:0 VALID
+ 0b00 NONE
+ 0b01 TARGET
+ 0b10 SOURCE
+ 0b11 FULL
+EndEnum
+EndSysreg
+
+Sysreg BRBCR_EL1 2 1 9 0 0
+Res0 63:24
+Field 23 EXCEPTION
+Field 22 ERTN
+Res0 21:9
+Field 8 FZP
+Field 7 ZERO
+Enum 6:5 TS
+ 0b1 VIRTUAL
+ 0b10 GST_PHYSICAL
+ 0b11 PHYSICAL
+EndEnum
+Field 4 MPRED
+Field 3 CC
+Field 2 ZERO1
+Field 1 E1BRE
+Field 0 E0BRE
+EndSysreg
+
+Sysreg BRBFCR_EL1 2 1 9 0 1
+Res0 63:30
+Enum 29:28 BANK
+ 0b0 FIRST
+ 0b1 SECOND
+EndEnum
+Res0 27:23
+Field 22 CONDDIR
+Field 21 DIRCALL
+Field 20 INDCALL
+Field 19 RTN
+Field 18 INDIRECT
+Field 17 DIRECT
+Field 16 ENL
+Res0 15:8
+Field 7 PAUSED
+Field 6 LASTFAILED
+Res0 5:0
+EndSysreg
+
+Sysreg BRBTS_EL1 2 1 9 0 2
+Field 63:0 TS
+EndSysreg
+
+Sysreg BRBINFINJ_EL1 2 1 9 1 0
+Res0 63:47
+Field 46 CCU
+Field 45:32 CC
+Res0 31:18
+Field 17 LASTFAILED
+Field 16 TX
+Res0 15:14
+Enum 13:8 TYPE
+ 0b000000 UNCOND_DIR
+ 0b000001 INDIR
+ 0b000010 DIR_LINK
+ 0b000011 INDIR_LINK
+ 0b000100 RET_SUB
+ 0b000100 RET_SUB
+ 0b000111 RET_EXCPT
+ 0b001000 COND_DIR
+ 0b100001 DEBUG_HALT
+ 0b100010 CALL
+ 0b100011 TRAP
+ 0b100100 SERROR
+ 0b100110 INST_DEBUG
+ 0b100111 DATA_DEBUG
+ 0b101010 ALGN_FAULT
+ 0b101011 INST_FAULT
+ 0b101100 DATA_FAULT
+ 0b101110 IRQ
+ 0b101111 FIQ
+ 0b111001 DEBUG_EXIT
+EndEnum
+Enum 7:6 EL
+ 0b00 EL0
+ 0b01 EL1
+ 0b10 EL2
+EndEnum
+Field 5 MPRED
+Res0 4:2
+Enum 1:0 VALID
+ 0b00 NONE
+ 0b01 TARGET
+ 0b10 SOURCE
+ 0b00 FULL
+EndEnum
+EndSysreg
+
+Sysreg BRBSRCINJ_EL1 2 1 9 1 1
+Field 63:0 ADDRESS
+EndSysreg
+
+Sysreg BRBTGTINJ_EL1 2 1 9 1 2
+Field 63:0 ADDRESS
+EndSysreg
+
+Sysreg BRBIDR0_EL1 2 1 9 2 0
+Res0 63:16
+Enum 15:12 CC
+ 0b101 20_BIT
+EndEnum
+Enum 11:8 FORMAT
+ 0b0 0
+EndEnum
+Enum 7:0 NUMREC
+ 0b1000 8
+ 0b10000 16
+ 0b100000 32
+ 0b1000000 64
+EndEnum
+EndSysreg
+
Sysreg ID_AA64ZFR0_EL1 3 0 0 4 4
Res0 63:60
Enum 59:56 F64MM
--
2.25.1

2022-09-29 08:54:13

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V3 3/7] arm64/perf: Update struct pmu_hw_events for BRBE

A single perf event instance BRBE related contexts and data will be tracked
in struct pmu_hw_events. Hence update the structure to accommodate required
details related to BRBE.

Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
drivers/perf/arm_pmu.c | 1 +
include/linux/perf/arm_pmu.h | 26 ++++++++++++++++++++++++++
2 files changed, 27 insertions(+)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 59d3980b8ca2..16fda9a1229f 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -905,6 +905,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)

events = per_cpu_ptr(pmu->hw_events, cpu);
raw_spin_lock_init(&events->pmu_lock);
+ events->branches = kmalloc(sizeof(struct brbe_records), flags);
events->percpu_pmu = pmu;
}

diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 3d427ac0ca45..ffce43ceb670 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -43,6 +43,16 @@
}, \
}

+/*
+ * Maximum branch records in BRBE
+ */
+#define BRBE_MAX_ENTRIES 64
+
+struct brbe_records {
+ struct perf_branch_stack brbe_stack;
+ struct perf_branch_entry brbe_entries[BRBE_MAX_ENTRIES];
+};
+
/* The events for a given PMU register set. */
struct pmu_hw_events {
/*
@@ -69,6 +79,22 @@ struct pmu_hw_events {
struct arm_pmu *percpu_pmu;

int irq;
+
+ /* Detected BRBE attributes */
+ bool v1p1;
+ int brbe_cc;
+ int brbe_nr;
+
+ /* Evaluated BRBE configuration */
+ u64 brbfcr;
+ u64 brbcr;
+
+ /* Tracked BRBE context */
+ unsigned int brbe_users;
+ void *brbe_context;
+
+ /* Captured BRBE buffer - copied as is into perf_sample_data */
+ struct brbe_records *branches;
};

enum armpmu_attr_groups {
--
2.25.1

2022-09-29 08:55:50

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V3 2/7] arm64/perf: Update struct arm_pmu for BRBE

Although BRBE is an armv8 speciifc HW feature, abstracting out its various
function callbacks at the struct arm_pmu level is preferred, as it cleaner
, easier to follow and maintain.

Besides some helpers i.e brbe_supported(), brbe_probe() and brbe_reset()
might not fit seamlessly, when tried to be embedded via existing arm_pmu
helpers in the armv8 implementation.

Updates the struct arm_pmu to include all required helpers that will drive
BRBE functionality for a given PMU implementation. These are the following.

- brbe_filter : Convert perf event filters into BRBE HW filters
- brbe_probe : Probe BRBE HW and capture its attributes
- brbe_enable : Enable BRBE HW with a given config
- brbe_disable : Disable BRBE HW
- brbe_read : Read BRBE buffer for captured branch records
- brbe_reset : Reset BRBE buffer
- brbe_supported: Whether BRBE is supported or not

A BRBE driver implementation needs to provide these functionalities.

Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
arch/arm64/kernel/perf_event.c | 36 ++++++++++++++++++++++++++++++++++
include/linux/perf/arm_pmu.h | 21 ++++++++++++++++++++
2 files changed, 57 insertions(+)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index cb69ff1e6138..e7013699171f 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1025,6 +1025,35 @@ static int armv8pmu_filter_match(struct perf_event *event)
return evtype != ARMV8_PMUV3_PERFCTR_CHAIN;
}

+static void armv8pmu_brbe_filter(struct pmu_hw_events *hw_event, struct perf_event *event)
+{
+}
+
+static void armv8pmu_brbe_enable(struct pmu_hw_events *hw_event)
+{
+}
+
+static void armv8pmu_brbe_disable(struct pmu_hw_events *hw_event)
+{
+}
+
+static void armv8pmu_brbe_read(struct pmu_hw_events *hw_event, struct perf_event *event)
+{
+}
+
+static void armv8pmu_brbe_probe(struct pmu_hw_events *hw_event)
+{
+}
+
+static void armv8pmu_brbe_reset(struct pmu_hw_events *hw_event)
+{
+}
+
+static bool armv8pmu_brbe_supported(struct perf_event *event)
+{
+ return false;
+}
+
static void armv8pmu_reset(void *info)
{
struct arm_pmu *cpu_pmu = (struct arm_pmu *)info;
@@ -1257,6 +1286,13 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,

cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;

+ cpu_pmu->brbe_filter = armv8pmu_brbe_filter;
+ cpu_pmu->brbe_enable = armv8pmu_brbe_enable;
+ cpu_pmu->brbe_disable = armv8pmu_brbe_disable;
+ cpu_pmu->brbe_read = armv8pmu_brbe_read;
+ cpu_pmu->brbe_probe = armv8pmu_brbe_probe;
+ cpu_pmu->brbe_reset = armv8pmu_brbe_reset;
+ cpu_pmu->brbe_supported = armv8pmu_brbe_supported;
cpu_pmu->name = name;
cpu_pmu->map_event = map_event;
cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = events ?
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 0407a38b470a..3d427ac0ca45 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -100,6 +100,27 @@ struct arm_pmu {
void (*reset)(void *);
int (*map_event)(struct perf_event *event);
int (*filter_match)(struct perf_event *event);
+
+ /* Convert perf event filters into BRBE HW filters */
+ void (*brbe_filter)(struct pmu_hw_events *hw_events, struct perf_event *event);
+
+ /* Probe BRBE HW and capture its attributes */
+ void (*brbe_probe)(struct pmu_hw_events *hw_events);
+
+ /* Enable BRBE HW with a given config */
+ void (*brbe_enable)(struct pmu_hw_events *hw_events);
+
+ /* Disable BRBE HW */
+ void (*brbe_disable)(struct pmu_hw_events *hw_events);
+
+ /* Process BRBE buffer for captured branch records */
+ void (*brbe_read)(struct pmu_hw_events *hw_events, struct perf_event *event);
+
+ /* Reset BRBE buffer */
+ void (*brbe_reset)(struct pmu_hw_events *hw_events);
+
+ /* Check whether BRBE is supported */
+ bool (*brbe_supported)(struct perf_event *event);
int num_events;
bool secure_access; /* 32-bit ARM only */
#define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
--
2.25.1

2022-09-29 12:17:40

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields

On Thu, Sep 29, 2022 at 01:28:51PM +0530, Anshuman Khandual wrote:

Thanks for doing this work - I did spot a few small issues though.

> /* id_aa64dfr0 */
> +#define ID_AA64DFR0_BRBE_SHIFT 52
> #define ID_AA64DFR0_MTPMU_SHIFT 48
> #define ID_AA64DFR0_TRBE_SHIFT 44
> #define ID_AA64DFR0_TRACE_FILT_SHIFT 40
> @@ -848,6 +952,9 @@
> #define ID_AA64DFR0_PMSVER_8_2 0x1
> #define ID_AA64DFR0_PMSVER_8_3 0x2
>
> +#define ID_AA64DFR0_BRBE 0x1
> +#define ID_AA64DFR0_BRBE_V1P1 0x2
> +
> #define ID_DFR0_PERFMON_SHIFT 24
>
> #define ID_DFR0_PERFMON_8_0 0x3

This is already done in -next as a result of ID_AA64DFR0_EL1 being
converted, the enumberation define comes out as
ID_AA64DFR0_EL1_BRBE_BRBE_V1P1.

> +# This is just a dummy register declaration to get all common field masks and
> +# shifts for accessing given BRBINF contents.
> +Sysreg BRBINF_EL1 2 1 8 0 0
> +Res0 63:47
> +Field 46 CCU
> +Field 45:32 CC
> +Res0 31:18
> +Field 17 LASTFAILED
> +Field 16 TX

According to DDI0487I.a bit 16 is called T not TX.

> +Res0 15:14
> +Enum 13:8 TYPE

It's probably worth noting in the comment the issue with Enums here
that's meaning you're not using a SysregFields - I'm not sure what
people will think of this but providing a definition using the ID for
the 0th register does seem expedient.

> +Enum 7:6 EL
> + 0b00 EL0
> + 0b01 EL1
> + 0b10 EL2
> +EndEnum

According to DDI0487I.a 0b11 has the value EL3 (when FEAT_BRBEv1p1).

> +Sysreg BRBCR_EL1 2 1 9 0 0
> +Res0 63:24
> +Field 23 EXCEPTION
> +Field 22 ERTN
> +Res0 21:9
> +Field 8 FZP
> +Field 7 ZERO

According to DDI0487I.a bit 7 is Res0.

> +Field 2 ZERO1

According to DDI0487I.a bit 2 is Res0.

> +Sysreg BRBFCR_EL1 2 1 9 0 1

> +Field 16 ENL

Accoding to DDI0487I.a this is EnI (ie, an L not an I).

> +Sysreg BRBINFINJ_EL1 2 1 9 1 0

> +Field 16 TX

According to DDI0487I.a this is T not TX.

> +Enum 7:6 EL
> + 0b00 EL0
> + 0b01 EL1
> + 0b10 EL2
> +EndEnum

According to DDI0487I.a 0b11 has the value EL3 (when FEAT_BRBEv1p1).


Attachments:
(No filename) (2.05 kB)
signature.asc (499.00 B)
Download all attachments

2022-09-30 04:30:54

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH V3 1/7] arm64/perf: Add BRBE registers and fields



On 9/29/22 16:59, Mark Brown wrote:
> On Thu, Sep 29, 2022 at 01:28:51PM +0530, Anshuman Khandual wrote:
>
> Thanks for doing this work - I did spot a few small issues though.
>
>> /* id_aa64dfr0 */
>> +#define ID_AA64DFR0_BRBE_SHIFT 52
>> #define ID_AA64DFR0_MTPMU_SHIFT 48
>> #define ID_AA64DFR0_TRBE_SHIFT 44
>> #define ID_AA64DFR0_TRACE_FILT_SHIFT 40
>> @@ -848,6 +952,9 @@
>> #define ID_AA64DFR0_PMSVER_8_2 0x1
>> #define ID_AA64DFR0_PMSVER_8_3 0x2
>>
>> +#define ID_AA64DFR0_BRBE 0x1
>> +#define ID_AA64DFR0_BRBE_V1P1 0x2
>> +
>> #define ID_DFR0_PERFMON_SHIFT 24
>>
>> #define ID_DFR0_PERFMON_8_0 0x3
>
> This is already done in -next as a result of ID_AA64DFR0_EL1 being
> converted, the enumberation define comes out as
> ID_AA64DFR0_EL1_BRBE_BRBE_V1P1.

Right. I will rebase the series on upcoming 6.1-rc1 which should have all the
current -next patches including ID_AA64DFR0_EL1 migration into tools/sysreg.
This should just fall in place.

>
>> +# This is just a dummy register declaration to get all common field masks and
>> +# shifts for accessing given BRBINF contents.
>> +Sysreg BRBINF_EL1 2 1 8 0 0
>> +Res0 63:47
>> +Field 46 CCU
>> +Field 45:32 CC
>> +Res0 31:18
>> +Field 17 LASTFAILED
>> +Field 16 TX
>
> According to DDI0487I.a bit 16 is called T not TX.

I understand :) The intention here was to keep the field name associated with
"transaction" some how. But I guess, it would be more important to keep it as
is matching the ARM ARM than something for readability purpose. Will change it
as 'T'.

>
>> +Res0 15:14
>> +Enum 13:8 TYPE
>
> It's probably worth noting in the comment the issue with Enums here
> that's meaning you're not using a SysregFields - I'm not sure what

Sure, will add a comment describing the problem of using enum elements inside
SysregFields definition.

> people will think of this but providing a definition using the ID for
> the 0th register does seem expedient.

I understand your concern but this turned out to be a better option

- Original sysreg.h based definitions had all field masks on the right end

- When reading (reg >> field_shift) & field_mask
- When writing (val & field_mask) << field_shift

- tools/sysreg format creates in-place field masks

- When reading (reg & field_mask) >> field_shift
- When writing (val << field_shift) & field_mask

- After moving some BRBE registers into tools/sysreg, the driver code had
to be changed to accommodate these new write/read methods

- To avoid mix up in the BRBE driver, all BRBINF register fields need to be
converted into in place masks, either in sysreg.h itself or moving into
tools/sysreg

Moving BRBE fields into tools/sysreg via a dummy BRBINF_EL1 register seems
to achieve that objective. These common fields can be used to work on any
BRBINF<N>_EL1 register value. But I might just keep them in sysreg.h, if
the proposed solution is not preferable or seems expedient.

Later when enum support comes up in SysregFields and tools/sysreg supports
formula based crm/op2 expansion entire BRBINF, BRBSRC, BRBTGT register set
can be moved into tools/sysreg.

>
>> +Enum 7:6 EL
>> + 0b00 EL0
>> + 0b01 EL1
>> + 0b10 EL2
>> +EndEnum
>
> According to DDI0487I.a 0b11 has the value EL3 (when FEAT_BRBEv1p1).

Sure, will add it.

>
>> +Sysreg BRBCR_EL1 2 1 9 0 0
>> +Res0 63:24
>> +Field 23 EXCEPTION
>> +Field 22 ERTN
>> +Res0 21:9
>> +Field 8 FZP
>> +Field 7 ZERO
>
> According to DDI0487I.a bit 7 is Res0.

Sure, will change.

>
>> +Field 2 ZERO1
>
> According to DDI0487I.a bit 2 is Res0.

Sure, will change.

>
>> +Sysreg BRBFCR_EL1 2 1 9 0 1
>
>> +Field 16 ENL
>
> Accoding to DDI0487I.a this is EnI (ie, an L not an I).

ENL != Enl ? Do we need to match the case as well ?

>
>> +Sysreg BRBINFINJ_EL1 2 1 9 1 0
>
>> +Field 16 TX
>
> According to DDI0487I.a this is T not TX.

As mentioned, will change it as 'T'.

>
>> +Enum 7:6 EL
>> + 0b00 EL0
>> + 0b01 EL1
>> + 0b10 EL2
>> +EndEnum
>
> According to DDI0487I.a 0b11 has the value EL3 (when FEAT_BRBEv1p1).

Sure, will add.

2022-10-06 13:53:17

by James Clark

[permalink] [raw]
Subject: Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection



On 29/09/2022 08:58, Anshuman Khandual wrote:
> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
> driver exported callbacks later. The actual BRBE feature detection will be
> added by the driver itself.
>
> CPU specific BRBE entries, cycle count, format support gets detected during
> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
> later helps in operating BRBE during a perf event context.
>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Anshuman Khandual <[email protected]>
> ---
> drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
> index 933b96e243b8..acdc445081aa 100644
> --- a/drivers/perf/arm_pmu_platform.c
> +++ b/drivers/perf/arm_pmu_platform.c
> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
> return err;
> }
>
> +static void arm_brbe_probe_cpu(void *info)
> +{
> + struct pmu_hw_events *hw_events;
> + struct arm_pmu *armpmu = info;
> +
> + /*
> + * Return from here, if BRBE driver has not been
> + * implemented for this PMU. This helps prevent
> + * kernel crash later when brbe_probe() will be
> + * called on the PMU.
> + */
> + if (!armpmu->brbe_probe)
> + return;
> +
> + hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
> + armpmu->brbe_probe(hw_events);
> +}
> +
> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
> +{
> + int cpu, err = 0;
> +
> + for_each_cpu(cpu, &armpmu->supported_cpus) {
> + err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);

Hi Anshuman,

I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get
this:

armv8-pmu pmu: hw perfevents: no interrupt-affinity property, guessing.
brbe: implementation found on cpu 0

=============================
[ BUG: Invalid wait context ]
6.0.0-rc7 #38 Not tainted
-----------------------------
kworker/u8:0/9 is trying to lock:
ffff000800855898 (&port_lock_key){....}-{3:3}, at:
pl011_console_write+0x148/0x240
other info that might help us debug this:
context-{2:2}
5 locks held by kworker/u8:0/9:
#0: ffff00080032a138 ((wq_completion)eval_map_wq){+.+.}-{0:0}, at:
process_one_work+0x200/0x6b0
#1: ffff80000807bde0
((work_completion)(&eval_map_work)){+.+.}-{0:0}, at:
process_one_work+0x200/0x6b0
#2: ffff80000aa3db70 (trace_event_sem){+.+.}-{4:4}, at:
trace_event_eval_update+0x28/0x420
#3: ffff80000a9afe58 (console_lock){+.+.}-{0:0}, at:
vprintk_emit+0x130/0x380
#4: ffff80000a9aff78 (console_owner){-...}-{0:0}, at:
console_emit_next_record.constprop.0+0x128/0x338
stack backtrace:
CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #38
Hardware name: Foundation-v8A (DT)
Workqueue: eval_map_wq eval_map_work_func
Call trace:
dump_backtrace+0x114/0x120
show_stack+0x20/0x58
dump_stack_lvl+0x9c/0xd8
dump_stack+0x18/0x34
__lock_acquire+0x17cc/0x1920
lock_acquire+0x138/0x3b8
_raw_spin_lock+0x58/0x70
pl011_console_write+0x148/0x240
console_emit_next_record.constprop.0+0x194/0x338
console_unlock+0x18c/0x208
vprintk_emit+0x24c/0x380
vprintk_default+0x40/0x50
vprintk+0xd4/0xf0
_printk+0x68/0x90
arm64_pmu_brbe_probe+0x10c/0x128
armv8pmu_brbe_probe+0x18/0x28
arm_brbe_probe_cpu+0x44/0x58
__flush_smp_call_function_queue+0x1d0/0x440
generic_smp_call_function_single_interrupt+0x20/0x78
ipi_handler+0x98/0x368
handle_percpu_devid_irq+0xc0/0x3a8
generic_handle_domain_irq+0x34/0x50
gic_handle_irq+0x58/0x138
call_on_irq_stack+0x2c/0x58
do_interrupt_handler+0x88/0x90
el1_interrupt+0x40/0x78
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x64/0x68
trace_event_eval_update+0x114/0x420
eval_map_work_func+0x30/0x40
process_one_work+0x298/0x6b0
worker_thread+0x54/0x408
kthread+0x118/0x128
ret_from_fork+0x10/0x20
brbe: implementation found on cpu 1
brbe: implementation found on cpu 2
brbe: implementation found on cpu 3

> + if (err)
> + return err;
> + }
> + return err;
> +}
> +
> static void armpmu_free_irqs(struct arm_pmu *armpmu)
> {
> int cpu;
> @@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
> if (ret)
> goto out_free_irqs;
>
> + ret = armpmu_request_brbe(pmu);
> + if (ret)
> + goto out_free_irqs;
> +
> ret = armpmu_register(pmu);
> if (ret) {
> dev_err(dev, "failed to register PMU devices!\n");

2022-10-10 15:22:45

by James Clark

[permalink] [raw]
Subject: Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection



On 06/10/2022 14:37, James Clark wrote:
>
>
> On 29/09/2022 08:58, Anshuman Khandual wrote:
>> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
>> driver exported callbacks later. The actual BRBE feature detection will be
>> added by the driver itself.
>>
>> CPU specific BRBE entries, cycle count, format support gets detected during
>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>> later helps in operating BRBE during a perf event context.
>>
>> Cc: Will Deacon <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>> 1 file changed, 34 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>> index 933b96e243b8..acdc445081aa 100644
>> --- a/drivers/perf/arm_pmu_platform.c
>> +++ b/drivers/perf/arm_pmu_platform.c
>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>> return err;
>> }
>>
>> +static void arm_brbe_probe_cpu(void *info)
>> +{
>> + struct pmu_hw_events *hw_events;
>> + struct arm_pmu *armpmu = info;
>> +
>> + /*
>> + * Return from here, if BRBE driver has not been
>> + * implemented for this PMU. This helps prevent
>> + * kernel crash later when brbe_probe() will be
>> + * called on the PMU.
>> + */
>> + if (!armpmu->brbe_probe)
>> + return;
>> +
>> + hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>> + armpmu->brbe_probe(hw_events);
>> +}
>> +
>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>> +{
>> + int cpu, err = 0;
>> +
>> + for_each_cpu(cpu, &armpmu->supported_cpus) {
>> + err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
>
> Hi Anshuman,
>
> I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get

Can you confirm if this is currently the correct place to apply this to?
I'm only getting 0 length branch stacks now. Seems like it could be
something to do with the layout of perf samples because I know that was
done in separate commits:

sudo ./perf record -j any_call -- ls
./perf report -D | grep "branch stack"
... branch stack: nr:0
... branch stack: nr:0
... branch stack: nr:0
... branch stack: nr:0
...

> this:
>
> armv8-pmu pmu: hw perfevents: no interrupt-affinity property, guessing.
> brbe: implementation found on cpu 0
>
> =============================
> [ BUG: Invalid wait context ]
> 6.0.0-rc7 #38 Not tainted
> -----------------------------
> kworker/u8:0/9 is trying to lock:
> ffff000800855898 (&port_lock_key){....}-{3:3}, at:
> pl011_console_write+0x148/0x240
> other info that might help us debug this:
> context-{2:2}
> 5 locks held by kworker/u8:0/9:
> #0: ffff00080032a138 ((wq_completion)eval_map_wq){+.+.}-{0:0}, at:
> process_one_work+0x200/0x6b0
> #1: ffff80000807bde0
> ((work_completion)(&eval_map_work)){+.+.}-{0:0}, at:
> process_one_work+0x200/0x6b0
> #2: ffff80000aa3db70 (trace_event_sem){+.+.}-{4:4}, at:
> trace_event_eval_update+0x28/0x420
> #3: ffff80000a9afe58 (console_lock){+.+.}-{0:0}, at:
> vprintk_emit+0x130/0x380
> #4: ffff80000a9aff78 (console_owner){-...}-{0:0}, at:
> console_emit_next_record.constprop.0+0x128/0x338
> stack backtrace:
> CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #38
> Hardware name: Foundation-v8A (DT)
> Workqueue: eval_map_wq eval_map_work_func
> Call trace:
> dump_backtrace+0x114/0x120
> show_stack+0x20/0x58
> dump_stack_lvl+0x9c/0xd8
> dump_stack+0x18/0x34
> __lock_acquire+0x17cc/0x1920
> lock_acquire+0x138/0x3b8
> _raw_spin_lock+0x58/0x70
> pl011_console_write+0x148/0x240
> console_emit_next_record.constprop.0+0x194/0x338
> console_unlock+0x18c/0x208
> vprintk_emit+0x24c/0x380
> vprintk_default+0x40/0x50
> vprintk+0xd4/0xf0
> _printk+0x68/0x90
> arm64_pmu_brbe_probe+0x10c/0x128
> armv8pmu_brbe_probe+0x18/0x28
> arm_brbe_probe_cpu+0x44/0x58
> __flush_smp_call_function_queue+0x1d0/0x440
> generic_smp_call_function_single_interrupt+0x20/0x78
> ipi_handler+0x98/0x368
> handle_percpu_devid_irq+0xc0/0x3a8
> generic_handle_domain_irq+0x34/0x50
> gic_handle_irq+0x58/0x138
> call_on_irq_stack+0x2c/0x58
> do_interrupt_handler+0x88/0x90
> el1_interrupt+0x40/0x78
> el1h_64_irq_handler+0x18/0x28
> el1h_64_irq+0x64/0x68
> trace_event_eval_update+0x114/0x420
> eval_map_work_func+0x30/0x40
> process_one_work+0x298/0x6b0
> worker_thread+0x54/0x408
> kthread+0x118/0x128
> ret_from_fork+0x10/0x20
> brbe: implementation found on cpu 1
> brbe: implementation found on cpu 2
> brbe: implementation found on cpu 3
>
>> + if (err)
>> + return err;
>> + }
>> + return err;
>> +}
>> +
>> static void armpmu_free_irqs(struct arm_pmu *armpmu)
>> {
>> int cpu;
>> @@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
>> if (ret)
>> goto out_free_irqs;
>>
>> + ret = armpmu_request_brbe(pmu);
>> + if (ret)
>> + goto out_free_irqs;
>> +
>> ret = armpmu_register(pmu);
>> if (ret) {
>> dev_err(dev, "failed to register PMU devices!\n");

2022-10-11 09:37:51

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection



On 10/6/22 19:07, James Clark wrote:
>
> On 29/09/2022 08:58, Anshuman Khandual wrote:
>> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
>> driver exported callbacks later. The actual BRBE feature detection will be
>> added by the driver itself.
>>
>> CPU specific BRBE entries, cycle count, format support gets detected during
>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>> later helps in operating BRBE during a perf event context.
>>
>> Cc: Will Deacon <[email protected]>
>> Cc: Mark Rutland <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Anshuman Khandual <[email protected]>
>> ---
>> drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>> 1 file changed, 34 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>> index 933b96e243b8..acdc445081aa 100644
>> --- a/drivers/perf/arm_pmu_platform.c
>> +++ b/drivers/perf/arm_pmu_platform.c
>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>> return err;
>> }
>>
>> +static void arm_brbe_probe_cpu(void *info)
>> +{
>> + struct pmu_hw_events *hw_events;
>> + struct arm_pmu *armpmu = info;
>> +
>> + /*
>> + * Return from here, if BRBE driver has not been
>> + * implemented for this PMU. This helps prevent
>> + * kernel crash later when brbe_probe() will be
>> + * called on the PMU.
>> + */
>> + if (!armpmu->brbe_probe)
>> + return;
>> +
>> + hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>> + armpmu->brbe_probe(hw_events);
>> +}
>> +
>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>> +{
>> + int cpu, err = 0;
>> +
>> + for_each_cpu(cpu, &armpmu->supported_cpus) {
>> + err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
> Hi Anshuman,
>
> I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get
> this:
>
> armv8-pmu pmu: hw perfevents: no interrupt-affinity property, guessing.
> brbe: implementation found on cpu 0
>
> =============================
> [ BUG: Invalid wait context ]
> 6.0.0-rc7 #38 Not tainted
> -----------------------------
> kworker/u8:0/9 is trying to lock:
> ffff000800855898 (&port_lock_key){....}-{3:3}, at:
> pl011_console_write+0x148/0x240
> other info that might help us debug this:
> context-{2:2}
> 5 locks held by kworker/u8:0/9:
> #0: ffff00080032a138 ((wq_completion)eval_map_wq){+.+.}-{0:0}, at:
> process_one_work+0x200/0x6b0
> #1: ffff80000807bde0
> ((work_completion)(&eval_map_work)){+.+.}-{0:0}, at:
> process_one_work+0x200/0x6b0
> #2: ffff80000aa3db70 (trace_event_sem){+.+.}-{4:4}, at:
> trace_event_eval_update+0x28/0x420
> #3: ffff80000a9afe58 (console_lock){+.+.}-{0:0}, at:
> vprintk_emit+0x130/0x380
> #4: ffff80000a9aff78 (console_owner){-...}-{0:0}, at:
> console_emit_next_record.constprop.0+0x128/0x338
> stack backtrace:
> CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #38
> Hardware name: Foundation-v8A (DT)
> Workqueue: eval_map_wq eval_map_work_func
> Call trace:
> dump_backtrace+0x114/0x120
> show_stack+0x20/0x58
> dump_stack_lvl+0x9c/0xd8
> dump_stack+0x18/0x34
> __lock_acquire+0x17cc/0x1920
> lock_acquire+0x138/0x3b8
> _raw_spin_lock+0x58/0x70
> pl011_console_write+0x148/0x240
> console_emit_next_record.constprop.0+0x194/0x338
> console_unlock+0x18c/0x208
> vprintk_emit+0x24c/0x380
> vprintk_default+0x40/0x50
> vprintk+0xd4/0xf0
> _printk+0x68/0x90
> arm64_pmu_brbe_probe+0x10c/0x128
> armv8pmu_brbe_probe+0x18/0x28
> arm_brbe_probe_cpu+0x44/0x58
> __flush_smp_call_function_queue+0x1d0/0x440
> generic_smp_call_function_single_interrupt+0x20/0x78
> ipi_handler+0x98/0x368
> handle_percpu_devid_irq+0xc0/0x3a8
> generic_handle_domain_irq+0x34/0x50
> gic_handle_irq+0x58/0x138
> call_on_irq_stack+0x2c/0x58
> do_interrupt_handler+0x88/0x90
> el1_interrupt+0x40/0x78
> el1h_64_irq_handler+0x18/0x28
> el1h_64_irq+0x64/0x68
> trace_event_eval_update+0x114/0x420
> eval_map_work_func+0x30/0x40
> process_one_work+0x298/0x6b0
> worker_thread+0x54/0x408
> kthread+0x118/0x128
> ret_from_fork+0x10/0x20
> brbe: implementation found on cpu 1
> brbe: implementation found on cpu 2
> brbe: implementation found on cpu 3


The LOCKDEP warnings are because of pr_warn/pr_info in arm64_pmu_brbe_probe()
which gets called from smp_call_function_single() context. I will drop these
prints, instead probably capture them in struct pmu_hw_events and display in
the caller itself.

2022-10-11 09:50:51

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection



On 10/10/22 19:47, James Clark wrote:
>
>
> On 06/10/2022 14:37, James Clark wrote:
>>
>>
>> On 29/09/2022 08:58, Anshuman Khandual wrote:
>>> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
>>> driver exported callbacks later. The actual BRBE feature detection will be
>>> added by the driver itself.
>>>
>>> CPU specific BRBE entries, cycle count, format support gets detected during
>>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>>> later helps in operating BRBE during a perf event context.
>>>
>>> Cc: Will Deacon <[email protected]>
>>> Cc: Mark Rutland <[email protected]>
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Signed-off-by: Anshuman Khandual <[email protected]>
>>> ---
>>> drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>>> 1 file changed, 34 insertions(+)
>>>
>>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>>> index 933b96e243b8..acdc445081aa 100644
>>> --- a/drivers/perf/arm_pmu_platform.c
>>> +++ b/drivers/perf/arm_pmu_platform.c
>>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>>> return err;
>>> }
>>>
>>> +static void arm_brbe_probe_cpu(void *info)
>>> +{
>>> + struct pmu_hw_events *hw_events;
>>> + struct arm_pmu *armpmu = info;
>>> +
>>> + /*
>>> + * Return from here, if BRBE driver has not been
>>> + * implemented for this PMU. This helps prevent
>>> + * kernel crash later when brbe_probe() will be
>>> + * called on the PMU.
>>> + */
>>> + if (!armpmu->brbe_probe)
>>> + return;
>>> +
>>> + hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>>> + armpmu->brbe_probe(hw_events);
>>> +}
>>> +
>>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>>> +{
>>> + int cpu, err = 0;
>>> +
>>> + for_each_cpu(cpu, &armpmu->supported_cpus) {
>>> + err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
>>
>> Hi Anshuman,
>>
>> I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
>> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get
>
> Can you confirm if this is currently the correct place to apply this to?

This series applied on v6.0-rc5 after the perf ABI changes, both in kernel
and in user space tools.

> I'm only getting 0 length branch stacks now. Seems like it could be
> something to do with the layout of perf samples because I know that was
> done in separate commits:

Right, might be.

>
> sudo ./perf record -j any_call -- ls
> ./perf report -D | grep "branch stack"
> ... branch stack: nr:0
> ... branch stack: nr:0
> ... branch stack: nr:0
> ... branch stack: nr:0

I am planning to respin the series on 6.1-rc1 next week which should solve
these multiple moving parts problem.

> ...
>
>> this:
>>
>> armv8-pmu pmu: hw perfevents: no interrupt-affinity property, guessing.
>> brbe: implementation found on cpu 0
>>
>> =============================
>> [ BUG: Invalid wait context ]
>> 6.0.0-rc7 #38 Not tainted
>> -----------------------------
>> kworker/u8:0/9 is trying to lock:
>> ffff000800855898 (&port_lock_key){....}-{3:3}, at:
>> pl011_console_write+0x148/0x240
>> other info that might help us debug this:
>> context-{2:2}
>> 5 locks held by kworker/u8:0/9:
>> #0: ffff00080032a138 ((wq_completion)eval_map_wq){+.+.}-{0:0}, at:
>> process_one_work+0x200/0x6b0
>> #1: ffff80000807bde0
>> ((work_completion)(&eval_map_work)){+.+.}-{0:0}, at:
>> process_one_work+0x200/0x6b0
>> #2: ffff80000aa3db70 (trace_event_sem){+.+.}-{4:4}, at:
>> trace_event_eval_update+0x28/0x420
>> #3: ffff80000a9afe58 (console_lock){+.+.}-{0:0}, at:
>> vprintk_emit+0x130/0x380
>> #4: ffff80000a9aff78 (console_owner){-...}-{0:0}, at:
>> console_emit_next_record.constprop.0+0x128/0x338
>> stack backtrace:
>> CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #38
>> Hardware name: Foundation-v8A (DT)
>> Workqueue: eval_map_wq eval_map_work_func
>> Call trace:
>> dump_backtrace+0x114/0x120
>> show_stack+0x20/0x58
>> dump_stack_lvl+0x9c/0xd8
>> dump_stack+0x18/0x34
>> __lock_acquire+0x17cc/0x1920
>> lock_acquire+0x138/0x3b8
>> _raw_spin_lock+0x58/0x70
>> pl011_console_write+0x148/0x240
>> console_emit_next_record.constprop.0+0x194/0x338
>> console_unlock+0x18c/0x208
>> vprintk_emit+0x24c/0x380
>> vprintk_default+0x40/0x50
>> vprintk+0xd4/0xf0
>> _printk+0x68/0x90
>> arm64_pmu_brbe_probe+0x10c/0x128
>> armv8pmu_brbe_probe+0x18/0x28
>> arm_brbe_probe_cpu+0x44/0x58
>> __flush_smp_call_function_queue+0x1d0/0x440
>> generic_smp_call_function_single_interrupt+0x20/0x78
>> ipi_handler+0x98/0x368
>> handle_percpu_devid_irq+0xc0/0x3a8
>> generic_handle_domain_irq+0x34/0x50
>> gic_handle_irq+0x58/0x138
>> call_on_irq_stack+0x2c/0x58
>> do_interrupt_handler+0x88/0x90
>> el1_interrupt+0x40/0x78
>> el1h_64_irq_handler+0x18/0x28
>> el1h_64_irq+0x64/0x68
>> trace_event_eval_update+0x114/0x420
>> eval_map_work_func+0x30/0x40
>> process_one_work+0x298/0x6b0
>> worker_thread+0x54/0x408
>> kthread+0x118/0x128
>> ret_from_fork+0x10/0x20
>> brbe: implementation found on cpu 1
>> brbe: implementation found on cpu 2
>> brbe: implementation found on cpu 3
>>
>>> + if (err)
>>> + return err;
>>> + }
>>> + return err;
>>> +}
>>> +
>>> static void armpmu_free_irqs(struct arm_pmu *armpmu)
>>> {
>>> int cpu;
>>> @@ -229,6 +259,10 @@ int arm_pmu_device_probe(struct platform_device *pdev,
>>> if (ret)
>>> goto out_free_irqs;
>>>
>>> + ret = armpmu_request_brbe(pmu);
>>> + if (ret)
>>> + goto out_free_irqs;
>>> +
>>> ret = armpmu_register(pmu);
>>> if (ret) {
>>> dev_err(dev, "failed to register PMU devices!\n");

2022-10-12 08:12:50

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH V3 4/7] driver/perf/arm_pmu_platform: Add support for BRBE attributes detection



On 10/11/22 14:51, Anshuman Khandual wrote:
>
> On 10/10/22 19:47, James Clark wrote:
>>
>> On 06/10/2022 14:37, James Clark wrote:
>>>
>>> On 29/09/2022 08:58, Anshuman Khandual wrote:
>>>> This adds arm pmu infrastrure to probe BRBE implementation's attributes via
>>>> driver exported callbacks later. The actual BRBE feature detection will be
>>>> added by the driver itself.
>>>>
>>>> CPU specific BRBE entries, cycle count, format support gets detected during
>>>> PMU init. This information gets saved in per-cpu struct pmu_hw_events which
>>>> later helps in operating BRBE during a perf event context.
>>>>
>>>> Cc: Will Deacon <[email protected]>
>>>> Cc: Mark Rutland <[email protected]>
>>>> Cc: [email protected]
>>>> Cc: [email protected]
>>>> Signed-off-by: Anshuman Khandual <[email protected]>
>>>> ---
>>>> drivers/perf/arm_pmu_platform.c | 34 +++++++++++++++++++++++++++++++++
>>>> 1 file changed, 34 insertions(+)
>>>>
>>>> diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c
>>>> index 933b96e243b8..acdc445081aa 100644
>>>> --- a/drivers/perf/arm_pmu_platform.c
>>>> +++ b/drivers/perf/arm_pmu_platform.c
>>>> @@ -172,6 +172,36 @@ static int armpmu_request_irqs(struct arm_pmu *armpmu)
>>>> return err;
>>>> }
>>>>
>>>> +static void arm_brbe_probe_cpu(void *info)
>>>> +{
>>>> + struct pmu_hw_events *hw_events;
>>>> + struct arm_pmu *armpmu = info;
>>>> +
>>>> + /*
>>>> + * Return from here, if BRBE driver has not been
>>>> + * implemented for this PMU. This helps prevent
>>>> + * kernel crash later when brbe_probe() will be
>>>> + * called on the PMU.
>>>> + */
>>>> + if (!armpmu->brbe_probe)
>>>> + return;
>>>> +
>>>> + hw_events = per_cpu_ptr(armpmu->hw_events, smp_processor_id());
>>>> + armpmu->brbe_probe(hw_events);
>>>> +}
>>>> +
>>>> +static int armpmu_request_brbe(struct arm_pmu *armpmu)
>>>> +{
>>>> + int cpu, err = 0;
>>>> +
>>>> + for_each_cpu(cpu, &armpmu->supported_cpus) {
>>>> + err = smp_call_function_single(cpu, arm_brbe_probe_cpu, armpmu, 1);
>>> Hi Anshuman,
>>>
>>> I have LOCKDEP on and the patchset applied to perf/core (82aad7ff7) on
>>> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git and I get
>> Can you confirm if this is currently the correct place to apply this to?
> This series applied on v6.0-rc5 after the perf ABI changes, both in kernel
> and in user space tools.
>
>> I'm only getting 0 length branch stacks now. Seems like it could be
>> something to do with the layout of perf samples because I know that was
>> done in separate commits:
> Right, might be.
>
>> sudo ./perf record -j any_call -- ls
>> ./perf report -D | grep "branch stack"
>> ... branch stack: nr:0
>> ... branch stack: nr:0
>> ... branch stack: nr:0
>> ... branch stack: nr:0
> I am planning to respin the series on 6.1-rc1 next week which should solve
> these multiple moving parts problem

There are some recent changes which require PMU driver to set data.sample_flags
indicating what kind of records are being filled in there. Here are the commits

a9a931e2666878343 ("perf: Use sample_flags for branch stack")
3aac580d5cc3001ca ("perf: Add sample_flags to indicate the PMU-filled sample data")

Following fix solves the problem for BRBE driver.

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 98e9a615d3cb..85a3aaefc0fb 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -877,6 +877,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
if (has_branch_stack(event)) {
cpu_pmu->brbe_read(cpuc, event);
data.br_stack = &cpuc->branches->brbe_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
cpu_pmu->brbe_reset(cpuc);
}