2023-01-23 13:00:18

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V8 0/6] arm64/perf: Enable branch stack sampling

This series enables perf branch stack sampling support on arm64 platform
via a new arch feature called Branch Record Buffer Extension (BRBE). All
relevant register definitions could be accessed here.

https://developer.arm.com/documentation/ddi0601/2021-12/AArch64-Registers

This series applies on v6.2-r5

Changes in V8:

- Replaced arm_pmu->features as arm_pmu->has_branch_stack, updated its helper
- Added a comment and line break before arm_pmu->private element
- Added WARN_ON_ONCE() in helpers i.e armv8pmu_branch_[read|valid|enable|disable]()
- Dropped comments in armv8pmu_enable_event() and armv8pmu_disable_event()
- Replaced open bank encoding in BRBFCR_EL1 with SYS_FIELD_PREP()
- Changed brbe_hw_attr->brbe_version from 'bool' to 'int'
- Updated pr_warn() as pr_warn_once() with values in brbe_get_perf_[type|priv]()
- Replaced all pr_warn_once() as pr_debug_once() in armv8pmu_branch_valid()
- Added a comment in branch_type_to_brbcr() for the BRBCR_EL1 privilege settings
- Modified the comment related to BRBINFx_EL1.LASTFAILED in capture_brbe_flags()
- Modified brbe_get_perf_entry_type() as brbe_set_perf_entry_type()
- Renamed brbe_valid() as brbe_record_is_complete()
- Renamed brbe_source() as brbe_record_is_source_only()
- Renamed brbe_target() as brbe_record_is_target_only()
- Inverted checks for !brbe_record_is_[target|source]_only() for info capture
- Replaced 'fetch' with 'get' in all helpers that extract field value
- Dropped 'static int brbe_current_bank' optimization in select_brbe_bank()
- Dropped select_brbe_bank_index() completely, added capture_branch_entry()
- Process captured branch entries in two separate loops one for each BRBE bank
- Moved branch_records_alloc() inside armv8pmu_probe_pmu()
- Added a forward declaration for the helper has_branch_stack()
- Added new callbacks armv8pmu_private_alloc() and armv8pmu_private_free()
- Updated armv8pmu_probe_pmu() to allocate the private structure before SMP call

Changes in V7:

https://lore.kernel.org/all/[email protected]/

- Folded [PATCH 7/7] into [PATCH 3/7] which enables branch stack sampling event
- Defined BRBFCR_EL1_BRANCH_FILTERS, BRBCR_EL1_DEFAULT_CONFIG in the header
- Defined BRBFCR_EL1_DEFAULT_CONFIG in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_FZP
- Defined BRBCR_EL1_DEFAULT_TS in the header
- Updated BRBCR_EL1_DEFAULT_CONFIG with BRBCR_EL1_DEFAULT_TS
- Moved BRBCR_EL1_DEFAULT_CONFIG check inside branch_type_to_brbcr()
- Moved down BRBCR_EL1_CC, BRBCR_EL1_MPRED later in branch_type_to_brbcr()
- Also set BRBE in paused state in armv8pmu_branch_disable()
- Dropped brbe_paused(), set_brbe_paused() helpers
- Extracted error string via branch_filter_error_msg[] for armv8pmu_branch_valid()
- Replaced brbe_v1p1 with brbe_version in struct brbe_hw_attr
- Added valid_brbe_[cc, format, version]() helpers
- Split a separate brbe_attributes_probe() from armv8pmu_branch_probe()
- Capture event->attr.branch_sample_type earlier in armv8pmu_branch_valid()
- Defined enum brbe_bank_idx with possible values for BRBE bank indices
- Changed armpmu->hw_attr into armpmu->private
- Added missing space in stub definition for armv8pmu_branch_valid()
- Replaced both kmalloc() with kzalloc()
- Added BRBE_BANK_MAX_ENTRIES
- Updated comment for capture_brbe_flags()
- Updated comment for struct brbe_hw_attr
- Dropped space after type cast in couple of places
- Replaced inverse with negation for testing BRBCR_EL1_FZP in armv8pmu_branch_read()
- Captured cpuc->branches->branch_entries[idx] in a local variable
- Dropped saved_priv from armv8pmu_branch_read()
- Reorganize PERF_SAMPLE_BRANCH_NO_[CYCLES|NO_FLAGS] related configuration
- Replaced with FIELD_GET() and FIELD_PREP() wherever applicable
- Replaced BRBCR_EL1_TS_PHYSICAL with BRBCR_EL1_TS_VIRTUAL
- Moved valid_brbe_nr(), valid_brbe_cc(), valid_brbe_format(), valid_brbe_version()
select_brbe_bank(), select_brbe_bank_index() helpers inside the C implementation
- Reorganized brbe_valid_nr() and dropped the pr_warn() message
- Changed probe sequence in brbe_attributes_probe()
- Added 'brbcr' argument into capture_brbe_flags() to ascertain correct state
- Disable BRBE before disabling the PMU event counter
- Enable PERF_SAMPLE_BRANCH_HV filters when is_kernel_in_hyp_mode()
- Guard armv8pmu_reset() & armv8pmu_sched_task() with arm_pmu_branch_stack_supported()

Changes in V6:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Restore the exception level privilege after reading the branch records
- Unpause the buffer after reading the branch records
- Decouple BRBCR_EL1_EXCEPTION/ERTN from perf event privilege level
- Reworked BRBE implementation and branch stack sampling support on arm pmu
- BRBE implementation is now part of overall ARMV8 PMU implementation
- BRBE implementation moved from drivers/perf/ to inside arch/arm64/kernel/
- CONFIG_ARM_BRBE_PMU renamed as CONFIG_ARM64_BRBE in arch/arm64/Kconfig
- File moved - drivers/perf/arm_pmu_brbe.c -> arch/arm64/kernel/brbe.c
- File moved - drivers/perf/arm_pmu_brbe.h -> arch/arm64/kernel/brbe.h
- BRBE name has been dropped from struct arm_pmu and struct hw_pmu_events
- BRBE name has been abstracted out as 'branches' in arm_pmu and hw_pmu_events
- BRBE name has been abstracted out as 'branches' in ARMV8 PMU implementation
- Added sched_task() callback into struct arm_pmu
- Added 'hw_attr' into struct arm_pmu encapsulating possible PMU HW attributes
- Dropped explicit attributes brbe_(v1p1, nr, cc, format) from struct arm_pmu
- Dropped brbfcr, brbcr, registers scratch area from struct hw_pmu_events
- Dropped brbe_users, brbe_context tracking in struct hw_pmu_events
- Added 'features' tracking into struct arm_pmu with ARM_PMU_BRANCH_STACK flag
- armpmu->hw_attr maps into 'struct brbe_hw_attr' inside BRBE implementation
- Set ARM_PMU_BRANCH_STACK in 'arm_pmu->features' after successful BRBE probe
- Added armv8pmu_branch_reset() inside armv8pmu_branch_enable()
- Dropped brbe_supported() as events will be rejected via ARM_PMU_BRANCH_STACK
- Dropped set_brbe_disabled() as well
- Reformated armv8pmu_branch_valid() warnings while rejecting unsupported events

Changes in V5:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Changed BRBCR_EL1.VIRTUAL from 0b1 to 0b01
- Changed BRBFCR_EL1.EnL into BRBFCR_EL1.EnI
- Changed config ARM_BRBE_PMU from 'tristate' to 'bool'

Changes in V4:

https://lore.kernel.org/all/[email protected]/

- Changed ../tools/sysreg declarations as suggested
- Set PERF_SAMPLE_BRANCH_STACK in data.sample_flags
- Dropped perfmon_capable() check in armpmu_event_init()
- s/pr_warn_once/pr_info in armpmu_event_init()
- Added brbe_format element into struct pmu_hw_events
- Changed v1p1 as brbe_v1p1 in struct pmu_hw_events
- Dropped pr_info() from arm64_pmu_brbe_probe(), solved LOCKDEP warning

Changes in V3:

https://lore.kernel.org/all/[email protected]/

- Moved brbe_stack from the stack and now dynamically allocated
- Return PERF_BR_PRIV_UNKNOWN instead of -1 in brbe_fetch_perf_priv()
- Moved BRBIDR0, BRBCR, BRBFCR registers and fields into tools/sysreg
- Created dummy BRBINF_EL1 field definitions in tools/sysreg
- Dropped ARMPMU_EVT_PRIV framework which cached perfmon_capable()
- Both exception and exception return branche records are now captured
only if the event has PERF_SAMPLE_BRANCH_KERNEL which would already
been checked in generic perf via perf_allow_kernel()

Changes in V2:

https://lore.kernel.org/all/[email protected]/

- Dropped branch sample filter helpers consolidation patch from this series
- Added new hw_perf_event.flags element ARMPMU_EVT_PRIV to cache perfmon_capable()
- Use cached perfmon_capable() while configuring BRBE branch record filters

Changes in V1:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Added CONFIG_PERF_EVENTS wrapper for all branch sample filter helpers
- Process new perf branch types via PERF_BR_EXTEND_ABI

Changes in RFC V2:

https://lore.kernel.org/linux-arm-kernel/[email protected]/

- Added branch_sample_priv() while consolidating other branch sample filter helpers
- Changed all SYS_BRBXXXN_EL1 register definition encodings per Marc
- Changed the BRBE driver as per proposed BRBE related perf ABI changes (V5)
- Added documentation for struct arm_pmu changes, updated commit message
- Updated commit message for BRBE detection infrastructure patch
- PERF_SAMPLE_BRANCH_KERNEL gets checked during arm event init (outside the driver)
- Branch privilege state capture mechanism has now moved inside the driver

Changes in RFC V1:

https://lore.kernel.org/all/[email protected]/

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: James Clark <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Suzuki Poulose <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Anshuman Khandual (6):
drivers: perf: arm_pmu: Add new sched_task() callback
arm64/perf: Add BRBE registers and fields
arm64/perf: Add branch stack support in struct arm_pmu
arm64/perf: Add branch stack support in struct pmu_hw_events
arm64/perf: Add branch stack support in ARMV8 PMU
arm64/perf: Enable branch stack events via FEAT_BRBE

arch/arm64/Kconfig | 11 +
arch/arm64/include/asm/perf_event.h | 42 +++
arch/arm64/include/asm/sysreg.h | 103 ++++++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/brbe.c | 538 ++++++++++++++++++++++++++++
arch/arm64/kernel/brbe.h | 257 +++++++++++++
arch/arm64/kernel/perf_event.c | 78 ++--
arch/arm64/tools/sysreg | 161 +++++++++
drivers/perf/arm_pmu.c | 12 +-
include/linux/perf/arm_pmu.h | 22 +-
10 files changed, 1199 insertions(+), 26 deletions(-)
create mode 100644 arch/arm64/kernel/brbe.c
create mode 100644 arch/arm64/kernel/brbe.h

--
2.25.1



2023-01-23 13:00:22

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V8 1/6] drivers: perf: arm_pmu: Add new sched_task() callback

This adds armpmu_sched_task(), as generic pmu's sched_task() override which
in turn can utilize a new arm_pmu.sched_task() callback when available from
the arm_pmu instance. This new callback will be used while enabling BRBE in
ARMV8 PMU.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
drivers/perf/arm_pmu.c | 9 +++++++++
include/linux/perf/arm_pmu.h | 1 +
2 files changed, 10 insertions(+)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 9b593f985805..14a3ed3bdb0b 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -517,6 +517,14 @@ static int armpmu_event_init(struct perf_event *event)
return __hw_perf_event_init(event);
}

+static void armpmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
+
+ if (armpmu->sched_task)
+ armpmu->sched_task(pmu_ctx, sched_in);
+}
+
static void armpmu_enable(struct pmu *pmu)
{
struct arm_pmu *armpmu = to_arm_pmu(pmu);
@@ -873,6 +881,7 @@ struct arm_pmu *armpmu_alloc(void)
}

pmu->pmu = (struct pmu) {
+ .sched_task = armpmu_sched_task,
.pmu_enable = armpmu_enable,
.pmu_disable = armpmu_disable,
.event_init = armpmu_event_init,
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index ef914a600087..2a9d07cee927 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -101,6 +101,7 @@ struct arm_pmu {
void (*reset)(void *);
int (*map_event)(struct perf_event *event);
bool (*filter)(struct pmu *pmu, int cpu);
+ void (*sched_task)(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
int num_events;
bool secure_access; /* 32-bit ARM only */
#define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
--
2.25.1


2023-01-23 13:00:36

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V8 2/6] arm64/perf: Add BRBE registers and fields

This adds BRBE related register definitions and various other related field
macros there in. These will be used subsequently in a BRBE driver which is
being added later on.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Mark Brown <[email protected]>
Signed-off-by: Anshuman Khandual <[email protected]>
---
arch/arm64/include/asm/sysreg.h | 103 ++++++++++++++++++++
arch/arm64/tools/sysreg | 161 ++++++++++++++++++++++++++++++++
2 files changed, 264 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 1312fb48f18b..05b70e8b7f83 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -165,6 +165,109 @@
#define SYS_DBGDTRTX_EL0 sys_reg(2, 3, 0, 5, 0)
#define SYS_DBGVCR32_EL2 sys_reg(2, 4, 0, 7, 0)

+#define __SYS_BRBINFO(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 0))
+#define __SYS_BRBSRC(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 1))
+#define __SYS_BRBTGT(n) sys_reg(2, 1, 8, ((n) & 0xf), ((((n) & 0x10)) >> 2 + 2))
+
+#define SYS_BRBINF0_EL1 __SYS_BRBINFO(0)
+#define SYS_BRBINF1_EL1 __SYS_BRBINFO(1)
+#define SYS_BRBINF2_EL1 __SYS_BRBINFO(2)
+#define SYS_BRBINF3_EL1 __SYS_BRBINFO(3)
+#define SYS_BRBINF4_EL1 __SYS_BRBINFO(4)
+#define SYS_BRBINF5_EL1 __SYS_BRBINFO(5)
+#define SYS_BRBINF6_EL1 __SYS_BRBINFO(6)
+#define SYS_BRBINF7_EL1 __SYS_BRBINFO(7)
+#define SYS_BRBINF8_EL1 __SYS_BRBINFO(8)
+#define SYS_BRBINF9_EL1 __SYS_BRBINFO(9)
+#define SYS_BRBINF10_EL1 __SYS_BRBINFO(10)
+#define SYS_BRBINF11_EL1 __SYS_BRBINFO(11)
+#define SYS_BRBINF12_EL1 __SYS_BRBINFO(12)
+#define SYS_BRBINF13_EL1 __SYS_BRBINFO(13)
+#define SYS_BRBINF14_EL1 __SYS_BRBINFO(14)
+#define SYS_BRBINF15_EL1 __SYS_BRBINFO(15)
+#define SYS_BRBINF16_EL1 __SYS_BRBINFO(16)
+#define SYS_BRBINF17_EL1 __SYS_BRBINFO(17)
+#define SYS_BRBINF18_EL1 __SYS_BRBINFO(18)
+#define SYS_BRBINF19_EL1 __SYS_BRBINFO(19)
+#define SYS_BRBINF20_EL1 __SYS_BRBINFO(20)
+#define SYS_BRBINF21_EL1 __SYS_BRBINFO(21)
+#define SYS_BRBINF22_EL1 __SYS_BRBINFO(22)
+#define SYS_BRBINF23_EL1 __SYS_BRBINFO(23)
+#define SYS_BRBINF24_EL1 __SYS_BRBINFO(24)
+#define SYS_BRBINF25_EL1 __SYS_BRBINFO(25)
+#define SYS_BRBINF26_EL1 __SYS_BRBINFO(26)
+#define SYS_BRBINF27_EL1 __SYS_BRBINFO(27)
+#define SYS_BRBINF28_EL1 __SYS_BRBINFO(28)
+#define SYS_BRBINF29_EL1 __SYS_BRBINFO(29)
+#define SYS_BRBINF30_EL1 __SYS_BRBINFO(30)
+#define SYS_BRBINF31_EL1 __SYS_BRBINFO(31)
+
+#define SYS_BRBSRC0_EL1 __SYS_BRBSRC(0)
+#define SYS_BRBSRC1_EL1 __SYS_BRBSRC(1)
+#define SYS_BRBSRC2_EL1 __SYS_BRBSRC(2)
+#define SYS_BRBSRC3_EL1 __SYS_BRBSRC(3)
+#define SYS_BRBSRC4_EL1 __SYS_BRBSRC(4)
+#define SYS_BRBSRC5_EL1 __SYS_BRBSRC(5)
+#define SYS_BRBSRC6_EL1 __SYS_BRBSRC(6)
+#define SYS_BRBSRC7_EL1 __SYS_BRBSRC(7)
+#define SYS_BRBSRC8_EL1 __SYS_BRBSRC(8)
+#define SYS_BRBSRC9_EL1 __SYS_BRBSRC(9)
+#define SYS_BRBSRC10_EL1 __SYS_BRBSRC(10)
+#define SYS_BRBSRC11_EL1 __SYS_BRBSRC(11)
+#define SYS_BRBSRC12_EL1 __SYS_BRBSRC(12)
+#define SYS_BRBSRC13_EL1 __SYS_BRBSRC(13)
+#define SYS_BRBSRC14_EL1 __SYS_BRBSRC(14)
+#define SYS_BRBSRC15_EL1 __SYS_BRBSRC(15)
+#define SYS_BRBSRC16_EL1 __SYS_BRBSRC(16)
+#define SYS_BRBSRC17_EL1 __SYS_BRBSRC(17)
+#define SYS_BRBSRC18_EL1 __SYS_BRBSRC(18)
+#define SYS_BRBSRC19_EL1 __SYS_BRBSRC(19)
+#define SYS_BRBSRC20_EL1 __SYS_BRBSRC(20)
+#define SYS_BRBSRC21_EL1 __SYS_BRBSRC(21)
+#define SYS_BRBSRC22_EL1 __SYS_BRBSRC(22)
+#define SYS_BRBSRC23_EL1 __SYS_BRBSRC(23)
+#define SYS_BRBSRC24_EL1 __SYS_BRBSRC(24)
+#define SYS_BRBSRC25_EL1 __SYS_BRBSRC(25)
+#define SYS_BRBSRC26_EL1 __SYS_BRBSRC(26)
+#define SYS_BRBSRC27_EL1 __SYS_BRBSRC(27)
+#define SYS_BRBSRC28_EL1 __SYS_BRBSRC(28)
+#define SYS_BRBSRC29_EL1 __SYS_BRBSRC(29)
+#define SYS_BRBSRC30_EL1 __SYS_BRBSRC(30)
+#define SYS_BRBSRC31_EL1 __SYS_BRBSRC(31)
+
+#define SYS_BRBTGT0_EL1 __SYS_BRBTGT(0)
+#define SYS_BRBTGT1_EL1 __SYS_BRBTGT(1)
+#define SYS_BRBTGT2_EL1 __SYS_BRBTGT(2)
+#define SYS_BRBTGT3_EL1 __SYS_BRBTGT(3)
+#define SYS_BRBTGT4_EL1 __SYS_BRBTGT(4)
+#define SYS_BRBTGT5_EL1 __SYS_BRBTGT(5)
+#define SYS_BRBTGT6_EL1 __SYS_BRBTGT(6)
+#define SYS_BRBTGT7_EL1 __SYS_BRBTGT(7)
+#define SYS_BRBTGT8_EL1 __SYS_BRBTGT(8)
+#define SYS_BRBTGT9_EL1 __SYS_BRBTGT(9)
+#define SYS_BRBTGT10_EL1 __SYS_BRBTGT(10)
+#define SYS_BRBTGT11_EL1 __SYS_BRBTGT(11)
+#define SYS_BRBTGT12_EL1 __SYS_BRBTGT(12)
+#define SYS_BRBTGT13_EL1 __SYS_BRBTGT(13)
+#define SYS_BRBTGT14_EL1 __SYS_BRBTGT(14)
+#define SYS_BRBTGT15_EL1 __SYS_BRBTGT(15)
+#define SYS_BRBTGT16_EL1 __SYS_BRBTGT(16)
+#define SYS_BRBTGT17_EL1 __SYS_BRBTGT(17)
+#define SYS_BRBTGT18_EL1 __SYS_BRBTGT(18)
+#define SYS_BRBTGT19_EL1 __SYS_BRBTGT(19)
+#define SYS_BRBTGT20_EL1 __SYS_BRBTGT(20)
+#define SYS_BRBTGT21_EL1 __SYS_BRBTGT(21)
+#define SYS_BRBTGT22_EL1 __SYS_BRBTGT(22)
+#define SYS_BRBTGT23_EL1 __SYS_BRBTGT(23)
+#define SYS_BRBTGT24_EL1 __SYS_BRBTGT(24)
+#define SYS_BRBTGT25_EL1 __SYS_BRBTGT(25)
+#define SYS_BRBTGT26_EL1 __SYS_BRBTGT(26)
+#define SYS_BRBTGT27_EL1 __SYS_BRBTGT(27)
+#define SYS_BRBTGT28_EL1 __SYS_BRBTGT(28)
+#define SYS_BRBTGT29_EL1 __SYS_BRBTGT(29)
+#define SYS_BRBTGT30_EL1 __SYS_BRBTGT(30)
+#define SYS_BRBTGT31_EL1 __SYS_BRBTGT(31)
+
#define SYS_MIDR_EL1 sys_reg(3, 0, 0, 0, 0)
#define SYS_MPIDR_EL1 sys_reg(3, 0, 0, 0, 5)
#define SYS_REVIDR_EL1 sys_reg(3, 0, 0, 0, 6)
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 184e58fd5631..a7f9054bd84c 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -921,6 +921,167 @@ Enum 3:0 BT
EndEnum
EndSysreg

+
+# This is just a dummy register declaration to get all common field masks and
+# shifts for accessing given BRBINF contents.
+Sysreg BRBINF_EL1 2 1 8 0 0
+Res0 63:47
+Field 46 CCU
+Field 45:32 CC
+Res0 31:18
+Field 17 LASTFAILED
+Field 16 T
+Res0 15:14
+Enum 13:8 TYPE
+ 0b000000 UNCOND_DIR
+ 0b000001 INDIR
+ 0b000010 DIR_LINK
+ 0b000011 INDIR_LINK
+ 0b000101 RET_SUB
+ 0b000111 RET_EXCPT
+ 0b001000 COND_DIR
+ 0b100001 DEBUG_HALT
+ 0b100010 CALL
+ 0b100011 TRAP
+ 0b100100 SERROR
+ 0b100110 INST_DEBUG
+ 0b100111 DATA_DEBUG
+ 0b101010 ALGN_FAULT
+ 0b101011 INST_FAULT
+ 0b101100 DATA_FAULT
+ 0b101110 IRQ
+ 0b101111 FIQ
+ 0b111001 DEBUG_EXIT
+EndEnum
+Enum 7:6 EL
+ 0b00 EL0
+ 0b01 EL1
+ 0b10 EL2
+ 0b11 EL3
+EndEnum
+Field 5 MPRED
+Res0 4:2
+Enum 1:0 VALID
+ 0b00 NONE
+ 0b01 TARGET
+ 0b10 SOURCE
+ 0b11 FULL
+EndEnum
+EndSysreg
+
+Sysreg BRBCR_EL1 2 1 9 0 0
+Res0 63:24
+Field 23 EXCEPTION
+Field 22 ERTN
+Res0 21:9
+Field 8 FZP
+Res0 7
+Enum 6:5 TS
+ 0b01 VIRTUAL
+ 0b10 GST_PHYSICAL
+ 0b11 PHYSICAL
+EndEnum
+Field 4 MPRED
+Field 3 CC
+Res0 2
+Field 1 E1BRE
+Field 0 E0BRE
+EndSysreg
+
+Sysreg BRBFCR_EL1 2 1 9 0 1
+Res0 63:30
+Enum 29:28 BANK
+ 0b0 FIRST
+ 0b1 SECOND
+EndEnum
+Res0 27:23
+Field 22 CONDDIR
+Field 21 DIRCALL
+Field 20 INDCALL
+Field 19 RTN
+Field 18 INDIRECT
+Field 17 DIRECT
+Field 16 EnI
+Res0 15:8
+Field 7 PAUSED
+Field 6 LASTFAILED
+Res0 5:0
+EndSysreg
+
+Sysreg BRBTS_EL1 2 1 9 0 2
+Field 63:0 TS
+EndSysreg
+
+Sysreg BRBINFINJ_EL1 2 1 9 1 0
+Res0 63:47
+Field 46 CCU
+Field 45:32 CC
+Res0 31:18
+Field 17 LASTFAILED
+Field 16 T
+Res0 15:14
+Enum 13:8 TYPE
+ 0b000000 UNCOND_DIR
+ 0b000001 INDIR
+ 0b000010 DIR_LINK
+ 0b000011 INDIR_LINK
+ 0b000100 RET_SUB
+ 0b000100 RET_SUB
+ 0b000111 RET_EXCPT
+ 0b001000 COND_DIR
+ 0b100001 DEBUG_HALT
+ 0b100010 CALL
+ 0b100011 TRAP
+ 0b100100 SERROR
+ 0b100110 INST_DEBUG
+ 0b100111 DATA_DEBUG
+ 0b101010 ALGN_FAULT
+ 0b101011 INST_FAULT
+ 0b101100 DATA_FAULT
+ 0b101110 IRQ
+ 0b101111 FIQ
+ 0b111001 DEBUG_EXIT
+EndEnum
+Enum 7:6 EL
+ 0b00 EL0
+ 0b01 EL1
+ 0b10 EL2
+ 0b11 EL3
+EndEnum
+Field 5 MPRED
+Res0 4:2
+Enum 1:0 VALID
+ 0b00 NONE
+ 0b01 TARGET
+ 0b10 SOURCE
+ 0b00 FULL
+EndEnum
+EndSysreg
+
+Sysreg BRBSRCINJ_EL1 2 1 9 1 1
+Field 63:0 ADDRESS
+EndSysreg
+
+Sysreg BRBTGTINJ_EL1 2 1 9 1 2
+Field 63:0 ADDRESS
+EndSysreg
+
+Sysreg BRBIDR0_EL1 2 1 9 2 0
+Res0 63:16
+Enum 15:12 CC
+ 0b101 20_BIT
+EndEnum
+Enum 11:8 FORMAT
+ 0b0 0
+EndEnum
+Enum 7:0 NUMREC
+ 0b1000 8
+ 0b10000 16
+ 0b100000 32
+ 0b1000000 64
+EndEnum
+EndSysreg
+
Sysreg ID_AA64ZFR0_EL1 3 0 0 4 4
Res0 63:60
Enum 59:56 F64MM
--
2.25.1


2023-01-23 13:00:44

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V8 3/6] arm64/perf: Add branch stack support in struct arm_pmu

This updates 'struct arm_pmu' for branch stack sampling support later. This
adds a new 'features' element in the structure to track supported features,
and another 'private' element to encapsulate implementation attributes on a
given 'struct arm_pmu'. These updates here will help in tracking any branch
stack sampling support, which is being added later. This also adds a helper
arm_pmu_branch_stack_supported().

This also enables perf branch stack sampling event on all 'struct arm pmu',
supporting the feature but after removing the current gate that blocks such
events unconditionally in armpmu_event_init(). Instead a quick probe can be
initiated via arm_pmu_branch_stack_supported() to ascertain the support.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
drivers/perf/arm_pmu.c | 3 +--
include/linux/perf/arm_pmu.h | 12 +++++++++++-
2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 14a3ed3bdb0b..a85b2d67022e 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -510,8 +510,7 @@ static int armpmu_event_init(struct perf_event *event)
!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
return -ENOENT;

- /* does not support taken branch sampling */
- if (has_branch_stack(event))
+ if (has_branch_stack(event) && !arm_pmu_branch_stack_supported(armpmu))
return -EOPNOTSUPP;

return __hw_perf_event_init(event);
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 2a9d07cee927..78cb1c895ab9 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -103,7 +103,9 @@ struct arm_pmu {
bool (*filter)(struct pmu *pmu, int cpu);
void (*sched_task)(struct perf_event_pmu_context *pmu_ctx, bool sched_in);
int num_events;
- bool secure_access; /* 32-bit ARM only */
+ unsigned int secure_access : 1, /* 32-bit ARM only */
+ has_branch_stack: 1, /* 64-bit ARM only */
+ reserved : 30;
#define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
#define ARMV8_PMUV3_EXT_COMMON_EVENT_BASE 0x4000
@@ -119,8 +121,16 @@ struct arm_pmu {

/* Only to be used by ACPI probing code */
unsigned long acpi_cpuid;
+
+ /* Implementation specific attributes */
+ void *private;
};

+static inline bool arm_pmu_branch_stack_supported(struct arm_pmu *armpmu)
+{
+ return armpmu->has_branch_stack;
+}
+
#define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))

u64 armpmu_event_update(struct perf_event *event);
--
2.25.1


2023-01-23 13:00:52

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V8 4/6] arm64/perf: Add branch stack support in struct pmu_hw_events

This adds branch records buffer pointer in 'struct pmu_hw_events' which can
be used to capture branch records during PMU interrupt. This percpu pointer
here needs to be allocated first before usage.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
include/linux/perf/arm_pmu.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 78cb1c895ab9..bd6dc6d0bd44 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -44,6 +44,13 @@ static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_47BIT) == ARMPMU_EVT_47BIT);
}, \
}

+#define MAX_BRANCH_RECORDS 64
+
+struct branch_records {
+ struct perf_branch_stack branch_stack;
+ struct perf_branch_entry branch_entries[MAX_BRANCH_RECORDS];
+};
+
/* The events for a given PMU register set. */
struct pmu_hw_events {
/*
@@ -70,6 +77,8 @@ struct pmu_hw_events {
struct arm_pmu *percpu_pmu;

int irq;
+
+ struct branch_records *branches;
};

enum armpmu_attr_groups {
--
2.25.1


2023-01-23 13:01:00

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V8 5/6] arm64/perf: Add branch stack support in ARMV8 PMU

This enables support for branch stack sampling event in ARMV8 PMU, checking
has_branch_stack() on the event inside 'struct arm_pmu' callbacks. Although
these branch stack helpers armv8pmu_branch_XXXXX() are just dummy functions
for now. While here, this also defines arm_pmu's sched_task() callback with
armv8pmu_sched_task(), which resets the branch record buffer on a sched_in.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
arch/arm64/include/asm/perf_event.h | 31 ++++++++++++
arch/arm64/kernel/perf_event.c | 78 ++++++++++++++++++++---------
2 files changed, 86 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index 3eaf462f5752..83951fdeccf3 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -273,4 +273,35 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs);
(regs)->pstate = PSR_MODE_EL1h; \
}

+struct pmu_hw_events;
+struct arm_pmu;
+struct perf_event;
+
+static inline bool has_branch_stack(struct perf_event *event);
+
+static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event)
+{
+ WARN_ON_ONCE(!has_branch_stack(event));
+}
+
+static inline bool armv8pmu_branch_valid(struct perf_event *event)
+{
+ WARN_ON_ONCE(!has_branch_stack(event));
+ return false;
+}
+
+static inline void armv8pmu_branch_enable(struct perf_event *event)
+{
+ WARN_ON_ONCE(!has_branch_stack(event));
+}
+
+static inline void armv8pmu_branch_disable(struct perf_event *event)
+{
+ WARN_ON_ONCE(!has_branch_stack(event));
+}
+
+static inline void armv8pmu_branch_probe(struct arm_pmu *arm_pmu) { }
+static inline void armv8pmu_branch_reset(void) { }
+static inline int armv8pmu_private_alloc(struct arm_pmu *arm_pmu) { return 0; }
+static inline void armv8pmu_private_free(struct arm_pmu *arm_pmu) { }
#endif
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index a5193f2146a6..f0689c84530b 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -769,38 +769,21 @@ static void armv8pmu_enable_event(struct perf_event *event)
* Enable counter and interrupt, and set the counter to count
* the event that we're interested in.
*/
-
- /*
- * Disable counter
- */
armv8pmu_disable_event_counter(event);
-
- /*
- * Set event.
- */
armv8pmu_write_event_type(event);
-
- /*
- * Enable interrupt for this counter
- */
armv8pmu_enable_event_irq(event);
-
- /*
- * Enable counter
- */
armv8pmu_enable_event_counter(event);
+
+ if (has_branch_stack(event))
+ armv8pmu_branch_enable(event);
}

static void armv8pmu_disable_event(struct perf_event *event)
{
- /*
- * Disable counter
- */
- armv8pmu_disable_event_counter(event);
+ if (has_branch_stack(event))
+ armv8pmu_branch_disable(event);

- /*
- * Disable interrupt for this counter
- */
+ armv8pmu_disable_event_counter(event);
armv8pmu_disable_event_irq(event);
}

@@ -878,6 +861,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
if (!armpmu_event_set_period(event))
continue;

+ if (has_branch_stack(event)) {
+ WARN_ON(!cpuc->branches);
+ armv8pmu_branch_read(cpuc, event);
+ data.br_stack = &cpuc->branches->branch_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }
+
/*
* Perf event overflow will queue the processing of the event as
* an irq_work which will be taken care of in the handling of
@@ -976,6 +966,14 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
return event->hw.idx;
}

+static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
+{
+ struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
+
+ if (sched_in && arm_pmu_branch_stack_supported(armpmu))
+ armv8pmu_branch_reset();
+}
+
/*
* Add an event filter to a given event.
*/
@@ -1052,6 +1050,9 @@ static void armv8pmu_reset(void *info)
pmcr |= ARMV8_PMU_PMCR_LP;

armv8pmu_pmcr_write(pmcr);
+
+ if (arm_pmu_branch_stack_supported(cpu_pmu))
+ armv8pmu_branch_reset();
}

static int __armv8_pmuv3_map_event(struct perf_event *event,
@@ -1069,6 +1070,9 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
&armv8_pmuv3_perf_cache_map,
ARMV8_PMU_EVTYPE_EVENT);

+ if (has_branch_stack(event) && !armv8pmu_branch_valid(event))
+ return -EOPNOTSUPP;
+
if (armv8pmu_event_is_64bit(event))
event->hw.flags |= ARMPMU_EVT_64BIT;

@@ -1181,6 +1185,21 @@ static void __armv8pmu_probe_pmu(void *info)
cpu_pmu->reg_pmmir = read_cpuid(PMMIR_EL1);
else
cpu_pmu->reg_pmmir = 0;
+ armv8pmu_branch_probe(cpu_pmu);
+}
+
+static int branch_records_alloc(struct arm_pmu *armpmu)
+{
+ struct pmu_hw_events *events;
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ events = per_cpu_ptr(armpmu->hw_events, cpu);
+ events->branches = kzalloc(sizeof(struct branch_records), GFP_KERNEL);
+ if (!events->branches)
+ return -ENOMEM;
+ }
+ return 0;
}

static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
@@ -1191,12 +1210,24 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
};
int ret;

+ ret = armv8pmu_private_alloc(cpu_pmu);
+ if (ret)
+ return ret;
+
ret = smp_call_function_any(&cpu_pmu->supported_cpus,
__armv8pmu_probe_pmu,
&probe, 1);
if (ret)
return ret;

+ if (arm_pmu_branch_stack_supported(cpu_pmu)) {
+ ret = branch_records_alloc(cpu_pmu);
+ if (ret)
+ return ret;
+ } else {
+ armv8pmu_private_free(cpu_pmu);
+ }
+
return probe.present ? 0 : -ENODEV;
}

@@ -1261,6 +1292,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
cpu_pmu->filter = armv8pmu_filter;

cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
+ cpu_pmu->sched_task = armv8pmu_sched_task;

cpu_pmu->name = name;
cpu_pmu->map_event = map_event;
--
2.25.1


2023-01-23 13:01:12

by Anshuman Khandual

[permalink] [raw]
Subject: [PATCH V8 6/6] arm64/perf: Enable branch stack events via FEAT_BRBE

This enables branch stack sampling events in ARMV8 PMU, via an architecture
feature FEAT_BRBE aka branch record buffer extension. This defines required
branch helper functions pmuv8pmu_branch_XXXXX() and the implementation here
is wrapped with a new config option CONFIG_ARM64_BRBE.

Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Anshuman Khandual <[email protected]>
---
arch/arm64/Kconfig | 11 +
arch/arm64/include/asm/perf_event.h | 11 +
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/brbe.c | 538 ++++++++++++++++++++++++++++
arch/arm64/kernel/brbe.h | 257 +++++++++++++
5 files changed, 818 insertions(+)
create mode 100644 arch/arm64/kernel/brbe.c
create mode 100644 arch/arm64/kernel/brbe.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c5ccca26a408..ff5fab0ae771 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1377,6 +1377,17 @@ config HW_PERF_EVENTS
def_bool y
depends on ARM_PMU

+config ARM64_BRBE
+ bool "Enable support for Branch Record Buffer Extension (BRBE)"
+ depends on PERF_EVENTS && ARM64 && ARM_PMU
+ default y
+ help
+ Enable perf support for Branch Record Buffer Extension (BRBE) which
+ records all branches taken in an execution path. This supports some
+ branch types and privilege based filtering. It captured additional
+ relevant information such as cycle count, misprediction and branch
+ type, branch privilege level etc.
+
# Supported by clang >= 7.0 or GCC >= 12.0.0
config CC_HAVE_SHADOW_CALL_STACK
def_bool $(cc-option, -fsanitize=shadow-call-stack -ffixed-x18)
diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index 83951fdeccf3..97db96326e13 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -279,6 +279,16 @@ struct perf_event;

static inline bool has_branch_stack(struct perf_event *event);

+#ifdef CONFIG_ARM64_BRBE
+void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event);
+bool armv8pmu_branch_valid(struct perf_event *event);
+void armv8pmu_branch_enable(struct perf_event *event);
+void armv8pmu_branch_disable(struct perf_event *event);
+void armv8pmu_branch_probe(struct arm_pmu *arm_pmu);
+void armv8pmu_branch_reset(void);
+int armv8pmu_private_alloc(struct arm_pmu *arm_pmu);
+void armv8pmu_private_free(struct arm_pmu *arm_pmu);
+#else
static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event)
{
WARN_ON_ONCE(!has_branch_stack(event));
@@ -305,3 +315,4 @@ static inline void armv8pmu_branch_reset(void) { }
static inline int armv8pmu_private_alloc(struct arm_pmu *arm_pmu) { return 0; }
static inline void armv8pmu_private_free(struct arm_pmu *arm_pmu) { }
#endif
+#endif
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index ceba6792f5b3..6ee7ccb61621 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -46,6 +46,7 @@ obj-$(CONFIG_MODULES) += module.o
obj-$(CONFIG_ARM64_MODULE_PLTS) += module-plts.o
obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o
obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o
+obj-$(CONFIG_ARM64_BRBE) += brbe.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_CPU_PM) += sleep.o suspend.o
obj-$(CONFIG_CPU_IDLE) += cpuidle.o
diff --git a/arch/arm64/kernel/brbe.c b/arch/arm64/kernel/brbe.c
new file mode 100644
index 000000000000..81ea6563e153
--- /dev/null
+++ b/arch/arm64/kernel/brbe.c
@@ -0,0 +1,538 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Branch Record Buffer Extension Driver.
+ *
+ * Copyright (C) 2022 ARM Limited
+ *
+ * Author: Anshuman Khandual <[email protected]>
+ */
+#include "brbe.h"
+
+static bool valid_brbe_nr(int brbe_nr)
+{
+ return brbe_nr == BRBIDR0_EL1_NUMREC_8 ||
+ brbe_nr == BRBIDR0_EL1_NUMREC_16 ||
+ brbe_nr == BRBIDR0_EL1_NUMREC_32 ||
+ brbe_nr == BRBIDR0_EL1_NUMREC_64;
+}
+
+static bool valid_brbe_cc(int brbe_cc)
+{
+ return brbe_cc == BRBIDR0_EL1_CC_20_BIT;
+}
+
+static bool valid_brbe_format(int brbe_format)
+{
+ return brbe_format == BRBIDR0_EL1_FORMAT_0;
+}
+
+static bool valid_brbe_version(int brbe_version)
+{
+ return brbe_version == ID_AA64DFR0_EL1_BRBE_IMP ||
+ brbe_version == ID_AA64DFR0_EL1_BRBE_BRBE_V1P1;
+}
+
+static void select_brbe_bank(int bank)
+{
+ u64 brbfcr;
+
+ WARN_ON(bank > BRBE_BANK_IDX_1);
+ brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+ brbfcr &= ~BRBFCR_EL1_BANK_MASK;
+ brbfcr |= SYS_FIELD_PREP(BRBFCR_EL1, BANK, bank);
+ write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+ isb();
+}
+
+static const char branch_filter_error_msg[] = "branch filter not supported";
+
+bool armv8pmu_branch_valid(struct perf_event *event)
+{
+ u64 branch_type = event->attr.branch_sample_type;
+
+ /*
+ * If the event does not have at least one of the privilege
+ * branch filters as in PERF_SAMPLE_BRANCH_PLM_ALL, the core
+ * perf will adjust its value based on perf event's existing
+ * privilege level via attr.exclude_[user|kernel|hv].
+ *
+ * As event->attr.branch_sample_type might have been changed
+ * when the event reaches here, it is not possible to figure
+ * out whether the event originally had HV privilege request
+ * or got added via the core perf. Just report this situation
+ * once and continue ignoring if there are other instances.
+ */
+ if ((branch_type & PERF_SAMPLE_BRANCH_HV) && !is_kernel_in_hyp_mode())
+ pr_debug_once("%s - hypervisor privilege\n", branch_filter_error_msg);
+
+ if (branch_type & PERF_SAMPLE_BRANCH_ABORT_TX) {
+ pr_debug_once("%s - aborted transaction\n", branch_filter_error_msg);
+ return false;
+ }
+
+ if (branch_type & PERF_SAMPLE_BRANCH_NO_TX) {
+ pr_debug_once("%s - no transaction\n", branch_filter_error_msg);
+ return false;
+ }
+
+ if (branch_type & PERF_SAMPLE_BRANCH_IN_TX) {
+ pr_debug_once("%s - in transaction\n", branch_filter_error_msg);
+ return false;
+ }
+ return true;
+}
+
+int armv8pmu_private_alloc(struct arm_pmu *arm_pmu)
+{
+ struct brbe_hw_attr *brbe_attr = kzalloc(sizeof(struct brbe_hw_attr), GFP_KERNEL);
+
+ if (!brbe_attr)
+ return -ENOMEM;
+
+ arm_pmu->private = brbe_attr;
+ return 0;
+}
+
+void armv8pmu_private_free(struct arm_pmu *arm_pmu)
+{
+ kfree(arm_pmu->private);
+}
+
+static int brbe_attributes_probe(struct arm_pmu *armpmu, u32 brbe)
+{
+ struct brbe_hw_attr *brbe_attr = (struct brbe_hw_attr *)armpmu->private;
+ u64 brbidr = read_sysreg_s(SYS_BRBIDR0_EL1);
+
+ brbe_attr->brbe_version = brbe;
+ brbe_attr->brbe_format = brbe_get_format(brbidr);
+ brbe_attr->brbe_cc = brbe_get_cc_bits(brbidr);
+ brbe_attr->brbe_nr = brbe_get_numrec(brbidr);
+
+ if (!valid_brbe_version(brbe_attr->brbe_version) ||
+ !valid_brbe_format(brbe_attr->brbe_format) ||
+ !valid_brbe_cc(brbe_attr->brbe_cc) ||
+ !valid_brbe_nr(brbe_attr->brbe_nr))
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+void armv8pmu_branch_probe(struct arm_pmu *armpmu)
+{
+ u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
+ u32 brbe;
+
+ brbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT);
+ if (!brbe)
+ return;
+
+ if (brbe_attributes_probe(armpmu, brbe))
+ return;
+
+ armpmu->has_branch_stack = 1;
+}
+
+static u64 branch_type_to_brbfcr(int branch_type)
+{
+ u64 brbfcr = 0;
+
+ if (branch_type & PERF_SAMPLE_BRANCH_ANY) {
+ brbfcr |= BRBFCR_EL1_BRANCH_FILTERS;
+ return brbfcr;
+ }
+
+ if (branch_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
+ brbfcr |= BRBFCR_EL1_INDCALL;
+ brbfcr |= BRBFCR_EL1_DIRCALL;
+ }
+
+ if (branch_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
+ brbfcr |= BRBFCR_EL1_RTN;
+
+ if (branch_type & PERF_SAMPLE_BRANCH_IND_CALL)
+ brbfcr |= BRBFCR_EL1_INDCALL;
+
+ if (branch_type & PERF_SAMPLE_BRANCH_COND)
+ brbfcr |= BRBFCR_EL1_CONDDIR;
+
+ if (branch_type & PERF_SAMPLE_BRANCH_IND_JUMP)
+ brbfcr |= BRBFCR_EL1_INDIRECT;
+
+ if (branch_type & PERF_SAMPLE_BRANCH_CALL)
+ brbfcr |= BRBFCR_EL1_DIRCALL;
+
+ return brbfcr;
+}
+
+static u64 branch_type_to_brbcr(int branch_type)
+{
+ u64 brbcr = (BRBCR_EL1_FZP | BRBCR_EL1_DEFAULT_TS);
+
+ /*
+ * When running in the hyp mode, writing into BRBCR_EL1
+ * actually writes into BRBCR_EL2 instead. Field E2BRE
+ * is also at the same position as E1BRE.
+ */
+ if (branch_type & PERF_SAMPLE_BRANCH_USER)
+ brbcr |= BRBCR_EL1_E0BRE;
+
+ if (branch_type & PERF_SAMPLE_BRANCH_KERNEL)
+ brbcr |= BRBCR_EL1_E1BRE;
+
+ if (branch_type & PERF_SAMPLE_BRANCH_HV) {
+ if (is_kernel_in_hyp_mode())
+ brbcr |= BRBCR_EL1_E1BRE;
+ }
+
+ if (!(branch_type & PERF_SAMPLE_BRANCH_NO_CYCLES))
+ brbcr |= BRBCR_EL1_CC;
+
+ if (!(branch_type & PERF_SAMPLE_BRANCH_NO_FLAGS))
+ brbcr |= BRBCR_EL1_MPRED;
+
+ /*
+ * The exception and exception return branches could be
+ * captured, irrespective of the perf event's privilege.
+ * If the perf event does not have enough privilege for
+ * a given exception level, then addresses which falls
+ * under that exception level will be reported as zero
+ * for the captured branch record, creating source only
+ * or target only records.
+ */
+ if (branch_type & PERF_SAMPLE_BRANCH_ANY) {
+ brbcr |= BRBCR_EL1_EXCEPTION;
+ brbcr |= BRBCR_EL1_ERTN;
+ }
+
+ if (branch_type & PERF_SAMPLE_BRANCH_ANY_CALL)
+ brbcr |= BRBCR_EL1_EXCEPTION;
+
+ if (branch_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
+ brbcr |= BRBCR_EL1_ERTN;
+
+ return brbcr & BRBCR_EL1_DEFAULT_CONFIG;
+}
+
+void armv8pmu_branch_enable(struct perf_event *event)
+{
+ u64 branch_type = event->attr.branch_sample_type;
+ u64 brbfcr, brbcr;
+
+ brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+ brbfcr &= ~BRBFCR_EL1_DEFAULT_CONFIG;
+ brbfcr |= branch_type_to_brbfcr(branch_type);
+ write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+ isb();
+
+ brbcr = read_sysreg_s(SYS_BRBCR_EL1);
+ brbcr &= ~BRBCR_EL1_DEFAULT_CONFIG;
+ brbcr |= branch_type_to_brbcr(branch_type);
+ write_sysreg_s(brbcr, SYS_BRBCR_EL1);
+ isb();
+ armv8pmu_branch_reset();
+}
+
+void armv8pmu_branch_disable(struct perf_event *event)
+{
+ u64 brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+ u64 brbcr = read_sysreg_s(SYS_BRBCR_EL1);
+
+ brbcr &= ~(BRBCR_EL1_E0BRE | BRBCR_EL1_E1BRE);
+ brbfcr |= BRBFCR_EL1_PAUSED;
+ write_sysreg_s(brbcr, SYS_BRBCR_EL1);
+ write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+ isb();
+}
+
+static void brbe_set_perf_entry_type(struct perf_branch_entry *entry, u64 brbinf)
+{
+ int brbe_type = brbe_get_type(brbinf);
+
+ switch (brbe_type) {
+ case BRBINF_EL1_TYPE_UNCOND_DIR:
+ entry->type = PERF_BR_UNCOND;
+ break;
+ case BRBINF_EL1_TYPE_INDIR:
+ entry->type = PERF_BR_IND;
+ break;
+ case BRBINF_EL1_TYPE_DIR_LINK:
+ entry->type = PERF_BR_CALL;
+ break;
+ case BRBINF_EL1_TYPE_INDIR_LINK:
+ entry->type = PERF_BR_IND_CALL;
+ break;
+ case BRBINF_EL1_TYPE_RET_SUB:
+ entry->type = PERF_BR_RET;
+ break;
+ case BRBINF_EL1_TYPE_COND_DIR:
+ entry->type = PERF_BR_COND;
+ break;
+ case BRBINF_EL1_TYPE_CALL:
+ entry->type = PERF_BR_CALL;
+ break;
+ case BRBINF_EL1_TYPE_TRAP:
+ entry->type = PERF_BR_SYSCALL;
+ break;
+ case BRBINF_EL1_TYPE_RET_EXCPT:
+ entry->type = PERF_BR_ERET;
+ break;
+ case BRBINF_EL1_TYPE_IRQ:
+ entry->type = PERF_BR_IRQ;
+ break;
+ case BRBINF_EL1_TYPE_DEBUG_HALT:
+ entry->type = PERF_BR_EXTEND_ABI;
+ entry->new_type = PERF_BR_ARM64_DEBUG_HALT;
+ break;
+ case BRBINF_EL1_TYPE_SERROR:
+ entry->type = PERF_BR_SERROR;
+ break;
+ case BRBINF_EL1_TYPE_INST_DEBUG:
+ entry->type = PERF_BR_EXTEND_ABI;
+ entry->new_type = PERF_BR_ARM64_DEBUG_INST;
+ break;
+ case BRBINF_EL1_TYPE_DATA_DEBUG:
+ entry->type = PERF_BR_EXTEND_ABI;
+ entry->new_type = PERF_BR_ARM64_DEBUG_DATA;
+ break;
+ case BRBINF_EL1_TYPE_ALGN_FAULT:
+ entry->type = PERF_BR_EXTEND_ABI;
+ entry->new_type = PERF_BR_NEW_FAULT_ALGN;
+ break;
+ case BRBINF_EL1_TYPE_INST_FAULT:
+ entry->type = PERF_BR_EXTEND_ABI;
+ entry->new_type = PERF_BR_NEW_FAULT_INST;
+ break;
+ case BRBINF_EL1_TYPE_DATA_FAULT:
+ entry->type = PERF_BR_EXTEND_ABI;
+ entry->new_type = PERF_BR_NEW_FAULT_DATA;
+ break;
+ case BRBINF_EL1_TYPE_FIQ:
+ entry->type = PERF_BR_EXTEND_ABI;
+ entry->new_type = PERF_BR_ARM64_FIQ;
+ break;
+ case BRBINF_EL1_TYPE_DEBUG_EXIT:
+ entry->type = PERF_BR_EXTEND_ABI;
+ entry->new_type = PERF_BR_ARM64_DEBUG_EXIT;
+ break;
+ default:
+ pr_warn_once("%d - unknown branch type captured\n", brbe_type);
+ entry->type = PERF_BR_UNKNOWN;
+ break;
+ }
+}
+
+static int brbe_get_perf_priv(u64 brbinf)
+{
+ int brbe_el = brbe_get_el(brbinf);
+
+ switch (brbe_el) {
+ case BRBINF_EL1_EL_EL0:
+ return PERF_BR_PRIV_USER;
+ case BRBINF_EL1_EL_EL1:
+ return PERF_BR_PRIV_KERNEL;
+ case BRBINF_EL1_EL_EL2:
+ if (is_kernel_in_hyp_mode())
+ return PERF_BR_PRIV_KERNEL;
+ return PERF_BR_PRIV_HV;
+ default:
+ pr_warn_once("%d - unknown branch privilege captured\n", brbe_el);
+ return PERF_BR_PRIV_UNKNOWN;
+ }
+}
+
+static void capture_brbe_flags(struct pmu_hw_events *cpuc, struct perf_event *event,
+ u64 brbinf, u64 brbcr, int idx)
+{
+ struct perf_branch_entry *entry = &cpuc->branches->branch_entries[idx];
+
+ if (branch_sample_type(event))
+ brbe_set_perf_entry_type(entry, brbinf);
+
+ if (!branch_sample_no_cycles(event)) {
+ WARN_ON_ONCE(!(brbcr & BRBCR_EL1_CC));
+ entry->cycles = brbe_get_cycles(brbinf);
+ }
+
+ if (!branch_sample_no_flags(event)) {
+ /*
+ * BRBINFx_EL1.LASTFAILED indicates that a TME transaction failed (or
+ * was cancelled) prior to this record, and some number of records
+ * prior to this one, may have been generated during an attempt to
+ * execute the transaction.
+ *
+ * We will remove such entries later in process_branch_aborts().
+ */
+ entry->abort = brbe_get_lastfailed(brbinf);
+
+ /*
+ * All these information (i.e transaction state and mispredicts)
+ * are available for source only and complete branch records.
+ */
+ if (brbe_record_is_complete(brbinf) ||
+ brbe_record_is_source_only(brbinf)) {
+ WARN_ON_ONCE(!(brbcr & BRBCR_EL1_MPRED));
+ entry->mispred = brbe_get_mispredict(brbinf);
+ entry->predicted = !entry->mispred;
+ entry->in_tx = brbe_get_in_tx(brbinf);
+ }
+ }
+
+ if (branch_sample_priv(event)) {
+ /*
+ * All these information (i.e branch privilege level) are
+ * available for target only and complete branch records.
+ */
+ if (brbe_record_is_complete(brbinf) ||
+ brbe_record_is_target_only(brbinf))
+ entry->priv = brbe_get_perf_priv(brbinf);
+ }
+}
+
+/*
+ * A branch record with BRBINF_EL1.LASTFAILED set, implies that all
+ * preceding consecutive branch records, that were in a transaction
+ * (i.e their BRBINF_EL1.TX set) have been aborted.
+ *
+ * Similarly BRBFCR_EL1.LASTFAILED set, indicate that all preceding
+ * consecutive branch records upto the last record, which were in a
+ * transaction (i.e their BRBINF_EL1.TX set) have been aborted.
+ *
+ * --------------------------------- -------------------
+ * | 00 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX success]
+ * --------------------------------- -------------------
+ * | 01 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX success]
+ * --------------------------------- -------------------
+ * | 02 | BRBSRC | BRBTGT | BRBINF | | TX = 0 | LF = 0 |
+ * --------------------------------- -------------------
+ * | 03 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ * | 04 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ * | 05 | BRBSRC | BRBTGT | BRBINF | | TX = 0 | LF = 1 |
+ * --------------------------------- -------------------
+ * | .. | BRBSRC | BRBTGT | BRBINF | | TX = 0 | LF = 0 |
+ * --------------------------------- -------------------
+ * | 61 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ * | 62 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ * | 63 | BRBSRC | BRBTGT | BRBINF | | TX = 1 | LF = 0 | [TX failed]
+ * --------------------------------- -------------------
+ *
+ * BRBFCR_EL1.LASTFAILED == 1
+ *
+ * Here BRBFCR_EL1.LASTFAILED failes all those consecutive and also
+ * in transaction branches near the end of the BRBE buffer.
+ */
+static void process_branch_aborts(struct pmu_hw_events *cpuc)
+{
+ struct brbe_hw_attr *brbe_attr = (struct brbe_hw_attr *)cpuc->percpu_pmu->private;
+ u64 brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+ bool lastfailed = !!(brbfcr & BRBFCR_EL1_LASTFAILED);
+ int idx = brbe_attr->brbe_nr - 1;
+ struct perf_branch_entry *entry;
+
+ do {
+ entry = &cpuc->branches->branch_entries[idx];
+ if (entry->in_tx) {
+ entry->abort = lastfailed;
+ } else {
+ lastfailed = entry->abort;
+ entry->abort = false;
+ }
+ } while (idx--, idx >= 0);
+}
+
+void armv8pmu_branch_reset(void)
+{
+ asm volatile(BRB_IALL);
+ isb();
+}
+
+static bool capture_branch_entry(struct pmu_hw_events *cpuc,
+ struct perf_event *event,
+ u64 brbcr, int idx)
+{
+ struct perf_branch_entry *entry = &cpuc->branches->branch_entries[idx];
+ u64 brbinf = get_brbinf_reg(idx);
+
+ /*
+ * There are no valid entries anymore on the buffer.
+ * Abort the branch record processing to save some
+ * cycles and also reduce the capture/process load
+ * for the user space as well.
+ */
+ if (brbe_invalid(brbinf))
+ return false;
+
+ perf_clear_branch_entry_bitfields(entry);
+ if (brbe_record_is_complete(brbinf)) {
+ entry->from = get_brbsrc_reg(idx);
+ entry->to = get_brbtgt_reg(idx);
+ } else if (brbe_record_is_source_only(brbinf)) {
+ entry->from = get_brbsrc_reg(idx);
+ entry->to = 0;
+ } else if (brbe_record_is_target_only(brbinf)) {
+ entry->from = 0;
+ entry->to = get_brbtgt_reg(idx);
+ }
+ capture_brbe_flags(cpuc, event, brbinf, brbcr, idx);
+ return true;
+}
+
+void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event)
+{
+ struct brbe_hw_attr *brbe_attr = (struct brbe_hw_attr *)cpuc->percpu_pmu->private;
+ u64 brbfcr, brbcr;
+ int idx, loop1_idx1, loop1_idx2, loop2_idx1, loop2_idx2, count;
+
+ brbcr = read_sysreg_s(SYS_BRBCR_EL1);
+ brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+
+ /* Ensure pause on PMU interrupt is enabled */
+ WARN_ON_ONCE(!(brbcr & BRBCR_EL1_FZP));
+
+ /* Save and clear the privilege */
+ write_sysreg_s(brbcr & ~(BRBCR_EL1_E0BRE | BRBCR_EL1_E1BRE), SYS_BRBCR_EL1);
+
+ /* Pause the buffer */
+ write_sysreg_s(brbfcr | BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
+ isb();
+
+ /* Determine the indices for each loop */
+ loop1_idx1 = BRBE_BANK0_IDX_MIN;
+ if (brbe_attr->brbe_nr <= BRBE_BANK_MAX_ENTRIES) {
+ loop1_idx2 = brbe_attr->brbe_nr - 1;
+ loop2_idx1 = BRBE_BANK1_IDX_MIN;
+ loop2_idx2 = BRBE_BANK0_IDX_MAX;
+ } else {
+ loop1_idx2 = BRBE_BANK0_IDX_MAX;
+ loop2_idx1 = BRBE_BANK1_IDX_MIN;
+ loop2_idx2 = brbe_attr->brbe_nr - 1;
+ }
+
+ /* Loop through bank 0 */
+ select_brbe_bank(BRBE_BANK_IDX_0);
+ for (idx = 0, count = loop1_idx1; count <= loop1_idx2; idx++, count++) {
+ if (!capture_branch_entry(cpuc, event, brbcr, idx))
+ break;
+ }
+
+ /* Loop through bank 1 */
+ select_brbe_bank(BRBE_BANK_IDX_1);
+ for (count = loop2_idx1; count <= loop2_idx2; idx++, count++) {
+ if (!capture_branch_entry(cpuc, event, brbcr, idx))
+ break;
+ }
+ cpuc->branches->branch_stack.nr = idx;
+ cpuc->branches->branch_stack.hw_idx = -1ULL;
+ process_branch_aborts(cpuc);
+
+ /* Restore privilege, enable pause on PMU interrupt */
+ write_sysreg_s(brbcr | BRBCR_EL1_FZP, SYS_BRBCR_EL1);
+
+ /* Unpause the buffer */
+ write_sysreg_s(brbfcr & ~BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
+ isb();
+ armv8pmu_branch_reset();
+}
diff --git a/arch/arm64/kernel/brbe.h b/arch/arm64/kernel/brbe.h
new file mode 100644
index 000000000000..3a5a6f5f03bf
--- /dev/null
+++ b/arch/arm64/kernel/brbe.h
@@ -0,0 +1,257 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Branch Record Buffer Extension Helpers.
+ *
+ * Copyright (C) 2022 ARM Limited
+ *
+ * Author: Anshuman Khandual <[email protected]>
+ */
+#define pr_fmt(fmt) "brbe: " fmt
+
+#include <linux/perf/arm_pmu.h>
+
+#define BRBFCR_EL1_BRANCH_FILTERS (BRBFCR_EL1_DIRECT | \
+ BRBFCR_EL1_INDIRECT | \
+ BRBFCR_EL1_RTN | \
+ BRBFCR_EL1_INDCALL | \
+ BRBFCR_EL1_DIRCALL | \
+ BRBFCR_EL1_CONDDIR)
+
+#define BRBFCR_EL1_DEFAULT_CONFIG (BRBFCR_EL1_BANK_MASK | \
+ BRBFCR_EL1_PAUSED | \
+ BRBFCR_EL1_EnI | \
+ BRBFCR_EL1_BRANCH_FILTERS)
+
+/*
+ * BRBTS_EL1 is currently not used for branch stack implementation
+ * purpose but BRBCR_EL1.TS needs to have a valid value from all
+ * available options. BRBCR_EL1_TS_VIRTUAL is selected for this.
+ */
+#define BRBCR_EL1_DEFAULT_TS FIELD_PREP(BRBCR_EL1_TS_MASK, BRBCR_EL1_TS_VIRTUAL)
+
+#define BRBCR_EL1_DEFAULT_CONFIG (BRBCR_EL1_EXCEPTION | \
+ BRBCR_EL1_ERTN | \
+ BRBCR_EL1_CC | \
+ BRBCR_EL1_MPRED | \
+ BRBCR_EL1_E1BRE | \
+ BRBCR_EL1_E0BRE | \
+ BRBCR_EL1_FZP | \
+ BRBCR_EL1_DEFAULT_TS)
+/*
+ * BRBE Instructions
+ *
+ * BRB_IALL : Invalidate the entire buffer
+ * BRB_INJ : Inject latest branch record derived from [BRBSRCINJ, BRBTGTINJ, BRBINFINJ]
+ */
+#define BRB_IALL __emit_inst(0xD5000000 | sys_insn(1, 1, 7, 2, 4) | (0x1f))
+#define BRB_INJ __emit_inst(0xD5000000 | sys_insn(1, 1, 7, 2, 5) | (0x1f))
+
+/*
+ * BRBE Buffer Organization
+ *
+ * BRBE buffer is arranged as multiple banks of 32 branch record
+ * entries each. An individual branch record in a given bank could
+ * be accessed, after selecting the bank in BRBFCR_EL1.BANK and
+ * accessing the registers i.e [BRBSRC, BRBTGT, BRBINF] set with
+ * indices [0..31].
+ *
+ * Bank 0
+ *
+ * --------------------------------- ------
+ * | 00 | BRBSRC | BRBTGT | BRBINF | | 00 |
+ * --------------------------------- ------
+ * | 01 | BRBSRC | BRBTGT | BRBINF | | 01 |
+ * --------------------------------- ------
+ * | .. | BRBSRC | BRBTGT | BRBINF | | .. |
+ * --------------------------------- ------
+ * | 31 | BRBSRC | BRBTGT | BRBINF | | 31 |
+ * --------------------------------- ------
+ *
+ * Bank 1
+ *
+ * --------------------------------- ------
+ * | 32 | BRBSRC | BRBTGT | BRBINF | | 00 |
+ * --------------------------------- ------
+ * | 33 | BRBSRC | BRBTGT | BRBINF | | 01 |
+ * --------------------------------- ------
+ * | .. | BRBSRC | BRBTGT | BRBINF | | .. |
+ * --------------------------------- ------
+ * | 63 | BRBSRC | BRBTGT | BRBINF | | 31 |
+ * --------------------------------- ------
+ */
+#define BRBE_BANK_MAX_ENTRIES 32
+
+#define BRBE_BANK0_IDX_MIN 0
+#define BRBE_BANK0_IDX_MAX 31
+#define BRBE_BANK1_IDX_MIN 32
+#define BRBE_BANK1_IDX_MAX 63
+
+struct brbe_hw_attr {
+ int brbe_version;
+ int brbe_cc;
+ int brbe_nr;
+ int brbe_format;
+};
+
+enum brbe_bank_idx {
+ BRBE_BANK_IDX_INVALID = -1,
+ BRBE_BANK_IDX_0,
+ BRBE_BANK_IDX_1,
+ BRBE_BANK_IDX_MAX
+};
+
+#define RETURN_READ_BRBSRCN(n) \
+ read_sysreg_s(SYS_BRBSRC##n##_EL1)
+
+#define RETURN_READ_BRBTGTN(n) \
+ read_sysreg_s(SYS_BRBTGT##n##_EL1)
+
+#define RETURN_READ_BRBINFN(n) \
+ read_sysreg_s(SYS_BRBINF##n##_EL1)
+
+#define BRBE_REGN_CASE(n, case_macro) \
+ case n: return case_macro(n); break
+
+#define BRBE_REGN_SWITCH(x, case_macro) \
+ do { \
+ switch (x) { \
+ BRBE_REGN_CASE(0, case_macro); \
+ BRBE_REGN_CASE(1, case_macro); \
+ BRBE_REGN_CASE(2, case_macro); \
+ BRBE_REGN_CASE(3, case_macro); \
+ BRBE_REGN_CASE(4, case_macro); \
+ BRBE_REGN_CASE(5, case_macro); \
+ BRBE_REGN_CASE(6, case_macro); \
+ BRBE_REGN_CASE(7, case_macro); \
+ BRBE_REGN_CASE(8, case_macro); \
+ BRBE_REGN_CASE(9, case_macro); \
+ BRBE_REGN_CASE(10, case_macro); \
+ BRBE_REGN_CASE(11, case_macro); \
+ BRBE_REGN_CASE(12, case_macro); \
+ BRBE_REGN_CASE(13, case_macro); \
+ BRBE_REGN_CASE(14, case_macro); \
+ BRBE_REGN_CASE(15, case_macro); \
+ BRBE_REGN_CASE(16, case_macro); \
+ BRBE_REGN_CASE(17, case_macro); \
+ BRBE_REGN_CASE(18, case_macro); \
+ BRBE_REGN_CASE(19, case_macro); \
+ BRBE_REGN_CASE(20, case_macro); \
+ BRBE_REGN_CASE(21, case_macro); \
+ BRBE_REGN_CASE(22, case_macro); \
+ BRBE_REGN_CASE(23, case_macro); \
+ BRBE_REGN_CASE(24, case_macro); \
+ BRBE_REGN_CASE(25, case_macro); \
+ BRBE_REGN_CASE(26, case_macro); \
+ BRBE_REGN_CASE(27, case_macro); \
+ BRBE_REGN_CASE(28, case_macro); \
+ BRBE_REGN_CASE(29, case_macro); \
+ BRBE_REGN_CASE(30, case_macro); \
+ BRBE_REGN_CASE(31, case_macro); \
+ default: \
+ pr_warn("unknown register index\n"); \
+ return -1; \
+ } \
+ } while (0)
+
+static inline int buffer_to_brbe_idx(int buffer_idx)
+{
+ return buffer_idx % BRBE_BANK_MAX_ENTRIES;
+}
+
+static inline u64 get_brbsrc_reg(int buffer_idx)
+{
+ int brbe_idx = buffer_to_brbe_idx(buffer_idx);
+
+ BRBE_REGN_SWITCH(brbe_idx, RETURN_READ_BRBSRCN);
+}
+
+static inline u64 get_brbtgt_reg(int buffer_idx)
+{
+ int brbe_idx = buffer_to_brbe_idx(buffer_idx);
+
+ BRBE_REGN_SWITCH(brbe_idx, RETURN_READ_BRBTGTN);
+}
+
+static inline u64 get_brbinf_reg(int buffer_idx)
+{
+ int brbe_idx = buffer_to_brbe_idx(buffer_idx);
+
+ BRBE_REGN_SWITCH(brbe_idx, RETURN_READ_BRBINFN);
+}
+
+static inline u64 brbe_record_valid(u64 brbinf)
+{
+ return FIELD_GET(BRBINF_EL1_VALID_MASK, brbinf);
+}
+
+static inline bool brbe_invalid(u64 brbinf)
+{
+ return brbe_record_valid(brbinf) == BRBINF_EL1_VALID_NONE;
+}
+
+static inline bool brbe_record_is_complete(u64 brbinf)
+{
+ return brbe_record_valid(brbinf) == BRBINF_EL1_VALID_FULL;
+}
+
+static inline bool brbe_record_is_source_only(u64 brbinf)
+{
+ return brbe_record_valid(brbinf) == BRBINF_EL1_VALID_SOURCE;
+}
+
+static inline bool brbe_record_is_target_only(u64 brbinf)
+{
+ return brbe_record_valid(brbinf) == BRBINF_EL1_VALID_TARGET;
+}
+
+static inline int brbe_get_in_tx(u64 brbinf)
+{
+ return FIELD_GET(BRBINF_EL1_T_MASK, brbinf);
+}
+
+static inline int brbe_get_mispredict(u64 brbinf)
+{
+ return FIELD_GET(BRBINF_EL1_MPRED_MASK, brbinf);
+}
+
+static inline int brbe_get_lastfailed(u64 brbinf)
+{
+ return FIELD_GET(BRBINF_EL1_LASTFAILED_MASK, brbinf);
+}
+
+static inline int brbe_get_cycles(u64 brbinf)
+{
+ /*
+ * Captured cycle count is unknown and hence
+ * should not be passed on to the user space.
+ */
+ if (brbinf & BRBINF_EL1_CCU)
+ return 0;
+
+ return FIELD_GET(BRBINF_EL1_CC_MASK, brbinf);
+}
+
+static inline int brbe_get_type(u64 brbinf)
+{
+ return FIELD_GET(BRBINF_EL1_TYPE_MASK, brbinf);
+}
+
+static inline int brbe_get_el(u64 brbinf)
+{
+ return FIELD_GET(BRBINF_EL1_EL_MASK, brbinf);
+}
+
+static inline int brbe_get_numrec(u64 brbidr)
+{
+ return FIELD_GET(BRBIDR0_EL1_NUMREC_MASK, brbidr);
+}
+
+static inline int brbe_get_format(u64 brbidr)
+{
+ return FIELD_GET(BRBIDR0_EL1_FORMAT_MASK, brbidr);
+}
+
+static inline int brbe_get_cc_bits(u64 brbidr)
+{
+ return FIELD_GET(BRBIDR0_EL1_CC_MASK, brbidr);
+}
--
2.25.1


2023-01-30 16:29:23

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH V8 5/6] arm64/perf: Add branch stack support in ARMV8 PMU

Hi Anshuman,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on tip/perf/core acme/perf/core linus/master v6.2-rc6 next-20230130]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/drivers-perf-arm_pmu-Add-new-sched_task-callback/20230123-210254
base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link: https://lore.kernel.org/r/20230123125956.1350336-6-anshuman.khandual%40arm.com
patch subject: [PATCH V8 5/6] arm64/perf: Add branch stack support in ARMV8 PMU
config: arm64-randconfig-r016-20230130 (https://download.01.org/0day-ci/archive/20230131/[email protected]/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 4196ca3278f78c6e19246e54ab0ecb364e37d66a)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/0deba04ac45f8632b8579cb5cbf908b9f4428402
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Anshuman-Khandual/drivers-perf-arm_pmu-Add-new-sched_task-callback/20230123-210254
git checkout 0deba04ac45f8632b8579cb5cbf908b9f4428402
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 prepare

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from arch/arm64/kernel/asm-offsets.c:15:
In file included from include/linux/kvm_host.h:45:
In file included from arch/arm64/include/asm/kvm_host.h:36:
In file included from include/kvm/arm_pmu.h:11:
>> arch/arm64/include/asm/perf_event.h:280:20: warning: function 'has_branch_stack' has internal linkage but is not defined [-Wundefined-internal]
static inline bool has_branch_stack(struct perf_event *event);
^
arch/arm64/include/asm/perf_event.h:284:16: note: used here
WARN_ON_ONCE(!has_branch_stack(event));
^
1 warning generated.
--
In file included from arch/arm64/kernel/asm-offsets.c:15:
In file included from include/linux/kvm_host.h:45:
In file included from arch/arm64/include/asm/kvm_host.h:36:
In file included from include/kvm/arm_pmu.h:11:
>> arch/arm64/include/asm/perf_event.h:280:20: warning: function 'has_branch_stack' has internal linkage but is not defined [-Wundefined-internal]
static inline bool has_branch_stack(struct perf_event *event);
^
arch/arm64/include/asm/perf_event.h:284:16: note: used here
WARN_ON_ONCE(!has_branch_stack(event));
^
1 warning generated.


vim +/has_branch_stack +280 arch/arm64/include/asm/perf_event.h

279
> 280 static inline bool has_branch_stack(struct perf_event *event);
281

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-02-03 11:32:06

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH V8 5/6] arm64/perf: Add branch stack support in ARMV8 PMU

On Tue, Jan 31, 2023 at 12:28:17AM +0800, kernel test robot wrote:
> Hi Anshuman,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on arm64/for-next/core]
> [also build test WARNING on tip/perf/core acme/perf/core linus/master v6.2-rc6 next-20230130]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/drivers-perf-arm_pmu-Add-new-sched_task-callback/20230123-210254
> base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
> patch link: https://lore.kernel.org/r/20230123125956.1350336-6-anshuman.khandual%40arm.com
> patch subject: [PATCH V8 5/6] arm64/perf: Add branch stack support in ARMV8 PMU
> config: arm64-randconfig-r016-20230130 (https://download.01.org/0day-ci/archive/20230131/[email protected]/config)
> compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 4196ca3278f78c6e19246e54ab0ecb364e37d66a)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # install arm64 cross compiling tool for clang build
> # apt-get install binutils-aarch64-linux-gnu
> # https://github.com/intel-lab-lkp/linux/commit/0deba04ac45f8632b8579cb5cbf908b9f4428402
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review Anshuman-Khandual/drivers-perf-arm_pmu-Add-new-sched_task-callback/20230123-210254
> git checkout 0deba04ac45f8632b8579cb5cbf908b9f4428402
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 olddefconfig
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 prepare
>
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> In file included from arch/arm64/kernel/asm-offsets.c:15:
> In file included from include/linux/kvm_host.h:45:
> In file included from arch/arm64/include/asm/kvm_host.h:36:
> In file included from include/kvm/arm_pmu.h:11:
> >> arch/arm64/include/asm/perf_event.h:280:20: warning: function 'has_branch_stack' has internal linkage but is not defined [-Wundefined-internal]
> static inline bool has_branch_stack(struct perf_event *event);
> ^
> arch/arm64/include/asm/perf_event.h:284:16: note: used here
> WARN_ON_ONCE(!has_branch_stack(event));
> ^
> 1 warning generated.
> --
> In file included from arch/arm64/kernel/asm-offsets.c:15:
> In file included from include/linux/kvm_host.h:45:
> In file included from arch/arm64/include/asm/kvm_host.h:36:
> In file included from include/kvm/arm_pmu.h:11:
> >> arch/arm64/include/asm/perf_event.h:280:20: warning: function 'has_branch_stack' has internal linkage but is not defined [-Wundefined-internal]
> static inline bool has_branch_stack(struct perf_event *event);
> ^
> arch/arm64/include/asm/perf_event.h:284:16: note: used here
> WARN_ON_ONCE(!has_branch_stack(event));
> ^
> 1 warning generated.

This looks like it should be fixed. I'd also like to see Mark's ack on the
final series, since he had some detailed comments on the previous version.

Will

2023-02-06 04:16:41

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH V8 5/6] arm64/perf: Add branch stack support in ARMV8 PMU



On 2/3/23 17:01, Will Deacon wrote:
> On Tue, Jan 31, 2023 at 12:28:17AM +0800, kernel test robot wrote:
>> Hi Anshuman,
>>
>> Thank you for the patch! Perhaps something to improve:
>>
>> [auto build test WARNING on arm64/for-next/core]
>> [also build test WARNING on tip/perf/core acme/perf/core linus/master v6.2-rc6 next-20230130]
>> [If your patch is applied to the wrong git tree, kindly drop us a note.
>> And when submitting patch, we suggest to use '--base' as documented in
>> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>>
>> url: https://github.com/intel-lab-lkp/linux/commits/Anshuman-Khandual/drivers-perf-arm_pmu-Add-new-sched_task-callback/20230123-210254
>> base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
>> patch link: https://lore.kernel.org/r/20230123125956.1350336-6-anshuman.khandual%40arm.com
>> patch subject: [PATCH V8 5/6] arm64/perf: Add branch stack support in ARMV8 PMU
>> config: arm64-randconfig-r016-20230130 (https://download.01.org/0day-ci/archive/20230131/[email protected]/config)
>> compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project 4196ca3278f78c6e19246e54ab0ecb364e37d66a)
>> reproduce (this is a W=1 build):
>> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>> chmod +x ~/bin/make.cross
>> # install arm64 cross compiling tool for clang build
>> # apt-get install binutils-aarch64-linux-gnu
>> # https://github.com/intel-lab-lkp/linux/commit/0deba04ac45f8632b8579cb5cbf908b9f4428402
>> git remote add linux-review https://github.com/intel-lab-lkp/linux
>> git fetch --no-tags linux-review Anshuman-Khandual/drivers-perf-arm_pmu-Add-new-sched_task-callback/20230123-210254
>> git checkout 0deba04ac45f8632b8579cb5cbf908b9f4428402
>> # save the config file
>> mkdir build_dir && cp config build_dir/.config
>> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 olddefconfig
>> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 prepare
>>
>> If you fix the issue, kindly add following tag where applicable
>> | Reported-by: kernel test robot <[email protected]>
>>
>> All warnings (new ones prefixed by >>):
>>
>> In file included from arch/arm64/kernel/asm-offsets.c:15:
>> In file included from include/linux/kvm_host.h:45:
>> In file included from arch/arm64/include/asm/kvm_host.h:36:
>> In file included from include/kvm/arm_pmu.h:11:
>>>> arch/arm64/include/asm/perf_event.h:280:20: warning: function 'has_branch_stack' has internal linkage but is not defined [-Wundefined-internal]
>> static inline bool has_branch_stack(struct perf_event *event);
>> ^
>> arch/arm64/include/asm/perf_event.h:284:16: note: used here
>> WARN_ON_ONCE(!has_branch_stack(event));
>> ^
>> 1 warning generated.
>> --
>> In file included from arch/arm64/kernel/asm-offsets.c:15:
>> In file included from include/linux/kvm_host.h:45:
>> In file included from arch/arm64/include/asm/kvm_host.h:36:
>> In file included from include/kvm/arm_pmu.h:11:
>>>> arch/arm64/include/asm/perf_event.h:280:20: warning: function 'has_branch_stack' has internal linkage but is not defined [-Wundefined-internal]
>> static inline bool has_branch_stack(struct perf_event *event);
>> ^
>> arch/arm64/include/asm/perf_event.h:284:16: note: used here
>> WARN_ON_ONCE(!has_branch_stack(event));
>> ^
>> 1 warning generated.
>
> This looks like it should be fixed. I'd also like to see Mark's ack on the

This build warning is triggered when CONFIG_PERF_EVENTS is not enabled, when
all the fallback stubs in there try to use has_branch_stack() which does not
get defined. The following change, fixes the problem.

diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h
index 97db96326e13..1f68c125eb1a 100644
--- a/arch/arm64/include/asm/perf_event.h
+++ b/arch/arm64/include/asm/perf_event.h
@@ -277,8 +277,6 @@ struct pmu_hw_events;
struct arm_pmu;
struct perf_event;

-static inline bool has_branch_stack(struct perf_event *event);
-
#ifdef CONFIG_ARM64_BRBE
void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event);
bool armv8pmu_branch_valid(struct perf_event *event);
@@ -289,6 +287,9 @@ void armv8pmu_branch_reset(void);
int armv8pmu_private_alloc(struct arm_pmu *arm_pmu);
void armv8pmu_private_free(struct arm_pmu *arm_pmu);
#else
+#ifdef CONFIG_PERF_EVENTS
+static inline bool has_branch_stack(struct perf_event *event);
+
static inline void armv8pmu_branch_read(struct pmu_hw_events *cpuc, struct perf_event *event)
{
WARN_ON_ONCE(!has_branch_stack(event));
@@ -316,3 +317,4 @@ static inline int armv8pmu_private_alloc(struct arm_pmu *arm_pmu) { return 0; }
static inline void armv8pmu_private_free(struct arm_pmu *arm_pmu) { }
#endif
#endif
+#endif

> final series, since he had some detailed comments on the previous version.
>
> Will

2023-02-08 20:06:53

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH V8 5/6] arm64/perf: Add branch stack support in ARMV8 PMU

On Mon, Feb 06, 2023 at 09:46:25AM +0530, Anshuman Khandual wrote:
> On 2/3/23 17:01, Will Deacon wrote:
> > On Tue, Jan 31, 2023 at 12:28:17AM +0800, kernel test robot wrote:

[...]

> > This looks like it should be fixed. I'd also like to see Mark's ack on the
>
> This build warning is triggered when CONFIG_PERF_EVENTS is not enabled, when
> all the fallback stubs in there try to use has_branch_stack() which does not
> get defined. The following change, fixes the problem.

I expect that you'll post a v9 with that folded in, and I've just made some
comments on v7 which I expect will require some changes, so I'm going to wait
for v9 for further review to keep this manageable.

Thanks,
Mark.

2023-02-09 02:51:37

by Anshuman Khandual

[permalink] [raw]
Subject: Re: [PATCH V8 5/6] arm64/perf: Add branch stack support in ARMV8 PMU



On 2/9/23 01:36, Mark Rutland wrote:
> On Mon, Feb 06, 2023 at 09:46:25AM +0530, Anshuman Khandual wrote:
>> On 2/3/23 17:01, Will Deacon wrote:
>>> On Tue, Jan 31, 2023 at 12:28:17AM +0800, kernel test robot wrote:
>
> [...]
>
>>> This looks like it should be fixed. I'd also like to see Mark's ack on the
>>
>> This build warning is triggered when CONFIG_PERF_EVENTS is not enabled, when
>> all the fallback stubs in there try to use has_branch_stack() which does not
>> get defined. The following change, fixes the problem.
>
> I expect that you'll post a v9 with that folded in, and I've just made some
> comments on v7 which I expect will require some changes, so I'm going to wait
> for v9 for further review to keep this manageable.

Sure, will work on a V9 and post when ready.