Last Branch Record (LBR) is a feature available on modern processors for
recording branch information. It helps determine the flow of control by
logging branch sources and targets to registers in real time and aids the
detection of hot code paths.
Add support for using AMD Last Branch Record Extension Version 2 (LbrExtV2)
features on Zen 4 processors. A new CPU feature bit is introduced for
LbrExtV2 detection. New MSR definitions are added for configuring hardware
branch filtering and for enabling the LBR Freeze on PMI feature.
The LBR Freeze on PMI feature is essential for keeping the branch records
consistent with the point of PMU overflow, so that the two can be precisely
correlated.
Hardware branch filtering allows users to record only specific types of
branches and can be mapped to most of the existing filters supported by the
perf tool. Additional software filtering ensures that special branches for
which no direct hardware filter exists, such as syscall entry and exit, are
also recorded. This expands the scope of filters like "any_call".
Additionally, the perf UAPI is now extended to provide branch speculation
information, if available. LbrExtV2 provides this information through the
"valid" and "spec" bits in the Branch To registers. The tools-side changes
for this will be submitted as a separate series.
Users of the perf tool can now record branches as shown below. The 'div'
workload used here is from https://lwn.net/Articles/680985/.
E.g.
$ perf record -b -e cycles:u ./div
Before:
Error:
cycles:u: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
After:
[ perf record: Woken up 49 times to write data ]
[ perf record: Captured and wrote 12.197 MB perf.data (29601 samples) ]
$ perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 473K of event 'cycles:u'
# Event count (approx.): 473521
#
# Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles
# ........ ....... .................... ...................... ...................... ..................
#
29.69% div div [.] main [.] main -
23.84% div div [.] compute_flag [.] main -
23.41% div div [.] compute_flag [.] compute_flag -
23.04% div div [.] main [.] compute_flag -
[...]
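For reference, branch stack sampling can also be requested programmatically
through the perf UAPI. Below is a minimal user-space sketch (illustrative
only, not part of this series) that mirrors the 'perf record -b -e cycles:u'
invocation above; error handling and ring-buffer parsing are omitted.

  #include <linux/perf_event.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <string.h>
  #include <stdio.h>

  int main(void)
  {
          struct perf_event_attr attr;
          int fd;

          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.type = PERF_TYPE_HARDWARE;
          attr.config = PERF_COUNT_HW_CPU_CYCLES;
          attr.sample_period = 100000;
          attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
          attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY |
                                    PERF_SAMPLE_BRANCH_USER;
          attr.exclude_kernel = 1;

          /* Monitor the calling thread on any CPU */
          fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
          if (fd < 0) {
                  perror("perf_event_open");
                  return 1;
          }

          /* ... mmap the ring buffer and parse PERF_RECORD_SAMPLE ... */
          close(fd);
          return 0;
  }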
No additional failures are seen upon running the following:
* perf built-in test suite
* perf_event_tests suite
Sandipan Das (13):
perf/x86/amd/brs: Move feature-specific functions
perf/x86/amd/core: Refactor branch attributes
perf/x86/amd/core: Add generic branch record interfaces
x86/cpufeatures: Add LbrExtV2 feature bit
perf/x86/amd/lbr: Detect LbrExtV2 support
perf/x86/amd/lbr: Add LbrExtV2 branch record support
perf/x86/amd/lbr: Add LbrExtV2 hardware branch filter support
perf/x86: Move branch classifier
perf/x86/amd/lbr: Add LbrExtV2 software branch filter support
perf/x86: Make branch classifier fusion-aware
perf/x86/amd/lbr: Use fusion-aware branch classifier
perf/core: Add speculation info to branch entries
perf/x86/amd/lbr: Add LbrExtV2 branch speculation info support
arch/x86/events/Makefile | 2 +-
arch/x86/events/amd/Makefile | 2 +-
arch/x86/events/amd/brs.c | 69 ++++-
arch/x86/events/amd/core.c | 200 +++++++------
arch/x86/events/amd/lbr.c | 435 +++++++++++++++++++++++++++++
arch/x86/events/intel/lbr.c | 273 ------------------
arch/x86/events/perf_event.h | 81 +++++-
arch/x86/events/utils.c | 247 ++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/msr-index.h | 5 +
arch/x86/include/asm/perf_event.h | 3 +-
arch/x86/kernel/cpu/scattered.c | 1 +
include/linux/perf_event.h | 1 +
include/uapi/linux/perf_event.h | 15 +-
14 files changed, 952 insertions(+), 384 deletions(-)
create mode 100644 arch/x86/events/amd/lbr.c
create mode 100644 arch/x86/events/utils.c
--
2.34.1
If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected,
enable it alongside LBR Freeze on PMI when an event requests a branch
stack, i.e. PERF_SAMPLE_BRANCH_STACK.
Each branch record is represented by a pair of registers, LBR From and LBR
To. The freeze feature prevents any updates to these registers once a PMC
overflows. The contents remain unchanged until the freeze bit is cleared by
the PMI handler.
The branch records are read and copied to sample data before unfreezing.
However, only valid entries are copied. There is no additional register to
denote which of the register pairs represents the top of the stack (TOS)
since internal register renaming always ensures that the first pair (i.e.
index 0) represents the most recent branch, and so on.
The LBR registers are per-thread resources and are cleared explicitly
whenever a new task is scheduled in. There are no special implications on
the contents of these registers when transitioning to deep C-states.
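To summarize the overflow path described above, here is a simplified sketch
of the ordering implemented by this patch (condensed from the changes to
core.c and lbr.c below):

  /* Simplified sketch of the PMI-side ordering */
  status = amd_pmu_get_global_status();
  if (status & GLOBAL_STATUS_LBRS_FROZEN) {
          amd_pmu_lbr_read();            /* copy only valid From/To pairs */
          status &= ~GLOBAL_STATUS_LBRS_FROZEN;
  }
  /* ... handle counter overflows, attach cpuc->lbr_stack to samples ... */
  amd_pmu_ack_global_status(~status);    /* clears overflow and freeze bits */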
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/amd/core.c | 47 +++++--
arch/x86/events/amd/lbr.c | 203 +++++++++++++++++++++++++++++++
arch/x86/events/perf_event.h | 8 ++
arch/x86/include/asm/msr-index.h | 5 +
4 files changed, 252 insertions(+), 11 deletions(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index a3aa67b4fd50..d799628016c8 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -620,7 +620,7 @@ static inline u64 amd_pmu_get_global_status(void)
/* PerfCntrGlobalStatus is read-only */
rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);
- return status & amd_pmu_global_cntr_mask;
+ return status;
}
static inline void amd_pmu_ack_global_status(u64 status)
@@ -631,8 +631,6 @@ static inline void amd_pmu_ack_global_status(u64 status)
* clears the same bit in PerfCntrGlobalStatus
*/
- /* Only allow modifications to PerfCntrGlobalStatus.PerfCntrOvfl */
- status &= amd_pmu_global_cntr_mask;
wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);
}
@@ -742,11 +740,17 @@ static void amd_pmu_v2_enable_event(struct perf_event *event)
__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
}
-static void amd_pmu_v2_enable_all(int added)
+static __always_inline void amd_pmu_core_enable_all(void)
{
amd_pmu_set_global_ctl(amd_pmu_global_cntr_mask);
}
+static void amd_pmu_v2_enable_all(int added)
+{
+ amd_pmu_lbr_enable_all();
+ amd_pmu_core_enable_all();
+}
+
static void amd_pmu_disable_event(struct perf_event *event)
{
x86_pmu_disable_event(event);
@@ -771,10 +775,15 @@ static void amd_pmu_disable_all(void)
amd_pmu_check_overflow();
}
-static void amd_pmu_v2_disable_all(void)
+static __always_inline void amd_pmu_core_disable_all(void)
{
- /* Disable all PMCs */
amd_pmu_set_global_ctl(0);
+}
+
+static void amd_pmu_v2_disable_all(void)
+{
+ amd_pmu_core_disable_all();
+ amd_pmu_lbr_disable_all();
amd_pmu_check_overflow();
}
@@ -877,8 +886,8 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
pmu_enabled = cpuc->enabled;
cpuc->enabled = 0;
- /* Stop counting */
- amd_pmu_v2_disable_all();
+ /* Stop counting but do not disable LBR */
+ amd_pmu_core_disable_all();
status = amd_pmu_get_global_status();
@@ -886,6 +895,12 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
if (!status)
goto done;
+ /* Read branch records before unfreezing */
+ if (status & GLOBAL_STATUS_LBRS_FROZEN) {
+ amd_pmu_lbr_read();
+ status &= ~GLOBAL_STATUS_LBRS_FROZEN;
+ }
+
for (idx = 0; idx < x86_pmu.num_counters; idx++) {
if (!test_bit(idx, cpuc->active_mask))
continue;
@@ -905,6 +920,9 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
if (!x86_perf_event_set_period(event))
continue;
+ if (has_branch_stack(event))
+ data.br_stack = &cpuc->lbr_stack;
+
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
@@ -918,7 +936,7 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
*/
WARN_ON(status > 0);
- /* Clear overflow bits */
+ /* Clear overflow and freeze bits */
amd_pmu_ack_global_status(~status);
/*
@@ -932,7 +950,7 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
/* Resume counting only if PMU is active */
if (pmu_enabled)
- amd_pmu_v2_enable_all(0);
+ amd_pmu_core_enable_all();
return amd_pmu_adjust_nmi_window(handled);
}
@@ -1375,7 +1393,14 @@ static int __init amd_core_pmu_init(void)
}
/* LBR and BRS are mutually exclusive features */
- if (amd_pmu_lbr_init() && !amd_brs_init()) {
+ if (!amd_pmu_lbr_init()) {
+ /* LBR requires flushing on context switch */
+ x86_pmu.sched_task = amd_pmu_lbr_sched_task;
+ static_call_update(amd_pmu_branch_hw_config, amd_pmu_lbr_hw_config);
+ static_call_update(amd_pmu_branch_reset, amd_pmu_lbr_reset);
+ static_call_update(amd_pmu_branch_add, amd_pmu_lbr_add);
+ static_call_update(amd_pmu_branch_del, amd_pmu_lbr_del);
+ } else if (!amd_brs_init()) {
/*
* BRS requires special event constraints and flushing on ctxsw.
*/
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index d240ff722b78..39247e7be53c 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -4,6 +4,209 @@
#include "../perf_event.h"
+struct branch_entry {
+ union {
+ struct {
+ u64 ip:58;
+ u64 ip_sign_ext:5;
+ u64 mispredict:1;
+ } split;
+ u64 full;
+ } from;
+
+ union {
+ struct {
+ u64 ip:58;
+ u64 ip_sign_ext:3;
+ u64 reserved:1;
+ u64 spec:1;
+ u64 valid:1;
+ } split;
+ u64 full;
+ } to;
+};
+
+static __always_inline void amd_pmu_lbr_set_from(unsigned int idx, u64 val)
+{
+ wrmsrl(MSR_AMD_SAMP_BR_FROM + idx * 2, val);
+}
+
+static __always_inline void amd_pmu_lbr_set_to(unsigned int idx, u64 val)
+{
+ wrmsrl(MSR_AMD_SAMP_BR_FROM + idx * 2 + 1, val);
+}
+
+static __always_inline u64 amd_pmu_lbr_get_from(unsigned int idx)
+{
+ u64 val;
+
+ rdmsrl(MSR_AMD_SAMP_BR_FROM + idx * 2, val);
+
+ return val;
+}
+
+static __always_inline u64 amd_pmu_lbr_get_to(unsigned int idx)
+{
+ u64 val;
+
+ rdmsrl(MSR_AMD_SAMP_BR_FROM + idx * 2 + 1, val);
+
+ return val;
+}
+
+static __always_inline u64 sign_ext_branch_ip(u64 ip)
+{
+ u32 shift = 64 - boot_cpu_data.x86_virt_bits;
+
+ return (u64)(((s64)ip << shift) >> shift);
+}
+
+void amd_pmu_lbr_read(void)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ struct perf_branch_entry *br = cpuc->lbr_entries;
+ struct branch_entry entry;
+ int out = 0, i;
+
+ if (!cpuc->lbr_users)
+ return;
+
+ for (i = 0; i < x86_pmu.lbr_nr; i++) {
+ entry.from.full = amd_pmu_lbr_get_from(i);
+ entry.to.full = amd_pmu_lbr_get_to(i);
+
+ /* Check if a branch has been logged */
+ if (!entry.to.split.valid)
+ continue;
+
+ perf_clear_branch_entry_bitfields(br + out);
+
+ br[out].from = sign_ext_branch_ip(entry.from.split.ip);
+ br[out].to = sign_ext_branch_ip(entry.to.split.ip);
+ br[out].mispred = entry.from.split.mispredict;
+ br[out].predicted = !br[out].mispred;
+ out++;
+ }
+
+ cpuc->lbr_stack.nr = out;
+
+ /*
+ * Internal register renaming always ensures that LBR From[0] and
+ * LBR To[0] always represent the TOS
+ */
+ cpuc->lbr_stack.hw_idx = 0;
+}
+
+static int amd_pmu_lbr_setup_filter(struct perf_event *event)
+{
+ /* No LBR support */
+ if (!x86_pmu.lbr_nr)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+int amd_pmu_lbr_hw_config(struct perf_event *event)
+{
+ int ret = 0;
+
+ /* LBR is not recommended in counting mode */
+ if (!is_sampling_event(event))
+ return -EINVAL;
+
+ ret = amd_pmu_lbr_setup_filter(event);
+ if (!ret)
+ event->attach_state |= PERF_ATTACH_SCHED_CB;
+
+ return ret;
+}
+
+void amd_pmu_lbr_reset(void)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ int i;
+
+ if (!x86_pmu.lbr_nr)
+ return;
+
+ /* Reset all branch records individually */
+ for (i = 0; i < x86_pmu.lbr_nr; i++) {
+ amd_pmu_lbr_set_from(i, 0);
+ amd_pmu_lbr_set_to(i, 0);
+ }
+
+ cpuc->last_task_ctx = NULL;
+ cpuc->last_log_id = 0;
+}
+
+void amd_pmu_lbr_add(struct perf_event *event)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+ if (!x86_pmu.lbr_nr)
+ return;
+
+ perf_sched_cb_inc(event->ctx->pmu);
+
+ if (!cpuc->lbr_users++ && !event->total_time_running)
+ amd_pmu_lbr_reset();
+}
+
+void amd_pmu_lbr_del(struct perf_event *event)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+ if (!x86_pmu.lbr_nr)
+ return;
+
+ cpuc->lbr_users--;
+ WARN_ON_ONCE(cpuc->lbr_users < 0);
+ perf_sched_cb_dec(event->ctx->pmu);
+}
+
+void amd_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+ /*
+ * A context switch can flip the address space and LBR entries are
+ * not tagged with an identifier. Hence, branches cannot be resolved
+ * from the old address space and the LBR records should be wiped.
+ */
+ if (cpuc->lbr_users && sched_in)
+ amd_pmu_lbr_reset();
+}
+
+void amd_pmu_lbr_enable_all(void)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ u64 dbg_ctl, dbg_extn_cfg;
+
+ if (!cpuc->lbr_users || !x86_pmu.lbr_nr)
+ return;
+
+ rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl);
+ rdmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg);
+
+ wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
+ wrmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg | DBG_EXTN_CFG_LBRV2EN);
+}
+
+void amd_pmu_lbr_disable_all(void)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ u64 dbg_ctl, dbg_extn_cfg;
+
+ if (!cpuc->lbr_users || !x86_pmu.lbr_nr)
+ return;
+
+ rdmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg);
+ rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl);
+
+ wrmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg & ~DBG_EXTN_CFG_LBRV2EN);
+ wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl & ~DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
+}
+
__init int amd_pmu_lbr_init(void)
{
union cpuid_0x80000022_ebx ebx;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a6cd83c57e8b..c8397290f388 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1231,6 +1231,14 @@ static inline bool fixed_counter_disabled(int i, struct pmu *pmu)
int amd_pmu_init(void);
int amd_pmu_lbr_init(void);
+void amd_pmu_lbr_reset(void);
+void amd_pmu_lbr_read(void);
+void amd_pmu_lbr_add(struct perf_event *event);
+void amd_pmu_lbr_del(struct perf_event *event);
+void amd_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);
+void amd_pmu_lbr_enable_all(void);
+void amd_pmu_lbr_disable_all(void);
+int amd_pmu_lbr_hw_config(struct perf_event *event);
#ifdef CONFIG_PERF_EVENTS_AMD_BRS
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 403e83b4adc8..440d0490f71e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -539,6 +539,9 @@
#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL 0xc0000301
#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR 0xc0000302
+/* AMD Last Branch Record MSRs */
+#define MSR_AMD64_LBR_SELECT 0xc000010e
+
/* Fam 17h MSRs */
#define MSR_F17H_IRPERF 0xc00000e9
@@ -707,6 +710,8 @@
#define MSR_AMD_DBG_EXTN_CFG 0xc000010f
#define MSR_AMD_SAMP_BR_FROM 0xc0010300
+#define DBG_EXTN_CFG_LBRV2EN BIT_ULL(6)
+
#define MSR_IA32_MPERF 0x000000e7
#define MSR_IA32_APERF 0x000000e8
--
2.34.1
Commit 3e702ff6d1ea ("perf/x86: Add LBR software filter support for Intel
CPUs") introduces a software branch filter which complements the hardware
branch filter and adds an x86 branch classifier.
Move the branch classifier to arch/x86/events/ so that it can be utilized
by other vendors for branch record filtering.
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/Makefile | 2 +-
arch/x86/events/intel/lbr.c | 273 -----------------------------------
arch/x86/events/perf_event.h | 62 ++++++++
arch/x86/events/utils.c | 216 +++++++++++++++++++++++++++
4 files changed, 279 insertions(+), 274 deletions(-)
create mode 100644 arch/x86/events/utils.c
diff --git a/arch/x86/events/Makefile b/arch/x86/events/Makefile
index 9933c0e8e97a..86a76efa8bb6 100644
--- a/arch/x86/events/Makefile
+++ b/arch/x86/events/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-y += core.o probe.o
+obj-y += core.o probe.o utils.o
obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL) += rapl.o
obj-y += amd/
obj-$(CONFIG_X86_LOCAL_APIC) += msr.o
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 13179f31fe10..d0dc1c358ec0 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -4,7 +4,6 @@
#include <asm/perf_event.h>
#include <asm/msr.h>
-#include <asm/insn.h>
#include "../perf_event.h"
@@ -65,65 +64,6 @@
#define LBR_FROM_SIGNEXT_2MSB (BIT_ULL(60) | BIT_ULL(59))
-/*
- * x86control flow change classification
- * x86control flow changes include branches, interrupts, traps, faults
- */
-enum {
- X86_BR_NONE = 0, /* unknown */
-
- X86_BR_USER = 1 << 0, /* branch target is user */
- X86_BR_KERNEL = 1 << 1, /* branch target is kernel */
-
- X86_BR_CALL = 1 << 2, /* call */
- X86_BR_RET = 1 << 3, /* return */
- X86_BR_SYSCALL = 1 << 4, /* syscall */
- X86_BR_SYSRET = 1 << 5, /* syscall return */
- X86_BR_INT = 1 << 6, /* sw interrupt */
- X86_BR_IRET = 1 << 7, /* return from interrupt */
- X86_BR_JCC = 1 << 8, /* conditional */
- X86_BR_JMP = 1 << 9, /* jump */
- X86_BR_IRQ = 1 << 10,/* hw interrupt or trap or fault */
- X86_BR_IND_CALL = 1 << 11,/* indirect calls */
- X86_BR_ABORT = 1 << 12,/* transaction abort */
- X86_BR_IN_TX = 1 << 13,/* in transaction */
- X86_BR_NO_TX = 1 << 14,/* not in transaction */
- X86_BR_ZERO_CALL = 1 << 15,/* zero length call */
- X86_BR_CALL_STACK = 1 << 16,/* call stack */
- X86_BR_IND_JMP = 1 << 17,/* indirect jump */
-
- X86_BR_TYPE_SAVE = 1 << 18,/* indicate to save branch type */
-
-};
-
-#define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
-#define X86_BR_ANYTX (X86_BR_NO_TX | X86_BR_IN_TX)
-
-#define X86_BR_ANY \
- (X86_BR_CALL |\
- X86_BR_RET |\
- X86_BR_SYSCALL |\
- X86_BR_SYSRET |\
- X86_BR_INT |\
- X86_BR_IRET |\
- X86_BR_JCC |\
- X86_BR_JMP |\
- X86_BR_IRQ |\
- X86_BR_ABORT |\
- X86_BR_IND_CALL |\
- X86_BR_IND_JMP |\
- X86_BR_ZERO_CALL)
-
-#define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
-
-#define X86_BR_ANY_CALL \
- (X86_BR_CALL |\
- X86_BR_IND_CALL |\
- X86_BR_ZERO_CALL |\
- X86_BR_SYSCALL |\
- X86_BR_IRQ |\
- X86_BR_INT)
-
/*
* Intel LBR_CTL bits
*
@@ -1143,219 +1083,6 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
return ret;
}
-/*
- * return the type of control flow change at address "from"
- * instruction is not necessarily a branch (in case of interrupt).
- *
- * The branch type returned also includes the priv level of the
- * target of the control flow change (X86_BR_USER, X86_BR_KERNEL).
- *
- * If a branch type is unknown OR the instruction cannot be
- * decoded (e.g., text page not present), then X86_BR_NONE is
- * returned.
- */
-static int branch_type(unsigned long from, unsigned long to, int abort)
-{
- struct insn insn;
- void *addr;
- int bytes_read, bytes_left;
- int ret = X86_BR_NONE;
- int ext, to_plm, from_plm;
- u8 buf[MAX_INSN_SIZE];
- int is64 = 0;
-
- to_plm = kernel_ip(to) ? X86_BR_KERNEL : X86_BR_USER;
- from_plm = kernel_ip(from) ? X86_BR_KERNEL : X86_BR_USER;
-
- /*
- * maybe zero if lbr did not fill up after a reset by the time
- * we get a PMU interrupt
- */
- if (from == 0 || to == 0)
- return X86_BR_NONE;
-
- if (abort)
- return X86_BR_ABORT | to_plm;
-
- if (from_plm == X86_BR_USER) {
- /*
- * can happen if measuring at the user level only
- * and we interrupt in a kernel thread, e.g., idle.
- */
- if (!current->mm)
- return X86_BR_NONE;
-
- /* may fail if text not present */
- bytes_left = copy_from_user_nmi(buf, (void __user *)from,
- MAX_INSN_SIZE);
- bytes_read = MAX_INSN_SIZE - bytes_left;
- if (!bytes_read)
- return X86_BR_NONE;
-
- addr = buf;
- } else {
- /*
- * The LBR logs any address in the IP, even if the IP just
- * faulted. This means userspace can control the from address.
- * Ensure we don't blindly read any address by validating it is
- * a known text address.
- */
- if (kernel_text_address(from)) {
- addr = (void *)from;
- /*
- * Assume we can get the maximum possible size
- * when grabbing kernel data. This is not
- * _strictly_ true since we could possibly be
- * executing up next to a memory hole, but
- * it is very unlikely to be a problem.
- */
- bytes_read = MAX_INSN_SIZE;
- } else {
- return X86_BR_NONE;
- }
- }
-
- /*
- * decoder needs to know the ABI especially
- * on 64-bit systems running 32-bit apps
- */
-#ifdef CONFIG_X86_64
- is64 = kernel_ip((unsigned long)addr) || any_64bit_mode(current_pt_regs());
-#endif
- insn_init(&insn, addr, bytes_read, is64);
- if (insn_get_opcode(&insn))
- return X86_BR_ABORT;
-
- switch (insn.opcode.bytes[0]) {
- case 0xf:
- switch (insn.opcode.bytes[1]) {
- case 0x05: /* syscall */
- case 0x34: /* sysenter */
- ret = X86_BR_SYSCALL;
- break;
- case 0x07: /* sysret */
- case 0x35: /* sysexit */
- ret = X86_BR_SYSRET;
- break;
- case 0x80 ... 0x8f: /* conditional */
- ret = X86_BR_JCC;
- break;
- default:
- ret = X86_BR_NONE;
- }
- break;
- case 0x70 ... 0x7f: /* conditional */
- ret = X86_BR_JCC;
- break;
- case 0xc2: /* near ret */
- case 0xc3: /* near ret */
- case 0xca: /* far ret */
- case 0xcb: /* far ret */
- ret = X86_BR_RET;
- break;
- case 0xcf: /* iret */
- ret = X86_BR_IRET;
- break;
- case 0xcc ... 0xce: /* int */
- ret = X86_BR_INT;
- break;
- case 0xe8: /* call near rel */
- if (insn_get_immediate(&insn) || insn.immediate1.value == 0) {
- /* zero length call */
- ret = X86_BR_ZERO_CALL;
- break;
- }
- fallthrough;
- case 0x9a: /* call far absolute */
- ret = X86_BR_CALL;
- break;
- case 0xe0 ... 0xe3: /* loop jmp */
- ret = X86_BR_JCC;
- break;
- case 0xe9 ... 0xeb: /* jmp */
- ret = X86_BR_JMP;
- break;
- case 0xff: /* call near absolute, call far absolute ind */
- if (insn_get_modrm(&insn))
- return X86_BR_ABORT;
-
- ext = (insn.modrm.bytes[0] >> 3) & 0x7;
- switch (ext) {
- case 2: /* near ind call */
- case 3: /* far ind call */
- ret = X86_BR_IND_CALL;
- break;
- case 4:
- case 5:
- ret = X86_BR_IND_JMP;
- break;
- }
- break;
- default:
- ret = X86_BR_NONE;
- }
- /*
- * interrupts, traps, faults (and thus ring transition) may
- * occur on any instructions. Thus, to classify them correctly,
- * we need to first look at the from and to priv levels. If they
- * are different and to is in the kernel, then it indicates
- * a ring transition. If the from instruction is not a ring
- * transition instr (syscall, systenter, int), then it means
- * it was a irq, trap or fault.
- *
- * we have no way of detecting kernel to kernel faults.
- */
- if (from_plm == X86_BR_USER && to_plm == X86_BR_KERNEL
- && ret != X86_BR_SYSCALL && ret != X86_BR_INT)
- ret = X86_BR_IRQ;
-
- /*
- * branch priv level determined by target as
- * is done by HW when LBR_SELECT is implemented
- */
- if (ret != X86_BR_NONE)
- ret |= to_plm;
-
- return ret;
-}
-
-#define X86_BR_TYPE_MAP_MAX 16
-
-static int branch_map[X86_BR_TYPE_MAP_MAX] = {
- PERF_BR_CALL, /* X86_BR_CALL */
- PERF_BR_RET, /* X86_BR_RET */
- PERF_BR_SYSCALL, /* X86_BR_SYSCALL */
- PERF_BR_SYSRET, /* X86_BR_SYSRET */
- PERF_BR_UNKNOWN, /* X86_BR_INT */
- PERF_BR_ERET, /* X86_BR_IRET */
- PERF_BR_COND, /* X86_BR_JCC */
- PERF_BR_UNCOND, /* X86_BR_JMP */
- PERF_BR_IRQ, /* X86_BR_IRQ */
- PERF_BR_IND_CALL, /* X86_BR_IND_CALL */
- PERF_BR_UNKNOWN, /* X86_BR_ABORT */
- PERF_BR_UNKNOWN, /* X86_BR_IN_TX */
- PERF_BR_UNKNOWN, /* X86_BR_NO_TX */
- PERF_BR_CALL, /* X86_BR_ZERO_CALL */
- PERF_BR_UNKNOWN, /* X86_BR_CALL_STACK */
- PERF_BR_IND, /* X86_BR_IND_JMP */
-};
-
-static int
-common_branch_type(int type)
-{
- int i;
-
- type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
-
- if (type) {
- i = __ffs(type);
- if (i < X86_BR_TYPE_MAP_MAX)
- return branch_map[i];
- }
-
- return PERF_BR_UNKNOWN;
-}
-
enum {
ARCH_LBR_BR_TYPE_JCC = 0,
ARCH_LBR_BR_TYPE_NEAR_IND_JMP = 1,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index c8397290f388..8ac8f9e6affb 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1208,6 +1208,68 @@ static inline void set_linear_ip(struct pt_regs *regs, unsigned long ip)
regs->ip = ip;
}
+/*
+ * x86control flow change classification
+ * x86control flow changes include branches, interrupts, traps, faults
+ */
+enum {
+ X86_BR_NONE = 0, /* unknown */
+
+ X86_BR_USER = 1 << 0, /* branch target is user */
+ X86_BR_KERNEL = 1 << 1, /* branch target is kernel */
+
+ X86_BR_CALL = 1 << 2, /* call */
+ X86_BR_RET = 1 << 3, /* return */
+ X86_BR_SYSCALL = 1 << 4, /* syscall */
+ X86_BR_SYSRET = 1 << 5, /* syscall return */
+ X86_BR_INT = 1 << 6, /* sw interrupt */
+ X86_BR_IRET = 1 << 7, /* return from interrupt */
+ X86_BR_JCC = 1 << 8, /* conditional */
+ X86_BR_JMP = 1 << 9, /* jump */
+ X86_BR_IRQ = 1 << 10,/* hw interrupt or trap or fault */
+ X86_BR_IND_CALL = 1 << 11,/* indirect calls */
+ X86_BR_ABORT = 1 << 12,/* transaction abort */
+ X86_BR_IN_TX = 1 << 13,/* in transaction */
+ X86_BR_NO_TX = 1 << 14,/* not in transaction */
+ X86_BR_ZERO_CALL = 1 << 15,/* zero length call */
+ X86_BR_CALL_STACK = 1 << 16,/* call stack */
+ X86_BR_IND_JMP = 1 << 17,/* indirect jump */
+
+ X86_BR_TYPE_SAVE = 1 << 18,/* indicate to save branch type */
+
+};
+
+#define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
+#define X86_BR_ANYTX (X86_BR_NO_TX | X86_BR_IN_TX)
+
+#define X86_BR_ANY \
+ (X86_BR_CALL |\
+ X86_BR_RET |\
+ X86_BR_SYSCALL |\
+ X86_BR_SYSRET |\
+ X86_BR_INT |\
+ X86_BR_IRET |\
+ X86_BR_JCC |\
+ X86_BR_JMP |\
+ X86_BR_IRQ |\
+ X86_BR_ABORT |\
+ X86_BR_IND_CALL |\
+ X86_BR_IND_JMP |\
+ X86_BR_ZERO_CALL)
+
+#define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
+
+#define X86_BR_ANY_CALL \
+ (X86_BR_CALL |\
+ X86_BR_IND_CALL |\
+ X86_BR_ZERO_CALL |\
+ X86_BR_SYSCALL |\
+ X86_BR_IRQ |\
+ X86_BR_INT)
+
+int common_branch_type(int type);
+int branch_type(unsigned long from, unsigned long to, int abort);
+
ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event);
ssize_t intel_event_sysfs_show(char *page, u64 config);
diff --git a/arch/x86/events/utils.c b/arch/x86/events/utils.c
new file mode 100644
index 000000000000..a32368945462
--- /dev/null
+++ b/arch/x86/events/utils.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <asm/insn.h>
+
+#include "perf_event.h"
+
+/*
+ * return the type of control flow change at address "from"
+ * instruction is not necessarily a branch (in case of interrupt).
+ *
+ * The branch type returned also includes the priv level of the
+ * target of the control flow change (X86_BR_USER, X86_BR_KERNEL).
+ *
+ * If a branch type is unknown OR the instruction cannot be
+ * decoded (e.g., text page not present), then X86_BR_NONE is
+ * returned.
+ */
+int branch_type(unsigned long from, unsigned long to, int abort)
+{
+ struct insn insn;
+ void *addr;
+ int bytes_read, bytes_left;
+ int ret = X86_BR_NONE;
+ int ext, to_plm, from_plm;
+ u8 buf[MAX_INSN_SIZE];
+ int is64 = 0;
+
+ to_plm = kernel_ip(to) ? X86_BR_KERNEL : X86_BR_USER;
+ from_plm = kernel_ip(from) ? X86_BR_KERNEL : X86_BR_USER;
+
+ /*
+ * maybe zero if lbr did not fill up after a reset by the time
+ * we get a PMU interrupt
+ */
+ if (from == 0 || to == 0)
+ return X86_BR_NONE;
+
+ if (abort)
+ return X86_BR_ABORT | to_plm;
+
+ if (from_plm == X86_BR_USER) {
+ /*
+ * can happen if measuring at the user level only
+ * and we interrupt in a kernel thread, e.g., idle.
+ */
+ if (!current->mm)
+ return X86_BR_NONE;
+
+ /* may fail if text not present */
+ bytes_left = copy_from_user_nmi(buf, (void __user *)from,
+ MAX_INSN_SIZE);
+ bytes_read = MAX_INSN_SIZE - bytes_left;
+ if (!bytes_read)
+ return X86_BR_NONE;
+
+ addr = buf;
+ } else {
+ /*
+ * The LBR logs any address in the IP, even if the IP just
+ * faulted. This means userspace can control the from address.
+ * Ensure we don't blindly read any address by validating it is
+ * a known text address.
+ */
+ if (kernel_text_address(from)) {
+ addr = (void *)from;
+ /*
+ * Assume we can get the maximum possible size
+ * when grabbing kernel data. This is not
+ * _strictly_ true since we could possibly be
+ * executing up next to a memory hole, but
+ * it is very unlikely to be a problem.
+ */
+ bytes_read = MAX_INSN_SIZE;
+ } else {
+ return X86_BR_NONE;
+ }
+ }
+
+ /*
+ * decoder needs to know the ABI especially
+ * on 64-bit systems running 32-bit apps
+ */
+#ifdef CONFIG_X86_64
+ is64 = kernel_ip((unsigned long)addr) || any_64bit_mode(current_pt_regs());
+#endif
+ insn_init(&insn, addr, bytes_read, is64);
+ if (insn_get_opcode(&insn))
+ return X86_BR_ABORT;
+
+ switch (insn.opcode.bytes[0]) {
+ case 0xf:
+ switch (insn.opcode.bytes[1]) {
+ case 0x05: /* syscall */
+ case 0x34: /* sysenter */
+ ret = X86_BR_SYSCALL;
+ break;
+ case 0x07: /* sysret */
+ case 0x35: /* sysexit */
+ ret = X86_BR_SYSRET;
+ break;
+ case 0x80 ... 0x8f: /* conditional */
+ ret = X86_BR_JCC;
+ break;
+ default:
+ ret = X86_BR_NONE;
+ }
+ break;
+ case 0x70 ... 0x7f: /* conditional */
+ ret = X86_BR_JCC;
+ break;
+ case 0xc2: /* near ret */
+ case 0xc3: /* near ret */
+ case 0xca: /* far ret */
+ case 0xcb: /* far ret */
+ ret = X86_BR_RET;
+ break;
+ case 0xcf: /* iret */
+ ret = X86_BR_IRET;
+ break;
+ case 0xcc ... 0xce: /* int */
+ ret = X86_BR_INT;
+ break;
+ case 0xe8: /* call near rel */
+ if (insn_get_immediate(&insn) || insn.immediate1.value == 0) {
+ /* zero length call */
+ ret = X86_BR_ZERO_CALL;
+ break;
+ }
+ fallthrough;
+ case 0x9a: /* call far absolute */
+ ret = X86_BR_CALL;
+ break;
+ case 0xe0 ... 0xe3: /* loop jmp */
+ ret = X86_BR_JCC;
+ break;
+ case 0xe9 ... 0xeb: /* jmp */
+ ret = X86_BR_JMP;
+ break;
+ case 0xff: /* call near absolute, call far absolute ind */
+ if (insn_get_modrm(&insn))
+ return X86_BR_ABORT;
+
+ ext = (insn.modrm.bytes[0] >> 3) & 0x7;
+ switch (ext) {
+ case 2: /* near ind call */
+ case 3: /* far ind call */
+ ret = X86_BR_IND_CALL;
+ break;
+ case 4:
+ case 5:
+ ret = X86_BR_IND_JMP;
+ break;
+ }
+ break;
+ default:
+ ret = X86_BR_NONE;
+ }
+ /*
+ * interrupts, traps, faults (and thus ring transition) may
+ * occur on any instructions. Thus, to classify them correctly,
+ * we need to first look at the from and to priv levels. If they
+ * are different and to is in the kernel, then it indicates
+ * a ring transition. If the from instruction is not a ring
+ * transition instr (syscall, systenter, int), then it means
+ * it was a irq, trap or fault.
+ *
+ * we have no way of detecting kernel to kernel faults.
+ */
+ if (from_plm == X86_BR_USER && to_plm == X86_BR_KERNEL
+ && ret != X86_BR_SYSCALL && ret != X86_BR_INT)
+ ret = X86_BR_IRQ;
+
+ /*
+ * branch priv level determined by target as
+ * is done by HW when LBR_SELECT is implemented
+ */
+ if (ret != X86_BR_NONE)
+ ret |= to_plm;
+
+ return ret;
+}
+
+#define X86_BR_TYPE_MAP_MAX 16
+
+static int branch_map[X86_BR_TYPE_MAP_MAX] = {
+ PERF_BR_CALL, /* X86_BR_CALL */
+ PERF_BR_RET, /* X86_BR_RET */
+ PERF_BR_SYSCALL, /* X86_BR_SYSCALL */
+ PERF_BR_SYSRET, /* X86_BR_SYSRET */
+ PERF_BR_UNKNOWN, /* X86_BR_INT */
+ PERF_BR_ERET, /* X86_BR_IRET */
+ PERF_BR_COND, /* X86_BR_JCC */
+ PERF_BR_UNCOND, /* X86_BR_JMP */
+ PERF_BR_IRQ, /* X86_BR_IRQ */
+ PERF_BR_IND_CALL, /* X86_BR_IND_CALL */
+ PERF_BR_UNKNOWN, /* X86_BR_ABORT */
+ PERF_BR_UNKNOWN, /* X86_BR_IN_TX */
+ PERF_BR_UNKNOWN, /* X86_BR_NO_TX */
+ PERF_BR_CALL, /* X86_BR_ZERO_CALL */
+ PERF_BR_UNKNOWN, /* X86_BR_CALL_STACK */
+ PERF_BR_IND, /* X86_BR_IND_JMP */
+};
+
+int common_branch_type(int type)
+{
+ int i;
+
+ type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
+
+ if (type) {
+ i = __ffs(type);
+ if (i < X86_BR_TYPE_MAP_MAX)
+ return branch_map[i];
+ }
+
+ return PERF_BR_UNKNOWN;
+}
--
2.34.1
If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected,
convert the requested branch filter (PERF_SAMPLE_BRANCH_* flags) to the
corresponding hardware filter value and stash it in the event data when
a branch stack is requested. The hardware filter value is also saved in
per-CPU areas for use during event scheduling.
Hardware filtering is provided by the LBR Branch Select register. It has
bits which, when set, suppress recording of the following branch types (a
worked example of the resulting filter value follows the list):
* CPL = 0 (Kernel only)
* CPL > 0 (Userspace only)
* Conditional Branches
* Near Relative Calls
* Near Indirect Calls
* Near Returns
* Near Indirect Jumps (excluding Near Indirect Calls and Near Returns)
* Near Relative Jumps (excluding Near Relative Calls)
* Far Branches
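To illustrate (based on the filter mapping added in this patch): requesting
"-j any_call,u" selects LBR_USER for PERF_SAMPLE_BRANCH_USER and
LBR_REL_CALL | LBR_IND_CALL for PERF_SAMPLE_BRANCH_ANY_CALL, and since the
Branch Select bits operate in suppress mode, the accumulated mask is
inverted before use:

  /*
   * Worked example using the defines added in this patch:
   *   mask   = LBR_USER | LBR_REL_CALL | LBR_IND_CALL
   *          = BIT(1)   | BIT(3)       | BIT(4)        = 0x01a
   *   config = mask ^ LBR_SELECT_MASK
   *          = 0x01a    ^ 0x1ff                        = 0x1e5
   * i.e. every other branch type, as well as CPL = 0, is suppressed.
   */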
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/amd/core.c | 21 ++++++---
arch/x86/events/amd/lbr.c | 94 +++++++++++++++++++++++++++++++++++++-
2 files changed, 108 insertions(+), 7 deletions(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index d799628016c8..36bede1d7b1e 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -542,16 +542,24 @@ static int amd_pmu_cpu_prepare(int cpu)
{
struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
+ cpuc->lbr_sel = kzalloc_node(sizeof(struct er_account), GFP_KERNEL,
+ cpu_to_node(cpu));
+ if (!cpuc->lbr_sel)
+ return -ENOMEM;
+
WARN_ON_ONCE(cpuc->amd_nb);
if (!x86_pmu.amd_nb_constraints)
return 0;
cpuc->amd_nb = amd_alloc_nb(cpu);
- if (!cpuc->amd_nb)
- return -ENOMEM;
+ if (cpuc->amd_nb)
+ return 0;
- return 0;
+ kfree(cpuc->lbr_sel);
+ cpuc->lbr_sel = NULL;
+
+ return -ENOMEM;
}
static void amd_pmu_cpu_starting(int cpu)
@@ -589,13 +597,14 @@ static void amd_pmu_cpu_starting(int cpu)
static void amd_pmu_cpu_dead(int cpu)
{
- struct cpu_hw_events *cpuhw;
+ struct cpu_hw_events *cpuhw = &per_cpu(cpu_hw_events, cpu);
+
+ kfree(cpuhw->lbr_sel);
+ cpuhw->lbr_sel = NULL;
if (!x86_pmu.amd_nb_constraints)
return;
- cpuhw = &per_cpu(cpu_hw_events, cpu);
-
if (cpuhw->amd_nb) {
struct amd_nb *nb = cpuhw->amd_nb;
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index 39247e7be53c..064a6c8e1597 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -4,6 +4,39 @@
#include "../perf_event.h"
+/* LBR Branch Select valid bits */
+#define LBR_SELECT_MASK 0x1ff
+
+/*
+ * LBR Branch Select filter bits which when set, ensures that the
+ * corresponding type of branches are not recorded
+ */
+#define LBR_SELECT_KERNEL 0 /* Branches ending in CPL = 0 */
+#define LBR_SELECT_USER 1 /* Branches ending in CPL > 0 */
+#define LBR_SELECT_JCC 2 /* Conditional branches */
+#define LBR_SELECT_CALL_NEAR_REL 3 /* Near relative calls */
+#define LBR_SELECT_CALL_NEAR_IND 4 /* Indirect relative calls */
+#define LBR_SELECT_RET_NEAR 5 /* Near returns */
+#define LBR_SELECT_JMP_NEAR_IND 6 /* Near indirect jumps (excl. calls and returns) */
+#define LBR_SELECT_JMP_NEAR_REL 7 /* Near relative jumps (excl. calls) */
+#define LBR_SELECT_FAR_BRANCH 8 /* Far branches */
+
+#define LBR_KERNEL BIT(LBR_SELECT_KERNEL)
+#define LBR_USER BIT(LBR_SELECT_USER)
+#define LBR_JCC BIT(LBR_SELECT_JCC)
+#define LBR_REL_CALL BIT(LBR_SELECT_CALL_NEAR_REL)
+#define LBR_IND_CALL BIT(LBR_SELECT_CALL_NEAR_IND)
+#define LBR_RETURN BIT(LBR_SELECT_RET_NEAR)
+#define LBR_REL_JMP BIT(LBR_SELECT_JMP_NEAR_REL)
+#define LBR_IND_JMP BIT(LBR_SELECT_JMP_NEAR_IND)
+#define LBR_FAR BIT(LBR_SELECT_FAR_BRANCH)
+#define LBR_NOT_SUPP -1 /* unsupported filter */
+#define LBR_IGNORE 0
+
+#define LBR_ANY \
+ (LBR_JCC | LBR_REL_CALL | LBR_IND_CALL | LBR_RETURN | \
+ LBR_REL_JMP | LBR_IND_JMP | LBR_FAR)
+
struct branch_entry {
union {
struct {
@@ -97,12 +130,56 @@ void amd_pmu_lbr_read(void)
cpuc->lbr_stack.hw_idx = 0;
}
+static const int lbr_select_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
+ [PERF_SAMPLE_BRANCH_USER_SHIFT] = LBR_USER,
+ [PERF_SAMPLE_BRANCH_KERNEL_SHIFT] = LBR_KERNEL,
+ [PERF_SAMPLE_BRANCH_HV_SHIFT] = LBR_IGNORE,
+
+ [PERF_SAMPLE_BRANCH_ANY_SHIFT] = LBR_ANY,
+ [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_REL_CALL | LBR_IND_CALL,
+ [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = LBR_RETURN,
+ [PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL,
+ [PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT] = LBR_NOT_SUPP,
+ [PERF_SAMPLE_BRANCH_IN_TX_SHIFT] = LBR_NOT_SUPP,
+ [PERF_SAMPLE_BRANCH_NO_TX_SHIFT] = LBR_NOT_SUPP,
+ [PERF_SAMPLE_BRANCH_COND_SHIFT] = LBR_JCC,
+
+ [PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] = LBR_NOT_SUPP,
+ [PERF_SAMPLE_BRANCH_IND_JUMP_SHIFT] = LBR_IND_JMP,
+ [PERF_SAMPLE_BRANCH_CALL_SHIFT] = LBR_REL_CALL,
+
+ [PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT] = LBR_NOT_SUPP,
+ [PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT] = LBR_NOT_SUPP,
+
+ [PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT] = LBR_NOT_SUPP,
+};
+
static int amd_pmu_lbr_setup_filter(struct perf_event *event)
{
+ struct hw_perf_event_extra *reg = &event->hw.branch_reg;
+ u64 br_type = event->attr.branch_sample_type;
+ u64 mask = 0, v;
+ int i;
+
/* No LBR support */
if (!x86_pmu.lbr_nr)
return -EOPNOTSUPP;
+ for (i = 0; i < PERF_SAMPLE_BRANCH_MAX_SHIFT; i++) {
+ if (!(br_type & BIT_ULL(i)))
+ continue;
+
+ v = lbr_select_map[i];
+ if (v == LBR_NOT_SUPP)
+ return -EOPNOTSUPP;
+
+ if (v != LBR_IGNORE)
+ mask |= v;
+ }
+
+ /* Filter bits operate in suppress mode */
+ reg->config = mask ^ LBR_SELECT_MASK;
+
return 0;
}
@@ -137,6 +214,7 @@ void amd_pmu_lbr_reset(void)
cpuc->last_task_ctx = NULL;
cpuc->last_log_id = 0;
+ wrmsrl(MSR_AMD64_LBR_SELECT, 0);
}
void amd_pmu_lbr_add(struct perf_event *event)
@@ -146,6 +224,11 @@ void amd_pmu_lbr_add(struct perf_event *event)
if (!x86_pmu.lbr_nr)
return;
+ if (has_branch_stack(event)) {
+ cpuc->lbr_select = 1;
+ cpuc->lbr_sel->config = event->hw.branch_reg.config;
+ }
+
perf_sched_cb_inc(event->ctx->pmu);
if (!cpuc->lbr_users++ && !event->total_time_running)
@@ -159,6 +242,9 @@ void amd_pmu_lbr_del(struct perf_event *event)
if (!x86_pmu.lbr_nr)
return;
+ if (has_branch_stack(event))
+ cpuc->lbr_select = 0;
+
cpuc->lbr_users--;
WARN_ON_ONCE(cpuc->lbr_users < 0);
perf_sched_cb_dec(event->ctx->pmu);
@@ -180,11 +266,17 @@ void amd_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
void amd_pmu_lbr_enable_all(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
- u64 dbg_ctl, dbg_extn_cfg;
+ u64 lbr_select, dbg_ctl, dbg_extn_cfg;
if (!cpuc->lbr_users || !x86_pmu.lbr_nr)
return;
+ /* Set hardware branch filter */
+ if (cpuc->lbr_select) {
+ lbr_select = cpuc->lbr_sel->config & LBR_SELECT_MASK;
+ wrmsrl(MSR_AMD64_LBR_SELECT, lbr_select);
+ }
+
rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl);
rdmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg);
--
2.34.1
AMD processors that are capable of recording branches support either Branch
Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last
Branch Record Extension Version 2 (LbrExtV2) support, reuse the "branches"
capability to advertise information about both BRS and LBR but make the
"branch-brs" event exclusive to Family 19h processors that support BRS.
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/amd/core.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index e32a27899e11..2f524cf84528 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -1247,23 +1247,25 @@ static ssize_t branches_show(struct device *cdev,
static DEVICE_ATTR_RO(branches);
-static struct attribute *amd_pmu_brs_attrs[] = {
+static struct attribute *amd_pmu_branches_attrs[] = {
&dev_attr_branches.attr,
NULL,
};
static umode_t
-amd_brs_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+amd_branches_is_visible(struct kobject *kobj, struct attribute *attr, int i)
{
return x86_pmu.lbr_nr ? attr->mode : 0;
}
-static struct attribute_group group_caps_amd_brs = {
+static struct attribute_group group_caps_amd_branches = {
.name = "caps",
- .attrs = amd_pmu_brs_attrs,
- .is_visible = amd_brs_is_visible,
+ .attrs = amd_pmu_branches_attrs,
+ .is_visible = amd_branches_is_visible,
};
+#ifdef CONFIG_PERF_EVENTS_AMD_BRS
+
EVENT_ATTR_STR(branch-brs, amd_branch_brs,
"event=" __stringify(AMD_FAM19H_BRS_EVENT)"\n");
@@ -1272,15 +1274,26 @@ static struct attribute *amd_brs_events_attrs[] = {
NULL,
};
+static umode_t
+amd_brs_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+ return static_cpu_has(X86_FEATURE_BRS) && x86_pmu.lbr_nr ?
+ attr->mode : 0;
+}
+
static struct attribute_group group_events_amd_brs = {
.name = "events",
.attrs = amd_brs_events_attrs,
.is_visible = amd_brs_is_visible,
};
+#endif /* CONFIG_PERF_EVENTS_AMD_BRS */
+
static const struct attribute_group *amd_attr_update[] = {
- &group_caps_amd_brs,
+ &group_caps_amd_branches,
+#ifdef CONFIG_PERF_EVENTS_AMD_BRS
&group_events_amd_brs,
+#endif
NULL,
};
--
2.34.1
AMD Last Branch Record Extension Version 2 (LbrExtV2) is driven by Core PMC
overflows. It records recently taken branches up to the moment when the PMC
overflow occurs.
Detect the feature during PMU initialization and set the branch stack depth
using CPUID leaf 0x80000022 EBX.
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/amd/Makefile | 2 +-
arch/x86/events/amd/core.c | 9 +++++----
arch/x86/events/amd/lbr.c | 21 +++++++++++++++++++++
arch/x86/events/perf_event.h | 2 ++
arch/x86/include/asm/perf_event.h | 3 ++-
5 files changed, 31 insertions(+), 6 deletions(-)
create mode 100644 arch/x86/events/amd/lbr.c
diff --git a/arch/x86/events/amd/Makefile b/arch/x86/events/amd/Makefile
index b9f5d4610256..527d947eb76b 100644
--- a/arch/x86/events/amd/Makefile
+++ b/arch/x86/events/amd/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_CPU_SUP_AMD) += core.o
+obj-$(CONFIG_CPU_SUP_AMD) += core.o lbr.o
obj-$(CONFIG_PERF_EVENTS_AMD_BRS) += brs.o
obj-$(CONFIG_PERF_EVENTS_AMD_POWER) += power.o
obj-$(CONFIG_X86_LOCAL_APIC) += ibs.o
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index ef3520731a20..a3aa67b4fd50 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -1374,10 +1374,11 @@ static int __init amd_core_pmu_init(void)
x86_pmu.flags |= PMU_FL_PAIR;
}
- /*
- * BRS requires special event constraints and flushing on ctxsw.
- */
- if (boot_cpu_data.x86 >= 0x19 && !amd_brs_init()) {
+ /* LBR and BRS are mutually exclusive features */
+ if (amd_pmu_lbr_init() && !amd_brs_init()) {
+ /*
+ * BRS requires special event constraints and flushing on ctxsw.
+ */
x86_pmu.get_event_constraints = amd_get_event_constraints_f19h;
x86_pmu.sched_task = amd_pmu_brs_sched_task;
x86_pmu.limit_period = amd_pmu_limit_period;
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
new file mode 100644
index 000000000000..d240ff722b78
--- /dev/null
+++ b/arch/x86/events/amd/lbr.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/perf_event.h>
+#include <asm/perf_event.h>
+
+#include "../perf_event.h"
+
+__init int amd_pmu_lbr_init(void)
+{
+ union cpuid_0x80000022_ebx ebx;
+
+ if (x86_pmu.version < 2 || !boot_cpu_has(X86_FEATURE_LBREXT_V2))
+ return -EOPNOTSUPP;
+
+ /* Set number of entries */
+ ebx.full = cpuid_ebx(EXT_PERFMON_DEBUG_FEATURES);
+ x86_pmu.lbr_nr = ebx.split.lbr_v2_stack_sz;
+
+ pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);
+
+ return 0;
+}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 6d23e88d232c..a6cd83c57e8b 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1230,6 +1230,8 @@ static inline bool fixed_counter_disabled(int i, struct pmu *pmu)
int amd_pmu_init(void);
+int amd_pmu_lbr_init(void);
+
#ifdef CONFIG_PERF_EVENTS_AMD_BRS
#define AMD_FAM19H_BRS_EVENT 0xc4 /* RETIRED_TAKEN_BRANCH_INSTRUCTIONS */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 34348ae41cdb..07844424aaea 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -207,7 +207,8 @@ union cpuid_0x80000022_ebx {
struct {
/* Number of Core Performance Counters */
unsigned int num_core_pmc:4;
- unsigned int reserved:6;
+ /* Number of available LBR Stack Entries */
+ unsigned int lbr_v2_stack_sz:6;
/* Number of Data Fabric Counters */
unsigned int num_df_pmc:6;
} split;
--
2.34.1
Add a new "spec" bitfield to branch entries for providing speculation
information. This will be populated using hints provided by branch sampling
features on supported hardware. The following cases are covered:
* No branch speculation information is available
* Branch is speculative but taken on the wrong path
* Branch is non-speculative but taken on the correct path
* Branch is speculative and taken on the correct path
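As an illustration of how a consumer of the extended UAPI might interpret
the new field, here is a hypothetical user-space helper (not part of this
patch) that maps the two-bit value back to a readable string using the
enumerators introduced below:

  #include <linux/perf_event.h>

  /* Hypothetical helper: decode perf_branch_entry::spec */
  static const char *branch_spec_str(const struct perf_branch_entry *br)
  {
          switch (br->spec) {
          case PERF_BR_SPEC_WRONG_PATH:        return "spec, wrong path";
          case PERF_BR_NON_SPEC_CORRECT_PATH:  return "non-spec, correct path";
          case PERF_BR_SPEC_CORRECT_PATH:      return "spec, correct path";
          case PERF_BR_SPEC_NA:
          default:                             return "n/a";
          }
  }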
Suggested-by: Stephane Eranian <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
---
include/linux/perf_event.h | 1 +
include/uapi/linux/perf_event.h | 15 ++++++++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ee8b9ecdc03b..ae30c61957d2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1078,6 +1078,7 @@ static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *b
br->abort = 0;
br->cycles = 0;
br->type = 0;
+ br->spec = PERF_BR_SPEC_NA;
br->reserved = 0;
}
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0474ee362151..9656733a95a5 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -256,6 +256,17 @@ enum {
PERF_BR_MAX,
};
+/*
+ * Common branch speculation outcome classification
+ */
+enum {
+ PERF_BR_SPEC_NA = 0, /* Not available */
+ PERF_BR_SPEC_WRONG_PATH = 1, /* Speculative but on wrong path */
+ PERF_BR_NON_SPEC_CORRECT_PATH = 2, /* Non-speculative but on correct path */
+ PERF_BR_SPEC_CORRECT_PATH = 3, /* Speculative and on correct path */
+ PERF_BR_SPEC_MAX,
+};
+
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
@@ -1363,6 +1374,7 @@ union perf_mem_data_src {
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
* type: branch type
+ * spec: branch speculation info (or 0 if not supported)
*/
struct perf_branch_entry {
__u64 from;
@@ -1373,7 +1385,8 @@ struct perf_branch_entry {
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
type:4, /* branch type */
- reserved:40;
+ spec:2, /* branch speculation info */
+ reserved:38;
};
union perf_sample_weight {
--
2.34.1
Move some of the Branch Sampling (BRS) specific functions out of the Core
events sources and into the BRS sources in preparation for adding other
mechanisms to record branches.
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/amd/brs.c | 69 ++++++++++++++++++++++++++++++++++-
arch/x86/events/amd/core.c | 70 ++----------------------------------
arch/x86/events/perf_event.h | 7 ++--
3 files changed, 76 insertions(+), 70 deletions(-)
diff --git a/arch/x86/events/amd/brs.c b/arch/x86/events/amd/brs.c
index bee8765a1e9b..f1bff153d945 100644
--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -81,7 +81,7 @@ static bool __init amd_brs_detect(void)
* a br_sel_map. Software filtering is not supported because it would not correlate well
* with a sampling period.
*/
-int amd_brs_setup_filter(struct perf_event *event)
+static int amd_brs_setup_filter(struct perf_event *event)
{
u64 type = event->attr.branch_sample_type;
@@ -96,6 +96,73 @@ int amd_brs_setup_filter(struct perf_event *event)
return 0;
}
+static inline int amd_is_brs_event(struct perf_event *e)
+{
+ return (e->hw.config & AMD64_RAW_EVENT_MASK) == AMD_FAM19H_BRS_EVENT;
+}
+
+int amd_brs_hw_config(struct perf_event *event)
+{
+ int ret = 0;
+
+ /*
+ * Due to interrupt holding, BRS is not recommended in
+ * counting mode.
+ */
+ if (!is_sampling_event(event))
+ return -EINVAL;
+
+ /*
+ * Due to the way BRS operates by holding the interrupt until
+ * lbr_nr entries have been captured, it does not make sense
+ * to allow sampling on BRS with an event that does not match
+ * what BRS is capturing, i.e., retired taken branches.
+ * Otherwise the correlation with the event's period is even
+ * more loose:
+ *
+ * With retired taken branch:
+ * Effective P = P + 16 + X
+ * With any other event:
+ * Effective P = P + Y + X
+ *
+ * Where X is the number of taken branches due to interrupt
+ * skid. Skid is large.
+ *
+ * Where Y is the occurences of the event while BRS is
+ * capturing the lbr_nr entries.
+ *
+ * By using retired taken branches, we limit the impact on the
+ * Y variable. We know it cannot be more than the depth of
+ * BRS.
+ */
+ if (!amd_is_brs_event(event))
+ return -EINVAL;
+
+ /*
+ * BRS implementation does not work with frequency mode
+ * reprogramming of the period.
+ */
+ if (event->attr.freq)
+ return -EINVAL;
+ /*
+ * The kernel subtracts BRS depth from period, so it must
+ * be big enough.
+ */
+ if (event->attr.sample_period <= x86_pmu.lbr_nr)
+ return -EINVAL;
+
+ /*
+ * Check if we can allow PERF_SAMPLE_BRANCH_STACK
+ */
+ ret = amd_brs_setup_filter(event);
+
+ /* only set in case of success */
+ if (!ret)
+ event->hw.flags |= PERF_X86_EVENT_AMD_BRS;
+
+ return ret;
+}
+
/* tos = top of stack, i.e., last valid entry written */
static inline int amd_brs_get_tos(union amd_debug_extn_cfg *cfg)
{
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 9ac3718410ce..e32a27899e11 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -330,16 +330,8 @@ static inline bool amd_is_pair_event_code(struct hw_perf_event *hwc)
}
}
-#define AMD_FAM19H_BRS_EVENT 0xc4 /* RETIRED_TAKEN_BRANCH_INSTRUCTIONS */
-static inline int amd_is_brs_event(struct perf_event *e)
-{
- return (e->hw.config & AMD64_RAW_EVENT_MASK) == AMD_FAM19H_BRS_EVENT;
-}
-
static int amd_core_hw_config(struct perf_event *event)
{
- int ret = 0;
-
if (event->attr.exclude_host && event->attr.exclude_guest)
/*
* When HO == GO == 1 the hardware treats that as GO == HO == 0
@@ -356,66 +348,10 @@ static int amd_core_hw_config(struct perf_event *event)
if ((x86_pmu.flags & PMU_FL_PAIR) && amd_is_pair_event_code(&event->hw))
event->hw.flags |= PERF_X86_EVENT_PAIR;
- /*
- * if branch stack is requested
- */
- if (has_branch_stack(event)) {
- /*
- * Due to interrupt holding, BRS is not recommended in
- * counting mode.
- */
- if (!is_sampling_event(event))
- return -EINVAL;
-
- /*
- * Due to the way BRS operates by holding the interrupt until
- * lbr_nr entries have been captured, it does not make sense
- * to allow sampling on BRS with an event that does not match
- * what BRS is capturing, i.e., retired taken branches.
- * Otherwise the correlation with the event's period is even
- * more loose:
- *
- * With retired taken branch:
- * Effective P = P + 16 + X
- * With any other event:
- * Effective P = P + Y + X
- *
- * Where X is the number of taken branches due to interrupt
- * skid. Skid is large.
- *
- * Where Y is the occurences of the event while BRS is
- * capturing the lbr_nr entries.
- *
- * By using retired taken branches, we limit the impact on the
- * Y variable. We know it cannot be more than the depth of
- * BRS.
- */
- if (!amd_is_brs_event(event))
- return -EINVAL;
+ if (has_branch_stack(event))
+ return amd_brs_hw_config(event);
- /*
- * BRS implementation does not work with frequency mode
- * reprogramming of the period.
- */
- if (event->attr.freq)
- return -EINVAL;
- /*
- * The kernel subtracts BRS depth from period, so it must
- * be big enough.
- */
- if (event->attr.sample_period <= x86_pmu.lbr_nr)
- return -EINVAL;
-
- /*
- * Check if we can allow PERF_SAMPLE_BRANCH_STACK
- */
- ret = amd_brs_setup_filter(event);
-
- /* only set in case of success */
- if (!ret)
- event->hw.flags |= PERF_X86_EVENT_AMD_BRS;
- }
- return ret;
+ return 0;
}
static inline int amd_is_nb_event(struct hw_perf_event *hwc)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ca2f8bfe6ff1..6d23e88d232c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1231,6 +1231,9 @@ static inline bool fixed_counter_disabled(int i, struct pmu *pmu)
int amd_pmu_init(void);
#ifdef CONFIG_PERF_EVENTS_AMD_BRS
+
+#define AMD_FAM19H_BRS_EVENT 0xc4 /* RETIRED_TAKEN_BRANCH_INSTRUCTIONS */
+
int amd_brs_init(void);
void amd_brs_disable(void);
void amd_brs_enable(void);
@@ -1239,7 +1242,7 @@ void amd_brs_disable_all(void);
void amd_brs_drain(void);
void amd_brs_lopwr_init(void);
void amd_brs_disable_all(void);
-int amd_brs_setup_filter(struct perf_event *event);
+int amd_brs_hw_config(struct perf_event *event);
void amd_brs_reset(void);
static inline void amd_pmu_brs_add(struct perf_event *event)
@@ -1275,7 +1278,7 @@ static inline void amd_brs_enable(void) {}
static inline void amd_brs_drain(void) {}
static inline void amd_brs_lopwr_init(void) {}
static inline void amd_brs_disable_all(void) {}
-static inline int amd_brs_setup_filter(struct perf_event *event)
+static inline int amd_brs_hw_config(struct perf_event *event)
{
return 0;
}
--
2.34.1
AMD processors that are capable of recording branches support either Branch
Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last
Branch Record Extension Version 2 (LbrExtV2) support, introduce new static
calls which act as gateways to call into the feature-dependent functions
based on what is available on the processor.
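For context, here is a minimal standalone sketch of the gateway pattern
being introduced (a generic illustration of the kernel's static call API
with made-up names, not the exact definitions added by this patch):

  #include <linux/perf_event.h>
  #include <linux/static_call.h>

  /* Default implementation; retargeted at init if a branch feature exists */
  static int branch_hw_config_noop(struct perf_event *event)
  {
          return 0;
  }

  DEFINE_STATIC_CALL(branch_hw_config, branch_hw_config_noop);

  /* Stand-in for a feature-specific implementation (e.g. BRS or LBR) */
  static int my_branch_hw_config(struct perf_event *event)
  {
          return event->attr.freq ? -EINVAL : 0;
  }

  /* Call sites always dispatch through the static call trampoline */
  static int hw_config(struct perf_event *event)
  {
          return static_call(branch_hw_config)(event);
  }

  static void __init pick_branch_feature(void)
  {
          /* Retarget the gateway once the available feature is known */
          static_call_update(branch_hw_config, my_branch_hw_config);
  }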
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/amd/core.c | 34 ++++++++++++++++++++++------------
1 file changed, 22 insertions(+), 12 deletions(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 2f524cf84528..ef3520731a20 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -330,6 +330,8 @@ static inline bool amd_is_pair_event_code(struct hw_perf_event *hwc)
}
}
+DEFINE_STATIC_CALL_RET0(amd_pmu_branch_hw_config, *x86_pmu.hw_config);
+
static int amd_core_hw_config(struct perf_event *event)
{
if (event->attr.exclude_host && event->attr.exclude_guest)
@@ -349,7 +351,7 @@ static int amd_core_hw_config(struct perf_event *event)
event->hw.flags |= PERF_X86_EVENT_PAIR;
if (has_branch_stack(event))
- return amd_brs_hw_config(event);
+ return static_call(amd_pmu_branch_hw_config)(event);
return 0;
}
@@ -518,8 +520,14 @@ static struct amd_nb *amd_alloc_nb(int cpu)
return nb;
}
+typedef void (amd_pmu_branch_reset_t)(void);
+DEFINE_STATIC_CALL_NULL(amd_pmu_branch_reset, amd_pmu_branch_reset_t);
+
static void amd_pmu_cpu_reset(int cpu)
{
+ if (x86_pmu.lbr_nr)
+ static_call(amd_pmu_branch_reset)();
+
if (x86_pmu.version < 2)
return;
@@ -576,7 +584,6 @@ static void amd_pmu_cpu_starting(int cpu)
cpuc->amd_nb->nb_id = nb_id;
cpuc->amd_nb->refcnt++;
- amd_brs_reset();
amd_pmu_cpu_reset(cpu);
}
@@ -771,16 +778,20 @@ static void amd_pmu_v2_disable_all(void)
amd_pmu_check_overflow();
}
+DEFINE_STATIC_CALL_NULL(amd_pmu_branch_add, *x86_pmu.add);
+
static void amd_pmu_add_event(struct perf_event *event)
{
if (needs_branch_stack(event))
- amd_pmu_brs_add(event);
+ static_call(amd_pmu_branch_add)(event);
}
+DEFINE_STATIC_CALL_NULL(amd_pmu_branch_del, *x86_pmu.del);
+
static void amd_pmu_del_event(struct perf_event *event)
{
if (needs_branch_stack(event))
- amd_pmu_brs_del(event);
+ static_call(amd_pmu_branch_del)(event);
}
/*
@@ -1184,13 +1195,6 @@ static ssize_t amd_event_sysfs_show(char *page, u64 config)
return x86_event_sysfs_show(page, config, event);
}
-static void amd_pmu_sched_task(struct perf_event_context *ctx,
- bool sched_in)
-{
- if (sched_in && x86_pmu.lbr_nr)
- amd_pmu_brs_sched_task(ctx, sched_in);
-}
-
static u64 amd_pmu_limit_period(struct perf_event *event, u64 left)
{
/*
@@ -1375,8 +1379,14 @@ static int __init amd_core_pmu_init(void)
*/
if (boot_cpu_data.x86 >= 0x19 && !amd_brs_init()) {
x86_pmu.get_event_constraints = amd_get_event_constraints_f19h;
- x86_pmu.sched_task = amd_pmu_sched_task;
+ x86_pmu.sched_task = amd_pmu_brs_sched_task;
x86_pmu.limit_period = amd_pmu_limit_period;
+
+ static_call_update(amd_pmu_branch_hw_config, amd_brs_hw_config);
+ static_call_update(amd_pmu_branch_reset, amd_brs_reset);
+ static_call_update(amd_pmu_branch_add, amd_pmu_brs_add);
+ static_call_update(amd_pmu_branch_del, amd_pmu_brs_del);
+
/*
* put_event_constraints callback same as Fam17h, set above
*/
--
2.34.1
CPUID leaf 0x80000022, i.e. ExtPerfMonAndDbg, advertises some new
performance monitoring features for AMD processors.
Bit 1 of EAX indicates support for Last Branch Record Extension Version 2
(LbrExtV2) features. If found to be set during PMU initialization, the EBX
bits of the same leaf can be used to determine the number of available LBR
entries.
For better utilization of feature words, LbrExtV2 is added as a scattered
feature bit.
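For reference, the same leaf can be inspected from user space. A minimal
sketch (illustrative only; the EBX bit positions follow the
cpuid_0x80000022_ebx layout used later in this series):

  #include <stdio.h>
  #include <cpuid.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (!__get_cpuid_count(0x80000022, 0, &eax, &ebx, &ecx, &edx))
                  return 1;

          /* EAX bit 1: LbrExtV2; EBX bits 9:4: LBR stack depth */
          printf("LbrExtV2: %s\n", (eax & (1u << 1)) ? "yes" : "no");
          printf("LBR stack entries: %u\n", (ebx >> 4) & 0x3f);
          return 0;
  }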
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/kernel/cpu/scattered.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 393f2bbb5e3a..e3fa476a24b0 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -96,7 +96,7 @@
#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in IA32 userspace */
#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in IA32 userspace */
#define X86_FEATURE_REP_GOOD ( 3*32+16) /* REP microcode works well */
-/* FREE! ( 3*32+17) */
+#define X86_FEATURE_LBREXT_V2 ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" LFENCE synchronizes RDTSC */
#define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */
#define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index dbaa8326d6f2..6be46dffddbf 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
{ X86_FEATURE_MBA, CPUID_EBX, 6, 0x80000008, 0 },
{ X86_FEATURE_PERFMON_V2, CPUID_EAX, 0, 0x80000022, 0 },
+ { X86_FEATURE_LBREXT_V2, CPUID_EAX, 1, 0x80000022, 0 },
{ 0, 0, 0, 0, 0 }
};
--
2.34.1
Provide branch speculation information captured via AMD Last Branch Record
Extension Version 2 (LbrExtV2) by setting the speculation info in branch
records. The info is based on the "valid" and "spec" bits in the Branch To
registers.
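For context, a consumer-side sketch (not part of this patch) of how a tool
could interpret the new field is shown below. The "spec" member of struct
perf_branch_entry and the PERF_BR_SPEC_* values come from the perf UAPI patch
in this series that adds speculation info to branch entries; the helper
itself is hypothetical.

        /*
         * Hypothetical consumer-side helper (needs <linux/perf_event.h>):
         * stringify the speculation info of a sampled branch entry.
         */
        static const char *spec_str(const struct perf_branch_entry *e)
        {
                switch (e->spec) {
                case PERF_BR_SPEC_WRONG_PATH:
                        return "speculative, wrong path";
                case PERF_BR_NON_SPEC_CORRECT_PATH:
                        return "non-speculative, correct path";
                case PERF_BR_SPEC_CORRECT_PATH:
                        return "speculative, correct path";
                default:
                        return "n/a";
                }
        }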
Suggested-by: Stephane Eranian <[email protected]>
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/amd/lbr.c | 35 ++++++++++++++++++++++++++++++++---
1 file changed, 32 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index 5fa985cd44cb..3fd316b6adda 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -146,12 +146,19 @@ static void amd_pmu_lbr_filter(void)
}
}
+static const int lbr_spec_map[PERF_BR_SPEC_MAX] = {
+ PERF_BR_SPEC_NA,
+ PERF_BR_SPEC_WRONG_PATH,
+ PERF_BR_NON_SPEC_CORRECT_PATH,
+ PERF_BR_SPEC_CORRECT_PATH,
+};
+
void amd_pmu_lbr_read(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
struct perf_branch_entry *br = cpuc->lbr_entries;
struct branch_entry entry;
- int out = 0, i;
+ int out = 0, idx, i;
if (!cpuc->lbr_users)
return;
@@ -160,8 +167,11 @@ void amd_pmu_lbr_read(void)
entry.from.full = amd_pmu_lbr_get_from(i);
entry.to.full = amd_pmu_lbr_get_to(i);
- /* Check if a branch has been logged */
- if (!entry.to.split.valid)
+ /*
+ * Check if a branch has been logged; if valid = 0, spec = 0
+ * then no branch was recorded
+ */
+ if (!entry.to.split.valid && !entry.to.split.spec)
continue;
perf_clear_branch_entry_bitfields(br + out);
@@ -170,6 +180,25 @@ void amd_pmu_lbr_read(void)
br[out].to = sign_ext_branch_ip(entry.to.split.ip);
br[out].mispred = entry.from.split.mispredict;
br[out].predicted = !br[out].mispred;
+
+ /*
+ * Set branch speculation information using the status of
+ * the valid and spec bits.
+ *
+ * When valid = 0, spec = 0, no branch was recorded and the
+ * entry is discarded as seen above.
+ *
+ * When valid = 0, spec = 1, the recorded branch was
+ * speculative but took the wrong path.
+ *
+ * When valid = 1, spec = 0, the recorded branch was
+ * non-speculative but took the correct path.
+ *
+ * When valid = 1, spec = 1, the recorded branch was
+ * speculative and took the correct path
+ */
+ idx = (entry.to.split.valid << 1) | entry.to.split.spec;
+ br[out].spec = lbr_spec_map[idx];
out++;
}
--
2.34.1
With AMD Last Branch Record Extension Version 2 (LbrExtV2), it is necessary
to process the branch records further as hardware filtering is not granular
enough for identifying certain types of branches. For example, to record
system calls, one should record far branches. The far branch filter captures
both far calls and far returns, but the irrelevant records are filtered out
based on the branch type as seen by the branch classifier.
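To make the two-stage filtering concrete, here is a minimal sketch (not part
of this patch) for the "any_call" case: the hardware select must include far
branches to catch syscalls, which also lets far returns through, and those
are then dropped based on the classified branch type. The helper name below
is made up for illustration.

        /*
         * Illustrative sketch only: decide whether a recorded branch should
         * be kept for PERF_SAMPLE_BRANCH_ANY_CALL after hardware filtering.
         */
        static bool keep_any_call_record(unsigned long from, unsigned long to)
        {
                int br_sel = X86_BR_ANY_CALL;           /* software mask */
                int type = branch_type(from, to, 0);

                /* e.g. a far return classified as X86_BR_SYSRET is dropped */
                return type != X86_BR_NONE && (br_sel & type) == type;
        }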
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/amd/lbr.c | 92 ++++++++++++++++++++++++++++++++++++---
1 file changed, 87 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index 064a6c8e1597..3699ba53a326 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -94,6 +94,50 @@ static __always_inline u64 sign_ext_branch_ip(u64 ip)
return (u64)(((s64)ip << shift) >> shift);
}
+static void amd_pmu_lbr_filter(void)
+{
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ int br_sel = cpuc->br_sel, type, i, j;
+ bool compress = false;
+ u64 from, to;
+
+ /* If sampling all branches, there is nothing to filter */
+ if (((br_sel & X86_BR_ALL) == X86_BR_ALL) &&
+ ((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE))
+ return;
+
+ for (i = 0; i < cpuc->lbr_stack.nr; i++) {
+ from = cpuc->lbr_entries[i].from;
+ to = cpuc->lbr_entries[i].to;
+ type = branch_type(from, to, 0);
+
+ /* If type does not correspond, then discard */
+ if (type == X86_BR_NONE || (br_sel & type) != type) {
+ cpuc->lbr_entries[i].from = 0; /* mark invalid */
+ compress = true;
+ }
+
+ if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE)
+ cpuc->lbr_entries[i].type = common_branch_type(type);
+ }
+
+ if (!compress)
+ return;
+
+ /* Remove all invalid entries */
+ for (i = 0; i < cpuc->lbr_stack.nr; ) {
+ if (!cpuc->lbr_entries[i].from) {
+ j = i;
+ while (++j < cpuc->lbr_stack.nr)
+ cpuc->lbr_entries[j - 1] = cpuc->lbr_entries[j];
+ cpuc->lbr_stack.nr--;
+ if (!cpuc->lbr_entries[i].from)
+ continue;
+ }
+ i++;
+ }
+}
+
void amd_pmu_lbr_read(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -128,6 +172,9 @@ void amd_pmu_lbr_read(void)
* LBR To[0] always represent the TOS
*/
cpuc->lbr_stack.hw_idx = 0;
+
+ /* Perform further software filtering */
+ amd_pmu_lbr_filter();
}
static const int lbr_select_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
@@ -136,8 +183,8 @@ static const int lbr_select_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
[PERF_SAMPLE_BRANCH_HV_SHIFT] = LBR_IGNORE,
[PERF_SAMPLE_BRANCH_ANY_SHIFT] = LBR_ANY,
- [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_REL_CALL | LBR_IND_CALL,
- [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = LBR_RETURN,
+ [PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT] = LBR_REL_CALL | LBR_IND_CALL | LBR_FAR,
+ [PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT] = LBR_RETURN | LBR_FAR,
[PERF_SAMPLE_BRANCH_IND_CALL_SHIFT] = LBR_IND_CALL,
[PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT] = LBR_NOT_SUPP,
[PERF_SAMPLE_BRANCH_IN_TX_SHIFT] = LBR_NOT_SUPP,
@@ -150,8 +197,6 @@ static const int lbr_select_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
[PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT] = LBR_NOT_SUPP,
[PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT] = LBR_NOT_SUPP,
-
- [PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT] = LBR_NOT_SUPP,
};
static int amd_pmu_lbr_setup_filter(struct perf_event *event)
@@ -165,6 +210,41 @@ static int amd_pmu_lbr_setup_filter(struct perf_event *event)
if (!x86_pmu.lbr_nr)
return -EOPNOTSUPP;
+ if (br_type & PERF_SAMPLE_BRANCH_USER)
+ mask |= X86_BR_USER;
+
+ if (br_type & PERF_SAMPLE_BRANCH_KERNEL)
+ mask |= X86_BR_KERNEL;
+
+ /* Ignore BRANCH_HV here */
+
+ if (br_type & PERF_SAMPLE_BRANCH_ANY)
+ mask |= X86_BR_ANY;
+
+ if (br_type & PERF_SAMPLE_BRANCH_ANY_CALL)
+ mask |= X86_BR_ANY_CALL;
+
+ if (br_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
+ mask |= X86_BR_RET | X86_BR_IRET | X86_BR_SYSRET;
+
+ if (br_type & PERF_SAMPLE_BRANCH_IND_CALL)
+ mask |= X86_BR_IND_CALL;
+
+ if (br_type & PERF_SAMPLE_BRANCH_COND)
+ mask |= X86_BR_JCC;
+
+ if (br_type & PERF_SAMPLE_BRANCH_IND_JUMP)
+ mask |= X86_BR_IND_JMP;
+
+ if (br_type & PERF_SAMPLE_BRANCH_CALL)
+ mask |= X86_BR_CALL | X86_BR_ZERO_CALL;
+
+ if (br_type & PERF_SAMPLE_BRANCH_TYPE_SAVE)
+ mask |= X86_BR_TYPE_SAVE;
+
+ reg->reg = mask;
+ mask = 0;
+
for (i = 0; i < PERF_SAMPLE_BRANCH_MAX_SHIFT; i++) {
if (!(br_type & BIT_ULL(i)))
continue;
@@ -220,13 +300,15 @@ void amd_pmu_lbr_reset(void)
void amd_pmu_lbr_add(struct perf_event *event)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ struct hw_perf_event_extra *reg = &event->hw.branch_reg;
if (!x86_pmu.lbr_nr)
return;
if (has_branch_stack(event)) {
cpuc->lbr_select = 1;
- cpuc->lbr_sel->config = event->hw.branch_reg.config;
+ cpuc->lbr_sel->config = reg->config;
+ cpuc->br_sel = reg->reg;
}
perf_sched_cb_inc(event->ctx->pmu);
--
2.34.1
AMD Last Branch Record Extension Version 2 (LbrExtV2) can report a branch
from address that points to an instruction preceding the actual branch by
several bytes due to branch fusion and further optimizations in Zen4
processors.
In such cases, software should walk forward sequentially in the instruction
stream from the reported address and use the address of the first branch
encountered instead. Hence, use the fusion-aware branch classifier to
determine the correct branch type and to obtain the offset needed for
adjusting the branch from address.
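As a concrete illustration (an assumed example, not taken from this patch):
for a fused cmp+jcc pair, the hardware may report the address of the cmp as
the branch from address. The classifier walks forward, finds the jcc, returns
its type and the distance to it, and the caller uses that offset to fix up
the record. The helper name below is hypothetical.

        /*
         * Illustrative sketch only: fix up a record whose from address was
         * reported at the fused non-branch instruction.
         */
        static void fixup_fused_from(struct perf_branch_entry *br)
        {
                int offset;
                int type = branch_type_fused(br->from, br->to, 0, &offset);

                if (type != X86_BR_NONE && offset)
                        br->from += offset;     /* now points at the branch */
        }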
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/amd/lbr.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index 3699ba53a326..5fa985cd44cb 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -97,7 +97,7 @@ static __always_inline u64 sign_ext_branch_ip(u64 ip)
static void amd_pmu_lbr_filter(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
- int br_sel = cpuc->br_sel, type, i, j;
+ int br_sel = cpuc->br_sel, offset, type, i, j;
bool compress = false;
u64 from, to;
@@ -109,7 +109,15 @@ static void amd_pmu_lbr_filter(void)
for (i = 0; i < cpuc->lbr_stack.nr; i++) {
from = cpuc->lbr_entries[i].from;
to = cpuc->lbr_entries[i].to;
- type = branch_type(from, to, 0);
+ type = branch_type_fused(from, to, 0, &offset);
+
+ /*
+ * Adjust the branch from address in case of instruction
+ * fusion where it points to an instruction preceding the
+ * actual branch
+ */
+ if (offset)
+ cpuc->lbr_entries[i].from += offset;
/* If type does not correspond, then discard */
if (type == X86_BR_NONE || (br_sel & type) != type) {
--
2.34.1
With branch fusion and other optimizations, branch sampling hardware in
some processors can report a branch from address that points to an
instruction preceding the actual branch by several bytes.
In such cases, the classifier cannot determine the branch type, which leads
to failures such as those seen with the recently added test from commit
b55878c90ab9 ("perf test: Add test for branch stack sampling"). Branch
information is also easier to consume and annotate if branch from addresses
always point to branch instructions.
Add a new variant of the branch classifier that can account for instruction
fusion. If fusion is expected and the current branch from address does not
point to a branch instruction, it attempts to find the first branch within
the next (MAX_INSN_SIZE - 1) bytes and, if found, additionally provides the
offset between the reported branch from address and the address of the
expected branch instruction.
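For a concrete picture (instruction bytes assumed for illustration, not taken
from this patch), consider a fused compare-and-branch pair where the reported
from address points at the compare:

        /* Assumed byte sequence starting at the reported from address */
        u8 buf[] = { 0x48, 0x39, 0xc8, /* cmp %rcx,%rax (3 bytes)       */
                     0x75, 0x10 };     /* jne +0x10     (actual branch) */

        /*
         * branch_type_fused() cannot classify the cmp, decodes it to learn
         * its length, retries at buf + 3, classifies the jne as X86_BR_JCC
         * and returns with *offset == 3 so the caller can adjust the
         * reported from address.
         */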
Signed-off-by: Sandipan Das <[email protected]>
---
arch/x86/events/perf_event.h | 2 +
arch/x86/events/utils.c | 169 +++++++++++++++++++++--------------
2 files changed, 102 insertions(+), 69 deletions(-)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 8ac8f9e6affb..eb6227630244 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1269,6 +1269,8 @@ enum {
int common_branch_type(int type);
int branch_type(unsigned long from, unsigned long to, int abort);
+int branch_type_fused(unsigned long from, unsigned long to, int abort,
+ int *offset);
ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event);
ssize_t intel_event_sysfs_show(char *page, u64 config);
diff --git a/arch/x86/events/utils.c b/arch/x86/events/utils.c
index a32368945462..e013243f360c 100644
--- a/arch/x86/events/utils.c
+++ b/arch/x86/events/utils.c
@@ -3,6 +3,68 @@
#include "perf_event.h"
+static int decode_branch_type(struct insn *insn)
+{
+ int ext;
+
+ if (insn_get_opcode(insn))
+ return X86_BR_ABORT;
+
+ switch (insn->opcode.bytes[0]) {
+ case 0xf:
+ switch (insn->opcode.bytes[1]) {
+ case 0x05: /* syscall */
+ case 0x34: /* sysenter */
+ return X86_BR_SYSCALL;
+ case 0x07: /* sysret */
+ case 0x35: /* sysexit */
+ return X86_BR_SYSRET;
+ case 0x80 ... 0x8f: /* conditional */
+ return X86_BR_JCC;
+ }
+ return X86_BR_NONE;
+ case 0x70 ... 0x7f: /* conditional */
+ return X86_BR_JCC;
+ case 0xc2: /* near ret */
+ case 0xc3: /* near ret */
+ case 0xca: /* far ret */
+ case 0xcb: /* far ret */
+ return X86_BR_RET;
+ case 0xcf: /* iret */
+ return X86_BR_IRET;
+ case 0xcc ... 0xce: /* int */
+ return X86_BR_INT;
+ case 0xe8: /* call near rel */
+ if (insn_get_immediate(insn) || insn->immediate1.value == 0) {
+ /* zero length call */
+ return X86_BR_ZERO_CALL;
+ }
+ fallthrough;
+ case 0x9a: /* call far absolute */
+ return X86_BR_CALL;
+ case 0xe0 ... 0xe3: /* loop jmp */
+ return X86_BR_JCC;
+ case 0xe9 ... 0xeb: /* jmp */
+ return X86_BR_JMP;
+ case 0xff: /* call near absolute, call far absolute ind */
+ if (insn_get_modrm(insn))
+ return X86_BR_ABORT;
+
+ ext = (insn->modrm.bytes[0] >> 3) & 0x7;
+ switch (ext) {
+ case 2: /* near ind call */
+ case 3: /* far ind call */
+ return X86_BR_IND_CALL;
+ case 4:
+ case 5:
+ return X86_BR_IND_JMP;
+ }
+ return X86_BR_NONE;
+ }
+
+ return X86_BR_NONE;
+}
+
/*
* return the type of control flow change at address "from"
* instruction is not necessarily a branch (in case of interrupt).
@@ -13,14 +75,22 @@
* If a branch type is unknown OR the instruction cannot be
* decoded (e.g., text page not present), then X86_BR_NONE is
* returned.
+ *
+ * While recording branches, some processors can report the "from"
+ * address to be that of an instruction preceding the actual branch
+ * when instruction fusion occurs. If fusion is expected, attempt to
+ * find the type of the first branch instruction within the next
+ * MAX_INSN_SIZE bytes and if found, provide the offset between the
+ * reported "from" address and the actual branch instruction address.
*/
-int branch_type(unsigned long from, unsigned long to, int abort)
+static int get_branch_type(unsigned long from, unsigned long to, int abort,
+ bool fused, int *offset)
{
struct insn insn;
void *addr;
- int bytes_read, bytes_left;
+ int bytes_read, bytes_left, insn_offset;
int ret = X86_BR_NONE;
- int ext, to_plm, from_plm;
+ int to_plm, from_plm;
u8 buf[MAX_INSN_SIZE];
int is64 = 0;
@@ -83,77 +153,27 @@ int branch_type(unsigned long from, unsigned long to, int abort)
is64 = kernel_ip((unsigned long)addr) || any_64bit_mode(current_pt_regs());
#endif
insn_init(&insn, addr, bytes_read, is64);
- if (insn_get_opcode(&insn))
- return X86_BR_ABORT;
+ ret = decode_branch_type(&insn);
+ insn_offset = 0;
- switch (insn.opcode.bytes[0]) {
- case 0xf:
- switch (insn.opcode.bytes[1]) {
- case 0x05: /* syscall */
- case 0x34: /* sysenter */
- ret = X86_BR_SYSCALL;
- break;
- case 0x07: /* sysret */
- case 0x35: /* sysexit */
- ret = X86_BR_SYSRET;
- break;
- case 0x80 ... 0x8f: /* conditional */
- ret = X86_BR_JCC;
- break;
- default:
- ret = X86_BR_NONE;
- }
- break;
- case 0x70 ... 0x7f: /* conditional */
- ret = X86_BR_JCC;
- break;
- case 0xc2: /* near ret */
- case 0xc3: /* near ret */
- case 0xca: /* far ret */
- case 0xcb: /* far ret */
- ret = X86_BR_RET;
- break;
- case 0xcf: /* iret */
- ret = X86_BR_IRET;
- break;
- case 0xcc ... 0xce: /* int */
- ret = X86_BR_INT;
- break;
- case 0xe8: /* call near rel */
- if (insn_get_immediate(&insn) || insn.immediate1.value == 0) {
- /* zero length call */
- ret = X86_BR_ZERO_CALL;
+ /* Check for the possibility of branch fusion */
+ while (fused && ret == X86_BR_NONE) {
+ /* Check for decoding errors */
+ if (insn_get_length(&insn) || !insn.length)
break;
- }
- fallthrough;
- case 0x9a: /* call far absolute */
- ret = X86_BR_CALL;
- break;
- case 0xe0 ... 0xe3: /* loop jmp */
- ret = X86_BR_JCC;
- break;
- case 0xe9 ... 0xeb: /* jmp */
- ret = X86_BR_JMP;
- break;
- case 0xff: /* call near absolute, call far absolute ind */
- if (insn_get_modrm(&insn))
- return X86_BR_ABORT;
- ext = (insn.modrm.bytes[0] >> 3) & 0x7;
- switch (ext) {
- case 2: /* near ind call */
- case 3: /* far ind call */
- ret = X86_BR_IND_CALL;
+ insn_offset += insn.length;
+ bytes_read -= insn.length;
+ if (bytes_read < 0)
break;
- case 4:
- case 5:
- ret = X86_BR_IND_JMP;
- break;
- }
- break;
- default:
- ret = X86_BR_NONE;
+
+ insn_init(&insn, addr + insn_offset, bytes_read, is64);
+ ret = decode_branch_type(&insn);
}
+
+ if (offset)
+ *offset = insn_offset;
+
/*
* interrupts, traps, faults (and thus ring transition) may
* occur on any instructions. Thus, to classify them correctly,
@@ -179,6 +199,17 @@ int branch_type(unsigned long from, unsigned long to, int abort)
return ret;
}
+int branch_type(unsigned long from, unsigned long to, int abort)
+{
+ return get_branch_type(from, to, abort, false, NULL);
+}
+
+int branch_type_fused(unsigned long from, unsigned long to, int abort,
+ int *offset)
+{
+ return get_branch_type(from, to, abort, true, offset);
+}
+
#define X86_BR_TYPE_MAP_MAX 16
static int branch_map[X86_BR_TYPE_MAP_MAX] = {
--
2.34.1
On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
> CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance
> monitoring features for AMD processors.
>
> Bit 1 of EAX indicates support for Last Branch Record Extension Version 2
> (LbrExtV2) features. If found to be set during PMU initialization, the EBX
> bits of the same leaf can be used to determine the number of available LBR
> entries.
>
> For better utilization of feature words, LbrExtV2 is added as a scattered
s/LbrExtV2 is added/add LbrExtV2/
> feature bit.
>
> [...]
Other than that:
Acked-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
> CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance
> monitoring features for AMD processors.
>
> Bit 1 of EAX indicates support for Last Branch Record Extension Version 2
> (LbrExtV2) features. If found to be set during PMU initialization, the EBX
> bits of the same leaf can be used to determine the number of available LBR
> entries.
>
> For better utilization of feature words, LbrExtV2 is added as a scattered
> feature bit.
>
> [...]
> +#define X86_FEATURE_LBREXT_V2 ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
> [...]
Would LBR_V2 work at all? It being a new version already seems to imply
extension, no? Then again, I suppose there's an argument to be had for
avoiding confusion vs the Intel LBR thing.. Couldn't you have called
this BRS_V2 :-)
Oh well...
Hi,
On Mon, Aug 15, 2022 at 4:27 AM Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
> > [...]
>
> Would LBR_V2 work at all? It being a new version already seems to imply
> extension, no? Then again, I suppose there's an argument to be had for
> avoiding confusion vs the Intel LBR thing.. Couldn't you have called
> this BRS_V2 :-)
>
I believe it is called v2 because there was already an LBR in previous
generations; however, it was 1-deep and it was not connected to the PMU like
this one. The public PPR mentions it (MSR 0x1DB/0x1DC, Last Branch From IP,
Last Branch To IP). See for instance the PPR for Fam17h model 71h:
https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
BRS is a model-specific feature for Zen3.
LBRv2 is a great improvement, including over Zen3 BRS.
On Mon, Aug 15, 2022 at 12:42:23PM -0700, Stephane Eranian wrote:
> Hi,
>
> On Mon, Aug 15, 2022 at 4:27 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
> > > [...]
> >
> > Would LBR_V2 work at all? It being a new version already seems to imply
> > extension, no? Then again, I suppose there's an argument to be had for
> > avoiding confusion vs the Intel LBR thing.. Couldn't you have called
> > this BRS_V2 :-)
> >
> I believe it is called v2 because there was already a LBR in previous
> generations, however it
That's not the question; It's currently called LBREXT_V2, which is a bit
of a shit name. Then again LBR_V2 is too because AMD and Intel LBR are
quite different. So in that respect BRS_V2 would be an ever so much
better name.
Hi Peter,
On 8/22/2022 2:35 PM, Peter Zijlstra wrote:
> On Mon, Aug 15, 2022 at 12:42:23PM -0700, Stephane Eranian wrote:
>> Hi,
>>
>> On Mon, Aug 15, 2022 at 4:27 AM Peter Zijlstra <[email protected]> wrote:
>>>
>>> On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
>>>> [...]
>>>
>>> Would LBR_V2 work at all? It being a new version already seems to imply
>>> extension, no? Then again, I suppose there's an argument to be had for
>>> avoiding confusion vs the Intel LBR thing.. Couldn't you have called
>>> this BRS_V2 :-)
>>>
>> I believe it is called v2 because there was already a LBR in previous
>> generations, however it
>
> That's not the question; It's currently called LBREXT_V2, which is a bit
> of a shit name. Then again LBR_V2 is too because AMD and Intel LBR are
> quite different. So in that respect BRS_V2 would be an ever so much
> better name.
AMD LbrExtV2 is similar to Intel LBR. Unlike BRS, LbrExtV2 does not rely on
interrupt holding. The branch records are "frozen" at the time of counter
overflow.
- Sandipan
On Mon, Aug 22, 2022 at 06:22:25PM +0530, Sandipan Das wrote:
> AMD LbrExtV2 is similar to Intel LBR. Unlike BRS, LbrExtV2 does not rely on
LbrExtV2 must be the most terrible name ever, please stop using it. Heck
your own code calls it lbr_v2 wherever it can.
So can we please just kill that name entirely?
$ quilt diff --combine - | grep -i lbrext_v2
+ if (x86_pmu.version < 2 || !boot_cpu_has(X86_FEATURE_LBREXT_V2))
+#define X86_FEATURE_LBREXT_V2 ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
+ { X86_FEATURE_LBREXT_V2, CPUID_EAX, 1, 0x80000022, 0 },
Is the complete usage of this silly name.
> interrupt holding. The branch records are "frozen" at the time of counter
> overflow.
Yes, I get all that. It is also significantly different from Intel LBR
in all details and shares not a single line of code, so also calling it
LBR is confusing at best.
The MSRs are called AMD_SAMP_BR, so why not call the thing BRS_V2?
Hi Peter,
On 8/22/2022 6:56 PM, Peter Zijlstra wrote:
> On Mon, Aug 22, 2022 at 06:22:25PM +0530, Sandipan Das wrote:
>> AMD LbrExtV2 is similar to Intel LBR. Unlike BRS, LbrExtV2 does not rely on
>
> LbrExtV2 must be the most terrible name ever, please stop using it. Heck
> your own code calls it lbr_v2 wherever it can.
>
> So can we please just kill that name entirely?
>
> $ quilt diff --combine - | grep -i lbrext_v2
> + if (x86_pmu.version < 2 || !boot_cpu_has(X86_FEATURE_LBREXT_V2))
> +#define X86_FEATURE_LBREXT_V2 ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
> + { X86_FEATURE_LBREXT_V2, CPUID_EAX, 1, 0x80000022, 0 },
>
> Is the complete usage of this silly name.
>
I don't have any reservations about the name :) It's the name of the feature
bit from CPUID 0x80000022 EAX as mentioned in the processor manuals.
Unlike BRS, LBRv2 (if I may call it so) is an architected feature starting
with Zen4. BRS is model-specific and available only on Zen3. LBRv2 supersedes
it, and from an architectural perspective, future platforms are slated to have
LBRv2 and not BRS. As Stephane alluded to earlier in this thread, AMD's legacy
LBR is 1-entry deep. We want to be able to differentiate this feature from
that as well as from BRS. For these reasons, and also to keep it disjoint from
Intel's LBR, if you so prefer, I can rename this feature to AMD_LBR_V2.
>> interrupt holding. The branch records are "frozen" at the time of counter
>> overflow.
>
> Yes, I get all that. It is also significantly different from Intel LBR
> in all details and shares not a single line of code, so also calling it
> LBR is confusing at best.
>
> The MSRs are called AMD_SAMP_BR, so why not call the thing BRS_V2?
Sorry for the confusion with the register names. Since the SAMP_BR_FROM
and SAMP_BR_TO registers used by BRS have the same addresses as those of
the LBR_TO_V2 and LBR_FROM_V2 registers, I chose to reuse the definitions.
If it helps, and considering LBRv2 is an architected feature, I can rename
AMD_SAMP_BR_* to AMD_LBR_V2_* or have these MSR definitions duplicated with
different names.
- Sandipan
On Tue, Aug 23, 2022 at 02:21:06PM +0530, Sandipan Das wrote:
> I can rename this feature to AMD_LBR_V2.
I've done that and stuck them in queue/perf/core. If the robots don't
hate on it I'll push them into tip.
On 8/25/2022 3:54 PM, Peter Zijlstra wrote:
> On Tue, Aug 23, 2022 at 02:21:06PM +0530, Sandipan Das wrote:
>> I can rename this feature to AMD_LBR_V2.
>
> I've done that and stuck them in queue/perf/core. If the robots don't
> hate on it I'll push them into tip.
Thanks!