2023-04-10 20:44:54

by Liang, Kan

Subject: [PATCH 1/6] perf/x86/intel: Add Grand Ridge and Sierra Forest

From: Kan Liang <[email protected]>

The Grand Ridge and Sierra Forest are successors to Snow Ridge. They are
both based on the Crestmont core. From the core PMU's perspective, they
are similar to the e-core of MTL. The only difference is the LBR event
logging feature, which will be implemented in the following patches.

Create a non-hybrid PMU setup for Grand Ridge and Sierra Forest.

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/core.c | 52 +++++++++++++++++++++++++++++++++++-
arch/x86/events/intel/ds.c | 9 +++++--
arch/x86/events/perf_event.h | 2 ++
3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ec667ef72e85..0bc325b7e028 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2119,6 +2119,17 @@ static struct extra_reg intel_grt_extra_regs[] __read_mostly = {
EVENT_EXTRA_END
};

+EVENT_ATTR_STR(topdown-retiring, td_retiring_cmt, "event=0x72,umask=0x0");
+EVENT_ATTR_STR(topdown-bad-spec, td_bad_spec_cmt, "event=0x73,umask=0x0");
+
+static struct attribute *cmt_events_attrs[] = {
+ EVENT_PTR(td_fe_bound_tnt),
+ EVENT_PTR(td_retiring_cmt),
+ EVENT_PTR(td_bad_spec_cmt),
+ EVENT_PTR(td_be_bound_tnt),
+ NULL
+};
+
static struct extra_reg intel_cmt_extra_regs[] __read_mostly = {
/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x800ff3ffffffffffull, RSP_0),
@@ -4830,6 +4841,8 @@ PMU_FORMAT_ATTR(ldlat, "config1:0-15");

PMU_FORMAT_ATTR(frontend, "config1:0-23");

+PMU_FORMAT_ATTR(snoop_rsp, "config1:0-63");
+
static struct attribute *intel_arch3_formats_attr[] = {
&format_attr_event.attr,
&format_attr_umask.attr,
@@ -4860,6 +4873,13 @@ static struct attribute *slm_format_attr[] = {
NULL
};

+static struct attribute *cmt_format_attr[] = {
+ &format_attr_offcore_rsp.attr,
+ &format_attr_ldlat.attr,
+ &format_attr_snoop_rsp.attr,
+ NULL
+};
+
static struct attribute *skl_format_attr[] = {
&format_attr_frontend.attr,
NULL,
@@ -5632,7 +5652,6 @@ static struct attribute *adl_hybrid_extra_attr[] = {
NULL
};

-PMU_FORMAT_ATTR_SHOW(snoop_rsp, "config1:0-63");
FORMAT_ATTR_HYBRID(snoop_rsp, hybrid_small);

static struct attribute *mtl_hybrid_extra_attr_rtm[] = {
@@ -6180,6 +6199,37 @@ __init int intel_pmu_init(void)
name = "gracemont";
break;

+ case INTEL_FAM6_GRANDRIDGE:
+ case INTEL_FAM6_SIERRAFOREST_X:
+ x86_pmu.mid_ack = true;
+ memcpy(hw_cache_event_ids, glp_hw_cache_event_ids,
+ sizeof(hw_cache_event_ids));
+ memcpy(hw_cache_extra_regs, tnt_hw_cache_extra_regs,
+ sizeof(hw_cache_extra_regs));
+ hw_cache_event_ids[C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)] = -1;
+
+ x86_pmu.event_constraints = intel_slm_event_constraints;
+ x86_pmu.pebs_constraints = intel_grt_pebs_event_constraints;
+ x86_pmu.extra_regs = intel_cmt_extra_regs;
+
+ x86_pmu.pebs_aliases = NULL;
+ x86_pmu.pebs_prec_dist = true;
+ x86_pmu.lbr_pt_coexist = true;
+ x86_pmu.pebs_block = true;
+ x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+ x86_pmu.flags |= PMU_FL_INSTR_LATENCY;
+
+ intel_pmu_pebs_data_source_cmt();
+ x86_pmu.pebs_latency_data = mtl_latency_data_small;
+ x86_pmu.get_event_constraints = cmt_get_event_constraints;
+ x86_pmu.limit_period = spr_limit_period;
+ td_attr = cmt_events_attrs;
+ mem_attr = grt_mem_attrs;
+ extra_attr = cmt_format_attr;
+ pr_cont("Crestmont events, ");
+ name = "crestmont";
+ break;
+
case INTEL_FAM6_WESTMERE:
case INTEL_FAM6_WESTMERE_EP:
case INTEL_FAM6_WESTMERE_EX:
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 6f38c94e92c5..1630a084dfe8 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -144,7 +144,7 @@ void __init intel_pmu_pebs_data_source_adl(void)
__intel_pmu_pebs_data_source_grt(data_source);
}

-static void __init intel_pmu_pebs_data_source_cmt(u64 *data_source)
+static void __init __intel_pmu_pebs_data_source_cmt(u64 *data_source)
{
data_source[0x07] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOPX, FWD);
data_source[0x08] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM);
@@ -164,7 +164,12 @@ void __init intel_pmu_pebs_data_source_mtl(void)

data_source = x86_pmu.hybrid_pmu[X86_HYBRID_PMU_ATOM_IDX].pebs_data_source;
memcpy(data_source, pebs_data_source, sizeof(pebs_data_source));
- intel_pmu_pebs_data_source_cmt(data_source);
+ __intel_pmu_pebs_data_source_cmt(data_source);
+}
+
+void __init intel_pmu_pebs_data_source_cmt(void)
+{
+ __intel_pmu_pebs_data_source_cmt(pebs_data_source);
}

static u64 precise_store_data(u64 status)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index d6de4487348c..c8ba2be7585d 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1606,6 +1606,8 @@ void intel_pmu_pebs_data_source_grt(void);

void intel_pmu_pebs_data_source_mtl(void);

+void intel_pmu_pebs_data_source_cmt(void);
+
int intel_pmu_setup_lbr_filter(struct perf_event *event);

void intel_pt_interrupt(void);
--
2.35.1


2023-04-10 20:45:01

by Liang, Kan

Subject: [PATCH 3/6] perf/x86/intel: Support LBR event logging

From: Kan Liang <[email protected]>

The LBR event logging introduces a per-counter indication of precise
event occurrences in LBRs. It can provide a means to attribute exposed
retirement latency to combinations of events across a block of
instructions. It also provides a means of attributing Timed LBR
latencies to events.

The feature and supported counters can be enumerated in the ARCH LBR
leaf. Add an x86_pmu flag to indicate the availability of the feature.

The feature is only supported on the first 4 GP counters on SRF/GRR.
Force the event constraint to the first 4 GP counters.

The LBR events are logged in counter order, which is not visible to the
perf tool. Record the event IDs if the PERF_SAMPLE_BRANCH_EVENT_IDS
sample type is set. The cpuc->events array contains global information.
Filter out the system-wide event IDs for a per-thread event.

When rescheduling a counter that was assigned to an event with the LBR
event logging feature enabled, the existing LBR entries may contain the
counter information. If the counter is later assigned to another event,
that information will be wrongly interpreted. Flush the LBR for this case.

Add a sanity check in intel_pmu_hw_config(). Disable the feature if other
counter filters (inv, cmask, edge, in_tx) are set or the LBR call stack
mode is enabled. (For the LBR call stack mode, we cannot simply flush the
LBR, since that would break the call stack. Also, there is no obvious use
case for the call stack mode for now.)

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/core.c | 28 ++++++++--
arch/x86/events/intel/ds.c | 1 +
arch/x86/events/intel/lbr.c | 86 +++++++++++++++++++++++++++++++
arch/x86/events/perf_event.h | 8 +++
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/perf_event.h | 4 ++
6 files changed, 125 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 0bc325b7e028..6c9ecb8f3a4b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2788,6 +2788,7 @@ static void intel_pmu_enable_fixed(struct perf_event *event)

static void intel_pmu_enable_event(struct perf_event *event)
{
+ u64 enable_mask = ARCH_PERFMON_EVENTSEL_ENABLE;
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;

@@ -2795,9 +2796,13 @@ static void intel_pmu_enable_event(struct perf_event *event)
intel_pmu_pebs_enable(event);

switch (idx) {
- case 0 ... INTEL_PMC_IDX_FIXED - 1:
+ case 0 ... PERF_MAX_BRANCH_EVENTS - 1:
+ if (branch_sample_event(event))
+ enable_mask |= ARCH_PERFMON_EVENTSEL_LBR_LOG;
+ fallthrough;
+ case PERF_MAX_BRANCH_EVENTS ... INTEL_PMC_IDX_FIXED - 1:
intel_set_masks(event, idx);
- __x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
+ __x86_pmu_enable_event(hwc, enable_mask);
break;
case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
@@ -3047,8 +3052,10 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)

perf_sample_data_init(&data, 0, event->hw.last_period);

- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
+ intel_pmu_lbr_save_event_ids(&data, event, cpuc);
+ }

if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
@@ -3613,6 +3620,13 @@ intel_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
if (cpuc->excl_cntrs)
return intel_get_excl_constraints(cpuc, event, idx, c2);

+ /* The LBR event logging is only available for some counters. */
+ if (branch_sample_event(event)) {
+ c2 = dyn_constraint(cpuc, c2, idx);
+ c2->idxmsk64 &= x86_pmu.lbr_events;
+ c2->weight = hweight64(c2->idxmsk64);
+ }
+
return c2;
}

@@ -3898,6 +3912,12 @@ static int intel_pmu_hw_config(struct perf_event *event)
x86_pmu.pebs_aliases(event);
}

+ if (branch_sample_event(event) &&
+ (!(x86_pmu.flags & PMU_FL_LBR_EVENT) ||
+ (event->attr.config & ~INTEL_ARCH_EVENT_MASK) ||
+ (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK)))
+ return -EINVAL;
+
if (needs_branch_stack(event)) {
ret = intel_pmu_setup_lbr_filter(event);
if (ret)
@@ -4549,7 +4569,7 @@ int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int cpu)
goto err;
}

- if (x86_pmu.flags & (PMU_FL_EXCL_CNTRS | PMU_FL_TFA)) {
+ if (x86_pmu.flags & (PMU_FL_EXCL_CNTRS | PMU_FL_TFA | PMU_FL_LBR_EVENT)) {
size_t sz = X86_PMC_IDX_MAX * sizeof(struct event_constraint);

cpuc->constraint_list = kzalloc_node(sz, GFP_KERNEL, cpu_to_node(cpu));
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 1630a084dfe8..413690191a89 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1920,6 +1920,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (has_branch_stack(event)) {
intel_pmu_store_pebs_lbrs(lbr);
perf_sample_save_brstack(data, event, &cpuc->lbr_stack);
+ intel_pmu_lbr_save_event_ids(data, event, cpuc);
}
}

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index c3b0d15a9841..7418753cc458 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -676,6 +676,48 @@ void intel_pmu_lbr_del(struct perf_event *event)
WARN_ON_ONCE(cpuc->lbr_users < 0);
WARN_ON_ONCE(cpuc->lbr_pebs_users < 0);
perf_sched_cb_dec(event->pmu);
+
+ /*
+ * When rescheduling a counter, which was assigned to an event
+ * enabled the LBR event logging feature, the existing LBR
+ * entries may contain the counter information. If the counter
+ * is assigned to another event later, the information will be
+ * wrongly interpreted. It's too expensive to modify the counter
+ * information for each existing LBR entry compared with flushing
+ * the LBR.
+ */
+ if (branch_sample_event(event)) {
+ int i, flexible = 0;
+
+ for (i = 0; i < PERF_MAX_BRANCH_EVENTS; i++) {
+ struct perf_event *e = cpuc->events[i];
+
+ if (e && !e->attr.pinned)
+ flexible++;
+ }
+
+ /*
+ * There should be two common cases for rescheduling,
+ * multiplexing and context switch.
+ * - For the multiplexing, only the flexible events are
+ * rescheduled. The LBR entries will only be flushed
+ * for the last flexible event with the LBR event
+ * logging feature.
+ * - For the context switch, the LBR will be unconditionally
+ * flushed when the new task is scheduled in. Ideally,
+ * the flush is not required. But it's hard to tell
+ * whether it's a context switch here.
+ * There could be a case that an extra flush is introduced.
+ * But the extra flush doesn't impact the functionality.
+ * For example, both the new task and the old task are
+ * monitored by some flexible events with LBR event logging
+ * enabled. There will be an extra flush when the last
+ * flexible event of the old task is scheduled out. But the
+ * case should not be a common case.
+ */
+ if (!flexible && !event->attr.pinned)
+ intel_pmu_arch_lbr_reset();
+ }
}

static inline bool vlbr_exclude_host(void)
@@ -866,6 +908,16 @@ static __always_inline u16 get_lbr_cycles(u64 info)
return cycles;
}

+static u8 get_lbr_events(struct cpu_hw_events *cpuc, u64 info)
+{
+ u8 lbr_events = 0;
+
+ if (x86_pmu.flags & PMU_FL_LBR_EVENT)
+ lbr_events = (info & LBR_INFO_EVENTS) >> LBR_INFO_EVENTS_OFFSET;
+
+ return lbr_events;
+}
+
static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
struct lbr_entry *entries)
{
@@ -898,6 +950,7 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
e->abort = !!(info & LBR_INFO_ABORT);
e->cycles = get_lbr_cycles(info);
e->type = get_lbr_br_type(info);
+ e->events = get_lbr_events(cpuc, info);
}

cpuc->lbr_stack.nr = i;
@@ -1198,6 +1251,35 @@ void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr)
intel_pmu_lbr_filter(cpuc);
}

+void intel_pmu_lbr_save_event_ids(struct perf_sample_data *data,
+ struct perf_event *event,
+ struct cpu_hw_events *cpuc)
+{
+ bool filter;
+ int i;
+
+ if (!(x86_pmu.flags & PMU_FL_LBR_EVENT) ||
+ !(event->attr.sample_type & PERF_SAMPLE_BRANCH_EVENT_IDS))
+ return;
+
+ /* Filter the system-wide events ID for per-thread event */
+ filter = !!(event->attach_state & PERF_ATTACH_TASK);
+
+ for (i = 0; i < PERF_MAX_BRANCH_EVENTS; i++) {
+ struct perf_event *e = cpuc->events[i];
+
+ if (e && branch_sample_event(e) &&
+ (!filter || (e->attach_state & PERF_ATTACH_TASK))) {
+ cpuc->lbr_ids[i] = cpuc->events[i]->id;
+ continue;
+ }
+ cpuc->lbr_ids[i] = -1ULL;
+ }
+
+ cpuc->lbr_event_ids.nr = PERF_MAX_BRANCH_EVENTS;
+ perf_sample_save_event_ids(data, &cpuc->lbr_event_ids);
+}
+
/*
* Map interface branch filters onto LBR filters
*/
@@ -1525,8 +1607,12 @@ void __init intel_pmu_arch_lbr_init(void)
x86_pmu.lbr_mispred = ecx.split.lbr_mispred;
x86_pmu.lbr_timed_lbr = ecx.split.lbr_timed_lbr;
x86_pmu.lbr_br_type = ecx.split.lbr_br_type;
+ x86_pmu.lbr_events = ecx.split.lbr_events;
x86_pmu.lbr_nr = lbr_nr;

+ if (!!x86_pmu.lbr_events)
+ x86_pmu.flags |= PMU_FL_LBR_EVENT;
+
if (x86_pmu.lbr_mispred)
static_branch_enable(&x86_lbr_mispred);
if (x86_pmu.lbr_timed_lbr)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index c8ba2be7585d..feeef9d41cac 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -283,6 +283,8 @@ struct cpu_hw_events {
int lbr_pebs_users;
struct perf_branch_stack lbr_stack;
struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES];
+ struct perf_branch_event_ids lbr_event_ids;
+ u64 lbr_ids[PERF_MAX_BRANCH_EVENTS];
union {
struct er_account *lbr_sel;
struct er_account *lbr_ctl;
@@ -881,6 +883,7 @@ struct x86_pmu {
unsigned int lbr_mispred:1;
unsigned int lbr_timed_lbr:1;
unsigned int lbr_br_type:1;
+ unsigned int lbr_events:4;

void (*lbr_reset)(void);
void (*lbr_read)(struct cpu_hw_events *cpuc);
@@ -1005,6 +1008,7 @@ do { \
#define PMU_FL_INSTR_LATENCY 0x80 /* Support Instruction Latency in PEBS Memory Info Record */
#define PMU_FL_MEM_LOADS_AUX 0x100 /* Require an auxiliary event for the complete memory info */
#define PMU_FL_RETIRE_LATENCY 0x200 /* Support Retire Latency in PEBS */
+#define PMU_FL_LBR_EVENT 0x400 /* Support LBR event logging */

#define EVENT_VAR(_id) event_attr_##_id
#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
@@ -1545,6 +1549,10 @@ void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr);

void intel_ds_init(void);

+void intel_pmu_lbr_save_event_ids(struct perf_sample_data *data,
+ struct perf_event *event,
+ struct cpu_hw_events *cpuc);
+
void intel_pmu_lbr_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
struct perf_event_pmu_context *next_epc);

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ad35355ee43e..b845eeb527ef 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -222,6 +222,8 @@
#define LBR_INFO_CYCLES 0xffff
#define LBR_INFO_BR_TYPE_OFFSET 56
#define LBR_INFO_BR_TYPE (0xfull << LBR_INFO_BR_TYPE_OFFSET)
+#define LBR_INFO_EVENTS_OFFSET 32
+#define LBR_INFO_EVENTS (0xffull << LBR_INFO_EVENTS_OFFSET)

#define MSR_ARCH_LBR_CTL 0x000014ce
#define ARCH_LBR_CTL_LBREN BIT(0)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8fc15ed5e60b..2ae60c378e3a 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -31,6 +31,7 @@
#define ARCH_PERFMON_EVENTSEL_ENABLE (1ULL << 22)
#define ARCH_PERFMON_EVENTSEL_INV (1ULL << 23)
#define ARCH_PERFMON_EVENTSEL_CMASK 0xFF000000ULL
+#define ARCH_PERFMON_EVENTSEL_LBR_LOG (1ULL << 35)

#define HSW_IN_TX (1ULL << 32)
#define HSW_IN_TX_CHECKPOINTED (1ULL << 33)
@@ -203,6 +204,9 @@ union cpuid28_ecx {
unsigned int lbr_timed_lbr:1;
/* Branch Type Field Supported */
unsigned int lbr_br_type:1;
+ unsigned int reserved:13;
+ /* Event Logging Supported */
+ unsigned int lbr_events:4;
} split;
unsigned int full;
};
--
2.35.1

2023-04-10 20:45:07

by Liang, Kan

Subject: [PATCH 2/6] perf: Support branch events logging

From: Kan Liang <[email protected]>

With the cycle time information between branches, stalls can be easily
observed. But it's difficult to explain what causes the long delay.

Add a new field to collect the occurrences of events since the last
branch entry, which can be used to provide some causality information
for the cycle time values currently recorded in branches.

Add a new branch sample type to indicate whether to include the
occurrences of events in the branch info.

Only support up to 4 events with saturating at value 3.
In the current kernel, the events are ordered by either the counter
index or the enabling sequence. But none of the order information is
available to the user space tool.
Add a new PERF_SAMPLE format, PERF_SAMPLE_BRANCH_EVENT_IDS, and generic
support to dump the event IDs of the branch events.
Add a helper function to detect the branch event flag.
These will be used in the following patch.

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
include/linux/perf_event.h | 26 ++++++++++++++++++++++++++
include/uapi/linux/perf_event.h | 22 ++++++++++++++++++++--
kernel/events/core.c | 23 +++++++++++++++++++++++
3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d5628a7b5eaa..3b659a57129a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -126,6 +126,11 @@ struct perf_branch_stack {
struct perf_branch_entry entries[];
};

+struct perf_branch_event_ids {
+ __u64 nr;
+ __u64 ids[];
+};
+
struct task_struct;

/*
@@ -1127,6 +1132,11 @@ static inline bool branch_sample_priv(const struct perf_event *event)
return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_PRIV_SAVE;
}

+static inline bool branch_sample_event(const struct perf_event *event)
+{
+ return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVENT;
+}
+

struct perf_sample_data {
/*
@@ -1161,6 +1171,7 @@ struct perf_sample_data {
struct perf_callchain_entry *callchain;
struct perf_raw_record *raw;
struct perf_branch_stack *br_stack;
+ struct perf_branch_event_ids *br_event_ids;
union perf_sample_weight weight;
union perf_mem_data_src data_src;
u64 txn;
@@ -1250,6 +1261,21 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}

+static inline void perf_sample_save_event_ids(struct perf_sample_data *data,
+ struct perf_branch_event_ids *ids)
+{
+ int size = sizeof(u64); /* nr */
+
+ if (WARN_ON_ONCE(ids->nr > PERF_MAX_BRANCH_EVENTS))
+ ids->nr = PERF_MAX_BRANCH_EVENTS;
+
+ size += ids->nr * sizeof(u64);
+
+ data->br_event_ids = ids;
+ data->dyn_size += size;
+ data->sample_flags |= PERF_SAMPLE_BRANCH_EVENT_IDS;
+}
+
static inline u32 perf_sample_data_size(struct perf_sample_data *data,
struct perf_event *event)
{
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 37675437b768..36d70717ecbd 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -162,8 +162,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_DATA_PAGE_SIZE = 1U << 22,
PERF_SAMPLE_CODE_PAGE_SIZE = 1U << 23,
PERF_SAMPLE_WEIGHT_STRUCT = 1U << 24,
+ PERF_SAMPLE_BRANCH_EVENT_IDS = 1U << 25,

- PERF_SAMPLE_MAX = 1U << 25, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 26, /* non-ABI */
};

#define PERF_SAMPLE_WEIGHT_TYPE (PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT)
@@ -204,6 +205,8 @@ enum perf_branch_sample_type_shift {

PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT = 18, /* save privilege mode */

+ PERF_SAMPLE_BRANCH_EVENT_SHIFT = 19, /* save occurrences of events */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};

@@ -235,6 +238,8 @@ enum perf_branch_sample_type {

PERF_SAMPLE_BRANCH_PRIV_SAVE = 1U << PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT,

+ PERF_SAMPLE_BRANCH_EVENT = 1U << PERF_SAMPLE_BRANCH_EVENT_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};

@@ -1018,6 +1023,8 @@ enum perf_event_type {
* char data[size]; } && PERF_SAMPLE_AUX
* { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
* { u64 code_page_size;} && PERF_SAMPLE_CODE_PAGE_SIZE
+ * { u64 nr;
+ * u64 ids[nr];} && PERF_SAMPLE_BRANCH_EVENT_IDS
* };
*/
PERF_RECORD_SAMPLE = 9,
@@ -1394,6 +1401,12 @@ union perf_mem_data_src {
#define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

+#define PERF_MAX_BRANCH_EVENTS 4
+#define PERF_BRANCH_EVENTS_MASK 0x3
+#define PERF_BRANCH_EVENTS_STEP 2
+
+#define perf_branch_event_by_idx(_events, _idx) \
+ (((_events) >> ((_idx) * PERF_BRANCH_EVENTS_STEP)) & PERF_BRANCH_EVENTS_MASK)
/*
* single taken branch record layout:
*
@@ -1410,6 +1423,10 @@ union perf_mem_data_src {
* cycles: cycles from last branch (or 0 if not supported)
* type: branch type
* spec: branch speculation info (or 0 if not supported)
+ * events: occurrences of events since the last branch entry.
+ * The fields can store up to 4 events with saturating
+ * at value 3.
+ * (or 0 if not supported)
*/
struct perf_branch_entry {
__u64 from;
@@ -1423,7 +1440,8 @@ struct perf_branch_entry {
spec:2, /* branch speculation info */
new_type:4, /* additional branch type */
priv:3, /* privilege level */
- reserved:31;
+ events:8, /* occurrences of events since the last branch entry */
+ reserved:23;
};

union perf_sample_weight {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f79fd8b87f75..1ec7cc8b0730 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7401,6 +7401,23 @@ void perf_output_sample(struct perf_output_handle *handle,
perf_aux_sample_output(event, handle, data);
}

+ if (sample_type & PERF_SAMPLE_BRANCH_EVENT_IDS) {
+ if (data->br_event_ids) {
+ size_t size;
+
+ size = data->br_event_ids->nr * sizeof(u64);
+ perf_output_put(handle, data->br_event_ids->nr);
+ perf_output_copy(handle, data->br_event_ids->ids, size);
+ } else {
+ /*
+ * we always store at least the value of nr
+ */
+ u64 nr = 0;
+
+ perf_output_put(handle, nr);
+ }
+ }
+
if (!event->attr.watermark) {
int wakeup_events = event->attr.wakeup_events;

@@ -7747,6 +7764,12 @@ void perf_prepare_sample(struct perf_sample_data *data,
data->dyn_size += size + sizeof(u64); /* size above */
data->sample_flags |= PERF_SAMPLE_AUX;
}
+
+ if (filtered_sample_type & PERF_SAMPLE_BRANCH_EVENT_IDS) {
+ data->br_event_ids = NULL;
+ data->dyn_size += sizeof(u64);
+ data->sample_flags |= PERF_SAMPLE_BRANCH_EVENT_IDS;
+ }
}

void perf_prepare_header(struct perf_event_header *header,
--
2.35.1

2023-04-10 20:45:37

by Liang, Kan

Subject: [PATCH 6/6] perf tools: Support PERF_SAMPLE_BRANCH_EVENT_IDS

From: Kan Liang <[email protected]>

Support new sample type PERF_SAMPLE_BRANCH_EVENT_IDS.

It's used together with the branch event feature. If a legacy kernel
doesn't support either of them, switch off both together.

The sampling event may not be the event logged by a branch. Apply
PERF_SAMPLE_BRANCH_EVENT_IDS to all events if the branch event logging
feature is detected.

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/util/branch.h | 5 +++++
tools/perf/util/evsel.c | 22 ++++++++++++++++++++--
tools/perf/util/perf_event_attr_fprintf.c | 2 +-
tools/perf/util/record.c | 13 +++++++++++++
tools/perf/util/sample.h | 1 +
tools/perf/util/session.c | 17 +++++++++++++++++
tools/perf/util/synthetic-events.c | 12 ++++++++++++
7 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index 5feb79ccd698..761b686e7730 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -51,6 +51,11 @@ struct branch_stack {
struct branch_entry entries[];
};

+struct branch_event_ids {
+ u64 nr;
+ u64 ids[];
+};
+
/*
* The hw_idx is only available when PERF_SAMPLE_BRANCH_HW_INDEX is applied.
* Otherwise, the output format of a sample with branch stack is
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1888552f41f9..91bd989c8491 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1850,8 +1850,10 @@ static int __evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,

static void evsel__disable_missing_features(struct evsel *evsel)
{
- if (perf_missing_features.branch_event)
+ if (perf_missing_features.branch_event) {
evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_EVENT;
+ evsel__reset_sample_bit(evsel, BRANCH_EVENT_IDS);
+ }
if (perf_missing_features.read_lost)
evsel->core.attr.read_format &= ~PERF_FORMAT_LOST;
if (perf_missing_features.weight_struct) {
@@ -1906,7 +1908,8 @@ bool evsel__detect_missing_features(struct evsel *evsel)
* perf_event_attr interface.
*/
if (!perf_missing_features.branch_event &&
- (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVENT)) {
+ ((evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVENT) ||
+ (evsel->core.attr.sample_type & PERF_SAMPLE_BRANCH_EVENT_IDS))) {
perf_missing_features.branch_event = true;
pr_debug2("switching off branch event support\n");
return true;
@@ -2710,6 +2713,21 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
array = (void *)array + sz;
}

+ if (type & PERF_SAMPLE_BRANCH_EVENT_IDS) {
+ const u64 max_branch_nr = UINT64_MAX / sizeof(u64);
+
+ OVERFLOW_CHECK_u64(array);
+ data->branch_event_ids = (struct branch_event_ids *)array++;
+
+ if (data->branch_event_ids->nr > max_branch_nr)
+ return -EFAULT;
+
+ sz = data->branch_event_ids->nr * sizeof(u64);
+
+ OVERFLOW_CHECK(array, sz, max_size);
+ array = (void *)array + sz;
+ }
+
return 0;
}

diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 96f0aafc962d..5eadcdaba12e 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -36,7 +36,7 @@ static void __p_sample_type(char *buf, size_t size, u64 value)
bit_name(IDENTIFIER), bit_name(REGS_INTR), bit_name(DATA_SRC),
bit_name(WEIGHT), bit_name(PHYS_ADDR), bit_name(AUX),
bit_name(CGROUP), bit_name(DATA_PAGE_SIZE), bit_name(CODE_PAGE_SIZE),
- bit_name(WEIGHT_STRUCT),
+ bit_name(WEIGHT_STRUCT), bit_name(BRANCH_EVENT_IDS),
{ .name = NULL, }
};
#undef bit_name
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 9eb5c6a08999..640ba5243209 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -98,6 +98,7 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call
bool use_sample_identifier = false;
bool use_comm_exec;
bool sample_id = opts->sample_id;
+ bool has_branch_events = false;

if (perf_cpu_map__cpu(evlist->core.user_requested_cpus, 0).cpu < 0)
opts->no_inherit = true;
@@ -108,6 +109,8 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call
evsel__config(evsel, opts, callchain);
if (evsel->tracking && use_comm_exec)
evsel->core.attr.comm_exec = 1;
+ if (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVENT)
+ has_branch_events = true;
}

/* Configure leader sampling here now that the sample type is known */
@@ -139,6 +142,16 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call
evsel__set_sample_id(evsel, use_sample_identifier);
}

+ if (has_branch_events) {
+ /*
+ * The sampling event may not be the event logged by a
+ * branch. Apply the BRANCH_EVENT_IDS for all events if
+ * the branch events logging feature is detected.
+ */
+ evlist__for_each_entry(evlist, evsel)
+ evsel__set_sample_bit(evsel, BRANCH_EVENT_IDS);
+ }
+
evlist__set_id_pos(evlist);
}

diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index 33b08e0ac746..b0979571c8af 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -101,6 +101,7 @@ struct perf_sample {
void *raw_data;
struct ip_callchain *callchain;
struct branch_stack *branch_stack;
+ struct branch_event_ids *branch_event_ids;
struct regs_dump user_regs;
struct regs_dump intr_regs;
struct stack_dump user_stack;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ce6d9349ec42..cc53a4ddfe6d 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1203,6 +1203,20 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack)
}
}

+static void branch_event_ids__printf(struct branch_event_ids *br_event)
+{
+ u64 i;
+
+ printf("%s: nr:%" PRIu64 "\n", "... branch event IDs", br_event->nr);
+
+ for (i = 0; i < br_event->nr; i++) {
+ if (br_event->ids[i] != -1ULL)
+ printf("..... %2"PRIu64": %016" PRIx64 "\n", i, br_event->ids[i]);
+ else
+ printf("..... %2"PRIu64": N/A\n", i);
+ }
+}
+
static void regs_dump__printf(u64 mask, u64 *regs, const char *arch)
{
unsigned rid, i = 0;
@@ -1364,6 +1378,9 @@ static void dump_sample(struct evsel *evsel, union perf_event *event,
if (evsel__has_br_stack(evsel))
branch_stack__printf(sample, evsel__has_branch_callstack(evsel));

+ if (sample_type & PERF_SAMPLE_BRANCH_EVENT_IDS)
+ branch_event_ids__printf(sample->branch_event_ids);
+
if (sample_type & PERF_SAMPLE_REGS_USER)
regs_user__printf(sample, arch);

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 9ab9308ee80c..f4c47979e7c1 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -1543,6 +1543,11 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
result += sample->aux_sample.size;
}

+ if (type & PERF_SAMPLE_BRANCH_EVENT_IDS) {
+ result += sizeof(u64);
+ result += sample->branch_event_ids->nr * sizeof(u64);
+ }
+
return result;
}

@@ -1757,6 +1762,13 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type, u64 read_fo
array = (void *)array + sz;
}

+ if (type & PERF_SAMPLE_BRANCH_EVENT_IDS) {
+ sz = sizeof(u64);
+ sz += sample->branch_event_ids->nr * sizeof(u64);
+ memcpy(array, sample->branch_event_ids, sz);
+ array = (void *)array + sz;
+ }
+
return 0;
}

--
2.35.1

2023-04-10 20:46:03

by Liang, Kan

Subject: [PATCH 4/6] tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel

From: Kan Liang <[email protected]>

The branch event information and the corresponding event IDs can be
collected by the kernel with the LBR event logging feature on Intel
platforms. Sync the new sample types and the new fields of
struct perf_branch_entry, so that the perf tool can retrieve the
occurrences of events for each branch and the corresponding event IDs.

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index ccb7f5dad59b..3c019ed7dbf6 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -162,8 +162,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_DATA_PAGE_SIZE = 1U << 22,
PERF_SAMPLE_CODE_PAGE_SIZE = 1U << 23,
PERF_SAMPLE_WEIGHT_STRUCT = 1U << 24,
+ PERF_SAMPLE_BRANCH_EVENT_IDS = 1U << 25,

- PERF_SAMPLE_MAX = 1U << 25, /* non-ABI */
+ PERF_SAMPLE_MAX = 1U << 26, /* non-ABI */
};

#define PERF_SAMPLE_WEIGHT_TYPE (PERF_SAMPLE_WEIGHT | PERF_SAMPLE_WEIGHT_STRUCT)
@@ -204,6 +205,8 @@ enum perf_branch_sample_type_shift {

PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT = 18, /* save privilege mode */

+ PERF_SAMPLE_BRANCH_EVENT_SHIFT = 19, /* save occurrences of events */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};

@@ -235,6 +238,8 @@ enum perf_branch_sample_type {

PERF_SAMPLE_BRANCH_PRIV_SAVE = 1U << PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT,

+ PERF_SAMPLE_BRANCH_EVENT = 1U << PERF_SAMPLE_BRANCH_EVENT_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};

@@ -1015,6 +1020,8 @@ enum perf_event_type {
* char data[size]; } && PERF_SAMPLE_AUX
* { u64 data_page_size;} && PERF_SAMPLE_DATA_PAGE_SIZE
* { u64 code_page_size;} && PERF_SAMPLE_CODE_PAGE_SIZE
+ * { u64 nr;
+ * u64 ids[nr];} && PERF_SAMPLE_BRANCH_EVENT_IDS
* };
*/
PERF_RECORD_SAMPLE = 9,
@@ -1391,6 +1398,12 @@ union perf_mem_data_src {
#define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)

+#define PERF_MAX_BRANCH_EVENTS 4
+#define PERF_BRANCH_EVENTS_MASK 0x3
+#define PERF_BRANCH_EVENTS_STEP 2
+
+#define perf_branch_event_by_idx(_events, _idx) \
+ (((_events) >> ((_idx) * PERF_BRANCH_EVENTS_STEP)) & PERF_BRANCH_EVENTS_MASK)
/*
* single taken branch record layout:
*
@@ -1407,6 +1420,10 @@ union perf_mem_data_src {
* cycles: cycles from last branch (or 0 if not supported)
* type: branch type
* spec: branch speculation info (or 0 if not supported)
+ * events: occurrences of events since the last branch entry.
+ * The fields can store up to 4 events with saturating
+ * at value 3.
+ * (or 0 if not supported)
*/
struct perf_branch_entry {
__u64 from;
@@ -1420,7 +1437,8 @@ struct perf_branch_entry {
spec:2, /* branch speculation info */
new_type:4, /* additional branch type */
priv:3, /* privilege level */
- reserved:31;
+ events:8, /* occurrences of events since the last branch entry */
+ reserved:23;
};

union perf_sample_weight {
--
2.35.1

2023-04-10 20:47:13

by Liang, Kan

Subject: [PATCH 5/6] perf tools: Add branch event knob

From: Kan Liang <[email protected]>

Add a new branch filter for the branch event option. If a legacy
kernel doesn't support the branch sample type, switch off the branch
event filter.

The new branch event information should be dumped with other branch
information via perf report -D.

Extend the struct branch_flags and evsel__bitfield_swap_branch_flags()
to support the new field.

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 1 +
tools/perf/util/branch.h | 3 ++-
tools/perf/util/evsel.c | 18 ++++++++++++++----
tools/perf/util/evsel.h | 1 +
tools/perf/util/parse-branch-options.c | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 1 +
tools/perf/util/session.c | 3 ++-
7 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index ff815c2f67e8..d09443a01d91 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -402,6 +402,7 @@ following filters are defined:
4th-Gen Xeon+ server), the save branch type is unconditionally enabled
when the taken branch stack sampling is enabled.
- priv: save privilege state during sampling in case binary is not available later
+ - event: save occurrences of the event since the last branch entry.

+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index e41bfffe2217..5feb79ccd698 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -25,7 +25,8 @@ struct branch_flags {
u64 spec:2;
u64 new_type:4;
u64 priv:3;
- u64 reserved:31;
+ u64 events:8;
+ u64 reserved:23;
};
};
};
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 51e8ce6edddc..1888552f41f9 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1850,6 +1850,8 @@ static int __evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,

static void evsel__disable_missing_features(struct evsel *evsel)
{
+ if (perf_missing_features.branch_event)
+ evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_EVENT;
if (perf_missing_features.read_lost)
evsel->core.attr.read_format &= ~PERF_FORMAT_LOST;
if (perf_missing_features.weight_struct) {
@@ -1903,7 +1905,12 @@ bool evsel__detect_missing_features(struct evsel *evsel)
* Must probe features in the order they were added to the
* perf_event_attr interface.
*/
- if (!perf_missing_features.read_lost &&
+ if (!perf_missing_features.branch_event &&
+ (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVENT)) {
+ perf_missing_features.branch_event = true;
+ pr_debug2("switching off branch event support\n");
+ return true;
+ } else if (!perf_missing_features.read_lost &&
(evsel->core.attr.read_format & PERF_FORMAT_LOST)) {
perf_missing_features.read_lost = true;
pr_debug2("switching off PERF_FORMAT_LOST support\n");
@@ -2320,7 +2327,8 @@ u64 evsel__bitfield_swap_branch_flags(u64 value)
* spec:2 //branch speculation info
* new_type:4 //additional branch type
* priv:3 //privilege level
- * reserved:31
+ * events:8 //occurrences of events
+ * reserved:23
* }
* }
*
@@ -2339,7 +2347,8 @@ u64 evsel__bitfield_swap_branch_flags(u64 value)
new_val |= bitfield_swap(value, 24, 2);
new_val |= bitfield_swap(value, 26, 4);
new_val |= bitfield_swap(value, 30, 3);
- new_val |= bitfield_swap(value, 33, 31);
+ new_val |= bitfield_swap(value, 33, 8);
+ new_val |= bitfield_swap(value, 41, 23);
} else {
new_val = bitfield_swap(value, 63, 1);
new_val |= bitfield_swap(value, 62, 1);
@@ -2350,7 +2359,8 @@ u64 evsel__bitfield_swap_branch_flags(u64 value)
new_val |= bitfield_swap(value, 38, 2);
new_val |= bitfield_swap(value, 34, 4);
new_val |= bitfield_swap(value, 31, 3);
- new_val |= bitfield_swap(value, 0, 31);
+ new_val |= bitfield_swap(value, 23, 8);
+ new_val |= bitfield_swap(value, 0, 23);
}

return new_val;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 24cb807ef6ce..05a61d36ee10 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -187,6 +187,7 @@ struct perf_missing_features {
bool code_page_size;
bool weight_struct;
bool read_lost;
+ bool branch_event;
};

extern struct perf_missing_features perf_missing_features;
diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c
index fd67d204d720..c9fcefed5f9d 100644
--- a/tools/perf/util/parse-branch-options.c
+++ b/tools/perf/util/parse-branch-options.c
@@ -36,6 +36,7 @@ static const struct branch_mode branch_modes[] = {
BRANCH_OPT("stack", PERF_SAMPLE_BRANCH_CALL_STACK),
BRANCH_OPT("hw_index", PERF_SAMPLE_BRANCH_HW_INDEX),
BRANCH_OPT("priv", PERF_SAMPLE_BRANCH_PRIV_SAVE),
+ BRANCH_OPT("event", PERF_SAMPLE_BRANCH_EVENT),
BRANCH_END
};

diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 7e5e7b30510d..96f0aafc962d 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -53,6 +53,7 @@ static void __p_branch_sample_type(char *buf, size_t size, u64 value)
bit_name(COND), bit_name(CALL_STACK), bit_name(IND_JUMP),
bit_name(CALL), bit_name(NO_FLAGS), bit_name(NO_CYCLES),
bit_name(TYPE_SAVE), bit_name(HW_INDEX), bit_name(PRIV_SAVE),
+ bit_name(EVENT),
{ .name = NULL, }
};
#undef bit_name
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 749d5b5c135b..ce6d9349ec42 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1180,13 +1180,14 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack)
struct branch_entry *e = &entries[i];

if (!callstack) {
- printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x %s %s\n",
+ printf("..... %2"PRIu64": %016" PRIx64 " -> %016" PRIx64 " %hu cycles %s%s%s%s %x %x %s %s\n",
i, e->from, e->to,
(unsigned short)e->flags.cycles,
e->flags.mispred ? "M" : " ",
e->flags.predicted ? "P" : " ",
e->flags.abort ? "A" : " ",
e->flags.in_tx ? "T" : " ",
+ e->flags.events,
(unsigned)e->flags.reserved,
get_branch_type(e),
e->flags.spec ? branch_spec_desc(e->flags.spec) : "");
--
2.35.1

2023-04-14 10:43:03

by Peter Zijlstra

Subject: Re: [PATCH 2/6] perf: Support branch events logging

On Mon, Apr 10, 2023 at 01:43:48PM -0700, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> With the cycle time information between branches, stalls can be easily
> observed. But it's difficult to explain what causes the long delay.
>
> Add a new field to collect the occurrences of events since the last
> branch entry, which can be used to provide some causality information
> for the cycle time values currently recorded in branches.
>
> Add a new branch sample type to indicate whether include occurrences of
> events in branch info.
>
> Only support up to 4 events with saturating at value 3.
> In the current kernel, the events are ordered by either the counter
> index or the enabling sequence. But none of the order information is
> available to the user space tool.
> Add a new PERF_SAMPLE format, PERF_SAMPLE_BRANCH_EVENT_IDS, and generic
> support to dump the event IDs of the branch events.
> Add a helper function to detect the branch event flag.
> These will be used in the following patch.

I'm having trouble reverse engineering this. Can you more coherently
explain this feature and how you've implemented it?

2023-04-14 13:38:25

by Liang, Kan

Subject: Re: [PATCH 2/6] perf: Support branch events logging



On 2023-04-14 6:38 a.m., Peter Zijlstra wrote:
> On Mon, Apr 10, 2023 at 01:43:48PM -0700, [email protected] wrote:
>> From: Kan Liang <[email protected]>
>>
>> With the cycle time information between branches, stalls can be easily
>> observed. But it's difficult to explain what causes the long delay.
>>
>> Add a new field to collect the occurrences of events since the last
>> branch entry, which can be used to provide some causality information
>> for the cycle time values currently recorded in branches.
>>
>> Add a new branch sample type to indicate whether include occurrences of
>> events in branch info.
>>
>> Only support up to 4 events with saturating at value 3.
>> In the current kernel, the events are ordered by either the counter
>> index or the enabling sequence. But none of the order information is
>> available to the user space tool.
>> Add a new PERF_SAMPLE format, PERF_SAMPLE_BRANCH_EVENT_IDS, and generic
>> support to dump the event IDs of the branch events.
>> Add a helper function to detect the branch event flag.
>> These will be used in the following patch.
>
> I'm having trouble reverse engineering this. Can you more coherently
> explain this feature and how you've implemented it?

Sorry for that.

The feature is an enhancement of ARCH LBR. It adds new fields in the
LBR_INFO MSRs to log the occurrences of events on the first 4 GP
counters. Combined with the previous timed LBR feature, the user can
understand not only the latency between two LBR blocks, but also which
events cause the stall.

The spec can be found at the latest Intel® Architecture Instruction Set
Extensions and Future Features, v048. Chapter 8.4.
https://cdrdv2.intel.com/v1/dl/getContent/671368

To support the feature, there are three main changes in ABIs.
- A new branch sample type, PERF_SAMPLE_BRANCH_EVENT, is used as a knob
to enable the feature.
- Extend the struct perf_branch_entry layout, because we have to save
and pass the occurrences of events to user space. Since it's only
available for 4 counters and each count saturates at value 3, it only
occupies 8 bits. For the current Intel implementation, the order is the
order of counters.
- Add a new PERF_SAMPLE format, PERF_SAMPLE_BRANCH_EVENT_IDS, to dump
the order information. The user-space tool doesn't know the order of
counters, so it cannot map the new fields in struct perf_branch_entry to
a specific event. We have to dump the order information.
I once considered using the enabling order to avoid this new sample
format. It works for some cases, e.g., a group. But it doesn't work for
some complex cases, e.g., multiplexing, in which the enabling order keeps
changing.
Ideally, we should dump the order information for each LBR entry. But
that would include too much duplicate information. So the order
information is only dumped for each sample. The drawback is that we have
to flush/update old LBR entries once the events are rescheduled between
samples, e.g., with multiplexing, because it's possible that the new
sample can still see the stale LBR entries. That's specially handled in
the next Intel-specific patch.

For the current implementation, the perf tool has to apply both
PERF_SAMPLE_BRANCH_EVENT and PERF_SAMPLE_BRANCH_EVENT_IDS to enable the
feature.
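
(Editorial illustration, not part of the patch set: a user-space consumer
of these two ABI additions could decode a sample roughly as sketched
below. It assumes the record parser has already extracted the 8-bit
events field of a struct perf_branch_entry and the id array dumped by
PERF_SAMPLE_BRANCH_EVENT_IDS; the constants mirror the uapi macros added
by the series, and decode_branch_events() is a made-up helper name.)

#include <inttypes.h>
#include <stdio.h>

/* Mirrors the uapi macros added by the patch set. */
#define PERF_MAX_BRANCH_EVENTS		4
#define PERF_BRANCH_EVENTS_MASK		0x3
#define PERF_BRANCH_EVENTS_STEP		2

/*
 * 'events' is the new 8-bit field of a struct perf_branch_entry,
 * 'ids[]' is the array dumped by PERF_SAMPLE_BRANCH_EVENT_IDS.
 * Slot i corresponds to GP counter i; an id of -1ULL means no
 * branch event was active on that counter for this sample.
 */
static void decode_branch_events(uint8_t events, uint64_t nr,
				 const uint64_t *ids)
{
	for (uint64_t i = 0; i < nr && i < PERF_MAX_BRANCH_EVENTS; i++) {
		unsigned int cnt = (events >> (i * PERF_BRANCH_EVENTS_STEP)) &
				   PERF_BRANCH_EVENTS_MASK;

		if (ids[i] == (uint64_t)-1)
			continue;	/* counter i had no branch event */

		/* cnt saturates at 3, i.e. "3" means "3 or more" */
		printf("event id %#" PRIx64 ": %u occurrences\n", ids[i], cnt);
	}
}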

Thanks,
Kan

2023-04-14 14:58:36

by Peter Zijlstra

Subject: Re: [PATCH 2/6] perf: Support branch events logging

On Fri, Apr 14, 2023 at 09:35:37AM -0400, Liang, Kan wrote:
>
>
> On 2023-04-14 6:38 a.m., Peter Zijlstra wrote:
> > On Mon, Apr 10, 2023 at 01:43:48PM -0700, [email protected] wrote:
> >> From: Kan Liang <[email protected]>
> >>
> >> With the cycle time information between branches, stalls can be easily
> >> observed. But it's difficult to explain what causes the long delay.
> >>
> >> Add a new field to collect the occurrences of events since the last
> >> branch entry, which can be used to provide some causality information
> >> for the cycle time values currently recorded in branches.
> >>
> >> Add a new branch sample type to indicate whether include occurrences of
> >> events in branch info.
> >>
> >> Only support up to 4 events with saturating at value 3.
> >> In the current kernel, the events are ordered by either the counter
> >> index or the enabling sequence. But none of the order information is
> >> available to the user space tool.
> >> Add a new PERF_SAMPLE format, PERF_SAMPLE_BRANCH_EVENT_IDS, and generic
> >> support to dump the event IDs of the branch events.
> >> Add a helper function to detect the branch event flag.
> >> These will be used in the following patch.
> >
> > I'm having trouble reverse engineering this. Can you more coherently
> > explain this feature and how you've implemented it?
>
> Sorry for that.
>
> The feature is an enhancement of ARCH LBR. It adds new fields in the
> LBR_INFO MSRs to log the occurrences of events on the first 4 GP
> counters. Worked with the previous timed LBR feature together, the user
> can understand not only the latency between two LBR blocks, but also
> which events causes the stall.
>
> The spec can be found at the latest Intel® Architecture Instruction Set
> Extensions and Future Features, v048. Chapter 8.4.
> https://cdrdv2.intel.com/v1/dl/getContent/671368

Oh gawd; that's terse. Why can't these people write comprehensible
things :/ It's almost as if they don't want this stuff to be used.

So IA32_LBR_x_INFO is extended:

[0:15] CYC_CNT
[16:31] undefined
+ [32:33] PMC0_CNT
+ [34:35] PMC1_CNT
+ [36:37] PMC2_CNT
+ [38:39] PMC3_CNT
+ [40:41] PMC4_CNT
+ [42:43] PMC5_CNT
+ [44:45] PMC6_CNT
+ [46:47] PMC7_CNT
[48:55] undefined
[56:59] BR_TYPE
[60] CYC_CNT_VALID
[61] TSX_ABORT

Where the PMCx_CNT fields are saturating counters for the respective
PMCs. And we'll run out of bits if we get more than 12 PMCs. Is SMT=n
PMC merging still a thing?

And for some reason this counting is enabled in PERFEVTSELx[35] instead
of in LBR_CTL somewhere :/

> To support the feature, there are three main changes in ABIs.
> - A new branch sample type, PERF_SAMPLE_BRANCH_EVENT, is used as a knob
> to enable the feature.

> - Extend the struct perf_branch_entry layout, because we have to save
> and pass the occurrences of events to user space. Since it's only
> available for 4 counters and saturating at value 3, it only occupies 8
> bits. For the current Intel implementation, the order is the order of
> counters.

Only for 4? Where does it say that? If it were to only support 4, then
we're in counter scheduling constraint hell again and we need to somehow
group all these things together with the LBR event.

@@ -1410,6 +1423,10 @@ union perf_mem_data_src {
* cycles: cycles from last branch (or 0 if not supported)
* type: branch type
* spec: branch speculation info (or 0 if not supported)
+ * events: occurrences of events since the last branch entry.
+ * The fields can store up to 4 events with saturating
+ * at value 3.
+ * (or 0 if not supported)
*/
struct perf_branch_entry {
__u64 from;
@@ -1423,7 +1440,8 @@ struct perf_branch_entry {
spec:2, /* branch speculation info */
new_type:4, /* additional branch type */
priv:3, /* privilege level */
- reserved:31;
+ events:8, /* occurrences of events since the last branch entry */
+ reserved:23;
};

union perf_sample_weight {

This seems properly terrible from an interface pov. What if the next
generation of silicon extends this to all 8 PMCs or another architecture
comes along that does this with 3 bits per counter etc...

> - Add a new PERF_SAMPLE format, PERF_SAMPLE_BRANCH_EVENT_IDS, to dump
> the order information. User space tool doesn't understand the order of
> counters. So it cannot map the new fields in struct perf_branch_entry to
> a specific event. We have to dump the order information.

Sorry; I can't parse this.


2023-04-14 16:04:42

by Liang, Kan

Subject: Re: [PATCH 2/6] perf: Support branch events logging



On 2023-04-14 10:53 a.m., Peter Zijlstra wrote:
> On Fri, Apr 14, 2023 at 09:35:37AM -0400, Liang, Kan wrote:
>>
>>
>> On 2023-04-14 6:38 a.m., Peter Zijlstra wrote:
>>> On Mon, Apr 10, 2023 at 01:43:48PM -0700, [email protected] wrote:
>>>> From: Kan Liang <[email protected]>
>>>>
>>>> With the cycle time information between branches, stalls can be easily
>>>> observed. But it's difficult to explain what causes the long delay.
>>>>
>>>> Add a new field to collect the occurrences of events since the last
>>>> branch entry, which can be used to provide some causality information
>>>> for the cycle time values currently recorded in branches.
>>>>
>>>> Add a new branch sample type to indicate whether include occurrences of
>>>> events in branch info.
>>>>
>>>> Only support up to 4 events with saturating at value 3.
>>>> In the current kernel, the events are ordered by either the counter
>>>> index or the enabling sequence. But none of the order information is
>>>> available to the user space tool.
>>>> Add a new PERF_SAMPLE format, PERF_SAMPLE_BRANCH_EVENT_IDS, and generic
>>>> support to dump the event IDs of the branch events.
>>>> Add a helper function to detect the branch event flag.
>>>> These will be used in the following patch.
>>>
>>> I'm having trouble reverse engineering this. Can you more coherently
>>> explain this feature and how you've implemented it?
>>
>> Sorry for that.
>>
>> The feature is an enhancement of ARCH LBR. It adds new fields in the
>> LBR_INFO MSRs to log the occurrences of events on the first 4 GP
>> counters. Worked with the previous timed LBR feature together, the user
>> can understand not only the latency between two LBR blocks, but also
>> which events causes the stall.
>>
>> The spec can be found at the latest Intel® Architecture Instruction Set
>> Extensions and Future Features, v048. Chapter 8.4.
>> https://cdrdv2.intel.com/v1/dl/getContent/671368
>
> Oh gawd; that's terse. Why can't these people write comprehensible
> things :/ It's almost as if they don't want this stuff to be used.
>
> So IA32_LBR_x_INFO is extended:
>
> [0:15] CYC_CNT
> [16:31] undefined
> + [32:33] PMC0_CNT
> + [34:35] PMC1_CNT
> + [36:37] PMC2_CNT
> + [38:39] PMC3_CNT
> + [40:41] PMC4_CNT
> + [42:43] PMC5_CNT
> + [44:45] PMC6_CNT
> + [46:47] PMC7_CNT
> [48:55] undefined
> [56:59] BR_TYPE
> [60] CYC_CNT_VALID
> [61] TSX_ABORT
>
> Where the PMCx_CNT fields are saturating counters for the respective
> PMCs. And we'll run out of bits if we get more than 12 PMCs. Is SMT=n
> PMC merging still a thing?
>
> And for some reason this counting is enabled in PERFEVTSELx[35] instead
> of in LBR_CTL somewhere :/
>
>> To support the feature, there are three main changes in ABIs.
>> - A new branch sample type, PERF_SAMPLE_BRANCH_EVENT, is used as a knob
>> to enable the feature.
>
>> - Extend the struct perf_branch_entry layout, because we have to save
>> and pass the occurrences of events to user space. Since it's only
>> available for 4 counters and saturating at value 3, it only occupies 8
>> bits. For the current Intel implementation, the order is the order of
>> counters.
>
> Only for 4? Where does it say that?

"Per-counter support for LBR Event Logging is indicated by the “Event
Logging Supported” bitmap in
CPUID.(EAX=01CH, ECX=0).ECX[19:16]"

There are only 4 bits to indicate the supported counters.
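
(Editorial note, not from the patches: the bitmap can be read directly as
a quick sanity check on a given machine; the sketch below is illustrative
only and assumes GCC/clang's <cpuid.h> with its __get_cpuid_count()
helper.)

#include <stdio.h>
#include <cpuid.h>

/*
 * Probe the "Event Logging Supported" bitmap from the Arch LBR leaf,
 * CPUID.(EAX=01CH, ECX=0).ECX[19:16].  Bit i set means GP counter i
 * supports LBR event logging.
 */
int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(0x1c, 0, &eax, &ebx, &ecx, &edx)) {
		printf("Arch LBR CPUID leaf not available\n");
		return 1;
	}

	printf("LBR event logging counter bitmap: %#x\n", (ecx >> 16) & 0xf);
	return 0;
}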

> If it were to only support 4, then
> we're in counter scheduling contraint hell again

Unfortunately, yes.

> and we need to somehow
> group all these things together with the LBR event.

Grouping will bring many limits on the usage. For example, I was told
that someone may want to use it with multiplexing.

>
> @@ -1410,6 +1423,10 @@ union perf_mem_data_src {
> * cycles: cycles from last branch (or 0 if not supported)
> * type: branch type
> * spec: branch speculation info (or 0 if not supported)
> + * events: occurrences of events since the last branch entry.
> + * The fields can store up to 4 events with saturating
> + * at value 3.
> + * (or 0 if not supported)
> */
> struct perf_branch_entry {
> __u64 from;
> @@ -1423,7 +1440,8 @@ struct perf_branch_entry {
> spec:2, /* branch speculation info */
> new_type:4, /* additional branch type */
> priv:3, /* privilege level */
> - reserved:31;
> + events:8, /* occurrences of events since the last branch entry */
> + reserved:23;
> };
>
> union perf_sample_weight {
>
> This seems properly terrible from an interface pov. What if the next
> generation of silicon extends this to all 8 PMCs or another architecture
> comes along that does this with 3 bits per counter etc...

OK. The reserved space is not enough anymore. I think we have to add
several new fields. I will redesign it.

>
>> - Add a new PERF_SAMPLE format, PERF_SAMPLE_BRANCH_EVENT_IDS, to dump
>> the order information. User space tool doesn't understand the order of
>> counters. So it cannot map the new fields in struct perf_branch_entry to
>> a specific event. We have to dump the order information.
>
> Sorry; I can't parse this.

The perf tool has no idea which physical counter is assigned to an event.
The HW has no idea about an event. It only logs the information from
counter 0 into IA32_LBR_x_INFO[32:33].
If we pass the information from IA32_LBR_x_INFO[32:33] to the perf tool,
the perf tool lacks the knowledge to connect that information to an event.
So we have to dump the event IDs at the same time.

Thanks,
Kan

2023-04-14 16:16:56

by Peter Zijlstra

Subject: Re: [PATCH 2/6] perf: Support branch events logging

On Fri, Apr 14, 2023 at 11:56:41AM -0400, Liang, Kan wrote:
> > If it were to only support 4, then
> > we're in counter scheduling contraint hell again
>
> Unfortunately, yes.
>
> > and we need to somehow
> > group all these things together with the LBR event.
>
> Group will bring many limits for the usage. For example, I was told
> there could be someone wants to use it with multiplexing.

You can create two groups, each with an LBR event, no?

2023-04-14 17:59:04

by Liang, Kan

Subject: Re: [PATCH 2/6] perf: Support branch events logging



On 2023-04-14 12:09 p.m., Peter Zijlstra wrote:
> On Fri, Apr 14, 2023 at 11:56:41AM -0400, Liang, Kan wrote:
>>> If it were to only support 4, then
>>> we're in counter scheduling contraint hell again
>>
>> Unfortunately, yes.
>>
>>> and we need to somehow
>>> group all these things together with the LBR event.
>>
>> Group will bring many limits for the usage. For example, I was told
>> there could be someone wants to use it with multiplexing.
>
> You can create two groups, each with an LBR event, no?

If we put everything in a group, that will make the enabling much
simpler. I don't think the perf tool needs the order information
anymore, because the kernel enables the events one by one in a group.
The kernel just needs to convert the information from the counter order
to the enabling order and dump it to user space.

But if we have two groups with an LBR event, the order information is
still required. Why would we still want to group things?


Thanks,
Kan

2023-04-14 19:42:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf: Support branch events logging

On Fri, Apr 14, 2023 at 01:53:24PM -0400, Liang, Kan wrote:
>
>
> On 2023-04-14 12:09 p.m., Peter Zijlstra wrote:
> > On Fri, Apr 14, 2023 at 11:56:41AM -0400, Liang, Kan wrote:
> >>> If it were to only support 4, then
> >>> we're in counter scheduling contraint hell again
> >>
> >> Unfortunately, yes.
> >>
> >>> and we need to somehow
> >>> group all these things together with the LBR event.
> >>
> >> Group will bring many limits for the usage. For example, I was told
> >> there could be someone wants to use it with multiplexing.
> >
> > You can create two groups, each with an LBR event, no?
>
> If we put everything in a group, that will make the enabling much
> simpler. I don't think the perf tool needs the order information
> anymore. Because the kernel enables the events one by one in a group.
> The kernel just need to convert the information from the counter order
> to the enabling order and dump to user space.

I never understood the whole order thing. What was it trying to do?

> But if we have two groups with LBR event, the order information is still
> required. Why we still want to group things?

Why would you need that; what is that whole order nonsense about?

{e1, e2, e3, e4}, {e5, e6, e7, e8} with e1 and e5 both having LBR on
just works no?

Since they have LBR and that extra sample flag they all get a 0-3
constraint.

Since both e1 and e5 use LBR, they're mutually exclusive, either e1 or
e5 group runs.

2023-04-14 20:40:17

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf: Support branch events logging



On 2023-04-14 3:24 p.m., Peter Zijlstra wrote:
> On Fri, Apr 14, 2023 at 01:53:24PM -0400, Liang, Kan wrote:
>>
>>
>> On 2023-04-14 12:09 p.m., Peter Zijlstra wrote:
>>> On Fri, Apr 14, 2023 at 11:56:41AM -0400, Liang, Kan wrote:
>>>>> If it were to only support 4, then
>>>>> we're in counter scheduling contraint hell again
>>>>
>>>> Unfortunately, yes.
>>>>
>>>>> and we need to somehow
>>>>> group all these things together with the LBR event.
>>>>
>>>> Group will bring many limits for the usage. For example, I was told
>>>> there could be someone wants to use it with multiplexing.
>>>
>>> You can create two groups, each with an LBR event, no?
>>
>> If we put everything in a group, that will make the enabling much
>> simpler. I don't think the perf tool needs the order information
>> anymore. Because the kernel enables the events one by one in a group.
>> The kernel just need to convert the information from the counter order
>> to the enabling order and dump to user space.
>
> I never understood the whole order thing. What was it trying to do?

Let's say we have three events with the LBR event logging feature, as below:
perf record -e branches,branches,instructions:ppp -j any,event

Counter 0 will be assigned to instructions:ppp, since PDist is only
supported on GP counters 0 & 1.
Counters 1 & 2 will be assigned to the two branches events.

If each branches event occurs 1 time and instructions occurs 3 times
within an LBR block, the LBR_INFO will contain 0b010111 (counter order).

But as you can see from the perf command, the first event is actually
branches. Without the event ID information, the perf tool would interpret
it as branches = 3, branches = 1 and instructions:ppp = 1. That's wrong.

If there are multiple users, the situation becomes even worse.
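
For completeness, here is a decode of the 0b010111 example (a standalone
sketch; it assumes two bits per counter with counter 0 in the lowest
bits, saturating at 3, as described above):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t fields = 0x17;	/* 0b010111, the occurrence fields from LBR_INFO */

	/* two bits per counter, counter 0 in the lowest bits */
	for (int counter = 0; counter < 3; counter++)
		printf("counter %d: %llu occurrence(s)\n", counter,
		       (unsigned long long)((fields >> (2 * counter)) & 0x3));

	/*
	 * Prints 3, 1, 1 for counters 0, 1, 2, i.e. instructions:ppp,
	 * branches, branches in counter order, while the command line
	 * order is branches, branches, instructions:ppp.
	 */
	return 0;
}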

>
>> But if we have two groups with LBR event, the order information is still
>> required. Why we still want to group things?
>
> Why would you need that; what is that whole order nonsense about?
>
> {e1, e2, e3, e4}, {e5, e6, e7, e8} with e1 and e5 both having LBR on
> just works no?
>
> Since they have LBR and that extra sample flag they all get a 0-3
> constraint.
>
> Since both e1 and e5 use LBR, they're mutually exclusive, either e1 or
> e5 group runs.

It's possible that someone pins an event using LBR and sets more than 4
events for logging, e.g. e0:D,{e1, e2},{e3, e4},{e5, e6}. If so, those
events could be multiplexed. Without the event ID information, the perf
tool has no idea how to interpret the data.

Andi, do you have any other cases which require multiplexing support for
LBR event logging?


Thanks,
Kan

2023-04-14 22:07:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf: Support branch events logging

On Fri, Apr 14, 2023 at 04:34:45PM -0400, Liang, Kan wrote:

> > I never understood the whole order thing. What was it trying to do?
>
> Let's say we have three events with the LBR event logging feature as below.
> perf record -e branches,branches,instructions:ppp -j any,event
>
> The counter 0 will be assigned to instructions:ppp, since the PDist is
> only supported on GP 0 & 1.
> The count 1 & 2 will be assigned to the other two branches.
>
> If branches occurs 1 time and instructions occurs 3 times in a LBR
> block, the LBR_INFO will have 0b010111 (counter order).
>
> But as you can see from the perf command, the first event is actually
> branches. Without the event IDs information, perf tool will interpret
> that branches 3 branches 1 and instructions:ppp 1. That's wrong.
>
> If there are multiple users, the situation becomes even worse.

But this makes no sense whatsoever; in this case you have no control
over which events you'll actually get in your LBR. It could be that none
of the events you're interested in end up in 0-3; they could instead
land on 4-7.

> >> But if we have two groups with LBR event, the order information is still
> >> required. Why we still want to group things?
> >
> > Why would you need that; what is that whole order nonsense about?
> >
> > {e1, e2, e3, e4}, {e5, e6, e7, e8} with e1 and e5 both having LBR on
> > just works no?
> >
> > Since they have LBR and that extra sample flag they all get a 0-3
> > constraint.
> >
> > Since both e1 and e5 use LBR, they're mutually exclusive, either e1 or
> > e5 group runs.
>
> It's possible that someone pins an event using LBR, and set more than 4
> events for logging, e0:D,{e1, e2},{e3, e4},{e5, e6}. If so, those events
> could do multiplexing. Without the event IDs information, perf tool has
> no idea how to interpret the information.

Yeah, don't do this. There is no guarantee whatsoever you'll get any
of those events in the 0-3 range.

You really *must* make them a group such that perf knows what events to
associate with the LBR event and constrain them to the 0-3 range of
PMCs.

If you want multiplexing, simply create multiple groups with an LBR
event in them.

2023-04-14 22:52:49

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf: Support branch events logging


> Yeah, don't do this. There is no guarantee what so ever you'll get any
> of those events in the 0-3 range.


The kernel can simply force events to counters 0-3 if LBR is enabled and
the feature is too. It's in Kan's patch, and it isn't particularly
complicated.

>
> You really *must* make then a group such that perf knows what events to
> associated with the LBR event and constain them to the 0-3 range of
> PMCs.
>
> If you want multiplexing, simply create multiple groups with an LBR
> event in them.


Well if you force groups then you require user space or a user which
understands all the constraints

to create groups. I thought one of the basic ideas of perf was to be
able to abstract those things.

There are tools today (e.g. the perf builtin metrics[1] and I believe
others) that don't have enough

knowledge to set up real groups and rely on the kernel.  While they
could be fixed it's a lot of work

(I do it in pmu-tools, and it's quite hairy code)


-Andi


[1] They are not supported by perf record yet, but since perf script
already supports evaluating them, they could be at some point, and then
it would make sense to use them with this feature too.

2023-04-17 11:51:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf: Support branch events logging

On Fri, Apr 14, 2023 at 03:47:29PM -0700, Andi Kleen wrote:
>
> > Yeah, don't do this. There is no guarantee what so ever you'll get any
> > of those events in the 0-3 range.
>
>
> The kernel can simply force to 0-3 if LBR is enabled and the feature too.
> It's in Kan's patch
>
> and it isn't particularly complicated.

And what, totally leave 4-7 unused even if those counters were not
related to LBR at all? That seems exceedingly daft.

2023-04-17 12:07:15

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf: Support branch events logging

On Fri, Apr 14, 2023 at 03:47:29PM -0700, Andi Kleen wrote:

> > You really *must* make then a group such that perf knows what events to
> > associated with the LBR event and constain them to the 0-3 range of
> > PMCs.
> >
> > If you want multiplexing, simply create multiple groups with an LBR
> > event in them.
>
>
> Well if you force groups then you require user space or a user which
> understands all the constraints

The LBR feature is naturally a group read, very similar to
PERF_SAMPLE_READ+PERF_FORMAT_GROUP.

> to create groups. I thought one of the basic ideas of perf was to be able to
> abstract those things.

Yeah, the abstraction at hand is a group.

2023-04-17 13:38:30

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf: Support branch events logging


On 4/17/2023 4:46 AM, Peter Zijlstra wrote:
> On Fri, Apr 14, 2023 at 03:47:29PM -0700, Andi Kleen wrote:
>>> Yeah, don't do this. There is no guarantee what so ever you'll get any
>>> of those events in the 0-3 range.
>>
>> The kernel can simply force to 0-3 if LBR is enabled and the feature too.
>> It's in Kan's patch
>>
>> and it isn't particularly complicated.
> And what, totally leave 4-7 unused even if those counters were not
> related to LBR at all? That seems exceedingly daft.


Only for the events which enable LBR, and only if the branch events
feature is enabled too:

-j event -e '{event1:b,event2:b,event3:b,event4:b,event5,event6}'

event5 and event6 can go to counters > 3.

Given that there is currently no syntax to control branch events inside
a group other than fully enabling/disabling LBR, Kan, I guess that could
be added to the user tools.

-Andi

2023-04-17 13:59:37

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf: Support branch events logging


On 4/17/2023 4:55 AM, Peter Zijlstra wrote:
>
>> to create groups. I thought one of the basic ideas of perf was to be able to
>> abstract those things.
> Yeah, the abstraction at hand is a group.

Okay, then if we add perf record metrics and want to support this
feature, we'll need to replicate the kernel scheduler in the user tools.

-Andi


2023-04-17 14:09:09

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH 2/6] perf: Support branch events logging



On 2023-04-17 9:37 a.m., Andi Kleen wrote:
>
> On 4/17/2023 4:46 AM, Peter Zijlstra wrote:
>> On Fri, Apr 14, 2023 at 03:47:29PM -0700, Andi Kleen wrote:
>>>> Yeah, don't do this. There is no guarantee what so ever you'll get any
>>>> of those events in the 0-3 range.
>>>
>>> The kernel can simply force to 0-3 if LBR is enabled and the feature
>>> too.
>>> It's in Kan's patch
>>>
>>> and it isn't particularly complicated.
>> And what, totally leave 4-7 unused even if those counters were not
>> related to LBR at all? That seems exceedingly daft.
>
>
> Only for the events which enabled LBR and also only if the branch events
> feature is enabled
>
> -j event -e '{event1:b,event2:b,event3:b,event4:b,event5,event6}'
>
> event5 and 6 can go > 3
>
> Given there is currently no syntax to control branch events inside a
> group other than fully enabling/disabling LBR.
>
> Kan, I guess that could be added to the user tools.

We already have a per-event option for LBR, branch_type, which can be
used to control branch events in a group. With the patch in this series,
we can do, e.g.,

-j call -e
'{cpu/event=0x1,branch_type=event/,cpu/event=0x2,branch_type=event/,cpu/event=0x3,branch_type=event/,cpu/event=0x4,branch_type=event/,cpu/event=0x5/,cpu/event=0x6/}'
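
And for the multiplexing case Peter suggested, the same per-event option
should also work with two separate groups, each carrying its own LBR
logging events (untested sketch, reusing the syntax above):

-j call -e
'{cpu/event=0x1,branch_type=event/,cpu/event=0x2,branch_type=event/},{cpu/event=0x3,branch_type=event/,cpu/event=0x4,branch_type=event/}'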


Thanks,
Kan