2022-09-01 13:18:19

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V2 0/6] Add sample_flags to improve the perf_sample_data struct

From: Kan Liang <[email protected]>

Changes since V1:
- Update the AMD LBR code in patch 3

The patch series is to fix PEBS timestamps overwritten and improve the
perf_sample_data struct. The detailed discussion can be found at
https://lore.kernel.org/lkml/YwXvGe4%[email protected]/

The patch series has two changes compared with the suggestions in the
above discussion.
- Only clear the sample flags for the perf_prepare_sample().
The __perf_event_header__init_id is shared between perf_prepare_sample()
(used by PERF_RECORD_SAMPLE) and perf_event_header__init_id() (used by
other PERF_RECORD_* event type). The sample data is only available
for the PERF_RECORD_SAMPLE.
- The CALLCHAIN_EARLY hack is still required for the BPF, especially
perf_event_set_bpf_handler(). The sample data is not available when
the function is invoked.

Kan Liang (6):
perf: Add sample_flags to indicate the PMU-filled sample data
perf/x86/intel/pebs: Fix PEBS timestamps overwritten
perf: Use sample_flags for branch stack
perf: Use sample_flags for weight
perf: Use sample_flags for data_src
perf: Use sample_flags for txn

arch/powerpc/perf/core-book3s.c | 10 ++++++---
arch/x86/events/amd/core.c | 4 +++-
arch/x86/events/core.c | 4 +++-
arch/x86/events/intel/core.c | 4 +++-
arch/x86/events/intel/ds.c | 39 ++++++++++++++++++++++++---------
include/linux/perf_event.h | 15 ++++++-------
kernel/events/core.c | 30 ++++++++++++++++++-------
7 files changed, 74 insertions(+), 32 deletions(-)

--
2.35.1


2022-09-01 13:18:22

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V2 1/6] perf: Add sample_flags to indicate the PMU-filled sample data

From: Kan Liang <[email protected]>

On some platforms, some data e.g., timestamps, can be retrieved from
the PMU driver. Usually, the data from the PMU driver is more accurate.
The current perf kernel should output the PMU-filled sample data if
it's available.

To check the availability of the PMU-filled sample data, the current
perf kernel initializes the related fields in the
perf_sample_data_init(). When outputting a sample, the perf checks
whether the field is updated by the PMU driver. If yes, the updated
value will be output. If not, the perf uses an SW way to calculate the
value or just outputs the initialized value if an SW way is unavailable
either.

With more and more data being provided by the PMU driver, more fields
has to be initialized in the perf_sample_data_init(). That will
increase the number of cache lines touched in perf_sample_data_init()
and be harmful to the performance.

Add new "sample_flags" to indicate the PMU-filled sample data. The PMU
driver should set the corresponding PERF_SAMPLE_ flag when the field is
updated. The initialization of the corresponding field is not required
anymore. The following patches will make use of it and remove the
corresponding fields from the perf_sample_data_init(), which will
further minimize the number of cache lines touched.

Only clear the sample flags that have already been done by the PMU
driver in the perf_prepare_sample() for the PERF_RECORD_SAMPLE. For the
other PERF_RECORD_ event type, the sample data is not available.

Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
include/linux/perf_event.h | 2 ++
kernel/events/core.c | 17 +++++++++++------
2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1999408a9cbb..0978165a2d87 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1008,6 +1008,7 @@ struct perf_sample_data {
* Fields set by perf_sample_data_init(), group so as to
* minimize the cachelines touched.
*/
+ u64 sample_flags;
u64 addr;
struct perf_raw_record *raw;
struct perf_branch_stack *br_stack;
@@ -1057,6 +1058,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
u64 addr, u64 period)
{
/* remaining struct members initialized in perf_prepare_sample() */
+ data->sample_flags = 0;
data->addr = addr;
data->raw = NULL;
data->br_stack = NULL;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2621fd24ad26..c9b9cb79231a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6794,11 +6794,10 @@ static void perf_aux_sample_output(struct perf_event *event,

static void __perf_event_header__init_id(struct perf_event_header *header,
struct perf_sample_data *data,
- struct perf_event *event)
+ struct perf_event *event,
+ u64 sample_type)
{
- u64 sample_type = event->attr.sample_type;
-
- data->type = sample_type;
+ data->type = event->attr.sample_type;
header->size += event->id_header_size;

if (sample_type & PERF_SAMPLE_TID) {
@@ -6827,7 +6826,7 @@ void perf_event_header__init_id(struct perf_event_header *header,
struct perf_event *event)
{
if (event->attr.sample_id_all)
- __perf_event_header__init_id(header, data, event);
+ __perf_event_header__init_id(header, data, event, event->attr.sample_type);
}

static void __perf_event__output_id_sample(struct perf_output_handle *handle,
@@ -7303,6 +7302,7 @@ void perf_prepare_sample(struct perf_event_header *header,
struct pt_regs *regs)
{
u64 sample_type = event->attr.sample_type;
+ u64 filtered_sample_type;

header->type = PERF_RECORD_SAMPLE;
header->size = sizeof(*header) + event->header_size;
@@ -7310,7 +7310,12 @@ void perf_prepare_sample(struct perf_event_header *header,
header->misc = 0;
header->misc |= perf_misc_flags(regs);

- __perf_event_header__init_id(header, data, event);
+ /*
+ * Clear the sample flags that have already been done by the
+ * PMU driver.
+ */
+ filtered_sample_type = sample_type & ~data->sample_flags;
+ __perf_event_header__init_id(header, data, event, filtered_sample_type);

if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
data->ip = perf_instruction_pointer(regs);
--
2.35.1

2022-09-01 13:18:39

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V2 5/6] perf: Use sample_flags for data_src

From: Kan Liang <[email protected]>

Use the new sample_flags to indicate whether the data_src field is
filled by the PMU driver.

Remove the data_src field from the perf_sample_data_init() to minimize
the number of cache lines touched.

Signed-off-by: Kan Liang <[email protected]>
---
arch/powerpc/perf/core-book3s.c | 4 +++-
arch/x86/events/intel/ds.c | 8 ++++++--
include/linux/perf_event.h | 3 +--
kernel/events/core.c | 3 +++
4 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a5c95a2006ea..6ec7069e6482 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2301,8 +2301,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
}

if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
- ppmu->get_mem_data_src)
+ ppmu->get_mem_data_src) {
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
+ data.sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }

if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
ppmu->get_mem_weight) {
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index f66a4905cc87..985bbbab057f 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1543,8 +1543,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
/*
* data.data_src encodes the data source
*/
- if (sample_type & PERF_SAMPLE_DATA_SRC)
+ if (sample_type & PERF_SAMPLE_DATA_SRC) {
data->data_src.val = get_data_src(event, pebs->dse);
+ data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }

/*
* We must however always use iregs for the unwinder to stay sane; the
@@ -1778,8 +1780,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
}

- if (sample_type & PERF_SAMPLE_DATA_SRC)
+ if (sample_type & PERF_SAMPLE_DATA_SRC) {
data->data_src.val = get_data_src(event, meminfo->aux);
+ data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }

if (sample_type & PERF_SAMPLE_ADDR_TYPE)
data->addr = meminfo->address;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 06a587b5faa9..6849f10dfc7e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1013,7 +1013,6 @@ struct perf_sample_data {
struct perf_raw_record *raw;
u64 period;
u64 txn;
- union perf_mem_data_src data_src;

/*
* The other fields, optionally {set,used} by
@@ -1021,6 +1020,7 @@ struct perf_sample_data {
*/
struct perf_branch_stack *br_stack;
union perf_sample_weight weight;
+ union perf_mem_data_src data_src;

u64 type;
u64 ip;
@@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->addr = addr;
data->raw = NULL;
data->period = period;
- data->data_src.val = PERF_MEM_NA;
data->txn = 0;
}

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0af45db02b3..163e2f478e61 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7411,6 +7411,9 @@ void perf_prepare_sample(struct perf_event_header *header,
if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
data->weight.full = 0;

+ if (filtered_sample_type & PERF_SAMPLE_DATA_SRC)
+ data->data_src.val = PERF_MEM_NA;
+
if (sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);
--
2.35.1

2022-09-01 13:38:27

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V2 4/6] perf: Use sample_flags for weight

From: Kan Liang <[email protected]>

Use the new sample_flags to indicate whether the weight field is filled
by the PMU driver.

Remove the weight field from the perf_sample_data_init() to minimize the
number of cache lines touched.

Signed-off-by: Kan Liang <[email protected]>
---
arch/powerpc/perf/core-book3s.c | 5 +++--
arch/x86/events/intel/ds.c | 10 +++++++---
include/linux/perf_event.h | 3 +--
kernel/events/core.c | 3 +++
4 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1ad1efdb33f9..a5c95a2006ea 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2305,9 +2305,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);

if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
- ppmu->get_mem_weight)
+ ppmu->get_mem_weight) {
ppmu->get_mem_weight(&data.weight.full, event->attr.sample_type);
-
+ data.sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
} else if (period) {
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 5dcfd2de6ef8..f66a4905cc87 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1535,8 +1535,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
/*
* Use latency for weight (only avail with PEBS-LL)
*/
- if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE))
+ if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE)) {
data->weight.full = pebs->lat;
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }

/*
* data.data_src encodes the data source
@@ -1628,9 +1630,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,

if (x86_pmu.intel_cap.pebs_format >= 2) {
/* Only set the TSX weight when no memory weight. */
- if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll)
+ if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll) {
data->weight.full = intel_get_tsx_weight(pebs->tsx_tuning);
-
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }
if (sample_type & PERF_SAMPLE_TRANSACTION)
data->txn = intel_get_tsx_transaction(pebs->tsx_tuning,
pebs->ax);
@@ -1772,6 +1775,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
data->weight.var1_dw = (u32)(weight & PEBS_LATENCY_MASK) ?:
intel_get_tsx_weight(meminfo->tsx_tuning);
}
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
}

if (sample_type & PERF_SAMPLE_DATA_SRC)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1e12e79454e0..06a587b5faa9 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1012,7 +1012,6 @@ struct perf_sample_data {
u64 addr;
struct perf_raw_record *raw;
u64 period;
- union perf_sample_weight weight;
u64 txn;
union perf_mem_data_src data_src;

@@ -1021,6 +1020,7 @@ struct perf_sample_data {
* perf_{prepare,output}_sample().
*/
struct perf_branch_stack *br_stack;
+ union perf_sample_weight weight;

u64 type;
u64 ip;
@@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->addr = addr;
data->raw = NULL;
data->period = period;
- data->weight.full = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 104c0c9f4e6f..f0af45db02b3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7408,6 +7408,9 @@ void perf_prepare_sample(struct perf_event_header *header,
header->size += size;
}

+ if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
+ data->weight.full = 0;
+
if (sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);
--
2.35.1

2022-09-01 14:11:56

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V2 2/6] perf/x86/intel/pebs: Fix PEBS timestamps overwritten

From: Kan Liang <[email protected]>

The PEBS TSC-based timestamps do not appear correctly in the final
perf.data output file from perf record.

The data->time field setup by PEBS in the setup_pebs_fixed_sample_data()
is later overwritten by perf_events generic code in
perf_prepare_sample(). There is an ordering problem.

Set the sample flags when the data->time is updated by PEBS.
The data->time field will not be overwritten anymore.

Reported-by: Andreas Kogler <[email protected]>
Reported-by: Stephane Eranian <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/ds.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index de1f55d51784..01cbe26225c2 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1643,8 +1643,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
* We can only do this for the default trace clock.
*/
if (x86_pmu.intel_cap.pebs_format >= 3 &&
- event->attr.use_clockid == 0)
+ event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(pebs->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }

if (has_branch_stack(event))
data->br_stack = &cpuc->lbr_stack;
@@ -1705,8 +1707,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
perf_sample_data_init(data, 0, event->hw.last_period);
data->period = event->hw.last_period;

- if (event->attr.use_clockid == 0)
+ if (event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(basic->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }

/*
* We must however always use iregs for the unwinder to stay sane; the
--
2.35.1

2022-09-01 14:14:48

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V2 3/6] perf: Use sample_flags for branch stack

From: Kan Liang <[email protected]>

Use the new sample_flags to indicate whether the branch stack is filled
by the PMU driver.

Remove the br_stack from the perf_sample_data_init() to minimize the number
of cache lines touched.

Signed-off-by: Kan Liang <[email protected]>
---
arch/powerpc/perf/core-book3s.c | 1 +
arch/x86/events/amd/core.c | 4 +++-
arch/x86/events/core.c | 4 +++-
arch/x86/events/intel/core.c | 4 +++-
arch/x86/events/intel/ds.c | 5 ++++-
include/linux/perf_event.h | 4 ++--
kernel/events/core.c | 4 ++--
7 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 13919eb96931..1ad1efdb33f9 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2297,6 +2297,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
cpuhw = this_cpu_ptr(&cpu_hw_events);
power_pmu_bhrb_read(event, cpuhw);
data.br_stack = &cpuhw->bhrb_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}

if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 9fbdfbcaf25a..8b70237c33f7 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -929,8 +929,10 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
if (!x86_perf_event_set_period(event))
continue;

- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data.br_stack = &cpuc->lbr_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }

if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c3d6d139a3c1..b30b8bbcd1e2 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1701,8 +1701,10 @@ int x86_pmu_handle_irq(struct pt_regs *regs)

perf_sample_data_init(&data, 0, event->hw.last_period);

- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data.br_stack = &cpuc->lbr_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }

if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7c1e3d36bc65..b5c02627a155 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3017,8 +3017,10 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)

perf_sample_data_init(&data, 0, event->hw.last_period);

- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data.br_stack = &cpuc->lbr_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }

if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 01cbe26225c2..5dcfd2de6ef8 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1648,8 +1648,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
data->sample_flags |= PERF_SAMPLE_TIME;
}

- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data->br_stack = &cpuc->lbr_stack;
+ data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }
}

static void adaptive_pebs_save_regs(struct pt_regs *regs,
@@ -1799,6 +1801,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (has_branch_stack(event)) {
intel_pmu_store_pebs_lbrs(lbr);
data->br_stack = &cpuc->lbr_stack;
+ data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}
}

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0978165a2d87..1e12e79454e0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1011,7 +1011,6 @@ struct perf_sample_data {
u64 sample_flags;
u64 addr;
struct perf_raw_record *raw;
- struct perf_branch_stack *br_stack;
u64 period;
union perf_sample_weight weight;
u64 txn;
@@ -1021,6 +1020,8 @@ struct perf_sample_data {
* The other fields, optionally {set,used} by
* perf_{prepare,output}_sample().
*/
+ struct perf_branch_stack *br_stack;
+
u64 type;
u64 ip;
struct {
@@ -1061,7 +1062,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->sample_flags = 0;
data->addr = addr;
data->raw = NULL;
- data->br_stack = NULL;
data->period = period;
data->weight.full = 0;
data->data_src.val = PERF_MEM_NA;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c9b9cb79231a..104c0c9f4e6f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7052,7 +7052,7 @@ void perf_output_sample(struct perf_output_handle *handle,
}

if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
- if (data->br_stack) {
+ if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
size_t size;

size = data->br_stack->nr
@@ -7358,7 +7358,7 @@ void perf_prepare_sample(struct perf_event_header *header,

if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
int size = sizeof(u64); /* nr */
- if (data->br_stack) {
+ if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
if (perf_sample_save_hw_index(event))
size += sizeof(u64);

--
2.35.1

2022-09-01 14:15:20

by Liang, Kan

[permalink] [raw]
Subject: [PATCH V2 6/6] perf: Use sample_flags for txn

From: Kan Liang <[email protected]>

Use the new sample_flags to indicate whether the txn field is filled by
the PMU driver.

Remove the txn field from the perf_sample_data_init() to minimize the
number of cache lines touched.

Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/ds.c | 8 ++++++--
include/linux/perf_event.h | 3 +--
kernel/events/core.c | 3 +++
3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 985bbbab057f..cbcb5c2e832c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1636,9 +1636,11 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
data->weight.full = intel_get_tsx_weight(pebs->tsx_tuning);
data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
}
- if (sample_type & PERF_SAMPLE_TRANSACTION)
+ if (sample_type & PERF_SAMPLE_TRANSACTION) {
data->txn = intel_get_tsx_transaction(pebs->tsx_tuning,
pebs->ax);
+ data->sample_flags |= PERF_SAMPLE_TRANSACTION;
+ }
}

/*
@@ -1788,9 +1790,11 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (sample_type & PERF_SAMPLE_ADDR_TYPE)
data->addr = meminfo->address;

- if (sample_type & PERF_SAMPLE_TRANSACTION)
+ if (sample_type & PERF_SAMPLE_TRANSACTION) {
data->txn = intel_get_tsx_transaction(meminfo->tsx_tuning,
gprs ? gprs->ax : 0);
+ data->sample_flags |= PERF_SAMPLE_TRANSACTION;
+ }
}

if (format_size & PEBS_DATACFG_XMMS) {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6849f10dfc7e..581880ddb9ef 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1012,7 +1012,6 @@ struct perf_sample_data {
u64 addr;
struct perf_raw_record *raw;
u64 period;
- u64 txn;

/*
* The other fields, optionally {set,used} by
@@ -1021,6 +1020,7 @@ struct perf_sample_data {
struct perf_branch_stack *br_stack;
union perf_sample_weight weight;
union perf_mem_data_src data_src;
+ u64 txn;

u64 type;
u64 ip;
@@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->addr = addr;
data->raw = NULL;
data->period = period;
- data->txn = 0;
}

/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 163e2f478e61..15d27b14c827 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7414,6 +7414,9 @@ void perf_prepare_sample(struct perf_event_header *header,
if (filtered_sample_type & PERF_SAMPLE_DATA_SRC)
data->data_src.val = PERF_MEM_NA;

+ if (filtered_sample_type & PERF_SAMPLE_TRANSACTION)
+ data->txn = 0;
+
if (sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);
--
2.35.1

2022-09-02 05:41:01

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH V2 4/6] perf: Use sample_flags for weight

On Thu, Sep 1, 2022 at 6:10 AM <[email protected]> wrote:
>
> From: Kan Liang <[email protected]>
>
> Use the new sample_flags to indicate whether the weight field is filled
> by the PMU driver.
>
> Remove the weight field from the perf_sample_data_init() to minimize the
> number of cache lines touched.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> arch/powerpc/perf/core-book3s.c | 5 +++--
> arch/x86/events/intel/ds.c | 10 +++++++---
> include/linux/perf_event.h | 3 +--
> kernel/events/core.c | 3 +++
> 4 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 1ad1efdb33f9..a5c95a2006ea 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -2305,9 +2305,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
> ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
>
> if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
> - ppmu->get_mem_weight)
> + ppmu->get_mem_weight) {
> ppmu->get_mem_weight(&data.weight.full, event->attr.sample_type);
> -
> + data.sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
> + }
> if (perf_event_overflow(event, &data, regs))
> power_pmu_stop(event, 0);
> } else if (period) {
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 5dcfd2de6ef8..f66a4905cc87 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1535,8 +1535,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
> /*
> * Use latency for weight (only avail with PEBS-LL)
> */
> - if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE))
> + if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE)) {
> data->weight.full = pebs->lat;
> + data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
> + }
>
> /*
> * data.data_src encodes the data source
> @@ -1628,9 +1630,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
>
> if (x86_pmu.intel_cap.pebs_format >= 2) {
> /* Only set the TSX weight when no memory weight. */
> - if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll)
> + if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll) {
> data->weight.full = intel_get_tsx_weight(pebs->tsx_tuning);
> -
> + data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
> + }
> if (sample_type & PERF_SAMPLE_TRANSACTION)
> data->txn = intel_get_tsx_transaction(pebs->tsx_tuning,
> pebs->ax);
> @@ -1772,6 +1775,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
> data->weight.var1_dw = (u32)(weight & PEBS_LATENCY_MASK) ?:
> intel_get_tsx_weight(meminfo->tsx_tuning);
> }
> + data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;

I was thinking about splitting PERF_SAMPLE_WEIGHT and
PERF_SAMPLE_WEIGHT_STRUCT but it'd just add complexity
unnecessarily?

Thanks,
Namhyung

> }
>
> if (sample_type & PERF_SAMPLE_DATA_SRC)
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 1e12e79454e0..06a587b5faa9 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1012,7 +1012,6 @@ struct perf_sample_data {
> u64 addr;
> struct perf_raw_record *raw;
> u64 period;
> - union perf_sample_weight weight;
> u64 txn;
> union perf_mem_data_src data_src;
>
> @@ -1021,6 +1020,7 @@ struct perf_sample_data {
> * perf_{prepare,output}_sample().
> */
> struct perf_branch_stack *br_stack;
> + union perf_sample_weight weight;
>
> u64 type;
> u64 ip;
> @@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
> data->addr = addr;
> data->raw = NULL;
> data->period = period;
> - data->weight.full = 0;
> data->data_src.val = PERF_MEM_NA;
> data->txn = 0;
> }
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 104c0c9f4e6f..f0af45db02b3 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7408,6 +7408,9 @@ void perf_prepare_sample(struct perf_event_header *header,
> header->size += size;
> }
>
> + if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
> + data->weight.full = 0;
> +
> if (sample_type & PERF_SAMPLE_REGS_INTR) {
> /* regs dump ABI info */
> int size = sizeof(u64);
> --
> 2.35.1
>

2022-09-02 05:42:22

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH V2 0/6] Add sample_flags to improve the perf_sample_data struct

On Thu, Sep 1, 2022 at 6:10 AM <[email protected]> wrote:
>
> From: Kan Liang <[email protected]>
>
> Changes since V1:
> - Update the AMD LBR code in patch 3
>
> The patch series is to fix PEBS timestamps overwritten and improve the
> perf_sample_data struct. The detailed discussion can be found at
> https://lore.kernel.org/lkml/YwXvGe4%[email protected]/
>
> The patch series has two changes compared with the suggestions in the
> above discussion.
> - Only clear the sample flags for the perf_prepare_sample().
> The __perf_event_header__init_id is shared between perf_prepare_sample()
> (used by PERF_RECORD_SAMPLE) and perf_event_header__init_id() (used by
> other PERF_RECORD_* event type). The sample data is only available
> for the PERF_RECORD_SAMPLE.
> - The CALLCHAIN_EARLY hack is still required for the BPF, especially
> perf_event_set_bpf_handler(). The sample data is not available when
> the function is invoked.
>
> Kan Liang (6):
> perf: Add sample_flags to indicate the PMU-filled sample data
> perf/x86/intel/pebs: Fix PEBS timestamps overwritten
> perf: Use sample_flags for branch stack
> perf: Use sample_flags for weight
> perf: Use sample_flags for data_src
> perf: Use sample_flags for txn

Maybe we can get rid of perf_sample_data_init() completely.
But it could be done later, so

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung


>
> arch/powerpc/perf/core-book3s.c | 10 ++++++---
> arch/x86/events/amd/core.c | 4 +++-
> arch/x86/events/core.c | 4 +++-
> arch/x86/events/intel/core.c | 4 +++-
> arch/x86/events/intel/ds.c | 39 ++++++++++++++++++++++++---------
> include/linux/perf_event.h | 15 ++++++-------
> kernel/events/core.c | 30 ++++++++++++++++++-------
> 7 files changed, 74 insertions(+), 32 deletions(-)
>
> --
> 2.35.1
>

2022-09-05 04:07:16

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH V2 0/6] Add sample_flags to improve the perf_sample_data struct

On 02-Sep-22 11:02 AM, Namhyung Kim wrote:
> On Thu, Sep 1, 2022 at 6:10 AM <[email protected]> wrote:
>>
>> From: Kan Liang <[email protected]>
>>
>> Changes since V1:
>> - Update the AMD LBR code in patch 3

Thanks.

I'll also need to rebase my perf mem/c2c series:
https://lore.kernel.org/r/20220616113638.900-1-ravi.bangoria%40amd.com

Subject: [tip: perf/core] perf: Use sample_flags for weight

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 2abe681da0a192ab850a5271d838a7817b469fca
Gitweb: https://git.kernel.org/tip/2abe681da0a192ab850a5271d838a7817b469fca
Author: Kan Liang <[email protected]>
AuthorDate: Thu, 01 Sep 2022 06:09:57 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 06 Sep 2022 11:33:02 +02:00

perf: Use sample_flags for weight

Use the new sample_flags to indicate whether the weight field is filled
by the PMU driver.

Remove the weight field from the perf_sample_data_init() to minimize the
number of cache lines touched.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/powerpc/perf/core-book3s.c | 5 +++--
arch/x86/events/intel/ds.c | 10 +++++++---
include/linux/perf_event.h | 3 +--
kernel/events/core.c | 3 +++
4 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 1ad1efd..a5c95a2 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2305,9 +2305,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);

if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
- ppmu->get_mem_weight)
+ ppmu->get_mem_weight) {
ppmu->get_mem_weight(&data.weight.full, event->attr.sample_type);
-
+ data.sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
} else if (period) {
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 0489f75..4c51118 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1527,8 +1527,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
/*
* Use latency for weight (only avail with PEBS-LL)
*/
- if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE))
+ if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE)) {
data->weight.full = pebs->lat;
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }

/*
* data.data_src encodes the data source
@@ -1620,9 +1622,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,

if (x86_pmu.intel_cap.pebs_format >= 2) {
/* Only set the TSX weight when no memory weight. */
- if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll)
+ if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll) {
data->weight.full = intel_get_tsx_weight(pebs->tsx_tuning);
-
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
+ }
if (sample_type & PERF_SAMPLE_TRANSACTION)
data->txn = intel_get_tsx_transaction(pebs->tsx_tuning,
pebs->ax);
@@ -1764,6 +1767,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
data->weight.var1_dw = (u32)(weight & PEBS_LATENCY_MASK) ?:
intel_get_tsx_weight(meminfo->tsx_tuning);
}
+ data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
}

if (sample_type & PERF_SAMPLE_DATA_SRC)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1e12e79..06a587b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1012,7 +1012,6 @@ struct perf_sample_data {
u64 addr;
struct perf_raw_record *raw;
u64 period;
- union perf_sample_weight weight;
u64 txn;
union perf_mem_data_src data_src;

@@ -1021,6 +1020,7 @@ struct perf_sample_data {
* perf_{prepare,output}_sample().
*/
struct perf_branch_stack *br_stack;
+ union perf_sample_weight weight;

u64 type;
u64 ip;
@@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->addr = addr;
data->raw = NULL;
data->period = period;
- data->weight.full = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 104c0c9..f0af45d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7408,6 +7408,9 @@ void perf_prepare_sample(struct perf_event_header *header,
header->size += size;
}

+ if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
+ data->weight.full = 0;
+
if (sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);

Subject: [tip: perf/core] perf: Use sample_flags for txn

The following commit has been merged into the perf/core branch of tip:

Commit-ID: ee9db0e14b0575aa827579dc2471a29ec5fc6877
Gitweb: https://git.kernel.org/tip/ee9db0e14b0575aa827579dc2471a29ec5fc6877
Author: Kan Liang <[email protected]>
AuthorDate: Thu, 01 Sep 2022 06:09:59 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 06 Sep 2022 11:33:03 +02:00

perf: Use sample_flags for txn

Use the new sample_flags to indicate whether the txn field is filled by
the PMU driver.

Remove the txn field from the perf_sample_data_init() to minimize the
number of cache lines touched.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/intel/ds.c | 8 ++++++--
include/linux/perf_event.h | 3 +--
kernel/events/core.c | 3 +++
3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index bde73d4..a5275c2 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1628,9 +1628,11 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
data->weight.full = intel_get_tsx_weight(pebs->tsx_tuning);
data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
}
- if (sample_type & PERF_SAMPLE_TRANSACTION)
+ if (sample_type & PERF_SAMPLE_TRANSACTION) {
data->txn = intel_get_tsx_transaction(pebs->tsx_tuning,
pebs->ax);
+ data->sample_flags |= PERF_SAMPLE_TRANSACTION;
+ }
}

/*
@@ -1780,9 +1782,11 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (sample_type & PERF_SAMPLE_ADDR_TYPE)
data->addr = meminfo->address;

- if (sample_type & PERF_SAMPLE_TRANSACTION)
+ if (sample_type & PERF_SAMPLE_TRANSACTION) {
data->txn = intel_get_tsx_transaction(meminfo->tsx_tuning,
gprs ? gprs->ax : 0);
+ data->sample_flags |= PERF_SAMPLE_TRANSACTION;
+ }
}

if (format_size & PEBS_DATACFG_XMMS) {
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6849f10..581880d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1012,7 +1012,6 @@ struct perf_sample_data {
u64 addr;
struct perf_raw_record *raw;
u64 period;
- u64 txn;

/*
* The other fields, optionally {set,used} by
@@ -1021,6 +1020,7 @@ struct perf_sample_data {
struct perf_branch_stack *br_stack;
union perf_sample_weight weight;
union perf_mem_data_src data_src;
+ u64 txn;

u64 type;
u64 ip;
@@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->addr = addr;
data->raw = NULL;
data->period = period;
- data->txn = 0;
}

/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 163e2f4..15d27b1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7414,6 +7414,9 @@ void perf_prepare_sample(struct perf_event_header *header,
if (filtered_sample_type & PERF_SAMPLE_DATA_SRC)
data->data_src.val = PERF_MEM_NA;

+ if (filtered_sample_type & PERF_SAMPLE_TRANSACTION)
+ data->txn = 0;
+
if (sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);

Subject: [tip: perf/core] perf/x86/intel/pebs: Fix PEBS timestamps overwritten

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 47a3aeb39e8dc099ae431cd8b46bdf218f5511b2
Gitweb: https://git.kernel.org/tip/47a3aeb39e8dc099ae431cd8b46bdf218f5511b2
Author: Kan Liang <[email protected]>
AuthorDate: Thu, 01 Sep 2022 06:09:55 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 06 Sep 2022 11:33:01 +02:00

perf/x86/intel/pebs: Fix PEBS timestamps overwritten

The PEBS TSC-based timestamps do not appear correctly in the final
perf.data output file from perf record.

The data->time field setup by PEBS in the setup_pebs_fixed_sample_data()
is later overwritten by perf_events generic code in
perf_prepare_sample(). There is an ordering problem.

Set the sample flags when the data->time is updated by PEBS.
The data->time field will not be overwritten anymore.

Reported-by: Andreas Kogler <[email protected]>
Reported-by: Stephane Eranian <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/events/intel/ds.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index ba60427..cdd857b 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1635,8 +1635,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
* We can only do this for the default trace clock.
*/
if (x86_pmu.intel_cap.pebs_format >= 3 &&
- event->attr.use_clockid == 0)
+ event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(pebs->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }

if (has_branch_stack(event))
data->br_stack = &cpuc->lbr_stack;
@@ -1697,8 +1699,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
perf_sample_data_init(data, 0, event->hw.last_period);
data->period = event->hw.last_period;

- if (event->attr.use_clockid == 0)
+ if (event->attr.use_clockid == 0) {
data->time = native_sched_clock_from_tsc(basic->tsc);
+ data->sample_flags |= PERF_SAMPLE_TIME;
+ }

/*
* We must however always use iregs for the unwinder to stay sane; the

Subject: [tip: perf/core] perf: Add sample_flags to indicate the PMU-filled sample data

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 3aac580d5cc3001ca1627725b3b61edb529f341d
Gitweb: https://git.kernel.org/tip/3aac580d5cc3001ca1627725b3b61edb529f341d
Author: Kan Liang <[email protected]>
AuthorDate: Thu, 01 Sep 2022 06:09:54 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 06 Sep 2022 11:33:01 +02:00

perf: Add sample_flags to indicate the PMU-filled sample data

On some platforms, some data e.g., timestamps, can be retrieved from
the PMU driver. Usually, the data from the PMU driver is more accurate.
The current perf kernel should output the PMU-filled sample data if
it's available.

To check the availability of the PMU-filled sample data, the current
perf kernel initializes the related fields in the
perf_sample_data_init(). When outputting a sample, the perf checks
whether the field is updated by the PMU driver. If yes, the updated
value will be output. If not, the perf uses an SW way to calculate the
value or just outputs the initialized value if an SW way is unavailable
either.

With more and more data being provided by the PMU driver, more fields
has to be initialized in the perf_sample_data_init(). That will
increase the number of cache lines touched in perf_sample_data_init()
and be harmful to the performance.

Add new "sample_flags" to indicate the PMU-filled sample data. The PMU
driver should set the corresponding PERF_SAMPLE_ flag when the field is
updated. The initialization of the corresponding field is not required
anymore. The following patches will make use of it and remove the
corresponding fields from the perf_sample_data_init(), which will
further minimize the number of cache lines touched.

Only clear the sample flags that have already been done by the PMU
driver in the perf_prepare_sample() for the PERF_RECORD_SAMPLE. For the
other PERF_RECORD_ event type, the sample data is not available.

Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/perf_event.h | 2 ++
kernel/events/core.c | 17 +++++++++++------
2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1999408..0978165 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1008,6 +1008,7 @@ struct perf_sample_data {
* Fields set by perf_sample_data_init(), group so as to
* minimize the cachelines touched.
*/
+ u64 sample_flags;
u64 addr;
struct perf_raw_record *raw;
struct perf_branch_stack *br_stack;
@@ -1057,6 +1058,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
u64 addr, u64 period)
{
/* remaining struct members initialized in perf_prepare_sample() */
+ data->sample_flags = 0;
data->addr = addr;
data->raw = NULL;
data->br_stack = NULL;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2621fd2..c9b9cb7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6794,11 +6794,10 @@ out_put:

static void __perf_event_header__init_id(struct perf_event_header *header,
struct perf_sample_data *data,
- struct perf_event *event)
+ struct perf_event *event,
+ u64 sample_type)
{
- u64 sample_type = event->attr.sample_type;
-
- data->type = sample_type;
+ data->type = event->attr.sample_type;
header->size += event->id_header_size;

if (sample_type & PERF_SAMPLE_TID) {
@@ -6827,7 +6826,7 @@ void perf_event_header__init_id(struct perf_event_header *header,
struct perf_event *event)
{
if (event->attr.sample_id_all)
- __perf_event_header__init_id(header, data, event);
+ __perf_event_header__init_id(header, data, event, event->attr.sample_type);
}

static void __perf_event__output_id_sample(struct perf_output_handle *handle,
@@ -7303,6 +7302,7 @@ void perf_prepare_sample(struct perf_event_header *header,
struct pt_regs *regs)
{
u64 sample_type = event->attr.sample_type;
+ u64 filtered_sample_type;

header->type = PERF_RECORD_SAMPLE;
header->size = sizeof(*header) + event->header_size;
@@ -7310,7 +7310,12 @@ void perf_prepare_sample(struct perf_event_header *header,
header->misc = 0;
header->misc |= perf_misc_flags(regs);

- __perf_event_header__init_id(header, data, event);
+ /*
+ * Clear the sample flags that have already been done by the
+ * PMU driver.
+ */
+ filtered_sample_type = sample_type & ~data->sample_flags;
+ __perf_event_header__init_id(header, data, event, filtered_sample_type);

if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
data->ip = perf_instruction_pointer(regs);

Subject: [tip: perf/core] perf: Use sample_flags for data_src

The following commit has been merged into the perf/core branch of tip:

Commit-ID: e16fd7f2cb1a65555cfe76f983eaefb1eab7471f
Gitweb: https://git.kernel.org/tip/e16fd7f2cb1a65555cfe76f983eaefb1eab7471f
Author: Kan Liang <[email protected]>
AuthorDate: Thu, 01 Sep 2022 06:09:58 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 06 Sep 2022 11:33:03 +02:00

perf: Use sample_flags for data_src

Use the new sample_flags to indicate whether the data_src field is
filled by the PMU driver.

Remove the data_src field from the perf_sample_data_init() to minimize
the number of cache lines touched.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/powerpc/perf/core-book3s.c | 4 +++-
arch/x86/events/intel/ds.c | 8 ++++++--
include/linux/perf_event.h | 3 +--
kernel/events/core.c | 3 +++
4 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a5c95a2..6ec7069 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2301,8 +2301,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
}

if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
- ppmu->get_mem_data_src)
+ ppmu->get_mem_data_src) {
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
+ data.sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }

if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
ppmu->get_mem_weight) {
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 4c51118..bde73d4 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1535,8 +1535,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
/*
* data.data_src encodes the data source
*/
- if (sample_type & PERF_SAMPLE_DATA_SRC)
+ if (sample_type & PERF_SAMPLE_DATA_SRC) {
data->data_src.val = get_data_src(event, pebs->dse);
+ data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }

/*
* We must however always use iregs for the unwinder to stay sane; the
@@ -1770,8 +1772,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
}

- if (sample_type & PERF_SAMPLE_DATA_SRC)
+ if (sample_type & PERF_SAMPLE_DATA_SRC) {
data->data_src.val = get_data_src(event, meminfo->aux);
+ data->sample_flags |= PERF_SAMPLE_DATA_SRC;
+ }

if (sample_type & PERF_SAMPLE_ADDR_TYPE)
data->addr = meminfo->address;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 06a587b..6849f10 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1013,7 +1013,6 @@ struct perf_sample_data {
struct perf_raw_record *raw;
u64 period;
u64 txn;
- union perf_mem_data_src data_src;

/*
* The other fields, optionally {set,used} by
@@ -1021,6 +1020,7 @@ struct perf_sample_data {
*/
struct perf_branch_stack *br_stack;
union perf_sample_weight weight;
+ union perf_mem_data_src data_src;

u64 type;
u64 ip;
@@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->addr = addr;
data->raw = NULL;
data->period = period;
- data->data_src.val = PERF_MEM_NA;
data->txn = 0;
}

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0af45d..163e2f4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7411,6 +7411,9 @@ void perf_prepare_sample(struct perf_event_header *header,
if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
data->weight.full = 0;

+ if (filtered_sample_type & PERF_SAMPLE_DATA_SRC)
+ data->data_src.val = PERF_MEM_NA;
+
if (sample_type & PERF_SAMPLE_REGS_INTR) {
/* regs dump ABI info */
int size = sizeof(u64);

Subject: [tip: perf/core] perf: Use sample_flags for branch stack

The following commit has been merged into the perf/core branch of tip:

Commit-ID: a9a931e2666878343782c82d7d55cc173ddeb3e9
Gitweb: https://git.kernel.org/tip/a9a931e2666878343782c82d7d55cc173ddeb3e9
Author: Kan Liang <[email protected]>
AuthorDate: Thu, 01 Sep 2022 06:09:56 -07:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 06 Sep 2022 11:33:02 +02:00

perf: Use sample_flags for branch stack

Use the new sample_flags to indicate whether the branch stack is filled
by the PMU driver.

Remove the br_stack from the perf_sample_data_init() to minimize the number
of cache lines touched.

Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/powerpc/perf/core-book3s.c | 1 +
arch/x86/events/amd/core.c | 4 +++-
arch/x86/events/core.c | 4 +++-
arch/x86/events/intel/core.c | 4 +++-
arch/x86/events/intel/ds.c | 5 ++++-
include/linux/perf_event.h | 4 ++--
kernel/events/core.c | 4 ++--
7 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 13919eb..1ad1efd 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2297,6 +2297,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
cpuhw = this_cpu_ptr(&cpu_hw_events);
power_pmu_bhrb_read(event, cpuhw);
data.br_stack = &cpuhw->bhrb_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}

if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 36bede1..bd99d2a 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -929,8 +929,10 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
if (!x86_perf_event_set_period(event))
continue;

- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data.br_stack = &cpuc->lbr_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }

if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f969410..bb34a28 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1714,8 +1714,10 @@ int x86_pmu_handle_irq(struct pt_regs *regs)

perf_sample_data_init(&data, 0, event->hw.last_period);

- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data.br_stack = &cpuc->lbr_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }

if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2db9349..ba101c2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2995,8 +2995,10 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)

perf_sample_data_init(&data, 0, event->hw.last_period);

- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data.br_stack = &cpuc->lbr_stack;
+ data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }

if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index cdd857b..0489f75 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1640,8 +1640,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
data->sample_flags |= PERF_SAMPLE_TIME;
}

- if (has_branch_stack(event))
+ if (has_branch_stack(event)) {
data->br_stack = &cpuc->lbr_stack;
+ data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
+ }
}

static void adaptive_pebs_save_regs(struct pt_regs *regs,
@@ -1791,6 +1793,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (has_branch_stack(event)) {
intel_pmu_store_pebs_lbrs(lbr);
data->br_stack = &cpuc->lbr_stack;
+ data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}
}

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0978165..1e12e79 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1011,7 +1011,6 @@ struct perf_sample_data {
u64 sample_flags;
u64 addr;
struct perf_raw_record *raw;
- struct perf_branch_stack *br_stack;
u64 period;
union perf_sample_weight weight;
u64 txn;
@@ -1021,6 +1020,8 @@ struct perf_sample_data {
* The other fields, optionally {set,used} by
* perf_{prepare,output}_sample().
*/
+ struct perf_branch_stack *br_stack;
+
u64 type;
u64 ip;
struct {
@@ -1061,7 +1062,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->sample_flags = 0;
data->addr = addr;
data->raw = NULL;
- data->br_stack = NULL;
data->period = period;
data->weight.full = 0;
data->data_src.val = PERF_MEM_NA;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c9b9cb7..104c0c9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7052,7 +7052,7 @@ void perf_output_sample(struct perf_output_handle *handle,
}

if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
- if (data->br_stack) {
+ if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
size_t size;

size = data->br_stack->nr
@@ -7358,7 +7358,7 @@ void perf_prepare_sample(struct perf_event_header *header,

if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
int size = sizeof(u64); /* nr */
- if (data->br_stack) {
+ if (data->sample_flags & PERF_SAMPLE_BRANCH_STACK) {
if (perf_sample_save_hw_index(event))
size += sizeof(u64);

2022-09-06 15:42:37

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 4/6] perf: Use sample_flags for weight



On 2022-09-02 1:29 a.m., Namhyung Kim wrote:
> On Thu, Sep 1, 2022 at 6:10 AM <[email protected]> wrote:
>>
>> From: Kan Liang <[email protected]>
>>
>> Use the new sample_flags to indicate whether the weight field is filled
>> by the PMU driver.
>>
>> Remove the weight field from the perf_sample_data_init() to minimize the
>> number of cache lines touched.
>>
>> Signed-off-by: Kan Liang <[email protected]>
>> ---
>> arch/powerpc/perf/core-book3s.c | 5 +++--
>> arch/x86/events/intel/ds.c | 10 +++++++---
>> include/linux/perf_event.h | 3 +--
>> kernel/events/core.c | 3 +++
>> 4 files changed, 14 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 1ad1efdb33f9..a5c95a2006ea 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -2305,9 +2305,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
>> ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
>>
>> if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
>> - ppmu->get_mem_weight)
>> + ppmu->get_mem_weight) {
>> ppmu->get_mem_weight(&data.weight.full, event->attr.sample_type);
>> -
>> + data.sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
>> + }
>> if (perf_event_overflow(event, &data, regs))
>> power_pmu_stop(event, 0);
>> } else if (period) {
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index 5dcfd2de6ef8..f66a4905cc87 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -1535,8 +1535,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
>> /*
>> * Use latency for weight (only avail with PEBS-LL)
>> */
>> - if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE))
>> + if (fll && (sample_type & PERF_SAMPLE_WEIGHT_TYPE)) {
>> data->weight.full = pebs->lat;
>> + data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
>> + }
>>
>> /*
>> * data.data_src encodes the data source
>> @@ -1628,9 +1630,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
>>
>> if (x86_pmu.intel_cap.pebs_format >= 2) {
>> /* Only set the TSX weight when no memory weight. */
>> - if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll)
>> + if ((sample_type & PERF_SAMPLE_WEIGHT_TYPE) && !fll) {
>> data->weight.full = intel_get_tsx_weight(pebs->tsx_tuning);
>> -
>> + data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
>> + }
>> if (sample_type & PERF_SAMPLE_TRANSACTION)
>> data->txn = intel_get_tsx_transaction(pebs->tsx_tuning,
>> pebs->ax);
>> @@ -1772,6 +1775,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
>> data->weight.var1_dw = (u32)(weight & PEBS_LATENCY_MASK) ?:
>> intel_get_tsx_weight(meminfo->tsx_tuning);
>> }
>> + data->sample_flags |= PERF_SAMPLE_WEIGHT_TYPE;
>
> I was thinking about splitting PERF_SAMPLE_WEIGHT and
> PERF_SAMPLE_WEIGHT_STRUCT but it'd just add complexity
> unnecessarily?

Right, I don't think it's necessary. The code path is to handle the
weight. We use the same way to handle the old and new weight flags. We
should have both set.

Thanks,
Kan
>
> Thanks,
> Namhyung
>
>> }
>>
>> if (sample_type & PERF_SAMPLE_DATA_SRC)
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 1e12e79454e0..06a587b5faa9 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -1012,7 +1012,6 @@ struct perf_sample_data {
>> u64 addr;
>> struct perf_raw_record *raw;
>> u64 period;
>> - union perf_sample_weight weight;
>> u64 txn;
>> union perf_mem_data_src data_src;
>>
>> @@ -1021,6 +1020,7 @@ struct perf_sample_data {
>> * perf_{prepare,output}_sample().
>> */
>> struct perf_branch_stack *br_stack;
>> + union perf_sample_weight weight;
>>
>> u64 type;
>> u64 ip;
>> @@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
>> data->addr = addr;
>> data->raw = NULL;
>> data->period = period;
>> - data->weight.full = 0;
>> data->data_src.val = PERF_MEM_NA;
>> data->txn = 0;
>> }
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 104c0c9f4e6f..f0af45db02b3 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -7408,6 +7408,9 @@ void perf_prepare_sample(struct perf_event_header *header,
>> header->size += size;
>> }
>>
>> + if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
>> + data->weight.full = 0;
>> +
>> if (sample_type & PERF_SAMPLE_REGS_INTR) {
>> /* regs dump ABI info */
>> int size = sizeof(u64);
>> --
>> 2.35.1
>>

2022-09-27 11:53:50

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH V2 3/6] perf: Use sample_flags for branch stack

On Thu, Sep 01, 2022 at 06:09:56AM -0700, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Use the new sample_flags to indicate whether the branch stack is filled
> by the PMU driver.
>
> Remove the br_stack from the perf_sample_data_init() to minimize the number
> of cache lines touched.
>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> arch/powerpc/perf/core-book3s.c | 1 +
> arch/x86/events/amd/core.c | 4 +++-
> arch/x86/events/core.c | 4 +++-
> arch/x86/events/intel/core.c | 4 +++-
> arch/x86/events/intel/ds.c | 5 ++++-
> include/linux/perf_event.h | 4 ++--
> kernel/events/core.c | 4 ++--
> 7 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 13919eb96931..1ad1efdb33f9 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -2297,6 +2297,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
> cpuhw = this_cpu_ptr(&cpu_hw_events);
> power_pmu_bhrb_read(event, cpuhw);
> data.br_stack = &cpuhw->bhrb_stack;
> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> }
>
> if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index 9fbdfbcaf25a..8b70237c33f7 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -929,8 +929,10 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
> if (!x86_perf_event_set_period(event))
> continue;
>
> - if (has_branch_stack(event))
> + if (has_branch_stack(event)) {
> data.br_stack = &cpuc->lbr_stack;
> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> + }
>
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index c3d6d139a3c1..b30b8bbcd1e2 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1701,8 +1701,10 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>
> perf_sample_data_init(&data, 0, event->hw.last_period);
>
> - if (has_branch_stack(event))
> + if (has_branch_stack(event)) {
> data.br_stack = &cpuc->lbr_stack;
> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> + }
>
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 7c1e3d36bc65..b5c02627a155 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3017,8 +3017,10 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>
> perf_sample_data_init(&data, 0, event->hw.last_period);
>
> - if (has_branch_stack(event))
> + if (has_branch_stack(event)) {
> data.br_stack = &cpuc->lbr_stack;
> + data.sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> + }
>
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 01cbe26225c2..5dcfd2de6ef8 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1648,8 +1648,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
> data->sample_flags |= PERF_SAMPLE_TIME;
> }
>
> - if (has_branch_stack(event))
> + if (has_branch_stack(event)) {
> data->br_stack = &cpuc->lbr_stack;
> + data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> + }
> }
>
> static void adaptive_pebs_save_regs(struct pt_regs *regs,
> @@ -1799,6 +1801,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
> if (has_branch_stack(event)) {
> intel_pmu_store_pebs_lbrs(lbr);
> data->br_stack = &cpuc->lbr_stack;
> + data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> }
> }
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0978165a2d87..1e12e79454e0 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1011,7 +1011,6 @@ struct perf_sample_data {
> u64 sample_flags;
> u64 addr;
> struct perf_raw_record *raw;
> - struct perf_branch_stack *br_stack;
> u64 period;
> union perf_sample_weight weight;
> u64 txn;
> @@ -1021,6 +1020,8 @@ struct perf_sample_data {
> * The other fields, optionally {set,used} by
> * perf_{prepare,output}_sample().
> */
> + struct perf_branch_stack *br_stack;
> +
> u64 type;
> u64 ip;
> struct {
> @@ -1061,7 +1062,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
> data->sample_flags = 0;
> data->addr = addr;
> data->raw = NULL;
> - data->br_stack = NULL;

hi,
there's one more place, I'll send full patch for that

jirka


---
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b05f0310dbd3..98abc6ebb8ea 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1687,6 +1687,9 @@ BPF_CALL_4(bpf_read_branch_records, struct bpf_perf_event_data_kern *, ctx,
if (unlikely(flags & ~BPF_F_GET_BRANCH_RECORDS_SIZE))
return -EINVAL;

+ if (unlikely(!(ctx->data->sample_flags & PERF_SAMPLE_BRANCH_STACK)))
+ return -ENOENT;
+
if (unlikely(!br_stack))
return -ENOENT;

2022-10-11 18:24:06

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH V2 2/6] perf/x86/intel/pebs: Fix PEBS timestamps overwritten

Hello,

On Thu, Sep 1, 2022 at 6:10 AM <[email protected]> wrote:
>
> From: Kan Liang <[email protected]>
>
> The PEBS TSC-based timestamps do not appear correctly in the final
> perf.data output file from perf record.
>
> The data->time field setup by PEBS in the setup_pebs_fixed_sample_data()
> is later overwritten by perf_events generic code in
> perf_prepare_sample(). There is an ordering problem.
>
> Set the sample flags when the data->time is updated by PEBS.
> The data->time field will not be overwritten anymore.

I have a report that it breaks the symbolization of samples.
It seems time is not in sync between perf_clock and PEBS.

One thing I noticed is that the system has a config option
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y.
Looking at the code, it seems sched_clock is doing some
adjustments in that case. So I'm not sure if it'd work well
on those systems.

Thoughts?

Thanks,
Namhyung


>
> Reported-by: Andreas Kogler <[email protected]>
> Reported-by: Stephane Eranian <[email protected]>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> arch/x86/events/intel/ds.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index de1f55d51784..01cbe26225c2 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1643,8 +1643,10 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
> * We can only do this for the default trace clock.
> */
> if (x86_pmu.intel_cap.pebs_format >= 3 &&
> - event->attr.use_clockid == 0)
> + event->attr.use_clockid == 0) {
> data->time = native_sched_clock_from_tsc(pebs->tsc);
> + data->sample_flags |= PERF_SAMPLE_TIME;
> + }
>
> if (has_branch_stack(event))
> data->br_stack = &cpuc->lbr_stack;
> @@ -1705,8 +1707,10 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
> perf_sample_data_init(data, 0, event->hw.last_period);
> data->period = event->hw.last_period;
>
> - if (event->attr.use_clockid == 0)
> + if (event->attr.use_clockid == 0) {
> data->time = native_sched_clock_from_tsc(basic->tsc);
> + data->sample_flags |= PERF_SAMPLE_TIME;
> + }
>
> /*
> * We must however always use iregs for the unwinder to stay sane; the
> --
> 2.35.1
>

2022-10-12 08:16:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V2 2/6] perf/x86/intel/pebs: Fix PEBS timestamps overwritten

On Tue, Oct 11, 2022 at 11:20:54AM -0700, Namhyung Kim wrote:

> One thing I noticed is that the system has a config option
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y.

You can't build x86 without that.

2022-10-12 16:23:10

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH V2 2/6] perf/x86/intel/pebs: Fix PEBS timestamps overwritten

Hi Peter,

On Wed, Oct 12, 2022 at 1:05 AM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Oct 11, 2022 at 11:20:54AM -0700, Namhyung Kim wrote:
>
> > One thing I noticed is that the system has a config option
> > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y.
>
> You can't build x86 without that.

Oh, I didn't know that.. so it's not a config problem.

Kan, could you check the following command?

$ perf record -e cycles:upp dd if=/dev/zero of=/dev/null count=10000

Thanks,
Namhyung