2023-01-12 22:03:23

by Namhyung Kim

[permalink] [raw]
Subject: [PATCHSET 0/8] perf/core: Prepare sample data for BPF

Hello,

The perf_prepare_sample() is to fill the perf sample data and update the
header info before sending it to the ring buffer. But we want to use it
for BPF overflow handler so that it can access the sample data to filter
relevant ones.

Changes in v2)
* the layout change is merged
* reduce branches using __cond_set (PeterZ)
* add helpers to set dynamic sample data (PeterZ)
* introduce perf_prepare_header() (PeterZ)
* call perf_prepare_sample() before bpf_overflow_handler unconditionally

This means the perf_prepare_handler() can be called more than once. To
avoid duplicate work, use the data->sample_flags and save the data size.

I also added a few of helpers to set those information accordingly.
But it looks some fields like REGS_USER, STACK_USER and AUX are saved in
the perf_prepare_sample() so I didn't add the helpers for them.

After than we can just check the filtered_sample_type flags begin zero
to determine if it has more work. In that case, it needs to update the
data->type since it's possible to miss when PMU driver sets all required
sample flags before calling perf_prepare_sample().

The code is also available at 'perf/prepare-sample-v2' branch in

git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Cc: Song Liu <[email protected]>
Cc: [email protected]


Namhyung Kim (8):
perf/core: Save the dynamic parts of sample data size
perf/core: Add perf_sample_save_callchain() helper
perf/core: Add perf_sample_save_raw_data() helper
perf/core: Add perf_sample_save_brstack() helper
perf/core: Set data->sample_flags in perf_prepare_sample()
perf/core: Do not pass header for sample id init
perf/core: Introduce perf_prepare_header()
perf/core: Call perf_prepare_sample() before running BPF

arch/powerpc/perf/core-book3s.c | 3 +-
arch/s390/kernel/perf_cpum_cf.c | 4 +-
arch/s390/kernel/perf_cpum_sf.c | 3 +-
arch/s390/kernel/perf_pai_crypto.c | 4 +-
arch/s390/kernel/perf_pai_ext.c | 4 +-
arch/x86/events/amd/core.c | 6 +-
arch/x86/events/amd/ibs.c | 9 +-
arch/x86/events/intel/core.c | 6 +-
arch/x86/events/intel/ds.c | 24 ++--
include/linux/perf_event.h | 133 +++++++++++++-----
kernel/events/core.c | 207 ++++++++++++++++-------------
kernel/trace/bpf_trace.c | 6 +-
12 files changed, 236 insertions(+), 173 deletions(-)


base-commit: 9fcad995c6c52cc9791f7ee9f1386a5684055f9c
--
2.39.0.314.g84b9a713c41-goog


2023-01-12 22:03:30

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 2/8] perf/core: Add perf_sample_save_callchain() helper

When it saves the callchain to the perf sample data, it needs to update
the sample flags and the dynamic size. To make sure this, add the
perf_sample_save_callchain() helper and convert all call sites.

Cc: [email protected]
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
arch/x86/events/amd/ibs.c | 6 ++----
arch/x86/events/intel/ds.c | 12 ++++--------
include/linux/perf_event.h | 16 +++++++++++++++-
kernel/events/core.c | 12 ++----------
4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index da3f5ebac4e1..417c80bd3274 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1122,10 +1122,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
* recorded as part of interrupt regs. Thus we need to use rip from
* interrupt regs while unwinding call stack.
*/
- if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
- data.callchain = perf_callchain(event, iregs);
- data.sample_flags |= PERF_SAMPLE_CALLCHAIN;
- }
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ perf_sample_save_callchain(&data, event, iregs);

throttle = perf_event_overflow(event, &data, &regs);
out:
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 6ec326b47e2e..158cf845fc80 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1617,10 +1617,8 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
* previous PMI context or an (I)RET happened between the record and
* PMI.
*/
- if (sample_type & PERF_SAMPLE_CALLCHAIN) {
- data->callchain = perf_callchain(event, iregs);
- data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
- }
+ if (sample_type & PERF_SAMPLE_CALLCHAIN)
+ perf_sample_save_callchain(data, event, iregs);

/*
* We use the interrupt regs as a base because the PEBS record does not
@@ -1795,10 +1793,8 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
* previous PMI context or an (I)RET happened between the record and
* PMI.
*/
- if (sample_type & PERF_SAMPLE_CALLCHAIN) {
- data->callchain = perf_callchain(event, iregs);
- data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
- }
+ if (sample_type & PERF_SAMPLE_CALLCHAIN)
+ perf_sample_save_callchain(data, event, iregs);

*regs = *iregs;
/* The ip in basic is EventingIP */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 16b980014449..a9419608402b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1095,6 +1095,8 @@ int perf_event_read_local(struct perf_event *event, u64 *value,
extern u64 perf_event_read_value(struct perf_event *event,
u64 *enabled, u64 *running);

+extern struct perf_callchain_entry *perf_callchain(struct perf_event *event, struct pt_regs *regs);
+

struct perf_sample_data {
/*
@@ -1167,6 +1169,19 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
}
}

+static inline void perf_sample_save_callchain(struct perf_sample_data *data,
+ struct perf_event *event,
+ struct pt_regs *regs)
+{
+ int size = 1;
+
+ data->callchain = perf_callchain(event, regs);
+ size += data->callchain->nr;
+
+ data->dyn_size += size * sizeof(u64);
+ data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
+}
+
/*
* Clear all bitfields in the perf_branch_entry.
* The to and from fields are not cleared because they are
@@ -1408,7 +1423,6 @@ extern void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct
extern struct perf_callchain_entry *
get_perf_callchain(struct pt_regs *regs, u32 init_nr, bool kernel, bool user,
u32 max_stack, bool crosstask, bool add_mark);
-extern struct perf_callchain_entry *perf_callchain(struct perf_event *event, struct pt_regs *regs);
extern int get_callchain_buffers(int max_stack);
extern void put_callchain_buffers(void);
extern struct perf_callchain_entry *get_callchain_entry(int *rctx);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8c8de26f04ab..0fba98b9cd65 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7585,16 +7585,8 @@ void perf_prepare_sample(struct perf_event_header *header,
if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
data->ip = perf_instruction_pointer(regs);

- if (sample_type & PERF_SAMPLE_CALLCHAIN) {
- int size = 1;
-
- if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)
- data->callchain = perf_callchain(event, regs);
-
- size += data->callchain->nr;
-
- data->dyn_size += size * sizeof(u64);
- }
+ if (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)
+ perf_sample_save_callchain(data, event, regs);

if (sample_type & PERF_SAMPLE_RAW) {
struct perf_raw_record *raw = data->raw;
--
2.39.0.314.g84b9a713c41-goog

2023-01-12 22:40:59

by Namhyung Kim

[permalink] [raw]
Subject: [PATCH 1/8] perf/core: Save the dynamic parts of sample data size

The perf sample data can be divided into parts. The event->header_size
and event->id_header_size keep the static part of the sample data which
is determined by the sample_type flags.

But other parts like CALLCHAIN and BRANCH_STACK are changing dynamically
so it needs to see the actual data. In preparation of handling repeated
calls for perf_prepare_sample(), it can save the dynamic size to the
perf sample data to avoid the duplicate work.

Signed-off-by: Namhyung Kim <[email protected]>
---
include/linux/perf_event.h | 2 ++
kernel/events/core.c | 17 ++++++++++-------
2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 03949d017ac9..16b980014449 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1103,6 +1103,7 @@ struct perf_sample_data {
*/
u64 sample_flags;
u64 period;
+ u64 dyn_size;

/*
* Fields commonly set by __perf_event_header__init_id(),
@@ -1158,6 +1159,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
/* remaining struct members initialized in perf_prepare_sample() */
data->sample_flags = PERF_SAMPLE_PERIOD;
data->period = period;
+ data->dyn_size = 0;

if (addr) {
data->addr = addr;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index eacc3702654d..8c8de26f04ab 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7593,7 +7593,7 @@ void perf_prepare_sample(struct perf_event_header *header,

size += data->callchain->nr;

- header->size += size * sizeof(u64);
+ data->dyn_size += size * sizeof(u64);
}

if (sample_type & PERF_SAMPLE_RAW) {
@@ -7619,7 +7619,7 @@ void perf_prepare_sample(struct perf_event_header *header,
data->raw = NULL;
}

- header->size += size;
+ data->dyn_size += size;
}

if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
@@ -7631,7 +7631,7 @@ void perf_prepare_sample(struct perf_event_header *header,
size += data->br_stack->nr
* sizeof(struct perf_branch_entry);
}
- header->size += size;
+ data->dyn_size += size;
}

if (sample_type & (PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER))
@@ -7646,7 +7646,7 @@ void perf_prepare_sample(struct perf_event_header *header,
size += hweight64(mask) * sizeof(u64);
}

- header->size += size;
+ data->dyn_size += size;
}

if (sample_type & PERF_SAMPLE_STACK_USER) {
@@ -7671,7 +7671,7 @@ void perf_prepare_sample(struct perf_event_header *header,
size += sizeof(u64) + stack_size;

data->stack_user_size = stack_size;
- header->size += size;
+ data->dyn_size += size;
}

if (filtered_sample_type & PERF_SAMPLE_WEIGHT_TYPE)
@@ -7700,7 +7700,7 @@ void perf_prepare_sample(struct perf_event_header *header,
size += hweight64(mask) * sizeof(u64);
}

- header->size += size;
+ data->dyn_size += size;
}

if (sample_type & PERF_SAMPLE_PHYS_ADDR &&
@@ -7745,8 +7745,11 @@ void perf_prepare_sample(struct perf_event_header *header,
size = perf_prepare_sample_aux(event, data, size);

WARN_ON_ONCE(size + header->size > U16_MAX);
- header->size += size;
+ data->dyn_size += size + sizeof(u64); /* size above */
}
+
+ header->size += data->dyn_size;
+
/*
* If you're adding more sample types here, you likely need to do
* something about the overflowing header::size, like repurpose the
--
2.39.0.314.g84b9a713c41-goog

2023-01-13 11:52:50

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCHSET 0/8] perf/core: Prepare sample data for BPF

On Thu, Jan 12, 2023 at 01:40:07PM -0800, Namhyung Kim wrote:

> Namhyung Kim (8):
> perf/core: Save the dynamic parts of sample data size
> perf/core: Add perf_sample_save_callchain() helper
> perf/core: Add perf_sample_save_raw_data() helper
> perf/core: Add perf_sample_save_brstack() helper
> perf/core: Set data->sample_flags in perf_prepare_sample()
> perf/core: Do not pass header for sample id init
> perf/core: Introduce perf_prepare_header()
> perf/core: Call perf_prepare_sample() before running BPF

Aside from a few small niggles, this looks really good, Thanks!

2023-01-13 13:57:17

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCHSET 0/8] perf/core: Prepare sample data for BPF

On Thu, Jan 12, 2023 at 01:40:07PM -0800, Namhyung Kim wrote:
> Hello,
>
> The perf_prepare_sample() is to fill the perf sample data and update the
> header info before sending it to the ring buffer. But we want to use it
> for BPF overflow handler so that it can access the sample data to filter
> relevant ones.
>
> Changes in v2)
> * the layout change is merged
> * reduce branches using __cond_set (PeterZ)
> * add helpers to set dynamic sample data (PeterZ)
> * introduce perf_prepare_header() (PeterZ)
> * call perf_prepare_sample() before bpf_overflow_handler unconditionally
>
> This means the perf_prepare_handler() can be called more than once. To
> avoid duplicate work, use the data->sample_flags and save the data size.
>
> I also added a few of helpers to set those information accordingly.
> But it looks some fields like REGS_USER, STACK_USER and AUX are saved in
> the perf_prepare_sample() so I didn't add the helpers for them.
>
> After than we can just check the filtered_sample_type flags begin zero
> to determine if it has more work. In that case, it needs to update the
> data->type since it's possible to miss when PMU driver sets all required
> sample flags before calling perf_prepare_sample().
>
> The code is also available at 'perf/prepare-sample-v2' branch in
>
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
>
> Cc: Song Liu <[email protected]>
> Cc: [email protected]
>
>
> Namhyung Kim (8):
> perf/core: Save the dynamic parts of sample data size
> perf/core: Add perf_sample_save_callchain() helper
> perf/core: Add perf_sample_save_raw_data() helper
> perf/core: Add perf_sample_save_brstack() helper
> perf/core: Set data->sample_flags in perf_prepare_sample()
> perf/core: Do not pass header for sample id init
> perf/core: Introduce perf_prepare_header()
> perf/core: Call perf_prepare_sample() before running BPF

lgtm, I ran the bpf selftests on top of that and it's ok

jirka

>
> arch/powerpc/perf/core-book3s.c | 3 +-
> arch/s390/kernel/perf_cpum_cf.c | 4 +-
> arch/s390/kernel/perf_cpum_sf.c | 3 +-
> arch/s390/kernel/perf_pai_crypto.c | 4 +-
> arch/s390/kernel/perf_pai_ext.c | 4 +-
> arch/x86/events/amd/core.c | 6 +-
> arch/x86/events/amd/ibs.c | 9 +-
> arch/x86/events/intel/core.c | 6 +-
> arch/x86/events/intel/ds.c | 24 ++--
> include/linux/perf_event.h | 133 +++++++++++++-----
> kernel/events/core.c | 207 ++++++++++++++++-------------
> kernel/trace/bpf_trace.c | 6 +-
> 12 files changed, 236 insertions(+), 173 deletions(-)
>
>
> base-commit: 9fcad995c6c52cc9791f7ee9f1386a5684055f9c
> --
> 2.39.0.314.g84b9a713c41-goog
>

2023-01-13 21:08:36

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH 1/8] perf/core: Save the dynamic parts of sample data size

On Thu, Jan 12, 2023 at 1:40 PM Namhyung Kim <[email protected]> wrote:
>
> The perf sample data can be divided into parts. The event->header_size
> and event->id_header_size keep the static part of the sample data which
> is determined by the sample_type flags.
>
> But other parts like CALLCHAIN and BRANCH_STACK are changing dynamically
> so it needs to see the actual data. In preparation of handling repeated
> calls for perf_prepare_sample(), it can save the dynamic size to the
> perf sample data to avoid the duplicate work.
>
> Signed-off-by: Namhyung Kim <[email protected]>

Acked-by: Song Liu <[email protected]>

2023-01-13 21:09:36

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH 2/8] perf/core: Add perf_sample_save_callchain() helper

On Thu, Jan 12, 2023 at 1:40 PM Namhyung Kim <[email protected]> wrote:
>
> When it saves the callchain to the perf sample data, it needs to update
> the sample flags and the dynamic size. To make sure this, add the
> perf_sample_save_callchain() helper and convert all call sites.
>
> Cc: [email protected]
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>

Acked-by: Song Liu <[email protected]>

2023-01-17 08:17:23

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCHSET 0/8] perf/core: Prepare sample data for BPF

Hi Jiri,

On Fri, Jan 13, 2023 at 5:26 AM Jiri Olsa <[email protected]> wrote:
>
> On Thu, Jan 12, 2023 at 01:40:07PM -0800, Namhyung Kim wrote:
> > Hello,
> >
> > The perf_prepare_sample() is to fill the perf sample data and update the
> > header info before sending it to the ring buffer. But we want to use it
> > for BPF overflow handler so that it can access the sample data to filter
> > relevant ones.
> >
> > Changes in v2)
> > * the layout change is merged
> > * reduce branches using __cond_set (PeterZ)
> > * add helpers to set dynamic sample data (PeterZ)
> > * introduce perf_prepare_header() (PeterZ)
> > * call perf_prepare_sample() before bpf_overflow_handler unconditionally
> >
> > This means the perf_prepare_handler() can be called more than once. To
> > avoid duplicate work, use the data->sample_flags and save the data size.
> >
> > I also added a few of helpers to set those information accordingly.
> > But it looks some fields like REGS_USER, STACK_USER and AUX are saved in
> > the perf_prepare_sample() so I didn't add the helpers for them.
> >
> > After than we can just check the filtered_sample_type flags begin zero
> > to determine if it has more work. In that case, it needs to update the
> > data->type since it's possible to miss when PMU driver sets all required
> > sample flags before calling perf_prepare_sample().
> >
> > The code is also available at 'perf/prepare-sample-v2' branch in
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> >
> > Thanks,
> > Namhyung
> >
> >
> > Cc: Song Liu <[email protected]>
> > Cc: [email protected]
> >
> >
> > Namhyung Kim (8):
> > perf/core: Save the dynamic parts of sample data size
> > perf/core: Add perf_sample_save_callchain() helper
> > perf/core: Add perf_sample_save_raw_data() helper
> > perf/core: Add perf_sample_save_brstack() helper
> > perf/core: Set data->sample_flags in perf_prepare_sample()
> > perf/core: Do not pass header for sample id init
> > perf/core: Introduce perf_prepare_header()
> > perf/core: Call perf_prepare_sample() before running BPF
>
> lgtm, I ran the bpf selftests on top of that and it's ok

Thanks for your review. I'll add your Acked-by and Tested-by
in the v3. :)

Thanks,
Namhyung