2024-01-11 08:19:44

by Adrian Hunter

Subject: [PATCH V4 00/11] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing

Hi

Hardware traces, such as instruction traces, can produce a vast amount of
trace data, so being able to reduce tracing to more specific circumstances
can be useful.

The ability to pause or resume tracing when another event happens can do
that.

These patches add such a facility and show how it would work for Intel
Processor Trace.

Maintainers of other AUX area tracing implementations are requested to
consider if this is something they might employ and then whether or not
the ABI would work for them.

Changes to perf tools are now (since V4) fleshed out.


Changes in V4:

    perf/core: Add aux_pause, aux_resume, aux_start_paused
        Rename aux_output_cfg -> aux_action
        Reorder aux_action bits from:
            aux_pause, aux_resume, aux_start_paused
        to:
            aux_start_paused, aux_pause, aux_resume
        Fix aux_action bits __u64 -> __u32

    coresight: Have a stab at support for pause / resume
        Dropped

    perf tools
        All new patches

Changes in RFC V3:

    coresight: Have a stab at support for pause / resume
        'mode' -> 'flags' so it at least compiles

Changes in RFC V2:

    Use ->stop() / ->start() instead of ->pause_resume()
    Move aux_start_paused bit into aux_output_cfg
    Tighten up when Intel PT pause / resume is allowed
    Add an example of how it might work for CoreSight


Adrian Hunter (11):
perf/core: Add aux_pause, aux_resume, aux_start_paused
perf/x86/intel/pt: Add support for pause / resume
perf tools: Enable evsel__is_aux_event() to work for ARM/ARM64
perf tools: Enable evsel__is_aux_event() to work for S390_CPUMSF
perf tools: Add aux_start_paused, aux_pause and aux_resume
perf tools: Add aux-action config term
perf tools: Parse aux-action
perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume
perf intel-pt: Improve man page format
perf intel-pt: Add documentation for pause / resume
perf intel-pt: Add a test for pause / resume

arch/x86/events/intel/pt.c | 63 +++-
arch/x86/events/intel/pt.h | 4 +
include/linux/perf_event.h | 15 +
include/uapi/linux/perf_event.h | 11 +-
kernel/events/core.c | 72 +++-
kernel/events/internal.h | 1 +
tools/include/uapi/linux/perf_event.h | 11 +-
tools/perf/Documentation/perf-intel-pt.txt | 558 +++++++++++++++++------------
tools/perf/Documentation/perf-record.txt | 4 +
tools/perf/arch/arm/util/pmu.c | 3 +
tools/perf/builtin-record.c | 4 +-
tools/perf/tests/shell/test_intel_pt.sh | 28 ++
tools/perf/util/auxtrace.c | 67 +++-
tools/perf/util/auxtrace.h | 6 +-
tools/perf/util/evsel.c | 13 +-
tools/perf/util/evsel.h | 1 +
tools/perf/util/evsel_config.h | 1 +
tools/perf/util/parse-events.c | 10 +
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 3 +
tools/perf/util/pmu.c | 6 +-
22 files changed, 645 insertions(+), 238 deletions(-)


Regards
Adrian


2024-01-11 08:20:16

by Adrian Hunter

Subject: [PATCH V4 02/11] perf/x86/intel/pt: Add support for pause / resume

Prevent tracing from starting if aux_paused.

Implement support for PERF_EF_PAUSE / PERF_EF_RESUME. When aux_paused, stop
tracing. When not aux_paused, only start tracing if it isn't currently
meant to be stopped.
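
The gating described above can be modelled in user space. The following is
only a sketch, not the driver code: struct pt_model and the model_*()
functions are hypothetical names, a single bool stands in for RTIT_CTL.TraceEn,
and the RTIT_STATUS check done by the real resume path is collapsed into
"tracing is not already active".

```c
#include <stdbool.h>

/* Hypothetical user-space model of the per-CPU state; not kernel code. */
struct pt_model {
	bool tracing;        /* stands in for RTIT_CTL.TraceEn */
	bool aux_paused;     /* mirrors event->aux_paused */
	bool pause_allowed;  /* PERF_EF_PAUSE may stop tracing */
	bool resume_allowed; /* PERF_EF_RESUME may start tracing */
};

/* Like pt_config_start(): never enable tracing while paused. */
static void model_config_start(struct pt_model *pt)
{
	if (pt->aux_paused)
		return;
	pt->tracing = true;
}

/* Like pt_config(): allow resume before starting, allow pause after. */
static void model_config(struct pt_model *pt)
{
	pt->resume_allowed = true;
	model_config_start(pt);
	pt->pause_allowed = true;
}

/* Like ->stop() with PERF_EF_PAUSE; the core sets aux_paused first. */
static void model_pause(struct pt_model *pt)
{
	pt->aux_paused = true;
	if (pt->pause_allowed)
		pt->tracing = false;
}

/* Like ->start() with PERF_EF_RESUME; the core clears aux_paused first. */
static void model_resume(struct pt_model *pt)
{
	pt->aux_paused = false;
	if (pt->resume_allowed && !pt->tracing)
		model_config_start(pt);
}

/* Like a full ->stop(): forbid pause / resume, then stop tracing. */
static void model_stop(struct pt_model *pt)
{
	pt->pause_allowed = false;
	pt->resume_allowed = false;
	pt->tracing = false;
}
```

In this model, an event that starts paused does not trace until
model_resume() is called, and a resume after model_stop() has no effect,
which is the "only start tracing if it isn't currently meant to be stopped"
rule.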

Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/events/intel/pt.c | 63 ++++++++++++++++++++++++++++++++++++--
arch/x86/events/intel/pt.h | 4 +++
2 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 8e2a12235e62..b6e838f2c6d5 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -418,6 +418,9 @@ static void pt_config_start(struct perf_event *event)
struct pt *pt = this_cpu_ptr(&pt_ctx);
u64 ctl = event->hw.config;

+ if (READ_ONCE(event->aux_paused))
+ return;
+
ctl |= RTIT_CTL_TRACEEN;
if (READ_ONCE(pt->vmx_on))
perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL);
@@ -534,7 +537,20 @@ static void pt_config(struct perf_event *event)
reg |= (event->attr.config & PT_CONFIG_MASK);

event->hw.config = reg;
+
+ /*
+ * Allow resume before starting so as not to overwrite a value set by a
+ * PMI.
+ */
+ WRITE_ONCE(pt->resume_allowed, 1);
+
pt_config_start(event);
+
+ /*
+ * Allow pause after starting so its pt_config_stop() doesn't race with
+ * pt_config_start().
+ */
+ WRITE_ONCE(pt->pause_allowed, 1);
}

static void pt_config_stop(struct perf_event *event)
@@ -1511,6 +1527,7 @@ void intel_pt_interrupt(void)
buf = perf_aux_output_begin(&pt->handle, event);
if (!buf) {
event->hw.state = PERF_HES_STOPPED;
+ pt->resume_allowed = 0;
return;
}

@@ -1519,6 +1536,7 @@ void intel_pt_interrupt(void)
ret = pt_buffer_reset_markers(buf, &pt->handle);
if (ret) {
perf_aux_output_end(&pt->handle, 0);
+ pt->resume_allowed = 0;
return;
}

@@ -1573,6 +1591,26 @@ static void pt_event_start(struct perf_event *event, int mode)
struct pt *pt = this_cpu_ptr(&pt_ctx);
struct pt_buffer *buf;

+ if (mode & PERF_EF_RESUME) {
+ if (READ_ONCE(pt->resume_allowed)) {
+ u64 status;
+
+ /*
+ * Only if the trace is not active and the error and
+ * stopped bits are clear, is it safe to start, but a
+ * PMI might have just cleared these, so resume_allowed
+ * must be checked again also.
+ */
+ rdmsrl(MSR_IA32_RTIT_STATUS, status);
+ if (!(status & (RTIT_STATUS_TRIGGEREN |
+ RTIT_STATUS_ERROR |
+ RTIT_STATUS_STOPPED)) &&
+ READ_ONCE(pt->resume_allowed))
+ pt_config_start(event);
+ }
+ return;
+ }
+
buf = perf_aux_output_begin(&pt->handle, event);
if (!buf)
goto fail_stop;
@@ -1601,6 +1639,16 @@ static void pt_event_stop(struct perf_event *event, int mode)
{
struct pt *pt = this_cpu_ptr(&pt_ctx);

+ if (mode & PERF_EF_PAUSE) {
+ if (READ_ONCE(pt->pause_allowed))
+ pt_config_stop(event);
+ return;
+ }
+
+ /* Protect against racing */
+ WRITE_ONCE(pt->pause_allowed, 0);
+ WRITE_ONCE(pt->resume_allowed, 0);
+
/*
* Protect against the PMI racing with disabling wrmsr,
* see comment in intel_pt_interrupt().
@@ -1659,8 +1707,12 @@ static long pt_event_snapshot_aux(struct perf_event *event,
/*
* Here, handle_nmi tells us if the tracing is on
*/
- if (READ_ONCE(pt->handle_nmi))
+ if (READ_ONCE(pt->handle_nmi)) {
+ /* Protect against racing */
+ WRITE_ONCE(pt->pause_allowed, 0);
+ WRITE_ONCE(pt->resume_allowed, 0);
pt_config_stop(event);
+ }

pt_read_offset(buf);
pt_update_head(pt);
@@ -1677,8 +1729,11 @@ static long pt_event_snapshot_aux(struct perf_event *event,
* Compiler barrier not needed as we couldn't have been
* preempted by anything that touches pt->handle_nmi.
*/
- if (pt->handle_nmi)
+ if (pt->handle_nmi) {
+ WRITE_ONCE(pt->resume_allowed, 1);
pt_config_start(event);
+ WRITE_ONCE(pt->pause_allowed, 1);
+ }

return ret;
}
@@ -1794,7 +1849,9 @@ static __init int pt_init(void)
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries))
pt_pmu.pmu.capabilities = PERF_PMU_CAP_AUX_NO_SG;

- pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE;
+ pt_pmu.pmu.capabilities |= PERF_PMU_CAP_EXCLUSIVE |
+ PERF_PMU_CAP_ITRACE |
+ PERF_PMU_CAP_AUX_PAUSE;
pt_pmu.pmu.attr_groups = pt_attr_groups;
pt_pmu.pmu.task_ctx_nr = perf_sw_context;
pt_pmu.pmu.event_init = pt_event_init;
diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 96906a62aacd..b9527205e028 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -117,6 +117,8 @@ struct pt_filters {
* @filters: last configured filters
* @handle_nmi: do handle PT PMI on this cpu, there's an active event
* @vmx_on: 1 if VMX is ON on this cpu
+ * @pause_allowed: PERF_EF_PAUSE is allowed to stop tracing
+ * @resume_allowed: PERF_EF_RESUME is allowed to start tracing
* @output_base: cached RTIT_OUTPUT_BASE MSR value
* @output_mask: cached RTIT_OUTPUT_MASK MSR value
*/
@@ -125,6 +127,8 @@ struct pt {
struct pt_filters filters;
int handle_nmi;
int vmx_on;
+ int pause_allowed;
+ int resume_allowed;
u64 output_base;
u64 output_mask;
};
--
2.34.1


2024-01-11 08:20:47

by Adrian Hunter

Subject: [PATCH V4 04/11] perf tools: Enable evsel__is_aux_event() to work for S390_CPUMSF

evsel__is_aux_event() identifies AUX area tracing selected events.

S390_CPUMSF uses a raw event type (PERF_TYPE_RAW; refer to
s390_cpumsf_evsel_is_auxtrace()), not a PMU type value that could be checked
in evsel__is_aux_event(). However, it sets needs_auxtrace_mmap (refer to
auxtrace_record__init()), so check that first.

Currently, the features that use evsel__is_aux_event() are used only by
Intel PT, but that may change in the future.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/util/pmu.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 3c9609944a2f..7f1b96936ff1 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1096,8 +1096,12 @@ void perf_pmu__warn_invalid_formats(struct perf_pmu *pmu)

bool evsel__is_aux_event(const struct evsel *evsel)
{
- struct perf_pmu *pmu = evsel__find_pmu(evsel);
+ struct perf_pmu *pmu;
+
+ if (evsel->needs_auxtrace_mmap)
+ return true;

+ pmu = evsel__find_pmu(evsel);
return pmu && pmu->auxtrace;
}

--
2.34.1


2024-01-11 08:21:23

by Adrian Hunter

Subject: [PATCH V4 05/11] perf tools: Add aux_start_paused, aux_pause and aux_resume

Add struct perf_event_attr members to support pause and resume of AUX area
tracing.
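
As a rough illustration of the layout being added, the sketch below uses a
hypothetical stand-in union (aux_action_model, not the real struct
perf_event_attr) to show how the three 1-bit flags share storage with the
32-bit aux_action word. The exact bit positions of bitfields are
implementation-defined in C, so the sketch only relies on the size and on
"any flag set means aux_action is non-zero".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the new members; not the real perf_event_attr. */
union aux_action_model {
	uint32_t aux_action;
	struct {
		uint32_t aux_start_paused : 1, /* start AUX area tracing paused */
			 aux_pause        : 1, /* on overflow, pause tracing */
			 aux_resume       : 1, /* on overflow, resume tracing */
			 reserved         : 29;
	};
};

/* The union must occupy exactly the space of the old 32-bit reserved field. */
static_assert(sizeof(union aux_action_model) == sizeof(uint32_t),
	      "aux_action must stay 32 bits wide");

/* Non-zero if any action bit is set, similar to testing attr->aux_action. */
static int has_aux_action(const union aux_action_model *a)
{
	return a->aux_action != 0;
}
```

Testing the whole word at once is what lets a caller reject all three flags
with a single check.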

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 11 ++++++++++-
tools/perf/util/perf_event_attr_fprintf.c | 3 +++
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 3a64499b0f5d..0c557f0a17b3 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -511,7 +511,16 @@ struct perf_event_attr {
__u16 sample_max_stack;
__u16 __reserved_2;
__u32 aux_sample_size;
- __u32 __reserved_3;
+
+ union {
+ __u32 aux_action;
+ struct {
+ __u32 aux_start_paused : 1, /* start AUX area tracing paused */
+ aux_pause : 1, /* on overflow, pause AUX area tracing */
+ aux_resume : 1, /* on overflow, resume AUX area tracing */
+ __reserved_3 : 29;
+ };
+ };

/*
* User provided data if sigtrap=1, passed back to user via
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 8f04d3b7f3ec..0e3cb35aab33 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -323,6 +323,9 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
PRINT_ATTRf(sample_max_stack, p_unsigned);
PRINT_ATTRf(aux_sample_size, p_unsigned);
PRINT_ATTRf(sig_data, p_unsigned);
+ PRINT_ATTRf(aux_start_paused, p_unsigned);
+ PRINT_ATTRf(aux_pause, p_unsigned);
+ PRINT_ATTRf(aux_resume, p_unsigned);

return ret;
}
--
2.34.1


2024-01-11 08:21:24

by Adrian Hunter

Subject: [PATCH V4 01/11] perf/core: Add aux_pause, aux_resume, aux_start_paused

Hardware traces, such as instruction traces, can produce a vast amount of
trace data, so being able to reduce tracing to more specific circumstances
can be useful.

The ability to pause or resume tracing when another event happens can do
that.

Add ability for an event to "pause" or "resume" AUX area tracing.

Add aux_pause bit to perf_event_attr to indicate that, if the event
happens, the associated AUX area tracing should be paused. Ditto
aux_resume. Do not allow aux_pause and aux_resume to be set together.

Add aux_start_paused bit to perf_event_attr to indicate to an AUX area
event that it should start in a "paused" state.

Add aux_paused to struct perf_event for AUX area events to keep track of
the "paused" state. aux_paused is initialized to aux_start_paused.

Add PERF_EF_PAUSE and PERF_EF_RESUME modes for ->stop() and ->start()
callbacks. Call as needed, during __perf_event_output(). Add
aux_in_pause_resume to struct perf_buffer to prevent races with the NMI
handler. Pause/resume in NMI context will miss out if it coincides with
another pause/resume.

To use aux_pause or aux_resume, an event must be in a group with the AUX
area event as the group leader.
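
The attribute rules and the paused-state tracking described above can be
sketched in user space as follows. This is an illustrative model under
stated assumptions, not the kernel implementation: attr_model,
aux_event_model, check_aux_attr() and aux_pause_model() are hypothetical
names, and the two bool parameters stand in for the PMU capability flags
PERF_PMU_CAP_AUX_OUTPUT and PERF_PMU_CAP_AUX_PAUSE.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant perf_event_attr bits. */
struct attr_model {
	bool aux_output;
	bool aux_start_paused;
	bool aux_pause;
	bool aux_resume;
};

/* Models the checks this patch adds when an event is allocated. */
static int check_aux_attr(const struct attr_model *attr,
			  bool pmu_cap_aux_output, bool pmu_cap_aux_pause)
{
	/* aux_output cannot be combined with aux_pause or aux_resume */
	if (attr->aux_output &&
	    (!pmu_cap_aux_output || attr->aux_pause || attr->aux_resume))
		return -EOPNOTSUPP;
	/* aux_pause and aux_resume must not be set together */
	if (attr->aux_pause && attr->aux_resume)
		return -EINVAL;
	/* aux_start_paused needs a PMU that supports pause / resume */
	if (attr->aux_start_paused && !pmu_cap_aux_pause)
		return -EOPNOTSUPP;
	return 0;
}

/* Hypothetical AUX area event: paused state plus callback counters. */
struct aux_event_model {
	bool aux_paused;
	int stops;  /* times ->stop(PERF_EF_PAUSE) would be called */
	int starts; /* times ->start(PERF_EF_RESUME) would be called */
};

/* Only call into the PMU when the paused state actually changes. */
static void aux_pause_model(struct aux_event_model *ev, bool pause)
{
	if (pause) {
		if (!ev->aux_paused) {
			ev->aux_paused = true;
			ev->stops++;
		}
	} else {
		if (ev->aux_paused) {
			ev->aux_paused = false;
			ev->starts++;
		}
	}
}
```

Pausing an already-paused event, or resuming one that is already running,
is a no-op in this model, so repeated trigger events do not generate
redundant PMU callbacks.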

Example (requires Intel PT and tools patches also):

$ perf record --kcore -e intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/ uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.043 MB perf.data ]
$ perf script --call-trace
uname 30805 [000] 24001.058782799: name: 0x7ffc9c1865b0
uname 30805 [000] 24001.058784424: psb offs: 0
uname 30805 [000] 24001.058784424: cbr: 39 freq: 3904 MHz (139%)
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __x64_sys_newuname
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) down_read
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __cond_resched
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) in_lock_functions
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_sub
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) up_read
uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) in_lock_functions
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) preempt_count_sub
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) _copy_to_user
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_to_user_mode
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_work
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) perf_syscall_exit
uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_alloc
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_get_recursion_context
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_tp_event
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_update
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) tracing_gen_ctx_irq_test
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_event
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __perf_event_account_interrupt
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __this_cpu_preempt_check
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_output_forward
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_aux_pause
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) ring_buffer_get
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_lock
uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_unlock
uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) pt_event_stop
uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) native_write_msr
uname 30805 [000] 24001.058785463: ([kernel.kallsyms]) native_write_msr
uname 30805 [000] 24001.058785639: 0x0

Signed-off-by: Adrian Hunter <[email protected]>
---


Changes in V4:
Rename aux_output_cfg -> aux_action
Reorder aux_action bits from:
aux_pause, aux_resume, aux_start_paused
to:
aux_start_paused, aux_pause, aux_resume
Fix aux_action bits __u64 -> __u32


include/linux/perf_event.h | 15 +++++++
include/uapi/linux/perf_event.h | 11 ++++-
kernel/events/core.c | 72 +++++++++++++++++++++++++++++++--
kernel/events/internal.h | 1 +
4 files changed, 95 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5547ba68e6e4..342879168269 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -291,6 +291,7 @@ struct perf_event_pmu_context;
#define PERF_PMU_CAP_NO_EXCLUDE 0x0040
#define PERF_PMU_CAP_AUX_OUTPUT 0x0080
#define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
+#define PERF_PMU_CAP_AUX_PAUSE 0x0200

struct perf_output_handle;

@@ -363,6 +364,8 @@ struct pmu {
#define PERF_EF_START 0x01 /* start the counter when adding */
#define PERF_EF_RELOAD 0x02 /* reload the counter when starting */
#define PERF_EF_UPDATE 0x04 /* update the counter when stopping */
+#define PERF_EF_PAUSE 0x08 /* AUX area event, pause tracing */
+#define PERF_EF_RESUME 0x10 /* AUX area event, resume tracing */

/*
* Adds/Removes a counter to/from the PMU, can be done inside a
@@ -402,6 +405,15 @@ struct pmu {
*
* ->start() with PERF_EF_RELOAD will reprogram the counter
* value, must be preceded by a ->stop() with PERF_EF_UPDATE.
+ *
+ * ->stop() with PERF_EF_PAUSE will stop as simply as possible. Will not
+ * overlap another ->stop() with PERF_EF_PAUSE nor ->start() with
+ * PERF_EF_RESUME.
+ *
+ * ->start() with PERF_EF_RESUME will start as simply as possible but
+ * only if the counter is not otherwise stopped. Will not overlap
+ * another ->start() with PERF_EF_RESUME nor ->stop() with
+ * PERF_EF_PAUSE.
*/
void (*start) (struct perf_event *event, int flags);
void (*stop) (struct perf_event *event, int flags);
@@ -798,6 +810,9 @@ struct perf_event {
/* for aux_output events */
struct perf_event *aux_event;

+ /* for AUX area events */
+ unsigned int aux_paused;
+
void (*destroy)(struct perf_event *);
struct rcu_head rcu_head;

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 39c6a250dd1b..5f6b3b494184 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -507,7 +507,16 @@ struct perf_event_attr {
__u16 sample_max_stack;
__u16 __reserved_2;
__u32 aux_sample_size;
- __u32 __reserved_3;
+
+ union {
+ __u32 aux_action;
+ struct {
+ __u32 aux_start_paused : 1, /* start AUX area tracing paused */
+ aux_pause : 1, /* on overflow, pause AUX area tracing */
+ aux_resume : 1, /* on overflow, resume AUX area tracing */
+ __reserved_3 : 29;
+ };
+ };

/*
* User provided data if sigtrap=1, passed back to user via
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9efd0d7775e7..dc9ec2443ac9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2097,7 +2097,8 @@ static void perf_put_aux_event(struct perf_event *event)

static bool perf_need_aux_event(struct perf_event *event)
{
- return !!event->attr.aux_output || !!event->attr.aux_sample_size;
+ return event->attr.aux_output || event->attr.aux_sample_size ||
+ event->attr.aux_pause || event->attr.aux_resume;
}

static int perf_get_aux_event(struct perf_event *event,
@@ -2122,6 +2123,10 @@ static int perf_get_aux_event(struct perf_event *event,
!perf_aux_output_match(event, group_leader))
return 0;

+ if ((event->attr.aux_pause || event->attr.aux_resume) &&
+ !(group_leader->pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE))
+ return 0;
+
if (event->attr.aux_sample_size && !group_leader->pmu->snapshot_aux)
return 0;

@@ -7846,6 +7851,47 @@ void perf_prepare_header(struct perf_event_header *header,
WARN_ON_ONCE(header->size & 7);
}

+static void __perf_event_aux_pause(struct perf_event *event, bool pause)
+{
+ if (pause) {
+ if (!READ_ONCE(event->aux_paused)) {
+ WRITE_ONCE(event->aux_paused, 1);
+ event->pmu->stop(event, PERF_EF_PAUSE);
+ }
+ } else {
+ if (READ_ONCE(event->aux_paused)) {
+ WRITE_ONCE(event->aux_paused, 0);
+ event->pmu->start(event, PERF_EF_RESUME);
+ }
+ }
+}
+
+static void perf_event_aux_pause(struct perf_event *event, bool pause)
+{
+ struct perf_buffer *rb;
+ unsigned long flags;
+
+ if (WARN_ON_ONCE(!event))
+ return;
+
+ rb = ring_buffer_get(event);
+ if (!rb)
+ return;
+
+ local_irq_save(flags);
+ /* Guard against NMI, NMI loses here */
+ if (READ_ONCE(rb->aux_in_pause_resume))
+ goto out_restore;
+ WRITE_ONCE(rb->aux_in_pause_resume, 1);
+ barrier();
+ __perf_event_aux_pause(event, pause);
+ barrier();
+ WRITE_ONCE(rb->aux_in_pause_resume, 0);
+out_restore:
+ local_irq_restore(flags);
+ ring_buffer_put(rb);
+}
+
static __always_inline int
__perf_event_output(struct perf_event *event,
struct perf_sample_data *data,
@@ -7859,6 +7905,9 @@ __perf_event_output(struct perf_event *event,
struct perf_event_header header;
int err;

+ if (event->attr.aux_pause)
+ perf_event_aux_pause(event->aux_event, true);
+
/* protect the callchain buffers */
rcu_read_lock();

@@ -7875,6 +7924,10 @@ __perf_event_output(struct perf_event *event,

exit:
rcu_read_unlock();
+
+ if (event->attr.aux_resume)
+ perf_event_aux_pause(event->aux_event, false);
+
return err;
}

@@ -12014,10 +12067,23 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
}

if (event->attr.aux_output &&
- !(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT)) {
+ (!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) ||
+ event->attr.aux_pause || event->attr.aux_resume)) {
+ err = -EOPNOTSUPP;
+ goto err_pmu;
+ }
+
+ if (event->attr.aux_pause && event->attr.aux_resume) {
+ err = -EINVAL;
+ goto err_pmu;
+ }
+
+ if (event->attr.aux_start_paused &&
+ !(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) {
err = -EOPNOTSUPP;
goto err_pmu;
}
+ event->aux_paused = event->attr.aux_start_paused;

if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
@@ -12814,7 +12880,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
* Grouping is not supported for kernel events, neither is 'AUX',
* make sure the caller's intentions are adjusted.
*/
- if (attr->aux_output)
+ if (attr->aux_output || attr->aux_action)
return ERR_PTR(-EINVAL);

event = perf_event_alloc(attr, cpu, task, NULL, NULL,
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 5150d5f84c03..3320f78117dc 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -51,6 +51,7 @@ struct perf_buffer {
void (*free_aux)(void *);
refcount_t aux_refcount;
int aux_in_sampling;
+ int aux_in_pause_resume;
void **aux_pages;
void *aux_priv;

--
2.34.1


2024-01-11 08:22:48

by Adrian Hunter

Subject: [PATCH V4 06/11] perf tools: Add aux-action config term

Add a new common config term "aux-action" to use for configuring AUX area
trace pause / resume. The value is a string that will be parsed in a
subsequent patch.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/util/evsel.c | 2 ++
tools/perf/util/evsel_config.h | 1 +
tools/perf/util/parse-events.c | 10 ++++++++++
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 1 +
5 files changed, 15 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 6d7c9c58a9bc..d8ee610edd62 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1001,6 +1001,8 @@ static void evsel__apply_config_terms(struct evsel *evsel,
case EVSEL__CONFIG_TERM_AUX_OUTPUT:
attr->aux_output = term->val.aux_output ? 1 : 0;
break;
+ case EVSEL__CONFIG_TERM_AUX_ACTION:
+ break;
case EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE:
/* Already applied by auxtrace */
break;
diff --git a/tools/perf/util/evsel_config.h b/tools/perf/util/evsel_config.h
index aee6f808b512..af52a1516d0b 100644
--- a/tools/perf/util/evsel_config.h
+++ b/tools/perf/util/evsel_config.h
@@ -25,6 +25,7 @@ enum evsel_term_type {
EVSEL__CONFIG_TERM_BRANCH,
EVSEL__CONFIG_TERM_PERCORE,
EVSEL__CONFIG_TERM_AUX_OUTPUT,
+ EVSEL__CONFIG_TERM_AUX_ACTION,
EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE,
EVSEL__CONFIG_TERM_CFG_CHG,
};
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 66eabcea4242..b597caacd905 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -768,6 +768,7 @@ static const char *config_term_name(enum parse_events__term_type term_type)
[PARSE_EVENTS__TERM_TYPE_DRV_CFG] = "driver-config",
[PARSE_EVENTS__TERM_TYPE_PERCORE] = "percore",
[PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT] = "aux-output",
+ [PARSE_EVENTS__TERM_TYPE_AUX_ACTION] = "aux-action",
[PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE] = "aux-sample-size",
[PARSE_EVENTS__TERM_TYPE_METRIC_ID] = "metric-id",
[PARSE_EVENTS__TERM_TYPE_RAW] = "raw",
@@ -817,6 +818,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
case PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT:
+ case PARSE_EVENTS__TERM_TYPE_AUX_ACTION:
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
case PARSE_EVENTS__TERM_TYPE_RAW:
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
@@ -936,6 +938,9 @@ do { \
case PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT:
CHECK_TYPE_VAL(NUM);
break;
+ case PARSE_EVENTS__TERM_TYPE_AUX_ACTION:
+ CHECK_TYPE_VAL(STR);
+ break;
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
CHECK_TYPE_VAL(NUM);
if (term->val.num > UINT_MAX) {
@@ -1053,6 +1058,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
case PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT:
+ case PARSE_EVENTS__TERM_TYPE_AUX_ACTION:
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
return config_term_common(attr, term, err);
case PARSE_EVENTS__TERM_TYPE_USER:
@@ -1187,6 +1193,9 @@ do { \
ADD_CONFIG_TERM_VAL(AUX_OUTPUT, aux_output,
term->val.num ? 1 : 0, term->weak);
break;
+ case PARSE_EVENTS__TERM_TYPE_AUX_ACTION:
+ ADD_CONFIG_TERM_STR(AUX_ACTION, term->val.str, term->weak);
+ break;
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
ADD_CONFIG_TERM_VAL(AUX_SAMPLE_SIZE, aux_sample_size,
term->val.num, term->weak);
@@ -1249,6 +1258,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
case PARSE_EVENTS__TERM_TYPE_PERCORE:
case PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT:
+ case PARSE_EVENTS__TERM_TYPE_AUX_ACTION:
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
case PARSE_EVENTS__TERM_TYPE_RAW:
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 63c0a36a4bf1..04b4deff81ff 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -74,6 +74,7 @@ enum parse_events__term_type {
PARSE_EVENTS__TERM_TYPE_DRV_CFG,
PARSE_EVENTS__TERM_TYPE_PERCORE,
PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT,
+ PARSE_EVENTS__TERM_TYPE_AUX_ACTION,
PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE,
PARSE_EVENTS__TERM_TYPE_METRIC_ID,
PARSE_EVENTS__TERM_TYPE_RAW,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index e86c45675e1d..26a60ad5853c 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -244,6 +244,7 @@ overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); }
no-overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); }
percore { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_PERCORE); }
aux-output { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
+aux-action { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
aux-sample-size { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
metric-id { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
cpu-cycles|cycles { return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
--
2.34.1


2024-01-11 08:23:03

by Adrian Hunter

Subject: [PATCH V4 07/11] perf tools: Parse aux-action

Add parsing for aux-action to accept "pause", "resume" or "start-paused"
values.

"start-paused" is valid only for AUX area events.

"pause" and "resume" are valid only for events grouped with an AUX area
event as the group leader. However, like with aux-output, the events
will be automatically grouped if they are not currently in a group, and
the AUX area event precedes the other events.
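
The validity rules above can be condensed into a small standalone sketch.
It is a simplified model, not the perf tools code: parse_aux_action_model()
and the ACT_* constants are hypothetical names, the bit values assume the
aux_start_paused, aux_pause, aux_resume ordering, and a return value of 0
stands in for the error paths.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values, assuming start-paused, pause, resume ordering. */
enum {
	ACT_START_PAUSED = 1u << 0,
	ACT_PAUSE        = 1u << 1,
	ACT_RESUME       = 1u << 2,
};

struct action_opt {
	const char *str;
	unsigned int bits;
	bool aux_event_opt; /* true: only valid on the AUX area event itself */
};

static const struct action_opt action_opts[] = {
	{ "start-paused", ACT_START_PAUSED, true  },
	{ "pause",        ACT_PAUSE,        false },
	{ "resume",       ACT_RESUME,       false },
	{ NULL, 0, false },
};

/*
 * Return the action bits for 'str', or 0 if the string is unknown or is
 * not valid for this kind of event.
 */
static unsigned int parse_aux_action_model(const char *str, bool is_aux_event)
{
	const struct action_opt *opt;

	if (!str)
		return 0;

	for (opt = action_opts; opt->str; opt++) {
		if (strcmp(str, opt->str))
			continue;
		/* "start-paused" needs an AUX event; the others forbid it */
		if (opt->aux_event_opt != is_aux_event)
			return 0;
		return opt->bits;
	}
	return 0;
}
```

A table keeps the string, the bit and the "AUX event only" rule together, so
adding another action would be a one-line change.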

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 4 ++
tools/perf/builtin-record.c | 4 +-
tools/perf/util/auxtrace.c | 67 ++++++++++++++++++++++--
tools/perf/util/auxtrace.h | 6 ++-
tools/perf/util/evsel.c | 1 +
5 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 6015fdd08fb6..ccdba551e52d 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -68,6 +68,10 @@ OPTIONS
like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.
- 'aux-output': Generate AUX records instead of events. This requires
that an AUX area event is also provided.
+ - 'aux-action': "pause" or "resume" to pause or resume an AUX
+ area event (the group leader) when this event occurs.
+ "start-paused" on an AUX area event itself, will
+ start in a paused state.
- 'aux-sample-size': Set sample size for AUX area sampling. If the
'--aux-sample' option has been used, set aux-sample-size=0 to disable
AUX area sampling for the event.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 91e6828c38cc..055d360926d6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -850,7 +850,9 @@ static int record__auxtrace_init(struct record *rec)
if (err)
return err;

- auxtrace_regroup_aux_output(rec->evlist);
+ err = auxtrace_parse_aux_action(rec->evlist);
+ if (err)
+ return err;

return auxtrace_parse_filters(rec->evlist);
}
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 3684e6009b63..67bd33ea2eaf 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -805,19 +805,76 @@ int auxtrace_parse_sample_options(struct auxtrace_record *itr,
return auxtrace_validate_aux_sample_size(evlist, opts);
}

-void auxtrace_regroup_aux_output(struct evlist *evlist)
+static struct aux_action_opt {
+ const char *str;
+ u32 aux_action;
+ bool aux_event_opt;
+} aux_action_opts[] = {
+ {"start-paused", BIT(0), true},
+ {"pause", BIT(1), false},
+ {"resume", BIT(2), false},
+ {NULL},
+};
+
+static const struct aux_action_opt *auxtrace_parse_aux_action_str(const char *str)
+{
+ const struct aux_action_opt *opt;
+
+ if (!str)
+ return NULL;
+
+ for (opt = aux_action_opts; opt->str; opt++)
+ if (!strcmp(str, opt->str))
+ return opt;
+
+ return NULL;
+}
+
+int auxtrace_parse_aux_action(struct evlist *evlist)
{
- struct evsel *evsel, *aux_evsel = NULL;
struct evsel_config_term *term;
+ struct evsel *aux_evsel = NULL;
+ struct evsel *evsel;

evlist__for_each_entry(evlist, evsel) {
- if (evsel__is_aux_event(evsel))
+ bool is_aux_event = evsel__is_aux_event(evsel);
+ const struct aux_action_opt *opt;
+
+ if (is_aux_event)
aux_evsel = evsel;
- term = evsel__get_config_term(evsel, AUX_OUTPUT);
+ term = evsel__get_config_term(evsel, AUX_ACTION);
+ if (!term) {
+ if (evsel__get_config_term(evsel, AUX_OUTPUT))
+ goto regroup;
+ continue;
+ }
+ opt = auxtrace_parse_aux_action_str(term->val.str);
+ if (!opt) {
+ pr_err("Bad aux-action '%s'\n", term->val.str);
+ return -EINVAL;
+ }
+ if (opt->aux_event_opt && !is_aux_event) {
+ pr_err("aux-action '%s' can only be used with AUX area event\n",
+ term->val.str);
+ return -EINVAL;
+ }
+ if (!opt->aux_event_opt && is_aux_event) {
+ pr_err("aux-action '%s' cannot be used for AUX area event itself\n",
+ term->val.str);
+ return -EINVAL;
+ }
+ evsel->core.attr.aux_action = opt->aux_action;
+regroup:
/* If possible, group with the AUX event */
- if (term && aux_evsel)
+ if (aux_evsel)
evlist__regroup(evlist, aux_evsel, evsel);
+ if (!evsel__is_aux_event(evsel__leader(evsel))) {
+ pr_err("Events with aux-action must have AUX area event group leader\n");
+ return -EINVAL;
+ }
}
+
+ return 0;
}

struct auxtrace_record *__weak
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 55702215a82d..35324ad12aad 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -579,7 +579,7 @@ int auxtrace_parse_snapshot_options(struct auxtrace_record *itr,
int auxtrace_parse_sample_options(struct auxtrace_record *itr,
struct evlist *evlist,
struct record_opts *opts, const char *str);
-void auxtrace_regroup_aux_output(struct evlist *evlist);
+int auxtrace_parse_aux_action(struct evlist *evlist);
int auxtrace_record__options(struct auxtrace_record *itr,
struct evlist *evlist,
struct record_opts *opts);
@@ -800,8 +800,10 @@ int auxtrace_parse_sample_options(struct auxtrace_record *itr __maybe_unused,
}

static inline
-void auxtrace_regroup_aux_output(struct evlist *evlist __maybe_unused)
+int auxtrace_parse_aux_action(struct evlist *evlist __maybe_unused)
{
+ pr_err("AUX area tracing not supported\n");
+ return -EINVAL;
}

static inline
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d8ee610edd62..2ccfc5c6f52f 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1002,6 +1002,7 @@ static void evsel__apply_config_terms(struct evsel *evsel,
attr->aux_output = term->val.aux_output ? 1 : 0;
break;
case EVSEL__CONFIG_TERM_AUX_ACTION:
+ /* Already applied by auxtrace */
break;
case EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE:
/* Already applied by auxtrace */
--
2.34.1
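
The checks that auxtrace_parse_aux_action() applies in the patch above can be sketched in isolation. This is a minimal, self-contained sketch, not the perf code. The table contents, the ACT_* values and check_aux_action() are hypothetical stand-ins. Only the three option strings and the two rules come from the patch: start-paused belongs on the AUX area event itself, while pause and resume belong on other events.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the aux_action bits (assumption of this sketch). */
enum { ACT_START_PAUSED = 1, ACT_PAUSE = 2, ACT_RESUME = 4 };

struct aux_action_opt {
	const char *str;
	unsigned int aux_action;
	bool aux_event_opt;	/* true: only valid on the AUX area event itself */
};

/* Assumed table layout, terminated by a NULL string as the lookup loop expects. */
static const struct aux_action_opt aux_action_opts[] = {
	{ "start-paused", ACT_START_PAUSED, true  },
	{ "pause",        ACT_PAUSE,        false },
	{ "resume",       ACT_RESUME,       false },
	{ NULL,           0,                false },
};

/*
 * Return the action bits for str, or -1 if the string is unknown or is
 * used on the wrong kind of event.
 */
static int check_aux_action(const char *str, bool is_aux_event)
{
	const struct aux_action_opt *opt;

	if (!str)
		return -1;

	for (opt = aux_action_opts; opt->str; opt++) {
		if (strcmp(str, opt->str))
			continue;
		/* start-paused needs an AUX event; pause / resume need a non-AUX event */
		if (opt->aux_event_opt != is_aux_event)
			return -1;
		return (int)opt->aux_action;
	}

	return -1;
}
```

With this sketch, check_aux_action("pause", true) is rejected for the same reason the patch prints "cannot be used for AUX area event itself".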


2024-01-11 08:23:17

by Adrian Hunter

Subject: [PATCH V4 08/11] perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume

Display "feature is not supported" error message if aux_start_paused,
aux_pause or aux_resume result in a perf_event_open() error.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/util/evsel.c | 10 +++++++++-
tools/perf/util/evsel.h | 1 +
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 2ccfc5c6f52f..5681266acdfd 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1890,7 +1890,13 @@ bool evsel__detect_missing_features(struct evsel *evsel)
* Must probe features in the order they were added to the
* perf_event_attr interface.
*/
- if (!perf_missing_features.branch_counters &&
+ if (!perf_missing_features.aux_pause_resume &&
+ (evsel->core.attr.aux_pause || evsel->core.attr.aux_resume ||
+ evsel->core.attr.aux_start_paused)) {
+ perf_missing_features.aux_pause_resume = true;
+ pr_debug2_peo("Kernel has no aux_pause/aux_resume support, bailing out\n");
+ return false;
+ } else if (!perf_missing_features.branch_counters &&
(evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS)) {
perf_missing_features.branch_counters = true;
pr_debug2("switching off branch counters support\n");
@@ -3057,6 +3063,8 @@ int evsel__open_strerror(struct evsel *evsel, struct target *target,
return scnprintf(msg, size, "clockid feature not supported.");
if (perf_missing_features.clockid_wrong)
return scnprintf(msg, size, "wrong clockid (%d).", clockid);
+ if (perf_missing_features.aux_pause_resume)
+ return scnprintf(msg, size, "The 'aux_pause / aux_resume' feature is not supported, update the kernel.");
if (perf_missing_features.aux_output)
return scnprintf(msg, size, "The 'aux_output' feature is not supported, update the kernel.");
if (!target__has_cpu(target))
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index efbb6e848287..cb316bba3c58 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -192,6 +192,7 @@ struct perf_missing_features {
bool weight_struct;
bool read_lost;
bool branch_counters;
+ bool aux_pause_resume;
};

extern struct perf_missing_features perf_missing_features;
--
2.34.1
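
The probing order used in the patch above can be sketched as follows. This is a simplified model under stated assumptions: struct attr_sketch, struct missing_sketch and detect_missing() are hypothetical names, and only two features are modelled. It illustrates the difference visible in the hunk: a missing aux_pause / aux_resume feature cannot be worked around by clearing bits, so the probe reports "give up", whereas branch counters can be switched off and the open retried.

```c
#include <stdbool.h>

/* Hypothetical, cut-down view of the attribute bits that matter here. */
struct attr_sketch {
	unsigned int aux_start_paused : 1;
	unsigned int aux_pause        : 1;
	unsigned int aux_resume       : 1;
	bool branch_counters;
};

/* Hypothetical, cut-down view of the "missing features" record. */
struct missing_sketch {
	bool aux_pause_resume;
	bool branch_counters;
};

/*
 * Called after an open attempt has failed. Features are probed newest
 * first. Returns true if the caller should retry with the feature
 * switched off, false if there is nothing left to fall back to.
 */
static bool detect_missing(const struct attr_sketch *attr,
			   struct missing_sketch *missing)
{
	if (!missing->aux_pause_resume &&
	    (attr->aux_pause || attr->aux_resume || attr->aux_start_paused)) {
		missing->aux_pause_resume = true;
		return false;	/* no fallback: report "not supported" */
	}
	if (!missing->branch_counters && attr->branch_counters) {
		missing->branch_counters = true;
		return true;	/* retry without branch counters */
	}
	return false;
}
```

The recorded flag is what later lets the error path choose the "update the kernel" message.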


2024-01-11 08:23:43

by Adrian Hunter

Subject: [PATCH V4 10/11] perf intel-pt: Add documentation for pause / resume

Document the use of aux-action config term and provide a simple example.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-intel-pt.txt | 70 ++++++++++++++++++++++
1 file changed, 70 insertions(+)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index b3d9fb29ffd3..3a0d73bf8bc2 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -551,6 +551,9 @@ Support for this feature is indicated by:
which contains "1" if the feature is supported and
"0" otherwise.

+*aux-action=start-paused*::
+Start tracing paused, refer to the section <<_pause_or_resume_tracing,Pause or Resume Tracing>>
+

config terms on other events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -566,6 +569,9 @@ and PEBS-via-PT. In those cases, the other events can have config terms below:
Used to select PEBS-via-PT, refer to the
section <<_pebs_via_intel_pt,PEBS via Intel PT>>

+*aux-action*::
+ Used to pause or resume tracing, refer to the section
+ <<_pause_or_resume_tracing,Pause or Resume Tracing>>

AUX area sampling option
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1911,6 +1917,70 @@ For pipe mode, the order of events and timestamps can presumably
be messed up.


+Pause or Resume Tracing
+-----------------------
+
+With newer kernels, it is possible to use other selected events to pause
+or resume Intel PT tracing. This is configured by using the "aux-action"
+config term:
+
+"aux-action=pause" is used with events that are to pause Intel PT tracing.
+
+"aux-action=resume" is used with events that are to resume Intel PT tracing.
+
+"aux-action=start-paused" is used with the Intel PT event to start in a
+paused state.
+
+For example, to trace only the uname system call (sys_newuname) when running the
+command line utility uname:
+
+ $ perf record --kcore -e intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/ uname
+ Linux
+ [ perf record: Woken up 1 times to write data ]
+ [ perf record: Captured and wrote 0.043 MB perf.data ]
+ $ perf script --call-trace
+ uname 30805 [000] 24001.058782799: name: 0x7ffc9c1865b0
+ uname 30805 [000] 24001.058784424: psb offs: 0
+ uname 30805 [000] 24001.058784424: cbr: 39 freq: 3904 MHz (139%)
+ uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) debug_smp_processor_id
+ uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __x64_sys_newuname
+ uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) down_read
+ uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __cond_resched
+ uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
+ uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) in_lock_functions
+ uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_sub
+ uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) up_read
+ uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
+ uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) in_lock_functions
+ uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) preempt_count_sub
+ uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) _copy_to_user
+ uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_to_user_mode
+ uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_work
+ uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) perf_syscall_exit
+ uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) debug_smp_processor_id
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_alloc
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_get_recursion_context
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_tp_event
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_update
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) tracing_gen_ctx_irq_test
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_event
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __perf_event_account_interrupt
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __this_cpu_preempt_check
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_output_forward
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_aux_pause
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) ring_buffer_get
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_lock
+ uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_unlock
+ uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) pt_event_stop
+ uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
+ uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
+ uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) native_write_msr
+ uname 30805 [000] 24001.058785463: ([kernel.kallsyms]) native_write_msr
+ uname 30805 [000] 24001.058785639: 0x0
+
+
EXAMPLE
-------

--
2.34.1


2024-01-11 08:24:02

by Adrian Hunter

Subject: [PATCH V4 11/11] perf intel-pt: Add a test for pause / resume

Add a simple sub-test to the "Miscellaneous Intel PT testing" test to
check pause / resume.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/tests/shell/test_intel_pt.sh | 28 +++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/tools/perf/tests/shell/test_intel_pt.sh b/tools/perf/tests/shell/test_intel_pt.sh
index 723ec501f99a..e359db0d0ff2 100755
--- a/tools/perf/tests/shell/test_intel_pt.sh
+++ b/tools/perf/tests/shell/test_intel_pt.sh
@@ -644,6 +644,33 @@ test_pipe()
return 0
}

+test_pause_resume()
+{
+ echo "--- Test with pause / resume ---"
+ if ! perf_record_no_decode -o "${perfdatafile}" -e intel_pt/aux-action=start-paused/u uname ; then
+ echo "SKIP: pause / resume is not supported"
+ return 2
+ fi
+ if ! perf_record_no_bpf -o "${perfdatafile}" \
+ -e intel_pt/aux-action=start-paused/u \
+ -e instructions/period=50000,aux-action=resume,name=Resume/u \
+ -e instructions/period=100000,aux-action=pause,name=Pause/u uname ; then
+ echo "perf record with pause / resume failed"
+ return 1
+ fi
+ if ! perf script -i "${perfdatafile}" --itrace=b -Fperiod,event | \
+ awk 'BEGIN {paused=1;branches=0}
+ /Resume/ {paused=0}
+ /branches/ {if (paused) exit 1;branches=1}
+ /Pause/ {paused=1}
+ END {if (!branches) exit 1}' ; then
+ echo "perf record with pause / resume failed"
+ return 1
+ fi
+ echo OK
+ return 0
+}
+
count_result()
{
if [ "$1" -eq 2 ] ; then
@@ -672,6 +699,7 @@ test_power_event || ret=$? ; count_result $ret ; ret=0
test_no_tnt || ret=$? ; count_result $ret ; ret=0
test_event_trace || ret=$? ; count_result $ret ; ret=0
test_pipe || ret=$? ; count_result $ret ; ret=0
+test_pause_resume || ret=$? ; count_result $ret ; ret=0

cleanup

--
2.34.1
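
The awk script in the test above is a small state machine over the perf script output. The same logic can be sketched in C. This is an illustrative re-statement, not part of the patch: check_pause_resume() is a hypothetical name, and the substring matching mirrors the awk patterns /Resume/, /branches/ and /Pause/, applied to each line in that order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 0 if the trace is consistent with pause / resume, 1 otherwise:
 *  - tracing starts paused
 *  - a "Resume" event un-pauses, a "Pause" event pauses
 *  - no "branches" event may appear while paused
 *  - at least one "branches" event must appear overall
 */
static int check_pause_resume(const char *const *lines, size_t n)
{
	bool paused = true;
	bool branches = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (strstr(lines[i], "Resume"))
			paused = false;
		if (strstr(lines[i], "branches")) {
			if (paused)
				return 1;
			branches = true;
		}
		if (strstr(lines[i], "Pause"))
			paused = true;
	}

	return branches ? 0 : 1;
}
```

Note the second condition: a recording that never resumes produces no branches at all, and the test treats that as a failure rather than a trivially clean trace.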


2024-01-11 08:24:31

by Adrian Hunter

Subject: [PATCH V4 03/11] perf tools: Enable evsel__is_aux_event() to work for ARM/ARM64

Set pmu->auxtrace on ARM/ARM64 AUX area PMUs. evsel__is_aux_event() needs
the setting to identify AUX area tracing selected events.

Currently, the features that use evsel__is_aux_event() are used only by
Intel PT, but that may change in the future.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/arch/arm/util/pmu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/perf/arch/arm/util/pmu.c b/tools/perf/arch/arm/util/pmu.c
index 7f3af3b97f3b..88fbd366246d 100644
--- a/tools/perf/arch/arm/util/pmu.c
+++ b/tools/perf/arch/arm/util/pmu.c
@@ -19,14 +19,17 @@ void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
#ifdef HAVE_AUXTRACE_SUPPORT
if (!strcmp(pmu->name, CORESIGHT_ETM_PMU_NAME)) {
/* add ETM default config here */
+ pmu->auxtrace = true;
pmu->selectable = true;
pmu->perf_event_attr_init_default = cs_etm_get_default_config;
#if defined(__aarch64__)
} else if (strstarts(pmu->name, ARM_SPE_PMU_NAME)) {
+ pmu->auxtrace = true;
pmu->selectable = true;
pmu->is_uncore = false;
pmu->perf_event_attr_init_default = arm_spe_pmu_default_config;
} else if (strstarts(pmu->name, HISI_PTT_PMU_NAME)) {
+ pmu->auxtrace = true;
pmu->selectable = true;
#endif
}
--
2.34.1
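
Why the flag matters can be sketched with a cut-down model. Everything here is an assumption made for illustration: struct pmu_sketch, pmu_sketch_arch_init() and is_aux_event() are hypothetical, and the literal PMU names stand in for the CORESIGHT_ETM_PMU_NAME, ARM_SPE_PMU_NAME and HISI_PTT_PMU_NAME macros, whose values are not shown in the patch. The point is only that an "is this an AUX area event" check keyed on a per-PMU flag returns false for every PMU whose init path never sets that flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down PMU description. */
struct pmu_sketch {
	const char *name;
	bool auxtrace;
};

static bool starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Mark AUX area PMUs; the names are assumed values, see the note above. */
static void pmu_sketch_arch_init(struct pmu_sketch *pmu)
{
	if (!strcmp(pmu->name, "cs_etm") ||
	    starts_with(pmu->name, "arm_spe_") ||
	    starts_with(pmu->name, "hisi_ptt"))
		pmu->auxtrace = true;
}

/* An event counts as an AUX area event only if its PMU carries the flag. */
static bool is_aux_event(const struct pmu_sketch *pmu)
{
	return pmu && pmu->auxtrace;
}
```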


2024-01-11 08:27:18

by Adrian Hunter

Subject: [PATCH V4 09/11] perf intel-pt: Improve man page format

Improve format of config terms and section references.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/Documentation/perf-intel-pt.txt | 486 +++++++++++----------
1 file changed, 267 insertions(+), 219 deletions(-)

diff --git a/tools/perf/Documentation/perf-intel-pt.txt b/tools/perf/Documentation/perf-intel-pt.txt
index 4c90cc176f81..b3d9fb29ffd3 100644
--- a/tools/perf/Documentation/perf-intel-pt.txt
+++ b/tools/perf/Documentation/perf-intel-pt.txt
@@ -147,7 +147,7 @@ displayed as follows:
There are two ways that instructions-per-cycle (IPC) can be calculated depending
on the recording.

-If the 'cyc' config term (see config terms section below) was used, then IPC
+If the 'cyc' config term (see <<_config_terms,config terms>> section below) was used, then IPC
and cycle events are calculated using the cycle count from CYC packets, otherwise
MTC packets are used - refer to the 'mtc' config term. When MTC is used, however,
the values are less accurate because the timing is less accurate.
@@ -235,7 +235,7 @@ which is the same as

-e intel_pt/tsc=1,noretcomp=0/

-Note there are now new config terms - see section 'config terms' further below.
+Note there are other config terms - see section <<_config_terms,config terms>> further below.

The config terms are listed in /sys/devices/intel_pt/format. They are bit
fields within the config member of the struct perf_event_attr which is
@@ -307,217 +307,264 @@ perf_event_attr is displayed if the -vv option is used e.g.
config terms
~~~~~~~~~~~~

-The June 2015 version of Intel 64 and IA-32 Architectures Software Developer
-Manuals, Chapter 36 Intel Processor Trace, defined new Intel PT features.
-Some of the features are reflect in new config terms. All the config terms are
-described below.
-
-tsc Always supported. Produces TSC timestamp packets to provide
- timing information. In some cases it is possible to decode
- without timing information, for example a per-thread context
- that does not overlap executable memory maps.
-
- The default config selects tsc (i.e. tsc=1).
-
-noretcomp Always supported. Disables "return compression" so a TIP packet
- is produced when a function returns. Causes more packets to be
- produced but might make decoding more reliable.
-
- The default config does not select noretcomp (i.e. noretcomp=0).
-
-psb_period Allows the frequency of PSB packets to be specified.
-
- The PSB packet is a synchronization packet that provides a
- starting point for decoding or recovery from errors.
-
- Support for psb_period is indicated by:
-
- /sys/bus/event_source/devices/intel_pt/caps/psb_cyc
-
- which contains "1" if the feature is supported and "0"
- otherwise.
-
- Valid values are given by:
-
- /sys/bus/event_source/devices/intel_pt/caps/psb_periods
-
- which contains a hexadecimal value, the bits of which represent
- valid values e.g. bit 2 set means value 2 is valid.
-
- The psb_period value is converted to the approximate number of
- trace bytes between PSB packets as:
-
- 2 ^ (value + 11)
-
- e.g. value 3 means 16KiB bytes between PSBs
-
- If an invalid value is entered, the error message
- will give a list of valid values e.g.
-
- $ perf record -e intel_pt/psb_period=15/u uname
- Invalid psb_period for intel_pt. Valid values are: 0-5
-
- If MTC packets are selected, the default config selects a value
- of 3 (i.e. psb_period=3) or the nearest lower value that is
- supported (0 is always supported). Otherwise the default is 0.
-
- If decoding is expected to be reliable and the buffer is large
- then a large PSB period can be used.
-
- Because a TSC packet is produced with PSB, the PSB period can
- also affect the granularity to timing information in the absence
- of MTC or CYC.
-
-mtc Produces MTC timing packets.
-
- MTC packets provide finer grain timestamp information than TSC
- packets. MTC packets record time using the hardware crystal
- clock (CTC) which is related to TSC packets using a TMA packet.
-
- Support for this feature is indicated by:
-
- /sys/bus/event_source/devices/intel_pt/caps/mtc
-
- which contains "1" if the feature is supported and
- "0" otherwise.
-
- The frequency of MTC packets can also be specified - see
- mtc_period below.
-
-mtc_period Specifies how frequently MTC packets are produced - see mtc
- above for how to determine if MTC packets are supported.
-
- Valid values are given by:
-
- /sys/bus/event_source/devices/intel_pt/caps/mtc_periods
-
- which contains a hexadecimal value, the bits of which represent
- valid values e.g. bit 2 set means value 2 is valid.
-
- The mtc_period value is converted to the MTC frequency as:
-
- CTC-frequency / (2 ^ value)
-
- e.g. value 3 means one eighth of CTC-frequency
-
- Where CTC is the hardware crystal clock, the frequency of which
- can be related to TSC via values provided in cpuid leaf 0x15.
-
- If an invalid value is entered, the error message
- will give a list of valid values e.g.
-
- $ perf record -e intel_pt/mtc_period=15/u uname
- Invalid mtc_period for intel_pt. Valid values are: 0,3,6,9
-
- The default value is 3 or the nearest lower value
- that is supported (0 is always supported).
-
-cyc Produces CYC timing packets.
-
- CYC packets provide even finer grain timestamp information than
- MTC and TSC packets. A CYC packet contains the number of CPU
- cycles since the last CYC packet. Unlike MTC and TSC packets,
- CYC packets are only sent when another packet is also sent.
-
- Support for this feature is indicated by:
-
- /sys/bus/event_source/devices/intel_pt/caps/psb_cyc
-
- which contains "1" if the feature is supported and
- "0" otherwise.
-
- The number of CYC packets produced can be reduced by specifying
- a threshold - see cyc_thresh below.
-
-cyc_thresh Specifies how frequently CYC packets are produced - see cyc
- above for how to determine if CYC packets are supported.
-
- Valid cyc_thresh values are given by:
-
- /sys/bus/event_source/devices/intel_pt/caps/cycle_thresholds
-
- which contains a hexadecimal value, the bits of which represent
- valid values e.g. bit 2 set means value 2 is valid.
-
- The cyc_thresh value represents the minimum number of CPU cycles
- that must have passed before a CYC packet can be sent. The
- number of CPU cycles is:
-
- 2 ^ (value - 1)
-
- e.g. value 4 means 8 CPU cycles must pass before a CYC packet
- can be sent. Note a CYC packet is still only sent when another
- packet is sent, not at, e.g. every 8 CPU cycles.
-
- If an invalid value is entered, the error message
- will give a list of valid values e.g.
-
- $ perf record -e intel_pt/cyc,cyc_thresh=15/u uname
- Invalid cyc_thresh for intel_pt. Valid values are: 0-12
-
- CYC packets are not requested by default.
-
-pt Specifies pass-through which enables the 'branch' config term.
-
- The default config selects 'pt' if it is available, so a user will
- never need to specify this term.
-
-branch Enable branch tracing. Branch tracing is enabled by default so to
- disable branch tracing use 'branch=0'.
-
- The default config selects 'branch' if it is available.
-
-ptw Enable PTWRITE packets which are produced when a ptwrite instruction
- is executed.
-
- Support for this feature is indicated by:
-
- /sys/bus/event_source/devices/intel_pt/caps/ptwrite
-
- which contains "1" if the feature is supported and
- "0" otherwise.
-
- As an alternative, refer to "Emulated PTWRITE" further below.
-
-fup_on_ptw Enable a FUP packet to follow the PTWRITE packet. The FUP packet
- provides the address of the ptwrite instruction. In the absence of
- fup_on_ptw, the decoder will use the address of the previous branch
- if branch tracing is enabled, otherwise the address will be zero.
- Note that fup_on_ptw will work even when branch tracing is disabled.
-
-pwr_evt Enable power events. The power events provide information about
- changes to the CPU C-state.
-
- Support for this feature is indicated by:
-
- /sys/bus/event_source/devices/intel_pt/caps/power_event_trace
-
- which contains "1" if the feature is supported and
- "0" otherwise.
-
-event Enable Event Trace. The events provide information about asynchronous
- events.
-
- Support for this feature is indicated by:
-
- /sys/bus/event_source/devices/intel_pt/caps/event_trace
-
- which contains "1" if the feature is supported and
- "0" otherwise.
-
-notnt Disable TNT packets. Without TNT packets, it is not possible to walk
- executable code to reconstruct control flow, however FUP, TIP, TIP.PGE
- and TIP.PGD packets still indicate asynchronous control flow, and (if
- return compression is disabled - see noretcomp) return statements.
- The advantage of eliminating TNT packets is reducing the size of the
- trace and corresponding tracing overhead.
-
- Support for this feature is indicated by:
-
- /sys/bus/event_source/devices/intel_pt/caps/tnt_disable
-
- which contains "1" if the feature is supported and
- "0" otherwise.
+Config terms are parameters specified with the -e intel_pt// event option,
+for example:
+
+ -e intel_pt/cyc/
+
+which selects cycle accurate mode. Each config term can have a value which
+defaults to 1, so the above is the same as:
+
+ -e intel_pt/cyc=1/
+
+Some terms are set by default, so must be set to 0 to turn them off. For
+example, to turn off branch tracing:
+
+ -e intel_pt/branch=0/
+
+Multiple config terms are separated by commas, for example:
+
+ -e intel_pt/cyc,mtc_period=9/
+
+There are also common config terms, see linkperf:perf-record[1] documentation.
+
+Intel PT config terms are described below.
+
+*tsc*::
+Always supported. Produces TSC timestamp packets to provide
+timing information. In some cases it is possible to decode
+without timing information, for example a per-thread context
+that does not overlap executable memory maps.
++
+The default config selects tsc (i.e. tsc=1).
+
+*noretcomp*::
+Always supported. Disables "return compression" so a TIP packet
+is produced when a function returns. Causes more packets to be
+produced but might make decoding more reliable.
++
+The default config does not select noretcomp (i.e. noretcomp=0).
+
+*psb_period*::
+Allows the frequency of PSB packets to be specified.
++
+The PSB packet is a synchronization packet that provides a
+starting point for decoding or recovery from errors.
++
+Support for psb_period is indicated by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/psb_cyc
++
+which contains "1" if the feature is supported and "0"
+otherwise.
++
+Valid values are given by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/psb_periods
++
+which contains a hexadecimal value, the bits of which represent
+valid values e.g. bit 2 set means value 2 is valid.
++
+The psb_period value is converted to the approximate number of
+trace bytes between PSB packets as:
++
+ 2 ^ (value + 11)
++
+e.g. value 3 means 16KiB bytes between PSBs
++
+If an invalid value is entered, the error message
+will give a list of valid values e.g.
++
+ $ perf record -e intel_pt/psb_period=15/u uname
+ Invalid psb_period for intel_pt. Valid values are: 0-5
++
+If MTC packets are selected, the default config selects a value
+of 3 (i.e. psb_period=3) or the nearest lower value that is
+supported (0 is always supported). Otherwise the default is 0.
++
+If decoding is expected to be reliable and the buffer is large
+then a large PSB period can be used.
++
+Because a TSC packet is produced with PSB, the PSB period can
+also affect the granularity to timing information in the absence
+of MTC or CYC.
+
+*mtc*::
+Produces MTC timing packets.
++
+MTC packets provide finer grain timestamp information than TSC
+packets. MTC packets record time using the hardware crystal
+clock (CTC) which is related to TSC packets using a TMA packet.
++
+Support for this feature is indicated by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/mtc
++
+which contains "1" if the feature is supported and
+"0" otherwise.
++
+The frequency of MTC packets can also be specified - see
+mtc_period below.
+
+*mtc_period*::
+Specifies how frequently MTC packets are produced - see mtc
+above for how to determine if MTC packets are supported.
++
+Valid values are given by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/mtc_periods
++
+which contains a hexadecimal value, the bits of which represent
+valid values e.g. bit 2 set means value 2 is valid.
++
+The mtc_period value is converted to the MTC frequency as:
+
+ CTC-frequency / (2 ^ value)
++
+e.g. value 3 means one eighth of CTC-frequency
++
+Where CTC is the hardware crystal clock, the frequency of which
+can be related to TSC via values provided in cpuid leaf 0x15.
++
+If an invalid value is entered, the error message
+will give a list of valid values e.g.
++
+ $ perf record -e intel_pt/mtc_period=15/u uname
+ Invalid mtc_period for intel_pt. Valid values are: 0,3,6,9
++
+The default value is 3 or the nearest lower value
+that is supported (0 is always supported).
+
+*cyc*::
+Produces CYC timing packets.
++
+CYC packets provide even finer grain timestamp information than
+MTC and TSC packets. A CYC packet contains the number of CPU
+cycles since the last CYC packet. Unlike MTC and TSC packets,
+CYC packets are only sent when another packet is also sent.
++
+Support for this feature is indicated by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/psb_cyc
++
+which contains "1" if the feature is supported and
+"0" otherwise.
++
+The number of CYC packets produced can be reduced by specifying
+a threshold - see cyc_thresh below.
+
+*cyc_thresh*::
+Specifies how frequently CYC packets are produced - see cyc
+above for how to determine if CYC packets are supported.
++
+Valid cyc_thresh values are given by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/cycle_thresholds
++
+which contains a hexadecimal value, the bits of which represent
+valid values e.g. bit 2 set means value 2 is valid.
++
+The cyc_thresh value represents the minimum number of CPU cycles
+that must have passed before a CYC packet can be sent. The
+number of CPU cycles is:
++
+ 2 ^ (value - 1)
++
+e.g. value 4 means 8 CPU cycles must pass before a CYC packet
+can be sent. Note a CYC packet is still only sent when another
+packet is sent, not at, e.g. every 8 CPU cycles.
++
+If an invalid value is entered, the error message
+will give a list of valid values e.g.
++
+ $ perf record -e intel_pt/cyc,cyc_thresh=15/u uname
+ Invalid cyc_thresh for intel_pt. Valid values are: 0-12
++
+CYC packets are not requested by default.
+
+*pt*::
+Specifies pass-through which enables the 'branch' config term.
++
+The default config selects 'pt' if it is available, so a user will
+never need to specify this term.
+
+*branch*::
+Enable branch tracing. Branch tracing is enabled by default so to
+disable branch tracing use 'branch=0'.
++
+The default config selects 'branch' if it is available.
+
+*ptw*::
+Enable PTWRITE packets which are produced when a ptwrite instruction
+is executed.
++
+Support for this feature is indicated by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/ptwrite
++
+which contains "1" if the feature is supported and
+"0" otherwise.
++
+As an alternative, refer to "Emulated PTWRITE" further below.
+
+*fup_on_ptw*::
+Enable a FUP packet to follow the PTWRITE packet. The FUP packet
+provides the address of the ptwrite instruction. In the absence of
+fup_on_ptw, the decoder will use the address of the previous branch
+if branch tracing is enabled, otherwise the address will be zero.
+Note that fup_on_ptw will work even when branch tracing is disabled.
+
+*pwr_evt*::
+Enable power events. The power events provide information about
+changes to the CPU C-state.
++
+Support for this feature is indicated by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/power_event_trace
++
+which contains "1" if the feature is supported and
+"0" otherwise.
+
+*event*::
+Enable Event Trace. The events provide information about asynchronous
+events.
++
+Support for this feature is indicated by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/event_trace
++
+which contains "1" if the feature is supported and
+"0" otherwise.
+
+*notnt*::
+Disable TNT packets. Without TNT packets, it is not possible to walk
+executable code to reconstruct control flow, however FUP, TIP, TIP.PGE
+and TIP.PGD packets still indicate asynchronous control flow, and (if
+return compression is disabled - see noretcomp) return statements.
+The advantage of eliminating TNT packets is reducing the size of the
+trace and corresponding tracing overhead.
++
+Support for this feature is indicated by:
++
+ /sys/bus/event_source/devices/intel_pt/caps/tnt_disable
++
+which contains "1" if the feature is supported and
+"0" otherwise.
+
+
+config terms on other events
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some Intel PT features work with other events, features such as AUX area sampling
+and PEBS-via-PT. In those cases, the other events can have config terms below:
+
+*aux-sample-size*::
+ Used to set the AUX area sample size, refer to the section
+ <<_aux_area_sampling_option,AUX area sampling option>>
+
+*aux-output*::
+ Used to select PEBS-via-PT, refer to the
+ section <<_pebs_via_intel_pt,PEBS via Intel PT>>


AUX area sampling option
@@ -592,7 +639,8 @@ The default snapshot size is the auxtrace mmap size. If neither auxtrace mmap s
nor snapshot size is specified, then the default is 4MiB for privileged users
(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
If an unprivileged user does not specify mmap pages, the mmap pages will be
-reduced as described in the 'new auxtrace mmap size option' section below.
+reduced as described in the <<_new_auxtrace_mmap_size_option,new auxtrace mmap size option>>
+section below.

The snapshot size is displayed if the option -vv is used e.g.

@@ -948,11 +996,11 @@ transaction start, commit or abort.

Note that "instructions", "cycles", "branches" and "transactions" events
depend on code flow packets which can be disabled by using the config term
-"branch=0". Refer to the config terms section above.
+"branch=0". Refer to the <<_config_terms,config terms>> section above.

"ptwrite" events record the payload of the ptwrite instruction and whether
"fup_on_ptw" was used. "ptwrite" events depend on PTWRITE packets which are
-recorded only if the "ptw" config term was used. Refer to the config terms
+recorded only if the "ptw" config term was used. Refer to the <<_config_terms,config terms>>
section above. perf script "synth" field displays "ptwrite" information like
this: "ip: 0 payload: 0x123456789abcdef0" where "ip" is 1 if "fup_on_ptw" was
used.
@@ -960,7 +1008,7 @@ used.
"Power" events correspond to power event packets and CBR (core-to-bus ratio)
packets. While CBR packets are always recorded when tracing is enabled, power
event packets are recorded only if the "pwr_evt" config term was used. Refer to
-the config terms section above. The power events record information about
+the <<_config_terms,config terms>> section above. The power events record information about
C-state changes, whereas CBR is indicative of CPU frequency. perf script
"event,synth" fields display information like this:

@@ -1116,7 +1164,7 @@ What *will* be decoded with the (single) q option:
- asynchronous branches such as interrupts
- indirect branches
- function return target address *if* the noretcomp config term (refer
- config terms section) was used
+ <<_config_terms,config terms>> section) was used
- start of (control-flow) tracing
- end of (control-flow) tracing, if it is not out of context
- power events, ptwrite, transaction start and abort
@@ -1129,7 +1177,7 @@ Repeating the q option (double-q i.e. qq) results in even faster decoding and ev
less detail. The decoder decodes only extended PSB (PSB+) packets, getting the
instruction pointer if there is a FUP packet within PSB+ (i.e. between PSB and
PSBEND). Note PSB packets occur regularly in the trace based on the psb_period
-config term (refer config terms section). There will be a FUP packet if the
+config term (refer <<_config_terms,config terms>> section). There will be a FUP packet if the
PSB+ occurs while control flow is being traced.

What will *not* be decoded with the qq option:
--
2.34.1


2024-01-16 11:15:25

by Andi Kleen

Subject: Re: [PATCH V4 10/11] perf intel-pt: Add documentation for pause / resume

Adrian Hunter <[email protected]> writes:
> +
> +For example, to trace only the uname system call (sys_newuname) when running the
> +command line utility uname:
> +
> + $ perf record --kcore -e
> intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/
> uname

It's unclear whether the syntax also works for hardware breakpoints, kprobes and uprobes.
That would be most useful. If it works, it would be good to add examples for it.

-Andi


2024-01-16 12:23:03

by Adrian Hunter

Subject: Re: [PATCH V4 10/11] perf intel-pt: Add documentation for pause / resume

On 16/01/24 13:15, Andi Kleen wrote:
> Adrian Hunter <[email protected]> writes:
>> +
>> +For example, to trace only the uname system call (sys_newuname) when running the
>> +command line utility uname:
>> +
>> + $ perf record --kcore -e
>> intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/
>> uname
>
> It's unclear if the syntax works for hardware break points, kprobes, uprobes too?

Yes, the perf tool syntax requires only that the group leader is
an AUX area event like intel_pt. Note that an attempt is made to
automatically group AUX area events with events with aux-action,
so grouping syntax like '{...}' is not always necessary.

Note the current kernel implementation is called from
__perf_event_output() which is used in nearly all cases for the
output of samples, the exceptions being Intel BTS (which we do not
support at the same time as Intel PT, but wouldn't make much sense
anyway) and S390 cpumsf_output_event_pid().
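
As a rough user-space illustration of that call site (a sketch only, not the
kernel code), the model below mirrors the ordering in patch 01: a trigger event
with aux_pause set pauses before its sample is written, one with aux_resume set
resumes afterwards, and a nested caller that finds aux_in_pause_resume already
set simply loses. The sketch_* names and the reduced structs are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the AUX area event plus its ring buffer flag. */
struct sketch_aux {
	bool paused;          /* models event->aux_paused */
	int  in_pause_resume; /* models rb->aux_in_pause_resume */
};

/* Hypothetical stand-in for a grouped event that triggers pause / resume. */
struct sketch_trigger {
	bool aux_pause;
	bool aux_resume;
	struct sketch_aux *aux;
};

/* Returns false if another pause / resume is in progress (the NMI case). */
static bool sketch_guarded_pause(struct sketch_aux *aux, bool pause)
{
	if (aux->in_pause_resume)
		return false;	/* nested caller loses, state is untouched */
	aux->in_pause_resume = 1;
	aux->paused = pause;
	aux->in_pause_resume = 0;
	return true;
}

/* Models the ordering in __perf_event_output(): pause first, resume last. */
static int sketch_event_output(struct sketch_trigger *t)
{
	assert(t && t->aux);

	if (t->aux_pause)
		sketch_guarded_pause(t->aux, true);

	/* ... the sample for the trigger event would be written here ... */

	if (t->aux_resume)
		sketch_guarded_pause(t->aux, false);

	return 0;
}
```

Because nothing here depends on the type of the trigger event, any event whose
samples go through that output path could act as a trigger in this model.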

> That would be most useful. If it works would be good to add examples for it.

OK


2024-01-19 21:29:54

by Namhyung Kim

Subject: Re: [PATCH V4 10/11] perf intel-pt: Add documentation for pause / resume

Hello,

On Tue, Jan 16, 2024 at 4:22 AM Adrian Hunter <[email protected]> wrote:
>
> On 16/01/24 13:15, Andi Kleen wrote:
> > Adrian Hunter <[email protected]> writes:
> >> +
> >> +For example, to trace only the uname system call (sys_newuname) when running the
> >> +command line utility uname:
> >> +
> >> + $ perf record --kcore -e
> >> intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/
> >> uname
> >
> > It's unclear if the syntax works for hardware break points, kprobes, uprobes too?
>
> Yes, the perf tool syntax requires only that the group leader is
> an AUX area event like intel_pt. Note that an attempt is made to
> automatically group AUX area events with events with aux-action,
> so grouping syntax like '{...}' is not always necessary.

Depends on the position, right? Maybe there can be other events
without aux-action mixed with aux events.

Thanks,
Namhyung

>
> Note the current kernel implementation is called from
> __perf_event_output() which is used in nearly all cases for the
> output of samples, the exceptions being Intel BTS (which we do not
> support at the same time as Intel PT, but wouldn't make much sense
> anyway) and S390 cpumsf_output_event_pid().
>
> > That would be most useful. If it works would be good to add examples for it.
>
> OK
>

2024-01-19 21:40:28

by Namhyung Kim

Subject: Re: [PATCH V4 01/11] perf/core: Add aux_pause, aux_resume, aux_start_paused

On Thu, Jan 11, 2024 at 12:19 AM Adrian Hunter <[email protected]> wrote:
>
> Hardware traces, such as instruction traces, can produce a vast amount of
> trace data, so being able to reduce tracing to more specific circumstances
> can be useful.
>
> The ability to pause or resume tracing when another event happens, can do
> that.
>
> Add ability for an event to "pause" or "resume" AUX area tracing.
>
> Add aux_pause bit to perf_event_attr to indicate that, if the event
> happens, the associated AUX area tracing should be paused. Ditto
> aux_resume. Do not allow aux_pause and aux_resume to be set together.
>
> Add aux_start_paused bit to perf_event_attr to indicate to an AUX area
> event that it should start in a "paused" state.
>
> Add aux_paused to struct perf_event for AUX area events to keep track of
> the "paused" state. aux_paused is initialized to aux_start_paused.
>
> Add PERF_EF_PAUSE and PERF_EF_RESUME modes for ->stop() and ->start()
> callbacks. Call as needed, during __perf_event_output(). Add
> aux_in_pause_resume to struct perf_buffer to prevent races with the NMI
> handler. Pause/resume in NMI context will miss out if it coincides with
> another pause/resume.
>
> To use aux_pause or aux_resume, an event must be in a group with the AUX
> area event as the group leader.
>
> Example (requires Intel PT and tools patches also):
>
> $ perf record --kcore -e intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/ uname
> Linux
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.043 MB perf.data ]
> $ perf script --call-trace
> uname 30805 [000] 24001.058782799: name: 0x7ffc9c1865b0
> uname 30805 [000] 24001.058784424: psb offs: 0
> uname 30805 [000] 24001.058784424: cbr: 39 freq: 3904 MHz (139%)
> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) debug_smp_processor_id
> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __x64_sys_newuname
> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) down_read
> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __cond_resched
> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) in_lock_functions
> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_sub
> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) up_read
> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) in_lock_functions
> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) preempt_count_sub
> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) _copy_to_user
> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_to_user_mode
> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_work
> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) perf_syscall_exit
> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) debug_smp_processor_id
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_alloc
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_get_recursion_context
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_tp_event
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_update
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) tracing_gen_ctx_irq_test
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_event
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __perf_event_account_interrupt
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __this_cpu_preempt_check
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_output_forward
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_aux_pause
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) ring_buffer_get
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_lock
> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_unlock
> uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) pt_event_stop
> uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
> uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
> uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) native_write_msr
> uname 30805 [000] 24001.058785463: ([kernel.kallsyms]) native_write_msr
> uname 30805 [000] 24001.058785639: 0x0

Looks great! I think this is very similar to what Kees asked in

https://lore.kernel.org/linux-perf-users/202401091452.B73E21B6C@keescook/

I have a couple of basic questions:
* Can we do that for regular events too?
* What's the difference between start/stop and pause/resume?
(IOW can we do that just using start/stop callbacks?)

Actually I was thinking about dropping samples using a BPF filter
outside the target scope (e.g. a syscall) but it'd be nice if we can
have builtin support for that.

Thanks,
Namhyung

>
> Signed-off-by: Adrian Hunter <[email protected]>
> ---
>
>
> Changes in V4:
> Rename aux_output_cfg -> aux_action
> Reorder aux_action bits from:
> aux_pause, aux_resume, aux_start_paused
> to:
> aux_start_paused, aux_pause, aux_resume
> Fix aux_action bits __u64 -> __u32
>
>
> include/linux/perf_event.h | 15 +++++++
> include/uapi/linux/perf_event.h | 11 ++++-
> kernel/events/core.c | 72 +++++++++++++++++++++++++++++++--
> kernel/events/internal.h | 1 +
> 4 files changed, 95 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 5547ba68e6e4..342879168269 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -291,6 +291,7 @@ struct perf_event_pmu_context;
> #define PERF_PMU_CAP_NO_EXCLUDE 0x0040
> #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> +#define PERF_PMU_CAP_AUX_PAUSE 0x0200
>
> struct perf_output_handle;
>
> @@ -363,6 +364,8 @@ struct pmu {
> #define PERF_EF_START 0x01 /* start the counter when adding */
> #define PERF_EF_RELOAD 0x02 /* reload the counter when starting */
> #define PERF_EF_UPDATE 0x04 /* update the counter when stopping */
> +#define PERF_EF_PAUSE 0x08 /* AUX area event, pause tracing */
> +#define PERF_EF_RESUME 0x10 /* AUX area event, resume tracing */
>
> /*
> * Adds/Removes a counter to/from the PMU, can be done inside a
> @@ -402,6 +405,15 @@ struct pmu {
> *
> * ->start() with PERF_EF_RELOAD will reprogram the counter
> * value, must be preceded by a ->stop() with PERF_EF_UPDATE.
> + *
> + * ->stop() with PERF_EF_PAUSE will stop as simply as possible. Will not
> + * overlap another ->stop() with PERF_EF_PAUSE nor ->start() with
> + * PERF_EF_RESUME.
> + *
> + * ->start() with PERF_EF_RESUME will start as simply as possible but
> + * only if the counter is not otherwise stopped. Will not overlap
> + * another ->start() with PERF_EF_RESUME nor ->stop() with
> + * PERF_EF_PAUSE.
> */
> void (*start) (struct perf_event *event, int flags);
> void (*stop) (struct perf_event *event, int flags);
> @@ -798,6 +810,9 @@ struct perf_event {
> /* for aux_output events */
> struct perf_event *aux_event;
>
> + /* for AUX area events */
> + unsigned int aux_paused;
> +
> void (*destroy)(struct perf_event *);
> struct rcu_head rcu_head;
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 39c6a250dd1b..5f6b3b494184 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -507,7 +507,16 @@ struct perf_event_attr {
> __u16 sample_max_stack;
> __u16 __reserved_2;
> __u32 aux_sample_size;
> - __u32 __reserved_3;
> +
> + union {
> + __u32 aux_action;
> + struct {
> + __u32 aux_start_paused : 1, /* start AUX area tracing paused */
> + aux_pause : 1, /* on overflow, pause AUX area tracing */
> + aux_resume : 1, /* on overflow, resume AUX area tracing */
> + __reserved_3 : 29;
> + };
> + };
>
> /*
> * User provided data if sigtrap=1, passed back to user via
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 9efd0d7775e7..dc9ec2443ac9 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2097,7 +2097,8 @@ static void perf_put_aux_event(struct perf_event *event)
>
> static bool perf_need_aux_event(struct perf_event *event)
> {
> - return !!event->attr.aux_output || !!event->attr.aux_sample_size;
> + return event->attr.aux_output || event->attr.aux_sample_size ||
> + event->attr.aux_pause || event->attr.aux_resume;
> }
>
> static int perf_get_aux_event(struct perf_event *event,
> @@ -2122,6 +2123,10 @@ static int perf_get_aux_event(struct perf_event *event,
> !perf_aux_output_match(event, group_leader))
> return 0;
>
> + if ((event->attr.aux_pause || event->attr.aux_resume) &&
> + !(group_leader->pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE))
> + return 0;
> +
> if (event->attr.aux_sample_size && !group_leader->pmu->snapshot_aux)
> return 0;
>
> @@ -7846,6 +7851,47 @@ void perf_prepare_header(struct perf_event_header *header,
> WARN_ON_ONCE(header->size & 7);
> }
>
> +static void __perf_event_aux_pause(struct perf_event *event, bool pause)
> +{
> + if (pause) {
> + if (!READ_ONCE(event->aux_paused)) {
> + WRITE_ONCE(event->aux_paused, 1);
> + event->pmu->stop(event, PERF_EF_PAUSE);
> + }
> + } else {
> + if (READ_ONCE(event->aux_paused)) {
> + WRITE_ONCE(event->aux_paused, 0);
> + event->pmu->start(event, PERF_EF_RESUME);
> + }
> + }
> +}
> +
> +static void perf_event_aux_pause(struct perf_event *event, bool pause)
> +{
> + struct perf_buffer *rb;
> + unsigned long flags;
> +
> + if (WARN_ON_ONCE(!event))
> + return;
> +
> + rb = ring_buffer_get(event);
> + if (!rb)
> + return;
> +
> + local_irq_save(flags);
> + /* Guard against NMI, NMI loses here */
> + if (READ_ONCE(rb->aux_in_pause_resume))
> + goto out_restore;
> + WRITE_ONCE(rb->aux_in_pause_resume, 1);
> + barrier();
> + __perf_event_aux_pause(event, pause);
> + barrier();
> + WRITE_ONCE(rb->aux_in_pause_resume, 0);
> +out_restore:
> + local_irq_restore(flags);
> + ring_buffer_put(rb);
> +}
> +
> static __always_inline int
> __perf_event_output(struct perf_event *event,
> struct perf_sample_data *data,
> @@ -7859,6 +7905,9 @@ __perf_event_output(struct perf_event *event,
> struct perf_event_header header;
> int err;
>
> + if (event->attr.aux_pause)
> + perf_event_aux_pause(event->aux_event, true);
> +
> /* protect the callchain buffers */
> rcu_read_lock();
>
> @@ -7875,6 +7924,10 @@ __perf_event_output(struct perf_event *event,
>
> exit:
> rcu_read_unlock();
> +
> + if (event->attr.aux_resume)
> + perf_event_aux_pause(event->aux_event, false);
> +
> return err;
> }
>
> @@ -12014,10 +12067,23 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
> }
>
> if (event->attr.aux_output &&
> - !(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT)) {
> + (!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) ||
> + event->attr.aux_pause || event->attr.aux_resume)) {
> + err = -EOPNOTSUPP;
> + goto err_pmu;
> + }
> +
> + if (event->attr.aux_pause && event->attr.aux_resume) {
> + err = -EINVAL;
> + goto err_pmu;
> + }
> +
> + if (event->attr.aux_start_paused &&
> + !(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) {
> err = -EOPNOTSUPP;
> goto err_pmu;
> }
> + event->aux_paused = event->attr.aux_start_paused;
>
> if (cgroup_fd != -1) {
> err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
> @@ -12814,7 +12880,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
> * Grouping is not supported for kernel events, neither is 'AUX',
> * make sure the caller's intentions are adjusted.
> */
> - if (attr->aux_output)
> + if (attr->aux_output || attr->aux_action)
> return ERR_PTR(-EINVAL);
>
> event = perf_event_alloc(attr, cpu, task, NULL, NULL,
> diff --git a/kernel/events/internal.h b/kernel/events/internal.h
> index 5150d5f84c03..3320f78117dc 100644
> --- a/kernel/events/internal.h
> +++ b/kernel/events/internal.h
> @@ -51,6 +51,7 @@ struct perf_buffer {
> void (*free_aux)(void *);
> refcount_t aux_refcount;
> int aux_in_sampling;
> + int aux_in_pause_resume;
> void **aux_pages;
> void *aux_priv;
>
> --
> 2.34.1
>

2024-01-22 10:31:25

by Adrian Hunter

Subject: Re: [PATCH V4 10/11] perf intel-pt: Add documentation for pause / resume

On 19/01/24 23:28, Namhyung Kim wrote:
> Hello,
>
> On Tue, Jan 16, 2024 at 4:22 AM Adrian Hunter <[email protected]> wrote:
>>
>> On 16/01/24 13:15, Andi Kleen wrote:
>>> Adrian Hunter <[email protected]> writes:
>>>> +
>>>> +For example, to trace only the uname system call (sys_newuname) when running the
>>>> +command line utility uname:
>>>> +
>>>> + $ perf record --kcore -e
>>>> intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/
>>>> uname
>>>
>>> It's unclear if the syntax works for hardware break points, kprobes, uprobes too?
>>
>> Yes, the perf tool syntax requires only that the group leader is
>> an AUX area event like intel_pt. Note that an attempt is made to
>> automatically group AUX area events with events with aux-action,
>> so grouping syntax like '{...}' is not always necessary.
>
> Depends on the position, right? Maybe there can be other events
> without aux-action mixed with aux events.

Yes, it depends on position. Non-grouped events in between will get
grouped too. So:

-e intel_pt --filter=blah -e not_grouped_event -e some_event/aux-action=resume/

would put those 3 in a group, but still allow --filter.
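
To make the positional rule concrete, here is a minimal C sketch of one way
such grouping could work. It is an assumption for illustration, not the perf
tools implementation: struct sketch_evsel and sketch_group_aux_actions() are
hypothetical names, and the rule coded here is only "group everything from the
nearest preceding AUX area event up to the event with aux-action".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one -e event on the command line. */
struct sketch_evsel {
	bool is_aux_area;    /* e.g. intel_pt */
	bool has_aux_action; /* e.g. some_event/aux-action=resume/ */
	int  leader;         /* index of the group leader, or -1 if ungrouped */
};

/*
 * For each event with an aux-action, find the nearest preceding AUX area
 * event and place every event from there up to this one in its group.
 */
static void sketch_group_aux_actions(struct sketch_evsel *evs, size_t n)
{
	size_t i, j;
	int aux = -1;	/* index of the most recent AUX area event */

	assert(evs != NULL || n == 0);

	for (i = 0; i < n; i++)
		evs[i].leader = -1;

	for (i = 0; i < n; i++) {
		if (evs[i].is_aux_area) {
			aux = (int)i;
			continue;
		}
		if (!evs[i].has_aux_action || aux < 0)
			continue;
		/* Events in between get pulled into the group as well */
		for (j = (size_t)aux; j <= i; j++)
			evs[j].leader = aux;
	}
}
```

With the three events from the command line above, all of them end up with
the intel_pt event as leader; an event placed after the last aux-action event
stays ungrouped in this sketch.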

>
> Thanks,
> Namhyung
>
>>
>> Note the current kernel implementation is called from
>> __perf_event_output() which is used in nearly all cases for the
>> output of samples, the exceptions being Intel BTS (which we do not
>> support at the same time as Intel PT, but wouldn't make much sense
>> anyway) and S390 cpumsf_output_event_pid().
>>
>>> That would be most useful. If it works would be good to add examples for it.
>>
>> OK
>>


2024-01-23 10:09:36

by Adrian Hunter

Subject: Re: [PATCH V4 01/11] perf/core: Add aux_pause, aux_resume, aux_start_paused

On 19/01/24 23:40, Namhyung Kim wrote:
> On Thu, Jan 11, 2024 at 12:19 AM Adrian Hunter <[email protected]> wrote:
>>
>> Hardware traces, such as instruction traces, can produce a vast amount of
>> trace data, so being able to reduce tracing to more specific circumstances
>> can be useful.
>>
>> The ability to pause or resume tracing when another event happens, can do
>> that.
>>
>> Add ability for an event to "pause" or "resume" AUX area tracing.
>>
>> Add aux_pause bit to perf_event_attr to indicate that, if the event
>> happens, the associated AUX area tracing should be paused. Ditto
>> aux_resume. Do not allow aux_pause and aux_resume to be set together.
>>
>> Add aux_start_paused bit to perf_event_attr to indicate to an AUX area
>> event that it should start in a "paused" state.
>>
>> Add aux_paused to struct perf_event for AUX area events to keep track of
>> the "paused" state. aux_paused is initialized to aux_start_paused.
>>
>> Add PERF_EF_PAUSE and PERF_EF_RESUME modes for ->stop() and ->start()
>> callbacks. Call as needed, during __perf_event_output(). Add
>> aux_in_pause_resume to struct perf_buffer to prevent races with the NMI
>> handler. Pause/resume in NMI context will miss out if it coincides with
>> another pause/resume.
>>
>> To use aux_pause or aux_resume, an event must be in a group with the AUX
>> area event as the group leader.
>>
>> Example (requires Intel PT and tools patches also):
>>
>> $ perf record --kcore -e intel_pt/aux-action=start-paused/k,syscalls:sys_enter_newuname/aux-action=resume/,syscalls:sys_exit_newuname/aux-action=pause/ uname
>> Linux
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.043 MB perf.data ]
>> $ perf script --call-trace
>> uname 30805 [000] 24001.058782799: name: 0x7ffc9c1865b0
>> uname 30805 [000] 24001.058784424: psb offs: 0
>> uname 30805 [000] 24001.058784424: cbr: 39 freq: 3904 MHz (139%)
>> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) debug_smp_processor_id
>> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __x64_sys_newuname
>> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) down_read
>> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) __cond_resched
>> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
>> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) in_lock_functions
>> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_sub
>> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) up_read
>> uname 30805 [000] 24001.058784629: ([kernel.kallsyms]) preempt_count_add
>> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) in_lock_functions
>> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) preempt_count_sub
>> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) _copy_to_user
>> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_to_user_mode
>> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) syscall_exit_work
>> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) perf_syscall_exit
>> uname 30805 [000] 24001.058784838: ([kernel.kallsyms]) debug_smp_processor_id
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_alloc
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_get_recursion_context
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) debug_smp_processor_id
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_tp_event
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_trace_buf_update
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) tracing_gen_ctx_irq_test
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_swevent_event
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __perf_event_account_interrupt
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __this_cpu_preempt_check
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_output_forward
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) perf_event_aux_pause
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) ring_buffer_get
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_lock
>> uname 30805 [000] 24001.058785046: ([kernel.kallsyms]) __rcu_read_unlock
>> uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) pt_event_stop
>> uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
>> uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) debug_smp_processor_id
>> uname 30805 [000] 24001.058785254: ([kernel.kallsyms]) native_write_msr
>> uname 30805 [000] 24001.058785463: ([kernel.kallsyms]) native_write_msr
>> uname 30805 [000] 24001.058785639: 0x0
>
> Looks great! I think this is very similar to what Kees asked in
>
> https://lore.kernel.org/linux-perf-users/202401091452.B73E21B6C@keescook/

Sometimes a precisely-defined workload is needed, just so that
running it repeatedly does not produce results that vary too much
to tell whether one software version is better than another.

>
> I have a couple of basic questions:
> * Can we do that for regular events too?

That would be much more complicated. The current implementation
can only pause / resume 1 event, the group leader, and it has to
be supported by the PMU callbacks.

> * What's the difference between start/stop and pause/resume?
> (IOW can we do that just using start/stop callbacks?)

It does use the start / stop callbacks, albeit with a different mode
parameter. However, pause / resume is not allowed unless the event
has been started and not stopped, so it is a different state.
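
As a rough illustration of that distinction, here is a minimal user-space C
sketch loosely modelled on __perf_event_aux_pause() from patch 01. The struct,
its "started" and "hw_tracing" fields and the sketch_* functions are
hypothetical stand-ins, and how a real PMU driver decides that the event is
"otherwise stopped" is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an AUX area event; not struct perf_event. */
struct sketch_event {
	bool started;    /* set by a normal start, cleared by a normal stop */
	bool aux_paused; /* models event->aux_paused */
	bool hw_tracing; /* whether the imaginary hardware is tracing now */
};

/* Normal start: the hardware only traces if the event is not paused. */
static void sketch_start(struct sketch_event *e)
{
	e->started = true;
	e->hw_tracing = !e->aux_paused;
}

/* Normal stop: tracing always ends, the paused flag is left alone. */
static void sketch_stop(struct sketch_event *e)
{
	e->started = false;
	e->hw_tracing = false;
}

/* Pause / resume: act only on a change of the paused state. */
static void sketch_aux_pause(struct sketch_event *e, bool pause)
{
	assert(e);

	if (pause) {
		if (!e->aux_paused) {
			e->aux_paused = true;
			e->hw_tracing = false;	/* like ->stop(PERF_EF_PAUSE) */
		}
	} else {
		if (e->aux_paused) {
			e->aux_paused = false;
			/* like ->start(PERF_EF_RESUME): only if not otherwise stopped */
			if (e->started)
				e->hw_tracing = true;
		}
	}
}
```

In this model a resume on a stopped event clears the paused flag but does not
turn tracing on, which is the sense in which paused is a separate state from
stopped.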

>
> Actually I was thinking about dropping samples using a BPF filter
> outside the target scope (e.g. a syscall) but it'd be nice if we can
> have builtin support for that.

In general, I would have thought that capturing samples does not
produce so much data that it cannot be filtered in post-processing.
Looking at the email thread from above, that seems to be what
Arnaldo has proposed.

AUX area tracing is different in this regard. Intel PT can produce
more trace data than can be written out in time, so data will be
lost for large traces. Also post-processing takes a long time, so
less data captured helps a lot there also.


2024-01-29 12:54:15

by Adrian Hunter

Subject: Re: [PATCH V4 00/11] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing

On 11/01/24 10:19, Adrian Hunter wrote:
> Hi
>
> Hardware traces, such as instruction traces, can produce a vast amount of
> trace data, so being able to reduce tracing to more specific circumstances
> can be useful.
>
> The ability to pause or resume tracing when another event happens, can do
> that.
>
> These patches add such a facility and show how it would work for Intel
> Processor Trace.
>
> Maintainers of other AUX area tracing implementations are requested to
> consider if this is something they might employ and then whether or not
> the ABI would work for them.
>
> Changes to perf tools are now (since V4) fleshed out.
>
>
> Changes in V4:
>
> perf/core: Add aux_pause, aux_resume, aux_start_paused
> Rename aux_output_cfg -> aux_action
> Reorder aux_action bits from:
> aux_pause, aux_resume, aux_start_paused
> to:
> aux_start_paused, aux_pause, aux_resume
> Fix aux_action bits __u64 -> __u32
>
> coresight: Have a stab at support for pause / resume
> Dropped
>
> perf tools
> All new patches
>
> Changes in RFC V3:
>
> coresight: Have a stab at support for pause / resume
> 'mode' -> 'flags' so it at least compiles
>
> Changes in RFC V2:
>
> Use ->stop() / ->start() instead of ->pause_resume()
> Move aux_start_paused bit into aux_output_cfg
> Tighten up when Intel PT pause / resume is allowed
> Add an example of how it might work for CoreSight

Any more comments?


2024-01-31 16:57:20

by Ian Rogers

Subject: Re: [PATCH V4 00/11] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing

On Mon, Jan 29, 2024 at 4:49 AM Adrian Hunter <[email protected]> wrote:
>
> On 11/01/24 10:19, Adrian Hunter wrote:
> > Hi
> >
> > Hardware traces, such as instruction traces, can produce a vast amount of
> > trace data, so being able to reduce tracing to more specific circumstances
> > can be useful.
> >
> > The ability to pause or resume tracing when another event happens, can do
> > that.
> >
> > These patches add such a facility and show how it would work for Intel
> > Processor Trace.
> >
> > Maintainers of other AUX area tracing implementations are requested to
> > consider if this is something they might employ and then whether or not
> > the ABI would work for them.
> >
> > Changes to perf tools are now (since V4) fleshed out.
> >
> >
> > Changes in V4:
> >
> > perf/core: Add aux_pause, aux_resume, aux_start_paused
> > Rename aux_output_cfg -> aux_action
> > Reorder aux_action bits from:
> > aux_pause, aux_resume, aux_start_paused
> > to:
> > aux_start_paused, aux_pause, aux_resume
> > Fix aux_action bits __u64 -> __u32
> >
> > coresight: Have a stab at support for pause / resume
> > Dropped
> >
> > perf tools
> > All new patches
> >
> > Changes in RFC V3:
> >
> > coresight: Have a stab at support for pause / resume
> > 'mode' -> 'flags' so it at least compiles
> >
> > Changes in RFC V2:
> >
> > Use ->stop() / ->start() instead of ->pause_resume()
> > Move aux_start_paused bit into aux_output_cfg
> > Tighten up when Intel PT pause / resume is allowed
> > Add an example of how it might work for CoreSight
>
> Any more comments?

I think the tools side looks good. The parsing changes match the
existing style. I wonder if it wouldn't be better to handle the valid
strings (pause, resume, etc.) in the lexer rather than a separate
parse function, but the pattern used matches the existing one. You can
have my Acked-by on the tools changes, although the subtleties of ARM
PMUs make me somewhat nervous in this regard.

Thanks,
Ian

2024-02-01 16:42:23

by James Clark

Subject: Re: [PATCH V4 00/11] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing



On 31/01/2024 16:53, Ian Rogers wrote:
> On Mon, Jan 29, 2024 at 4:49 AM Adrian Hunter <[email protected]> wrote:
>>
>> On 11/01/24 10:19, Adrian Hunter wrote:
>>> Hi
>>>
>>> Hardware traces, such as instruction traces, can produce a vast amount of
>>> trace data, so being able to reduce tracing to more specific circumstances
>>> can be useful.
>>>
>>> The ability to pause or resume tracing when another event happens, can do
>>> that.
>>>
>>> These patches add such a facility and show how it would work for Intel
>>> Processor Trace.
>>>
>>> Maintainers of other AUX area tracing implementations are requested to
>>> consider if this is something they might employ and then whether or not
>>> the ABI would work for them.
>>>
>>> Changes to perf tools are now (since V4) fleshed out.
>>>
>>>
>>> Changes in V4:
>>>
>>> perf/core: Add aux_pause, aux_resume, aux_start_paused
>>> Rename aux_output_cfg -> aux_action
>>> Reorder aux_action bits from:
>>> aux_pause, aux_resume, aux_start_paused
>>> to:
>>> aux_start_paused, aux_pause, aux_resume
>>> Fix aux_action bits __u64 -> __u32
>>>
>>> coresight: Have a stab at support for pause / resume
>>> Dropped
>>>
>>> perf tools
>>> All new patches
>>>
>>> Changes in RFC V3:
>>>
>>> coresight: Have a stab at support for pause / resume
>>> 'mode' -> 'flags' so it at least compiles
>>>
>>> Changes in RFC V2:
>>>
>>> Use ->stop() / ->start() instead of ->pause_resume()
>>> Move aux_start_paused bit into aux_output_cfg
>>> Tighten up when Intel PT pause / resume is allowed
>>> Add an example of how it might work for CoreSight
>>
>> Any more comments?
>
> I think the tools side looks good. The parsing changes match the
> existing style. I wonder if it wouldn't be better to handle the valid
> strings (pause, resume, etc.) in the lexer rather than a separate
> parse function, but the pattern used matches the existing one. You can
> have my Acked-by on the tools changes, although the subtleties of ARM
> PMUs make me somewhat nervous in this regard.
>
> Thanks,
> Ian

Acked-by: James Clark <[email protected]>

I will get round to adding the CoreSight support at some point. I
checked the new parsing in this version and it seems to work ok.

2024-02-08 11:41:19

by Adrian Hunter

Subject: Re: [PATCH V4 00/11] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing

On 1/02/24 18:29, James Clark wrote:
>
>
> On 31/01/2024 16:53, Ian Rogers wrote:
>> On Mon, Jan 29, 2024 at 4:49 AM Adrian Hunter <[email protected]> wrote:
>>>
>>> On 11/01/24 10:19, Adrian Hunter wrote:
>>>> Hi
>>>>
>>>> Hardware traces, such as instruction traces, can produce a vast amount of
>>>> trace data, so being able to reduce tracing to more specific circumstances
>>>> can be useful.
>>>>
>>>> The ability to pause or resume tracing when another event happens, can do
>>>> that.
>>>>
>>>> These patches add such a facility and show how it would work for Intel
>>>> Processor Trace.
>>>>
>>>> Maintainers of other AUX area tracing implementations are requested to
>>>> consider if this is something they might employ and then whether or not
>>>> the ABI would work for them.
>>>>
>>>> Changes to perf tools are now (since V4) fleshed out.
>>>>
>>>>
>>>> Changes in V4:
>>>>
>>>> perf/core: Add aux_pause, aux_resume, aux_start_paused
>>>> Rename aux_output_cfg -> aux_action
>>>> Reorder aux_action bits from:
>>>> aux_pause, aux_resume, aux_start_paused
>>>> to:
>>>> aux_start_paused, aux_pause, aux_resume
>>>> Fix aux_action bits __u64 -> __u32
>>>>
>>>> coresight: Have a stab at support for pause / resume
>>>> Dropped
>>>>
>>>> perf tools
>>>> All new patches
>>>>
>>>> Changes in RFC V3:
>>>>
>>>> coresight: Have a stab at support for pause / resume
>>>> 'mode' -> 'flags' so it at least compiles
>>>>
>>>> Changes in RFC V2:
>>>>
>>>> Use ->stop() / ->start() instead of ->pause_resume()
>>>> Move aux_start_paused bit into aux_output_cfg
>>>> Tighten up when Intel PT pause / resume is allowed
>>>> Add an example of how it might work for CoreSight
>>>
>>> Any more comments?
>>
>> I think the tools side looks good. The parsing changes match the
>> existing style. I wonder if it wouldn't be better to handle the valid
>> strings (pause, resume, etc.) in the lexer rather than a separate
>> parse function, but the pattern used matches the existing one. You can
>> have my Acked-by on the tools changes, although the subtleties of ARM
>> PMUs make me somewhat nervous in this regard.
>>
>> Thanks,
>> Ian
>
> Acked-by: James Clark <[email protected]>
>
> I will get round to adding the CoreSight support at some point. I
> checked the new parsing in this version and it seems to work ok.

Thanks James and Ian!