2023-11-23 12:19:34

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH RFC 0/3] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing

Hi

Hardware traces, such as instruction traces, can produce a vast amount of
trace data, so being able to reduce tracing to more specific circumstances
can be useful.

The ability to pause or resume tracing when another event happens, can do
that.

These patches add such a facilty and show how it would work for Intel
Processor Trace.

Maintainers of other AUX area tracing implementations are requested to
consider if this is something they might employ and then whether or not
the ABI would work for them.

Changes to perf tools are not fleshed out yet.


Adrian Hunter (3):
perf/core: Add aux_pause, aux_resume, aux_start_paused
perf/x86/intel/pt: Add support for pause_resume()
perf tools: Add support for AUX area pause_resume()

arch/x86/events/intel/pt.c | 12 ++++++
include/linux/perf_event.h | 9 +++++
include/uapi/linux/perf_event.h | 13 ++++++-
kernel/events/core.c | 65 +++++++++++++++++++++++++++++--
kernel/events/internal.h | 1 +
tools/include/uapi/linux/perf_event.h | 13 ++++++-
tools/perf/util/auxtrace.c | 4 ++
tools/perf/util/evsel.c | 9 +++++
tools/perf/util/evsel_config.h | 6 +++
tools/perf/util/parse-events.c | 33 ++++++++++++++++
tools/perf/util/parse-events.h | 3 ++
tools/perf/util/parse-events.l | 3 ++
tools/perf/util/perf_event_attr_fprintf.c | 3 ++
13 files changed, 167 insertions(+), 7 deletions(-)


Regards
Adrian


2023-11-23 12:19:49

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH RFC 2/3] perf/x86/intel/pt: Add support for pause_resume()

Prevent tracing to start if aux_paused.

Implement pause_resume() callback. When aux_paused, stop tracing. When
not aux_paused, only start tracing if it isn't currently meant to be
stopped.

Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/events/intel/pt.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 42a55794004a..aa883b64814a 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -418,6 +418,9 @@ static void pt_config_start(struct perf_event *event)
struct pt *pt = this_cpu_ptr(&pt_ctx);
u64 ctl = event->hw.config;

+ if (event->aux_paused)
+ return;
+
ctl |= RTIT_CTL_TRACEEN;
if (READ_ONCE(pt->vmx_on))
perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL);
@@ -1563,6 +1566,14 @@ EXPORT_SYMBOL_GPL(intel_pt_handle_vmx);
* PMU callbacks
*/

+static void pt_event_pause_resume(struct perf_event *event)
+{
+ if (event->aux_paused)
+ pt_config_stop(event);
+ else if (!event->hw.state)
+ pt_config_start(event);
+}
+
static void pt_event_start(struct perf_event *event, int mode)
{
struct hw_perf_event *hwc = &event->hw;
@@ -1798,6 +1809,7 @@ static __init int pt_init(void)
pt_pmu.pmu.del = pt_event_del;
pt_pmu.pmu.start = pt_event_start;
pt_pmu.pmu.stop = pt_event_stop;
+ pt_pmu.pmu.pause_resume = pt_event_pause_resume;
pt_pmu.pmu.snapshot_aux = pt_event_snapshot_aux;
pt_pmu.pmu.read = pt_event_read;
pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;
--
2.34.1

2023-11-23 12:19:57

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH RFC 1/3] perf/core: Add aux_pause, aux_resume, aux_start_paused

Hardware traces, such as instruction traces, can produce a vast amount of
trace data, so being able to reduce tracing to more specific circumstances
can be useful.

The ability to pause or resume tracing when another event happens, can do
that.

Add ability for an event to "pause" or "resume" AUX area tracing.

Add aux_pause bit to perf_event_attr to indicate that, if the event
happens, the associated AUX area tracing should be paused. Ditto
aux_resume. Do not allow aux_pause and aux_resume to be set together.

Add aux_start_paused bit to perf_event_attr to indicate to an AUX area
event that it should start in a "paused" state.

Add aux_paused to struct perf_event for AUX area events to keep track of
the "paused" state. aux_paused is initialized to aux_start_paused.

Add pause_resume() callback to struct pmu to perform a pause or resume.
Whether to pause or resume is indicated by aux_paused. Call pause_resume()
as needed, during __perf_event_output(). Add aux_in_pause_resume to
struct perf_buffer to prevent races with the NMI handler. Pause/resume in
NMI context will miss out if it coinicides with another pause/resume.

To use aux_pause or aux_resume, an event must be in a group with the AUX
area event as the group leader.

Example (requires Intel PT and tools patches also):

$ perf record --kcore -e '{intel_pt/aux-start-paused/k,syscalls:sys_enter_newuname/aux-resume/,syscalls:sys_exit_newuname/aux-pause/}' uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.041 MB perf.data ]
$ perf script --call-trace
uname 8866 [003] 1549.674345648: name: 0x7fff01491a90
uname 8866 [003] 1549.674347393: psb offs: 0
uname 8866 [003] 1549.674347393: cbr: 47 freq: 4705 MHz (168%)
uname 8866 [003] 1549.674347556: ([kernel.kallsyms]) debug_smp_processor_id
uname 8866 [003] 1549.674347556: ([kernel.kallsyms]) __x64_sys_newuname
uname 8866 [003] 1549.674347556: ([kernel.kallsyms]) down_read
uname 8866 [003] 1549.674347556: ([kernel.kallsyms]) __cond_resched
uname 8866 [003] 1549.674347764: ([kernel.kallsyms]) preempt_count_add
uname 8866 [003] 1549.674347764: ([kernel.kallsyms]) in_lock_functions
uname 8866 [003] 1549.674347764: ([kernel.kallsyms]) preempt_count_sub
uname 8866 [003] 1549.674347764: ([kernel.kallsyms]) up_read
uname 8866 [003] 1549.674347764: ([kernel.kallsyms]) preempt_count_add
uname 8866 [003] 1549.674348181: ([kernel.kallsyms]) in_lock_functions
uname 8866 [003] 1549.674348181: ([kernel.kallsyms]) preempt_count_sub
uname 8866 [003] 1549.674348181: ([kernel.kallsyms]) _copy_to_user
uname 8866 [003] 1549.674348181: ([kernel.kallsyms]) syscall_exit_to_user_mode
uname 8866 [003] 1549.674348181: ([kernel.kallsyms]) syscall_exit_work
uname 8866 [003] 1549.674348181: ([kernel.kallsyms]) perf_syscall_exit
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) debug_smp_processor_id
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) perf_trace_buf_alloc
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) perf_swevent_get_recursion_context
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) debug_smp_processor_id
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) debug_smp_processor_id
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) perf_tp_event
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) perf_trace_buf_update
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) tracing_gen_ctx_irq_test
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) perf_swevent_event
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) __perf_event_account_interrupt
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) __this_cpu_preempt_check
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) perf_event_output_forward
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) perf_event_aux_pause.part.0
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) ring_buffer_get
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) __rcu_read_lock
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) __rcu_read_unlock
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) pt_event_pause_resume
uname 8866 [003] 1549.674348389: ([kernel.kallsyms]) debug_smp_processor_id
uname 8866 [003] 1549.674348598: ([kernel.kallsyms]) native_write_msr
uname 8866 [003] 1549.674348598: ([kernel.kallsyms]) native_write_msr
uname 8866 [003] 1549.674348869: 0x0

Signed-off-by: Adrian Hunter <[email protected]>
---
include/linux/perf_event.h | 9 +++++
include/uapi/linux/perf_event.h | 13 ++++++-
kernel/events/core.c | 65 +++++++++++++++++++++++++++++++--
kernel/events/internal.h | 1 +
4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e85cd1c0eaf3..1e09d537ff77 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -540,6 +540,12 @@ struct pmu {
* Check period value for PERF_EVENT_IOC_PERIOD ioctl.
*/
int (*check_period) (struct perf_event *event, u64 value); /* optional */
+
+ /*
+ * Pause or resume tracing. Used for AUX area tracing. event->aux_paused
+ * indicates whether tracing should be paused, or otherwise resumed.
+ */
+ void (*pause_resume) (struct perf_event *event); /* optional */
};

enum perf_addr_filter_action_t {
@@ -797,6 +803,9 @@ struct perf_event {
/* for aux_output events */
struct perf_event *aux_event;

+ /* for AUX area events */
+ unsigned int aux_paused;
+
void (*destroy)(struct perf_event *);
struct rcu_head rcu_head;

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 39c6a250dd1b..1232e70adfba 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -456,7 +456,8 @@ struct perf_event_attr {
inherit_thread : 1, /* children only inherit if cloned with CLONE_THREAD */
remove_on_exec : 1, /* event is removed from task on exec */
sigtrap : 1, /* send synchronous SIGTRAP on event */
- __reserved_1 : 26;
+ aux_start_paused : 1, /* start AUX area tracing paused */
+ __reserved_1 : 25;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -507,7 +508,15 @@ struct perf_event_attr {
__u16 sample_max_stack;
__u16 __reserved_2;
__u32 aux_sample_size;
- __u32 __reserved_3;
+
+ union {
+ __u32 aux_output_cfg;
+ struct {
+ __u64 aux_pause : 1, /* on overflow, pause AUX area tracing */
+ aux_resume : 1, /* on overflow, resume AUX area tracing */
+ __reserved_3 : 30;
+ };
+ };

/*
* User provided data if sigtrap=1, passed back to user via
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4c72a41f11af..8ec72f9153b7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2060,7 +2060,13 @@ static void perf_put_aux_event(struct perf_event *event)

static bool perf_need_aux_event(struct perf_event *event)
{
- return !!event->attr.aux_output || !!event->attr.aux_sample_size;
+ return event->attr.aux_output || event->attr.aux_output_cfg ||
+ event->attr.aux_sample_size;
+}
+
+static bool has_aux_pause_resume(struct perf_event *event)
+{
+ return has_aux(event) && event->pmu->pause_resume;
}

static int perf_get_aux_event(struct perf_event *event,
@@ -2085,6 +2091,10 @@ static int perf_get_aux_event(struct perf_event *event,
!perf_aux_output_match(event, group_leader))
return 0;

+ if ((event->attr.aux_pause || event->attr.aux_resume) &&
+ !has_aux_pause_resume(group_leader))
+ return 0;
+
if (event->attr.aux_sample_size && !group_leader->pmu->snapshot_aux)
return 0;

@@ -7773,6 +7783,36 @@ void perf_prepare_header(struct perf_event_header *header,
WARN_ON_ONCE(header->size & 7);
}

+static void perf_event_aux_pause(struct perf_event *event, bool pause)
+{
+ struct perf_buffer *rb;
+ unsigned long flags;
+
+ if (WARN_ON_ONCE(!event))
+ return;
+
+ if ((pause && event->aux_paused) || (!pause && !event->aux_paused))
+ return;
+
+ rb = ring_buffer_get(event);
+ if (!rb)
+ return;
+
+ local_irq_save(flags);
+ /* Guard against NMI, NMI loses here */
+ if (READ_ONCE(rb->aux_in_pause_resume))
+ goto out_restore;
+ WRITE_ONCE(rb->aux_in_pause_resume, 1);
+ barrier();
+ event->aux_paused = pause;
+ event->pmu->pause_resume(event);
+ barrier();
+ WRITE_ONCE(rb->aux_in_pause_resume, 0);
+out_restore:
+ local_irq_restore(flags);
+ ring_buffer_put(rb);
+}
+
static __always_inline int
__perf_event_output(struct perf_event *event,
struct perf_sample_data *data,
@@ -7786,6 +7826,9 @@ __perf_event_output(struct perf_event *event,
struct perf_event_header header;
int err;

+ if (event->attr.aux_pause)
+ perf_event_aux_pause(event->aux_event, true);
+
/* protect the callchain buffers */
rcu_read_lock();

@@ -7802,6 +7845,10 @@ __perf_event_output(struct perf_event *event,

exit:
rcu_read_unlock();
+
+ if (event->attr.aux_resume)
+ perf_event_aux_pause(event->aux_event, false);
+
return err;
}

@@ -11941,10 +11988,22 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
}

if (event->attr.aux_output &&
- !(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT)) {
+ (!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) ||
+ event->attr.aux_pause || event->attr.aux_resume)) {
+ err = -EOPNOTSUPP;
+ goto err_pmu;
+ }
+
+ if (event->attr.aux_pause && event->attr.aux_resume) {
+ err = -EINVAL;
+ goto err_pmu;
+ }
+
+ if (event->attr.aux_start_paused && !has_aux_pause_resume(event)) {
err = -EOPNOTSUPP;
goto err_pmu;
}
+ event->aux_paused = event->attr.aux_start_paused;

if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
@@ -12741,7 +12800,7 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
* Grouping is not supported for kernel events, neither is 'AUX',
* make sure the caller's intentions are adjusted.
*/
- if (attr->aux_output)
+ if (attr->aux_output || attr->aux_output_cfg)
return ERR_PTR(-EINVAL);

event = perf_event_alloc(attr, cpu, task, NULL, NULL,
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 5150d5f84c03..3320f78117dc 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -51,6 +51,7 @@ struct perf_buffer {
void (*free_aux)(void *);
refcount_t aux_refcount;
int aux_in_sampling;
+ int aux_in_pause_resume;
void **aux_pages;
void *aux_priv;

--
2.34.1

2023-11-23 12:20:00

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH RFC 3/3] perf tools: Add support for AUX area pause_resume()

Add config terms aux-pause, aux-resume and aux-start-paused.

Still to do: validation, fallbacks for perf_event_open, documentation.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 13 +++++++--
tools/perf/util/auxtrace.c | 4 +++
tools/perf/util/evsel.c | 9 +++++++
tools/perf/util/evsel_config.h | 6 +++++
tools/perf/util/parse-events.c | 33 +++++++++++++++++++++++
tools/perf/util/parse-events.h | 3 +++
tools/perf/util/parse-events.l | 3 +++
tools/perf/util/perf_event_attr_fprintf.c | 3 +++
8 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 3a64499b0f5d..eebd289b22d7 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -460,7 +460,8 @@ struct perf_event_attr {
inherit_thread : 1, /* children only inherit if cloned with CLONE_THREAD */
remove_on_exec : 1, /* event is removed from task on exec */
sigtrap : 1, /* send synchronous SIGTRAP on event */
- __reserved_1 : 26;
+ aux_start_paused : 1, /* start AUX area tracing paused */
+ __reserved_1 : 25;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -511,7 +512,15 @@ struct perf_event_attr {
__u16 sample_max_stack;
__u16 __reserved_2;
__u32 aux_sample_size;
- __u32 __reserved_3;
+
+ union {
+ __u32 aux_output_cfg;
+ struct {
+ __u64 aux_pause : 1, /* on overflow, pause AUX area tracing */
+ aux_resume : 1, /* on overflow, resume AUX area tracing */
+ __reserved_3 : 30;
+ };
+ };

/*
* User provided data if sigtrap=1, passed back to user via
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index a0368202a746..4a7ca8b0d100 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -814,6 +814,10 @@ void auxtrace_regroup_aux_output(struct evlist *evlist)
if (evsel__is_aux_event(evsel))
aux_evsel = evsel;
term = evsel__get_config_term(evsel, AUX_OUTPUT);
+ if (!term)
+ term = evsel__get_config_term(evsel, AUX_PAUSE);
+ if (!term)
+ term = evsel__get_config_term(evsel, AUX_RESUME);
/* If possible, group with the AUX event */
if (term && aux_evsel)
evlist__regroup(evlist, aux_evsel, evsel);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a5da74e3a517..03553c104954 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1001,6 +1001,15 @@ static void evsel__apply_config_terms(struct evsel *evsel,
case EVSEL__CONFIG_TERM_AUX_OUTPUT:
attr->aux_output = term->val.aux_output ? 1 : 0;
break;
+ case EVSEL__CONFIG_TERM_AUX_PAUSE:
+ attr->aux_pause = term->val.aux_pause ? 1 : 0;
+ break;
+ case EVSEL__CONFIG_TERM_AUX_RESUME:
+ attr->aux_resume = term->val.aux_resume ? 1 : 0;
+ break;
+ case EVSEL__CONFIG_TERM_AUX_START_PAUSED:
+ attr->aux_start_paused = term->val.aux_start_paused ? 1 : 0;
+ break;
case EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE:
/* Already applied by auxtrace */
break;
diff --git a/tools/perf/util/evsel_config.h b/tools/perf/util/evsel_config.h
index aee6f808b512..85ad183b5637 100644
--- a/tools/perf/util/evsel_config.h
+++ b/tools/perf/util/evsel_config.h
@@ -25,6 +25,9 @@ enum evsel_term_type {
EVSEL__CONFIG_TERM_BRANCH,
EVSEL__CONFIG_TERM_PERCORE,
EVSEL__CONFIG_TERM_AUX_OUTPUT,
+ EVSEL__CONFIG_TERM_AUX_PAUSE,
+ EVSEL__CONFIG_TERM_AUX_RESUME,
+ EVSEL__CONFIG_TERM_AUX_START_PAUSED,
EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE,
EVSEL__CONFIG_TERM_CFG_CHG,
};
@@ -44,6 +47,9 @@ struct evsel_config_term {
unsigned long max_events;
bool percore;
bool aux_output;
+ bool aux_pause;
+ bool aux_resume;
+ bool aux_start_paused;
u32 aux_sample_size;
u64 cfg_chg;
char *str;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index aa2f5c6fc7fc..615b04d5fb30 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -768,6 +768,9 @@ static const char *config_term_name(enum parse_events__term_type term_type)
[PARSE_EVENTS__TERM_TYPE_DRV_CFG] = "driver-config",
[PARSE_EVENTS__TERM_TYPE_PERCORE] = "percore",
[PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT] = "aux-output",
+ [PARSE_EVENTS__TERM_TYPE_AUX_PAUSE] = "aux-pause",
+ [PARSE_EVENTS__TERM_TYPE_AUX_RESUME] = "aux-resume",
+ [PARSE_EVENTS__TERM_TYPE_AUX_START_PAUSED] = "aux-start-paused",
[PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE] = "aux-sample-size",
[PARSE_EVENTS__TERM_TYPE_METRIC_ID] = "metric-id",
[PARSE_EVENTS__TERM_TYPE_RAW] = "raw",
@@ -817,6 +820,9 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
case PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT:
+ case PARSE_EVENTS__TERM_TYPE_AUX_PAUSE:
+ case PARSE_EVENTS__TERM_TYPE_AUX_RESUME:
+ case PARSE_EVENTS__TERM_TYPE_AUX_START_PAUSED:
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
case PARSE_EVENTS__TERM_TYPE_RAW:
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
@@ -936,6 +942,15 @@ do { \
case PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT:
CHECK_TYPE_VAL(NUM);
break;
+ case PARSE_EVENTS__TERM_TYPE_AUX_PAUSE:
+ CHECK_TYPE_VAL(NUM);
+ break;
+ case PARSE_EVENTS__TERM_TYPE_AUX_RESUME:
+ CHECK_TYPE_VAL(NUM);
+ break;
+ case PARSE_EVENTS__TERM_TYPE_AUX_START_PAUSED:
+ CHECK_TYPE_VAL(NUM);
+ break;
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
CHECK_TYPE_VAL(NUM);
if (term->val.num > UINT_MAX) {
@@ -1036,6 +1051,9 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
case PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT:
+ case PARSE_EVENTS__TERM_TYPE_AUX_PAUSE:
+ case PARSE_EVENTS__TERM_TYPE_AUX_RESUME:
+ case PARSE_EVENTS__TERM_TYPE_AUX_START_PAUSED:
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
return config_term_common(attr, term, err);
case PARSE_EVENTS__TERM_TYPE_USER:
@@ -1170,6 +1188,18 @@ do { \
ADD_CONFIG_TERM_VAL(AUX_OUTPUT, aux_output,
term->val.num ? 1 : 0, term->weak);
break;
+ case PARSE_EVENTS__TERM_TYPE_AUX_PAUSE:
+ ADD_CONFIG_TERM_VAL(AUX_PAUSE, aux_pause,
+ term->val.num ? 1 : 0, term->weak);
+ break;
+ case PARSE_EVENTS__TERM_TYPE_AUX_RESUME:
+ ADD_CONFIG_TERM_VAL(AUX_RESUME, aux_resume,
+ term->val.num ? 1 : 0, term->weak);
+ break;
+ case PARSE_EVENTS__TERM_TYPE_AUX_START_PAUSED:
+ ADD_CONFIG_TERM_VAL(AUX_START_PAUSED, aux_start_paused,
+ term->val.num ? 1 : 0, term->weak);
+ break;
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
ADD_CONFIG_TERM_VAL(AUX_SAMPLE_SIZE, aux_sample_size,
term->val.num, term->weak);
@@ -1232,6 +1262,9 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
case PARSE_EVENTS__TERM_TYPE_PERCORE:
case PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT:
+ case PARSE_EVENTS__TERM_TYPE_AUX_PAUSE:
+ case PARSE_EVENTS__TERM_TYPE_AUX_RESUME:
+ case PARSE_EVENTS__TERM_TYPE_AUX_START_PAUSED:
case PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE:
case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
case PARSE_EVENTS__TERM_TYPE_RAW:
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 63c0a36a4bf1..ff0871385b50 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -74,6 +74,9 @@ enum parse_events__term_type {
PARSE_EVENTS__TERM_TYPE_DRV_CFG,
PARSE_EVENTS__TERM_TYPE_PERCORE,
PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT,
+ PARSE_EVENTS__TERM_TYPE_AUX_PAUSE,
+ PARSE_EVENTS__TERM_TYPE_AUX_RESUME,
+ PARSE_EVENTS__TERM_TYPE_AUX_START_PAUSED,
PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE,
PARSE_EVENTS__TERM_TYPE_METRIC_ID,
PARSE_EVENTS__TERM_TYPE_RAW,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index e86c45675e1d..56963013c3af 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -244,6 +244,9 @@ overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); }
no-overwrite { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); }
percore { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_PERCORE); }
aux-output { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
+aux-pause { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_PAUSE); }
+aux-resume { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_RESUME); }
+aux-start-paused { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_START_PAUSED); }
aux-sample-size { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
metric-id { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
cpu-cycles|cycles { return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 8f04d3b7f3ec..e6ba0ac73182 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -309,6 +309,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
PRINT_ATTRf(inherit_thread, p_unsigned);
PRINT_ATTRf(remove_on_exec, p_unsigned);
PRINT_ATTRf(sigtrap, p_unsigned);
+ PRINT_ATTRf(aux_start_paused, p_unsigned);

PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned, false);
PRINT_ATTRf(bp_type, p_unsigned);
@@ -323,6 +324,8 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
PRINT_ATTRf(sample_max_stack, p_unsigned);
PRINT_ATTRf(aux_sample_size, p_unsigned);
PRINT_ATTRf(sig_data, p_unsigned);
+ PRINT_ATTRf(aux_pause, p_unsigned);
+ PRINT_ATTRf(aux_resume, p_unsigned);

return ret;
}
--
2.34.1

2023-11-28 19:53:13

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH RFC 0/3] perf/core: Add ability for an event to "pause" or "resume" AUX area tracing

On Thu, Nov 23, 2023 at 4:19 AM Adrian Hunter <[email protected]> wrote:
>
> Hi
>
> Hardware traces, such as instruction traces, can produce a vast amount of
> trace data, so being able to reduce tracing to more specific circumstances
> can be useful.
>
> The ability to pause or resume tracing when another event happens, can do
> that.
>
> These patches add such a facilty and show how it would work for Intel
> Processor Trace.
>
> Maintainers of other AUX area tracing implementations are requested to
> consider if this is something they might employ and then whether or not
> the ABI would work for them.
>
> Changes to perf tools are not fleshed out yet.

This looks good to me but the perf tools parsing changes I'm not sure
on. It would be nice to have a test case, likely a shell test or
change to the intel-pt shell test, to show how they work.

Thanks,
Ian

> Adrian Hunter (3):
> perf/core: Add aux_pause, aux_resume, aux_start_paused
> perf/x86/intel/pt: Add support for pause_resume()
> perf tools: Add support for AUX area pause_resume()
>
> arch/x86/events/intel/pt.c | 12 ++++++
> include/linux/perf_event.h | 9 +++++
> include/uapi/linux/perf_event.h | 13 ++++++-
> kernel/events/core.c | 65 +++++++++++++++++++++++++++++--
> kernel/events/internal.h | 1 +
> tools/include/uapi/linux/perf_event.h | 13 ++++++-
> tools/perf/util/auxtrace.c | 4 ++
> tools/perf/util/evsel.c | 9 +++++
> tools/perf/util/evsel_config.h | 6 +++
> tools/perf/util/parse-events.c | 33 ++++++++++++++++
> tools/perf/util/parse-events.h | 3 ++
> tools/perf/util/parse-events.l | 3 ++
> tools/perf/util/perf_event_attr_fprintf.c | 3 ++
> 13 files changed, 167 insertions(+), 7 deletions(-)
>
>
> Regards
> Adrian

2023-11-29 10:35:16

by James Clark

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] perf/x86/intel/pt: Add support for pause_resume()



On 23/11/2023 12:18, Adrian Hunter wrote:
> Prevent tracing to start if aux_paused.
>
> Implement pause_resume() callback. When aux_paused, stop tracing. When
> not aux_paused, only start tracing if it isn't currently meant to be
> stopped.
>
> Signed-off-by: Adrian Hunter <[email protected]>
> ---
> arch/x86/events/intel/pt.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 42a55794004a..aa883b64814a 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -418,6 +418,9 @@ static void pt_config_start(struct perf_event *event)
> struct pt *pt = this_cpu_ptr(&pt_ctx);
> u64 ctl = event->hw.config;
>
> + if (event->aux_paused)
> + return;
> +
> ctl |= RTIT_CTL_TRACEEN;
> if (READ_ONCE(pt->vmx_on))
> perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL);
> @@ -1563,6 +1566,14 @@ EXPORT_SYMBOL_GPL(intel_pt_handle_vmx);
> * PMU callbacks
> */
>
> +static void pt_event_pause_resume(struct perf_event *event)
> +{
> + if (event->aux_paused)
> + pt_config_stop(event);
> + else if (!event->hw.state)
> + pt_config_start(event);
> +}

It seems like having a single pause/resume callback rather than separate
pause and resume ones pushes some of the event state management into the
individual drivers and would be prone to code duplication and divergent
behavior.

Would it be possible to move the conditions from here into the core code
and call separate functions instead?

> +
> static void pt_event_start(struct perf_event *event, int mode)
> {
> struct hw_perf_event *hwc = &event->hw;
> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
> pt_pmu.pmu.del = pt_event_del;
> pt_pmu.pmu.start = pt_event_start;
> pt_pmu.pmu.stop = pt_event_stop;
> + pt_pmu.pmu.pause_resume = pt_event_pause_resume;

The general idea seems ok to me. Is there a reason to not use the
existing start() stop() callbacks, rather than adding a new one?

I assume it's intended to be something like an optimisation where you
can turn it on and off without having to do the full setup, teardown and
emit an AUX record because you know the process being traced never gets
switched out?

Could you make it so that it works out of the box, with the option of
later optimisation if you do something like this (not here but something
like this in events/core.c):

/* Use specialised pause/resume if it exists, otherwise use more
* expensive start/stop.
*/
if (pmu->pause_resume)
pmu->pause_resume(...)
else
pmu->stop(...)


> pt_pmu.pmu.snapshot_aux = pt_event_snapshot_aux;
> pt_pmu.pmu.read = pt_event_read;
> pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;

2023-11-29 10:53:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 1/3] perf/core: Add aux_pause, aux_resume, aux_start_paused

On Thu, Nov 23, 2023 at 02:18:49PM +0200, Adrian Hunter wrote:

> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 39c6a250dd1b..1232e70adfba 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -456,7 +456,8 @@ struct perf_event_attr {
> inherit_thread : 1, /* children only inherit if cloned with CLONE_THREAD */
> remove_on_exec : 1, /* event is removed from task on exec */
> sigtrap : 1, /* send synchronous SIGTRAP on event */
> - __reserved_1 : 26;
> + aux_start_paused : 1, /* start AUX area tracing paused */
> + __reserved_1 : 25;
>
> union {
> __u32 wakeup_events; /* wakeup every n events */
> @@ -507,7 +508,15 @@ struct perf_event_attr {
> __u16 sample_max_stack;
> __u16 __reserved_2;
> __u32 aux_sample_size;
> - __u32 __reserved_3;
> +
> + union {
> + __u32 aux_output_cfg;
> + struct {
> + __u64 aux_pause : 1, /* on overflow, pause AUX area tracing */
> + aux_resume : 1, /* on overflow, resume AUX area tracing */
> + __reserved_3 : 30;
> + };
> + };
>
> /*
> * User provided data if sigtrap=1, passed back to user via

So since you're adding a aux_output_cfg word, could you not also put the
start_paused thing in there?

2023-11-29 10:59:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] perf/x86/intel/pt: Add support for pause_resume()

On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
> On 23/11/2023 12:18, Adrian Hunter wrote:

> > +static void pt_event_pause_resume(struct perf_event *event)
> > +{
> > + if (event->aux_paused)
> > + pt_config_stop(event);
> > + else if (!event->hw.state)
> > + pt_config_start(event);
> > +}
>
> It seems like having a single pause/resume callback rather than separate
> pause and resume ones pushes some of the event state management into the
> individual drivers and would be prone to code duplication and divergent
> behavior.
>
> Would it be possible to move the conditions from here into the core code
> and call separate functions instead?
>
> > +
> > static void pt_event_start(struct perf_event *event, int mode)
> > {
> > struct hw_perf_event *hwc = &event->hw;
> > @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
> > pt_pmu.pmu.del = pt_event_del;
> > pt_pmu.pmu.start = pt_event_start;
> > pt_pmu.pmu.stop = pt_event_stop;
> > + pt_pmu.pmu.pause_resume = pt_event_pause_resume;
>
> The general idea seems ok to me. Is there a reason to not use the
> existing start() stop() callbacks, rather than adding a new one?
>
> I assume it's intended to be something like an optimisation where you
> can turn it on and off without having to do the full setup, teardown and
> emit an AUX record because you know the process being traced never gets
> switched out?

So the actual scheduling uses ->add() / ->del(), the ->start() /
->stop() methods are something that can be used after ->add() and before
->del() to 'temporarily' pause things.

Pretty much exactly what is required here I think. We currently use this
for PMI throttling and adaptive frequency stuff, but there is no reason
it could not also be used for this.

As is, we don't track the paused state across ->del() / ->add(), but
perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
bits to manage things.


2023-11-29 11:17:19

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] perf/x86/intel/pt: Add support for pause_resume()

On 29/11/23 12:58, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
>> On 23/11/2023 12:18, Adrian Hunter wrote:
>
>>> +static void pt_event_pause_resume(struct perf_event *event)
>>> +{
>>> + if (event->aux_paused)
>>> + pt_config_stop(event);
>>> + else if (!event->hw.state)
>>> + pt_config_start(event);
>>> +}
>>
>> It seems like having a single pause/resume callback rather than separate
>> pause and resume ones pushes some of the event state management into the
>> individual drivers and would be prone to code duplication and divergent
>> behavior.
>>
>> Would it be possible to move the conditions from here into the core code
>> and call separate functions instead?
>>
>>> +
>>> static void pt_event_start(struct perf_event *event, int mode)
>>> {
>>> struct hw_perf_event *hwc = &event->hw;
>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
>>> pt_pmu.pmu.del = pt_event_del;
>>> pt_pmu.pmu.start = pt_event_start;
>>> pt_pmu.pmu.stop = pt_event_stop;
>>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume;
>>
>> The general idea seems ok to me. Is there a reason to not use the
>> existing start() stop() callbacks, rather than adding a new one?
>>
>> I assume it's intended to be something like an optimisation where you
>> can turn it on and off without having to do the full setup, teardown and
>> emit an AUX record because you know the process being traced never gets
>> switched out?
>
> So the actual scheduling uses ->add() / ->del(), the ->start() /
> ->stop() methods are something that can be used after ->add() and before
> ->del() to 'temporarily' pause things.
>
> Pretty much exactly what is required here I think. We currently use this
> for PMI throttling and adaptive frequency stuff, but there is no reason
> it could not also be used for this.
>
> As is, we don't track the paused state across ->del() / ->add(), but
> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
> bits to manage things.
>
>

I am not sure stop / start play nice with NMI's from other events e.g.

PMC NMI wants to pause or resume AUX but what if AUX event is currently
being processed in ->stop() or ->start()? Or maybe that can't happen?

2023-11-29 12:24:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] perf/x86/intel/pt: Add support for pause_resume()

On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote:
> On 29/11/23 12:58, Peter Zijlstra wrote:
> > On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
> >> On 23/11/2023 12:18, Adrian Hunter wrote:
> >
> >>> +static void pt_event_pause_resume(struct perf_event *event)
> >>> +{
> >>> + if (event->aux_paused)
> >>> + pt_config_stop(event);
> >>> + else if (!event->hw.state)
> >>> + pt_config_start(event);
> >>> +}
> >>
> >> It seems like having a single pause/resume callback rather than separate
> >> pause and resume ones pushes some of the event state management into the
> >> individual drivers and would be prone to code duplication and divergent
> >> behavior.
> >>
> >> Would it be possible to move the conditions from here into the core code
> >> and call separate functions instead?
> >>
> >>> +
> >>> static void pt_event_start(struct perf_event *event, int mode)
> >>> {
> >>> struct hw_perf_event *hwc = &event->hw;
> >>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
> >>> pt_pmu.pmu.del = pt_event_del;
> >>> pt_pmu.pmu.start = pt_event_start;
> >>> pt_pmu.pmu.stop = pt_event_stop;
> >>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume;
> >>
> >> The general idea seems ok to me. Is there a reason to not use the
> >> existing start() stop() callbacks, rather than adding a new one?
> >>
> >> I assume it's intended to be something like an optimisation where you
> >> can turn it on and off without having to do the full setup, teardown and
> >> emit an AUX record because you know the process being traced never gets
> >> switched out?
> >
> > So the actual scheduling uses ->add() / ->del(), the ->start() /
> > ->stop() methods are something that can be used after ->add() and before
> > ->del() to 'temporarily' pause things.
> >
> > Pretty much exactly what is required here I think. We currently use this
> > for PMI throttling and adaptive frequency stuff, but there is no reason
> > it could not also be used for this.
> >
> > As is, we don't track the paused state across ->del() / ->add(), but
> > perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
> > bits to manage things.
> >
> >
>
> I am not sure stop / start play nice with NMI's from other events e.g.
>
> PMC NMI wants to pause or resume AUX but what if AUX event is currently
> being processed in ->stop() or ->start()? Or maybe that can't happen?

I think that can happen, and pt_event_stop() can actually handle some of
that, while your pause_resume() thing, which uses pt_config_stop() does
not.

But yes, I think that if you add pt_event_{stop,start}() calls from
*other* events their PMI, then you get to deal with more 'fun'.

Something like:

perf_addr_filters_adjust()
__perf_addr_filters_adjust()
perf_event_stop()
__perf_event_stop()
event->pmu->stop()
<NMI>
...
perf_event_overflow()
pt_event->pmu->stop()
</NMI>
event->pmu->start() // whoopsie!

Should now be possible.

I think what you want to do is rename pt->handle_nmi into pt->stop_count
and make it a counter, then ->stop() increments it, and ->start()
decrements it and everybody ensures the thing doesn't get restart while
!0 etc..

I suspect you need to guard the generic part of this feature with a new
PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once
they've audited things.

James, does that work for you?

2023-11-30 10:07:36

by James Clark

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] perf/x86/intel/pt: Add support for pause_resume()



On 29/11/2023 12:23, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote:
>> On 29/11/23 12:58, Peter Zijlstra wrote:
>>> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
>>>> On 23/11/2023 12:18, Adrian Hunter wrote:
>>>
>>>>> +static void pt_event_pause_resume(struct perf_event *event)
>>>>> +{
>>>>> + if (event->aux_paused)
>>>>> + pt_config_stop(event);
>>>>> + else if (!event->hw.state)
>>>>> + pt_config_start(event);
>>>>> +}
>>>>
>>>> It seems like having a single pause/resume callback rather than separate
>>>> pause and resume ones pushes some of the event state management into the
>>>> individual drivers and would be prone to code duplication and divergent
>>>> behavior.
>>>>
>>>> Would it be possible to move the conditions from here into the core code
>>>> and call separate functions instead?
>>>>
>>>>> +
>>>>> static void pt_event_start(struct perf_event *event, int mode)
>>>>> {
>>>>> struct hw_perf_event *hwc = &event->hw;
>>>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
>>>>> pt_pmu.pmu.del = pt_event_del;
>>>>> pt_pmu.pmu.start = pt_event_start;
>>>>> pt_pmu.pmu.stop = pt_event_stop;
>>>>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume;
>>>>
>>>> The general idea seems ok to me. Is there a reason to not use the
>>>> existing start() stop() callbacks, rather than adding a new one?
>>>>
>>>> I assume it's intended to be something like an optimisation where you
>>>> can turn it on and off without having to do the full setup, teardown and
>>>> emit an AUX record because you know the process being traced never gets
>>>> switched out?
>>>
>>> So the actual scheduling uses ->add() / ->del(), the ->start() /
>>> ->stop() methods are something that can be used after ->add() and before
>>> ->del() to 'temporarily' pause things.
>>>
>>> Pretty much exactly what is required here I think. We currently use this
>>> for PMI throttling and adaptive frequency stuff, but there is no reason
>>> it could not also be used for this.
>>>
>>> As is, we don't track the paused state across ->del() / ->add(), but
>>> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
>>> bits to manage things.
>>>
>>>
>>
>> I am not sure stop / start play nice with NMI's from other events e.g.
>>
>> PMC NMI wants to pause or resume AUX but what if AUX event is currently
>> being processed in ->stop() or ->start()? Or maybe that can't happen?
>
> I think that can happen, and pt_event_stop() can actually handle some of
> that, while your pause_resume() thing, which uses pt_config_stop() does
> not.
>
> But yes, I think that if you add pt_event_{stop,start}() calls from
> *other* events their PMI, then you get to deal with more 'fun'.
>
> Something like:
>
> perf_addr_filters_adjust()
> __perf_addr_filters_adjust()
> perf_event_stop()
> __perf_event_stop()
> event->pmu->stop()
> <NMI>
> ...
> perf_event_overflow()
> pt_event->pmu->stop()
> </NMI>
> event->pmu->start() // whoopsie!
>
> Should now be possible.
>
> I think what you want to do is rename pt->handle_nmi into pt->stop_count
> and make it a counter, then ->stop() increments it, and ->start()
> decrements it and everybody ensures the thing doesn't get restart while
> !0 etc..
>
> I suspect you need to guard the generic part of this feature with a new
> PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once
> they've audited things.
>
> James, does that work for you?
>

Yep I think that would work.

If I understand it with the stop_count counter, to be able to support
the new CAP, every device would basically have to implement a similar
counter?

Would it be possible to do that reference counting on the outside of
start() and stop() in the core code? So that a device only ever sees one
call to start and one call to stop and doesn't need to do any extra
tracking?

2023-12-05 05:36:56

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] perf/x86/intel/pt: Add support for pause_resume()

On 30/11/23 12:07, James Clark wrote:
>
>
> On 29/11/2023 12:23, Peter Zijlstra wrote:
>> On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote:
>>> On 29/11/23 12:58, Peter Zijlstra wrote:
>>>> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
>>>>> On 23/11/2023 12:18, Adrian Hunter wrote:
>>>>
>>>>>> +static void pt_event_pause_resume(struct perf_event *event)
>>>>>> +{
>>>>>> + if (event->aux_paused)
>>>>>> + pt_config_stop(event);
>>>>>> + else if (!event->hw.state)
>>>>>> + pt_config_start(event);
>>>>>> +}
>>>>>
>>>>> It seems like having a single pause/resume callback rather than separate
>>>>> pause and resume ones pushes some of the event state management into the
>>>>> individual drivers and would be prone to code duplication and divergent
>>>>> behavior.
>>>>>
>>>>> Would it be possible to move the conditions from here into the core code
>>>>> and call separate functions instead?
>>>>>
>>>>>> +
>>>>>> static void pt_event_start(struct perf_event *event, int mode)
>>>>>> {
>>>>>> struct hw_perf_event *hwc = &event->hw;
>>>>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
>>>>>> pt_pmu.pmu.del = pt_event_del;
>>>>>> pt_pmu.pmu.start = pt_event_start;
>>>>>> pt_pmu.pmu.stop = pt_event_stop;
>>>>>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume;
>>>>>
>>>>> The general idea seems ok to me. Is there a reason to not use the
>>>>> existing start() stop() callbacks, rather than adding a new one?
>>>>>
>>>>> I assume it's intended to be something like an optimisation where you
>>>>> can turn it on and off without having to do the full setup, teardown and
>>>>> emit an AUX record because you know the process being traced never gets
>>>>> switched out?
>>>>
>>>> So the actual scheduling uses ->add() / ->del(), the ->start() /
>>>> ->stop() methods are something that can be used after ->add() and before
>>>> ->del() to 'temporarily' pause things.
>>>>
>>>> Pretty much exactly what is required here I think. We currently use this
>>>> for PMI throttling and adaptive frequency stuff, but there is no reason
>>>> it could not also be used for this.
>>>>
>>>> As is, we don't track the paused state across ->del() / ->add(), but
>>>> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
>>>> bits to manage things.
>>>>
>>>>
>>>
>>> I am not sure stop / start play nice with NMI's from other events e.g.
>>>
>>> PMC NMI wants to pause or resume AUX but what if AUX event is currently
>>> being processed in ->stop() or ->start()? Or maybe that can't happen?
>>
>> I think that can happen, and pt_event_stop() can actually handle some of
>> that, while your pause_resume() thing, which uses pt_config_stop() does
>> not.
>>
>> But yes, I think that if you add pt_event_{stop,start}() calls from
>> *other* events their PMI, then you get to deal with more 'fun'.
>>
>> Something like:
>>
>> perf_addr_filters_adjust()
>> __perf_addr_filters_adjust()
>> perf_event_stop()
>> __perf_event_stop()
>> event->pmu->stop()
>> <NMI>
>> ...
>> perf_event_overflow()
>> pt_event->pmu->stop()
>> </NMI>
>> event->pmu->start() // whoopsie!
>>
>> Should now be possible.
>>
>> I think what you want to do is rename pt->handle_nmi into pt->stop_count
>> and make it a counter, then ->stop() increments it, and ->start()
>> decrements it and everybody ensures the thing doesn't get restart while
>> !0 etc..
>>
>> I suspect you need to guard the generic part of this feature with a new
>> PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once
>> they've audited things.
>>
>> James, does that work for you?
>>
>
> Yep I think that would work.
>
> If I understand it with the stop_count counter, to be able to support
> the new CAP, every device would basically have to implement a similar
> counter?
>
> Would it be possible to do that reference counting on the outside of
> start() and stop() in the core code? So that a device only ever sees one
> call to start and one call to stop and doesn't need to do any extra
> tracking?

stop_counter does not seem right for pauses and resumes because they
should not accumulate. We want:

pause stop
pause
resume start
resume

not:

pause stop
pause
resume
resume start

Also stop_counter has issues like:

stop_counter 0 -> 1
<NMI>
stop_counter 1 -> 0
start()
</NMI>
stop() <- stop_counter is now out of sync

or:

stop_counter 1 -> 0
<NMI>
stop_counter 0 -> 1
stop()
</NMI>
start() <- stop_counter is now out of sync

Also Intel PT implementation uses low-level start / stop
for pause / resume, which can be made not to conflict with
regular start / stop because they only toggle TRACEEN bit.