2018-01-15 15:00:38

by Alexander Shishkin

Subject: [PATCH] perf: Allow suppressing AUX records

It has been pointed out to me many times that it is useful to be able
to switch off AUX records to save the bandwidth for records that actually
matter, for example, in AUX overwrite mode.

The usefulness of PERF_RECORD_AUX is in some of its flags, like the
TRUNCATED flag that tells the decoder where exactly gaps in the trace are.
The OVERWRITE flag, on the other hand, will be set on every single record
in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
generated on every target task's sched_out, which over time adds up to
a lot of useless information.

In case the existing userspace depends on AUX records in the overwrite
mode, we preserve the original behavior and add an opt-in for the new
behavior, wherein the 'useless' records get suppressed.

This patch adds an attribute bit to the described effect.

Signed-off-by: Alexander Shishkin <[email protected]>
Cc: Markus Metzger <[email protected]>
Cc: Adrian Hunter <[email protected]>
---
include/uapi/linux/perf_event.h | 3 ++-
kernel/events/core.c | 5 +++++
kernel/events/ring_buffer.c | 13 +++++++++++--
3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c77c9a2ebbbb..d7a981130561 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -370,7 +370,8 @@ struct perf_event_attr {
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
 				namespaces     :  1, /* include namespaces data */
-				__reserved_1   : 35;
+				suppress_aux   :  1, /* don't generate PERF_RECORD_AUX */
+				__reserved_1   : 34;

 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4e1a1bf8d867..6245a88c2bda 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10012,6 +10012,11 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_context;
 	}

+	if (attr.suppress_aux && !pmu->setup_aux) {
+		err = -EINVAL;
+		goto err_context;
+	}
+
 	/*
 	 * Look up the group leader (we will attach this event to it):
 	 */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 141aa2ca8728..381f080e6409 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -426,6 +426,12 @@ static bool __always_inline rb_need_aux_wakeup(struct ring_buffer *rb)
 	return false;
 }

+/*
+ * These flags won't generate a PERF_RECORD_AUX on their own if
+ * attr::suppress_aux is set.
+ */
+#define SUPPRESSABLE_FLAGS	PERF_AUX_FLAG_OVERWRITE
+
 /*
  * Commit the data written by hardware into the ring buffer by adjusting
  * aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
@@ -460,8 +466,11 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
 		 * Only send RECORD_AUX if we have something useful to communicate
 		 */

-		perf_event_aux_event(handle->event, aux_head, size,
-				     handle->aux_flags);
+		if (!handle->event->attr.suppress_aux ||
+		    (handle->aux_flags & ~(u64)SUPPRESSABLE_FLAGS)) {
+			perf_event_aux_event(handle->event, aux_head, size,
+					     handle->aux_flags);
+		}
 	}

 	rb->user_page->aux_head = rb->aux_head;
--
2.15.1
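
For readers following along, the condition this version adds to perf_aux_output_end() can be modelled in isolation. What follows is only a sketch: the MODEL_* names and v1_emits_aux_record() are hypothetical stand-ins and not kernel symbols, and the flag values are assumed to mirror PERF_AUX_FLAG_TRUNCATED and PERF_AUX_FLAG_OVERWRITE from the uapi header. It covers just the new inner test, not the surrounding code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, assumed to mirror the uapi flag values. */
#define MODEL_AUX_FLAG_TRUNCATED	0x01ULL
#define MODEL_AUX_FLAG_OVERWRITE	0x02ULL

/* Same role as SUPPRESSABLE_FLAGS in the patch. */
#define MODEL_SUPPRESSABLE_FLAGS	MODEL_AUX_FLAG_OVERWRITE

/*
 * Model of the test added by the patch: without the opt-in bit every
 * record goes out as before; with it, a record goes out only if some
 * flag other than the suppressable ones is set.
 */
static bool v1_emits_aux_record(bool suppress_aux, uint64_t aux_flags)
{
	return !suppress_aux ||
	       (aux_flags & ~(uint64_t)MODEL_SUPPRESSABLE_FLAGS) != 0;
}
```

Under this model an OVERWRITE-only record is dropped only when suppress_aux is set, while OVERWRITE combined with TRUNCATED still gets through, which matches the opt-in behaviour described in the changelog.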


2018-03-29 11:55:56

by Peter Zijlstra

Subject: Re: [PATCH] perf: Allow suppressing AUX records

On Mon, Jan 15, 2018 at 05:00:20PM +0200, Alexander Shishkin wrote:
> It has been pointed out to me many times that it is useful to be able
> to switch off AUX records to save the bandwidth for records that actually
> matter, for example, in AUX overwrite mode.
>
> The usefulness of PERF_RECORD_AUX is in some of its flags, like the
> TRUNCATED flag that tells the decoder where exactly gaps in the trace are.
> The OVERWRITE flag, on the other hand, will be set on every single record
> in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
> generated on every target task's sched_out, which over time adds up to
> a lot of useless information.
>
> In case the existing userspace depends on AUX records in the overwrite
> mode, we preserve the original behavior and add an opt-in for the new
> behavior, wherein the 'useless' records get suppressed.
>
> This patch adds an attribute bit to the described effect.
>
> Signed-off-by: Alexander Shishkin <[email protected]>
> Cc: Markus Metzger <[email protected]>
> Cc: Adrian Hunter <[email protected]>
> ---
> include/uapi/linux/perf_event.h | 3 ++-
> kernel/events/core.c | 5 +++++
> kernel/events/ring_buffer.c | 13 +++++++++++--
> 3 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index c77c9a2ebbbb..d7a981130561 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -370,7 +370,8 @@ struct perf_event_attr {
> context_switch : 1, /* context switch data */
> write_backward : 1, /* Write ring buffer from end to beginning */
> namespaces : 1, /* include namespaces data */
> - __reserved_1 : 35;
> + suppress_aux : 1, /* don't generate PERF_RECORD_AUX */
> + __reserved_1 : 34;
>
> union {
> __u32 wakeup_events; /* wakeup every n events */

So I'm basically fine with this patch, however I wonder if we really
need this suppress flag and can't just unconditionally drop these
events.

Ash said that as far as he knows no Intel-PT user actually relies on it;
Will, is there anything on ARM that is known to rely on them?

In any case, tentative ACK on this, unless we want to be brave and forgo
this flag.

Ingo, any opinions?


> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4e1a1bf8d867..6245a88c2bda 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10012,6 +10012,11 @@ SYSCALL_DEFINE5(perf_event_open,
> goto err_context;
> }
>
> + if (attr.suppress_aux && !pmu->setup_aux) {
> + err = -EINVAL;
> + goto err_context;
> + }
> +
> /*
> * Look up the group leader (we will attach this event to it):
> */
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 141aa2ca8728..381f080e6409 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -426,6 +426,12 @@ static bool __always_inline rb_need_aux_wakeup(struct ring_buffer *rb)
> return false;
> }
>
> +/*
> + * These flags won't generate a PERF_RECORD_AUX on their own if
> + * attr::suppress_aux is set.
> + */
> +#define SUPPRESSABLE_FLAGS PERF_AUX_FLAG_OVERWRITE
> +
> /*
> * Commit the data written by hardware into the ring buffer by adjusting
> * aux_head and posting a PERF_RECORD_AUX into the perf buffer. It is the
> @@ -460,8 +466,11 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
> * Only send RECORD_AUX if we have something useful to communicate
> */
>
> - perf_event_aux_event(handle->event, aux_head, size,
> - handle->aux_flags);
> + if (!handle->event->attr.suppress_aux ||
> + (handle->aux_flags & ~(u64)SUPPRESSABLE_FLAGS)) {
> + perf_event_aux_event(handle->event, aux_head, size,
> + handle->aux_flags);
> + }
> }
>
> rb->user_page->aux_head = rb->aux_head;
> --
> 2.15.1
>

2018-03-31 09:37:16

by Ingo Molnar

Subject: Re: [PATCH] perf: Allow suppressing AUX records


* Peter Zijlstra <[email protected]> wrote:

> On Mon, Jan 15, 2018 at 05:00:20PM +0200, Alexander Shishkin wrote:
> > It has been pointed out to me many times that it is useful to be able
> > to switch off AUX records to save the bandwidth for records that actually
> > matter, for example, in AUX overwrite mode.
> >
> > The usefulness of PERF_RECORD_AUX is in some of its flags, like the
> > TRUNCATED flag that tells the decoder where exactly gaps in the trace are.
> > The OVERWRITE flag, on the other hand, will be set on every single record
> > in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
> > generated on every target task's sched_out, which over time adds up to
> > a lot of useless information.
> >
> > In case the existing userspace depends on AUX records in the overwrite
> > mode, we preserve the original behavior and add an opt-in for the new
> > behavior, wherein the 'useless' records get suppressed.
> >
> > This patch adds an attribute bit to the described effect.
> >
> > Signed-off-by: Alexander Shishkin <[email protected]>
> > Cc: Markus Metzger <[email protected]>
> > Cc: Adrian Hunter <[email protected]>
> > ---
> > include/uapi/linux/perf_event.h | 3 ++-
> > kernel/events/core.c | 5 +++++
> > kernel/events/ring_buffer.c | 13 +++++++++++--
> > 3 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index c77c9a2ebbbb..d7a981130561 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -370,7 +370,8 @@ struct perf_event_attr {
> > context_switch : 1, /* context switch data */
> > write_backward : 1, /* Write ring buffer from end to beginning */
> > namespaces : 1, /* include namespaces data */
> > - __reserved_1 : 35;
> > + suppress_aux : 1, /* don't generate PERF_RECORD_AUX */
> > + __reserved_1 : 34;
> >
> > union {
> > __u32 wakeup_events; /* wakeup every n events */
>
> So I'm basically fine with this patch, however I wonder if we really
> need this suppress flag and can't just unconditionally drop these
> events.
>
> Ash said that as far as he knows no Intel-PT user actually relies on it;
> Will, is there anything on ARM that is known to rely on them?
>
> In any case, tentative ACK on this, unless we want to be brave and forgo
> this flag.
>
> Ingo, any opinions?

Yeah, I'd suggest we just suppress those records and wait for complaints - let's
not complicate the ABI if not necessary?

Thanks,

Ingo

2018-04-03 17:33:32

by Will Deacon

Subject: Re: [PATCH] perf: Allow suppressing AUX records

On Sat, Mar 31, 2018 at 11:35:46AM +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
> > On Mon, Jan 15, 2018 at 05:00:20PM +0200, Alexander Shishkin wrote:
> > > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > > index c77c9a2ebbbb..d7a981130561 100644
> > > --- a/include/uapi/linux/perf_event.h
> > > +++ b/include/uapi/linux/perf_event.h
> > > @@ -370,7 +370,8 @@ struct perf_event_attr {
> > > context_switch : 1, /* context switch data */
> > > write_backward : 1, /* Write ring buffer from end to beginning */
> > > namespaces : 1, /* include namespaces data */
> > > - __reserved_1 : 35;
> > > + suppress_aux : 1, /* don't generate PERF_RECORD_AUX */
> > > + __reserved_1 : 34;
> > >
> > > union {
> > > __u32 wakeup_events; /* wakeup every n events */
> >
> > So I'm basically fine with this patch, however I wonder if we really
> > need this suppress flag and can't just unconditionally drop these
> > events.
> >
> > > Ash said that as far as he knows no Intel-PT user actually relies on it;
> > > Will, is there anything on ARM that is known to rely on them?
> > >
> > > In any case, tentative ACK on this, unless we want to be brave and forgo
> > > this flag.
> >
> > Ingo, any opinions?
>
> Yeah, I'd suggest we just suppress those records and wait for complaints - let's
> not complicate the ABI if not necessary?

Works for me. We've not had SPE support in mainline perf for very long and
the availability of hardware is extremely limited at the moment, so I don't
anticipate any ABI implications on the arm64 side.

Cheers,

Will

2018-04-04 14:55:13

by Alexander Shishkin

Subject: [PATCH v2] perf: Suppress AUX/OVERWRITE records

It has been pointed out to me many times that it is useful to be able
to switch off AUX records to save the bandwidth for records that actually
matter, for example, in AUX overwrite mode.

The usefulness of PERF_RECORD_AUX is in some of its flags, like the
TRUNCATED flag that tells the decoder where exactly gaps in the trace are.
The OVERWRITE flag, on the other hand, will be set on every single record
in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
generated on every target task's sched_out, which over time adds up to
a lot of useless information.

If any folks out there have userspace that depends on a constant stream of
OVERWRITE records for a good reason, they'll have to let us know.

Signed-off-by: Alexander Shishkin <[email protected]>
Cc: Markus Metzger <[email protected]>
Cc: Adrian Hunter <[email protected]>
---
kernel/events/ring_buffer.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 6c6b3c48db71..c4edd8fbc5d9 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -458,10 +458,20 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
 	if (size || handle->aux_flags) {
 		/*
 		 * Only send RECORD_AUX if we have something useful to communicate
+		 *
+		 * Note: the OVERWRITE records by themselves are not considered
+		 * useful, as they don't communicate any *new* information,
+		 * aside from the short-lived offset, that becomes history at
+		 * the next event sched-in and therefore isn't useful.
+		 * The userspace that needs to copy out AUX data in overwrite
+		 * mode should know to use user_page::aux_head for the actual
+		 * offset. So, from now on we don't output AUX records that
+		 * have *only* OVERWRITE flag set.
 		 */

-		perf_event_aux_event(handle->event, aux_head, size,
-				     handle->aux_flags);
+		if (handle->aux_flags & ~(u64)PERF_AUX_FLAG_OVERWRITE)
+			perf_event_aux_event(handle->event, aux_head, size,
+					     handle->aux_flags);
 	}

 	rb->user_page->aux_head = rb->aux_head;
--
2.16.3
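
As a reading aid, the two nested conditions in this hunk can be modelled as a small standalone predicate. This is a sketch under stated assumptions, not kernel code: the MODEL_* names and v2_emits_aux_record() are hypothetical, and the flag values are assumed to mirror PERF_AUX_FLAG_TRUNCATED and PERF_AUX_FLAG_OVERWRITE from the uapi header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, assumed to mirror the uapi flag values. */
#define MODEL_AUX_FLAG_TRUNCATED	0x01ULL
#define MODEL_AUX_FLAG_OVERWRITE	0x02ULL

/*
 * Model of the hunk above: the outer test is the pre-existing
 * "size || aux_flags" check, the inner test is the one this patch adds.
 */
static bool v2_emits_aux_record(uint64_t size, uint64_t aux_flags)
{
	if (size || aux_flags) {
		/* Only flags other than OVERWRITE count as useful. */
		if (aux_flags & ~(uint64_t)MODEL_AUX_FLAG_OVERWRITE)
			return true;
	}
	return false;
}
```

In this model a record carrying only OVERWRITE is dropped regardless of size, and any record with TRUNCATED set is still emitted. Note that the inner test looks at the flags alone, so as written a call with a non-zero size and no flags at all also yields no record.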


2018-05-04 12:11:45

by Alexander Shishkin

Subject: Re: [PATCH v2] perf: Suppress AUX/OVERWRITE records

On Wed, Apr 04, 2018 at 05:53:23PM +0300, Alexander Shishkin wrote:
> It has been pointed out to me many times that it is useful to be able
> to switch off AUX records to save the bandwidth for records that actually
> matter, for example, in AUX overwrite mode.
>
> The usefulness of PERF_RECORD_AUX is in some of its flags, like the
> TRUNCATED flag that tells the decoder where exactly gaps in the trace are.
> The OVERWRITE flag, on the other hand, will be set on every single record
> in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
> generated on every target task's sched_out, which over time adds up to
> a lot of useless information.
>
> If any folks out there have userspace that depends on a constant stream of
> OVERWRITE records for a good reason, they'll have to let us know.
>
> Signed-off-by: Alexander Shishkin <[email protected]>
> Cc: Markus Metzger <[email protected]>
> Cc: Adrian Hunter <[email protected]>

This one seems to be slipping through the cracks.

Cheers,
--
Alex

2018-05-04 15:36:41

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v2] perf: Suppress AUX/OVERWRITE records

Em Fri, May 04, 2018 at 03:09:59PM +0300, Alexander Shishkin escreveu:
> On Wed, Apr 04, 2018 at 05:53:23PM +0300, Alexander Shishkin wrote:
> > It has been pointed out to me many times that it is useful to be able
> > to switch off AUX records to save the bandwidth for records that actually
> > matter, for example, in AUX overwrite mode.
> >
> > The usefulness of PERF_RECORD_AUX is in some of its flags, like the
> > TRUNCATED flag that tells the decoder where exactly gaps in the trace are.
> > The OVERWRITE flag, on the other hand, will be set on every single record
> > in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
> > generated on every target task's sched_out, which over time adds up to
> > a lot of useless information.
> >
> > If any folks out there have userspace that depends on a constant stream of
> > OVERWRITE records for a good reason, they'll have to let us know.
> >
> > Signed-off-by: Alexander Shishkin <[email protected]>
> > Cc: Markus Metzger <[email protected]>
> > Cc: Adrian Hunter <[email protected]>
>
> This one seems to be slipping through the cracks.

So, did you get an Acked-by or Tested-by from anyone?


- Arnaldo

2018-05-04 15:37:15

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v2] perf: Suppress AUX/OVERWRITE records

Em Fri, May 04, 2018 at 12:35:34PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, May 04, 2018 at 03:09:59PM +0300, Alexander Shishkin escreveu:
> > On Wed, Apr 04, 2018 at 05:53:23PM +0300, Alexander Shishkin wrote:
> > > It has been pointed out to me many times that it is useful to be able
> > > to switch off AUX records to save the bandwidth for records that actually
> > > matter, for example, in AUX overwrite mode.
> > >
> > > The usefulness of PERF_RECORD_AUX is in some of its flags, like the
> > > TRUNCATED flag that tells the decoder where exactly gaps in the trace are.
> > > > The OVERWRITE flag, on the other hand, will be set on every single record
> > > in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
> > > generated on every target task's sched_out, which over time adds up to
> > > a lot of useless information.
> > >
> > > If any folks out there have userspace that depends on a constant stream of
> > > OVERWRITE records for a good reason, they'll have to let us know.
> > >
> > > Signed-off-by: Alexander Shishkin <[email protected]>
> > > Cc: Markus Metzger <[email protected]>
> > > Cc: Adrian Hunter <[email protected]>
> >
> > This one seems to be slipping through the cracks.
>
> So, did you get an Acked-by or Tested-by from anyone?

Yeah, tons of them, I'll pick this up

- Arnaldo

Subject: [tip:perf/core] perf: Suppress AUX/OVERWRITE records

Commit-ID: 1627314fb54a33ebd23bd08f2e215eaed0f44712
Gitweb: https://git.kernel.org/tip/1627314fb54a33ebd23bd08f2e215eaed0f44712
Author: Alexander Shishkin <[email protected]>
AuthorDate: Wed, 4 Apr 2018 17:53:23 +0300
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Tue, 18 Sep 2018 17:21:13 -0300

perf: Suppress AUX/OVERWRITE records

It has been pointed out to me many times that it is useful to be able to
switch off AUX records to save the bandwidth for records that actually
matter, for example, in AUX overwrite mode.

The usefulness of PERF_RECORD_AUX is in some of its flags, like the
TRUNCATED flag that tells the decoder where exactly gaps in the trace
are. The OVERWRITE flag, on the other hand, will be set on every single
record in overwrite mode. However, a PERF_RECORD_AUX[flags=OVERWRITE] is
generated on every target task's sched_out, which over time adds up to a
lot of useless information.

If any folks out there have userspace that depends on a constant stream
of OVERWRITE records for a good reason, they'll have to let us know.

Signed-off-by: Alexander Shishkin <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Will Deacon <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Markus T Metzger <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
kernel/events/ring_buffer.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 5d3cf407e374..4a9937076331 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -459,10 +459,20 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size)
 	if (size || handle->aux_flags) {
 		/*
 		 * Only send RECORD_AUX if we have something useful to communicate
+		 *
+		 * Note: the OVERWRITE records by themselves are not considered
+		 * useful, as they don't communicate any *new* information,
+		 * aside from the short-lived offset, that becomes history at
+		 * the next event sched-in and therefore isn't useful.
+		 * The userspace that needs to copy out AUX data in overwrite
+		 * mode should know to use user_page::aux_head for the actual
+		 * offset. So, from now on we don't output AUX records that
+		 * have *only* OVERWRITE flag set.
 		 */

-		perf_event_aux_event(handle->event, aux_head, size,
-				     handle->aux_flags);
+		if (handle->aux_flags & ~(u64)PERF_AUX_FLAG_OVERWRITE)
+			perf_event_aux_event(handle->event, aux_head, size,
+					     handle->aux_flags);
 	}

 	rb->user_page->aux_head = rb->aux_head;
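
The comment in the hunk above points userspace at user_page::aux_head instead of the suppressed records. A minimal sketch of that read is given below, assuming the event's control page has already been mmap()ed and is passed in as a struct perf_event_mmap_page pointer. The helper names read_aux_head() and aux_head_in_area() are hypothetical, and reducing the head modulo aux_size is an assumption about how a consumer might normalise the value, not something this thread specifies.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/perf_event.h>

/*
 * Read the kernel-published AUX head from the control page. The acquire
 * load orders later reads of the AUX data after the head read
 * (hypothetical helper name).
 */
static uint64_t read_aux_head(struct perf_event_mmap_page *pg)
{
	return __atomic_load_n(&pg->aux_head, __ATOMIC_ACQUIRE);
}

/*
 * Assumption: reduce the head to an offset inside the AUX area so it
 * can be used as an index into the mapped AUX buffer.
 */
static uint64_t aux_head_in_area(struct perf_event_mmap_page *pg)
{
	uint64_t head = read_aux_head(pg);

	return pg->aux_size ? head % pg->aux_size : 0;
}
```

A snapshot-style consumer would call something like aux_head_in_area() once tracing has been paused and copy the AUX area out relative to that offset, rather than waiting for a PERF_RECORD_AUX that now only arrives when a flag other than OVERWRITE is set.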