On Mon, Nov 13, 2023 at 08:56:39AM -0800, Namhyung Kim wrote:
> On Sat, Nov 11, 2023 at 10:49 AM Josh Poimboeuf <[email protected]> wrote:
> >
> > On Fri, Nov 10, 2023 at 10:57:58PM -0800, Namhyung Kim wrote:
> > > Anyway, I'm not sure it can support these additional samples for
> > > deferred callchains without breaking the existing perf tools.
> > > At the very least, they don't know PERF_CONTEXT_USER_DEFERRED.
> > > I think this should be controlled by a new feature bit in the
> > > perf_event_attr.
> > >
> > > Then we could make a breaking change: add a special record
> > > that carries only the deferred callchain and the sample ID.
> >
> > Sounds like a good idea. I'll need to study the code to figure out how
> > to do that on the perf tool side. Or would you care to write a patch?
>
> Sure, I'd be happy to write one.

I think we can start with something like the below. To enable
defer_callchain, the sample ID (attr.sample_type) should include
IDENTIFIER | TID | TIME so that sample and callchain records
can be matched.
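
A minimal sketch of that sample_type precondition, assuming it is a
plain mask test. The helper name and the SKETCH_ constants are
hypothetical and only for illustration; the values mirror
PERF_SAMPLE_TID, PERF_SAMPLE_TIME and PERF_SAMPLE_IDENTIFIER in
include/uapi/linux/perf_event.h. Where the kernel would actually
reject a bad attr is not settled by this mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror enum perf_event_sample_format in the UAPI header. */
#define SKETCH_PERF_SAMPLE_TID        (1ULL << 1)
#define SKETCH_PERF_SAMPLE_TIME       (1ULL << 2)
#define SKETCH_PERF_SAMPLE_IDENTIFIER (1ULL << 16)

/* Bits needed to match a deferred callchain record to its samples. */
#define SKETCH_DEFER_REQUIRED \
	(SKETCH_PERF_SAMPLE_IDENTIFIER | SKETCH_PERF_SAMPLE_TID | \
	 SKETCH_PERF_SAMPLE_TIME)

/* Hypothetical helper: true if sample_type carries all required bits. */
static bool sketch_defer_callchain_sample_type_ok(uint64_t sample_type)
{
	return (sample_type & SKETCH_DEFER_REQUIRED) == SKETCH_DEFER_REQUIRED;
}
```

Extra bits in sample_type are accepted; only a missing IDENTIFIER, TID
or TIME bit makes the check fail.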

Thanks,
Namhyung

---8<---
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 39c6a250dd1b..a3765ff59798 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -456,7 +456,8 @@ struct perf_event_attr {
inherit_thread : 1, /* children only inherit if cloned with CLONE_THREAD */
remove_on_exec : 1, /* event is removed from task on exec */
sigtrap : 1, /* send synchronous SIGTRAP on event */
- __reserved_1 : 26;
+ defer_callchain: 1, /* generate DEFERRED_CALLCHAINS records for userspace */
+ __reserved_1 : 25;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -1207,6 +1208,20 @@ enum perf_event_type {
*/
PERF_RECORD_AUX_OUTPUT_HW_ID = 21,

+ /*
+ * Deferred user stack callchains (for SFrame). Previous samples would
+ * have kernel callchains only and they need to be stitched with this
+ * to make full callchains.
+ *
+ * struct {
+ * struct perf_event_header header;
+ * u64 nr;
+ * u64 ips[nr];
+ * struct sample_id sample_id;
+ * };
+ */
+ PERF_RECORD_DEFERRED_CALLCHAINS = 22,
+
PERF_RECORD_MAX, /* non-ABI */
};
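
For the tool side, here is a rough sketch of walking the record layout
from the comment above (header, nr, ips[nr], sample_id). It is not
perf tool code. All sketch_ names are hypothetical, the type value 22
comes from this proposal only, and the 24-byte trailer assumes
sample_type is exactly IDENTIFIER | TID | TIME, which gives
{ u32 pid, tid; } { u64 time; } { u64 id; } in sample_id order. It
also assumes host-endian data and a record with no extra padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record type value taken from the proposed patch; not a released ABI. */
#define SKETCH_RECORD_DEFERRED_CALLCHAINS 22

/* Assumed trailer: { u32 pid, tid; } { u64 time; } { u64 id; } */
#define SKETCH_SAMPLE_ID_SIZE 24

/* Same field layout as struct perf_event_header in the UAPI header. */
struct sketch_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size in bytes, header included */
};

struct sketch_deferred {
	uint64_t nr;		/* number of user IPs */
	const uint8_t *ips;	/* nr * 8 bytes inside the caller's buffer */
	uint32_t pid, tid;
	uint64_t time;
	uint64_t id;
};

/* Read a u64 without assuming the buffer is 8-byte aligned. */
static uint64_t sketch_read_u64(const uint8_t *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Returns false if the buffer does not hold one well-formed record. */
static bool sketch_parse_deferred(const uint8_t *buf, size_t len,
				  struct sketch_deferred *out)
{
	struct sketch_event_header hdr;
	size_t off = sizeof(hdr);
	size_t ips_bytes;
	const uint8_t *sid;

	assert(buf != NULL && out != NULL);

	if (len < sizeof(hdr))
		return false;
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.type != SKETCH_RECORD_DEFERRED_CALLCHAINS)
		return false;
	if (hdr.size > len ||
	    hdr.size < sizeof(hdr) + 8 + SKETCH_SAMPLE_ID_SIZE)
		return false;

	out->nr = sketch_read_u64(buf + off);
	off += 8;

	/* Whatever sits between nr and the trailer must be exactly ips[nr]. */
	ips_bytes = hdr.size - off - SKETCH_SAMPLE_ID_SIZE;
	if (ips_bytes % 8 != 0 || out->nr != ips_bytes / 8)
		return false;
	out->ips = buf + off;

	sid = buf + off + ips_bytes;
	memcpy(&out->pid, sid, 4);
	memcpy(&out->tid, sid + 4, 4);
	out->time = sketch_read_u64(sid + 8);
	out->id = sketch_read_u64(sid + 16);
	return true;
}
```

A consumer would then use tid, time and id from the trailer to find the
earlier kernel-only samples this user callchain belongs to.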

On Wed, Nov 15, 2023 at 08:13:31AM -0800, Namhyung Kim wrote:
> ---8<---
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 39c6a250dd1b..a3765ff59798 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -456,7 +456,8 @@ struct perf_event_attr {
> inherit_thread : 1, /* children only inherit if cloned with CLONE_THREAD */
> remove_on_exec : 1, /* event is removed from task on exec */
> sigtrap : 1, /* send synchronous SIGTRAP on event */
> - __reserved_1 : 26;
> + defer_callchain: 1, /* generate DEFERRED_CALLCHAINS records for userspace */
> + __reserved_1 : 25;
>
> union {
> __u32 wakeup_events; /* wakeup every n events */
> @@ -1207,6 +1208,20 @@ enum perf_event_type {
> */
> PERF_RECORD_AUX_OUTPUT_HW_ID = 21,
>
> + /*
> + * Deferred user stack callchains (for SFrame). Previous samples would

Possibly also useful for ShadowStack based unwinders. And since it may
save work when multiple consecutive samples hit the same kernel
section, it could be useful for everything.

> + * have kernel callchains only and they need to be stitched with this
> + * to make full callchains.
> + *
> + * struct {
> + * struct perf_event_header header;
> + * u64 nr;
> + * u64 ips[nr];
> + * struct sample_id sample_id;
> + * };
> + */
> + PERF_RECORD_DEFERRED_CALLCHAINS = 22,
> +
> PERF_RECORD_MAX, /* non-ABI */
> };
>

Anyway, yeah, that should do, I suppose.