2020-02-11 01:17:52

by Yiwei Zhang

Subject: [PATCH] Add gpu memory tracepoints

From: Yiwei Zhang <[email protected]>

This change adds the below gpu memory tracepoint:
gpu_mem/gpu_mem_total: track global or process gpu memory total counters

Signed-off-by: Yiwei Zhang <[email protected]>
---
include/trace/events/gpu_mem.h | 64 ++++++++++++++++++++++++++++++++++
kernel/trace/Kconfig | 3 ++
kernel/trace/Makefile | 1 +
kernel/trace/trace_gpu_mem.c | 13 +++++++
4 files changed, 81 insertions(+)
create mode 100644 include/trace/events/gpu_mem.h
create mode 100644 kernel/trace/trace_gpu_mem.c

diff --git a/include/trace/events/gpu_mem.h b/include/trace/events/gpu_mem.h
new file mode 100644
index 000000000000..3b632a2b5100
--- /dev/null
+++ b/include/trace/events/gpu_mem.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * GPU memory trace points
+ *
+ * Copyright (C) 2020 Google, Inc.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM gpu_mem
+
+#if !defined(_TRACE_GPU_MEM_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_GPU_MEM_H
+
+#include <linux/tracepoint.h>
+
+/*
+ * The gpu_mem_total event indicates that there's an update to either the
+ * global or process total gpu memory counters.
+ *
+ * This event should be emitted whenever the kernel device driver allocates,
+ * frees, imports, or unimports memory in the GPU addressable space.
+ *
+ * @gpu_id: This is the gpu id.
+ *
+ * @pid: Use 0 for the global total, or a positive pid for a process total.
+ *
+ * @size: Virtual size of the allocation in bytes.
+ *
+ */
+TRACE_EVENT(gpu_mem_total,
+	TP_PROTO(
+		uint32_t gpu_id,
+		uint32_t pid,
+		uint64_t size
+	),
+	TP_ARGS(
+		gpu_id,
+		pid,
+		size
+	),
+	TP_STRUCT__entry(
+		__field(uint32_t, gpu_id)
+		__field(uint32_t, pid)
+		__field(uint64_t, size)
+	),
+	TP_fast_assign(
+		__entry->gpu_id = gpu_id;
+		__entry->pid = pid;
+		__entry->size = size;
+	),
+	TP_printk(
+		"gpu_id=%u "
+		"pid=%u "
+		"size=%llu",
+		__entry->gpu_id,
+		__entry->pid,
+		__entry->size
+	)
+);
+
+#endif /* _TRACE_GPU_MEM_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 91e885194dbc..cb404755b0a6 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -85,6 +85,9 @@ config EVENT_TRACING
 config CONTEXT_SWITCH_TRACER
 	bool
 
+config TRACE_GPU_MEM
+	bool
+
 config RING_BUFFER_ALLOW_SWAP
 	bool
 	help
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index f9dcd19165fa..267985313dca 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_PREEMPTIRQ_DELAY_TEST) += preemptirq_delay_test.o
 obj-$(CONFIG_SYNTH_EVENT_GEN_TEST) += synth_event_gen_test.o
 obj-$(CONFIG_KPROBE_EVENT_GEN_TEST) += kprobe_event_gen_test.o
 obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
+obj-$(CONFIG_TRACE_GPU_MEM) += trace_gpu_mem.o
 obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
 obj-$(CONFIG_PREEMPTIRQ_TRACEPOINTS) += trace_preemptirq.o
 obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
diff --git a/kernel/trace/trace_gpu_mem.c b/kernel/trace/trace_gpu_mem.c
new file mode 100644
index 000000000000..01e855897b6d
--- /dev/null
+++ b/kernel/trace/trace_gpu_mem.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GPU memory trace points
+ *
+ * Copyright (C) 2020 Google, Inc.
+ */
+
+#include <linux/module.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/gpu_mem.h>
+
+EXPORT_TRACEPOINT_SYMBOL(gpu_mem_total);
--
2.25.0.341.g760bfbb309-goog
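
For readers who want to consume this event as text, the sketch below
mirrors the TP_printk() format string from the patch in plain user-space
C. It is only an illustration under stated assumptions: the struct and
the helper names format_gpu_mem_total() and parse_gpu_mem_total() are
hypothetical and are not part of the patch or the kernel. The parser
does not assume anything about the prefix the tracing core prints in
front of the event text; it simply searches for "gpu_id=".

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the three fields recorded by TP_STRUCT__entry in the patch. */
struct gpu_mem_total_event {
	uint32_t gpu_id;
	uint32_t pid;	/* 0 = global total, > 0 = per-process total */
	uint64_t size;	/* bytes */
};

/* Render an event the same way the patch's TP_printk() format does. */
int format_gpu_mem_total(char *buf, size_t len,
			 const struct gpu_mem_total_event *ev)
{
	return snprintf(buf, len,
			"gpu_id=%" PRIu32 " pid=%" PRIu32 " size=%" PRIu64,
			ev->gpu_id, ev->pid, ev->size);
}

/*
 * Parse one line of trace text. Whatever precedes "gpu_id=" is skipped,
 * so no particular line prefix is assumed. Returns true only if all
 * three fields were read.
 */
bool parse_gpu_mem_total(const char *line, struct gpu_mem_total_event *ev)
{
	const char *p = strstr(line, "gpu_id=");

	if (!p)
		return false;
	return sscanf(p,
		      "gpu_id=%" SCNu32 " pid=%" SCNu32 " size=%" SCNu64,
		      &ev->gpu_id, &ev->pid, &ev->size) == 3;
}
```

A consumer would call parse_gpu_mem_total() on each line it reads and
treat pid == 0 as the global total for that gpu_id.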


2020-02-11 01:20:06

by Yiwei Zhang

Subject: Re: [PATCH] Add gpu memory tracepoints

2020-02-11 01:39:57

by Yiwei Zhang

Subject: Re: [PATCH] Add gpu memory tracepoints

2020-02-11 02:31:06

by Steven Rostedt

Subject: Re: [PATCH] Add gpu memory tracepoints

On Mon, 10 Feb 2020 17:16:31 -0800
[email protected] wrote:

> From: Yiwei Zhang <[email protected]>
>
> This change adds the below gpu memory tracepoint:
> gpu_mem/gpu_mem_total: track global or process gpu memory total counters
>
> Signed-off-by: Yiwei Zhang <[email protected]>
> ---
> include/trace/events/gpu_mem.h | 64 ++++++++++++++++++++++++++++++++++
> kernel/trace/Kconfig | 3 ++
> kernel/trace/Makefile | 1 +
> kernel/trace/trace_gpu_mem.c | 13 +++++++
> 4 files changed, 81 insertions(+)
> create mode 100644 include/trace/events/gpu_mem.h
> create mode 100644 kernel/trace/trace_gpu_mem.c

What exactly is this, and why is it being put in the tracing
infrastructure code?

-- Steve

2020-02-11 03:20:01

by Yiwei Zhang

Subject: Re: [PATCH] Add gpu memory tracepoints

Thanks for the prompt reply!

The tracepoint proposed here tracks two counters: the global GPU memory
usage total and the per-process GPU memory usage total. It is meant for
gfx drivers that have implemented a GPU memory tracking system, and it
expects that tracking system to de-duplicate shared memory.
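
To make the de-duplication expectation concrete, here is a minimal
user-space sketch of one way a tracking system could keep such totals.
It is an assumption-laden illustration, not any driver's actual
implementation: the names gpu_tracker, tracker_attach() and
tracker_detach() are hypothetical, emit_gpu_mem_total() is a printing
stand-in for the real trace_gpu_mem_total() call, and the accounting
policy (a shared buffer counts once globally and once in each process
that references it) is my reading of the paragraph above.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_BUFS 16 /* arbitrary limits for this sketch */
#define MAX_REFS 64

struct gpu_buf { uint32_t id; uint64_t size; bool used; };
struct gpu_ref { uint32_t pid; uint32_t buf_id; bool used; };

struct gpu_tracker {
	uint32_t gpu_id;
	struct gpu_buf bufs[MAX_BUFS];
	struct gpu_ref refs[MAX_REFS]; /* at most one per (pid, buffer) */
};

/* Stand-in for trace_gpu_mem_total(); prints instead of tracing. */
static void emit_gpu_mem_total(uint32_t gpu_id, uint32_t pid, uint64_t size)
{
	printf("gpu_id=%" PRIu32 " pid=%" PRIu32 " size=%" PRIu64 "\n",
	       gpu_id, pid, size);
}

static struct gpu_buf *find_buf(struct gpu_tracker *t, uint32_t id)
{
	for (size_t i = 0; i < MAX_BUFS; i++)
		if (t->bufs[i].used && t->bufs[i].id == id)
			return &t->bufs[i];
	return NULL;
}

static struct gpu_ref *find_ref(struct gpu_tracker *t, uint32_t pid,
				uint32_t buf_id)
{
	for (size_t i = 0; i < MAX_REFS; i++)
		if (t->refs[i].used && t->refs[i].pid == pid &&
		    t->refs[i].buf_id == buf_id)
			return &t->refs[i];
	return NULL;
}

/* Global total: every live buffer is counted exactly once. */
uint64_t tracker_global_total(struct gpu_tracker *t)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < MAX_BUFS; i++)
		if (t->bufs[i].used)
			sum += t->bufs[i].size;
	return sum;
}

/* Process total: each buffer the process references, counted once. */
uint64_t tracker_process_total(struct gpu_tracker *t, uint32_t pid)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < MAX_REFS; i++) {
		struct gpu_buf *b;

		if (!t->refs[i].used || t->refs[i].pid != pid)
			continue;
		b = find_buf(t, t->refs[i].buf_id);
		if (b)
			sum += b->size;
	}
	return sum;
}

static void emit_totals(struct gpu_tracker *t, uint32_t pid)
{
	emit_gpu_mem_total(t->gpu_id, 0, tracker_global_total(t));
	emit_gpu_mem_total(t->gpu_id, pid, tracker_process_total(t, pid));
}

/* Allocate (unknown buf_id) or import (known buf_id) on behalf of pid. */
bool tracker_attach(struct gpu_tracker *t, uint32_t pid, uint32_t buf_id,
		    uint64_t size)
{
	struct gpu_buf *b = find_buf(t, buf_id);
	struct gpu_ref *r = NULL;

	if (find_ref(t, pid, buf_id))
		return true; /* already counted for this process */
	for (size_t i = 0; i < MAX_REFS && !r; i++)
		if (!t->refs[i].used)
			r = &t->refs[i];
	for (size_t i = 0; i < MAX_BUFS && !b; i++)
		if (!t->bufs[i].used)
			b = &t->bufs[i];
	if (!r || !b)
		return false; /* out of slots */
	if (!b->used) /* new allocation; imports keep the existing size */
		*b = (struct gpu_buf){ .id = buf_id, .size = size, .used = true };
	*r = (struct gpu_ref){ .pid = pid, .buf_id = buf_id, .used = true };
	emit_totals(t, pid);
	return true;
}

/* Free or unimport; the buffer goes away with its last reference. */
bool tracker_detach(struct gpu_tracker *t, uint32_t pid, uint32_t buf_id)
{
	struct gpu_ref *r = find_ref(t, pid, buf_id);
	struct gpu_buf *b = find_buf(t, buf_id);
	bool still_referenced = false;

	if (!r)
		return false;
	r->used = false;
	for (size_t i = 0; i < MAX_REFS; i++)
		if (t->refs[i].used && t->refs[i].buf_id == buf_id)
			still_referenced = true;
	if (b && !still_referenced)
		b->used = false;
	emit_totals(t, pid);
	return true;
}
```

The totals are recomputed by scanning small arrays to keep the sketch
short; a real tracking system would more likely maintain running
counters. Callers are expected to pass a positive pid, since pid 0 is
reserved for the global total.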

On Android, the graphics driver has implemented GPU memory tracking.
First, we'd like to profile GPU memory with this tracepoint. Second,
we implement eBPF programs and attach them to this tracepoint to track
GPU memory at runtime on production devices. However, the tracepoint +
eBPF approach requires the tracepoint to be upstreamed, so that it is
considered a stable interface that the Android common kernel can carry
forever.
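
As a rough picture of the bookkeeping such an attached program might do,
the sketch below is a plain user-space C model, not eBPF code: the fixed
array stands in for a BPF hash map, and total_table, on_gpu_mem_total()
and lookup_total() are hypothetical names. Because the event carries
totals rather than deltas, the handler overwrites the stored value
instead of accumulating it. Dropping an entry when its total reaches 0
is a design choice of this sketch, not something the patch specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 64 /* arbitrary capacity for this sketch */

struct total_entry {
	uint32_t gpu_id;
	uint32_t pid;	/* 0 = global total */
	uint64_t size;	/* latest reported total, in bytes */
	bool used;
};

struct total_table {
	struct total_entry entries[MAX_ENTRIES];
};

/*
 * Event handler: remember the latest total for (gpu_id, pid).
 * Returns false only when a new key does not fit in the table.
 */
bool on_gpu_mem_total(struct total_table *t, uint32_t gpu_id, uint32_t pid,
		      uint64_t size)
{
	struct total_entry *free_slot = NULL;

	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		struct total_entry *e = &t->entries[i];

		if (e->used && e->gpu_id == gpu_id && e->pid == pid) {
			if (size == 0)
				e->used = false; /* total dropped to zero */
			else
				e->size = size;	 /* overwrite, do not add */
			return true;
		}
		if (!e->used && !free_slot)
			free_slot = e;
	}
	if (size == 0)
		return true; /* nothing worth recording */
	if (!free_slot)
		return false;
	*free_slot = (struct total_entry){
		.gpu_id = gpu_id, .pid = pid, .size = size, .used = true,
	};
	return true;
}

/* Latest known total for (gpu_id, pid), or 0 if none is recorded. */
uint64_t lookup_total(const struct total_table *t, uint32_t gpu_id,
		      uint32_t pid)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		const struct total_entry *e = &t->entries[i];

		if (e->used && e->gpu_id == gpu_id && e->pid == pid)
			return e->size;
	}
	return 0;
}
```

A reader of the table can then ask for lookup_total(&t, gpu_id, 0) to
get the global figure, or pass a positive pid for one process.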

Best,
Yiwei

2020-02-11 03:22:30

by Steven Rostedt

Subject: Re: [PATCH] Add gpu memory tracepoints

On Mon, 10 Feb 2020 19:05:35 -0800
Yiwei Zhang <[email protected]> wrote:

> Thanks for the prompt reply!
>
> The tracepoint proposed here tracks two counters: the global GPU memory
> usage total and the per-process GPU memory usage total. It is meant for
> gfx drivers that have implemented a GPU memory tracking system, and it
> expects that tracking system to de-duplicate shared memory.
>
> On Android, the graphics driver has implemented GPU memory tracking.
> First, we'd like to profile GPU memory with this tracepoint. Second,
> we implement eBPF programs and attach them to this tracepoint to track
> GPU memory at runtime on production devices. However, the tracepoint +
> eBPF approach requires the tracepoint to be upstreamed, so that it is
> considered a stable interface that the Android common kernel can carry
> forever.


Then it needs to live in the drivers/gpu directory. kernel/trace is for
tracing infrastructure and not for adding trace points.

-- Steve

2020-02-12 19:28:02

by Yiwei Zhang

Subject: Re: [PATCH] Add gpu memory tracepoints

Hi Steven,

I can move the code out of kernel/trace. Can we still leave
include/trace/events/gpu_mem.h where it is right now, or do we have to
move that out as well? We need a common, non-drm header location for
the tracepoint so that downstream drivers can find the tracepoint
definition.

Best,
Yiwei

2020-02-12 19:39:35

by Steven Rostedt

Subject: Re: [PATCH] Add gpu memory tracepoints

On Wed, 12 Feb 2020 11:26:08 -0800
Yiwei Zhang <[email protected]> wrote:

> Hi Steven,
>
> I can move the code out of kernel/trace. Can we still leave
> include/trace/events/gpu_mem.h where it is right now, or do we have to
> move that out as well? We need a common, non-drm header location for
> the tracepoint so that downstream drivers can find the tracepoint
> definition.
>

You can leave the header there. include/trace/events/ is the place
to put trace event headers for common code.

It just did not belong in kernel/trace/.

Thanks!

-- Steve

2020-02-12 19:41:13

by Yiwei Zhang

Subject: Re: [PATCH] Add gpu memory tracepoints

Thanks for the info! I'll update the patch accordingly.

Best regards,
Yiwei