2024-03-27 09:05:04

by Tio Zhang

Subject: [PATCH] trace/sched: add tgid for sched_wakeup_template

By doing this, we are able to filter tasks by tgid while tracing
wakeup events with eBPF or other methods.

For example, when we want to trace a user space process (which has an
unknown number of LWPs, i.e., pids) to monitor its wakeup latency,
without tgid available in the sched_wakeup tracepoints we would struggle
to find all the pids to trace. Alternatively, we could use a kprobe to
get the tgid, which is less accurate and much less efficient than using
a tracepoint.

Signed-off-by: Tio Zhang <[email protected]>
Signed-off-by: Dylane Chen <[email protected]>
---
include/trace/events/sched.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index dbb01b4b7451..ea7e525649e5 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -149,6 +149,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
__field( pid_t, pid )
__field( int, prio )
__field( int, target_cpu )
+ __field( pid_t, tgid )
),

TP_fast_assign(
@@ -156,11 +157,12 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
__entry->pid = p->pid;
__entry->prio = p->prio; /* XXX SCHED_DEADLINE */
__entry->target_cpu = task_cpu(p);
+ __entry->tgid = p->tgid;
),

- TP_printk("comm=%s pid=%d prio=%d target_cpu=%03d",
+ TP_printk("comm=%s pid=%d prio=%d target_cpu=%03d tgid=%d",
__entry->comm, __entry->pid, __entry->prio,
- __entry->target_cpu)
+ __entry->target_cpu, __entry->tgid)
);

/*
--
2.17.1



2024-03-27 15:15:48

by Steven Rostedt

Subject: Re: [PATCH] trace/sched: add tgid for sched_wakeup_template

On Wed, 27 Mar 2024 16:50:57 +0800
Tio Zhang <[email protected]> wrote:

> By doing this, we are able to filter tasks by tgid while tracing
> wakeup events with eBPF or other methods.
>
> For example, when we want to trace a user space process (which has an
> unknown number of LWPs, i.e., pids) to monitor its wakeup latency,
> without tgid available in the sched_wakeup tracepoints we would struggle
> to find all the pids to trace. Alternatively, we could use a kprobe to
> get the tgid, which is less accurate and much less efficient than using
> a tracepoint.

This is a very common trace event, and I really do not want to add more
data than necessary to it, as it increases the size of the event, which
means fewer events can be recorded on a fixed-size trace ring buffer.
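
As a rough illustration of the size argument above, the following sketch estimates how many wakeup records fit in a given number of bytes with and without the extra field. Every size in it is an assumption made for illustration: a 4-byte ring-buffer event header, an 8-byte common trace entry header, a 16-byte comm array, and 4-byte pid, prio and target_cpu fields. The function names are hypothetical, and the model ignores page headers, timestamp extensions and alignment padding, so treat the numbers as indicative only.

```c
#include <assert.h>
#include <stddef.h>

/* All sizes below are assumptions for illustration, not measured values. */
enum {
	RB_EVENT_HDR    = 4,              /* assumed ring-buffer event header   */
	TRACE_ENTRY_HDR = 8,              /* assumed common trace entry header  */
	COMM_LEN        = 16,             /* assumed length of the comm array   */
	WAKEUP_PAYLOAD  = COMM_LEN + 3 * 4 /* comm + pid + prio + target_cpu    */
};

/* Bytes one wakeup record would take, optionally with a 4-byte tgid. */
size_t wakeup_record_bytes(int with_tgid)
{
	return RB_EVENT_HDR + TRACE_ENTRY_HDR + WAKEUP_PAYLOAD +
	       (with_tgid ? 4 : 0);
}

/* Number of whole records that fit in buffer_bytes. */
size_t events_that_fit(size_t buffer_bytes, size_t record_bytes)
{
	assert(record_bytes > 0);
	return buffer_bytes / record_bytes;
}
```

Under these assumptions a record grows from 40 to 44 bytes, so a buffer that held 110 records would hold 100 after the change, roughly a 9% loss for every user of the event.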

Note, you are not modifying the "tracepoint", but you are actually
modifying a "trace event".

A "tracepoint" is the hook in the kernel code:

trace_sched_wakeup()

A "trace event" is defined by the TRACE_EVENT() macro (and friends), which
defines what is exposed in the tracefs file system.

I thought eBPF could hook directly to the tracepoint, which is:

trace_sched_wakeup(p);

I believe eBPF gives you direct access to 'p' before it is processed.

There are also "trace probes" (I think we are lacking documentation on
these, as well as on event probes :-p):

$ gdb vmlinux
(gdb) p &((struct task_struct *)0)->tgid
$1 = (pid_t *) 0x56c
(gdb) p &((struct task_struct *)0)->pid
$2 = (pid_t *) 0x568

# echo 't:wakeup sched_waking pid=+0x568($arg1):u32 tgid=+0x56c($arg1):u32' > /sys/kernel/tracing/dynamic_events
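
The gdb expression used above, taking the address of a member through a null pointer, is the classic way of asking for a field's byte offset, which C also exposes as offsetof(). The sketch below shows how such offsets turn into the probe definition string. The struct is a hypothetical stand-in padded so that pid and tgid land at the offsets from the gdb session (0x568 and 0x56c); it is not the real struct task_struct, and the function name is made up. Real offsets depend on the kernel version and configuration, so they have to be read from the matching vmlinux each time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for struct task_struct, padded so that pid and
 * tgid sit at the offsets gdb reported above. Not the kernel's layout.
 */
struct task_sketch {
	char  other_fields[0x568];
	pid_t pid;
	pid_t tgid;
};

/*
 * Write a tracepoint-probe definition for dynamic_events into buf.
 * Returns what snprintf() returns: the length the string needs.
 */
int format_wakeup_probe(char *buf, size_t len)
{
	assert(buf != NULL);
	return snprintf(buf, len,
		"t:wakeup sched_waking pid=+0x%zx($arg1):u32 tgid=+0x%zx($arg1):u32",
		offsetof(struct task_sketch, pid),
		offsetof(struct task_sketch, tgid));
}
```

With the stand-in layout, the generated string is identical to the one echoed into dynamic_events above.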

# trace-cmd start -e wakeup
# trace-cmd show
trace-cmd-7307 [003] d..6. 599486.485762: wakeup: (__probestub_sched_waking+0x4/0x10) pid=845 tgid=845
bash-845 [001] d.s4. 599486.486136: wakeup: (__probestub_sched_waking+0x4/0x10) pid=17 tgid=17
bash-845 [001] d..4. 599486.486336: wakeup: (__probestub_sched_waking+0x4/0x10) pid=5516 tgid=5516
kworker/u18:2-5516 [001] d..4. 599486.486445: wakeup: (__probestub_sched_waking+0x4/0x10) pid=818 tgid=818
<idle>-0 [001] d.s4. 599486.491206: wakeup: (__probestub_sched_waking+0x4/0x10) pid=17 tgid=17
<idle>-0 [001] d.s5. 599486.493218: wakeup: (__probestub_sched_waking+0x4/0x10) pid=17 tgid=17
<idle>-0 [001] d.s4. 599486.497200: wakeup: (__probestub_sched_waking+0x4/0x10) pid=17 tgid=17
<idle>-0 [003] d.s4. 599486.829209: wakeup: (__probestub_sched_waking+0x4/0x10) pid=70 tgid=70

The above attaches to the tracepoint and $arg1 is the 'struct task_struct *p'.
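
Once the probe prints a tgid field, picking out the wakeups of one process from the text output is a small parsing job. The helper below is a minimal sketch of that step, assuming lines shaped like the trace-cmd output above; the function name and return convention are hypothetical and not part of any tracing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if the line carries tgid=<want>, 0 if it carries a different
 * tgid, and -1 if no tgid field could be parsed from the line.
 */
int wakeup_matches_tgid(const char *line, int want)
{
	const char *field;
	int tgid;

	assert(line != NULL);

	/* Match " tgid=" with the leading space so "xtgid=" would not hit. */
	field = strstr(line, " tgid=");
	if (!field || sscanf(field, " tgid=%d", &tgid) != 1)
		return -1;

	return tgid == want;
}
```

For anything beyond a quick look, filtering inside the kernel (for example with an event filter on the probe's tgid field) avoids recording unwanted events in the first place.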

-- Steve


>
> Signed-off-by: Tio Zhang <[email protected]>
> Signed-off-by: Dylane Chen <[email protected]>
> ---
> include/trace/events/sched.h | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index dbb01b4b7451..ea7e525649e5 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -149,6 +149,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
> __field( pid_t, pid )
> __field( int, prio )
> __field( int, target_cpu )
> + __field( pid_t, tgid )
> ),
>
> TP_fast_assign(
> @@ -156,11 +157,12 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
> __entry->pid = p->pid;
> __entry->prio = p->prio; /* XXX SCHED_DEADLINE */
> __entry->target_cpu = task_cpu(p);
> + __entry->tgid = p->tgid;
> ),
>
> - TP_printk("comm=%s pid=%d prio=%d target_cpu=%03d",
> + TP_printk("comm=%s pid=%d prio=%d target_cpu=%03d tgid=%d",
> __entry->comm, __entry->pid, __entry->prio,
> - __entry->target_cpu)
> + __entry->target_cpu, __entry->tgid)
> );
>
> /*


2024-03-29 03:05:57

by Tio Zhang

Subject: Re: [PATCH] trace/sched: add tgid for sched_wakeup_template

Makes sense to me, thank you for the explanation.
