2024-02-01 12:39:15

by Ahmad Fatoum

[permalink] [raw]
Subject: Boot-time dumping of ftrace fuctiongraph buffer

Hello,

I semi-regularly debug probe failures. For drivers that use dev_err_probe
rigorously, this is a quick matter: The probe function records a deferral reason
and if the deferral persists, deferred_probe_timeout_work_func() will print
the collected reasons, even if PID 1 is never started.

For drivers that don't call dev_err_probe, I find myself sometimes doing printf
debugging inside the probe function.

I would like to replace this with the function graph tracer:

- record the probe function, configured over kernel command line
(The device indefinitely deferring probe is printed to the console,
so I know what I am looking for on the next boot)

- Dump the function graph trace

- See if the last call before (non-devm) cleanup is getting a clock, a GPIO,
a regulator or w/e.

For this to be maximally useful, I need to configure this not only at boot-time,
but also dump the ftrace buffer at boot time. Probe deferral can hinder the kernel from
calling init and providing a shell, where I could read /sys/kernel/tracing/trace.

I found following two mechanisms that looked relevant, but seem not to
do exactly what I want:

- tp_printk: seems to be related to trace points only and not usable
for the function graph output

- dump_on_oops: I don't get an Oops if probe deferral times out, but maybe
one could patch the kernel to check a oops_on_probe_deferral or dump_on_probe_deferral
kernel command line parameter in deferred_probe_timeout_work_func()?


Is there existing support that I am missing? Any input on whether this
would be a welcome feature to have?

Thanks!

Cheers,
Ahmad

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2024-02-02 01:46:28

by Steven Rostedt

[permalink] [raw]
Subject: Re: Boot-time dumping of ftrace fuctiongraph buffer

On Thu, 1 Feb 2024 13:21:37 +0100
Ahmad Fatoum <[email protected]> wrote:

> Hello,
>
> I semi-regularly debug probe failures. For drivers that use dev_err_probe
> rigorously, this is a quick matter: The probe function records a deferral reason
> and if the deferral persists, deferred_probe_timeout_work_func() will print
> the collected reasons, even if PID 1 is never started.
>
> For drivers that don't call dev_err_probe, I find myself sometimes doing printf
> debugging inside the probe function.

Is the driver built in or started after init?

>
> I would like to replace this with the function graph tracer:
>
> - record the probe function, configured over kernel command line
> (The device indefinitely deferring probe is printed to the console,
> so I know what I am looking for on the next boot)
>
> - Dump the function graph trace
>
> - See if the last call before (non-devm) cleanup is getting a clock, a GPIO,
> a regulator or w/e.
>
> For this to be maximally useful, I need to configure this not only at boot-time,
> but also dump the ftrace buffer at boot time. Probe deferral can hinder the kernel from
> calling init and providing a shell, where I could read /sys/kernel/tracing/trace.

OK so the driver is built in.

>
> I found following two mechanisms that looked relevant, but seem not to
> do exactly what I want:
>
> - tp_printk: seems to be related to trace points only and not usable
> for the function graph output
>
> - dump_on_oops: I don't get an Oops if probe deferral times out, but maybe
> one could patch the kernel to check a oops_on_probe_deferral or dump_on_probe_deferral
> kernel command line parameter in deferred_probe_timeout_work_func()?
>
>
> Is there existing support that I am missing? Any input on whether this
> would be a welcome feature to have?

Well you can start function_graph on the kernel command line and event
filter on a give function

ftrace=function_graph function_graph_filter=probe_func

You can add your own ftrace_dump() on some kind of detected error and put
that in the kernel command line. For example RCU has:

rcupdate.rcu_cpu_stall_ftrace_dump=

Which will do a ftrace dump when a RCU stall is triggered.

-- Steve


2024-02-02 06:04:14

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: Boot-time dumping of ftrace fuctiongraph buffer

Hi Ahmad,

On Thu, 1 Feb 2024 13:21:37 +0100
Ahmad Fatoum <[email protected]> wrote:

> Hello,
>
> I semi-regularly debug probe failures. For drivers that use dev_err_probe
> rigorously, this is a quick matter: The probe function records a deferral reason
> and if the deferral persists, deferred_probe_timeout_work_func() will print
> the collected reasons, even if PID 1 is never started.
>
> For drivers that don't call dev_err_probe, I find myself sometimes doing printf
> debugging inside the probe function.
>
> I would like to replace this with the function graph tracer:
>
> - record the probe function, configured over kernel command line
> (The device indefinitely deferring probe is printed to the console,
> so I know what I am looking for on the next boot)
>
> - Dump the function graph trace
>
> - See if the last call before (non-devm) cleanup is getting a clock, a GPIO,
> a regulator or w/e.

What kind of information you prints by the printk()?
If the target (suspicious driver probe function) is obvious, you can use kprobe
event and tp_printk. Or, even if you don't know, if you are sure which function
is the starting/ending point, you can use bootconfig to record the specific part
of execution in the ring buffer, and dump it as Steve said.

In Documentation/trace/boottime-trace.rst, there is an example.
-----
With the trigger action and kprobes, you can trace function-graph while
a function is called. For example, this will trace all function calls in
the pci_proc_init()::

ftrace {
tracing_on = 0
tracer = function_graph
event.kprobes {
start_event {
probes = "pci_proc_init"
actions = "traceon"
}
end_event {
probes = "pci_proc_init%return"
actions = "traceoff"
}
}
}
-----

Thank you,

>
> For this to be maximally useful, I need to configure this not only at boot-time,
> but also dump the ftrace buffer at boot time. Probe deferral can hinder the kernel from
> calling init and providing a shell, where I could read /sys/kernel/tracing/trace.
>
> I found following two mechanisms that looked relevant, but seem not to
> do exactly what I want:
>
> - tp_printk: seems to be related to trace points only and not usable
> for the function graph output
>
> - dump_on_oops: I don't get an Oops if probe deferral times out, but maybe
> one could patch the kernel to check a oops_on_probe_deferral or dump_on_probe_deferral
> kernel command line parameter in deferred_probe_timeout_work_func()?
>
>
> Is there existing support that I am missing? Any input on whether this
> would be a welcome feature to have?
>
> Thanks!
>
> Cheers,
> Ahmad
>
> --
> Pengutronix e.K. | |
> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>


--
Masami Hiramatsu (Google) <[email protected]>

2024-02-02 13:13:00

by Ahmad Fatoum

[permalink] [raw]
Subject: Re: Boot-time dumping of ftrace fuctiongraph buffer

Hello Masami-san,

On 02.02.24 07:04, Masami Hiramatsu (Google) wrote:
> On Thu, 1 Feb 2024 13:21:37 +0100
> Ahmad Fatoum <[email protected]> wrote:
>> For drivers that don't call dev_err_probe, I find myself sometimes doing printf
>> debugging inside the probe function.

[snip]

>> I would like to replace this with the function graph tracer:
>>
>> - record the probe function, configured over kernel command line
>> (The device indefinitely deferring probe is printed to the console,
>> so I know what I am looking for on the next boot)
>>
>> - Dump the function graph trace
>>
>> - See if the last call before (non-devm) cleanup is getting a clock, a GPIO,
>> a regulator or w/e.
>
> What kind of information you prints by the printk()?

Just littering around some:

pr_notice("%s:%d\n", __func__, __LINE__);

to find where the probe function exits.

> If the target (suspicious driver probe function) is obvious, you can use kprobe
> event and tp_printk. Or, even if you don't know, if you are sure which function
> is the starting/ending point, you can use bootconfig to record the specific part
> of execution in the ring buffer, and dump it as Steve said.
>
> In Documentation/trace/boottime-trace.rst, there is an example.
> -----
> With the trigger action and kprobes, you can trace function-graph while
> a function is called. For example, this will trace all function calls in
> the pci_proc_init()::
>
> ftrace {
> tracing_on = 0
> tracer = function_graph
> event.kprobes {
> start_event {
> probes = "pci_proc_init"
> actions = "traceon"
> }
> end_event {
> probes = "pci_proc_init%return"
> actions = "traceoff"
> }
> }
> }

I get comparable tracing with the kernel command line, but lacked a way to dump
it at boot-time. Sorry should have been clearer.

Grepping the kernel source for `ftrace_func_command' lets me think that
what I am after might be:

end_event {
actions = "traceoff", "dump"
}

I need to try it out. Do you happen to know if the equivalent is possible
without bootconfig?

Thank you!
Ahmad


> -----
>
> Thank you,
>
>>
>> For this to be maximally useful, I need to configure this not only at boot-time,
>> but also dump the ftrace buffer at boot time. Probe deferral can hinder the kernel from
>> calling init and providing a shell, where I could read /sys/kernel/tracing/trace.
>>
>> I found following two mechanisms that looked relevant, but seem not to
>> do exactly what I want:
>>
>> - tp_printk: seems to be related to trace points only and not usable
>> for the function graph output
>>
>> - dump_on_oops: I don't get an Oops if probe deferral times out, but maybe
>> one could patch the kernel to check a oops_on_probe_deferral or dump_on_probe_deferral
>> kernel command line parameter in deferred_probe_timeout_work_func()?
>>
>>
>> Is there existing support that I am missing? Any input on whether this
>> would be a welcome feature to have?
>>
>> Thanks!
>>
>> Cheers,
>> Ahmad
>>
>> --
>> Pengutronix e.K. | |
>> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
>> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
>> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>>
>
>

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |


2024-02-02 13:20:02

by Ahmad Fatoum

[permalink] [raw]
Subject: Re: Boot-time dumping of ftrace fuctiongraph buffer

Hello Steve,

On 02.02.24 02:46, Steven Rostedt wrote:
> On Thu, 1 Feb 2024 13:21:37 +0100
> Ahmad Fatoum <[email protected]> wrote:
>> For this to be maximally useful, I need to configure this not only at boot-time,
>> but also dump the ftrace buffer at boot time. Probe deferral can hinder the kernel from
>> calling init and providing a shell, where I could read /sys/kernel/tracing/trace.
>
> OK so the driver is built in.

Yes. The modules are easy, because I will have an access to shell by then
on my systems.

>> I found following two mechanisms that looked relevant, but seem not to
>> do exactly what I want:
>>
>> - tp_printk: seems to be related to trace points only and not usable
>> for the function graph output
>>
>> - dump_on_oops: I don't get an Oops if probe deferral times out, but maybe
>> one could patch the kernel to check a oops_on_probe_deferral or dump_on_probe_deferral
>> kernel command line parameter in deferred_probe_timeout_work_func()?
>>
>>
>> Is there existing support that I am missing? Any input on whether this
>> would be a welcome feature to have?
>
> Well you can start function_graph on the kernel command line and event
> filter on a give function
>
> ftrace=function_graph function_graph_filter=probe_func

Ye, that's what I am doing, but I was stuck on actually outputting the log
at boot time.

> You can add your own ftrace_dump() on some kind of detected error and put
> that in the kernel command line. For example RCU has:
>
> rcupdate.rcu_cpu_stall_ftrace_dump=
>
> Which will do a ftrace dump when a RCU stall is triggered.

Ah, thanks for the pointer. I take this as meaning that there is no builtin
way to dump on arbitrary function return. I will see if it's possible with
bootconfig before looking into adding a probe deferral specific kernel
command-line parameter.

Thanks,
Ahmad

>
> -- Steve
>
>

--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |