Dear Linux Kernel Community,
I hope this message finds you well.
I'd like to use the crash utility for postmortem analysis of my kernel
coredump.
I was able to collect a coredump and to use various operations from
within the crash utility, such as irq -s, log, files and others.
I am using: crash-arm64 version: 7.3.0, gdb version: 7.6, kernel version
4.19.
My specific interest lies in debugging the internal state of drivers,
e.g. platform drivers.
For some hands-on experience with the crash utility, I'd like to start
by iterating over all the platform drivers and printing their names.
However, I am finding it challenging to get started with this process
and I am uncertain of the best approach to achieve this. I have scoured
various resources for insights, but the information related to this
specific usage seems to be scattered and not exhaustive.
Given the collective expertise on this mailing list, I thought it would
be the best place to seek guidance. Specifically, I would appreciate it
if you could provide:
Any relevant documentation, guides, or tutorials to debug platform
drivers using the crash utility for kernel coredump analysis.
Some simple examples of using the crash utility to debug platform
drivers, if possible.
Any important points or common pitfalls to keep in mind while performing
this kind of analysis.
Any other tips, best practices, or recommendations to effectively debug
platform drivers using the crash utility would also be greatly appreciated.
Thank you for your time and assistance. I look forward to hearing from you.
Best regards,
Talel, Shenhar.
Hi Talel,
Thanks for the message, this is definitely the right place to discuss
these sorts of questions.
"Shenhar, Talel" writes:
> Dear Linux Kernel Community,
>
> I hope this message finds you well.
>
> I'd like to use the crash utility for postmortem analysis of my kernel
> coredump.
>
> I was able to collect a coredump and to use various operations from
> within the crash utility, such as irq -s, log, files and others.
>
> I am using: crash-arm64 version: 7.3.0, gdb version: 7.6, kernel version
> 4.19.
You've definitely got the hard part done if you've got the core dump and
crash all working.
> My specific interest lies in debugging the internal state of drivers,
> e.g. platform drivers.
Please excuse my ignorance on your particular use case, I haven't done a
ton of work with device drivers or ARM-specific ones either!
> For some hands-on experience with the crash utility, I'd like to start
> by iterating over all the platform drivers and printing their names.
>
> However, I am finding it challenging to get started with this process
> and I am uncertain of the best approach to achieve this. I have scoured
> various resources for insights, but the information related to this
> specific usage seems to be scattered and not exhaustive.
Crash has some excellent helpers, as you've seen (irq, log, files, kmem,
etc.). If you're lucky enough to have a crash command that deals with
the particular area you're debugging, then that can go a long way.
Unfortunately not every subsystem has such a helper command, and this is
especially true for device drivers.
So no matter what tool you use for this -- crash, drgn, or others -- you
will not be relying on a nice "list-all-platform-devices --name"
command. Instead, you'll simply need to use your knowledge of the code
for the subsystem to help you navigate it.
As I said, I don't know much about device drivers, but a pattern I've
frequently seen in subsystems is a struct with function pointers and
maybe a name, plus a "register_xxx()" function that registers a driver
or backend by placing its struct on a linked list of all the drivers or
backends.
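As a rough illustration of that pattern, here is a toy Python model.
Everything in it (Driver, register_driver(), all_drivers and the driver
names) is hypothetical, and the "owner" back-reference stands in for the
address arithmetic that container_of() does in real kernel code. Like
the kernel's struct list_head, the list is circular: an empty head
points at itself, and a walk stops when it comes back around to the
head.

```python
class ListHead:
    """Toy stand-in for the kernel's struct list_head (circular, doubly linked)."""

    def __init__(self, owner=None):
        # An empty list head points at itself, like INIT_LIST_HEAD().
        self.next = self
        self.prev = self
        # Hypothetical back-reference standing in for container_of();
        # real kernel structs have no such field.
        self.owner = owner


def list_add_tail(new, head):
    """Insert `new` just before `head`, i.e. at the tail of the list."""
    new.prev = head.prev
    new.next = head
    head.prev.next = new
    head.prev = new


class Driver:
    """Hypothetical driver struct: a name, an ops callback and a list link."""

    def __init__(self, name, probe):
        self.name = name
        self.probe = probe
        self.list = ListHead(owner=self)


# The global list head a debugger would look up by symbol name.
all_drivers = ListHead()


def register_driver(drv):
    """Hypothetical register_xxx(): link the driver onto the global list."""
    list_add_tail(drv.list, all_drivers)


def driver_names(head):
    """Walk the list from its head until we come back around to it."""
    names = []
    pos = head.next
    while pos is not head:
        names.append(pos.owner.name)
        pos = pos.next
    return names


register_driver(Driver("foo-uart", probe=lambda: 0))
register_driver(Driver("bar-clk", probe=lambda: 0))
print(driver_names(all_drivers))  # ['foo-uart', 'bar-clk']
```

In a vmcore you only have the global head's symbol and raw memory, which
is exactly what the crash "list" command walks for you.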
So a good place to start for this particular question would be to find
the global variable holding the list head for your drivers. Use the
crash "list" command (it takes a good few minutes to get your head
around all the options, but it's powerful) to enumerate them and print
relevant fields, such as the name.
As an example which isn't driver-specific, you might want to look at all
of the slab caches (struct kmem_cache) and print their names. They have
a field "name", and a field "list" which is a list_head. There is an
external global variable named "slab_caches" which is the list head of
the list of all caches. You could iterate over all of them with:
list -s kmem_cache.name -o kmem_cache.list -H slab_caches
The "-s kmem_cache.name" tells it what to print, the "-o
kmem_cache.list" tells it to use that as the struct list_head linking
the list, and "-H slab_caches" tells it that this is an external head of
the list.
I assume a similar method could be used for your particular situation.
Then the "struct" and "p" commands can be used to interpret data
structures you find.
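To see what the "-o" and "-H" options in that list command do, here is
a small Python model using ctypes. KmemCache is a hypothetical, heavily
simplified stand-in for struct kmem_cache, crash_style_list() is a
hypothetical helper (not a crash or drgn API), and the cache names are
made up; crash does the same walk against the vmcore rather than live
memory. The walk starts from the external head, follows each next
pointer, subtracts the offset of the embedded list member to get back
to the start of the containing struct (what container_of() does in the
kernel), and stops when it arrives back at the head, which is itself
not an entry.

```python
import ctypes


class ListHead(ctypes.Structure):
    """Stand-in for struct list_head { struct list_head *next, *prev; }."""


ListHead._fields_ = [
    ("next", ctypes.POINTER(ListHead)),
    ("prev", ctypes.POINTER(ListHead)),
]


class KmemCache(ctypes.Structure):
    """Hypothetical, heavily simplified struct kmem_cache.

    The real structure has many more fields; all that matters here is
    that the embedded "list" member is not at offset 0.
    """

    _fields_ = [
        ("size", ctypes.c_uint),
        ("name", ctypes.c_char_p),
        ("list", ListHead),
    ]


def link_circular(head, nodes):
    """Link head -> nodes[0] -> ... -> nodes[-1] -> head."""
    chain = [head] + nodes
    for i, node in enumerate(chain):
        node.next = ctypes.pointer(chain[(i + 1) % len(chain)])
        node.prev = ctypes.pointer(chain[i - 1])


def crash_style_list(head, struct_type, member, field):
    """Mimic crash's "list -s TYPE.FIELD -o TYPE.MEMBER -H HEAD"."""
    offset = getattr(struct_type, member).offset  # what "-o" resolves to
    head_addr = ctypes.addressof(head)
    values = []
    node_addr = ctypes.cast(head.next, ctypes.c_void_p).value
    # "-H": the head is external, so it is skipped and ends the walk.
    while node_addr != head_addr:
        # container_of(): step back from the list_head to the struct start.
        entry = struct_type.from_address(node_addr - offset)
        values.append(getattr(entry, field))  # what "-s" prints
        next_ptr = ListHead.from_address(node_addr).next
        node_addr = ctypes.cast(next_ptr, ctypes.c_void_p).value
    return values


# Hypothetical caches; keep references so their memory stays alive.
caches = [
    KmemCache(size=64, name=b"kmalloc-64"),
    KmemCache(size=128, name=b"kmalloc-128"),
    KmemCache(size=256, name=b"kmalloc-256"),
]
slab_caches = ListHead()  # the external list head ("-H slab_caches")
link_circular(slab_caches, [cache.list for cache in caches])

names = crash_style_list(slab_caches, KmemCache, "list", "name")
print([name.decode() for name in names])
# ['kmalloc-64', 'kmalloc-128', 'kmalloc-256']
```

In this model, getting the offset wrong would make every entry address
off by the same number of bytes, so the fields read back would be
garbage; a wrong "-o" in crash presumably fails the same way.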
> Given the collective expertise on this mailing list, I thought it would
> be the best place to seek guidance. Specifically, I would appreciate it
> if you could provide:
>
> Any relevant documentation, guides, or tutorials to debug platform
> drivers using the crash utility for kernel coredump analysis.
> Some simple examples of using the crash utility to debug platform
> drivers, if possible.
Unfortunately, debugging resources and guides are rather thin on the
ground, and usually there isn't one tailored to your particular
subsystem. I don't know of one specifically for platform drivers
either, so you'll need to combine guides from other areas with your
knowledge of the subsystem. Also, rely heavily on the built-in crash
"help" command.
> Any important points or common pitfalls to keep in mind while performing
> this kind of analysis.
> Any other tips, best practices, or recommendations to effectively debug
> platform drivers using the crash utility would also be greatly appreciated.
One thing I'd mention: when crash has helpers tailored to your use
case, it's definitely a superpower that makes debugging tasks a breeze.
But when there's no helper for your particular subsystem, it's a lot
more frustrating, as you're generally poring over struct listings.
Unfortunately, writing new crash helpers is a bit difficult (they are
C extension modules loaded with the "extend" command).
If you're familiar with Python, then I might recommend drgn [1] to
you. It's a Python library which allows very natural access to the
vmcore's variables and data structures. So you can write your own
helpers in Python to explore the subsystem you care about. You'll find
that many of the people on this mailing list are quite familiar with
drgn as well :)
Good luck in debugging!
Stephen
[1]: https://github.com/osandov/drgn
> Thank you for your time and assistance. I look forward to hearing from you.
>
> Best regards,
> Talel, Shenhar.
On Tue, Jun 20, 2023 at 01:47:10PM +0300, Shenhar, Talel wrote:
> Dear Linux Kernel Community,
>
> I hope this message finds you well.
>
> I'd like to use the crash utility for postmortem analysis of my kernel coredump.
>
> I was able to collect a coredump and to use various operations from within
> the crash utility, such as irq -s, log, files and others.
>
> I am using: crash-arm64 version: 7.3.0, gdb version: 7.6, kernel version
> 4.19.
>
> My specific interest lies in debugging the internal state of drivers, e.g.
> platform drivers.
>
> For some hands-on experience with the crash utility, I'd like to start by
> iterating over all the platform drivers and printing their names.
>
> However, I am finding it challenging to get started with this process and I
> am uncertain of the best approach to achieve this. I have scoured various
> resources for insights, but the information related to this specific usage
> seems to be scattered and not exhaustive.
>
> Given the collective expertise on this mailing list, I thought it would be
> the best place to seek guidance. Specifically, I would appreciate it if you
> could provide:
>
> Any relevant documentation, guides, or tutorials to debug platform drivers
> using the crash utility for kernel coredump analysis.
> Some simple examples of using the crash utility to debug platform drivers,
> if possible.
> Any important points or common pitfalls to keep in mind while performing
> this kind of analysis.
> Any other tips, best practices, or recommendations to effectively debug
> platform drivers using the crash utility would also be greatly appreciated.
>
> Thank you for your time and assistance. I look forward to hearing from you.
Hi, Talel,
The only thing I have to add to Stephen's excellent answer is my attempt
at getting the information you requested with drgn. I'm not very
familiar with platform drivers, so I basically read the code for
platform_driver_register() and translated the relevant parts to drgn.
Something like this should get you started:
------------------------------------------------------------------------
from drgn import NULL, container_of
from drgn.helpers.linux.list import list_for_each_entry

# Run this with the drgn CLI (e.g. "drgn -c <vmcore> script.py"), which
# provides the global "prog" used below.


# This was directly translated from the bus_to_subsys() function in
# drivers/base/bus.c of the Linux kernel. We should probably add it as a
# drgn helper.
def bus_to_subsys(bus):
    for sp in list_for_each_entry(
        "struct subsys_private",
        prog["bus_kset"].list.address_of_(),
        "subsys.kobj.entry",
    ):
        if sp.bus == bus:
            return sp
    return NULL(bus.prog_, "struct subsys_private *")


# Platform drivers are registered to the struct bus_type
# platform_bus_type in drivers/base/platform.c. The struct
# subsys_private has a kset containing a list of drivers.
sp = bus_to_subsys(prog["platform_bus_type"].address_of_())
for priv in list_for_each_entry(
    "struct driver_private", sp.drivers_kset.list.address_of_(), "kobj.entry"
):
    # This is a struct device_driver *.
    driver = priv.driver
    # To get the struct platform_driver *, do:
    # platform_driver = container_of(driver, "struct platform_driver", "driver")
    print(driver.name.string_().decode())
------------------------------------------------------------------------
(I also pushed this script to the contrib directory of the drgn
repository:
https://github.com/osandov/drgn/blob/main/contrib/platform_drivers.py)
On my ARM64 QEMU VM, this prints:
------------------------------------------------------------------------
sbsa-uart
alarmtimer
simple-pm-bus
pci-host-generic
of_fixed_factor_clk
of_fixed_clk
gpio-clk
------------------------------------------------------------------------
Hopefully this helps!