2022-09-16 21:32:41

by Steve French

[permalink] [raw]
Subject: debugging rmmod failures

Any suggestions on how to debug why rmmod fails with EBUSY?

I was trying to debug a problem where rmmod fails during an xfstest
run but couldn't find useful information on how to get ideas 'why'
rmmod is failing (in this case with EBUSY) and I could only find two
places where EBUSY would plausibly be returned but don't see the
pr_debug message in the log (e.g. "already dying") that I would expect
to get in those cases.

It also fails in this test scenario (which takes a while to reproduce
so isn't trivial to repro) with EBUSY when doing "rmmod --verbose
--force" and the --verbose didn't display any additional info.

I also tried "trace-cmd record -e *module*" which showed it (one call)
returning 0xFFFFFFF0 but nothing useful that I could see.

Any ideas on how to debug *why* an rmmod fails?
--
Thanks,

Steve


2022-09-19 16:52:39

by Stephen Brennan

[permalink] [raw]
Subject: Re: debugging rmmod failures

On 9/16/22 13:13, Steve French wrote:
> Any suggestions on how to debug why rmmod fails with EBUSY?
>
> I was trying to debug a problem where rmmod fails during an xfstest
> run but couldn't find useful information on how to get ideas 'why'
> rmmod is failing (in this case with EBUSY) and I could only find two
> places where EBUSY would plausibly be returned but don't see the
> pr_debug message in the log (e.g. "already dying") that I would expect
> to get in those cases.
>
> It also fails in this test scenario (which takes a while to reproduce
> so isn't trivial to repro) with EBUSY when doing "rmmod --verbose
> --force" and the --verbose didn't display any additional info.
>
> I also tried "trace-cmd record -e *module*" which showed it (one call)
> returning 0xFFFFFFF0 but nothing useful that I could see.

In situations like this I sometimes find it useful to use the ftrace
function_graph tracer. It can show you every single function call,
starting from some root function. Of course it's a bit of a footgun, so
be careful with it :)

I used it to trace the delete_module system call like so:

# trace-cmd record -p function_graph -g __x64_sys_delete_module \
-F rmmod $MODULE_TO_REMOVE
...
# trace-cmd report
rmmod-41656 [007] 5506.963846: funcgraph_entry: | __x64_sys_delete_module() {
rmmod-41656 [007] 5506.963846: funcgraph_entry: | capable() {
rmmod-41656 [007] 5506.963846: funcgraph_entry: | security_capable() {
... ...
rmmod-41656 [007] 5506.965521: funcgraph_entry: | free_module() {
rmmod-41656 [007] 5506.965521: funcgraph_entry: | mod_sysfs_teardown() {
rmmod-41656 [007] 5506.965522: funcgraph_entry: | mutex_lock() {


Of course, my module teardown succeeded, but you should be able to
cross reference your function calls with the source code to see where
an error may have occurred.

Regards,
Stephen

>
> Any ideas on how to debug *why* an rmmod fails?