LinuxLists.cc - Re: [PATCH v2] block: genhd: don't call probe function with major_names

2021-06-19 10:00:30

Subject: Re: [PATCH v2] block: genhd: don't call probe function with major_names_lock held

On Sat, Jun 19, 2021 at 10:05:44AM +0900, Tetsuo Handa wrote:
> syzbot is reporting circular locking problem at blk_request_module() [1],
> for blk_request_module() is calling probe function with major_names_lock
> held while major_names_lock is held during module's __init and __exit
> functions.
>
> loop_exit() {
> mutex_lock(&loop_ctl_mutex);
> blk_request_module() {
> mutex_lock(&major_names_lock);
> loop_probe() {
> mutex_lock(&loop_ctl_mutex); // Blocked by loop_exit()
> mutex_unlock(&loop_ctl_mutex);
> }
> mutex_unlock(&major_names_lock);
> unregister_blkdev() {
> mutex_lock(&major_names_lock); // Blocked by blk_request_module()
> mutex_unlock(&major_names_lock);
> }
> mutex_unlock(&loop_ctl_mutex);
> }
> }
>
> Based on an assumption that a probe callback passed to __register_blkdev()
> belongs to a module which calls __register_blkdev(), drop major_names_lock
> before calling probe function by holding a reference to that module which
> contains that probe function. If there is a module where this assumption
> does not hold, such module can call ____register_blkdev() directly.
>
> blk_request_module() {
> mutex_lock(&major_names_lock);
> // Block loop_exit()
> mutex_unlock(&major_names_lock);
> loop_probe() {
> mutex_lock(&loop_ctl_mutex);
> mutex_unlock(&loop_ctl_mutex);
> }
> // Unblock loop_exit()
> }
> loop_exit() {
> mutex_lock(&loop_ctl_mutex);
> unregister_blkdev() {
> mutex_lock(&major_names_lock);
> mutex_unlock(&major_names_lock);
> }
> mutex_unlock(&loop_ctl_mutex);
> }
>
> Note that regardless of this patch, it is up to probe function to
> serialize module's __init function and probe function in that module
> by using e.g. a mutex. This patch simply makes sure that module's __exit
> function won't be called when probe function is about to be called.

module init functions ARE serialized.

>
> While Desmond Cheong Zhi Xi already proposed a patch for breaking ABCD
> circular dependency [2], I consider that this patch is still needed for
> safely breaking AB-BA dependency upon module unloading. (Note that syzbot
> does not test module unloading code because the syzbot kernels are built
> with almost all modules built-in. We need manual inspection.)
>
> By doing kmalloc(GFP_KERNEL) in ____register_blkdev() before holding
> major_names_lock, we could convert major_names_lock from a mutex to
> a spinlock. But that is beyond the scope of this patch.
>
> Link: https://syzkaller.appspot.com/bug?id=7bd106c28e846d1023d4ca915718b1a0905444cb [1]
> Link: https://lkml.kernel.org/r/[email protected] [2]
> Reported-by: syzbot <[email protected]>
> Signed-off-by: Tetsuo Handa <[email protected]>
> Tested-by: syzbot <[email protected]>
> ---
> block/genhd.c | 36 +++++++++++++++++++++++++++---------
> include/linux/genhd.h | 8 +++++---
> 2 files changed, 32 insertions(+), 12 deletions(-)
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 9f8cb7beaad1..9577c70a6bd3 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -169,6 +169,7 @@ static struct blk_major_name {
> int major;
> char name[16];
> void (*probe)(dev_t devt);
> + struct module *owner;

The module owner should not matter here.

> } *major_names[BLKDEV_MAJOR_HASH_SIZE];
> static DEFINE_MUTEX(major_names_lock);
>
> @@ -197,7 +198,8 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
> * @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
> * @major = 0, try to allocate any unused major number.
> * @name: the name of the new block device as a zero terminated string
> - * @probe: allback that is called on access to any minor number of @major
> + * @probe: callback that is called on access to any minor number of @major
> + * @owner: the owner of @probe function (i.e. THIS_MODULE or NULL).

This feels wrong.

> *
> * The @name must be unique within the system.
> *
> @@ -214,8 +216,8 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
> *
> * Use register_blkdev instead for any new code.
> */
> -int __register_blkdev(unsigned int major, const char *name,
> - void (*probe)(dev_t devt))
> +int ____register_blkdev(unsigned int major, const char *name,
> + void (*probe)(dev_t devt), struct module *owner)

That's just a crazy naming scheme, please no.

> {
> struct blk_major_name **n, *p;
> int index, ret = 0;
> @@ -255,6 +257,7 @@ int __register_blkdev(unsigned int major, const char *name,
>
> p->major = major;
> p->probe = probe;
> + p->owner = owner;
> strlcpy(p->name, name, sizeof(p->name));
> p->next = NULL;
> index = major_to_index(major);
> @@ -277,7 +280,7 @@ int __register_blkdev(unsigned int major, const char *name,
> mutex_unlock(&major_names_lock);
> return ret;
> }
> -EXPORT_SYMBOL(__register_blkdev);
> +EXPORT_SYMBOL(____register_blkdev);

That's crazy.

>
> void unregister_blkdev(unsigned int major, const char *name)
> {
> @@ -676,14 +679,29 @@ void blk_request_module(dev_t devt)
> {
> unsigned int major = MAJOR(devt);
> struct blk_major_name **n;
> + void (*probe_fn)(dev_t devt);
>
> mutex_lock(&major_names_lock);
> for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
> - if ((*n)->major == major && (*n)->probe) {
> - (*n)->probe(devt);
> - mutex_unlock(&major_names_lock);
> - return;
> - }
> + if ((*n)->major != major || !(*n)->probe)
> + continue;
> + if (!try_module_get((*n)->owner))
> + break;

So you just fail? Are you sure that is ok?

> + /*
> + * Calling probe function with major_names_lock held causes
> + * circular locking dependency problem. Thus, call it after
> + * releasing major_names_lock.
> + */
> + probe_fn = (*n)->probe;
> + mutex_unlock(&major_names_lock);

So you save a pointer off and then unlock? Why does that lock matter?

> + /*
> + * Assuming that unregister_blkdev() is called from module's
> + * __exit function, a module refcount taken above allows us
> + * to safely call probe function without major_names_lock held.
> + */
> + probe_fn(devt);
> + module_put((*n)->owner);

So you are trying to keep the module in memory while the probe is call,
ok. But module removal is not an issue, you remove modules at your own
risk. As syzbot isn't even testing this, why is this an issue?

> + return;
> }
> mutex_unlock(&major_names_lock);
>
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 6fc26f7bdf71..070b73c043e6 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -277,10 +277,12 @@ extern void put_disk(struct gendisk *disk);
>
> #define alloc_disk(minors) alloc_disk_node(minors, NUMA_NO_NODE)
>
> -int __register_blkdev(unsigned int major, const char *name,
> - void (*probe)(dev_t devt));
> +int ____register_blkdev(unsigned int major, const char *name,
> + void (*probe)(dev_t devt), struct module *owner);
> +#define __register_blkdev(major, name, probe) \
> + ____register_blkdev(major, name, probe, THIS_MODULE)

This just feels wrong, you should only need 1 level deep of __ at most.

thanks,

greg k-h

2021-06-19 10:07:04

by Tetsuo Handa

[permalink] [raw]

Subject: Re: [PATCH v2] block: genhd: don't call probe function with major_names_lock held

On 2021/06/19 15:44, Greg KH wrote:
>> Note that regardless of this patch, it is up to probe function to
>> serialize module's __init function and probe function in that module
>> by using e.g. a mutex. This patch simply makes sure that module's __exit
>> function won't be called when probe function is about to be called.
>
> module init functions ARE serialized.

The __init functions between module foo and module bar are serialized.
But the __init function for module foo and the probe function for module
foo are not serialized.

> The module owner should not matter here.

The __exit functions between module foo and module bar are serialized.
But the __exit function for module bar and the probe function for module
bar are not serialized.

You can observe a deadlock via the following steps.

(1) Build loop.ko with CONFIG_BLK_DEV_LOOP_MIN_COUNT=0 and
a delay injection patch shown below.

----------
diff --git a/block/genhd.c b/block/genhd.c
index 9f8cb7beaad1..fe0360dc8c5d 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -680,6 +680,11 @@ void blk_request_module(dev_t devt)
mutex_lock(&major_names_lock);
for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
if ((*n)->major == major && (*n)->probe) {
+ if (!strcmp(current->comm, "bash")) {
+ pr_info("sleep injection start\n");
+ schedule_timeout_killable(10 * HZ); // Call "rmmod" here.
+ pr_info("sleep injection end\n");
+ }
(*n)->probe(devt);
mutex_unlock(&major_names_lock);
return;
----------

(2) Run the following commands from bash shell.

# modprobe loop
# mknod /dev/loop0 b 7 0
# exec 100<>/dev/loop0 & sleep 1; rmmod loop

(3) See dmesg output.

----------
[ 32.260467][ T2873] loop: module loaded
[ 32.289961][ T2880] sleep injection start
[ 42.484039][ T2880] sleep injection end
[ 42.484218][ T2880]
[ 42.484248][ T2880] ======================================================
[ 42.484381][ T2880] WARNING: possible circular locking dependency detected
[ 42.484455][ T2880] 5.13.0-rc6+ #867 Not tainted
[ 42.484508][ T2880] ------------------------------------------------------
[ 42.484579][ T2880] bash/2880 is trying to acquire lock:
[ 42.484638][ T2880] ffffffffc075b468 (loop_ctl_mutex){+.+.}-{3:3}, at: loop_probe+0x44/0x90 [loop]
[ 42.484760][ T2880]
[ 42.484760][ T2880] but task is already holding lock:
[ 42.484836][ T2880] ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[ 42.484950][ T2880]
[ 42.484950][ T2880] which lock already depends on the new lock.
[ 42.484950][ T2880]
[ 42.485053][ T2880]
[ 42.485053][ T2880] the existing dependency chain (in reverse order) is:
[ 42.485144][ T2880]
[ 42.485144][ T2880] -> #1 (major_names_lock){+.+.}-{3:3}:
[ 42.485230][ T2880] lock_acquire+0xb3/0x380
[ 42.485292][ T2880] __mutex_lock+0x89/0x8f0
[ 42.485350][ T2880] mutex_lock_nested+0x16/0x20
[ 42.485410][ T2880] unregister_blkdev+0x38/0xb0
[ 42.485469][ T2880] loop_exit+0x44/0xd84 [loop]
[ 42.485534][ T2880] __x64_sys_delete_module+0x135/0x210
[ 42.485602][ T2880] do_syscall_64+0x36/0x70
[ 42.485660][ T2880] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 42.485733][ T2880]
[ 42.485733][ T2880] -> #0 (loop_ctl_mutex){+.+.}-{3:3}:
[ 42.485817][ T2880] check_prevs_add+0x16a/0x1040
[ 42.487091][ T2880] __lock_acquire+0x118b/0x1580
[ 42.488427][ T2880] lock_acquire+0xb3/0x380
[ 42.489701][ T2880] __mutex_lock+0x89/0x8f0
[ 42.490960][ T2880] mutex_lock_nested+0x16/0x20
[ 42.492374][ T2880] loop_probe+0x44/0x90 [loop]
[ 42.493756][ T2880] blk_request_module+0xa3/0x100
[ 42.495207][ T2880] blkdev_get_no_open+0x93/0xc0
[ 42.496516][ T2880] blkdev_get_by_dev+0x56/0x200
[ 42.497735][ T2880] blkdev_open+0x5d/0xa0
[ 42.498919][ T2880] do_dentry_open+0x163/0x3b0
[ 42.500157][ T2880] vfs_open+0x28/0x30
[ 42.501312][ T2880] path_openat+0x7e6/0xad0
[ 42.502443][ T2880] do_filp_open+0x9f/0x110
[ 42.503592][ T2880] do_sys_openat2+0x245/0x310
[ 42.504647][ T2880] do_sys_open+0x48/0x80
[ 42.505689][ T2880] __x64_sys_open+0x1c/0x20
[ 42.506730][ T2880] do_syscall_64+0x36/0x70
[ 42.507795][ T2880] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 42.508890][ T2880]
[ 42.508890][ T2880] other info that might help us debug this:
[ 42.508890][ T2880]
[ 42.511436][ T2880] Possible unsafe locking scenario:
[ 42.511436][ T2880]
[ 42.512303][ T2880] CPU0 CPU1
[ 42.512727][ T2880] ---- ----
[ 42.513143][ T2880] lock(major_names_lock);
[ 42.513557][ T2880] lock(loop_ctl_mutex);
[ 42.513987][ T2880] lock(major_names_lock);
[ 42.514417][ T2880] lock(loop_ctl_mutex);
[ 42.514841][ T2880]
[ 42.514841][ T2880] *** DEADLOCK ***
[ 42.514841][ T2880]
[ 42.516112][ T2880] 1 lock held by bash/2880:
[ 42.516518][ T2880] #0: ffffffff873ffe28 (major_names_lock){+.+.}-{3:3}, at: blk_request_module+0x1f/0x100
[ 42.517383][ T2880]
[ 42.517383][ T2880] stack backtrace:
[ 42.518228][ T2880] CPU: 3 PID: 2880 Comm: bash Not tainted 5.13.0-rc6+ #867
[ 42.518679][ T2880] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[ 42.519650][ T2880] Call Trace:
[ 42.520128][ T2880] dump_stack+0x76/0x95
[ 42.520608][ T2880] print_circular_bug.isra.50.cold.71+0x13c/0x141
[ 42.521105][ T2880] check_noncircular+0xfe/0x110
[ 42.521607][ T2880] check_prevs_add+0x16a/0x1040
[ 42.522106][ T2880] __lock_acquire+0x118b/0x1580
[ 42.522606][ T2880] lock_acquire+0xb3/0x380
[ 42.523244][ T2880] ? loop_probe+0x44/0x90 [loop]
[ 42.523753][ T2880] __mutex_lock+0x89/0x8f0
[ 42.524250][ T2880] ? loop_probe+0x44/0x90 [loop]
[ 42.524749][ T2880] ? loop_probe+0x44/0x90 [loop]
[ 42.525290][ T2880] ? blkdev_get_by_dev+0x200/0x200
[ 42.525790][ T2880] ? vprintk_default+0x18/0x20
[ 42.526286][ T2880] ? vprintk+0x56/0x130
[ 42.526797][ T2880] ? blkdev_get_by_dev+0x200/0x200
[ 42.527286][ T2880] mutex_lock_nested+0x16/0x20
[ 42.527768][ T2880] ? mutex_lock_nested+0x16/0x20
[ 42.528250][ T2880] loop_probe+0x44/0x90 [loop]
[ 42.528730][ T2880] blk_request_module+0xa3/0x100
[ 42.529209][ T2880] blkdev_get_no_open+0x93/0xc0
[ 42.529691][ T2880] blkdev_get_by_dev+0x56/0x200
[ 42.530176][ T2880] ? blkdev_get_by_dev+0x200/0x200
[ 42.530688][ T2880] blkdev_open+0x5d/0xa0
[ 42.531170][ T2880] do_dentry_open+0x163/0x3b0
[ 42.531656][ T2880] vfs_open+0x28/0x30
[ 42.532132][ T2880] path_openat+0x7e6/0xad0
[ 42.532612][ T2880] do_filp_open+0x9f/0x110
[ 42.533094][ T2880] ? _raw_spin_unlock+0x1d/0x30
[ 42.533582][ T2880] ? alloc_fd+0x116/0x200
[ 42.534069][ T2880] do_sys_openat2+0x245/0x310
[ 42.534566][ T2880] do_sys_open+0x48/0x80
[ 42.535015][ T2880] __x64_sys_open+0x1c/0x20
[ 42.535462][ T2880] do_syscall_64+0x36/0x70
[ 42.535913][ T2880] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 42.536364][ T2880] RIP: 0033:0x7f16f15737f0
[ 42.536797][ T2880] Code: 48 8b 15 83 76 2d 00 f7 d8 64 89 02 48 83 c8 ff c3 66 0f 1f 84 00 00 00 00 00 83 3d cd d7 2d 00 00 75 10 b8 02 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e cf 01 00 48 89 04 24
[ 42.538148][ T2880] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[ 42.538671][ T2880] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[ 42.539173][ T2880] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[ 42.539681][ T2880] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[ 42.540187][ T2880] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[ 42.540696][ T2880] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[ 246.772906][ T35] INFO: task bash:2880 blocked for more than 122 seconds.
[ 246.774289][ T35] Not tainted 5.13.0-rc6+ #867
[ 246.775478][ T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.776734][ T35] task:bash state:D stack: 0 pid: 2880 ppid: 2856 flags:0x00004000
[ 246.779550][ T35] Call Trace:
[ 246.781054][ T35] __schedule+0x271/0x9e0
[ 246.782381][ T35] schedule+0x50/0xc0
[ 246.783671][ T35] schedule_preempt_disabled+0x10/0x20
[ 246.784978][ T35] __mutex_lock+0x467/0x8f0
[ 246.786277][ T35] ? loop_probe+0x44/0x90 [loop]
[ 246.787582][ T35] ? blkdev_get_by_dev+0x200/0x200
[ 246.789093][ T35] ? blkdev_get_by_dev+0x200/0x200
[ 246.790409][ T35] mutex_lock_nested+0x16/0x20
[ 246.791700][ T35] ? mutex_lock_nested+0x16/0x20
[ 246.792992][ T35] loop_probe+0x44/0x90 [loop]
[ 246.794286][ T35] blk_request_module+0xa3/0x100
[ 246.795757][ T35] blkdev_get_no_open+0x93/0xc0
[ 246.797066][ T35] blkdev_get_by_dev+0x56/0x200
[ 246.798360][ T35] ? blkdev_get_by_dev+0x200/0x200
[ 246.799661][ T35] blkdev_open+0x5d/0xa0
[ 246.800953][ T35] do_dentry_open+0x163/0x3b0
[ 246.802258][ T35] vfs_open+0x28/0x30
[ 246.803559][ T35] path_openat+0x7e6/0xad0
[ 246.805201][ T35] do_filp_open+0x9f/0x110
[ 246.807128][ T35] ? _raw_spin_unlock+0x1d/0x30
[ 246.808461][ T35] ? alloc_fd+0x116/0x200
[ 246.809763][ T35] do_sys_openat2+0x245/0x310
[ 246.811034][ T35] do_sys_open+0x48/0x80
[ 246.812257][ T35] __x64_sys_open+0x1c/0x20
[ 246.813416][ T35] do_syscall_64+0x36/0x70
[ 246.814522][ T35] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 246.815624][ T35] RIP: 0033:0x7f16f15737f0
[ 246.816707][ T35] RSP: 002b:00007ffde9e2df58 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[ 246.817863][ T35] RAX: ffffffffffffffda RBX: 00000000021b5020 RCX: 00007f16f15737f0
[ 246.819033][ T35] RDX: 00000000000001b6 RSI: 0000000000000042 RDI: 00000000021bd310
[ 246.820059][ T35] RBP: 00007ffde9e2dfe0 R08: 0000000000000020 R09: 00000000021bd310
[ 246.820687][ T35] R10: 00000000fffffff0 R11: 0000000000000246 R12: 000000000000000b
[ 246.821292][ T35] R13: 0000000000000001 R14: 0000000000000064 R15: 0000000000000000
[ 246.821882][ T35] INFO: task rmmod:2882 blocked for more than 122 seconds.
[ 246.822456][ T35] Not tainted 5.13.0-rc6+ #867
[ 246.823011][ T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.823599][ T35] task:rmmod state:D stack: 0 pid: 2882 ppid: 2856 flags:0x00000000
[ 246.824799][ T35] Call Trace:
[ 246.825405][ T35] __schedule+0x271/0x9e0
[ 246.826018][ T35] schedule+0x50/0xc0
[ 246.826621][ T35] schedule_preempt_disabled+0x10/0x20
[ 246.827232][ T35] __mutex_lock+0x467/0x8f0
[ 246.827854][ T35] ? unregister_blkdev+0x38/0xb0
[ 246.828608][ T35] mutex_lock_nested+0x16/0x20
[ 246.829237][ T35] ? mutex_lock_nested+0x16/0x20
[ 246.829858][ T35] unregister_blkdev+0x38/0xb0
[ 246.830478][ T35] loop_exit+0x44/0xd84 [loop]
[ 246.831112][ T35] __x64_sys_delete_module+0x135/0x210
[ 246.831749][ T35] do_syscall_64+0x36/0x70
[ 246.832379][ T35] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 246.833019][ T35] RIP: 0033:0x7ff4990fcee7
[ 246.833655][ T35] RSP: 002b:00007ffe445ecab8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[ 246.834326][ T35] RAX: ffffffffffffffda RBX: 0000000000a21220 RCX: 00007ff4990fcee7
[ 246.835011][ T35] RDX: 00007ff499171a80 RSI: 0000000000000800 RDI: 0000000000a21288
[ 246.835794][ T35] RBP: 0000000000000000 R08: 00007ff4993c6060 R09: 00007ff499171a80
[ 246.836501][ T35] R10: 00007ffe445ec520 R11: 0000000000000206 R12: 00007ffe445ee840
[ 246.837206][ T35] R13: 0000000000000000 R14: 0000000000a21220 R15: 0000000000a21010
[ 246.837922][ T35] INFO: lockdep is turned off.
----------

Christoph and Desmond commented

> > And now you can all probe after it has been freed and/or the module has
> > been unloaded. The obviously correct fix is to only hold mtd_table_mutex
> > for the actually required critical section:
> >
>
> Thank you for the correction, Christoph. I hadn't thought of the
> scenario where the module is unloaded. I'll be more conscientious in the
> future.

at https://lkml.kernel.org/r/YMs3O/[email protected]
but the fact that AB-BA deadlock is possible was forgotten. Therefore, I propose
this patch for fixing commit a160c6159d4a0cf8 ("block: add an optional probe callback
to major_names") which started calling the probe function without making sure that
the module will not go away.

>> @@ -676,14 +679,29 @@ void blk_request_module(dev_t devt)
>> {
>> unsigned int major = MAJOR(devt);
>> struct blk_major_name **n;
>> + void (*probe_fn)(dev_t devt);
>>
>> mutex_lock(&major_names_lock);
>> for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
>> - if ((*n)->major == major && (*n)->probe) {
>> - (*n)->probe(devt);
>> - mutex_unlock(&major_names_lock);
>> - return;
>> - }
>> + if ((*n)->major != major || !(*n)->probe)
>> + continue;
>> + if (!try_module_get((*n)->owner))
>> + break;
>
> So you just fail? Are you sure that is ok?

After "break;", the control reaches request_module() (which waits for module
unloading to complete and then tries to load that module again).

I think failing open() for once due to racing with module unloading is acceptable
(as long as there is no leak/deadlock/oops etc.), but do we want to immediately
retry this loop after returning from request_module() ?

Also, to make sure that the __init function for module foo and the probe
function for module foo are serialized, should we also verify that
(*n)->owner && (*n)->owner->state == MODULE_STATE_LIVE (which indicates that
the __init function for that module has completed) because module_is_live()
from try_module_get() is rather weak?

----------
/* FIXME: It'd be nice to isolate modules during init, too, so they
aren't used before they (may) fail. But presently too much code
(IDE & SCSI) require entry into the module during init.*/
static inline bool module_is_live(struct module *mod)
{
return mod->state != MODULE_STATE_GOING;
}
----------

>> + /*
>> + * Assuming that unregister_blkdev() is called from module's
>> + * __exit function, a module refcount taken above allows us
>> + * to safely call probe function without major_names_lock held.
>> + */
>> + probe_fn(devt);
>> + module_put((*n)->owner);
>
> So you are trying to keep the module in memory while the probe is call,
> ok. But module removal is not an issue, you remove modules at your own
> risk. As syzbot isn't even testing this, why is this an issue?

That's a crazy comment. A module removal bug (unless forced unloading is used)
is an issue. Why not fix bugs where syzbot cannot test?