2021-06-19 15:45:12

by Tetsuo Handa

[permalink] [raw]
Subject: [PATCH v2] block: genhd: don't call probe function with major_names_lock held

syzbot is reporting circular locking problem at blk_request_module() [1],
for blk_request_module() is calling probe function with major_names_lock
held while major_names_lock is held during module's __init and __exit
functions.

loop_exit() {
mutex_lock(&loop_ctl_mutex);
blk_request_module() {
mutex_lock(&major_names_lock);
loop_probe() {
mutex_lock(&loop_ctl_mutex); // Blocked by loop_exit()
mutex_unlock(&loop_ctl_mutex);
}
mutex_unlock(&major_names_lock);
unregister_blkdev() {
mutex_lock(&major_names_lock); // Blocked by blk_request_module()
mutex_unlock(&major_names_lock);
}
mutex_unlock(&loop_ctl_mutex);
}
}

Based on an assumption that a probe callback passed to __register_blkdev()
belongs to a module which calls __register_blkdev(), drop major_names_lock
before calling probe function by holding a reference to that module which
contains that probe function. If there is a module where this assumption
does not hold, such module can call ____register_blkdev() directly.

blk_request_module() {
mutex_lock(&major_names_lock);
// Block loop_exit()
mutex_unlock(&major_names_lock);
loop_probe() {
mutex_lock(&loop_ctl_mutex);
mutex_unlock(&loop_ctl_mutex);
}
// Unblock loop_exit()
}
loop_exit() {
mutex_lock(&loop_ctl_mutex);
unregister_blkdev() {
mutex_lock(&major_names_lock);
mutex_unlock(&major_names_lock);
}
mutex_unlock(&loop_ctl_mutex);
}

Note that regardless of this patch, it is up to probe function to
serialize module's __init function and probe function in that module
by using e.g. a mutex. This patch simply makes sure that module's __exit
function won't be called when probe function is about to be called.

While Desmond Cheong Zhi Xi already proposed a patch for breaking ABCD
circular dependency [2], I consider that this patch is still needed for
safely breaking AB-BA dependency upon module unloading. (Note that syzbot
does not test module unloading code because the syzbot kernels are built
with almost all modules built-in. We need manual inspection.)

By doing kmalloc(GFP_KERNEL) in ____register_blkdev() before holding
major_names_lock, we could convert major_names_lock from a mutex to
a spinlock. But that is beyond the scope of this patch.

Link: https://syzkaller.appspot.com/bug?id=7bd106c28e846d1023d4ca915718b1a0905444cb [1]
Link: https://lkml.kernel.org/r/[email protected] [2]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Tested-by: syzbot <[email protected]>
---
block/genhd.c | 36 +++++++++++++++++++++++++++---------
include/linux/genhd.h | 8 +++++---
2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 9f8cb7beaad1..9577c70a6bd3 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -169,6 +169,7 @@ static struct blk_major_name {
int major;
char name[16];
void (*probe)(dev_t devt);
+ struct module *owner;
} *major_names[BLKDEV_MAJOR_HASH_SIZE];
static DEFINE_MUTEX(major_names_lock);

@@ -197,7 +198,8 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
* @major: the requested major device number [1..BLKDEV_MAJOR_MAX-1]. If
* @major = 0, try to allocate any unused major number.
* @name: the name of the new block device as a zero terminated string
- * @probe: allback that is called on access to any minor number of @major
+ * @probe: callback that is called on access to any minor number of @major
+ * @owner: the owner of @probe function (i.e. THIS_MODULE or NULL).
*
* The @name must be unique within the system.
*
@@ -214,8 +216,8 @@ void blkdev_show(struct seq_file *seqf, off_t offset)
*
* Use register_blkdev instead for any new code.
*/
-int __register_blkdev(unsigned int major, const char *name,
- void (*probe)(dev_t devt))
+int ____register_blkdev(unsigned int major, const char *name,
+ void (*probe)(dev_t devt), struct module *owner)
{
struct blk_major_name **n, *p;
int index, ret = 0;
@@ -255,6 +257,7 @@ int __register_blkdev(unsigned int major, const char *name,

p->major = major;
p->probe = probe;
+ p->owner = owner;
strlcpy(p->name, name, sizeof(p->name));
p->next = NULL;
index = major_to_index(major);
@@ -277,7 +280,7 @@ int __register_blkdev(unsigned int major, const char *name,
mutex_unlock(&major_names_lock);
return ret;
}
-EXPORT_SYMBOL(__register_blkdev);
+EXPORT_SYMBOL(____register_blkdev);

void unregister_blkdev(unsigned int major, const char *name)
{
@@ -676,14 +679,29 @@ void blk_request_module(dev_t devt)
{
unsigned int major = MAJOR(devt);
struct blk_major_name **n;
+ void (*probe_fn)(dev_t devt);

mutex_lock(&major_names_lock);
for (n = &major_names[major_to_index(major)]; *n; n = &(*n)->next) {
- if ((*n)->major == major && (*n)->probe) {
- (*n)->probe(devt);
- mutex_unlock(&major_names_lock);
- return;
- }
+ if ((*n)->major != major || !(*n)->probe)
+ continue;
+ if (!try_module_get((*n)->owner))
+ break;
+ /*
+ * Calling probe function with major_names_lock held causes
+ * circular locking dependency problem. Thus, call it after
+ * releasing major_names_lock.
+ */
+ probe_fn = (*n)->probe;
+ mutex_unlock(&major_names_lock);
+ /*
+ * Assuming that unregister_blkdev() is called from module's
+ * __exit function, a module refcount taken above allows us
+ * to safely call probe function without major_names_lock held.
+ */
+ probe_fn(devt);
+ module_put((*n)->owner);
+ return;
}
mutex_unlock(&major_names_lock);

diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 6fc26f7bdf71..070b73c043e6 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -277,10 +277,12 @@ extern void put_disk(struct gendisk *disk);

#define alloc_disk(minors) alloc_disk_node(minors, NUMA_NO_NODE)

-int __register_blkdev(unsigned int major, const char *name,
- void (*probe)(dev_t devt));
+int ____register_blkdev(unsigned int major, const char *name,
+ void (*probe)(dev_t devt), struct module *owner);
+#define __register_blkdev(major, name, probe) \
+ ____register_blkdev(major, name, probe, THIS_MODULE)
#define register_blkdev(major, name) \
- __register_blkdev(major, name, NULL)
+ ____register_blkdev(major, name, NULL, NULL)
void unregister_blkdev(unsigned int major, const char *name);

bool bdev_check_media_change(struct block_device *bdev);
--
2.18.4


2021-06-20 13:55:49

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH v2] block: genhd: don't call probe function with major_names_lock held

On 2021/06/20 11:44, Hillf Danton wrote:
> Good craft in regard to triggering the ABBA deadlock, but curious why not
> move unregister_blkdev out of and before loop_ctl_mutex, given it will also
> serialise with the prober.
>

Well, something like this untested diff?

Call unregister_blkdev() as soon as __exit function starts, for calling
probe function after cleanup started will be unexpected for __exit function.

Keep probe function no-op until __init function ends, for probe function
might be called as soon as __register_blkdev() succeeded but calling probe
function before setup completes will be unexpected for __init function.

drivers/block/ataflop.c | 6 +++++-
drivers/block/brd.c | 8 ++++++--
drivers/block/floppy.c | 4 ++++
drivers/block/loop.c | 4 ++--
drivers/ide/ide-probe.c | 8 +++++++-
drivers/md/md.c | 5 +++++
drivers/scsi/sd.c | 10 +---------
7 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
index d601e49f80e0..3681e8c493b1 100644
--- a/drivers/block/ataflop.c
+++ b/drivers/block/ataflop.c
@@ -1995,6 +1995,7 @@ static int ataflop_alloc_disk(unsigned int drive, unsigned int type)
}

static DEFINE_MUTEX(ataflop_probe_lock);
+static bool module_initialize_completed;

static void ataflop_probe(dev_t dev)
{
@@ -2006,6 +2007,8 @@ static void ataflop_probe(dev_t dev)

if (drive >= FD_MAX_UNITS || type >= NUM_DISK_MINORS)
return;
+ if (!module_initialize_completed)
+ return;
mutex_lock(&ataflop_probe_lock);
if (!unit[drive].disk[type]) {
if (ataflop_alloc_disk(drive, type) == 0)
@@ -2080,6 +2083,7 @@ static int __init atari_floppy_init (void)
UseTrackbuffer ? "" : "no ");
config_types();

+ module_initialize_completed = true;
return 0;

err:
@@ -2138,6 +2142,7 @@ static void __exit atari_floppy_exit(void)
{
int i, type;

+ unregister_blkdev(FLOPPY_MAJOR, "fd");
for (i = 0; i < FD_MAX_UNITS; i++) {
for (type = 0; type < NUM_DISK_MINORS; type++) {
if (!unit[i].disk[type])
@@ -2148,7 +2153,6 @@ static void __exit atari_floppy_exit(void)
}
blk_mq_free_tag_set(&unit[i].tag_set);
}
- unregister_blkdev(FLOPPY_MAJOR, "fd");

del_timer_sync(&fd_timer);
atari_stram_free( DMABuffer );
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 7562cf30b14e..91a10c938e65 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -371,6 +371,7 @@ __setup("ramdisk_size=", ramdisk_size);
static LIST_HEAD(brd_devices);
static DEFINE_MUTEX(brd_devices_mutex);
static struct dentry *brd_debugfs_dir;
+static bool module_initialize_completed;

static struct brd_device *brd_alloc(int i)
{
@@ -439,6 +440,8 @@ static void brd_probe(dev_t dev)
struct brd_device *brd;
int i = MINOR(dev) / max_part;

+ if (!module_initialize_completed)
+ return;
mutex_lock(&brd_devices_mutex);
list_for_each_entry(brd, &brd_devices, brd_list) {
if (brd->brd_number == i)
@@ -530,6 +533,7 @@ static int __init brd_init(void)
mutex_unlock(&brd_devices_mutex);

pr_info("brd: module loaded\n");
+ module_initialize_completed = true;
return 0;

out_free:
@@ -550,13 +554,13 @@ static void __exit brd_exit(void)
{
struct brd_device *brd, *next;

+ unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
+
debugfs_remove_recursive(brd_debugfs_dir);

list_for_each_entry_safe(brd, next, &brd_devices, brd_list)
brd_del_one(brd);

- unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
-
pr_info("brd: module unloaded\n");
}

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 8a9d22207c59..37b8e53c183d 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4523,6 +4523,7 @@ static int floppy_alloc_disk(unsigned int drive, unsigned int type)
}

static DEFINE_MUTEX(floppy_probe_lock);
+static bool module_initialize_completed;

static void floppy_probe(dev_t dev)
{
@@ -4533,6 +4534,8 @@ static void floppy_probe(dev_t dev)
type >= ARRAY_SIZE(floppy_type))
return;

+ if (!module_initialize_completed)
+ return;
mutex_lock(&floppy_probe_lock);
if (!disks[drive][type]) {
if (floppy_alloc_disk(drive, type) == 0)
@@ -4705,6 +4708,7 @@ static int __init do_floppy_init(void)
NULL);
}

+ module_initialize_completed = true;
return 0;

out_remove_drives:
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 76e12f3482a9..08aef61ab791 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2386,13 +2386,13 @@ static int loop_exit_cb(int id, void *ptr, void *data)

static void __exit loop_exit(void)
{
+ unregister_blkdev(LOOP_MAJOR, "loop");
+
mutex_lock(&loop_ctl_mutex);

idr_for_each(&loop_index_idr, &loop_exit_cb, NULL);
idr_destroy(&loop_index_idr);

- unregister_blkdev(LOOP_MAJOR, "loop");
-
misc_deregister(&loop_misc);

mutex_unlock(&loop_ctl_mutex);
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index aefd74c0d862..8c71356cdcfe 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -40,6 +40,8 @@
#include <linux/uaccess.h>
#include <asm/io.h>

+static bool module_initialize_completed;
+
/**
* generic_id - add a generic drive id
* @drive: drive to make an ID block for
@@ -904,6 +906,8 @@ static int init_irq (ide_hwif_t *hwif)

static void ata_probe(dev_t dev)
{
+ if (!module_initialize_completed)
+ return;
request_module("ide-disk");
request_module("ide-cd");
request_module("ide-tape");
@@ -1475,6 +1479,8 @@ int ide_host_register(struct ide_host *host, const struct ide_port_info *d,
}
}

+ if (j)
+ module_initialize_completed = true;
return j ? 0 : -1;
}
EXPORT_SYMBOL_GPL(ide_host_register);
@@ -1539,6 +1545,7 @@ EXPORT_SYMBOL_GPL(ide_port_unregister_devices);

static void ide_unregister(ide_hwif_t *hwif)
{
+ unregister_blkdev(hwif->major, hwif->name);
mutex_lock(&ide_cfg_mtx);

if (hwif->present) {
@@ -1559,7 +1566,6 @@ static void ide_unregister(ide_hwif_t *hwif)
* Remove us from the kernel's knowledge
*/
kfree(hwif->sg_table);
- unregister_blkdev(hwif->major, hwif->name);

ide_release_dma_engine(hwif);

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 49f897fbb89b..6603900062bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -68,6 +68,8 @@
#include "md-bitmap.h"
#include "md-cluster.h"

+static bool module_initialize_completed;
+
/* pers_list is a list of registered personalities protected
* by pers_lock.
* pers_lock does extra service to protect accesses to
@@ -5776,6 +5778,8 @@ static void md_probe(dev_t dev)
{
if (MAJOR(dev) == MD_MAJOR && MINOR(dev) >= 512)
return;
+ if (!module_initialize_completed)
+ return;
if (create_on_open)
md_alloc(dev, NULL);
}
@@ -9590,6 +9594,7 @@ static int __init md_init(void)
raid_table_header = register_sysctl_table(raid_root_table);

md_geninit();
+ module_initialize_completed = true;
return 0;

err_mdp:
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index cb3c37d1e009..4fc8f4db2ccf 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -629,14 +629,6 @@ static struct scsi_driver sd_template = {
.eh_reset = sd_eh_reset,
};

-/*
- * Don't request a new module, as that could deadlock in multipath
- * environment.
- */
-static void sd_default_probe(dev_t devt)
-{
-}
-
/*
* Device no to disk mapping:
*
@@ -3715,7 +3707,7 @@ static int __init init_sd(void)
SCSI_LOG_HLQUEUE(3, printk("init_sd: sd driver entry point\n"));

for (i = 0; i < SD_MAJORS; i++) {
- if (__register_blkdev(sd_major(i), "sd", sd_default_probe))
+ if (register_blkdev(sd_major(i), "sd"))
continue;
majors++;
}

2021-06-21 06:21:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v2] block: genhd: don't call probe function with major_names_lock held

The obvious fix is to not call unregister_blkdev under loop_ctl_mutex.
Which is pretty pointless anyway - I have a series to fix that but it
might need some reordering to move this to the front.

2021-06-21 08:58:58

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH v2] block: genhd: don't call probe function with major_names_lock held

On Sun, Jun 20, 2021 at 10:54:20PM +0900, Tetsuo Handa wrote:
> On 2021/06/20 11:44, Hillf Danton wrote:
> > Good craft in regard to triggering the ABBA deadlock, but curious why not
> > move unregister_blkdev out of and before loop_ctl_mutex, given it will also
> > serialise with the prober.
> >
>
> Well, something like this untested diff?
>
> Call unregister_blkdev() as soon as __exit function starts, for calling
> probe function after cleanup started will be unexpected for __exit function.
>
> Keep probe function no-op until __init function ends, for probe function
> might be called as soon as __register_blkdev() succeeded but calling probe
> function before setup completes will be unexpected for __init function.
>
> drivers/block/ataflop.c | 6 +++++-
> drivers/block/brd.c | 8 ++++++--
> drivers/block/floppy.c | 4 ++++
> drivers/block/loop.c | 4 ++--
> drivers/ide/ide-probe.c | 8 +++++++-
> drivers/md/md.c | 5 +++++
> drivers/scsi/sd.c | 10 +---------
> 7 files changed, 30 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
> index d601e49f80e0..3681e8c493b1 100644
> --- a/drivers/block/ataflop.c
> +++ b/drivers/block/ataflop.c
> @@ -1995,6 +1995,7 @@ static int ataflop_alloc_disk(unsigned int drive, unsigned int type)
> }
>
> static DEFINE_MUTEX(ataflop_probe_lock);
> +static bool module_initialize_completed;

This is almost always wrong.

>
> static void ataflop_probe(dev_t dev)
> {
> @@ -2006,6 +2007,8 @@ static void ataflop_probe(dev_t dev)
>
> if (drive >= FD_MAX_UNITS || type >= NUM_DISK_MINORS)
> return;
> + if (!module_initialize_completed)
> + return;

This is not correct, when you register a callback structure, it can be
instantly called. Do not expect it to be delayed until later.

thanks,

greg k-h