2006-09-13 17:46:30

by Peter Zijlstra

Subject: [PATCH 2/2] new bd_mutex lockdep annotation

Use the gendisk partition number to set a lock class.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Jason Baron <[email protected]>
---
fs/block_dev.c | 9 +++++++++
1 file changed, 9 insertions(+)

Index: linux-2.6-mm/fs/block_dev.c
===================================================================
--- linux-2.6-mm.orig/fs/block_dev.c
+++ linux-2.6-mm/fs/block_dev.c
@@ -357,10 +357,14 @@ static int bdev_set(struct inode *inode,

static LIST_HEAD(all_bdevs);

+static struct lock_class_key bdev_part_lock_key;
+
struct block_device *bdget(dev_t dev)
{
struct block_device *bdev;
struct inode *inode;
+ struct gendisk *disk;
+ int part = 0;

inode = iget5_locked(bd_mnt->mnt_sb, hash(dev),
bdev_test, bdev_set, &dev);
@@ -386,6 +390,11 @@ struct block_device *bdget(dev_t dev)
list_add(&bdev->bd_list, &all_bdevs);
spin_unlock(&bdev_lock);
unlock_new_inode(inode);
+ mutex_init(&bdev->bd_mutex);
+ disk = get_gendisk(dev, &part);
+ if (part)
+ lockdep_set_class(&bdev->bd_mutex, &bdev_part_lock_key);
+ put_disk(disk);
}
return bdev;
}

--


2006-09-13 18:16:14

by Arjan van de Ven

Subject: Re: [PATCH 2/2] new bd_mutex lockdep annotation

Peter Zijlstra wrote:
> Use the gendisk partition number to set a lock class.
>
I like this approach a whole lot better than what is there today (but we talked about that before ;)
It's a lot more obviously the right approach and kills a whole lot of ugly duplication.

so

Acked-by: Arjan van de Ven <[email protected]>


2006-09-14 07:11:29

by NeilBrown

Subject: Re: [PATCH 2/2] new bd_mutex lockdep annotation

On Wednesday September 13, [email protected] wrote:
> Use the gendisk partition number to set a lock class.

Yes, this does look a lot nicer, thanks.

Two observations.
1/ I was confused that you added a call to mutex_init. One would
normally expect to only have one of these for any given mutex, so
adding one was a surprise.
I now realise that the purpose of this call is not exactly to init
the mutex, but to init the lockdep class in case this inode was
previously used for a partition but is now being used for a whole
device. This makes sense, but renders the mutex_init in
init_once pointless. Maybe that should be removed?

2/ You are introducing a new call to get_gendisk.
This bothers me for two reasons. Both relate to a comparison
with the call to get_gendisk in block_dev.c:do_open.
a/ That call is protected by lock_kernel. Your call is not.
b/ That call is followed by a test for '!disk' implying that it
can return NULL. Yours is not - at least not obviously
(put_disk does have the check).

I'm not sure if these are actually problems, but they do bother me.

Thinking through the possible reasons for the lock_kernel, I wonder
if the current device number mapping scheme actually allows you
to determine if something is partitioned or not in a static sense.
Maybe that is only guaranteed to be stable while the device is
open...
I wonder if Al Viro could put my mind at rest .... Al - do you have
a moment to look at this? Thanks.
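
To make observation 1/ concrete, here is a small user-space sketch of why the extra mutex_init() matters when a block_device is reused. Everything prefixed fake_ is a hypothetical stand-in invented for illustration, not kernel API, and the model assumes (as I read it above) that re-initialising the mutex puts it back in the default lock class:

```c
#include <assert.h>

/* Hypothetical stand-ins for two lockdep classes; not the kernel API. */
enum fake_class { FAKE_CLASS_DEFAULT, FAKE_CLASS_PART };

struct fake_mutex {
	enum fake_class cls;
};

struct fake_bdev {
	struct fake_mutex bd_mutex;
};

/* Models mutex_init(): the lock returns to the default class. */
static void fake_mutex_init(struct fake_mutex *m)
{
	m->cls = FAKE_CLASS_DEFAULT;
}

/* Models lockdep_set_class(): override the class of one lock. */
static void fake_set_class(struct fake_mutex *m, enum fake_class cls)
{
	m->cls = cls;
}

/* Models the lines the patch adds to bdget() for a new inode. */
static void fake_bdget_setup(struct fake_bdev *bdev, int part)
{
	fake_mutex_init(&bdev->bd_mutex);	/* drop any class left by earlier use */
	if (part)
		fake_set_class(&bdev->bd_mutex, FAKE_CLASS_PART);
}
```

Calling fake_bdget_setup() first with a non-zero part and then with part == 0 on the same object leaves the lock in the default class; without the re-init it would keep the partition class from its earlier life.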

NeilBrown


2006-09-14 08:48:32

by Peter Zijlstra

Subject: Re: [PATCH 2/2] new bd_mutex lockdep annotation

On Thu, 2006-09-14 at 17:11 +1000, Neil Brown wrote:
> On Wednesday September 13, [email protected] wrote:
> > Use the gendisk partition number to set a lock class.
>
> Yes, this does look a lot nicer, thanks.
>
> Two observations.
> 1/ I was confused that you added a call to mutex_init. One would
> normally expect to only have one of these for any given mutex, so
> adding one was a surprise.
> I now realise that the purpose of this call is not exactly to init
> the mutex, but to init the lockdep class in case this inode was
> previously used for a partition but is now being used for a whole
> device. This makes sense, but renders the mutex_init in
> init_once pointless. Maybe that should be removed?

Yes, that would be quite redundant now, new patch attached.

> 2/ You are introducing a new call to get_gendisk.
> This bothers me for two reasons. Both relate to a comparison
> with the call to get_gendisk in block_dev.c:do_open.
> a/ That call is protected by lock_kernel. Your call is not.
> b/ That call is followed by a test for '!disk' implying that it
> can return NULL. Yours is not - at least not obviously
> (put_disk does have the check).

a) kobj_lookup() vs kobj_(un)map() use the domain lock.

Not all calls to blk_register_region() were under lock_kernel() afaicf.

So I don't think this is needed, but I'll gladly take advice otherwise,
I'm not well versed with the kobj stuff.

b) from quick inspection yesterday I reached two (false) conclusions
- &part would not be changed when !disk
- disk would have to exist at the time we call bdget()
Now I can't seem to validate either of them. Added disk to the if
statement just to be safe.
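
As a rough illustration of the "disk && part" guard, here is a user-space sketch. The fake_ names and the 16-minors-per-disk numbering are assumptions made up for the example and are not how get_gendisk() maps device numbers; the lookup deliberately writes *part even when it fails, which is the worst case I could not rule out:

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_MINORS 16	/* hypothetical: 16 minors per disk */
#define FAKE_NDISKS 2	/* hypothetical: only two disks exist */

struct fake_disk {
	int refs;
};

static struct fake_disk fake_disks[FAKE_NDISKS];

/* Models get_gendisk(): may return NULL, and may still write *part. */
static struct fake_disk *fake_get_disk(unsigned int dev, int *part)
{
	unsigned int idx = dev / FAKE_MINORS;

	*part = (int)(dev % FAKE_MINORS);	/* written even on failure */
	if (idx >= FAKE_NDISKS)
		return NULL;
	fake_disks[idx].refs++;
	return &fake_disks[idx];
}

/* Models put_disk(): tolerates NULL, like the real one is said to. */
static void fake_put_disk(struct fake_disk *disk)
{
	if (disk)
		disk->refs--;
}

/* Returns 1 only when the disk exists and dev names a partition. */
static int fake_is_partition(unsigned int dev)
{
	int part = 0;
	struct fake_disk *disk = fake_get_disk(dev, &part);
	int is_part = disk && part;

	fake_put_disk(disk);
	return is_part;
}
```

With only "if (part)", a device number whose disk is missing but whose low bits are non-zero would be treated as a partition; the combined test avoids depending on either of the two assumptions.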

> I'm not sure if these are actually problems, but they do bother me.
>
> Thinking through the possible reasons for the lock_kernel, I wonder
> if the current device number mapping scheme actually allows you
> to determine if something is partitioned or not in a static sense.
> Maybe that is only guaranteed to be stable while the device is
> open...

Hmm, yes I think I see what you mean...

> I wonder if Al Viro could put my mind at rest .... Al - do you have
> a moment to look at this? Thanks.

+1

---

Use the gendisk partition number to set a lock class.

Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Arjan van de Ven <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Jason Baron <[email protected]>
---
fs/block_dev.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6-mm/fs/block_dev.c
===================================================================
--- linux-2.6-mm.orig/fs/block_dev.c
+++ linux-2.6-mm/fs/block_dev.c
@@ -264,7 +264,6 @@ static void init_once(void * foo, kmem_c
SLAB_CTOR_CONSTRUCTOR)
{
memset(bdev, 0, sizeof(*bdev));
- mutex_init(&bdev->bd_mutex);
mutex_init(&bdev->bd_mount_mutex);
INIT_LIST_HEAD(&bdev->bd_inodes);
INIT_LIST_HEAD(&bdev->bd_list);
@@ -357,10 +356,14 @@ static int bdev_set(struct inode *inode,

static LIST_HEAD(all_bdevs);

+static struct lock_class_key bdev_part_lock_key;
+
struct block_device *bdget(dev_t dev)
{
struct block_device *bdev;
struct inode *inode;
+ struct gendisk *disk;
+ int part = 0;

inode = iget5_locked(bd_mnt->mnt_sb, hash(dev),
bdev_test, bdev_set, &dev);
@@ -386,6 +389,11 @@ struct block_device *bdget(dev_t dev)
list_add(&bdev->bd_list, &all_bdevs);
spin_unlock(&bdev_lock);
unlock_new_inode(inode);
+ mutex_init(&bdev->bd_mutex);
+ disk = get_gendisk(dev, &part);
+ if (disk && part)
+ lockdep_set_class(&bdev->bd_mutex, &bdev_part_lock_key);
+ put_disk(disk);
}
return bdev;
}


2006-09-29 18:31:07

by Peter Zijlstra

Subject: Re: [PATCH 2/2] new bd_mutex lockdep annotation

Andrew, please hold on this, Al Viro doesn't agree with it.

Calling get_gendisk() from bdget() changes the bdget() semantics too
much. For one, it enables bdget() to load modules.

Al proposed the following approach. Neil can you agree with this too?
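
The idea (take the partition's and the whole disk's bd_mutex one after the other, never nested) can be modelled in user space roughly as follows. This is a single-threaded sketch under stated assumptions: the fake_ types and helpers are hypothetical, the "mutex" is just a flag that lets us count how many locks are held at once, and calling the open path twice stands in for losing a race with another first opener:

```c
#include <assert.h>
#include <stddef.h>

struct fake_mutex {
	int held;
};

static int fake_cur_held;	/* locks held right now */
static int fake_max_held;	/* most locks ever held together */

static void fake_lock(struct fake_mutex *m)
{
	assert(!m->held);
	m->held = 1;
	if (++fake_cur_held > fake_max_held)
		fake_max_held = fake_cur_held;
}

static void fake_unlock(struct fake_mutex *m)
{
	assert(m->held);
	m->held = 0;
	fake_cur_held--;
}

struct fake_bdev {
	struct fake_mutex bd_mutex;
	int bd_openers;
	int bd_part_count;
	struct fake_bdev *bd_contains;
};

/* Caller holds part->bd_mutex on entry and again on return. */
static void fake_open_partition(struct fake_bdev *part, struct fake_bdev *whole)
{
	fake_unlock(&part->bd_mutex);	/* drop before touching the whole disk */

	fake_lock(&whole->bd_mutex);
	whole->bd_openers++;		/* stands in for blkdev_get(whole) */
	whole->bd_part_count++;
	fake_unlock(&whole->bd_mutex);

	fake_lock(&part->bd_mutex);
	if (part->bd_contains != whole) {
		part->bd_contains = whole;	/* we are the first opener */
	} else {
		/* Someone else set it up meanwhile: undo our references. */
		fake_unlock(&part->bd_mutex);
		fake_lock(&whole->bd_mutex);
		whole->bd_part_count--;
		whole->bd_openers--;
		fake_unlock(&whole->bd_mutex);
		fake_lock(&part->bd_mutex);
	}
}
```

In this model fake_max_held never exceeds 1, which is the property that removes the need for a separate lock class for partitions.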

---
Avoid the nesting of bd_mutex by serializing the locks. This is made
easier by changing the ->bd_part_count rules, it's now only changed for
the first openers/closers.

Signed-off-by: Peter Zijlstra <[email protected]>
---
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 92de28d..0ffc4f0 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -898,12 +889,15 @@ void bd_set_size(struct block_device *bd
}
EXPORT_SYMBOL(bd_set_size);

+static int __blkdev_put(struct block_device *bdev, int part);
+
static int do_open(struct block_device *bdev, struct file *file)
{
struct module *owner = NULL;
struct gendisk *disk;
int ret = -ENXIO;
int part;
+ struct block_device *whole = NULL;

file->f_mapping = bdev->bd_inode->i_mapping;
lock_kernel();
@@ -937,30 +931,42 @@ static int do_open(struct block_device *
rescan_partitions(disk, bdev);
} else {
struct hd_struct *p;
- struct block_device *whole;
+
+ mutex_unlock(&bdev->bd_mutex);
+
whole = bdget_disk(disk, 0);
ret = -ENOMEM;
if (!whole)
- goto out_first;
+ goto out_first_lock;
ret = blkdev_get(whole, file->f_mode, file->f_flags);
if (ret)
- goto out_first;
- bdev->bd_contains = whole;
+ goto out_first_lock;
+
mutex_lock(&whole->bd_mutex);
whole->bd_part_count++;
p = disk->part[part - 1];
- bdev->bd_inode->i_data.backing_dev_info =
- whole->bd_inode->i_data.backing_dev_info;
if (!(disk->flags & GENHD_FL_UP) || !p || !p->nr_sects) {
- whole->bd_part_count--;
mutex_unlock(&whole->bd_mutex);
ret = -ENXIO;
- goto out_first;
+ goto out_first_lock;
}
kobject_get(&p->kobj);
- bdev->bd_part = p;
- bd_set_size(bdev, (loff_t) p->nr_sects << 9);
mutex_unlock(&whole->bd_mutex);
+
+ mutex_lock(&bdev->bd_mutex);
+ if (bdev->bd_contains != whole) {
+ bdev->bd_contains = whole;
+ bdev->bd_inode->i_data.backing_dev_info =
+ whole->bd_inode->i_data.backing_dev_info;
+ bdev->bd_part = p;
+ bd_set_size(bdev, (loff_t) p->nr_sects << 9);
+ whole = NULL;
+ } else {
+ mutex_unlock(&bdev->bd_mutex);
+ kobject_put(&p->kobj);
+ __blkdev_put(whole, 1);
+ mutex_lock(&bdev->bd_mutex);
+ }
}
} else {
put_disk(disk);
@@ -973,10 +979,6 @@ static int do_open(struct block_device *
}
if (bdev->bd_invalidated)
rescan_partitions(bdev->bd_disk, bdev);
- } else {
- mutex_lock(&bdev->bd_contains->bd_mutex);
- bdev->bd_contains->bd_part_count++;
- mutex_unlock(&bdev->bd_contains->bd_mutex);
}
}
bdev->bd_openers++;
@@ -984,11 +986,12 @@ static int do_open(struct block_device *
unlock_kernel();
return 0;

+out_first_lock:
+ if (whole)
+ __blkdev_put(whole, 1);
+ mutex_lock(&bdev->bd_mutex);
out_first:
bdev->bd_disk = NULL;
- bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;
- if (bdev != bdev->bd_contains)
- blkdev_put(bdev->bd_contains);
bdev->bd_contains = NULL;
put_disk(disk);
module_put(owner);
@@ -1049,14 +1052,17 @@ static int blkdev_open(struct inode * in
return res;
}

-int blkdev_put(struct block_device *bdev)
+static int __blkdev_put(struct block_device *bdev, int part)
{
int ret = 0;
struct inode *bd_inode = bdev->bd_inode;
struct gendisk *disk = bdev->bd_disk;
+ struct block_device *victim = NULL;

mutex_lock(&bdev->bd_mutex);
lock_kernel();
+ if (part)
+ bdev->bd_part_count--;
if (!--bdev->bd_openers) {
sync_blockdev(bdev);
kill_bdev(bdev);
@@ -1064,10 +1070,6 @@ int blkdev_put(struct block_device *bdev
if (bdev->bd_contains == bdev) {
if (disk->fops->release)
ret = disk->fops->release(bd_inode, NULL);
- } else {
- mutex_lock(&bdev->bd_contains->bd_mutex);
- bdev->bd_contains->bd_part_count--;
- mutex_unlock(&bdev->bd_contains->bd_mutex);
}
if (!bdev->bd_openers) {
struct module *owner = disk->fops->owner;
@@ -1081,17 +1083,23 @@ int blkdev_put(struct block_device *bdev
}
bdev->bd_disk = NULL;
bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;
- if (bdev != bdev->bd_contains) {
- blkdev_put(bdev->bd_contains);
- }
+ if (bdev != bdev->bd_contains)
+ victim = bdev->bd_contains;
bdev->bd_contains = NULL;
}
unlock_kernel();
mutex_unlock(&bdev->bd_mutex);
+ if (victim)
+ __blkdev_put(victim, 1);
bdput(bdev);
return ret;
}

+int blkdev_put(struct block_device *bdev)
+{
+ return __blkdev_put(bdev, 0);
+}
+
EXPORT_SYMBOL(blkdev_put);

static int blkdev_close(struct inode * inode, struct file * filp)