This is version 2 of a patch series I submitted back in May 2008. This
version now checks for device size changes in the rescan_partitions()
routine, which in turn is called when a device is opened and by the
BLKRRPART ioctl.
I am resubmitting this patch series as I got little response the
previous time. Al Viro has told me offline that he would look at it this
time.
This patch series handles online disk resizes, which the kernel
currently does not fully recognize through the existing revalidate_disk
routines. An online resize can occur when a Fibre Channel LUN is grown
or shrunk, or perhaps when a disk is added to an existing RAID
volume.
The kernel currently recognizes a device size change when the
lower-level revalidate_disk routines are called; however, the block
layer does not use the new size while the device has any current
openers. So, for example, if LVM has a volume open on the device,
you will generally not see the size change until after a reboot. We
fix this problem by creating a wrapper to be used with the lower-level
revalidate_disk routines. This wrapper first calls the lower-level
driver's revalidate_disk routine. It then compares the gendisk
capacity to the block device's inode size. If they differ, we
adjust the block device's size to match and then flush the disk for
safety. The size is also checked in rescan_partitions(), which is
called when the device is opened or when the BLKRRPART ioctl
is called.
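To make the wrapper's control flow concrete, here is a small user-space model of it. It is only a sketch: `struct model_disk`, `struct model_bdev`, `model_revalidate()`, `model_flush()` and `fake_driver_revalidate()` are hypothetical stand-ins for struct gendisk, struct block_device, the revalidate_disk() wrapper, flush_disk() and a driver call-back, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct gendisk and struct block_device;
 * only the fields this sketch needs are modelled. */
struct model_disk {
    uint64_t capacity_sectors;                     /* like get_capacity() */
    int (*driver_revalidate)(struct model_disk *); /* like fops->revalidate_disk */
};

struct model_bdev {
    int64_t inode_size; /* like i_size_read(bdev->bd_inode), in bytes */
    int flush_count;    /* counts calls to the flush step */
};

/* Stand-in for flush_disk(): here it only records that a flush happened. */
static void model_flush(struct model_bdev *bdev)
{
    bdev->flush_count++;
}

/* The wrapper: call the driver's call-back first, then do the
 * post-operation of syncing the bdev size with the disk capacity. */
static int model_revalidate(struct model_disk *disk, struct model_bdev *bdev)
{
    int ret = 0;
    int64_t disk_size;

    if (disk->driver_revalidate)
        ret = disk->driver_revalidate(disk);

    disk_size = (int64_t)disk->capacity_sectors << 9; /* 512-byte sectors */
    if (disk_size != bdev->inode_size) {
        printf("detected capacity change from %lld to %lld\n",
               (long long)bdev->inode_size, (long long)disk_size);
        bdev->inode_size = disk_size;
        model_flush(bdev);
    }
    return ret;
}

/* A fake driver call-back that "discovers" the LUN now has 4096 sectors. */
static int fake_driver_revalidate(struct model_disk *disk)
{
    disk->capacity_sectors = 4096;
    return 0;
}
```

Starting from a disk of 2048 sectors whose bdev size is 2048 * 512 bytes, one call to `model_revalidate()` updates the size to 4096 * 512 bytes and flushes once; a second call sees no difference and leaves both alone.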
There are several ways to "kick off" a device size change:
1. For SCSI devices do:
# echo 1 > /sys/class/scsi_device/<device>/device/rescan
or
# blockdev --rereadpt <device file>
2. For other devices (except device mapper):
# blockdev --rereadpt <device file>
I have tested this patch on SCSI and SmartArray (cciss)
devices. Device mapper still does not recognize device size changes
until the device is restarted.
Jeff Moyer and Andy Ryan have done some light testing on the previous
version of this series.
This patch set has been tested with scsi-misc-2.6. It also applies to
linux-next with some minor, obvious changes.
Diff stats:
drivers/scsi/sd.c | 4 +-
fs/block_dev.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++---
fs/partitions/check.c | 3 +-
include/linux/fs.h | 3 ++
4 files changed, 94 insertions(+), 8 deletions(-)
Commits:
- Wrapper for lower-level revalidate_disk routines.
- Adjust block device size after an online resize of a disk.
- Check for device resize in rescan_partitions.
- SCSI sd driver calls revalidate_disk wrapper.
- Added flush_disk to factor out common buffer cache flushing code.
- Call flush_disk() after detecting an online resize.
--
Andrew Patterson
Wrapper for lower-level revalidate_disk routines.
This is a wrapper for the lower-level revalidate_disk call-backs such
as sd_revalidate_disk(). It allows us to perform pre- and
post-operations around them.
We will use this wrapper in a later patch to adjust block device sizes
after an online resize (a _post_ operation).
Signed-off-by: Andrew Patterson <[email protected]>
---
fs/block_dev.c | 21 +++++++++++++++++++++
include/linux/fs.h | 1 +
2 files changed, 22 insertions(+), 0 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index aff5421..30bafca 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -868,6 +868,27 @@ struct block_device *open_by_devnum(dev_t dev, unsigned mode)
EXPORT_SYMBOL(open_by_devnum);
+/**
+ * revalidate_disk - wrapper for lower-level driver's revalidate_disk
+ * call-back
+ *
+ * @disk: struct gendisk to be revalidated
+ *
+ * This routine is a wrapper for lower-level driver's revalidate_disk
+ * call-backs. It is used to do common pre and post operations needed
+ * for all revalidate_disk operations.
+ */
+int revalidate_disk(struct gendisk *disk)
+{
+ int ret = 0;
+
+ if (disk->fops->revalidate_disk)
+ ret = disk->fops->revalidate_disk(disk);
+
+ return ret;
+}
+EXPORT_SYMBOL(revalidate_disk);
+
/*
* This routine checks whether a removable media has been changed,
* and invalidates all buffer-cache-entries in that case. This
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 580b513..28756a9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1718,6 +1718,7 @@ extern int fs_may_remount_ro(struct super_block *);
*/
#define bio_data_dir(bio) ((bio)->bi_rw & 1)
+extern int revalidate_disk(struct gendisk *);
extern int check_disk_change(struct block_device *);
extern int __invalidate_device(struct block_device *);
extern int invalidate_partition(struct gendisk *, int);
Adjust block device size after an online resize of a disk.
The revalidate_disk routine now checks if a disk has been resized by
comparing the gendisk capacity to the bdev inode size. If they are
different (usually because the disk has been resized underneath the kernel),
the bdev inode size is adjusted to match the capacity.
Signed-off-by: Andrew Patterson <[email protected]>
---
fs/block_dev.c | 37 +++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 2 ++
2 files changed, 39 insertions(+), 0 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 30bafca..f8df73a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -869,6 +869,34 @@ struct block_device *open_by_devnum(dev_t dev, unsigned mode)
EXPORT_SYMBOL(open_by_devnum);
/**
+ * check_disk_size_change - checks for disk size change and adjusts
+ * bdev size.
+ *
+ * @disk: struct gendisk to check
+ * @bdev: struct bdev to adjust.
+ *
+ * This routine checks to see if the bdev size does not match the disk size
+ * and adjusts it if it differs.
+ */
+void check_disk_size_change(struct gendisk *disk, struct block_device *bdev)
+{
+ loff_t disk_size, bdev_size;
+
+ disk_size = (loff_t)get_capacity(disk) << 9;
+ bdev_size = i_size_read(bdev->bd_inode);
+ if (disk_size != bdev_size) {
+ char name[BDEVNAME_SIZE];
+
+ disk_name(disk, 0, name);
+ printk(KERN_INFO
+ "%s: detected capacity change from %lld to %lld\n",
+ name, bdev_size, disk_size);
+ i_size_write(bdev->bd_inode, disk_size);
+ }
+}
+EXPORT_SYMBOL(check_disk_size_change);
+
+/**
* revalidate_disk - wrapper for lower-level driver's revalidate_disk
* call-back
*
@@ -880,11 +908,20 @@ EXPORT_SYMBOL(open_by_devnum);
*/
int revalidate_disk(struct gendisk *disk)
{
+ struct block_device *bdev;
int ret = 0;
if (disk->fops->revalidate_disk)
ret = disk->fops->revalidate_disk(disk);
+ bdev = bdget_disk(disk, 0);
+ if (!bdev)
+ return ret;
+
+ mutex_lock(&bdev->bd_mutex);
+ check_disk_size_change(disk, bdev);
+ mutex_unlock(&bdev->bd_mutex);
+ bdput(bdev);
return ret;
}
EXPORT_SYMBOL(revalidate_disk);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 28756a9..ef23c3f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1718,6 +1718,8 @@ extern int fs_may_remount_ro(struct super_block *);
*/
#define bio_data_dir(bio) ((bio)->bi_rw & 1)
+extern void check_disk_size_change(struct gendisk *disk,
+ struct block_device *bdev);
extern int revalidate_disk(struct gendisk *);
extern int check_disk_change(struct block_device *);
extern int __invalidate_device(struct block_device *);
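check_disk_size_change() computes the disk size as `(loff_t)get_capacity(disk) << 9`, that is, sectors times 512. The cast before the shift presumably matters because sector_t can be only 32 bits wide on 32-bit kernels built without large block device support (CONFIG_LBD in kernels of this era). The two hypothetical helpers below illustrate the difference for a 32-bit sector count of 2^24 (8 GiB).

```c
#include <stdint.h>

/* Widen first, then multiply by 512: correct for any 32-bit count. */
static uint64_t bytes_cast_first(uint32_t sectors)
{
    return (uint64_t)sectors << 9;
}

/* Shift in 32 bits first, then widen: wraps for counts >= 2^23. */
static uint64_t bytes_shift_first(uint32_t sectors)
{
    return (uint64_t)(sectors << 9);
}
```

For 16777216 sectors, `bytes_cast_first()` yields 8589934592 bytes, while `bytes_shift_first()` wraps to 0; for small disks the two agree, which is why such a bug would only show up on large devices.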
Check for device resize when rescanning partitions
Check for device resize in the rescan_partitions() routine. If the device
has been resized, the bdev size is set to match. The rescan_partitions()
routine is called when opening the device and when calling the
BLKRRPART ioctl.
Signed-off-by: Andrew Patterson <[email protected]>
---
fs/partitions/check.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index 7d6b34e..2e7b1fa 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -480,11 +480,12 @@ int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
res = invalidate_partition(disk, 0);
if (res)
return res;
- bdev->bd_invalidated = 0;
for (p = 1; p < disk->minors; p++)
delete_partition(disk, p);
if (disk->fops->revalidate_disk)
disk->fops->revalidate_disk(disk);
+ check_disk_size_change(disk, bdev);
+ bdev->bd_invalidated = 0;
if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))
return 0;
if (IS_ERR(state)) /* I/O error reading the partition table */
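One way to check whether the block layer has picked up a resize is to compare the gendisk capacity, exported in 512-byte sectors as /sys/block/<dev>/size, with the block device size returned by the BLKGETSIZE64 ioctl. This sketch assumes, as in kernels of this era, that BLKGETSIZE64 reports the bdev inode size, so the two values can disagree when the size change has not been propagated. The helper names are hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h> /* BLKGETSIZE64 */

/* Read the gendisk capacity from sysfs, e.g. "/sys/block/sdb/size".
 * The value is in 512-byte sectors. Returns 0 on success, -1 on failure. */
static int read_sysfs_sectors(const char *size_path, uint64_t *sectors)
{
    FILE *f = fopen(size_path, "r");
    unsigned long long v;
    int n;

    if (!f)
        return -1;
    n = fscanf(f, "%llu", &v);
    fclose(f);
    if (n != 1)
        return -1;
    *sectors = v;
    return 0;
}

/* Read the block device size in bytes via the BLKGETSIZE64 ioctl. */
static int read_bdev_bytes(const char *dev_path, uint64_t *bytes)
{
    int fd = open(dev_path, O_RDONLY);
    int ret;

    if (fd < 0)
        return -1;
    ret = ioctl(fd, BLKGETSIZE64, bytes);
    close(fd);
    return ret < 0 ? -1 : 0;
}

/* Nonzero when the bdev size in bytes matches the capacity in sectors. */
static int sizes_agree(uint64_t sectors, uint64_t bytes)
{
    return (sectors << 9) == bytes;
}
```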
SCSI sd driver calls revalidate_disk wrapper.
Modify the SCSI disk driver to call the revalidate_disk()
wrapper. This allows us to do some housekeeping such as accounting for
a disk being resized online. The wrapper will call
sd_revalidate_disk() at the appropriate time.
Signed-off-by: Andrew Patterson <[email protected]>
---
drivers/scsi/sd.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index e5e7d78..31545b4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -159,7 +159,7 @@ sd_store_cache_type(struct device *dev, struct device_attribute *attr,
sd_print_sense_hdr(sdkp, &sshdr);
return -EINVAL;
}
- sd_revalidate_disk(sdkp->disk);
+ revalidate_disk(sdkp->disk);
return count;
}
@@ -910,7 +910,7 @@ static void sd_rescan(struct device *dev)
struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
if (sdkp) {
- sd_revalidate_disk(sdkp->disk);
+ revalidate_disk(sdkp->disk);
scsi_disk_put(sdkp);
}
}
Added flush_disk to factor out common buffer cache flushing code.
We need to be able to flush the buffer cache in more cases than
just a media change, so we factor out the common cache flush code
in check_disk_change() into an internal flush_disk() routine. This
routine will then be used for both disk changes and disk resizes (in a
later patch).
Include the disk name in the message indicating that there are busy
inodes on the device, and give the message an explicit KERN_WARNING
log level.
Signed-off-by: Andrew Patterson <[email protected]>
---
fs/block_dev.c | 33 ++++++++++++++++++++++++++++-----
1 files changed, 28 insertions(+), 5 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index f8df73a..5ce28b1 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -868,6 +868,33 @@ struct block_device *open_by_devnum(dev_t dev, unsigned mode)
EXPORT_SYMBOL(open_by_devnum);
+
+/**
+ * flush_disk - invalidates all buffer-cache entries on a disk
+ *
+ * @bdev: struct block device to be flushed
+ *
+ * Invalidates all buffer-cache entries on a disk. It should be called
+ * when a disk has been changed -- either by a media change or online
+ * resize.
+ */
+static void flush_disk(struct block_device *bdev)
+{
+ if (__invalidate_device(bdev)) {
+ char name[BDEVNAME_SIZE] = "";
+
+ if (bdev->bd_disk)
+ disk_name(bdev->bd_disk, 0, name);
+ printk(KERN_WARNING "VFS: busy inodes on changed media %s\n",
+ name);
+ }
+
+ if (!bdev->bd_disk)
+ return;
+ if (bdev->bd_disk->minors > 1)
+ bdev->bd_invalidated = 1;
+}
+
/**
* check_disk_size_change - checks for disk size change and adjusts
* bdev size.
@@ -945,13 +972,9 @@ int check_disk_change(struct block_device *bdev)
if (!bdops->media_changed(bdev->bd_disk))
return 0;
- if (__invalidate_device(bdev))
- printk("VFS: busy inodes on changed media.\n");
-
+ flush_disk(bdev);
if (bdops->revalidate_disk)
bdops->revalidate_disk(bdev->bd_disk);
- if (bdev->bd_disk->minors > 1)
- bdev->bd_invalidated = 1;
return 1;
}
Call flush_disk() after detecting an online resize.
We call flush_disk() to make sure the buffer cache for the disk is
flushed after a disk resize. There are two resize cases, growing and
shrinking. Given that users can shrink/then grow a disk before
revalidate_disk() is called, we treat the grow case identically to
shrinking. We need to flush the buffer cache after an online shrink
because, as James Bottomley puts it,
The two use cases for shrinking I can see are
1. planned: the fs is already shrunk to within the new boundaries
and all data is relocated, so invalidate is fine (any dirty
buffers that might exist in the shrunk region are there only
because they were relocated but not yet written to their
original location).
2. unplanned: In this case, the fs is probably toast, so whether
we invalidate or not isn't going to make a whole lot of
difference; it's still going to try to read or write from
sectors beyond the new size and get I/O errors.
Immediately invalidating a shrunk disk causes outstanding reads and
writes beyond the new end of the disk to fail earlier than they would
if we waited for normal buffer cache operation. It also closes a
potential security hole where, without invalidation, old data from
beyond the new end of the shrunk disk could remain in the buffer cache.
Signed-off-by: Andrew Patterson <[email protected]>
---
fs/block_dev.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 5ce28b1..6cb00dd 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -885,8 +885,8 @@ static void flush_disk(struct block_device *bdev)
if (bdev->bd_disk)
disk_name(bdev->bd_disk, 0, name);
- printk(KERN_WARNING "VFS: busy inodes on changed media %s\n",
- name);
+ printk(KERN_WARNING "VFS: busy inodes on changed media or "
+ "resized disk %s\n", name);
}
if (!bdev->bd_disk)
@@ -919,6 +919,7 @@ void check_disk_size_change(struct gendisk *disk, struct block_device *bdev)
"%s: detected capacity change from %lld to %lld\n",
name, bdev_size, disk_size);
i_size_write(bdev->bd_inode, disk_size);
+ flush_disk(bdev);
}
}
EXPORT_SYMBOL(check_disk_size_change);
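The reason for treating a grow like a shrink can be shown with a toy model. The kernel only compares the size it last recorded with the size it sees now, so a disk that was shrunk from 100 to 50 units and then grown to 150 before revalidate_disk() runs looks like a plain grow. The hypothetical policy functions below contrast a shrink-only flush with the flush-on-any-change rule that check_disk_size_change() applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shrink-only policy: misses a shrink followed by a grow,
 * because only the old and new sizes are visible. */
static bool flush_if_shrunk(int64_t old_size, int64_t new_size)
{
    return new_size < old_size;
}

/* Rule used by this series: flush whenever the size differs at all. */
static bool flush_if_changed(int64_t old_size, int64_t new_size)
{
    return new_size != old_size;
}
```

In this model, neither policy notices a shrink followed by a grow back to exactly the original size, since no change is visible at revalidate time.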