A new version of a block device interposer (blk_interposer).
In this series of patches, I have tried to take into account the comments
made by Mike on the previous version.
First of all, the commit descriptions are now more detailed.
Indeed, the changes in blk-core.c and dm.c may seem complicated, but they
are no more complicated than the rest of the code in these files.
The [interpose] option for block devices opened by a DM target has been
removed. Instead, the dm_get_device_ex() function has been added, which
makes it possible to explicitly specify which devices can be used with
the interposer and which cannot.
Additional testing has revealed a problem with suspending and resuming DM
targets attached via blk_interposer. This has been fixed.
History:
v8 - https://patchwork.kernel.org/project/linux-block/cover/[email protected]/
* Attaching to and detaching from the interposed device was moved to the
__dm_suspend() and __dm_resume() functions.
* Redesigned the submit_bio_noacct() function and added a lock for the
block device interposer.
* Added the [interpose] option to the block device path in the dm table.
* Fixed origin_map() when the o->split_binary value is zero.
v7 - https://patchwork.kernel.org/project/linux-block/cover/[email protected]/
* reworked the request interception mechanism: the interposer is now
a block device that receives requests instead of the original device;
* code design fixes.
v6 - https://patchwork.kernel.org/project/linux-block/cover/[email protected]/
* designed for 5.12;
* thanks to the new design of the bio structure in v5.12, it is
possible to perform interception not for the entire disk, but
for each block device;
* instead of the new ioctl DM_DEV_REMAP_CMD and the 'noexcl' option,
the DM_INTERPOSED_FLAG flag for the ioctl DM_TABLE_LOAD_CMD is
applied.
v5 - https://patchwork.kernel.org/project/linux-block/cover/[email protected]/
* rebase for v5.11-rc7;
* patch set organization;
* fix defects in documentation;
* add some comments;
* change mutex names for better code readability;
* remove calling bd_unlink_disk_holder() for targets with non-exclusive
flag;
* change type for struct dm_remap_param from uint8_t to __u8.
v4 - https://patchwork.kernel.org/project/linux-block/cover/[email protected]/
Most changes were made in response to Damien's comments:
* on the design of the code;
* on the patch set organization;
* on the bug with passing a wrong parameter to dm_get_device();
* on the description of the 'noexcl' parameter in linear.rst.
Also added remap_and_filter.rst.
v3 - https://patchwork.kernel.org/project/linux-block/cover/[email protected]/
In this version, I proposed applying blk_interposer to dm-linear.
Problems solved:
* Interception of bio requests from a specific device on the disk, not
from the entire disk. To do this, we added the dm_interposed_dev
structure and an interval tree to store these structures.
* Implemented ioctl DM_DEV_REMAP_CMD. A patch with changes in the lvm2
project was sent to the team [email protected].
* Added the 'noexcl' option for dm-linear, which allows you to open
the underlying block-device without FMODE_EXCL mode.
v2 - https://patchwork.kernel.org/project/linux-block/cover/[email protected]/
I tried to suggest blk_interposer without using it in device mapper,
but with the addition of a sample of its use. It was then that I learned
about the maintainers' attitudes towards the samples directory :).
v1 - https://lwn.net/ml/linux-block/[email protected]/
Hannes's patch can be considered the starting point, since this is
where the interception mechanism and the term blk_interposer itself
appeared. It became clear that blk_interposer can be useful for
device mapper.
before v1 - https://patchwork.kernel.org/project/linux-block/cover/[email protected]/
I tried to offer a rather cumbersome blk-filter and a monster-like
blk-snap module for creating snapshots.
Sergei Shtepa (4):
Adds blk_interposer
Applying the blk_interposer in the block device layer
Add blk_interposer in DM
Using dm_get_device_ex() instead of dm_get_device()
block/bio.c | 2 +
block/blk-core.c | 194 ++++++++++++++-------------
block/genhd.c | 52 ++++++++
drivers/md/dm-cache-target.c | 5 +-
drivers/md/dm-core.h | 1 +
drivers/md/dm-delay.c | 3 +-
drivers/md/dm-dust.c | 3 +-
drivers/md/dm-era-target.c | 4 +-
drivers/md/dm-flakey.c | 3 +-
drivers/md/dm-ioctl.c | 59 ++++++++-
drivers/md/dm-linear.c | 3 +-
drivers/md/dm-log-writes.c | 3 +-
drivers/md/dm-snap.c | 3 +-
drivers/md/dm-table.c | 21 ++-
drivers/md/dm-writecache.c | 3 +-
drivers/md/dm.c | 242 ++++++++++++++++++++++++++++++----
drivers/md/dm.h | 8 +-
fs/block_dev.c | 3 +
include/linux/blk_types.h | 6 +
include/linux/blkdev.h | 32 +++++
include/linux/device-mapper.h | 11 +-
include/uapi/linux/dm-ioctl.h | 6 +
22 files changed, 530 insertions(+), 137 deletions(-)
--
2.20.1
Not every DM target needs the ability to attach via blk_interposer.
A DM target can attach and detach 'on the fly' only if the DM
target works as a filter without changing the location of the blocks
on the block device.
Signed-off-by: Sergei Shtepa <[email protected]>
---
drivers/md/dm-cache-target.c | 5 +++--
drivers/md/dm-delay.c | 3 ++-
drivers/md/dm-dust.c | 3 ++-
drivers/md/dm-era-target.c | 4 +++-
drivers/md/dm-flakey.c | 3 ++-
drivers/md/dm-linear.c | 3 ++-
drivers/md/dm-log-writes.c | 3 ++-
drivers/md/dm-snap.c | 3 ++-
drivers/md/dm-writecache.c | 3 ++-
9 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 541c45027cc8..885a6fde1b9b 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2140,8 +2140,9 @@ static int parse_origin_dev(struct cache_args *ca, struct dm_arg_set *as,
if (!at_least_one_arg(as, error))
return -EINVAL;
- r = dm_get_device(ca->ti, dm_shift_arg(as), FMODE_READ | FMODE_WRITE,
- &ca->origin_dev);
+ r = dm_get_device_ex(ca->ti, dm_shift_arg(as), FMODE_READ | FMODE_WRITE,
+ dm_table_is_interposer(ca->ti->table),
+ &ca->origin_dev);
if (r) {
*error = "Error opening origin device";
return r;
diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index 2628a832787b..1b051a023a5d 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -153,7 +153,8 @@ static int delay_class_ctr(struct dm_target *ti, struct delay_class *c, char **a
return -EINVAL;
}
- ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &c->dev);
+ ret = dm_get_device_ex(ti, argv[0], dm_table_get_mode(ti->table),
+ dm_table_is_interposer(ti->table), &c->dev);
if (ret) {
ti->error = "Device lookup failed";
return ret;
diff --git a/drivers/md/dm-dust.c b/drivers/md/dm-dust.c
index cbe1058ee589..5eb930ea8034 100644
--- a/drivers/md/dm-dust.c
+++ b/drivers/md/dm-dust.c
@@ -366,7 +366,8 @@ static int dust_ctr(struct dm_target *ti, unsigned int argc, char **argv)
return -ENOMEM;
}
- if (dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &dd->dev)) {
+ if (dm_get_device_ex(ti, argv[0], dm_table_get_mode(ti->table),
+ dm_table_is_interposer(ti->table), &dd->dev)) {
ti->error = "Device lookup failed";
kfree(dd);
return -EINVAL;
diff --git a/drivers/md/dm-era-target.c b/drivers/md/dm-era-target.c
index d9ac7372108c..db8791981605 100644
--- a/drivers/md/dm-era-target.c
+++ b/drivers/md/dm-era-target.c
@@ -1462,7 +1462,9 @@ static int era_ctr(struct dm_target *ti, unsigned argc, char **argv)
return -EINVAL;
}
- r = dm_get_device(ti, argv[1], FMODE_READ | FMODE_WRITE, &era->origin_dev);
+ r = dm_get_device_ex(ti, argv[1], FMODE_READ | FMODE_WRITE,
+ dm_table_is_interposer(ti->table),
+ &era->origin_dev);
if (r) {
ti->error = "Error opening data device";
era_destroy(era);
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index b7fee9936f05..89bb77545757 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -243,7 +243,8 @@ static int flakey_ctr(struct dm_target *ti, unsigned int argc, char **argv)
if (r)
goto bad;
- r = dm_get_device(ti, devname, dm_table_get_mode(ti->table), &fc->dev);
+ r = dm_get_device_ex(ti, devname, dm_table_get_mode(ti->table),
+ dm_table_is_interposer(ti->table), &fc->dev);
if (r) {
ti->error = "Device lookup failed";
goto bad;
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 92db0f5e7f28..1301b11dd2af 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -51,7 +51,8 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
}
lc->start = tmp;
- ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &lc->dev);
+ ret = dm_get_device_ex(ti, argv[0], dm_table_get_mode(ti->table),
+ dm_table_is_interposer(ti->table), &lc->dev);
if (ret) {
ti->error = "Device lookup failed";
goto bad;
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 57882654ffee..32a389ea4eb1 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -554,7 +554,8 @@ static int log_writes_ctr(struct dm_target *ti, unsigned int argc, char **argv)
atomic_set(&lc->pending_blocks, 0);
devname = dm_shift_arg(&as);
- ret = dm_get_device(ti, devname, dm_table_get_mode(ti->table), &lc->dev);
+ ret = dm_get_device_ex(ti, devname, dm_table_get_mode(ti->table),
+ dm_table_is_interposer(ti->table), &lc->dev);
if (ret) {
ti->error = "Device lookup failed";
goto bad;
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 11890db71f3f..eab96db253e1 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -2646,7 +2646,8 @@ static int origin_ctr(struct dm_target *ti, unsigned int argc, char **argv)
goto bad_alloc;
}
- r = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &o->dev);
+ r = dm_get_device_ex(ti, argv[0], dm_table_get_mode(ti->table),
+ dm_table_is_interposer(ti->table), &o->dev);
if (r) {
ti->error = "Cannot get target device";
goto bad_open;
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 4f72b6f66c3a..bb0801fe4c63 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -2169,7 +2169,8 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv)
string = dm_shift_arg(&as);
if (!string)
goto bad_arguments;
- r = dm_get_device(ti, string, dm_table_get_mode(ti->table), &wc->dev);
+ r = dm_get_device_ex(ti, string, dm_table_get_mode(ti->table),
+ dm_table_is_interposer(ti->table), &wc->dev);
if (r) {
ti->error = "Origin data device lookup failed";
goto bad;
--
2.20.1
A new 'DM_INTERPOSE_FLAG' flag specifies that a DM target should
be attached via blk_interposer. This flag defines the value of the
'interpose' flag in a mapped_device structure. The 'interpose' flag in
the dm_dev structure indicates which device in the device table is
attached via blk_interposer.
To safely attach a DM target to blk_interposer, the DM target must be
fully initialized. Immediately after attaching to blk_interposer,
the DM device can receive bio requests and those must be processed.
Therefore, the connection is performed in the __dm_resume() function.
To safely detach a DM target from blk_interposer, the DM target must be
suspended. Only then can we be sure that all DM target requests
have been processed. Therefore, detaching from blk_interposer is done
in the __dm_suspend() function. However, we must lock the request queue
of the original device before calling __dm_suspend(). That is why
locking the queue of the original device is implemented as a separate
function.
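The ordering described above can be illustrated with a small userspace
model. This is only a sketch, not the kernel code: struct model_md,
model_resume(), model_suspend() and MODEL_SUSPEND_DETACH_IP are
hypothetical names that merely mirror mapped_device, __dm_resume(),
__dm_suspend() and DM_SUSPEND_DETACH_IP_FLAG, and all locking and queue
handling is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of DM_SUSPEND_DETACH_IP_FLAG. */
#define MODEL_SUSPEND_DETACH_IP (1u << 2)

/* Hypothetical stand-in for struct mapped_device; not the kernel type. */
struct model_md {
	bool interpose;  /* table was loaded with DM_INTERPOSE_FLAG */
	bool suspended;
	bool attached;   /* models DMF_INTERPOSER_ATTACHED */
};

/* Resume: the targets are ready first, only then is the interposer attached. */
int model_resume(struct model_md *md)
{
	assert(md);
	md->suspended = false;
	if (md->interpose && !md->attached)
		md->attached = true;
	return 0;
}

/* Suspend: I/O is drained first, only then may the interposer be detached. */
int model_suspend(struct model_md *md, unsigned int flags)
{
	assert(md);
	if (md->suspended) {
		/* Already suspended: only a detach request is meaningful. */
		if (!(flags & MODEL_SUSPEND_DETACH_IP))
			return -1;
		md->attached = false;
		return 0;
	}
	md->suspended = true;
	if (md->interpose && (flags & MODEL_SUSPEND_DETACH_IP))
		md->attached = false;
	return 0;
}
```

In this model a plain suspend keeps the interposer attached, while a
suspend carrying the detach flag removes it, including the case where
the device is already suspended.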
A new dm_get_device_ex() function can be used instead of
dm_get_device() if we need to specify the 'interpose' flag for dm_dev.
The old dm_get_device() function sets the 'interpose' flag to false,
so existing DM targets do not need to change. At the same time, the new
function makes it possible to explicitly specify which block devices,
and in which DM targets, can be attached via blk_interposer.
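The relationship between the two functions can be sketched in userspace
as follows. This is a simplified model under stated assumptions:
struct model_dev, model_get_device_ex() and model_get_device() are
hypothetical stand-ins for dm_dev, dm_get_device_ex() and
dm_get_device(), and the 'exclusive' field only models whether
FMODE_EXCL would be used when the device is opened.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct dm_dev; not the kernel type. */
struct model_dev {
	const char *path;
	bool interpose;  /* mirrors the new 'interpose' field */
	bool exclusive;  /* models opening with FMODE_EXCL */
};

/* Models dm_get_device_ex(): the caller says whether to interpose. */
int model_get_device_ex(const char *path, bool interpose,
			struct model_dev *result)
{
	assert(result);
	if (!path)
		return -EINVAL;
	result->path = path;
	result->interpose = interpose;
	/* As in open_table_device(): exclusive open only when not interposing. */
	result->exclusive = !interpose;
	return 0;
}

/* Models the compatibility wrapper: existing callers never interpose. */
int model_get_device(const char *path, struct model_dev *result)
{
	return model_get_device_ex(path, false, result);
}
```

A target that keeps calling the old function therefore behaves exactly
as before, and only targets that opt in by passing true get a
non-exclusive, interposable device.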
Signed-off-by: Sergei Shtepa <[email protected]>
---
drivers/md/dm-core.h | 1 +
drivers/md/dm-ioctl.c | 59 ++++++++-
drivers/md/dm-table.c | 21 ++-
drivers/md/dm.c | 242 ++++++++++++++++++++++++++++++----
drivers/md/dm.h | 8 +-
include/linux/device-mapper.h | 11 +-
include/uapi/linux/dm-ioctl.h | 6 +
7 files changed, 309 insertions(+), 39 deletions(-)
diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
index 5953ff2bd260..431b82461eae 100644
--- a/drivers/md/dm-core.h
+++ b/drivers/md/dm-core.h
@@ -112,6 +112,7 @@ struct mapped_device {
/* for blk-mq request-based DM support */
struct blk_mq_tag_set *tag_set;
bool init_tio_pdu:1;
+ bool interpose:1;
struct srcu_struct io_barrier;
};
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 1ca65b434f1f..cd36fa3cb627 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -301,6 +301,24 @@ static void dm_hash_remove_all(bool keep_open_devices, bool mark_deferred, bool
continue;
}
+ if (md->interpose) {
+ int r;
+
+ /*
+ * Interposer should be suspended and detached
+ * from the interposed block device.
+ */
+ r = dm_suspend(md, DM_SUSPEND_DETACH_IP_FLAG |
+ DM_SUSPEND_LOCKFS_FLAG);
+ if (r) {
+ DMERR("%s: unable to suspend and detach interposer",
+ dm_device_name(md));
+ dm_put(md);
+ dev_skipped++;
+ continue;
+ }
+ }
+
t = __hash_remove(hc);
up_write(&_hash_lock);
@@ -732,6 +750,9 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param)
if (dm_test_deferred_remove_flag(md))
param->flags |= DM_DEFERRED_REMOVE;
+ if (dm_interposer_attached(md))
+ param->flags |= DM_INTERPOSE_FLAG;
+
param->dev = huge_encode_dev(disk_devt(disk));
/*
@@ -893,6 +914,21 @@ static int dev_remove(struct file *filp, struct dm_ioctl *param, size_t param_si
dm_put(md);
return r;
}
+ if (md->interpose) {
+ /*
+ * Interposer should be suspended and detached from
+ * the interposed block device.
+ */
+ r = dm_suspend(md, DM_SUSPEND_DETACH_IP_FLAG |
+ DM_SUSPEND_LOCKFS_FLAG);
+ if (r) {
+ DMERR("%s: unable to suspend and detach interposer",
+ dm_device_name(md));
+ up_write(&_hash_lock);
+ dm_put(md);
+ return r;
+ }
+ }
t = __hash_remove(hc);
up_write(&_hash_lock);
@@ -1063,8 +1099,18 @@ static int do_resume(struct dm_ioctl *param)
suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG;
if (param->flags & DM_NOFLUSH_FLAG)
suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG;
- if (!dm_suspended_md(md))
- dm_suspend(md, suspend_flags);
+
+ if (md->interpose) {
+ /*
+ * Interposer should be detached before loading
+ * a new table
+ */
+ if (!dm_suspended_md(md) || dm_interposer_attached(md))
+ dm_suspend(md, suspend_flags | DM_SUSPEND_DETACH_IP_FLAG);
+ } else {
+ if (!dm_suspended_md(md))
+ dm_suspend(md, suspend_flags);
+ }
old_map = dm_swap_table(md, new_map);
if (IS_ERR(old_map)) {
@@ -1267,6 +1313,11 @@ static inline fmode_t get_mode(struct dm_ioctl *param)
return mode;
}
+static inline bool get_interpose_flag(struct dm_ioctl *param)
+{
+ return (param->flags & DM_INTERPOSE_FLAG);
+}
+
static int next_target(struct dm_target_spec *last, uint32_t next, void *end,
struct dm_target_spec **spec, char **target_params)
{
@@ -1338,6 +1389,8 @@ static int table_load(struct file *filp, struct dm_ioctl *param, size_t param_si
if (!md)
return -ENXIO;
+ md->interpose = get_interpose_flag(param);
+
r = dm_table_create(&t, get_mode(param), param->target_count, md);
if (r)
goto err;
@@ -2098,6 +2151,8 @@ int __init dm_early_create(struct dm_ioctl *dmi,
if (r)
goto err_hash_remove;
+ md->interpose = get_interpose_flag(dmi);
+
/* add targets */
for (i = 0; i < dmi->target_count; i++) {
r = dm_table_add_target(t, spec_array[i]->target_type,
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index e5f0f1703c5d..cc6b852cc967 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -327,14 +327,14 @@ static int device_area_is_invalid(struct dm_target *ti, struct dm_dev *dev,
* it is accessed concurrently.
*/
static int upgrade_mode(struct dm_dev_internal *dd, fmode_t new_mode,
- struct mapped_device *md)
+ bool interpose, struct mapped_device *md)
{
int r;
struct dm_dev *old_dev, *new_dev;
old_dev = dd->dm_dev;
r = dm_get_table_device(md, dd->dm_dev->bdev->bd_dev,
- dd->dm_dev->mode | new_mode, &new_dev);
+ dd->dm_dev->mode | new_mode, interpose, &new_dev);
if (r)
return r;
@@ -362,8 +362,8 @@ EXPORT_SYMBOL_GPL(dm_get_dev_t);
* Add a device to the list, or just increment the usage count if
* it's already present.
*/
-int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
- struct dm_dev **result)
+int dm_get_device_ex(struct dm_target *ti, const char *path, fmode_t mode,
+ bool interpose, struct dm_dev **result)
{
int r;
dev_t dev;
@@ -391,7 +391,8 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
if (!dd)
return -ENOMEM;
- if ((r = dm_get_table_device(t->md, dev, mode, &dd->dm_dev))) {
+ r = dm_get_table_device(t->md, dev, mode, interpose, &dd->dm_dev);
+ if (r) {
kfree(dd);
return r;
}
@@ -401,7 +402,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
goto out;
} else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
- r = upgrade_mode(dd, mode, t->md);
+ r = upgrade_mode(dd, mode, interpose, t->md);
if (r)
return r;
}
@@ -410,7 +411,7 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
*result = dd->dm_dev;
return 0;
}
-EXPORT_SYMBOL(dm_get_device);
+EXPORT_SYMBOL(dm_get_device_ex);
static int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
@@ -2206,6 +2207,12 @@ struct mapped_device *dm_table_get_md(struct dm_table *t)
}
EXPORT_SYMBOL(dm_table_get_md);
+bool dm_table_is_interposer(struct dm_table *t)
+{
+ return t->md->interpose;
+}
+EXPORT_SYMBOL(dm_table_is_interposer);
+
const char *dm_table_device_name(struct dm_table *t)
{
return dm_device_name(t->md);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3f3be9408afa..818462b46c91 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -149,6 +149,7 @@ EXPORT_SYMBOL_GPL(dm_bio_get_target_bio_nr);
#define DMF_DEFERRED_REMOVE 6
#define DMF_SUSPENDED_INTERNALLY 7
#define DMF_POST_SUSPENDING 8
+#define DMF_INTERPOSER_ATTACHED 9
#define DM_NUMA_NODE NUMA_NO_NODE
static int dm_numa_node = DM_NUMA_NODE;
@@ -757,18 +758,24 @@ static int open_table_device(struct table_device *td, dev_t dev,
struct mapped_device *md)
{
struct block_device *bdev;
-
+ fmode_t mode = td->dm_dev.mode;
+ void *holder = NULL;
int r;
BUG_ON(td->dm_dev.bdev);
- bdev = blkdev_get_by_dev(dev, td->dm_dev.mode | FMODE_EXCL, _dm_claim_ptr);
+ if (!td->dm_dev.interpose) {
+ mode |= FMODE_EXCL;
+ holder = _dm_claim_ptr;
+ }
+
+ bdev = blkdev_get_by_dev(dev, mode, holder);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
r = bd_link_disk_holder(bdev, dm_disk(md));
if (r) {
- blkdev_put(bdev, td->dm_dev.mode | FMODE_EXCL);
+ blkdev_put(bdev, mode);
return r;
}
@@ -782,11 +789,16 @@ static int open_table_device(struct table_device *td, dev_t dev,
*/
static void close_table_device(struct table_device *td, struct mapped_device *md)
{
+ fmode_t mode = td->dm_dev.mode;
+
if (!td->dm_dev.bdev)
return;
bd_unlink_disk_holder(td->dm_dev.bdev, dm_disk(md));
- blkdev_put(td->dm_dev.bdev, td->dm_dev.mode | FMODE_EXCL);
+ if (!td->dm_dev.interpose)
+ mode |= FMODE_EXCL;
+ blkdev_put(td->dm_dev.bdev, mode);
+
put_dax(td->dm_dev.dax_dev);
td->dm_dev.bdev = NULL;
td->dm_dev.dax_dev = NULL;
@@ -805,7 +817,7 @@ static struct table_device *find_table_device(struct list_head *l, dev_t dev,
}
int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
- struct dm_dev **result)
+ bool interpose, struct dm_dev **result)
{
int r;
struct table_device *td;
@@ -821,6 +833,7 @@ int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
td->dm_dev.mode = mode;
td->dm_dev.bdev = NULL;
+ td->dm_dev.interpose = interpose;
if ((r = open_table_device(td, dev, md))) {
mutex_unlock(&md->table_devices_lock);
@@ -1696,6 +1709,13 @@ static blk_qc_t dm_submit_bio(struct bio *bio)
goto out;
}
+ /*
+ * If md is an interposer, then we must set the BIO_INTERPOSED flag
+ * so that the request is not re-interposed.
+ */
+ if (md->interpose)
+ bio_set_flag(bio, BIO_INTERPOSED);
+
/* If suspended, queue this IO for later */
if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) {
if (bio->bi_opf & REQ_NOWAIT)
@@ -2453,26 +2473,50 @@ struct dm_table *dm_swap_table(struct mapped_device *md, struct dm_table *table)
* Functions to lock and unlock any filesystem running on the
* device.
*/
-static int lock_fs(struct mapped_device *md)
+static int lock_bdev_fs(struct mapped_device *md, struct block_device *bdev)
{
int r;
WARN_ON(test_bit(DMF_FROZEN, &md->flags));
- r = freeze_bdev(md->disk->part0);
+ r = freeze_bdev(bdev);
if (!r)
set_bit(DMF_FROZEN, &md->flags);
return r;
}
-static void unlock_fs(struct mapped_device *md)
+static void unlock_bdev_fs(struct mapped_device *md, struct block_device *bdev)
{
if (!test_bit(DMF_FROZEN, &md->flags))
return;
- thaw_bdev(md->disk->part0);
+ thaw_bdev(bdev);
clear_bit(DMF_FROZEN, &md->flags);
}
+static inline int lock_fs(struct mapped_device *md)
+{
+ return lock_bdev_fs(md, md->disk->part0);
+}
+
+static inline void unlock_fs(struct mapped_device *md)
+{
+ unlock_bdev_fs(md, md->disk->part0);
+}
+
+static inline struct block_device *get_interposed_bdev(struct dm_table *t)
+{
+ struct dm_dev_internal *dd;
+
+ /*
+ * An interposer should have only one device in the dm table
+ */
+ list_for_each_entry(dd, dm_table_get_devices(t), list)
+ if (dd->dm_dev->interpose)
+ return bdgrab(dd->dm_dev->bdev);
+
+ return NULL;
+}
+
/*
* @suspend_flags: DM_SUSPEND_LOCKFS_FLAG and/or DM_SUSPEND_NOFLUSH_FLAG
* @task_state: e.g. TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE
@@ -2488,7 +2532,10 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
{
bool do_lockfs = suspend_flags & DM_SUSPEND_LOCKFS_FLAG;
bool noflush = suspend_flags & DM_SUSPEND_NOFLUSH_FLAG;
- int r;
+ bool detach_ip = suspend_flags & DM_SUSPEND_DETACH_IP_FLAG
+ && md->interpose;
+ struct block_device *original_bdev = NULL;
+ int r = 0;
lockdep_assert_held(&md->suspend_lock);
@@ -2507,18 +2554,50 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
*/
dm_table_presuspend_targets(map);
+ if (!md->interpose) {
+ /*
+ * Flush I/O to the device.
+ * Any I/O submitted after lock_fs() may not be flushed.
+ * noflush takes precedence over do_lockfs.
+ * (lock_fs() flushes I/Os and waits for them to complete.)
+ */
+ if (!noflush && do_lockfs)
+ r = lock_fs(md);
+ } else if (map) {
+ /*
+ * Interposer should not lock mapped device, but
+ * should freeze interposed device and lock it.
+ */
+ original_bdev = get_interposed_bdev(map);
+ if (!original_bdev) {
+ r = -EINVAL;
+ DMERR("%s: interposer cannot get interposed device from table",
+ dm_device_name(md));
+ goto presuspend_undo;
+ }
+
+ if (!noflush && do_lockfs) {
+ r = lock_bdev_fs(md, original_bdev);
+ if (r) {
+ DMERR("%s: interposer cannot freeze interposed device",
+ dm_device_name(md));
+ goto presuspend_undo;
+ }
+ }
+
+ bdev_interposer_lock(original_bdev);
+ }
/*
- * Flush I/O to the device.
- * Any I/O submitted after lock_fs() may not be flushed.
- * noflush takes precedence over do_lockfs.
- * (lock_fs() flushes I/Os and waits for them to complete.)
+ * If map is not initialized, then we cannot suspend
+ * interposed device
*/
- if (!noflush && do_lockfs) {
- r = lock_fs(md);
- if (r) {
- dm_table_presuspend_undo_targets(map);
- return r;
- }
+
+presuspend_undo:
+ if (r) {
+ if (original_bdev)
+ bdput(original_bdev);
+ dm_table_presuspend_undo_targets(map);
+ return r;
}
/*
@@ -2559,14 +2638,40 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
if (map)
synchronize_srcu(&md->io_barrier);
- /* were we interrupted ? */
- if (r < 0) {
+ if (r == 0) { /* the wait ended successfully */
+ if (md->interpose && original_bdev) {
+ if (detach_ip) {
+ bdev_interposer_detach(original_bdev);
+ clear_bit(DMF_INTERPOSER_ATTACHED, &md->flags);
+ }
+
+ bdev_interposer_unlock(original_bdev);
+
+ if (detach_ip) {
+ /*
+ * If the interposer is detached, then there is
+ * no reason in keeping the queue of the
+ * interposed device stopped.
+ */
+ unlock_bdev_fs(md, original_bdev);
+ }
+
+ bdput(original_bdev);
+ }
+ } else { /* were we interrupted ? */
dm_queue_flush(md);
if (dm_request_based(md))
dm_start_queue(md->queue);
- unlock_fs(md);
+ if (md->interpose && original_bdev) {
+ bdev_interposer_unlock(original_bdev);
+ unlock_bdev_fs(md, original_bdev);
+
+ bdput(original_bdev);
+ } else
+ unlock_fs(md);
+
dm_table_presuspend_undo_targets(map);
/* pushback list is already flushed, so skip flush */
}
@@ -2574,6 +2679,47 @@ static int __dm_suspend(struct mapped_device *md, struct dm_table *map,
return r;
}
+static struct block_device *__dm_get_original_bdev(struct mapped_device *md)
+{
+ struct dm_table *map;
+ struct block_device *original_bdev = NULL;
+
+ map = rcu_dereference_protected(md->map,
+ lockdep_is_held(&md->suspend_lock));
+ if (!map) {
+ DMERR("%s: interposer table is not initialized",
+ dm_device_name(md));
+ return ERR_PTR(-EINVAL);
+ }
+
+ original_bdev = get_interposed_bdev(map);
+ if (!original_bdev) {
+ DMERR("%s: interposer cannot get interposed device from table",
+ dm_device_name(md));
+ return ERR_PTR(-EINVAL);
+ }
+
+ return original_bdev;
+}
+
+static int __dm_detach_interposer(struct mapped_device *md)
+{
+ struct block_device *original_bdev;
+
+ original_bdev = __dm_get_original_bdev(md);
+ if (IS_ERR(original_bdev))
+ return PTR_ERR(original_bdev);
+
+ bdev_interposer_lock(original_bdev);
+
+ bdev_interposer_detach(original_bdev);
+ clear_bit(DMF_INTERPOSER_ATTACHED, &md->flags);
+
+ bdev_interposer_unlock(original_bdev);
+
+ bdput(original_bdev);
+ return 0;
+}
/*
* We need to be able to change a mapping table under a mounted
* filesystem. For example we might want to move some data in
@@ -2599,7 +2745,17 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
mutex_lock_nested(&md->suspend_lock, SINGLE_DEPTH_NESTING);
if (dm_suspended_md(md)) {
- r = -EINVAL;
+ if (suspend_flags & DM_SUSPEND_DETACH_IP_FLAG) {
+ /*
+ * If the mapped device is suspended but should be
+ * detached, we just detach without freezing the fs
+ * on the interposed device.
+ */
+ if (dm_interposer_attached(md))
+ r = __dm_detach_interposer(md);
+ } else
+ r = -EINVAL;
+
goto out_unlock;
}
@@ -2629,8 +2785,11 @@ int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
static int __dm_resume(struct mapped_device *md, struct dm_table *map)
{
+ int r = 0;
+ struct block_device *original_bdev;
+
if (map) {
- int r = dm_table_resume_targets(map);
+ r = dm_table_resume_targets(map);
if (r)
return r;
}
@@ -2645,9 +2804,33 @@ static int __dm_resume(struct mapped_device *md, struct dm_table *map)
if (dm_request_based(md))
dm_start_queue(md->queue);
- unlock_fs(md);
+ if (!md->interpose) {
+ unlock_fs(md);
+ return 0;
+ }
- return 0;
+ original_bdev = __dm_get_original_bdev(md);
+ if (IS_ERR(original_bdev))
+ return PTR_ERR(original_bdev);
+
+ if (!dm_interposer_attached(md)) {
+ bdev_interposer_lock(original_bdev);
+
+ r = bdev_interposer_attach(original_bdev, dm_disk(md)->part0);
+ if (r)
+ DMERR("%s: failed to attach interposer",
+ dm_device_name(md));
+ else
+ set_bit(DMF_INTERPOSER_ATTACHED, &md->flags);
+
+ bdev_interposer_unlock(original_bdev);
+ }
+
+ unlock_bdev_fs(md, original_bdev);
+
+ bdput(original_bdev);
+
+ return r;
}
int dm_resume(struct mapped_device *md)
@@ -2880,6 +3063,11 @@ int dm_suspended_md(struct mapped_device *md)
return test_bit(DMF_SUSPENDED, &md->flags);
}
+int dm_interposer_attached(struct mapped_device *md)
+{
+ return test_bit(DMF_INTERPOSER_ATTACHED, &md->flags);
+}
+
static int dm_post_suspending_md(struct mapped_device *md)
{
return test_bit(DMF_POST_SUSPENDING, &md->flags);
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index b441ad772c18..35f71e48abd1 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -28,6 +28,7 @@
*/
#define DM_SUSPEND_LOCKFS_FLAG (1 << 0)
#define DM_SUSPEND_NOFLUSH_FLAG (1 << 1)
+#define DM_SUSPEND_DETACH_IP_FLAG (1 << 2)
/*
* Status feature flags
@@ -122,6 +123,11 @@ int dm_deleting_md(struct mapped_device *md);
*/
int dm_suspended_md(struct mapped_device *md);
+/*
+ * Is the interposer of this mapped_device attached?
+ */
+int dm_interposer_attached(struct mapped_device *md);
+
/*
* Internal suspend and resume methods.
*/
@@ -180,7 +186,7 @@ int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only
int dm_cancel_deferred_remove(struct mapped_device *md);
int dm_request_based(struct mapped_device *md);
int dm_get_table_device(struct mapped_device *md, dev_t dev, fmode_t mode,
- struct dm_dev **result);
+ bool interpose, struct dm_dev **result);
void dm_put_table_device(struct mapped_device *md, struct dm_dev *d);
int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action,
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 5c641f930caf..aa94c7e10ecc 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -159,6 +159,7 @@ struct dm_dev {
struct block_device *bdev;
struct dax_device *dax_dev;
fmode_t mode;
+ bool interpose;
char name[16];
};
@@ -168,8 +169,13 @@ dev_t dm_get_dev_t(const char *path);
* Constructors should call these functions to ensure destination devices
* are opened/closed correctly.
*/
-int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
- struct dm_dev **result);
+int dm_get_device_ex(struct dm_target *ti, const char *path, fmode_t mode,
+ bool interpose, struct dm_dev **result);
+static inline int dm_get_device(struct dm_target *ti, const char *path,
+ fmode_t mode, struct dm_dev **result)
+{
+ return dm_get_device_ex(ti, path, mode, false, result);
+}
void dm_put_device(struct dm_target *ti, struct dm_dev *d);
/*
@@ -550,6 +556,7 @@ sector_t dm_table_get_size(struct dm_table *t);
unsigned int dm_table_get_num_targets(struct dm_table *t);
fmode_t dm_table_get_mode(struct dm_table *t);
struct mapped_device *dm_table_get_md(struct dm_table *t);
+bool dm_table_is_interposer(struct dm_table *t);
const char *dm_table_device_name(struct dm_table *t);
/*
diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h
index fcff6669137b..7f88f3d2d852 100644
--- a/include/uapi/linux/dm-ioctl.h
+++ b/include/uapi/linux/dm-ioctl.h
@@ -362,4 +362,10 @@ enum {
*/
#define DM_INTERNAL_SUSPEND_FLAG (1 << 18) /* Out */
+/*
+ * If set, the underlying device is opened without FMODE_EXCL
+ * and the mapped device is attached via blk_interposer.
+ */
+#define DM_INTERPOSE_FLAG (1 << 19) /* In/Out */
+
#endif /* _LINUX_DM_IOCTL_H */
--
2.20.1
In order to prevent the same bio request from being intercepted multiple
times, the BIO_INTERPOSED flag was added.
The blk_partition_remap() function was moved from submit_bio_checks()
to submit_bio_noacct(). This allows the interposer to receive the bio
request unchanged.
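The effect of the flag and of the new ordering can be sketched with a
simplified userspace model. It is an illustration only: struct
model_bdev, struct model_bio and model_submit() are hypothetical names,
the interposer here simply records what it saw and passes the same bio
back down, and cloning, locking and queue handling are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct block_device; not the kernel type. */
struct model_bdev {
	unsigned long long start_sect;   /* partition offset on the disk */
	struct model_bdev *interposer;   /* NULL when nothing is attached */
	unsigned long long seen_sector;  /* last sector seen by this device */
	int seen_count;                  /* how many bios this device saw */
};

/* Hypothetical stand-in for struct bio. */
struct model_bio {
	struct model_bdev *bdev;
	unsigned long long sector;
	bool interposed;  /* models BIO_INTERPOSED */
	bool remapped;    /* models BIO_REMAPPED */
};

void model_submit(struct model_bio *bio)
{
	struct model_bdev *orig;

	assert(bio && bio->bdev);
	orig = bio->bdev;

	/* Interception comes first, so the interposer sees the bio unchanged. */
	if (orig->interposer && !bio->interposed) {
		struct model_bdev *ip = orig->interposer;

		ip->seen_sector = bio->sector;
		ip->seen_count++;
		/* Mark the bio so that it is not intercepted a second time. */
		bio->interposed = true;
		model_submit(bio);
		return;
	}

	/* Only now is the partition-relative sector turned into a disk sector. */
	if (!bio->remapped) {
		bio->sector += orig->start_sect;
		bio->remapped = true;
	}
	orig->seen_sector = bio->sector;
	orig->seen_count++;
}
```

With a partition starting at sector 2048 and a bio for sector 8, the
interposer in this model observes sector 8 exactly once, and the
original device then observes sector 2056.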
The __submit_bio() and __submit_bio_noacct_mq() functions have been
removed and their functionality merged into submit_bio_noacct() and
__submit_bio_noacct() respectively. This allows bio requests from
request-based and bio-based block devices to be processed in one
common loop.
The bio_interposer_lock() and bio_interposer_unlock() functions in
submit_bio_noacct() make it possible to stop accepting new bio requests
for processing without blocking the processing of bio requests that have
already been added to current->bio_list. To keep the penalty of the new
lock to a minimum, percpu_rw_sem is used.
Signed-off-by: Sergei Shtepa <[email protected]>
---
block/bio.c | 2 +
block/blk-core.c | 194 ++++++++++++++++++++++++++---------------------
2 files changed, 108 insertions(+), 88 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 50e579088aca..6fc9e8f395a6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -640,6 +640,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
bio_set_flag(bio, BIO_THROTTLED);
if (bio_flagged(bio_src, BIO_REMAPPED))
bio_set_flag(bio, BIO_REMAPPED);
+ if (bio_flagged(bio_src, BIO_INTERPOSED))
+ bio_set_flag(bio, BIO_INTERPOSED);
bio->bi_opf = bio_src->bi_opf;
bio->bi_ioprio = bio_src->bi_ioprio;
bio->bi_write_hint = bio_src->bi_write_hint;
diff --git a/block/blk-core.c b/block/blk-core.c
index fc60ff208497..a987daa76a79 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -735,26 +735,27 @@ static inline int bio_check_eod(struct bio *bio)
handle_bad_sector(bio, maxsector);
return -EIO;
}
+
+ if (unlikely(should_fail_request(bio->bi_bdev, bio->bi_iter.bi_size)))
+ return -EIO;
+
return 0;
}
/*
* Remap block n of partition p to block n+start(p) of the disk.
*/
-static int blk_partition_remap(struct bio *bio)
+static inline void blk_partition_remap(struct bio *bio)
{
- struct block_device *p = bio->bi_bdev;
+ struct block_device *bdev = bio->bi_bdev;
- if (unlikely(should_fail_request(p, bio->bi_iter.bi_size)))
- return -EIO;
- if (bio_sectors(bio)) {
- bio->bi_iter.bi_sector += p->bd_start_sect;
- trace_block_bio_remap(bio, p->bd_dev,
+ if (bdev->bd_partno && bio_sectors(bio)) {
+ bio->bi_iter.bi_sector += bdev->bd_start_sect;
+ trace_block_bio_remap(bio, bdev->bd_dev,
bio->bi_iter.bi_sector -
- p->bd_start_sect);
+ bdev->bd_start_sect);
}
bio_set_flag(bio, BIO_REMAPPED);
- return 0;
}
/*
@@ -819,8 +820,6 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
if (!bio_flagged(bio, BIO_REMAPPED)) {
if (unlikely(bio_check_eod(bio)))
goto end_io;
- if (bdev->bd_partno && unlikely(blk_partition_remap(bio)))
- goto end_io;
}
/*
@@ -910,20 +909,6 @@ static noinline_for_stack bool submit_bio_checks(struct bio *bio)
return false;
}
-static blk_qc_t __submit_bio(struct bio *bio)
-{
- struct gendisk *disk = bio->bi_bdev->bd_disk;
- blk_qc_t ret = BLK_QC_T_NONE;
-
- if (blk_crypto_bio_prep(&bio)) {
- if (!disk->fops->submit_bio)
- return blk_mq_submit_bio(bio);
- ret = disk->fops->submit_bio(bio);
- }
- blk_queue_exit(disk->queue);
- return ret;
-}
-
/*
* The loop in this function may be a bit non-obvious, and so deserves some
* explanation:
@@ -931,7 +916,7 @@ static blk_qc_t __submit_bio(struct bio *bio)
* - Before entering the loop, bio->bi_next is NULL (as all callers ensure
* that), so we have a list with a single bio.
* - We pretend that we have just taken it off a longer list, so we assign
- * bio_list to a pointer to the bio_list_on_stack, thus initialising the
+ * bio_list to a pointer to the current->bio_list, thus initialising the
* bio_list of new bios to be added. ->submit_bio() may indeed add some more
* bios through a recursive call to submit_bio_noacct. If it did, we find a
* non-NULL value in bio_list and re-enter the loop from the top.
@@ -939,83 +924,75 @@ static blk_qc_t __submit_bio(struct bio *bio)
* pretending) and so remove it from bio_list, and call into ->submit_bio()
* again.
*
- * bio_list_on_stack[0] contains bios submitted by the current ->submit_bio.
- * bio_list_on_stack[1] contains bios that were submitted before the current
+ * current->bio_list[0] contains bios submitted by the current ->submit_bio.
+ * current->bio_list[1] contains bios that were submitted before the current
* ->submit_bio_bio, but that haven't been processed yet.
*/
static blk_qc_t __submit_bio_noacct(struct bio *bio)
{
- struct bio_list bio_list_on_stack[2];
- blk_qc_t ret = BLK_QC_T_NONE;
-
- BUG_ON(bio->bi_next);
-
- bio_list_init(&bio_list_on_stack[0]);
- current->bio_list = bio_list_on_stack;
-
- do {
- struct request_queue *q = bio->bi_bdev->bd_disk->queue;
- struct bio_list lower, same;
+ struct gendisk *disk = bio->bi_bdev->bd_disk;
+ struct bio_list lower, same;
+ blk_qc_t ret;
- if (unlikely(bio_queue_enter(bio) != 0))
- continue;
+ if (!blk_crypto_bio_prep(&bio)) {
+ blk_queue_exit(disk->queue);
+ return BLK_QC_T_NONE;
+ }
- /*
- * Create a fresh bio_list for all subordinate requests.
- */
- bio_list_on_stack[1] = bio_list_on_stack[0];
- bio_list_init(&bio_list_on_stack[0]);
+ if (queue_is_mq(disk->queue))
+ return blk_mq_submit_bio(bio);
- ret = __submit_bio(bio);
+ /*
+ * Create a fresh bio_list for all subordinate requests.
+ */
+ current->bio_list[1] = current->bio_list[0];
+ bio_list_init(&current->bio_list[0]);
- /*
- * Sort new bios into those for a lower level and those for the
- * same level.
- */
- bio_list_init(&lower);
- bio_list_init(&same);
- while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
- if (q == bio->bi_bdev->bd_disk->queue)
- bio_list_add(&same, bio);
- else
- bio_list_add(&lower, bio);
+ WARN_ON_ONCE(!disk->fops->submit_bio);
+ ret = disk->fops->submit_bio(bio);
+ blk_queue_exit(disk->queue);
+ /*
+ * Sort new bios into those for a lower level and those
+ * for the same level.
+ */
+ bio_list_init(&lower);
+ bio_list_init(&same);
+ while ((bio = bio_list_pop(&current->bio_list[0])) != NULL)
+ if (disk->queue == bio->bi_bdev->bd_disk->queue)
+ bio_list_add(&same, bio);
+ else
+ bio_list_add(&lower, bio);
- /*
- * Now assemble so we handle the lowest level first.
- */
- bio_list_merge(&bio_list_on_stack[0], &lower);
- bio_list_merge(&bio_list_on_stack[0], &same);
- bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
- } while ((bio = bio_list_pop(&bio_list_on_stack[0])));
+ /*
+ * Now assemble so we handle the lowest level first.
+ */
+ bio_list_merge(&current->bio_list[0], &lower);
+ bio_list_merge(&current->bio_list[0], &same);
+ bio_list_merge(&current->bio_list[0], &current->bio_list[1]);
- current->bio_list = NULL;
return ret;
}
-static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
+static inline struct block_device *bio_interposer_lock(struct bio *bio)
{
- struct bio_list bio_list[2] = { };
- blk_qc_t ret = BLK_QC_T_NONE;
-
- current->bio_list = bio_list;
-
- do {
- struct gendisk *disk = bio->bi_bdev->bd_disk;
-
- if (unlikely(bio_queue_enter(bio) != 0))
- continue;
+ bool locked;
+ struct block_device *bdev = bio->bi_bdev;
- if (!blk_crypto_bio_prep(&bio)) {
- blk_queue_exit(disk->queue);
- ret = BLK_QC_T_NONE;
- continue;
+ if (bio->bi_opf & REQ_NOWAIT) {
+ locked = percpu_down_read_trylock(&bdev->bd_interposer_lock);
+ if (unlikely(!locked)) {
+ bio_wouldblock_error(bio);
+ return NULL;
}
+ } else
+ percpu_down_read(&bdev->bd_interposer_lock);
- ret = blk_mq_submit_bio(bio);
- } while ((bio = bio_list_pop(&bio_list[0])));
+ return bdev;
+}
- current->bio_list = NULL;
- return ret;
+static inline void bio_interposer_unlock(struct block_device *locked_bdev)
+{
+ percpu_up_read(&locked_bdev->bd_interposer_lock);
}
/**
@@ -1029,6 +1006,10 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
*/
blk_qc_t submit_bio_noacct(struct bio *bio)
{
+ struct block_device *locked_bdev;
+ struct bio_list bio_list_on_stack[2] = { };
+ blk_qc_t ret = BLK_QC_T_NONE;
+
if (!submit_bio_checks(bio))
return BLK_QC_T_NONE;
@@ -1043,9 +1024,46 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
return BLK_QC_T_NONE;
}
- if (!bio->bi_bdev->bd_disk->fops->submit_bio)
- return __submit_bio_noacct_mq(bio);
- return __submit_bio_noacct(bio);
+ BUG_ON(bio->bi_next);
+
+ locked_bdev = bio_interposer_lock(bio);
+ if (!locked_bdev)
+ return BLK_QC_T_NONE;
+
+ current->bio_list = bio_list_on_stack;
+
+ do {
+ if (unlikely(bio_queue_enter(bio) != 0)) {
+ ret = BLK_QC_T_NONE;
+ continue;
+ }
+
+ if (!bio_flagged(bio, BIO_INTERPOSED) &&
+ bio->bi_bdev->bd_interposer) {
+ struct gendisk *disk = bio->bi_bdev->bd_disk;
+
+ bio_set_dev(bio, bio->bi_bdev->bd_interposer);
+ bio_set_flag(bio, BIO_INTERPOSED);
+
+ bio_list_add(&bio_list_on_stack[0], bio);
+
+ blk_queue_exit(disk->queue);
+ ret = BLK_QC_T_NONE;
+ continue;
+ }
+
+ if (!bio_flagged(bio, BIO_REMAPPED))
+ blk_partition_remap(bio);
+
+ ret = __submit_bio_noacct(bio);
+
+ } while ((bio = bio_list_pop(&bio_list_on_stack[0])));
+
+ current->bio_list = NULL;
+
+ bio_interposer_unlock(locked_bdev);
+
+ return ret;
}
EXPORT_SYMBOL(submit_bio_noacct);
--
2.20.1
Two fields were added to the block_device structure: bd_interposer and
bd_interposer_lock. The bd_interposer field holds a pointer to the
interposer block device. bd_interposer_lock is a lock that makes it safe
to attach and detach the interposer device.
The new functions bdev_interposer_attach() and bdev_interposer_detach()
attach and detach an interposer device. Before calling either of them,
the processing of bio requests by the block device must be locked with
the bdev_interposer_lock() function.
The BIO_INTERPOSED flag means that the bio request has already been
interposed. This flag prevents recursive interception of bio requests.
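The intended calling order can be sketched with a small userspace model. All model_* names are hypothetical; a flag stands in for holding bd_interposer_lock for writing and a counter for the reference taken by bdgrab() and dropped by bdput(). The assertions that the lock is held are an addition of this sketch; the functions in this patch do not check it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of struct block_device. */
struct model_bdev {
	struct model_bdev *interposer; /* models bd_interposer */
	bool write_locked;             /* models bd_interposer_lock held for write */
	int refs;                      /* models the bdgrab()/bdput() reference */
};

/* Models bdev_interposer_lock(). */
static void model_lock(struct model_bdev *bdev)
{
	assert(!bdev->write_locked);
	bdev->write_locked = true;
}

/* Models bdev_interposer_unlock(). */
static void model_unlock(struct model_bdev *bdev)
{
	assert(bdev->write_locked);
	bdev->write_locked = false;
}

/* Mirrors bdev_interposer_attach(): a second interposer gets -EBUSY. */
static int model_attach(struct model_bdev *original,
			struct model_bdev *interposer)
{
	assert(original->write_locked); /* sketch-only check */
	if (original->interposer)
		return -EBUSY;
	interposer->refs++;
	original->interposer = interposer;
	return 0;
}

/* Mirrors bdev_interposer_detach(): a no-op if nothing is attached. */
static void model_detach(struct model_bdev *original)
{
	assert(original->write_locked); /* sketch-only check */
	if (!original->interposer)
		return;
	original->interposer->refs--;
	original->interposer = NULL;
}

/* The sequence a caller would follow: lock, attach, unlock. */
static int model_attach_locked(struct model_bdev *original,
			       struct model_bdev *interposer)
{
	int ret;

	model_lock(original);
	ret = model_attach(original, interposer);
	model_unlock(original);
	return ret;
}
```

Detaching follows the same pattern: take the lock, call the detach function, release the lock. In the model, a second attach to an already interposed device returns -EBUSY and leaves the reference count of the rejected device untouched.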
Signed-off-by: Sergei Shtepa <[email protected]>
---
block/genhd.c | 52 +++++++++++++++++++++++++++++++++++++++
fs/block_dev.c | 3 +++
include/linux/blk_types.h | 6 +++++
include/linux/blkdev.h | 32 ++++++++++++++++++++++++
4 files changed, 93 insertions(+)
diff --git a/block/genhd.c b/block/genhd.c
index 8c8f543572e6..3ec77947b3ba 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1938,3 +1938,55 @@ static void disk_release_events(struct gendisk *disk)
WARN_ON_ONCE(disk->ev && disk->ev->block != 1);
kfree(disk->ev);
}
+
+/**
+ * bdev_interposer_attach - Attach an interposer block device to original
+ * @original: original block device
+ * @interposer: interposer block device
+ *
+ * Before attaching an interposer, it is necessary to lock the processing
+ * of bio requests of the original device by calling bdev_interposer_lock().
+ *
+ * Use the bdev_interposer_detach() function to detach the interposer
+ * from the original block device.
+ */
+int bdev_interposer_attach(struct block_device *original,
+ struct block_device *interposer)
+{
+ struct block_device *bdev;
+
+ WARN_ON(!original);
+ if (original->bd_interposer)
+ return -EBUSY;
+
+ bdev = bdgrab(interposer);
+ if (!bdev)
+ return -ENODEV;
+
+ original->bd_interposer = bdev;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(bdev_interposer_attach);
+
+/**
+ * bdev_interposer_detach - Detach interposer from block device
+ * @original: original block device
+ *
+ * Before detaching an interposer, it is necessary to lock the processing
+ * of bio requests of the original device by calling bdev_interposer_lock().
+ *
+ * The interposer should be attached using the bdev_interposer_attach()
+ * function.
+ */
+void bdev_interposer_detach(struct block_device *original)
+{
+ if (WARN_ON(!original))
+ return;
+
+ if (!original->bd_interposer)
+ return;
+
+ bdput(original->bd_interposer);
+ original->bd_interposer = NULL;
+}
+EXPORT_SYMBOL_GPL(bdev_interposer_detach);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 09d6f7229db9..a98a56cc634f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -809,6 +809,7 @@ static void bdev_free_inode(struct inode *inode)
{
struct block_device *bdev = I_BDEV(inode);
+ percpu_free_rwsem(&bdev->bd_interposer_lock);
free_percpu(bdev->bd_stats);
kfree(bdev->bd_meta_info);
@@ -909,6 +910,8 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
iput(inode);
return NULL;
}
+ bdev->bd_interposer = NULL;
+ percpu_init_rwsem(&bdev->bd_interposer_lock);
return bdev;
}
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index db026b6ec15a..8e4309eb3b18 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -46,6 +46,11 @@ struct block_device {
spinlock_t bd_size_lock; /* for bd_inode->i_size updates */
struct gendisk * bd_disk;
struct backing_dev_info *bd_bdi;
+ /* The interposer allows bios to be redirected to another device */
+ struct block_device *bd_interposer;
+ /* Locks the queue of the block device to attach or detach an interposer.
+ * Allows the interposer to be safely suspended and flushed. */
+ struct percpu_rw_semaphore bd_interposer_lock;
/* The counter of freeze processes */
int bd_fsfreeze_count;
@@ -304,6 +309,7 @@ enum {
BIO_CGROUP_ACCT, /* has been accounted to a cgroup */
BIO_TRACKED, /* set if bio goes through the rq_qos path */
BIO_REMAPPED,
+ BIO_INTERPOSED, /* bio was reassigned to another block device */
BIO_FLAG_LAST
};
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 158aefae1030..3e38b0c40b9d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -2029,4 +2029,36 @@ int fsync_bdev(struct block_device *bdev);
int freeze_bdev(struct block_device *bdev);
int thaw_bdev(struct block_device *bdev);
+/**
+ * bdev_interposer_lock - Lock bio processing
+ * @bdev: locking block device
+ *
+ * Lock the bio processing in submit_bio_noacct() for the new requests in the
+ * original block device. Requests from the interposer will not be locked.
+ *
+ * To unlock, use the bdev_interposer_unlock() function.
+ *
+ * This lock should be used to attach/detach the interposer to the device.
+ */
+static inline void bdev_interposer_lock(struct block_device *bdev)
+{
+ percpu_down_write(&bdev->bd_interposer_lock);
+}
+
+/**
+ * bdev_interposer_unlock - Unlock bio processing
+ * @bdev: locked block device
+ *
+ * Unlock the bio processing that was locked by bdev_interposer_lock() function.
+ *
+ * This lock should be used to attach/detach the interposer to the device.
+ */
+static inline void bdev_interposer_unlock(struct block_device *bdev)
+{
+ percpu_up_write(&bdev->bd_interposer_lock);
+}
+
+int bdev_interposer_attach(struct block_device *original,
+ struct block_device *interposer);
+void bdev_interposer_detach(struct block_device *original);
#endif /* _LINUX_BLKDEV_H */
--
2.20.1
Hi Mike,
This is a follow-up message.
Did you have a chance to take a look at the latest patchset from April?
Thanks!
I can update the patch to be compatible with kernel 5.14.
I would like to know whether you are still interested in blk_interposer.
--
Sergei Shtepa
Veeam Software developer.