This work aims to allow userspace to create and destroy block devices
in a race-free way, and to allow them to be exposed to other Xen VMs via
blkback without races.
Changes since v1:
- Several device-mapper fixes added.
- The diskseq is now a separate Xenstore node, rather than being part of
physical-device.
- Potentially backwards-incompatible changes to device-mapper now
require userspace opt-in.
- The code has been tested: I have a block script written in C that uses
these changes to successfully boot a Xen VM.
- The core block layer is almost completely untouched. Instead of
exposing a block device inode directly to userspace, device-mapper
ioctls that create a block device now return that device's diskseq.
Userspace can then use that diskseq to safely open the device.
Furthermore, ioctls that operate on an existing device-mapper device
now accept a diskseq parameter, which can be used to prevent races.
There are a few changes that make device-mapper's table validation
stricter. Two of them are clear-cut fixes for memory safety bugs: one
prevents a misaligned pointer dereference in the kernel, and the other
prevents pointer arithmetic overflow that could cause the kernel to
dereference a userspace pointer, especially on 32-bit systems. One
fixes a double-fetch bug that happens to be harmless right now, but
would make distribution backports of subsequent patches unsafe. The
remaining fixes prevent totally nonsensical tables from being accepted.
This includes parameter strings that overlap the subsequent target spec,
and target specs that overlap the 'struct dm_ioctl' or each other. I
doubt there is any userspace extant that generates such tables.
Finally, one patch forbids device-mapper devices to be named ".", "..",
or "control". Since device-mapper devices are often accessed via
/dev/mapper/NAME, such names would likely greatly confuse userspace. I
consider this to be an extension of the existing check that prohibits
device mapper names or UUIDs from containing '/'.
Demi Marie Obenour (16):
device-mapper: Check that target specs are sufficiently aligned
device-mapper: Avoid pointer arithmetic overflow
device-mapper: do not allow targets to overlap 'struct dm_ioctl'
device-mapper: Better error message for too-short target spec
device-mapper: Target parameters must not overlap next target spec
device-mapper: Avoid double-fetch of version
device-mapper: Allow userspace to opt-in to strict parameter checks
device-mapper: Allow userspace to provide expected diskseq
device-mapper: Allow userspace to suppress uevent generation
device-mapper: Refuse to create device named "control"
device-mapper: "." and ".." are not valid symlink names
device-mapper: inform caller about already-existing device
xen-blkback: Implement diskseq checks
block, loop: Increment diskseq when releasing a loop device
xen-blkback: Minor cleanups
xen-blkback: Inform userspace that device has been opened
block/genhd.c | 1 +
drivers/block/loop.c | 6 +
drivers/block/xen-blkback/blkback.c | 8 +-
drivers/block/xen-blkback/xenbus.c | 147 ++++++++--
drivers/md/dm-core.h | 2 +
drivers/md/dm-ioctl.c | 432 ++++++++++++++++++++++------
drivers/md/dm.c | 5 +-
include/linux/device-mapper.h | 2 +-
include/uapi/linux/dm-ioctl.h | 67 ++++-
9 files changed, 551 insertions(+), 119 deletions(-)
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
Using either of these is going to greatly confuse userspace, as they are
not valid symlink names and so creating the usual /dev/mapper/NAME
symlink will not be possible. As creating a device with either of these
names is almost certainly a userspace bug, just error out.
Signed-off-by: Demi Marie Obenour <[email protected]>
---
drivers/md/dm-ioctl.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 9ae00e3c1a72c19575814cf473774835b364320b..17ece816d490b6c40d019da131ade44c9a201dab 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -775,8 +775,10 @@ static int check_name(const char *name)
return -EINVAL;
}
- if (strcmp(name, DM_CONTROL_NODE) == 0) {
- DMERR("device name cannot be \"%s\"", DM_CONTROL_NODE);
+ if (strcmp(name, DM_CONTROL_NODE) == 0 ||
+ strcmp(name, ".") == 0 ||
+ strcmp(name, "..") == 0) {
+ DMERR("device name cannot be \"%s\", \".\", or \"..\"", DM_CONTROL_NODE);
return -EINVAL;
}
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
Not only is this helpful for debugging, it also saves the caller an
ioctl in the case where a device should be used if it exists or created
otherwise. To ensure existing userspace is not broken, this feature is
only enabled in strict mode.
Signed-off-by: Demi Marie Obenour <[email protected]>
---
drivers/md/dm-ioctl.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 17ece816d490b6c40d019da131ade44c9a201dab..44425093d3b908abf80e05e1fc99a26b17e18a42 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -256,11 +256,13 @@ static void free_cell(struct hash_cell *hc)
}
}
+static void __dev_status(struct mapped_device *md, struct dm_ioctl *param);
+
/*
* The kdev_t and uuid of a device can never change once it is
* initially inserted.
*/
-static int dm_hash_insert(const char *name, const char *uuid, struct mapped_device *md)
+static int dm_hash_insert(const char *name, const char *uuid, struct mapped_device *md, struct dm_ioctl *param)
{
struct hash_cell *cell, *hc;
@@ -277,6 +279,8 @@ static int dm_hash_insert(const char *name, const char *uuid, struct mapped_devi
down_write(&_hash_lock);
hc = __get_name_cell(name);
if (hc) {
+ if (param)
+ __dev_status(hc->md, param);
dm_put(hc->md);
goto bad;
}
@@ -287,6 +291,8 @@ static int dm_hash_insert(const char *name, const char *uuid, struct mapped_devi
hc = __get_uuid_cell(uuid);
if (hc) {
__unlink_name(cell);
+ if (param)
+ __dev_status(hc->md, param);
dm_put(hc->md);
goto bad;
}
@@ -901,12 +907,14 @@ static int dev_create(struct file *filp, struct dm_ioctl *param, size_t param_si
m = MINOR(huge_decode_dev(param->dev));
r = dm_create(m, &md, param->flags & DM_DISABLE_UEVENTS_FLAG);
- if (r)
+ if (r) {
+ DMERR("Could not create device-mapper device");
return r;
+ }
param->flags &= ~DM_INACTIVE_PRESENT_FLAG;
- r = dm_hash_insert(param->name, *param->uuid ? param->uuid : NULL, md);
+ r = dm_hash_insert(param->name, *param->uuid ? param->uuid : NULL, md, param);
if (r) {
dm_put(md);
dm_destroy(md);
@@ -2269,7 +2277,6 @@ static int ctl_ioctl(struct file *file, uint command, struct dm_ioctl __user *us
goto out;
/* This XOR keeps only the flags validate_params has changed. */
old_flags ^= param->flags;
-
param->data_size = sloppy_checks(param) ? offsetof(struct dm_ioctl, data) : sizeof(struct dm_ioctl);
r = fn(file, param, input_param_size);
@@ -2284,9 +2291,14 @@ static int ctl_ioctl(struct file *file, uint command, struct dm_ioctl __user *us
param->flags |= old_flags;
/*
- * Copy the results back to userland.
+ * Copy the results back to userland if either:
+ *
+ * - The ioctl succeeded.
+ * - The ioctl is DM_DEV_CREATE, the return value is -EBUSY,
+ * and strict parameter checking is enabled.
*/
- if (!r && copy_to_user(user, param, param->data_size))
+ if ((!r || (!sloppy_checks(param) && cmd == DM_DEV_CREATE_CMD && r == -EBUSY)) &&
+ copy_to_user(user, param, param->data_size))
r = -EFAULT;
out:
@@ -2465,7 +2477,7 @@ int __init dm_early_create(struct dm_ioctl *dmi,
return r;
/* hash insert */
- r = dm_hash_insert(dmi->name, *dmi->uuid ? dmi->uuid : NULL, md);
+ r = dm_hash_insert(dmi->name, *dmi->uuid ? dmi->uuid : NULL, md, NULL);
if (r)
goto err_destroy_dm;
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
Typical userspace setups create a symlink under /dev/mapper with the
name of the device, but /dev/mapper/control is reserved for the control
device. Therefore, trying to create such a device is almost certain to
be a userspace bug.
Signed-off-by: Demi Marie Obenour <[email protected]>
---
drivers/md/dm-ioctl.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index 52aa5505d23b2f3d9c0faf6e8a91b74cd7845581..9ae00e3c1a72c19575814cf473774835b364320b 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -771,7 +771,12 @@ static int get_target_version(struct file *filp, struct dm_ioctl *param, size_t
static int check_name(const char *name)
{
if (strchr(name, '/')) {
- DMERR("invalid device name");
+ DMERR("device name cannot contain '/'");
+ return -EINVAL;
+ }
+
+ if (strcmp(name, DM_CONTROL_NODE) == 0) {
+ DMERR("device name cannot be \"%s\"", DM_CONTROL_NODE);
return -EINVAL;
}
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
This adds a couple of BUILD_BUG_ON()s and moves some arithmetic after
the validation code that checks the arithmetic’s preconditions. The
previous code was correct but could potentially trip sanitizers that
check for unsigned integer wraparound.
Signed-off-by: Demi Marie Obenour <[email protected]>
---
drivers/block/xen-blkback/blkback.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index c362f4ad80ab07bfb58caff0ed7da37dc1484fc5..ac760a08d559085ab875784f1c58cdf2ead95a43 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1342,6 +1342,8 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
nseg = req->operation == BLKIF_OP_INDIRECT ?
req->u.indirect.nr_segments : req->u.rw.nr_segments;
+ BUILD_BUG_ON(offsetof(struct blkif_request, u.rw.id) != 8);
+ BUILD_BUG_ON(offsetof(struct blkif_request, u.indirect.id) != 8);
if (unlikely(nseg == 0 && operation_flags != REQ_PREFLUSH) ||
unlikely((req->operation != BLKIF_OP_INDIRECT) &&
(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) ||
@@ -1365,13 +1367,13 @@ static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
preq.sector_number = req->u.rw.sector_number;
for (i = 0; i < nseg; i++) {
pages[i]->gref = req->u.rw.seg[i].gref;
- seg[i].nsec = req->u.rw.seg[i].last_sect -
- req->u.rw.seg[i].first_sect + 1;
- seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
if ((req->u.rw.seg[i].last_sect >= (XEN_PAGE_SIZE >> 9)) ||
(req->u.rw.seg[i].last_sect <
req->u.rw.seg[i].first_sect))
goto fail_response;
+ seg[i].nsec = req->u.rw.seg[i].last_sect -
+ req->u.rw.seg[i].first_sect + 1;
+ seg[i].offset = (req->u.rw.seg[i].first_sect << 9);
preq.nr_sects += seg[i].nsec;
}
} else {
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
On Tue, May 30, 2023 at 04:31:00PM -0400, Demi Marie Obenour wrote:
> This work aims to allow userspace to create and destroy block devices
> in a race-free way, and to allow them to be exposed to other Xen VMs via
> blkback without races.
>
> Changes since v1:
>
> - Several device-mapper fixes added.
Let's get these reviewed by the DM maintainers independently. This
series is mixing up way too many things.