2023-10-07 01:28:35

by Sarthak Kukreti

Subject: [PATCH v8 0/5] Introduce provisioning primitives

Hi,

This patch series is version 8 of the series introducing a
block-level provisioning mechanism (original posting: [1]), which is
useful for provisioning space across thinly provisioned storage
architectures (loop devices backed by sparse files, dm-thin devices,
virtio-blk). This series has minimal changes over v7 [2].

This patch series is rebased from the linux-dm/dm-6.5-provision-support
tree [3] onto commit cac405a3bfa2 ("Merge tag 'for-6.6-rc3-tag'"). In
addition, there is a new patch that allows passing an unshare intent
through via REQ_OP_PROVISION (suggested by Darrick in [4]).

[1] Original: https://lore.kernel.org/lkml/[email protected]/
[2] v7 (last series): https://lore.kernel.org/linux-fsdevel/[email protected]/
[3] linux-dm/dm-6.5-provision-support tree:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-6.5-provision-support
(with the last two WIP patches for dm-thinpool dropped as per discussion with
maintainers).
[4] https://lore.kernel.org/linux-fsdevel/20230522163710.GA11607@frogsfrogsfrogs/
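
As a quick illustration of how this is driven from userspace (not
part of the series; the device path below is a placeholder for any
device that advertises a non-zero provision limit), space is
provisioned with a mode 0 fallocate() on the block device:

/*
 * Sketch: provision 1 MiB at the start of a thinly provisioned block
 * device. Requires a kernel with this series applied; /dev/dm-1 is a
 * placeholder for a dm-thin (or similar) device.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/dev/dm-1", O_RDWR);     /* placeholder path */

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* mode 0 on a block device is translated to REQ_OP_PROVISION */
        if (fallocate(fd, 0, 0, 1 << 20) < 0)
                perror("fallocate");
        close(fd);
        return 0;
}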

Changes from v7:
- Drop dm-thinpool (will be independently developed with snapshot
support) and dm-snapshot (will not be supported) from the series.
- (By [email protected]) Fixes for block device provision limits.
- (Suggested by [email protected]) Add mechanism to pass unshare intent
via REQ_OP_PROVISION.

Sarthak Kukreti (5):
block: Don't invalidate pagecache for invalid falloc modes
block: Introduce provisioning primitives
loop: Add support for provision requests
dm: Add block provisioning support
block: Pass unshare intent via REQ_OP_PROVISION

block/blk-core.c | 5 +++
block/blk-lib.c | 55 ++++++++++++++++++++++++++++++++
block/blk-merge.c | 18 +++++++++++
block/blk-settings.c | 19 +++++++++++
block/blk-sysfs.c | 9 ++++++
block/bounce.c | 1 +
block/fops.c | 33 ++++++++++++++++----
drivers/block/loop.c | 59 ++++++++++++++++++++++++++++++++---
drivers/md/dm-crypt.c | 4 ++-
drivers/md/dm-linear.c | 1 +
drivers/md/dm-table.c | 23 ++++++++++++++
drivers/md/dm.c | 7 +++++
include/linux/bio.h | 6 ++--
include/linux/blk_types.h | 8 ++++-
include/linux/blkdev.h | 17 ++++++++++
include/linux/device-mapper.h | 17 ++++++++++
16 files changed, 268 insertions(+), 14 deletions(-)

--
2.42.0.609.gbb76f46606-goog


2023-10-07 01:28:51

by Sarthak Kukreti

Subject: [PATCH v8 3/5] loop: Add support for provision requests

Add support for provision requests to loopback devices.
Loop devices will configure provision support based on
whether the underlying block device/file can support
the provision request and upon receiving a provision bio,
will map it to the backing device/storage. For loop devices
over files, a REQ_OP_PROVISION request will translate to
a mode 0 fallocate() call on the backing file.
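
To illustrate the forwarding (hypothetical paths; assumes /dev/loop0
is already bound to a sparse backing file, e.g. via losetup, on a
kernel with this patch), the effect can be observed on the backing
file's allocated block count:

/*
 * Sketch: provision a range on an attached loop device and observe
 * the allocation in the sparse backing file.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        struct stat st;
        int fd = open("/dev/loop0", O_RDWR);

        if (fd < 0 || fallocate(fd, 0, 0, 1 << 20) < 0) {
                perror("provision");
                return 1;
        }
        close(fd);

        /* The provision request was forwarded as a mode 0 fallocate()
         * on the backing file, so its allocated size (st_blocks, in
         * 512-byte units) should now cover the provisioned range. */
        if (stat("backing.img", &st) == 0)
                printf("backing file blocks: %lld\n",
                       (long long)st.st_blocks);
        return 0;
}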

Signed-off-by: Sarthak Kukreti <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
drivers/block/loop.c | 34 +++++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 9f2d412fc560..abb4dddbd4fd 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -311,16 +311,20 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
{
/*
* We use fallocate to manipulate the space mappings used by the image
- * a.k.a. discard/zerorange.
+ * a.k.a. discard/provision/zerorange.
*/
struct file *file = lo->lo_backing_file;
int ret;

- mode |= FALLOC_FL_KEEP_SIZE;
+ if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE) &&
+ !bdev_max_discard_sectors(lo->lo_device))
+ return -EOPNOTSUPP;

- if (!bdev_max_discard_sectors(lo->lo_device))
+ if (mode == 0 && !bdev_max_provision_sectors(lo->lo_device))
return -EOPNOTSUPP;

+ mode |= FALLOC_FL_KEEP_SIZE;
+
ret = file->f_op->fallocate(file, mode, pos, blk_rq_bytes(rq));
if (unlikely(ret && ret != -EINVAL && ret != -EOPNOTSUPP))
return -EIO;
@@ -488,6 +492,8 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
FALLOC_FL_PUNCH_HOLE);
case REQ_OP_DISCARD:
return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
+ case REQ_OP_PROVISION:
+ return lo_fallocate(lo, rq, pos, 0);
case REQ_OP_WRITE:
if (cmd->use_aio)
return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
@@ -754,6 +760,25 @@ static void loop_sysfs_exit(struct loop_device *lo)
&loop_attribute_group);
}

+static void loop_config_provision(struct loop_device *lo)
+{
+ struct file *file = lo->lo_backing_file;
+ struct inode *inode = file->f_mapping->host;
+
+ /*
+ * If the backing device is a block device, mirror its provisioning
+ * capability.
+ */
+ if (S_ISBLK(inode->i_mode)) {
+ blk_queue_max_provision_sectors(lo->lo_queue,
+ bdev_max_provision_sectors(I_BDEV(inode)));
+ } else if (file->f_op->fallocate) {
+ blk_queue_max_provision_sectors(lo->lo_queue, UINT_MAX >> 9);
+ } else {
+ blk_queue_max_provision_sectors(lo->lo_queue, 0);
+ }
+}
+
static void loop_config_discard(struct loop_device *lo)
{
struct file *file = lo->lo_backing_file;
@@ -1092,6 +1117,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
blk_queue_io_min(lo->lo_queue, bsize);

loop_config_discard(lo);
+ loop_config_provision(lo);
loop_update_rotational(lo);
loop_update_dio(lo);
loop_sysfs_init(lo);
@@ -1304,6 +1330,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
}

loop_config_discard(lo);
+ loop_config_provision(lo);

/* update dio if lo_offset or transfer is changed */
__loop_update_dio(lo, lo->use_dio);
@@ -1857,6 +1884,7 @@ static blk_status_t loop_queue_rq(struct blk_mq_hw_ctx *hctx,
case REQ_OP_FLUSH:
case REQ_OP_DISCARD:
case REQ_OP_WRITE_ZEROES:
+ case REQ_OP_PROVISION:
cmd->use_aio = false;
break;
default:
--
2.42.0.609.gbb76f46606-goog

2023-10-07 01:28:57

by Sarthak Kukreti

Subject: [PATCH v8 2/5] block: Introduce provisioning primitives

Introduce the block request REQ_OP_PROVISION. The intent of this
request is to ask the underlying storage to preallocate disk space for
the given block range. Block devices that support this capability will
export a provision limit within their request queues.

This patch also adds the capability to call fallocate() in mode 0
on block devices, which will send REQ_OP_PROVISION to the block
device for the specified range.
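
For the driver-facing side, a hypothetical driver snippet
(illustrative only) advertises support by setting a non-zero limit
with the helper added below:

/*
 * Hypothetical driver snippet: a driver opts in to REQ_OP_PROVISION
 * by setting a non-zero limit with the helper added in this patch.
 * Queues keep the default limit of 0, and submit_bio_noacct()
 * rejects provision bios on such queues (BLK_STS_NOTSUPP).
 */
#include <linux/blkdev.h>

static void example_setup_limits(struct request_queue *q)
{
        /* Accept up to 1 MiB (2048 sectors) per provision bio; larger
         * bios are split in __bio_split_to_limits(). */
        blk_queue_max_provision_sectors(q, 2048);
}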

Signed-off-by: Sarthak Kukreti <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
block/blk-core.c | 5 ++++
block/blk-lib.c | 51 +++++++++++++++++++++++++++++++++++++++
block/blk-merge.c | 18 ++++++++++++++
block/blk-settings.c | 19 +++++++++++++++
block/blk-sysfs.c | 9 +++++++
block/bounce.c | 1 +
block/fops.c | 10 +++++++-
include/linux/bio.h | 6 +++--
include/linux/blk_types.h | 5 +++-
include/linux/blkdev.h | 16 ++++++++++++
10 files changed, 136 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9d51e9894ece..e1615ffa71bc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -123,6 +123,7 @@ static const char *const blk_op_name[] = {
REQ_OP_NAME(WRITE_ZEROES),
REQ_OP_NAME(DRV_IN),
REQ_OP_NAME(DRV_OUT),
+ REQ_OP_NAME(PROVISION)
};
#undef REQ_OP_NAME

@@ -792,6 +793,10 @@ void submit_bio_noacct(struct bio *bio)
if (!q->limits.max_write_zeroes_sectors)
goto not_supported;
break;
+ case REQ_OP_PROVISION:
+ if (!q->limits.max_provision_sectors)
+ goto not_supported;
+ break;
default:
break;
}
diff --git a/block/blk-lib.c b/block/blk-lib.c
index e59c3069e835..b1f720e198cd 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -343,3 +343,54 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
return ret;
}
EXPORT_SYMBOL(blkdev_issue_secure_erase);
+
+/**
+ * blkdev_issue_provision - provision a block range
+ * @bdev: blockdev to write
+ * @sector: start sector
+ * @nr_sects: number of sectors to provision
+ * @gfp_mask: memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ * Issues a provision request to the block device for the range of sectors.
+ * For thinly provisioned block devices, this acts as a signal for the
+ * underlying storage pool to allocate space for this block range.
+ */
+int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp)
+{
+ sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
+ unsigned int max_sectors = bdev_max_provision_sectors(bdev);
+ struct bio *bio = NULL;
+ struct blk_plug plug;
+ int ret = 0;
+
+ if (max_sectors == 0)
+ return -EOPNOTSUPP;
+ if ((sector | nr_sects) & bs_mask)
+ return -EINVAL;
+ if (bdev_read_only(bdev))
+ return -EPERM;
+
+ blk_start_plug(&plug);
+ for (;;) {
+ unsigned int req_sects = min_t(sector_t, nr_sects, max_sectors);
+
+ bio = blk_next_bio(bio, bdev, 0, REQ_OP_PROVISION, gfp);
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_iter.bi_size = req_sects << SECTOR_SHIFT;
+
+ sector += req_sects;
+ nr_sects -= req_sects;
+ if (!nr_sects) {
+ ret = submit_bio_wait(bio);
+ bio_put(bio);
+ break;
+ }
+ cond_resched();
+ }
+ blk_finish_plug(&plug);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(blkdev_issue_provision);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 65e75efa9bd3..83e516d2121f 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -158,6 +158,21 @@ static struct bio *bio_split_write_zeroes(struct bio *bio,
return bio_split(bio, lim->max_write_zeroes_sectors, GFP_NOIO, bs);
}

+static struct bio *bio_split_provision(struct bio *bio,
+ const struct queue_limits *lim,
+ unsigned int *nsegs, struct bio_set *bs)
+{
+ *nsegs = 0;
+
+ if (!lim->max_provision_sectors)
+ return NULL;
+
+ if (bio_sectors(bio) <= lim->max_provision_sectors)
+ return NULL;
+
+ return bio_split(bio, lim->max_provision_sectors, GFP_NOIO, bs);
+}
+
/*
* Return the maximum number of sectors from the start of a bio that may be
* submitted as a single request to a block device. If enough sectors remain,
@@ -366,6 +381,9 @@ struct bio *__bio_split_to_limits(struct bio *bio,
case REQ_OP_WRITE_ZEROES:
split = bio_split_write_zeroes(bio, lim, nr_segs, bs);
break;
+ case REQ_OP_PROVISION:
+ split = bio_split_provision(bio, lim, nr_segs, bs);
+ break;
default:
split = bio_split_rw(bio, lim, nr_segs, bs,
get_max_io_size(bio, lim) << SECTOR_SHIFT);
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 0046b447268f..c81820406f2f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -59,6 +59,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->zoned = BLK_ZONED_NONE;
lim->zone_write_granularity = 0;
lim->dma_alignment = 511;
+ lim->max_provision_sectors = 0;
}

/**
@@ -82,6 +83,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_dev_sectors = UINT_MAX;
lim->max_write_zeroes_sectors = UINT_MAX;
lim->max_zone_append_sectors = UINT_MAX;
+ lim->max_provision_sectors = UINT_MAX;
}
EXPORT_SYMBOL(blk_set_stacking_limits);

@@ -208,6 +210,20 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
}
EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);

+/**
+ * blk_queue_max_provision_sectors - set max sectors for a single provision
+ *
+ * @q: the request queue for the device
+ * @max_provision_sectors: maximum number of sectors to provision per command
+ **/
+
+void blk_queue_max_provision_sectors(struct request_queue *q,
+ unsigned int max_provision_sectors)
+{
+ q->limits.max_provision_sectors = max_provision_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_provision_sectors);
+
/**
* blk_queue_max_zone_append_sectors - set max sectors for a single zone append
* @q: the request queue for the device
@@ -578,6 +594,9 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->max_segment_size = min_not_zero(t->max_segment_size,
b->max_segment_size);

+ t->max_provision_sectors = min(t->max_provision_sectors,
+ b->max_provision_sectors);
+
t->misaligned |= b->misaligned;

alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 63e481262336..9a78c36f3199 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -199,6 +199,13 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
return queue_var_show(0, page);
}

+static ssize_t queue_provision_max_show(struct request_queue *q,
+ char *page)
+{
+ return sprintf(page, "%llu\n",
+ (unsigned long long)q->limits.max_provision_sectors << 9);
+}
+
static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
{
return queue_var_show(0, page);
@@ -507,6 +514,7 @@ QUEUE_RO_ENTRY(queue_discard_max_hw, "discard_max_hw_bytes");
QUEUE_RW_ENTRY(queue_discard_max, "discard_max_bytes");
QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");

+QUEUE_RO_ENTRY(queue_provision_max, "provision_max_bytes");
QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
QUEUE_RO_ENTRY(queue_write_zeroes_max, "write_zeroes_max_bytes");
QUEUE_RO_ENTRY(queue_zone_append_max, "zone_append_max_bytes");
@@ -633,6 +641,7 @@ static struct attribute *queue_attrs[] = {
&queue_discard_max_entry.attr,
&queue_discard_max_hw_entry.attr,
&queue_discard_zeroes_data_entry.attr,
+ &queue_provision_max_entry.attr,
&queue_write_same_max_entry.attr,
&queue_write_zeroes_max_entry.attr,
&queue_zone_append_max_entry.attr,
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9a1..ab9d8723ae64 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -176,6 +176,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
case REQ_OP_WRITE_ZEROES:
+ case REQ_OP_PROVISION:
break;
default:
bio_for_each_segment(bv, bio_src, iter)
diff --git a/block/fops.c b/block/fops.c
index 73e42742543f..99b24bd9d461 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -737,7 +737,8 @@ static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)

#define BLKDEV_FALLOC_FL_SUPPORTED \
(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
- FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
+ FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE | \
+ FALLOC_FL_UNSHARE_RANGE)

static long blkdev_fallocate(struct file *file, int mode, loff_t start,
loff_t len)
@@ -777,6 +778,13 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
* de-allocate mode calls to fallocate().
*/
switch (mode) {
+ case 0:
+ case FALLOC_FL_UNSHARE_RANGE:
+ case FALLOC_FL_KEEP_SIZE:
+ case FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE:
+ error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
+ len >> SECTOR_SHIFT, GFP_KERNEL);
+ break;
case FALLOC_FL_ZERO_RANGE:
case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
error = truncate_bdev_range(bdev, file_to_blk_mode(file), start, end);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 41d417ee1349..aa2119f7cd5a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -57,7 +57,8 @@ static inline bool bio_has_data(struct bio *bio)
bio->bi_iter.bi_size &&
bio_op(bio) != REQ_OP_DISCARD &&
bio_op(bio) != REQ_OP_SECURE_ERASE &&
- bio_op(bio) != REQ_OP_WRITE_ZEROES)
+ bio_op(bio) != REQ_OP_WRITE_ZEROES &&
+ bio_op(bio) != REQ_OP_PROVISION)
return true;

return false;
@@ -67,7 +68,8 @@ static inline bool bio_no_advance_iter(const struct bio *bio)
{
return bio_op(bio) == REQ_OP_DISCARD ||
bio_op(bio) == REQ_OP_SECURE_ERASE ||
- bio_op(bio) == REQ_OP_WRITE_ZEROES;
+ bio_op(bio) == REQ_OP_WRITE_ZEROES ||
+ bio_op(bio) == REQ_OP_PROVISION;
}

static inline void *bio_data(struct bio *bio)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d5c5e59ddbd2..e55828ddfafe 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -397,7 +397,10 @@ enum req_op {
REQ_OP_DRV_IN = (__force blk_opf_t)34,
REQ_OP_DRV_OUT = (__force blk_opf_t)35,

- REQ_OP_LAST = (__force blk_opf_t)36,
+ /* request device to provision block */
+ REQ_OP_PROVISION = (__force blk_opf_t)37,
+
+ REQ_OP_LAST = (__force blk_opf_t)38,
};

enum req_flag_bits {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index eef450f25982..dcae5538f99a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -308,6 +308,7 @@ struct queue_limits {
unsigned int discard_granularity;
unsigned int discard_alignment;
unsigned int zone_write_granularity;
+ unsigned int max_provision_sectors;

unsigned short max_segments;
unsigned short max_integrity_segments;
@@ -900,6 +901,8 @@ extern void blk_queue_max_discard_sectors(struct request_queue *q,
unsigned int max_discard_sectors);
extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
unsigned int max_write_same_sectors);
+extern void blk_queue_max_provision_sectors(struct request_queue *q,
+ unsigned int max_provision_sectors);
extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
extern void blk_queue_max_zone_append_sectors(struct request_queue *q,
unsigned int max_zone_append_sectors);
@@ -1038,6 +1041,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp);

+extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
+ sector_t nr_sects, gfp_t gfp_mask);
+
#define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */
#define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */

@@ -1117,6 +1123,11 @@ static inline unsigned short queue_max_discard_segments(const struct request_que
return q->limits.max_discard_segments;
}

+static inline unsigned short queue_max_provision_sectors(const struct request_queue *q)
+{
+ return q->limits.max_provision_sectors;
+}
+
static inline unsigned int queue_max_segment_size(const struct request_queue *q)
{
return q->limits.max_segment_size;
@@ -1259,6 +1270,11 @@ static inline bool bdev_nowait(struct block_device *bdev)
return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
}

+static inline unsigned int bdev_max_provision_sectors(struct block_device *bdev)
+{
+ return bdev_get_queue(bdev)->limits.max_provision_sectors;
+}
+
static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
{
return blk_queue_zoned_model(bdev_get_queue(bdev));
--
2.42.0.609.gbb76f46606-goog

2023-10-07 01:29:14

by Sarthak Kukreti

Subject: [PATCH v8 4/5] dm: Add block provisioning support

Add block provisioning support for device-mapper targets.
dm-crypt and dm-linear will, by default, pass through REQ_OP_PROVISION
requests to the underlying device, if supported.
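
As an illustration of the target-facing side, a hypothetical target
constructor would opt in the same way this patch wires up dm-linear:

/*
 * Hypothetical dm target constructor snippet showing the fields a
 * target sets to receive REQ_OP_PROVISION bios, mirroring what this
 * patch does for dm-linear and dm-crypt.
 */
#include <linux/device-mapper.h>

static int example_ctr(struct dm_target *ti, unsigned int argc,
                       char **argv)
{
        /* Number of provision bios submitted to this target; used by
         * __process_abnormal_io() when fanning out the request. */
        ti->num_provision_bios = 1;

        /* Set if the target must see provision requests even when its
         * underlying devices report max_provision_sectors == 0. */
        ti->provision_supported = true;

        return 0;
}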

Signed-off-by: Sarthak Kukreti <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
drivers/md/dm-crypt.c | 4 +++-
drivers/md/dm-linear.c | 1 +
drivers/md/dm-table.c | 23 +++++++++++++++++++++++
drivers/md/dm.c | 7 +++++++
include/linux/device-mapper.h | 17 +++++++++++++++++
5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index f2662c21a6df..8f94d98e241b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -3362,6 +3362,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
cc->tag_pool_max_sectors <<= cc->sector_shift;
}

+ ti->num_provision_bios = 1;
+
ret = -ENOMEM;
cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM, 1, devname);
if (!cc->io_queue) {
@@ -3416,7 +3418,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
* - for REQ_OP_DISCARD caller must use flush if IO ordering matters
*/
if (unlikely(bio->bi_opf & REQ_PREFLUSH ||
- bio_op(bio) == REQ_OP_DISCARD)) {
+ bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_PROVISION)) {
bio_set_dev(bio, cc->dev->bdev);
if (bio_sectors(bio))
bio->bi_iter.bi_sector = cc->start +
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index f4448d520ee9..74ee27ca551a 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -62,6 +62,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
ti->num_discard_bios = 1;
ti->num_secure_erase_bios = 1;
ti->num_write_zeroes_bios = 1;
+ ti->num_provision_bios = 1;
ti->private = lc;
return 0;

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 37b48f63ae6a..1839317d047e 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1856,6 +1856,26 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
return true;
}

+static int device_provision_capable(struct dm_target *ti, struct dm_dev *dev,
+ sector_t start, sector_t len, void *data)
+{
+ return bdev_max_provision_sectors(dev->bdev);
+}
+
+static bool dm_table_supports_provision(struct dm_table *t)
+{
+ for (unsigned int i = 0; i < t->num_targets; i++) {
+ struct dm_target *ti = dm_table_get_target(t, i);
+
+ if (ti->provision_supported ||
+ (ti->type->iterate_devices &&
+ ti->type->iterate_devices(ti, device_provision_capable, NULL)))
+ return true;
+ }
+
+ return false;
+}
+
static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
@@ -1989,6 +2009,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
if (!dm_table_supports_write_zeroes(t))
q->limits.max_write_zeroes_sectors = 0;

+ if (!dm_table_supports_provision(t))
+ q->limits.max_provision_sectors = 0;
+
dm_table_verify_integrity(t);

/*
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 64a1f306c96c..0e6cf1a5a414 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1598,6 +1598,7 @@ static bool is_abnormal_io(struct bio *bio)
case REQ_OP_DISCARD:
case REQ_OP_SECURE_ERASE:
case REQ_OP_WRITE_ZEROES:
+ case REQ_OP_PROVISION:
return true;
default:
break;
@@ -1634,6 +1635,12 @@ static blk_status_t __process_abnormal_io(struct clone_info *ci,
if (ti->max_write_zeroes_granularity)
max_granularity = max_sectors;
break;
+ case REQ_OP_PROVISION:
+ num_bios = ti->num_provision_bios;
+ max_sectors = limits->max_provision_sectors;
+ if (ti->max_provision_granularity)
+ max_granularity = max_sectors;
+ break;
default:
break;
}
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 69d0435c7ebb..41fd4e456d1f 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -332,6 +332,12 @@ struct dm_target {
*/
unsigned int num_write_zeroes_bios;

+ /*
+ * The number of PROVISION bios that will be submitted to the target.
+ * The bio number can be accessed with dm_bio_get_target_bio_nr.
+ */
+ unsigned int num_provision_bios;
+
/*
* The minimum number of extra bytes allocated in each io for the
* target to use.
@@ -356,6 +362,11 @@ struct dm_target {
*/
bool discards_supported:1;

+ /* Set if this target needs to receive provision requests regardless of
+ * whether or not its underlying devices have support.
+ */
+ bool provision_supported:1;
+
/*
* Set if this target requires that discards be split on
* 'max_discard_sectors' boundaries.
@@ -374,6 +385,12 @@ struct dm_target {
*/
bool max_write_zeroes_granularity:1;

+ /*
+ * Set if this target requires that provisions be split on
+ * 'max_provision_sectors' boundaries.
+ */
+ bool max_provision_granularity:1;
+
/*
* Set if we need to limit the number of in-flight bios when swapping.
*/
--
2.42.0.609.gbb76f46606-goog

2023-10-07 01:29:19

by Sarthak Kukreti

Subject: [PATCH v8 5/5] block: Pass unshare intent via REQ_OP_PROVISION

Allow REQ_OP_PROVISION to pass in an extra REQ_UNSHARE bit to
annotate unshare requests to underlying layers. Layers that support
FALLOC_FL_UNSHARE will be able to use this as an indicator of which
fallocate() mode to use.
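
For illustration (placeholder device path), the unshare intent is
expressed from userspace as:

/*
 * Sketch: issue an unshare-provision request from userspace. With
 * this patch, FALLOC_FL_UNSHARE_RANGE on a block device maps to
 * REQ_OP_PROVISION with REQ_UNSHARE set.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/dev/dm-1", O_RDWR);     /* placeholder path */

        if (fd < 0)
                return 1;
        if (fallocate(fd, FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE,
                      0, 1 << 20) < 0)
                perror("fallocate(UNSHARE_RANGE)");
        close(fd);
        return 0;
}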

Suggested-by: Darrick J. Wong <[email protected]>
Signed-off-by: Sarthak Kukreti <[email protected]>
---
block/blk-lib.c | 6 +++++-
block/fops.c | 6 ++++--
drivers/block/loop.c | 35 +++++++++++++++++++++++++++++------
include/linux/blk_types.h | 3 +++
include/linux/blkdev.h | 3 ++-
5 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index b1f720e198cd..d6cf572605f5 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -350,6 +350,7 @@ EXPORT_SYMBOL(blkdev_issue_secure_erase);
* @sector: start sector
* @nr_sects: number of sectors to provision
* @gfp_mask: memory allocation flags (for bio_alloc)
+ * @flags: controls detailed behavior
*
* Description:
* Issues a provision request to the block device for the range of sectors.
@@ -357,7 +358,7 @@ EXPORT_SYMBOL(blkdev_issue_secure_erase);
* underlying storage pool to allocate space for this block range.
*/
int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
- sector_t nr_sects, gfp_t gfp)
+ sector_t nr_sects, gfp_t gfp, unsigned flags)
{
sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
unsigned int max_sectors = bdev_max_provision_sectors(bdev);
@@ -380,6 +381,9 @@ int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
bio->bi_iter.bi_sector = sector;
bio->bi_iter.bi_size = req_sects << SECTOR_SHIFT;

+ if (flags & BLKDEV_PROVISION_UNSHARE_RANGE)
+ bio->bi_opf |= REQ_UNSHARE;
+
sector += req_sects;
nr_sects -= req_sects;
if (!nr_sects) {
diff --git a/block/fops.c b/block/fops.c
index 99b24bd9d461..dd442b6f0486 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -782,8 +782,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
case FALLOC_FL_UNSHARE_RANGE:
case FALLOC_FL_KEEP_SIZE:
case FALLOC_FL_UNSHARE_RANGE | FALLOC_FL_KEEP_SIZE:
- error = blkdev_issue_provision(bdev, start >> SECTOR_SHIFT,
- len >> SECTOR_SHIFT, GFP_KERNEL);
+ error = blkdev_issue_provision(
+ bdev, start >> SECTOR_SHIFT, len >> SECTOR_SHIFT, GFP_KERNEL,
+ (mode & FALLOC_FL_UNSHARE_RANGE) ?
+ BLKDEV_PROVISION_UNSHARE_RANGE : 0);
break;
case FALLOC_FL_ZERO_RANGE:
case FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE:
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index abb4dddbd4fd..f30479deb615 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -306,6 +306,30 @@ static int lo_read_simple(struct loop_device *lo, struct request *rq,
return 0;
}

+static bool validate_fallocate_mode(struct loop_device *lo, int mode)
+{
+ bool ret = true;
+
+ switch (mode) {
+ case FALLOC_FL_PUNCH_HOLE:
+ case FALLOC_FL_ZERO_RANGE:
+ if (!bdev_max_discard_sectors(lo->lo_device))
+ ret = false;
+ break;
+ case 0:
+ case FALLOC_FL_UNSHARE_RANGE:
+ if (!bdev_max_provision_sectors(lo->lo_device))
+ ret = false;
+ break;
+
+ default:
+ ret = false;
+ }
+
+ return ret;
+}
+
+
static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
int mode)
{
@@ -316,11 +340,7 @@ static int lo_fallocate(struct loop_device *lo, struct request *rq, loff_t pos,
struct file *file = lo->lo_backing_file;
int ret;

- if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE) &&
- !bdev_max_discard_sectors(lo->lo_device))
- return -EOPNOTSUPP;
-
- if (mode == 0 && !bdev_max_provision_sectors(lo->lo_device))
+ if (!validate_fallocate_mode(lo, mode))
return -EOPNOTSUPP;

mode |= FALLOC_FL_KEEP_SIZE;
@@ -493,7 +513,10 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
case REQ_OP_DISCARD:
return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
case REQ_OP_PROVISION:
- return lo_fallocate(lo, rq, pos, 0);
+ return lo_fallocate(lo, rq, pos,
+ (rq->cmd_flags & REQ_UNSHARE) ?
+ FALLOC_FL_UNSHARE_RANGE :
+ 0);
case REQ_OP_WRITE:
if (cmd->use_aio)
return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index e55828ddfafe..f16187ae4c4a 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -430,6 +430,8 @@ enum req_flag_bits {
*/
/* for REQ_OP_WRITE_ZEROES: */
__REQ_NOUNMAP, /* do not free blocks when zeroing */
+ /* for REQ_OP_PROVISION: */
+ __REQ_UNSHARE, /* unshare blocks */

__REQ_NR_BITS, /* stops here */
};
@@ -458,6 +460,7 @@ enum req_flag_bits {
#define REQ_FS_PRIVATE (__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)

#define REQ_NOUNMAP (__force blk_opf_t)(1ULL << __REQ_NOUNMAP)
+#define REQ_UNSHARE (__force blk_opf_t)(1ULL << __REQ_UNSHARE)

#define REQ_FAILFAST_MASK \
(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index dcae5538f99a..0f88ccbde12f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1042,10 +1042,11 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp);

extern int blkdev_issue_provision(struct block_device *bdev, sector_t sector,
- sector_t nr_sects, gfp_t gfp_mask);
+ sector_t nr_sects, gfp_t gfp_mask, unsigned int flags);

#define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */
#define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */
+#define BLKDEV_PROVISION_UNSHARE_RANGE (1 << 2) /* unshare range on provision */

extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
--
2.42.0.609.gbb76f46606-goog

2023-10-08 23:38:05

by Dave Chinner

Subject: Re: [PATCH v8 3/5] loop: Add support for provision requests

On Fri, Oct 06, 2023 at 06:28:15PM -0700, Sarthak Kukreti wrote:
> Add support for provision requests to loopback devices.
> Loop devices will configure provision support based on
> whether the underlying block device/file can support
> the provision request and upon receiving a provision bio,
> will map it to the backing device/storage. For loop devices
> over files, a REQ_OP_PROVISION request will translate to
> a mode 0 fallocate() call on the backing file.
>
> Signed-off-by: Sarthak Kukreti <[email protected]>
> Signed-off-by: Mike Snitzer <[email protected]>


Hmmmm.

This doesn't actually implement the required semantics of
REQ_PROVISION. Yes, it passes the command to the filesystem
fallocate() implementation, but fallocate() at the filesystem level
does not have the same semantics as REQ_PROVISION.

i.e. at the filesystem level, fallocate() only guarantees the next
write to the provisioned range will succeed without ENOSPC, it does
not guarantee *every* write to the range will succeed without
ENOSPC. If someone clones the loop file while it is in use (i.e.
snapshots it via cp --reflink) then all guarantees that the next
write to a provisioned LBA range will succeed without ENOSPC are
voided.

So while this will work for basic testing that the filesystem is
issuing REQ_PROVISION based IO correctly, it can't actually be used
for hosting production filesystems that need full REQ_PROVISION
guarantees when the loop device backing file is independently
snapshotted via FICLONE....
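
A minimal sketch of that scenario (hypothetical paths, error handling
omitted; assumes the backing file lives on a reflink-capable
filesystem):

/*
 * Clone the loop backing file while the loop device is in use, then
 * overwrite a previously provisioned range.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>           /* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        char buf[4096] = { 0 };
        int loop = open("/dev/loop0", O_RDWR);
        int src = open("backing.img", O_RDONLY);
        int dst = open("snapshot.img", O_RDWR | O_CREAT, 0600);

        /* "Provisioned": the next write to this range should be safe. */
        fallocate(loop, 0, 0, 1 << 20);

        /* Share every extent of backing.img with snapshot.img. */
        ioctl(dst, FICLONE, src);

        /* This overwrite now forces a COW allocation in the lower fs
         * and can fail with ENOSPC despite the earlier provision. */
        pwrite(loop, buf, sizeof(buf), 0);
        return 0;
}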

At minimum, this set of implementation constraints needs to be
documented somewhere...

-Dave.
--
Dave Chinner
[email protected]

2023-10-08 23:51:26

by Dave Chinner

Subject: Re: [PATCH v8 0/5] Introduce provisioning primitives

On Fri, Oct 06, 2023 at 06:28:12PM -0700, Sarthak Kukreti wrote:
> Hi,
>
> This patch series is version 8 of the series introducing a
> block-level provisioning mechanism (original posting: [1]), which is
> useful for provisioning space across thinly provisioned storage
> architectures (loop devices backed by sparse files, dm-thin devices,
> virtio-blk). This series has minimal changes over v7 [2].
>
> This patch series is rebased from the linux-dm/dm-6.5-provision-support
> tree [3] onto commit cac405a3bfa2 ("Merge tag 'for-6.6-rc3-tag'"). In
> addition, there is a new patch that allows passing an unshare intent
> through via REQ_OP_PROVISION (suggested by Darrick in [4]).

The XFS patches I just posted were smoke tested a while back against
loop devices and then forward ported to this patchset. Good for
testing that userspace driven file preallocation gets propagated by
the filesystem down to the backing device correctly and that
subsequent IO to the file then does the right thing (e.g. fio
testing using fallocate() to set up the files being written to)....

-Dave.
--
Dave Chinner
[email protected]

2023-10-10 22:43:35

by Sarthak Kukreti

Subject: Re: [PATCH v8 3/5] loop: Add support for provision requests

On Sun, Oct 8, 2023 at 4:37 PM Dave Chinner <[email protected]> wrote:
>
> On Fri, Oct 06, 2023 at 06:28:15PM -0700, Sarthak Kukreti wrote:
> > Add support for provision requests to loopback devices.
> > Loop devices will configure provision support based on
> > whether the underlying block device/file can support
> > the provision request and upon receiving a provision bio,
> > will map it to the backing device/storage. For loop devices
> > over files, a REQ_OP_PROVISION request will translate to
> > a mode 0 fallocate() call on the backing file.
> >
> > Signed-off-by: Sarthak Kukreti <[email protected]>
> > Signed-off-by: Mike Snitzer <[email protected]>
>
>
> Hmmmm.
>
> This doesn't actually implement the required semantics of
> REQ_PROVISION. Yes, it passes the command to the filesystem
> fallocate() implementation, but fallocate() at the filesystem level
> does not have the same semantics as REQ_PROVISION.
>
> i.e. at the filesystem level, fallocate() only guarantees the next
> write to the provisioned range will succeed without ENOSPC, it does
> not guarantee *every* write to the range will succeed without
> ENOSPC. If someone clones the loop file while it is in use (i.e.
> snapshots it via cp --reflink) then all guarantees that the next
> write to a provisioned LBA range will succeed without ENOSPC are
> voided.
>
> So while this will work for basic testing that the filesystem is
> issuing REQ_PROVISION based IO correctly, it can't actually be used
> for hosting production filesystems that need full REQ_PROVISION
> guarantees when the loop device backing file is independently
> snapshotted via FICLONE....
>
> At minimum, this set of implementation constraints needs to be
> documented somewhere...
>
Fair point. I wanted to have a separate fallocate() mode
(FALLOC_FL_PROVISION) in an earlier version of the patchset so that we
could distinguish between a provision request and a regular fallocate()
call; I dropped it from the series after feedback that the default
case should suffice. But this might be one of the cases where we need
an explicit way to signal the intent to provision space.

Given a separate FALLOC_FL_PROVISION mode in the scenario you
mentioned, the filesystem could copy previously 'provisioned' blocks
to new blocks (which implicitly provisions them) or reserve blocks for
use (and pass REQ_OP_PROVISION through below). That also means that
the filesystem should track 'provisioned' blocks and take appropriate
actions to uphold the provisioning guarantees.

For filesystems without copy-on-write semantics (e.g. ext4),
REQ_OP_PROVISION should still be equivalent to mode == 0.

For now, I'll add the details to the commit message and the loop
driver code (side note: should there be documentation for the loop
device driver?). WDYT?

Best
Sarthak

> -Dave.
> --
> Dave Chinner
> [email protected]

2023-10-10 23:59:57

by Dave Chinner

Subject: Re: [PATCH v8 3/5] loop: Add support for provision requests

On Tue, Oct 10, 2023 at 03:43:10PM -0700, Sarthak Kukreti wrote:
> On Sun, Oct 8, 2023 at 4:37 PM Dave Chinner <[email protected]> wrote:
> >
> > On Fri, Oct 06, 2023 at 06:28:15PM -0700, Sarthak Kukreti wrote:
> > > Add support for provision requests to loopback devices.
> > > Loop devices will configure provision support based on
> > > whether the underlying block device/file can support
> > > the provision request and upon receiving a provision bio,
> > > will map it to the backing device/storage. For loop devices
> > > over files, a REQ_OP_PROVISION request will translate to
> > > a mode 0 fallocate() call on the backing file.
> > >
> > > Signed-off-by: Sarthak Kukreti <[email protected]>
> > > Signed-off-by: Mike Snitzer <[email protected]>
> >
> >
> > Hmmmm.
> >
> > This doesn't actually implement the required semantics of
> > REQ_PROVISION. Yes, it passes the command to the filesystem
> > fallocate() implementation, but fallocate() at the filesystem level
> > does not have the same semantics as REQ_PROVISION.
> >
> > i.e. at the filesystem level, fallocate() only guarantees the next
> > write to the provisioned range will succeed without ENOSPC, it does
> > not guarantee *every* write to the range will succeed without
> > ENOSPC. If someone clones the loop file while it is in use (i.e.
> > snapshots it via cp --reflink) then all guarantees that the next
> > write to a provisioned LBA range will succeed without ENOSPC are
> > voided.
> >
> > So while this will work for basic testing that the filesystem is
> > issuing REQ_PROVISION based IO correctly, it can't actually be used
> > for hosting production filesystems that need full REQ_PROVISION
> > guarantees when the loop device backing file is independently
> > snapshotted via FICLONE....
> >
> > At minimum, this set of implementation constraints needs to be
> > documented somewhere...
> >
> Fair point. I wanted to have a separate fallocate() mode
> (FALLOC_FL_PROVISION) in an earlier version of the patchset so that we
> could distinguish between a provision request and a regular fallocate()
> call; I dropped it from the series after feedback that the default
> case should suffice. But this might be one of the cases where we need
> an explicit way to signal the intent to provision space.

ISTR that I commented that filesystems like XFS can't implement
REQ_PROVISION semantics for extents without on-disk format
changes. Hence that needs to happen before we expose a new API to
userspace....

> Given a separate FALLOC_FL_PROVISION mode in the scenario you
> mentioned, the filesystem could copy previously 'provisioned' blocks
> to new blocks (which implicitly provisions them) or reserve blocks for
> use (and pass REQ_OP_PROVISION through below). That also means that
> the filesystem should track 'provisioned' blocks and take appropriate
> actions to uphold the provisioning guarantees.

Yes, tracking provisioned ranges persistently and the reservations
they require needs on-disk filesystem format changes compared to just
preallocating space. None of this functionality currently exists in
any filesystem that supports shared extents, and it's a fairly
significant chunk of development work to support it.

Nobody has planned to do this sort of complex surgery to XFS at
this point in time. I doubt that anyone on the btrfs side of
things is really even following this discussion because this is
largely for block device thinp and snapshot support
and btrfs just doesn't care about that.

> For filesystems without copy-on-write semantics (e.g. ext4),
> REQ_OP_PROVISION should still be equivalent to mode == 0.

Well, yes. This is the same situation as "for non-sparse block
devices, REQ_PROVISION can just be ignored." This is not an
interesting use case, nor a use case that the functionality or APIs
should be designed around.

-Dave.
--
Dave Chinner
[email protected]

2023-10-11 00:07:16

by Dave Chinner

Subject: Re: [PATCH v8 5/5] block: Pass unshare intent via REQ_OP_PROVISION

On Tue, Oct 10, 2023 at 03:42:39PM -0700, Sarthak Kukreti wrote:
> On Sun, Oct 8, 2023 at 4:27 PM Dave Chinner <[email protected]> wrote:
> >
> > On Fri, Oct 06, 2023 at 06:28:17PM -0700, Sarthak Kukreti wrote:
> > > Allow REQ_OP_PROVISION to pass in an extra REQ_UNSHARE bit to
> > > annotate unshare requests to underlying layers. Layers that support
> > > FALLOC_FL_UNSHARE will be able to use this as an indicator of which
> > > fallocate() mode to use.
> > >
> > > Suggested-by: Darrick J. Wong <[email protected]>
> > > Signed-off-by: Sarthak Kukreti <[email protected]>
> > > ---
> > > block/blk-lib.c | 6 +++++-
> > > block/fops.c | 6 ++++--
> > > drivers/block/loop.c | 35 +++++++++++++++++++++++++++++------
> > > include/linux/blk_types.h | 3 +++
> > > include/linux/blkdev.h | 3 ++-
> > > 5 files changed, 43 insertions(+), 10 deletions(-)
> >
> > I have no idea how filesystems (or even userspace applications, for
> > that matter) are supposed to use this - they have no idea if the
> > underlying block device has shared blocks for LBA ranges it already
> > has allocated and provisioned. IOWs, I don't know what the semantics
> > of this function is, it is not documented anywhere, and there is no
> > use case present that tells me how it might get used.
> >
> > Yes, unshare at the file level means the filesystem tries to break
> > internal data extent sharing, but if the block layers or backing
> > devices are doing deduplication and sharing unknown to the
> > application or filesystem, how do they ever know that this operation
> > might need to be performed? In what cases do we need to be able to
> > unshare block device ranges, and how is that different to the
> > guarantees that REQ_PROVISION is already supposed to give for
> > provisioned ranges that are then subsequently shared by the block
> > device (e.g. by snapshots)?
> >
> > Also, from an API perspective, this is an "unshare" data operation,
> > not a "provision" operation. Hence I'd suggest that the API should
> > be blkdev_issue_unshare() rather than optional behaviour to
> > _provision() which - before this patch - had clear and well defined
> > meaning....
> >
> Fair points, the intent from the conversation with Darrick was the
> addition of support for FALLOC_FL_UNSHARE_RANGE in patch 2 of v4
> (originally suggested by Brian Foster in [1]): if we allow
> fallocate(UNSHARE_RANGE) on a loop device (e.g. for creating a
> snapshot, similar in nature to the FICLONE example you mentioned on
> the loop patch), we'd (ideally) want to pass it through to the
> underlying layers and let them figure out what to do with it. But it
> is only for situations where we explicitly know what the underlying
> layers are and what the mechanism is.
>
> I agree though that it clouds the API a bit and I don't think it
> necessarily needs to be a part of the initial patch series: for now, I
> propose keeping just mode zero (and FALLOC_FL_KEEP_SIZE) handling in
> the block series patch and dropping this patch. WDYT?

Until we have an actual use case for unsharing (which explicitly
breaks extent sharing) as opposed to provisioning (which ensures
overwrites always succeed regardless of extent state) then let's
leave it out of this -provisioning- series.

-Dave.
--
Dave Chinner
[email protected]

2023-10-11 00:13:29

by Dave Chinner

Subject: Re: [PATCH v8 0/5] Introduce provisioning primitives

On Tue, Oct 10, 2023 at 03:42:53PM -0700, Sarthak Kukreti wrote:
> On Sun, Oct 8, 2023 at 4:50 PM Dave Chinner <[email protected]> wrote:
> >
> > On Fri, Oct 06, 2023 at 06:28:12PM -0700, Sarthak Kukreti wrote:
> > > Hi,
> > >
> > > This patch series is version 8 of the series introducing a
> > > block-level provisioning mechanism (original posting: [1]), which is
> > > useful for provisioning space across thinly provisioned storage
> > > architectures (loop devices backed by sparse files, dm-thin devices,
> > > virtio-blk). This series has minimal changes over v7 [2].
> > >
> > > This patch series is rebased from the linux-dm/dm-6.5-provision-support
> > > tree [3] onto commit cac405a3bfa2 ("Merge tag 'for-6.6-rc3-tag'"). In
> > > addition, there is a new patch that allows passing an unshare intent
> > > through via REQ_OP_PROVISION (suggested by Darrick in [4]).
> >
> > The XFS patches I just posted were smoke tested a while back against
> > loop devices and then forward ported to this patchset. Good for
> > testing that userspace driven file preallocation gets propagated by
> > the filesystem down to the backing device correctly and that
> > subsequent IO to the file then does the right thing (e.g. fio
> > testing using fallocate() to set up the files being written to)....
> >
>
> Thanks! I've been testing with a WIP patch for ext4; I'll give your
> patches a try. Once we are closer to submitting the filesystem
> support, we can formalize the test into an xfstest (sparse file + loop
> + filesystem, fallocate() file, check the size of the underlying
> sparse file).

That's not really a valid test - there are so many optional filesystem
behaviours that can change the layout of the backing file for the
same upper filesystem operations.

What we actually need to test is the ENOSPC guarantees, not that
fallocate has been called by the loop device. i.e. that ENOSPC is
propagated from the underlying filesystem though the loop device to
the application running on the upper filesystem appropriately. e.g.
when the lower filesystem is at ENOSPC, the writes into provisioned
space in the loop device backing file continue to succeed without
ENOSPC being reported to the upper filesystem.

i.e. this needs to be tested from the perspective of the API
presented to the upper filesystem, not by running an upper fs
operation and then trying to infer correct behaviour by peering at
the state of the lower filesystem...
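
A rough sketch of the shape such a test would take (paths
hypothetical; the lower filesystem is assumed to already be driven to
ENOSPC):

/*
 * The guarantee under test: an overwrite of provisioned space on the
 * upper (loop-hosted) filesystem must not see ENOSPC even though the
 * lower filesystem is full.
 */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        char buf[4096] = { 0 };
        /* File on the upper fs whose blocks were preallocated earlier
         * via fallocate(), which the upper fs propagated down as
         * REQ_OP_PROVISION. */
        int fd = open("/mnt/upper/provisioned", O_WRONLY);

        /* ... lower filesystem is at ENOSPC at this point ... */

        assert(pwrite(fd, buf, sizeof(buf), 0) ==
               (ssize_t)sizeof(buf));
        close(fd);
        return 0;
}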

Cheers,

Dave.
--
Dave Chinner
[email protected]