2020-01-22 11:00:15

by Kirill Tkhai

Subject: [PATCH v5 0/6] block: Introduce REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES

(was "[PATCH block v2 0/3] block: Introduce REQ_NOZERO flag
for REQ_OP_WRITE_ZEROES operation";
was "[PATCH RFC 0/3] block,ext4: Introduce REQ_OP_ASSIGN_RANGE
to reflect extents allocation in block device internals")

v5: Drop the dm/md patch, which disabled REQ_ALLOCATE for those devices.
Instead, disable REQ_ALLOCATE for all stacking devices.

v4: Correct argument for mddev_check_write_zeroes().

v3: Rename REQ_NOZERO to REQ_ALLOCATE.
Split helpers to separate patches.
Add a patch disabling max_allocate_sectors inheritance for dm.

v2: Introduce a new flag for REQ_OP_WRITE_ZEROES instead of
introducing a new operation, as suggested by Martin K. Petersen.
Removed the ext4-related patch to focus on block changes
for now.

Information about contiguous extent placement may be useful
for some block devices. Say, distributed network filesystems
that provide a block device interface may use this information
for better block placement over the nodes in their cluster,
and thus for better performance. Block devices that map a file
on another filesystem (loop) may request an extent of the same
length on the underlying filesystem, for less fragmentation and
to batch allocation requests. Also, hypervisors like QEMU may use
this information to optimize cluster allocations.

This patchset introduces the REQ_ALLOCATE flag for REQ_OP_WRITE_ZEROES,
which makes a block device allocate blocks instead of actually
zeroing them. This may be used to forward a user's fallocate(0)
requests into block device internals. E.g., in the loop driver this
will result in allocating extents in the backing file, so a subsequent
write won't fail for lack of available space. Distributed
network filesystems will be able to assign specific servers to
specific extents, so subsequent writes will be more efficient.
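As a rough userspace illustration (not part of the patchset, and run here
against a regular file rather than a block device), the fallocate(2) mode
below is the kind of request the patched blkdev_fallocate() would forward
down as REQ_ALLOCATE:

```c
/* Hypothetical userspace sketch: fallocate() with FALLOC_FL_KEEP_SIZE
 * preallocates blocks without growing i_size.  With this patchset, the
 * same mode on a block device is what gets translated into a
 * REQ_OP_WRITE_ZEROES bio carrying REQ_ALLOCATE. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Preallocate len bytes at offset 0 without changing the file size;
 * returns the file's resulting 512-byte block count, or -1 on error. */
static long long alloc_keep_size(int fd, off_t len)
{
	struct stat st;

	if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len))
		return -1;
	if (fstat(fd, &st))
		return -1;
	return (long long)st.st_blocks;
}
```

On a regular file this only reserves space in the filesystem; the point of
the patchset is to give a block device the same "reserve, don't zero" hint.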

Patches [1-3/6] prepare helper functions, patch [4/6]
introduces the REQ_ALLOCATE flag and implements all the logic,
patch [5/6] adds one more helper, and patch [6/6] adds loop
as the first user of the flag.

Note that only the block-related patches are included here; an example
of usage for ext4, with performance numbers, may be seen in [1].

[1] https://lore.kernel.org/linux-ext4/157599697369.12112.10138136904533871162.stgit@localhost.localdomain/T/#me5bdd5cc313e14de615d81bea214f355ae975db0
---

Kirill Tkhai (6):
block: Add @flags argument to bdev_write_zeroes_sectors()
block: Pass op_flags into blk_queue_get_max_sectors()
block: Introduce blk_queue_get_max_write_zeroes_sectors()
block: Add support for REQ_ALLOCATE flag
block: Add blk_queue_max_allocate_sectors()
loop: Add support for REQ_ALLOCATE


block/blk-core.c | 6 +++---
block/blk-lib.c | 17 ++++++++++-------
block/blk-merge.c | 9 ++++++---
block/blk-settings.c | 17 +++++++++++++++++
drivers/block/loop.c | 15 ++++++++++++---
drivers/md/dm-kcopyd.c | 2 +-
drivers/target/target_core_iblock.c | 4 ++--
fs/block_dev.c | 4 ++++
include/linux/blk_types.h | 5 ++++-
include/linux/blkdev.h | 34 ++++++++++++++++++++++++++--------
10 files changed, 85 insertions(+), 28 deletions(-)

--
Signed-off-by: Kirill Tkhai <[email protected]>


2020-01-22 11:00:41

by Kirill Tkhai

Subject: [PATCH v5 2/6] block: Pass op_flags into blk_queue_get_max_sectors()

This preparation patch changes the argument type: the function
now takes the full op_flags instead of just the op code.
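For reference, a standalone sketch of the masking this patch adds
(REQ_OP_BITS and REQ_OP_MASK mirror include/linux/blk_types.h; the extra
flag bit used below is illustrative, not a real kernel value):

```c
/* Mock of the op/flags split used by the patched
 * blk_queue_get_max_sectors(): the low REQ_OP_BITS bits of cmd_flags
 * encode the op, so the op is always recoverable by masking. */
#define REQ_OP_BITS		8
#define REQ_OP_MASK		((1u << REQ_OP_BITS) - 1)
#define REQ_OP_WRITE_ZEROES	9u
#define REQ_EXAMPLE_FLAG	(1u << 16)	/* stand-in for e.g. REQ_ALLOCATE */

static unsigned int req_op_of(unsigned int op_flags)
{
	/* The same masking the patch adds inside blk_queue_get_max_sectors() */
	return op_flags & REQ_OP_MASK;
}
```

This is why passing rq->cmd_flags instead of req_op(rq) loses nothing: the
callee can still extract the op, and later patches can also inspect flags.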

Signed-off-by: Kirill Tkhai <[email protected]>
---
block/blk-core.c | 4 ++--
include/linux/blkdev.h | 8 +++++---
2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 50a5de025d5e..ac2634bcda1f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1250,10 +1250,10 @@ EXPORT_SYMBOL(submit_bio);
static int blk_cloned_rq_check_limits(struct request_queue *q,
struct request *rq)
{
- if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
+ if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
__func__, blk_rq_sectors(rq),
- blk_queue_get_max_sectors(q, req_op(rq)));
+ blk_queue_get_max_sectors(q, rq->cmd_flags));
return -EIO;
}

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0f1127d0b043..23a5850f35f6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -989,8 +989,10 @@ static inline struct bio_vec req_bvec(struct request *rq)
}

static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
- int op)
+ unsigned int op_flags)
{
+ int op = op_flags & REQ_OP_MASK;
+
if (unlikely(op == REQ_OP_DISCARD || op == REQ_OP_SECURE_ERASE))
return min(q->limits.max_discard_sectors,
UINT_MAX >> SECTOR_SHIFT);
@@ -1029,10 +1031,10 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
if (!q->limits.chunk_sectors ||
req_op(rq) == REQ_OP_DISCARD ||
req_op(rq) == REQ_OP_SECURE_ERASE)
- return blk_queue_get_max_sectors(q, req_op(rq));
+ return blk_queue_get_max_sectors(q, rq->cmd_flags);

return min(blk_max_size_offset(q, offset),
- blk_queue_get_max_sectors(q, req_op(rq)));
+ blk_queue_get_max_sectors(q, rq->cmd_flags));
}

static inline unsigned int blk_rq_count_bios(struct request *rq)


2020-01-22 11:00:41

by Kirill Tkhai

Subject: [PATCH v5 5/6] block: Add blk_queue_max_allocate_sectors()

This adds a new helper to set the max_allocate_sectors
limit of a block device queue.
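A minimal userspace mock of the helper and how a driver would call it (the
structs are heavily reduced stand-ins for the kernel's, for illustration
only):

```c
/* Reduced mocks of the kernel structures touched by this patch. */
struct queue_limits {
	unsigned int max_write_zeroes_sectors;
	unsigned int max_allocate_sectors;
};

struct request_queue {
	struct queue_limits limits;
};

/* Mirrors blk_queue_max_allocate_sectors() from the patch: a plain
 * setter a driver calls during queue setup to advertise how many
 * sectors a single REQ_ALLOCATE request may cover. */
static void blk_queue_max_allocate_sectors(struct request_queue *q,
					   unsigned int max_allocate_sectors)
{
	q->limits.max_allocate_sectors = max_allocate_sectors;
}
```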

Signed-off-by: Kirill Tkhai <[email protected]>
---
block/blk-settings.c | 13 +++++++++++++
include/linux/blkdev.h | 2 ++
2 files changed, 15 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8d5df9d37239..24cf8fbbd125 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -259,6 +259,19 @@ void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
}
EXPORT_SYMBOL(blk_queue_max_write_zeroes_sectors);

+/**
+ * blk_queue_max_allocate_sectors - set max sectors for a single
+ * allocate request
+ * @q: the request queue for the device
+ * @max_allocate_sectors: maximum number of sectors to allocate per command
+ **/
+void blk_queue_max_allocate_sectors(struct request_queue *q,
+ unsigned int max_allocate_sectors)
+{
+ q->limits.max_allocate_sectors = max_allocate_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_allocate_sectors);
+
/**
* blk_queue_max_segments - set max hw segments for a request for this queue
* @q: the request queue for the device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 20c94a7f9411..249dce6dd436 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1089,6 +1089,8 @@ extern void blk_queue_max_write_same_sectors(struct request_queue *q,
unsigned int max_write_same_sectors);
extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q,
unsigned int max_write_same_sectors);
+extern void blk_queue_max_allocate_sectors(struct request_queue *q,
+ unsigned int max_allocate_sectors);
extern void blk_queue_logical_block_size(struct request_queue *, unsigned int);
extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
extern void blk_queue_alignment_offset(struct request_queue *q,


2020-01-22 11:01:16

by Kirill Tkhai

Subject: [PATCH v5 4/6] block: Add support for REQ_ALLOCATE flag

This adds support for the REQ_ALLOCATE extension of the REQ_OP_WRITE_ZEROES
operation, which encourages a block device driver to just allocate
blocks (or mark them allocated) instead of actually zeroing them.
REQ_ALLOCATE is aimed at network filesystems providing
a block device interface. Also, block devices that map a file
on another filesystem (like loop) may use this for less fragmentation
and to batch fallocate() requests. Hypervisors like QEMU may
introduce cluster allocation optimizations based on it.

BLKDEV_ZERO_ALLOCATE is a new corresponding flag for
blkdev_issue_zeroout().

Stacking devices start from a zero max_allocate_sectors limit for now;
support is going to be implemented separately for each such device
in the future.
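A standalone sketch of the flag translation this patch adds to
__blkdev_issue_write_zeroes() (the REQ_* bit positions below are
illustrative, not the kernel's exact values; the BLKDEV_ZERO_* values match
the patch):

```c
/* Mirrors the BLKDEV_ZERO_* -> REQ_* mapping in the patched
 * __blkdev_issue_write_zeroes().  BLKDEV_ZERO_ALLOCATE implies
 * REQ_NOUNMAP: blocks being allocated must not be unmapped again. */
#define BLKDEV_ZERO_NOUNMAP	(1u << 0)
#define BLKDEV_ZERO_NOFALLBACK	(1u << 1)
#define BLKDEV_ZERO_ALLOCATE	(1u << 2)

#define REQ_NOUNMAP		(1u << 10)	/* illustrative bit position */
#define REQ_ALLOCATE		(1u << 11)	/* illustrative bit position */

static unsigned int zeroout_to_req_flags(unsigned int flags)
{
	unsigned int req_flags = 0;

	if (flags & BLKDEV_ZERO_NOUNMAP)
		req_flags |= REQ_NOUNMAP;
	if (flags & BLKDEV_ZERO_ALLOCATE)
		req_flags |= REQ_ALLOCATE | REQ_NOUNMAP;
	return req_flags;
}
```

Computing req_flags once before the loop, as the diff does, also avoids
re-testing the BLKDEV_ZERO_* flags for every bio that is built.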

Signed-off-by: Kirill Tkhai <[email protected]>
---
block/blk-lib.c | 17 ++++++++++-------
block/blk-settings.c | 4 ++++
fs/block_dev.c | 4 ++++
include/linux/blk_types.h | 5 ++++-
include/linux/blkdev.h | 13 ++++++++++---
5 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 3e38c93cfc53..9cd6f86523ba 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
struct bio **biop, unsigned flags)
{
struct bio *bio = *biop;
- unsigned int max_write_zeroes_sectors;
+ unsigned int max_write_zeroes_sectors, req_flags = 0;
struct request_queue *q = bdev_get_queue(bdev);

if (!q)
@@ -224,18 +224,21 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
return -EPERM;

/* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
- max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
+ max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, flags);

if (max_write_zeroes_sectors == 0)
return -EOPNOTSUPP;

+ if (flags & BLKDEV_ZERO_NOUNMAP)
+ req_flags |= REQ_NOUNMAP;
+ if (flags & BLKDEV_ZERO_ALLOCATE)
+ req_flags |= REQ_ALLOCATE|REQ_NOUNMAP;
+
while (nr_sects) {
bio = blk_next_bio(bio, 0, gfp_mask);
bio->bi_iter.bi_sector = sector;
bio_set_dev(bio, bdev);
- bio->bi_opf = REQ_OP_WRITE_ZEROES;
- if (flags & BLKDEV_ZERO_NOUNMAP)
- bio->bi_opf |= REQ_NOUNMAP;
+ bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;

if (nr_sects > max_write_zeroes_sectors) {
bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
@@ -362,7 +365,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t bs_mask;
struct bio *bio;
struct blk_plug plug;
- bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
+ bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, flags);

bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
if ((sector | nr_sects) & bs_mask)
@@ -391,7 +394,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
try_write_zeroes = false;
goto retry;
}
- if (!bdev_write_zeroes_sectors(bdev, 0)) {
+ if (!bdev_write_zeroes_sectors(bdev, flags)) {
/*
* Zeroing offload support was indicated, but the
* device reported ILLEGAL REQUEST (for some devices
diff --git a/block/blk-settings.c b/block/blk-settings.c
index c8eda2e7b91e..8d5df9d37239 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->chunk_sectors = 0;
lim->max_write_same_sectors = 0;
lim->max_write_zeroes_sectors = 0;
+ lim->max_allocate_sectors = 0;
lim->max_discard_sectors = 0;
lim->max_hw_discard_sectors = 0;
lim->discard_granularity = 0;
@@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_dev_sectors = UINT_MAX;
lim->max_write_same_sectors = UINT_MAX;
lim->max_write_zeroes_sectors = UINT_MAX;
+ lim->max_allocate_sectors = 0;
}
EXPORT_SYMBOL(blk_set_stacking_limits);

@@ -506,6 +508,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
b->max_write_same_sectors);
t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
b->max_write_zeroes_sectors);
+ t->max_allocate_sectors = min(t->max_allocate_sectors,
+ b->max_allocate_sectors);
t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);

t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 69bf2fb6f7cd..1ffef894b3bd 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
break;
+ case FALLOC_FL_KEEP_SIZE:
+ error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
+ GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
+ break;
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
GFP_KERNEL, 0);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 70254ae11769..86accd2caa4e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -335,7 +335,9 @@ enum req_flag_bits {

/* command specific flags for REQ_OP_WRITE_ZEROES: */
__REQ_NOUNMAP, /* do not free blocks when zeroing */
-
+ __REQ_ALLOCATE, /* only notify about allocated blocks,
+ * and do not actually zero them
+ */
__REQ_HIPRI,

/* for driver use */
@@ -362,6 +364,7 @@ enum req_flag_bits {
#define REQ_CGROUP_PUNT (1ULL << __REQ_CGROUP_PUNT)

#define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP)
+#define REQ_ALLOCATE (1ULL << __REQ_ALLOCATE)
#define REQ_HIPRI (1ULL << __REQ_HIPRI)

#define REQ_DRV (1ULL << __REQ_DRV)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 264202fa3bf8..20c94a7f9411 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -337,6 +337,7 @@ struct queue_limits {
unsigned int max_hw_discard_sectors;
unsigned int max_write_same_sectors;
unsigned int max_write_zeroes_sectors;
+ unsigned int max_allocate_sectors;
unsigned int discard_granularity;
unsigned int discard_alignment;

@@ -991,6 +992,8 @@ static inline struct bio_vec req_bvec(struct request *rq)
static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
struct request_queue *q, unsigned int op_flags)
{
+ if (op_flags & REQ_ALLOCATE)
+ return q->limits.max_allocate_sectors;
return q->limits.max_write_zeroes_sectors;
}

@@ -1227,6 +1230,7 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,

#define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */
#define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */
+#define BLKDEV_ZERO_ALLOCATE (1 << 2) /* allocate range of blocks */

extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
@@ -1431,10 +1435,13 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
{
struct request_queue *q = bdev_get_queue(bdev);

- if (q)
- return q->limits.max_write_zeroes_sectors;
+ if (!q)
+ return 0;

- return 0;
+ if (flags & BLKDEV_ZERO_ALLOCATE)
+ return q->limits.max_allocate_sectors;
+ else
+ return q->limits.max_write_zeroes_sectors;
}

static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)


2020-01-25 02:40:45

by Bob Liu

Subject: Re: [PATCH v5 2/6] block: Pass op_flags into blk_queue_get_max_sectors()

On 1/22/20 6:58 PM, Kirill Tkhai wrote:
> This preparation patch changes argument type, and now
> the function takes full op_flags instead of just op code.
>
> Signed-off-by: Kirill Tkhai <[email protected]>
> ---
> block/blk-core.c | 4 ++--
> include/linux/blkdev.h | 8 +++++---
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 50a5de025d5e..ac2634bcda1f 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1250,10 +1250,10 @@ EXPORT_SYMBOL(submit_bio);
> static int blk_cloned_rq_check_limits(struct request_queue *q,
> struct request *rq)
> {
> - if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
> + if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
> printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
> __func__, blk_rq_sectors(rq),
> - blk_queue_get_max_sectors(q, req_op(rq)));
> + blk_queue_get_max_sectors(q, rq->cmd_flags));
> return -EIO;
> }
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 0f1127d0b043..23a5850f35f6 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -989,8 +989,10 @@ static inline struct bio_vec req_bvec(struct request *rq)
> }
>
> static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
> - int op)
> + unsigned int op_flags)
> {
> + int op = op_flags & REQ_OP_MASK;
> +

Nitpick. int op = req_op(rq);

Anyway, looks good to me.
Reviewed-by: Bob Liu <[email protected]>

> if (unlikely(op == REQ_OP_DISCARD || op == REQ_OP_SECURE_ERASE))
> return min(q->limits.max_discard_sectors,
> UINT_MAX >> SECTOR_SHIFT);
> @@ -1029,10 +1031,10 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
> if (!q->limits.chunk_sectors ||
> req_op(rq) == REQ_OP_DISCARD ||
> req_op(rq) == REQ_OP_SECURE_ERASE)
> - return blk_queue_get_max_sectors(q, req_op(rq));
> + return blk_queue_get_max_sectors(q, rq->cmd_flags);
>
> return min(blk_max_size_offset(q, offset),
> - blk_queue_get_max_sectors(q, req_op(rq)));
> + blk_queue_get_max_sectors(q, rq->cmd_flags));
> }
>
> static inline unsigned int blk_rq_count_bios(struct request *rq)
>
>

2020-01-25 03:22:01

by Bob Liu

Subject: Re: [PATCH v5 4/6] block: Add support for REQ_ALLOCATE flag

On 1/22/20 6:58 PM, Kirill Tkhai wrote:
> This adds support for REQ_ALLOCATE extension of REQ_OP_WRITE_ZEROES
> operation, which encourages a block device driver to just allocate
> blocks (or mark them allocated) instead of actual blocks zeroing.
> REQ_ALLOCATE is aimed to be used for network filesystems providing
> a block device interface. Also, block devices, which map a file
> on other filesystem (like loop), may use this for less fragmentation
> and batching fallocate() requests. Hypervisors like QEMU may
> introduce optimizations of clusters allocations based on this.
>
> BLKDEV_ZERO_ALLOCATE is a new corresponding flag for
> blkdev_issue_zeroout().
>
> Stacking devices start from zero max_allocate_sectors limit for now,
> and the support is going to be implemented separate for each device
> in the future.
>
> Signed-off-by: Kirill Tkhai <[email protected]>
> ---
> block/blk-lib.c | 17 ++++++++++-------
> block/blk-settings.c | 4 ++++
> fs/block_dev.c | 4 ++++
> include/linux/blk_types.h | 5 ++++-
> include/linux/blkdev.h | 13 ++++++++++---
> 5 files changed, 32 insertions(+), 11 deletions(-)
>

This patch and the following two look fine to me.
Feel free to add.
Reviewed-by: Bob Liu <[email protected]>

> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 3e38c93cfc53..9cd6f86523ba 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
> struct bio **biop, unsigned flags)
> {
> struct bio *bio = *biop;
> - unsigned int max_write_zeroes_sectors;
> + unsigned int max_write_zeroes_sectors, req_flags = 0;
> struct request_queue *q = bdev_get_queue(bdev);
>
> if (!q)
> @@ -224,18 +224,21 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
> return -EPERM;
>
> /* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
> - max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
> + max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, flags);
>
> if (max_write_zeroes_sectors == 0)
> return -EOPNOTSUPP;
>
> + if (flags & BLKDEV_ZERO_NOUNMAP)
> + req_flags |= REQ_NOUNMAP;
> + if (flags & BLKDEV_ZERO_ALLOCATE)
> + req_flags |= REQ_ALLOCATE|REQ_NOUNMAP;
> +
> while (nr_sects) {
> bio = blk_next_bio(bio, 0, gfp_mask);
> bio->bi_iter.bi_sector = sector;
> bio_set_dev(bio, bdev);
> - bio->bi_opf = REQ_OP_WRITE_ZEROES;
> - if (flags & BLKDEV_ZERO_NOUNMAP)
> - bio->bi_opf |= REQ_NOUNMAP;
> + bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;
>
> if (nr_sects > max_write_zeroes_sectors) {
> bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
> @@ -362,7 +365,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> sector_t bs_mask;
> struct bio *bio;
> struct blk_plug plug;
> - bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
> + bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, flags);
>
> bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
> if ((sector | nr_sects) & bs_mask)
> @@ -391,7 +394,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> try_write_zeroes = false;
> goto retry;
> }
> - if (!bdev_write_zeroes_sectors(bdev, 0)) {
> + if (!bdev_write_zeroes_sectors(bdev, flags)) {
> /*
> * Zeroing offload support was indicated, but the
> * device reported ILLEGAL REQUEST (for some devices
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index c8eda2e7b91e..8d5df9d37239 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
> lim->chunk_sectors = 0;
> lim->max_write_same_sectors = 0;
> lim->max_write_zeroes_sectors = 0;
> + lim->max_allocate_sectors = 0;
> lim->max_discard_sectors = 0;
> lim->max_hw_discard_sectors = 0;
> lim->discard_granularity = 0;
> @@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
> lim->max_dev_sectors = UINT_MAX;
> lim->max_write_same_sectors = UINT_MAX;
> lim->max_write_zeroes_sectors = UINT_MAX;
> + lim->max_allocate_sectors = 0;
> }
> EXPORT_SYMBOL(blk_set_stacking_limits);
>
> @@ -506,6 +508,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> b->max_write_same_sectors);
> t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
> b->max_write_zeroes_sectors);
> + t->max_allocate_sectors = min(t->max_allocate_sectors,
> + b->max_allocate_sectors);
> t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
>
> t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 69bf2fb6f7cd..1ffef894b3bd 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
> error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
> GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
> break;
> + case FALLOC_FL_KEEP_SIZE:
> + error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
> + GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
> + break;
> case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
> error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
> GFP_KERNEL, 0);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 70254ae11769..86accd2caa4e 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -335,7 +335,9 @@ enum req_flag_bits {
>
> /* command specific flags for REQ_OP_WRITE_ZEROES: */
> __REQ_NOUNMAP, /* do not free blocks when zeroing */
> -
> + __REQ_ALLOCATE, /* only notify about allocated blocks,
> + * and do not actually zero them
> + */
> __REQ_HIPRI,
>
> /* for driver use */
> @@ -362,6 +364,7 @@ enum req_flag_bits {
> #define REQ_CGROUP_PUNT (1ULL << __REQ_CGROUP_PUNT)
>
> #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP)
> +#define REQ_ALLOCATE (1ULL << __REQ_ALLOCATE)
> #define REQ_HIPRI (1ULL << __REQ_HIPRI)
>
> #define REQ_DRV (1ULL << __REQ_DRV)
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 264202fa3bf8..20c94a7f9411 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -337,6 +337,7 @@ struct queue_limits {
> unsigned int max_hw_discard_sectors;
> unsigned int max_write_same_sectors;
> unsigned int max_write_zeroes_sectors;
> + unsigned int max_allocate_sectors;
> unsigned int discard_granularity;
> unsigned int discard_alignment;
>
> @@ -991,6 +992,8 @@ static inline struct bio_vec req_bvec(struct request *rq)
> static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
> struct request_queue *q, unsigned int op_flags)
> {
> + if (op_flags & REQ_ALLOCATE)
> + return q->limits.max_allocate_sectors;
> return q->limits.max_write_zeroes_sectors;
> }
>
> @@ -1227,6 +1230,7 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>
> #define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */
> #define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */
> +#define BLKDEV_ZERO_ALLOCATE (1 << 2) /* allocate range of blocks */
>
> extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
> sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
> @@ -1431,10 +1435,13 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
> {
> struct request_queue *q = bdev_get_queue(bdev);
>
> - if (q)
> - return q->limits.max_write_zeroes_sectors;
> + if (!q)
> + return 0;
>
> - return 0;
> + if (flags & BLKDEV_ZERO_ALLOCATE)
> + return q->limits.max_allocate_sectors;
> + else
> + return q->limits.max_write_zeroes_sectors;
> }
>
> static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
>
>

2020-01-27 08:54:08

by Kirill Tkhai

Subject: Re: [PATCH v5 4/6] block: Add support for REQ_ALLOCATE flag

On 25.01.2020 06:18, Bob Liu wrote:
> On 1/22/20 6:58 PM, Kirill Tkhai wrote:
>> This adds support for REQ_ALLOCATE extension of REQ_OP_WRITE_ZEROES
>> operation, which encourages a block device driver to just allocate
>> blocks (or mark them allocated) instead of actual blocks zeroing.
>> REQ_ALLOCATE is aimed to be used for network filesystems providing
>> a block device interface. Also, block devices, which map a file
>> on other filesystem (like loop), may use this for less fragmentation
>> and batching fallocate() requests. Hypervisors like QEMU may
>> introduce optimizations of clusters allocations based on this.
>>
>> BLKDEV_ZERO_ALLOCATE is a new corresponding flag for
>> blkdev_issue_zeroout().
>>
>> Stacking devices start from zero max_allocate_sectors limit for now,
>> and the support is going to be implemented separate for each device
>> in the future.
>>
>> Signed-off-by: Kirill Tkhai <[email protected]>
>> ---
>> block/blk-lib.c | 17 ++++++++++-------
>> block/blk-settings.c | 4 ++++
>> fs/block_dev.c | 4 ++++
>> include/linux/blk_types.h | 5 ++++-
>> include/linux/blkdev.h | 13 ++++++++++---
>> 5 files changed, 32 insertions(+), 11 deletions(-)
>>
>
> This patch and following two are looks fine to me.
> Feel free to add.
> Reviewed-by: Bob Liu <[email protected]>

Thank you, Bob.

Kirill

>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 3e38c93cfc53..9cd6f86523ba 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -214,7 +214,7 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>> struct bio **biop, unsigned flags)
>> {
>> struct bio *bio = *biop;
>> - unsigned int max_write_zeroes_sectors;
>> + unsigned int max_write_zeroes_sectors, req_flags = 0;
>> struct request_queue *q = bdev_get_queue(bdev);
>>
>> if (!q)
>> @@ -224,18 +224,21 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
>> return -EPERM;
>>
>> /* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
>> - max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, 0);
>> + max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev, flags);
>>
>> if (max_write_zeroes_sectors == 0)
>> return -EOPNOTSUPP;
>>
>> + if (flags & BLKDEV_ZERO_NOUNMAP)
>> + req_flags |= REQ_NOUNMAP;
>> + if (flags & BLKDEV_ZERO_ALLOCATE)
>> + req_flags |= REQ_ALLOCATE|REQ_NOUNMAP;
>> +
>> while (nr_sects) {
>> bio = blk_next_bio(bio, 0, gfp_mask);
>> bio->bi_iter.bi_sector = sector;
>> bio_set_dev(bio, bdev);
>> - bio->bi_opf = REQ_OP_WRITE_ZEROES;
>> - if (flags & BLKDEV_ZERO_NOUNMAP)
>> - bio->bi_opf |= REQ_NOUNMAP;
>> + bio->bi_opf = REQ_OP_WRITE_ZEROES | req_flags;
>>
>> if (nr_sects > max_write_zeroes_sectors) {
>> bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
>> @@ -362,7 +365,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>> sector_t bs_mask;
>> struct bio *bio;
>> struct blk_plug plug;
>> - bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, 0);
>> + bool try_write_zeroes = !!bdev_write_zeroes_sectors(bdev, flags);
>>
>> bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
>> if ((sector | nr_sects) & bs_mask)
>> @@ -391,7 +394,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>> try_write_zeroes = false;
>> goto retry;
>> }
>> - if (!bdev_write_zeroes_sectors(bdev, 0)) {
>> + if (!bdev_write_zeroes_sectors(bdev, flags)) {
>> /*
>> * Zeroing offload support was indicated, but the
>> * device reported ILLEGAL REQUEST (for some devices
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index c8eda2e7b91e..8d5df9d37239 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -48,6 +48,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>> lim->chunk_sectors = 0;
>> lim->max_write_same_sectors = 0;
>> lim->max_write_zeroes_sectors = 0;
>> + lim->max_allocate_sectors = 0;
>> lim->max_discard_sectors = 0;
>> lim->max_hw_discard_sectors = 0;
>> lim->discard_granularity = 0;
>> @@ -83,6 +84,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>> lim->max_dev_sectors = UINT_MAX;
>> lim->max_write_same_sectors = UINT_MAX;
>> lim->max_write_zeroes_sectors = UINT_MAX;
>> + lim->max_allocate_sectors = 0;
>> }
>> EXPORT_SYMBOL(blk_set_stacking_limits);
>>
>> @@ -506,6 +508,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>> b->max_write_same_sectors);
>> t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
>> b->max_write_zeroes_sectors);
>> + t->max_allocate_sectors = min(t->max_allocate_sectors,
>> + b->max_allocate_sectors);
>> t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
>>
>> t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
>> diff --git a/fs/block_dev.c b/fs/block_dev.c
>> index 69bf2fb6f7cd..1ffef894b3bd 100644
>> --- a/fs/block_dev.c
>> +++ b/fs/block_dev.c
>> @@ -2122,6 +2122,10 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
>> error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>> GFP_KERNEL, BLKDEV_ZERO_NOFALLBACK);
>> break;
>> + case FALLOC_FL_KEEP_SIZE:
>> + error = blkdev_issue_zeroout(bdev, start >> 9, len >> 9,
>> + GFP_KERNEL, BLKDEV_ZERO_ALLOCATE | BLKDEV_ZERO_NOFALLBACK);
>> + break;
>> case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
>> error = blkdev_issue_discard(bdev, start >> 9, len >> 9,
>> GFP_KERNEL, 0);
>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>> index 70254ae11769..86accd2caa4e 100644
>> --- a/include/linux/blk_types.h
>> +++ b/include/linux/blk_types.h
>> @@ -335,7 +335,9 @@ enum req_flag_bits {
>>
>> /* command specific flags for REQ_OP_WRITE_ZEROES: */
>> __REQ_NOUNMAP, /* do not free blocks when zeroing */
>> -
>> + __REQ_ALLOCATE, /* only notify about allocated blocks,
>> + * and do not actually zero them
>> + */
>> __REQ_HIPRI,
>>
>> /* for driver use */
>> @@ -362,6 +364,7 @@ enum req_flag_bits {
>> #define REQ_CGROUP_PUNT (1ULL << __REQ_CGROUP_PUNT)
>>
>> #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP)
>> +#define REQ_ALLOCATE (1ULL << __REQ_ALLOCATE)
>> #define REQ_HIPRI (1ULL << __REQ_HIPRI)
>>
>> #define REQ_DRV (1ULL << __REQ_DRV)
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 264202fa3bf8..20c94a7f9411 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -337,6 +337,7 @@ struct queue_limits {
>> unsigned int max_hw_discard_sectors;
>> unsigned int max_write_same_sectors;
>> unsigned int max_write_zeroes_sectors;
>> + unsigned int max_allocate_sectors;
>> unsigned int discard_granularity;
>> unsigned int discard_alignment;
>>
>> @@ -991,6 +992,8 @@ static inline struct bio_vec req_bvec(struct request *rq)
>> static inline unsigned int blk_queue_get_max_write_zeroes_sectors(
>> struct request_queue *q, unsigned int op_flags)
>> {
>> + if (op_flags & REQ_ALLOCATE)
>> + return q->limits.max_allocate_sectors;
>> return q->limits.max_write_zeroes_sectors;
>> }
>>
>> @@ -1227,6 +1230,7 @@ extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>>
>> #define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */
>> #define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */
>> +#define BLKDEV_ZERO_ALLOCATE (1 << 2) /* allocate range of blocks */
>>
>> extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>> sector_t nr_sects, gfp_t gfp_mask, struct bio **biop,
>> @@ -1431,10 +1435,13 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev,
>> {
>> struct request_queue *q = bdev_get_queue(bdev);
>>
>> - if (q)
>> - return q->limits.max_write_zeroes_sectors;
>> + if (!q)
>> + return 0;
>>
>> - return 0;
>> + if (flags & BLKDEV_ZERO_ALLOCATE)
>> + return q->limits.max_allocate_sectors;
>> + else
>> + return q->limits.max_write_zeroes_sectors;
>> }
>>
>> static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
>>
>>
>

2020-01-27 11:15:40

by Kirill Tkhai

Subject: Re: [PATCH v5 2/6] block: Pass op_flags into blk_queue_get_max_sectors()

On 25.01.2020 05:37, Bob Liu wrote:
> On 1/22/20 6:58 PM, Kirill Tkhai wrote:
>> This preparation patch changes argument type, and now
>> the function takes full op_flags instead of just op code.
>>
>> Signed-off-by: Kirill Tkhai <[email protected]>
>> ---
>> block/blk-core.c | 4 ++--
>> include/linux/blkdev.h | 8 +++++---
>> 2 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 50a5de025d5e..ac2634bcda1f 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1250,10 +1250,10 @@ EXPORT_SYMBOL(submit_bio);
>> static int blk_cloned_rq_check_limits(struct request_queue *q,
>> struct request *rq)
>> {
>> - if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
>> + if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
>> printk(KERN_ERR "%s: over max size limit. (%u > %u)\n",
>> __func__, blk_rq_sectors(rq),
>> - blk_queue_get_max_sectors(q, req_op(rq)));
>> + blk_queue_get_max_sectors(q, rq->cmd_flags));
>> return -EIO;
>> }
>>
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 0f1127d0b043..23a5850f35f6 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -989,8 +989,10 @@ static inline struct bio_vec req_bvec(struct request *rq)
>> }
>>
>> static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
>> - int op)
>> + unsigned int op_flags)
>> {
>> + int op = op_flags & REQ_OP_MASK;
>> +
>
> Nitpick. int op = req_op(rq);
>
> Anyway, looks good to me.
> Reviewed-by: Bob Liu <[email protected]>

Thanks, Bob. I'll fold in this nitpick and your "Reviewed-by" in the next resend.
That will be after the merge window closes, when new patches are welcome again.

>> if (unlikely(op == REQ_OP_DISCARD || op == REQ_OP_SECURE_ERASE))
>> return min(q->limits.max_discard_sectors,
>> UINT_MAX >> SECTOR_SHIFT);
>> @@ -1029,10 +1031,10 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
>> if (!q->limits.chunk_sectors ||
>> req_op(rq) == REQ_OP_DISCARD ||
>> req_op(rq) == REQ_OP_SECURE_ERASE)
>> - return blk_queue_get_max_sectors(q, req_op(rq));
>> + return blk_queue_get_max_sectors(q, rq->cmd_flags);
>>
>> return min(blk_max_size_offset(q, offset),
>> - blk_queue_get_max_sectors(q, req_op(rq)));
>> + blk_queue_get_max_sectors(q, rq->cmd_flags));
>> }
>>
>> static inline unsigned int blk_rq_count_bios(struct request *rq)
>>
>>
>