2022-11-23 06:42:16

by Nitesh Shetty

Subject: [PATCH v5 00/10] Implement copy offload support

The patch series covers the points discussed in the November 2021 virtual
call [LSF/MM/BPF TOPIC] Storage: Copy Offload [0].
We have covered the initially agreed requirements in this patchset and
further additional features suggested by the community.
The patchset borrows Mikulas's token-based approach for the two-bdev
implementation.

This is on top of our previous patchset v4[1].

Overall series supports:
========================
1. Driver
- NVMe Copy command (single NS, TP 4065), including support
in nvme-target (for block and file backends).

2. Block layer
- Block-generic copy (REQ_COPY flag), with an interface
accommodating two block devices and multiple
source/destination ranges
- Emulation, when offload is natively absent
- dm-linear support (for cases not requiring split)

3. User-interface
- a new ioctl (BLKCOPY)
- copy_file_range for zonefs

4. In-kernel user
- dm-kcopyd
- copy_file_range in zonefs

Testing
=======
Copy offload can be tested on:
a. QEMU: NVMe simple copy (TP 4065), by setting the nvme-ns
parameters mssrl, mcl and msrc (an example invocation is sketched
below this list). For more info, see [2].
b. Fabrics loopback.
c. zonefs copy_file_range
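
For reference, a QEMU invocation along these lines should expose a
copy-capable namespace; the image path and the limit values here are
purely illustrative (see [2] for the authoritative parameter
descriptions):

  qemu-system-x86_64 ... \
      -device nvme,id=nvme0,serial=deadbeef \
      -drive id=nvm,file=nvme.img,format=raw,if=none \
      -device nvme-ns,drive=nvm,bus=nvme0,mssrl=128,mcl=128,msrc=7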

Emulation can be tested on any device.

A sample application using the ioctl is present in the patch
description; a modified fio[3] can also be used.

Performance
===========
With the async design of copy-emulation/offload using fio[3],
we were able to see the following improvements compared to
userspace read and write on an NVMe-oF TCP setup:
Setup1: Network Speed: 1000Mb/s
Host PC: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Target PC: AMD Ryzen 9 5900X 12-Core Processor
block size 8k, 1 range:
635% improvement in IO BW (107 MiB/s to 787 MiB/s).
Network utilisation drops from 97% to 14%.
block size 2M, 16 ranges:
2555% improvement in IO BW (100 MiB/s to 2655 MiB/s).
Network utilisation drops from 89% to 0.62%.
Setup2: Network Speed: 100Gb/s
Server: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 72 cores
(host and target have the same configuration)
block size 8k, 1 range:
6.5% improvement in IO BW (791 MiB/s to 843 MiB/s).
Network utilisation drops from 6.75% to 0.14%.
block size 2M, 16 ranges:
15% improvement in IO BW (1027 MiB/s to 1183 MiB/s).
Network utilisation drops from 8.42% to ~0%.
block size 8k, 8 ranges:
18% drop in IO BW (from 798 MiB/s to 647 MiB/s)
Network utilisation drops from 6.66% to 0.13%.

At present we see a drop in performance for block sizes 8k and 16k
combined with higher range counts (8, 16), so there is more to check there.
Overall, in these tests, kernel copy emulation performs better than
userspace read+write.

Zonefs copy_file_range
======================
Sample tests are available in zonefs-tools[4]. Tests 0118 and 0119 cover
basic CFR. We will raise a PR once this series is finalized.

Future Work
===========
- nullblk: copy-offload emulation
- generic copy file range (CFR):
We went through this, but couldn't find a straightforward
way to plug copy offload into all the cases. We are doing a
detailed study and will address this in future versions.
- loopback device copy offload support
- upstream fio to use copy offload

These are to be taken up after we reach consensus on the
plumbing of current elements that are part of this series.


Additional links:
=================
[0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@mail.gmail.com/
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://qemu-project.gitlab.io/qemu/system/devices/nvme.html#simple-copy
[3] https://github.com/vincentkfu/fio/tree/copyoffload
[4] https://github.com/nitesh-shetty/zonefs-tools/tree/cfr

Changes since v4:
=================
- make the offload and emulation design asynchronous (Hannes
Reinecke)
- fabrics loopback support
- sysfs naming improvements (Damien Le Moal)
- use kfree() instead of kvfree() in cio_await_completion
(Damien Le Moal)
- use ranges instead of rlist to represent range_entry (Damien
Le Moal)
- change argument ordering in blk_copy_offload, as suggested (Damien
Le Moal)
- removed the multiple copy limits and merged them into a single limit
(Damien Le Moal)
- wrap overly long lines (Damien Le Moal)
- other naming improvements and cleanups (Damien Le Moal)
- correctly format the code example in description (Damien Le
Moal)
- mark blk_copy_offload as static (kernel test robot)

Changes since v3:
=================
- added copy_file_range support for zonefs
- added documentation about new sysfs entries
- incorporated review comments on v3
- minor fixes

Changes since v2:
=================
- fixed possible race condition reported by Damien Le Moal
- new sysfs controls as suggested by Damien Le Moal
- fixed possible memory leak reported by Dan Carpenter, lkp
- minor fixes

Nitesh Shetty (10):
block: Introduce queue limits for copy-offload support
block: Add copy offload support infrastructure
block: add emulation for copy
block: Introduce a new ioctl for copy
nvme: add copy offload support
nvmet: add copy command support for bdev and file ns
dm: Add support for copy offload.
dm: Enable copy offload for dm-linear target
dm kcopyd: use copy offload support
fs: add support for copy file range in zonefs

Documentation/ABI/stable/sysfs-block | 36 ++
block/blk-lib.c | 597 +++++++++++++++++++++++++++
block/blk-map.c | 4 +-
block/blk-settings.c | 24 ++
block/blk-sysfs.c | 64 +++
block/blk.h | 2 +
block/ioctl.c | 36 ++
drivers/md/dm-kcopyd.c | 56 ++-
drivers/md/dm-linear.c | 1 +
drivers/md/dm-table.c | 42 ++
drivers/md/dm.c | 7 +
drivers/nvme/host/core.c | 106 ++++-
drivers/nvme/host/fc.c | 5 +
drivers/nvme/host/nvme.h | 7 +
drivers/nvme/host/pci.c | 28 +-
drivers/nvme/host/rdma.c | 7 +
drivers/nvme/host/tcp.c | 16 +
drivers/nvme/host/trace.c | 19 +
drivers/nvme/target/admin-cmd.c | 9 +-
drivers/nvme/target/io-cmd-bdev.c | 79 ++++
drivers/nvme/target/io-cmd-file.c | 51 +++
drivers/nvme/target/loop.c | 6 +
drivers/nvme/target/nvmet.h | 2 +
fs/zonefs/super.c | 179 ++++++++
include/linux/blk_types.h | 44 ++
include/linux/blkdev.h | 18 +
include/linux/device-mapper.h | 5 +
include/linux/nvme.h | 43 +-
include/uapi/linux/fs.h | 27 ++
29 files changed, 1502 insertions(+), 18 deletions(-)


base-commit: e4cd8d3ff7f9efeb97330e5e9b99eeb2a68f5cf9
--
2.35.1.500.gb896f729e2


2022-11-23 06:44:36

by Nitesh Shetty

Subject: [PATCH v5 01/10] block: Introduce queue limits for copy-offload support

Add device limits as sysfs entries:
- copy_offload (RW)
- copy_max_bytes (RW)
- copy_max_bytes_hw (RO)

The above limits help to split the copy payload in the block layer.
copy_offload: selects copy offload (1) or emulation (0).
copy_max_bytes: maximum total length of a copy in a single payload.
copy_max_bytes_hw: reflects the device-supported maximum limit.
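
As a rough usage sketch (the device name is hypothetical and the values
purely illustrative):

  # enable copy offload; fails if the device reports no hardware copy
  # support (i.e. copy_max_bytes_hw is 0)
  echo 1 > /sys/block/nvme0n1/queue/copy_offload

  # cap copies issued by the block layer to 1 MiB; must be a multiple of
  # the logical block size and is clamped to copy_max_bytes_hw
  echo 1048576 > /sys/block/nvme0n1/queue/copy_max_bytes

  # read back the device limit; 0 means copy offload is not supported
  cat /sys/block/nvme0n1/queue/copy_max_bytes_hw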

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Nitesh Shetty <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
Documentation/ABI/stable/sysfs-block | 36 ++++++++++++++++
block/blk-settings.c | 24 +++++++++++
block/blk-sysfs.c | 64 ++++++++++++++++++++++++++++
include/linux/blkdev.h | 12 ++++++
include/uapi/linux/fs.h | 3 ++
5 files changed, 139 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index cd14ecb3c9a5..e0c9be009706 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -155,6 +155,42 @@ Description:
last zone of the device which may be smaller.


+What: /sys/block/<disk>/queue/copy_offload
+Date: November 2022
+Contact: [email protected]
+Description:
+ [RW] When read, this file shows whether offloading copy to the
+ device is enabled (1) or disabled (0). Writing '0' to this
+ file will disable offloading copies for this device.
+ Writing '1' will enable this feature. If the device
+ does not support offloading, then writing '1' will result in
+ an error.
+
+
+What: /sys/block/<disk>/queue/copy_max_bytes
+Date: November 2022
+Contact: [email protected]
+Description:
+ [RW] While 'copy_max_bytes_hw' is the hardware limit for the
+ device, the 'copy_max_bytes' setting is the software limit.
+ Setting this value lower will make Linux issue smaller-sized
+ copies from the block layer.
+
+
+What: /sys/block/<disk>/queue/copy_max_bytes_hw
+Date: November 2022
+Contact: [email protected]
+Description:
+ [RO] Devices that support offloading copy functionality may have
+ internal limits on the number of bytes that can be offloaded
+ in a single operation. The `copy_max_bytes_hw`
+ parameter is set by the device driver to the maximum number of
+ bytes that can be copied in a single operation. Copy
+ requests issued to the device must not exceed this limit.
+ A value of 0 means that the device does not
+ support copy offload.
+
+
What: /sys/block/<disk>/queue/crypto/
Date: February 2022
Contact: [email protected]
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 0477c4d527fe..ca6f15a70fdc 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -58,6 +58,8 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->zoned = BLK_ZONED_NONE;
lim->zone_write_granularity = 0;
lim->dma_alignment = 511;
+ lim->max_copy_sectors_hw = 0;
+ lim->max_copy_sectors = 0;
}

/**
@@ -81,6 +83,8 @@ void blk_set_stacking_limits(struct queue_limits *lim)
lim->max_dev_sectors = UINT_MAX;
lim->max_write_zeroes_sectors = UINT_MAX;
lim->max_zone_append_sectors = UINT_MAX;
+ lim->max_copy_sectors_hw = ULONG_MAX;
+ lim->max_copy_sectors = ULONG_MAX;
}
EXPORT_SYMBOL(blk_set_stacking_limits);

@@ -177,6 +181,22 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
}
EXPORT_SYMBOL(blk_queue_max_discard_sectors);

+/**
+ * blk_queue_max_copy_sectors_hw - set max sectors for a single copy payload
+ * @q: the request queue for the device
+ * @max_copy_sectors: maximum number of sectors to copy
+ **/
+void blk_queue_max_copy_sectors_hw(struct request_queue *q,
+ unsigned int max_copy_sectors)
+{
+ if (max_copy_sectors >= MAX_COPY_TOTAL_LENGTH)
+ max_copy_sectors = MAX_COPY_TOTAL_LENGTH;
+
+ q->limits.max_copy_sectors_hw = max_copy_sectors;
+ q->limits.max_copy_sectors = max_copy_sectors;
+}
+EXPORT_SYMBOL_GPL(blk_queue_max_copy_sectors_hw);
+
/**
* blk_queue_max_secure_erase_sectors - set max sectors for a secure erase
* @q: the request queue for the device
@@ -572,6 +592,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->max_segment_size = min_not_zero(t->max_segment_size,
b->max_segment_size);

+ t->max_copy_sectors = min(t->max_copy_sectors, b->max_copy_sectors);
+ t->max_copy_sectors_hw = min(t->max_copy_sectors_hw,
+ b->max_copy_sectors_hw);
+
t->misaligned |= b->misaligned;

alignment = queue_limit_alignment_offset(b, start);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 02e94c4beff1..903285b04029 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -212,6 +212,63 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
return queue_var_show(0, page);
}

+static ssize_t queue_copy_offload_show(struct request_queue *q, char *page)
+{
+ return queue_var_show(blk_queue_copy(q), page);
+}
+
+static ssize_t queue_copy_offload_store(struct request_queue *q,
+ const char *page, size_t count)
+{
+ s64 copy_offload;
+ ssize_t ret = queue_var_store64(&copy_offload, page);
+
+ if (ret < 0)
+ return ret;
+
+ if (copy_offload && !q->limits.max_copy_sectors_hw)
+ return -EINVAL;
+
+ if (copy_offload)
+ blk_queue_flag_set(QUEUE_FLAG_COPY, q);
+ else
+ blk_queue_flag_clear(QUEUE_FLAG_COPY, q);
+
+ return count;
+}
+
+static ssize_t queue_copy_max_hw_show(struct request_queue *q, char *page)
+{
+ return sprintf(page, "%llu\n", (unsigned long long)
+ q->limits.max_copy_sectors_hw << SECTOR_SHIFT);
+}
+
+static ssize_t queue_copy_max_show(struct request_queue *q, char *page)
+{
+ return sprintf(page, "%llu\n", (unsigned long long)
+ q->limits.max_copy_sectors << SECTOR_SHIFT);
+}
+
+static ssize_t queue_copy_max_store(struct request_queue *q,
+ const char *page, size_t count)
+{
+ s64 max_copy;
+ ssize_t ret = queue_var_store64(&max_copy, page);
+
+ if (ret < 0)
+ return ret;
+
+ if (max_copy & (queue_logical_block_size(q) - 1))
+ return -EINVAL;
+
+ max_copy >>= SECTOR_SHIFT;
+ if (max_copy > q->limits.max_copy_sectors_hw)
+ max_copy = q->limits.max_copy_sectors_hw;
+
+ q->limits.max_copy_sectors = max_copy;
+ return count;
+}
+
static ssize_t queue_write_same_max_show(struct request_queue *q, char *page)
{
return queue_var_show(0, page);
@@ -604,6 +661,10 @@ QUEUE_RO_ENTRY(queue_nr_zones, "nr_zones");
QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");

+QUEUE_RW_ENTRY(queue_copy_offload, "copy_offload");
+QUEUE_RO_ENTRY(queue_copy_max_hw, "copy_max_bytes_hw");
+QUEUE_RW_ENTRY(queue_copy_max, "copy_max_bytes");
+
QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
QUEUE_RW_ENTRY(queue_poll, "io_poll");
@@ -651,6 +712,9 @@ static struct attribute *queue_attrs[] = {
&queue_discard_max_entry.attr,
&queue_discard_max_hw_entry.attr,
&queue_discard_zeroes_data_entry.attr,
+ &queue_copy_offload_entry.attr,
+ &queue_copy_max_hw_entry.attr,
+ &queue_copy_max_entry.attr,
&queue_write_same_max_entry.attr,
&queue_write_zeroes_max_entry.attr,
&queue_zone_append_max_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a0452ba08e9a..3ac324208f2f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -302,6 +302,9 @@ struct queue_limits {
unsigned int discard_alignment;
unsigned int zone_write_granularity;

+ unsigned long max_copy_sectors_hw;
+ unsigned long max_copy_sectors;
+
unsigned short max_segments;
unsigned short max_integrity_segments;
unsigned short max_discard_segments;
@@ -573,6 +576,7 @@ struct request_queue {
#define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */
#define QUEUE_FLAG_SQ_SCHED 30 /* single queue style io dispatch */
#define QUEUE_FLAG_SKIP_TAGSET_QUIESCE 31 /* quiesce_tagset skip the queue*/
+#define QUEUE_FLAG_COPY 32 /* supports copy offload */

#define QUEUE_FLAG_MQ_DEFAULT ((1UL << QUEUE_FLAG_IO_STAT) | \
(1UL << QUEUE_FLAG_SAME_COMP) | \
@@ -593,6 +597,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
#define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
#define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
+#define blk_queue_copy(q) test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
#define blk_queue_zone_resetall(q) \
test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
@@ -913,6 +918,8 @@ extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int);
extern void blk_queue_max_segments(struct request_queue *, unsigned short);
extern void blk_queue_max_discard_segments(struct request_queue *,
unsigned short);
+extern void blk_queue_max_copy_sectors_hw(struct request_queue *q,
+ unsigned int max_copy_sectors);
void blk_queue_max_secure_erase_sectors(struct request_queue *q,
unsigned int max_sectors);
extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
@@ -1231,6 +1238,11 @@ static inline unsigned int bdev_discard_granularity(struct block_device *bdev)
return bdev_get_queue(bdev)->limits.discard_granularity;
}

+static inline unsigned int bdev_max_copy_sectors(struct block_device *bdev)
+{
+ return bdev_get_queue(bdev)->limits.max_copy_sectors;
+}
+
static inline unsigned int
bdev_max_secure_erase_sectors(struct block_device *bdev)
{
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index b7b56871029c..b3ad173f619c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -64,6 +64,9 @@ struct fstrim_range {
__u64 minlen;
};

+/* maximum total copy length */
+#define MAX_COPY_TOTAL_LENGTH (1 << 27)
+
/* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
#define FILE_DEDUPE_RANGE_SAME 0
#define FILE_DEDUPE_RANGE_DIFFERS 1
--
2.35.1.500.gb896f729e2

2022-11-23 07:16:39

by Nitesh Shetty

Subject: [PATCH v5 04/10] block: Introduce a new ioctl for copy

Add a new BLKCOPY ioctl that offloads copying of one or more source
ranges to one or more destination ranges on the same device. The COPY
ioctl accepts a 'copy_range' structure that contains the number of
ranges, a reserved field, followed by an array of ranges. Each range is
represented by a 'range_entry' that contains the source start offset,
the destination start offset and the length of the range (in bytes).

MAX_COPY_NR_RANGE limits the number of entries the ioctl can handle, and
MAX_COPY_TOTAL_LENGTH limits the total copy length the ioctl can handle.

Example code to issue BLKCOPY:
/*
 * Sample program copying three 4096-byte ranges with [dst, src, len] =
 * [32768, 0, 4096], [36864, 4096, 4096], [40960, 8192, 4096] on the
 * same device.
 */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(void)
{
        int i, ret, fd;
        unsigned long src = 0, dst = 32768, len = 4096;
        struct copy_range *cr;

        cr = malloc(sizeof(*cr) + (sizeof(struct range_entry) * 3));
        if (!cr)
                return 1;
        cr->nr_range = 3;
        cr->reserved = 0;
        for (i = 0; i < cr->nr_range; i++, src += len, dst += len) {
                cr->ranges[i].dst = dst;
                cr->ranges[i].src = src;
                cr->ranges[i].len = len;
                cr->ranges[i].comp_len = 0;
        }

        fd = open("/dev/nvme0n1", O_RDWR);
        if (fd < 0)
                return 1;

        ret = ioctl(fd, BLKCOPY, cr);
        if (ret != 0)
                printf("copy failed, ret = %d\n", ret);

        /* comp_len reports how much of each range actually completed */
        for (i = 0; i < cr->nr_range; i++)
                if (cr->ranges[i].len != cr->ranges[i].comp_len)
                        printf("Partial copy for entry %d: requested %llu, completed %llu\n",
                               i, cr->ranges[i].len, cr->ranges[i].comp_len);

        close(fd);
        free(cr);
        return ret;
}

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Nitesh Shetty <[email protected]>
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
block/ioctl.c | 36 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/fs.h | 9 +++++++++
2 files changed, 45 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index 60121e89052b..7daf76199161 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -120,6 +120,40 @@ static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
return err;
}

+static int blk_ioctl_copy(struct block_device *bdev, fmode_t mode,
+ unsigned long arg)
+{
+ struct copy_range ucopy_range, *kcopy_range = NULL;
+ size_t payload_size = 0;
+ int ret;
+
+ if (!(mode & FMODE_WRITE))
+ return -EBADF;
+
+ if (copy_from_user(&ucopy_range, (void __user *)arg,
+ sizeof(ucopy_range)))
+ return -EFAULT;
+
+ if (unlikely(!ucopy_range.nr_range || ucopy_range.reserved ||
+ ucopy_range.nr_range >= MAX_COPY_NR_RANGE))
+ return -EINVAL;
+
+ payload_size = (ucopy_range.nr_range * sizeof(struct range_entry)) +
+ sizeof(ucopy_range);
+
+ kcopy_range = memdup_user((void __user *)arg, payload_size);
+ if (IS_ERR(kcopy_range))
+ return PTR_ERR(kcopy_range);
+
+ ret = blkdev_issue_copy(bdev, bdev, kcopy_range->ranges,
+ kcopy_range->nr_range, NULL, NULL, GFP_KERNEL);
+ if (copy_to_user((void __user *)arg, kcopy_range, payload_size))
+ ret = -EFAULT;
+
+ kfree(kcopy_range);
+ return ret;
+}
+
static int blk_ioctl_secure_erase(struct block_device *bdev, fmode_t mode,
void __user *argp)
{
@@ -481,6 +515,8 @@ static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode,
return blk_ioctl_discard(bdev, mode, arg);
case BLKSECDISCARD:
return blk_ioctl_secure_erase(bdev, mode, argp);
+ case BLKCOPY:
+ return blk_ioctl_copy(bdev, mode, arg);
case BLKZEROOUT:
return blk_ioctl_zeroout(bdev, mode, arg);
case BLKGETDISKSEQ:
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 9248b6d259de..8af10b926a6f 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -82,6 +82,14 @@ struct range_entry {
__u64 comp_len;
};

+struct copy_range {
+ __u64 nr_range;
+ __u64 reserved;
+
+ /* Ranges always must be at the end */
+ struct range_entry ranges[];
+};
+
/* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
#define FILE_DEDUPE_RANGE_SAME 0
#define FILE_DEDUPE_RANGE_DIFFERS 1
@@ -203,6 +211,7 @@ struct fsxattr {
#define BLKROTATIONAL _IO(0x12,126)
#define BLKZEROOUT _IO(0x12,127)
#define BLKGETDISKSEQ _IOR(0x12,128,__u64)
+#define BLKCOPY _IOWR(0x12, 129, struct copy_range)
/*
* A jump here: 130-136 are reserved for zoned block devices
* (see uapi/linux/blkzoned.h)
--
2.35.1.500.gb896f729e2

2022-11-23 07:21:42

by Nitesh Shetty

Subject: [PATCH v5 09/10] dm kcopyd: use copy offload support

Introduce a copy_jobs list to use copy offload, if supported by the
underlying devices; otherwise fall back to the existing method.

run_copy_job() calls the block layer copy offload API if both the source
and destination request queues are the same and support copy offload.
On successful completion, the copied destination region's count is set
to zero; failed regions are processed via the existing method.

Signed-off-by: Nitesh Shetty <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
drivers/md/dm-kcopyd.c | 56 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 4d3bbbea2e9a..2f9985f671ac 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -74,18 +74,20 @@ struct dm_kcopyd_client {
atomic_t nr_jobs;

/*
- * We maintain four lists of jobs:
+ * We maintain five lists of jobs:
*
- * i) jobs waiting for pages
- * ii) jobs that have pages, and are waiting for the io to be issued.
- * iii) jobs that don't need to do any IO and just run a callback
- * iv) jobs that have completed.
+ * i) jobs waiting to try copy offload
+ * ii) jobs waiting for pages
+ * iii) jobs that have pages, and are waiting for the io to be issued.
+ * iv) jobs that don't need to do any IO and just run a callback
+ * v) jobs that have completed.
*
- * All four of these are protected by job_lock.
+ * All five of these are protected by job_lock.
*/
spinlock_t job_lock;
struct list_head callback_jobs;
struct list_head complete_jobs;
+ struct list_head copy_jobs;
struct list_head io_jobs;
struct list_head pages_jobs;
};
@@ -579,6 +581,43 @@ static int run_io_job(struct kcopyd_job *job)
return r;
}

+static int run_copy_job(struct kcopyd_job *job)
+{
+ int r, i, count = 0;
+ struct range_entry range;
+
+ struct request_queue *src_q, *dest_q;
+
+ for (i = 0; i < job->num_dests; i++) {
+ range.dst = job->dests[i].sector << SECTOR_SHIFT;
+ range.src = job->source.sector << SECTOR_SHIFT;
+ range.len = job->source.count << SECTOR_SHIFT;
+
+ src_q = bdev_get_queue(job->source.bdev);
+ dest_q = bdev_get_queue(job->dests[i].bdev);
+
+ if (src_q != dest_q || !blk_queue_copy(src_q))
+ break;
+
+ r = blkdev_issue_copy(job->source.bdev, job->dests[i].bdev,
+ &range, 1, NULL, NULL, GFP_KERNEL);
+ if (r)
+ break;
+
+ job->dests[i].count = 0;
+ count++;
+ }
+
+ if (count == job->num_dests) {
+ push(&job->kc->complete_jobs, job);
+ } else {
+ push(&job->kc->pages_jobs, job);
+ r = 0;
+ }
+
+ return r;
+}
+
static int run_pages_job(struct kcopyd_job *job)
{
int r;
@@ -659,6 +698,7 @@ static void do_work(struct work_struct *work)
spin_unlock_irq(&kc->job_lock);

blk_start_plug(&plug);
+ process_jobs(&kc->copy_jobs, kc, run_copy_job);
process_jobs(&kc->complete_jobs, kc, run_complete_job);
process_jobs(&kc->pages_jobs, kc, run_pages_job);
process_jobs(&kc->io_jobs, kc, run_io_job);
@@ -676,6 +716,8 @@ static void dispatch_job(struct kcopyd_job *job)
atomic_inc(&kc->nr_jobs);
if (unlikely(!job->source.count))
push(&kc->callback_jobs, job);
+ else if (job->source.bdev->bd_disk == job->dests[0].bdev->bd_disk)
+ push(&kc->copy_jobs, job);
else if (job->pages == &zero_page_list)
push(&kc->io_jobs, job);
else
@@ -916,6 +958,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro
spin_lock_init(&kc->job_lock);
INIT_LIST_HEAD(&kc->callback_jobs);
INIT_LIST_HEAD(&kc->complete_jobs);
+ INIT_LIST_HEAD(&kc->copy_jobs);
INIT_LIST_HEAD(&kc->io_jobs);
INIT_LIST_HEAD(&kc->pages_jobs);
kc->throttle = throttle;
@@ -971,6 +1014,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc)

BUG_ON(!list_empty(&kc->callback_jobs));
BUG_ON(!list_empty(&kc->complete_jobs));
+ WARN_ON(!list_empty(&kc->copy_jobs));
BUG_ON(!list_empty(&kc->io_jobs));
BUG_ON(!list_empty(&kc->pages_jobs));
destroy_workqueue(kc->kcopyd_wq);
--
2.35.1.500.gb896f729e2

2022-11-23 07:49:06

by Nitesh Shetty

Subject: [PATCH v5 08/10] dm: Enable copy offload for dm-linear target

Set the copy_offload_supported flag to enable offload.

Signed-off-by: Nitesh Shetty <[email protected]>
---
drivers/md/dm-linear.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 3212ef6aa81b..b4b57bead495 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -61,6 +61,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
ti->num_discard_bios = 1;
ti->num_secure_erase_bios = 1;
ti->num_write_zeroes_bios = 1;
+ ti->copy_offload_supported = 1;
ti->private = lc;
return 0;

--
2.35.1.500.gb896f729e2

2022-11-23 07:49:07

by Nitesh Shetty

Subject: [PATCH v5 06/10] nvmet: add copy command support for bdev and file ns

Add support for handling the copy command on the target.
For a bdev-ns we call into blkdev_issue_copy, which the block layer
completes either by an offloaded copy request to the backend bdev or by
emulating the request.

For a file-ns we call vfs_copy_file_range to service the request.

Currently the target always advertises copy capability by setting
NVME_CTRL_ONCS_COPY in the controller ONCS field.

Signed-off-by: Nitesh Shetty <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
drivers/nvme/target/admin-cmd.c | 9 +++-
drivers/nvme/target/io-cmd-bdev.c | 79 +++++++++++++++++++++++++++++++
drivers/nvme/target/io-cmd-file.c | 51 ++++++++++++++++++++
drivers/nvme/target/loop.c | 6 +++
drivers/nvme/target/nvmet.h | 2 +
5 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index c8a061ce3ee5..5ae509ff4b19 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -431,8 +431,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
id->nn = cpu_to_le32(NVMET_MAX_NAMESPACES);
id->mnan = cpu_to_le32(NVMET_MAX_NAMESPACES);
id->oncs = cpu_to_le16(NVME_CTRL_ONCS_DSM |
- NVME_CTRL_ONCS_WRITE_ZEROES);
-
+ NVME_CTRL_ONCS_WRITE_ZEROES | NVME_CTRL_ONCS_COPY);
/* XXX: don't report vwc if the underlying device is write through */
id->vwc = NVME_CTRL_VWC_PRESENT;

@@ -534,6 +533,12 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)

if (req->ns->bdev)
nvmet_bdev_set_limits(req->ns->bdev, id);
+ else {
+ id->msrc = (u8)to0based(BIO_MAX_VECS - 1);
+ id->mssrl = cpu_to_le16(BIO_MAX_VECS <<
+ (PAGE_SHIFT - SECTOR_SHIFT));
+ id->mcl = cpu_to_le32(le16_to_cpu(id->mssrl));
+ }

/*
* We just provide a single LBA format that matches what the
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index c2d6cea0236b..01f0160125fb 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -46,6 +46,19 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
id->npda = id->npdg;
/* NOWS = Namespace Optimal Write Size */
id->nows = to0based(bdev_io_opt(bdev) / bdev_logical_block_size(bdev));
+
+ /*Copy limits*/
+ if (bdev_max_copy_sectors(bdev)) {
+ id->msrc = id->msrc;
+ id->mssrl = cpu_to_le16((bdev_max_copy_sectors(bdev) <<
+ SECTOR_SHIFT) / bdev_logical_block_size(bdev));
+ id->mcl = cpu_to_le32(id->mssrl);
+ } else {
+ id->msrc = (u8)to0based(BIO_MAX_VECS - 1);
+ id->mssrl = cpu_to_le16((BIO_MAX_VECS << PAGE_SHIFT) /
+ bdev_logical_block_size(bdev));
+ id->mcl = cpu_to_le32(id->mssrl);
+ }
}

void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
@@ -184,6 +197,23 @@ static void nvmet_bio_done(struct bio *bio)
nvmet_req_bio_put(req, bio);
}

+static void nvmet_bdev_copy_end_io(void *private, int status)
+{
+ struct nvmet_req *req = (struct nvmet_req *)private;
+ int id;
+
+ if (status) {
+ for (id = 0 ; id < req->nr_range; id++) {
+ if (req->ranges[id].len != req->ranges[id].comp_len) {
+ req->cqe->result.u32 = cpu_to_le32(id);
+ break;
+ }
+ }
+ }
+ kfree(req->ranges);
+ nvmet_req_complete(req, errno_to_nvme_status(req, status));
+}
+
#ifdef CONFIG_BLK_DEV_INTEGRITY
static int nvmet_bdev_alloc_bip(struct nvmet_req *req, struct bio *bio,
struct sg_mapping_iter *miter)
@@ -450,6 +480,51 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
}
}

+static void nvmet_bdev_execute_copy(struct nvmet_req *req)
+{
+ struct nvme_copy_range range;
+ struct range_entry *ranges;
+ struct nvme_command *cmnd = req->cmd;
+ sector_t dest, dest_off = 0;
+ int ret, id, nr_range;
+
+ nr_range = cmnd->copy.nr_range + 1;
+ dest = le64_to_cpu(cmnd->copy.sdlba) << req->ns->blksize_shift;
+ ranges = kmalloc_array(nr_range, sizeof(*ranges), GFP_KERNEL);
+
+ for (id = 0 ; id < nr_range; id++) {
+ ret = nvmet_copy_from_sgl(req, id * sizeof(range),
+ &range, sizeof(range));
+ if (ret)
+ goto out;
+
+ ranges[id].dst = dest + dest_off;
+ ranges[id].src = le64_to_cpu(range.slba) <<
+ req->ns->blksize_shift;
+ ranges[id].len = (le16_to_cpu(range.nlb) + 1) <<
+ req->ns->blksize_shift;
+ ranges[id].comp_len = 0;
+ dest_off += ranges[id].len;
+ }
+ req->ranges = ranges;
+ req->nr_range = nr_range;
+ ret = blkdev_issue_copy(req->ns->bdev, req->ns->bdev, ranges, nr_range,
+ nvmet_bdev_copy_end_io, (void *)req, GFP_KERNEL);
+ if (ret) {
+ for (id = 0 ; id < nr_range; id++) {
+ if (ranges[id].len != ranges[id].comp_len) {
+ req->cqe->result.u32 = cpu_to_le32(id);
+ break;
+ }
+ }
+ goto out;
+ } else
+ return;
+out:
+ kfree(ranges);
+ nvmet_req_complete(req, errno_to_nvme_status(req, ret));
+}
+
u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
{
switch (req->cmd->common.opcode) {
@@ -468,6 +543,10 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
case nvme_cmd_write_zeroes:
req->execute = nvmet_bdev_execute_write_zeroes;
return 0;
+ case nvme_cmd_copy:
+ req->execute = nvmet_bdev_execute_copy;
+ return 0;
+
default:
return nvmet_report_invalid_opcode(req);
}
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 64b47e2a4633..a81d38796e17 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -338,6 +338,48 @@ static void nvmet_file_dsm_work(struct work_struct *w)
}
}

+static void nvmet_file_copy_work(struct work_struct *w)
+{
+ struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
+ int nr_range;
+ loff_t pos;
+ struct nvme_command *cmnd = req->cmd;
+ int ret = 0, len = 0, src, id;
+
+ nr_range = cmnd->copy.nr_range + 1;
+ pos = le64_to_cpu(req->cmd->copy.sdlba) << req->ns->blksize_shift;
+ if (unlikely(pos + req->transfer_len > req->ns->size)) {
+ nvmet_req_complete(req, errno_to_nvme_status(req, -ENOSPC));
+ return;
+ }
+
+ for (id = 0 ; id < nr_range; id++) {
+ struct nvme_copy_range range;
+
+ ret = nvmet_copy_from_sgl(req, id * sizeof(range), &range,
+ sizeof(range));
+ if (ret)
+ goto out;
+
+ len = (le16_to_cpu(range.nlb) + 1) << (req->ns->blksize_shift);
+ src = (le64_to_cpu(range.slba) << (req->ns->blksize_shift));
+ ret = vfs_copy_file_range(req->ns->file, src, req->ns->file,
+ pos, len, 0);
+out:
+ if (ret != len) {
+ pos += ret;
+ req->cqe->result.u32 = cpu_to_le32(id);
+ nvmet_req_complete(req, ret < 0 ?
+ errno_to_nvme_status(req, ret) :
+ errno_to_nvme_status(req, -EIO));
+ return;
+
+ } else
+ pos += len;
+}
+ nvmet_req_complete(req, ret);
+
+}
static void nvmet_file_execute_dsm(struct nvmet_req *req)
{
if (!nvmet_check_data_len_lte(req, nvmet_dsm_len(req)))
@@ -346,6 +388,12 @@ static void nvmet_file_execute_dsm(struct nvmet_req *req)
queue_work(nvmet_wq, &req->f.work);
}

+static void nvmet_file_execute_copy(struct nvmet_req *req)
+{
+ INIT_WORK(&req->f.work, nvmet_file_copy_work);
+ queue_work(nvmet_wq, &req->f.work);
+}
+
static void nvmet_file_write_zeroes_work(struct work_struct *w)
{
struct nvmet_req *req = container_of(w, struct nvmet_req, f.work);
@@ -392,6 +440,9 @@ u16 nvmet_file_parse_io_cmd(struct nvmet_req *req)
case nvme_cmd_write_zeroes:
req->execute = nvmet_file_execute_write_zeroes;
return 0;
+ case nvme_cmd_copy:
+ req->execute = nvmet_file_execute_copy;
+ return 0;
default:
return nvmet_report_invalid_opcode(req);
}
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index b45fe3adf015..55802632b407 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -146,6 +146,12 @@ static blk_status_t nvme_loop_queue_rq(struct blk_mq_hw_ctx *hctx,
return ret;

blk_mq_start_request(req);
+ if (unlikely((req->cmd_flags & REQ_COPY) &&
+ (req_op(req) == REQ_OP_READ))) {
+ blk_mq_set_request_complete(req);
+ blk_mq_end_request(req, BLK_STS_OK);
+ return BLK_STS_OK;
+ }
iod->cmd.common.flags |= NVME_CMD_SGL_METABUF;
iod->req.port = queue->ctrl->port;
if (!nvmet_req_init(&iod->req, &queue->nvme_cq,
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index dfe3894205aa..3b4c7d2ee45d 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -391,6 +391,8 @@ struct nvmet_req {
struct device *p2p_client;
u16 error_loc;
u64 error_slba;
+ struct range_entry *ranges;
+ unsigned int nr_range;
};

extern struct workqueue_struct *buffered_io_wq;
--
2.35.1.500.gb896f729e2

2022-11-23 23:08:47

by Chaitanya Kulkarni

Subject: Re: [PATCH v5 00/10] Implement copy offload support

(+ Shinichiro)

On 11/22/22 21:58, Nitesh Shetty wrote:
> The patch series covers the points discussed in the November 2021 virtual
> call [LSF/MM/BPF TOPIC] Storage: Copy Offload [0].
> We have covered the initially agreed requirements in this patchset and
> further additional features suggested by the community.
> The patchset borrows Mikulas's token-based approach for the two-bdev
> implementation.
>
> This is on top of our previous patchset v4[1].

Now that the series is converging, and since the patch series touches
drivers and key components in the block layer, you need to accompany
it with blktests to cover the corner cases in the
drivers which support these operations, as I mentioned in the
call last year....

If you need any help with that, feel free to send an email to linux-block
and CC me or Shinichiro (added in CC)...

-ck

2022-11-30 00:46:55

by Chaitanya Kulkarni

Subject: Re: [PATCH v5 00/10] Implement copy offload support

On 11/29/22 04:16, Nitesh Shetty wrote:
> On Wed, Nov 23, 2022 at 10:56:23PM +0000, Chaitanya Kulkarni wrote:
>> (+ Shinichiro)
>>
>> On 11/22/22 21:58, Nitesh Shetty wrote:
>>> The patch series covers the points discussed in the November 2021 virtual
>>> call [LSF/MM/BPF TOPIC] Storage: Copy Offload [0].
>>> We have covered the initially agreed requirements in this patchset and
>>> further additional features suggested by the community.
>>> The patchset borrows Mikulas's token-based approach for the two-bdev
>>> implementation.
>>>
>>> This is on top of our previous patchset v4[1].
>>
>> Now that the series is converging, and since the patch series touches
>> drivers and key components in the block layer, you need to accompany
>> it with blktests to cover the corner cases in the
>> drivers which support these operations, as I mentioned in the
>> call last year....
>>
>> If you need any help with that, feel free to send an email to linux-block
>> and CC me or Shinichiro (added in CC)...
>>
>> -ck
>>
>
> Yes, any help would be appreciated. I am not familiar with the blktests
> development/testing cycle. Do we need to add blktests along with the patch
> series, or do we need to add them after the patch series gets merged
> (or is about to be merged)?
>
> Thanks
> Nitesh
>
>

We have many test cases you can refer to as examples.
Your cover letter mentions that you have tested this code; just move
all of those test cases into blktests.

More importantly, for a feature like this you should be providing
outstanding test cases in your github tree when you post the
series; they should cover the critical parts of the block layer and the
drivers in question.

The objective here is to have blktests updated by the time the code
is upstream, so all the distros can test the code from the
upstream blktests repo. You can refer to what we have done
for NVMeOF in-band authentication (thanks to Hannes and Sagi)
in the linux-nvme email archives.

-ck

2022-12-06 09:44:11

by kernel test robot

Subject: Re: [PATCH v5 06/10] nvmet: add copy command support for bdev and file ns

Hi Nitesh,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on next-20221206]
[cannot apply to device-mapper-dm/for-next linus/master v6.1-rc8]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20221123-145837
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/r/20221123055827.26996-7-nj.shetty%40samsung.com
patch subject: [PATCH v5 06/10] nvmet: add copy command support for bdev and file ns
config: openrisc-randconfig-s041-20221205
compiler: or1k-linux-gcc (GCC) 12.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.4-39-gce1a6720-dirty
# https://github.com/intel-lab-lkp/linux/commit/2696a8eeb2e224871e7c183660baea9d87f4fb71
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Nitesh-Shetty/block-Introduce-queue-limits-for-copy-offload-support/20221123-145837
git checkout 2696a8eeb2e224871e7c183660baea9d87f4fb71
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=openrisc SHELL=/bin/bash drivers/nvme/target/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

sparse warnings: (new ones prefixed by >>)
>> drivers/nvme/target/io-cmd-bdev.c:55:27: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned int [usertype] val @@ got restricted __le16 [usertype] mssrl @@
drivers/nvme/target/io-cmd-bdev.c:55:27: sparse: expected unsigned int [usertype] val
drivers/nvme/target/io-cmd-bdev.c:55:27: sparse: got restricted __le16 [usertype] mssrl
>> drivers/nvme/target/io-cmd-bdev.c:55:27: sparse: sparse: cast from restricted __le16
>> drivers/nvme/target/io-cmd-bdev.c:55:27: sparse: sparse: cast from restricted __le16
>> drivers/nvme/target/io-cmd-bdev.c:55:27: sparse: sparse: cast from restricted __le16
>> drivers/nvme/target/io-cmd-bdev.c:55:27: sparse: sparse: cast from restricted __le16
drivers/nvme/target/io-cmd-bdev.c:57:29: sparse: sparse: cast from restricted __le16
drivers/nvme/target/io-cmd-bdev.c:60:27: sparse: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned int [usertype] val @@ got restricted __le16 [usertype] mssrl @@
drivers/nvme/target/io-cmd-bdev.c:60:27: sparse: expected unsigned int [usertype] val
drivers/nvme/target/io-cmd-bdev.c:60:27: sparse: got restricted __le16 [usertype] mssrl
drivers/nvme/target/io-cmd-bdev.c:60:27: sparse: sparse: cast from restricted __le16
drivers/nvme/target/io-cmd-bdev.c:60:27: sparse: sparse: cast from restricted __le16
drivers/nvme/target/io-cmd-bdev.c:60:27: sparse: sparse: cast from restricted __le16
drivers/nvme/target/io-cmd-bdev.c:60:27: sparse: sparse: cast from restricted __le16

vim +55 drivers/nvme/target/io-cmd-bdev.c

12
13 void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
14 {
15 /* Logical blocks per physical block, 0's based. */
16 const __le16 lpp0b = to0based(bdev_physical_block_size(bdev) /
17 bdev_logical_block_size(bdev));
18
19 /*
20 * For NVMe 1.2 and later, bit 1 indicates that the fields NAWUN,
21 * NAWUPF, and NACWU are defined for this namespace and should be
22 * used by the host for this namespace instead of the AWUN, AWUPF,
23 * and ACWU fields in the Identify Controller data structure. If
24 * any of these fields are zero that means that the corresponding
25 * field from the identify controller data structure should be used.
26 */
27 id->nsfeat |= 1 << 1;
28 id->nawun = lpp0b;
29 id->nawupf = lpp0b;
30 id->nacwu = lpp0b;
31
32 /*
33 * Bit 4 indicates that the fields NPWG, NPWA, NPDG, NPDA, and
34 * NOWS are defined for this namespace and should be used by
35 * the host for I/O optimization.
36 */
37 id->nsfeat |= 1 << 4;
38 /* NPWG = Namespace Preferred Write Granularity. 0's based */
39 id->npwg = lpp0b;
40 /* NPWA = Namespace Preferred Write Alignment. 0's based */
41 id->npwa = id->npwg;
42 /* NPDG = Namespace Preferred Deallocate Granularity. 0's based */
43 id->npdg = to0based(bdev_discard_granularity(bdev) /
44 bdev_logical_block_size(bdev));
45 /* NPDG = Namespace Preferred Deallocate Alignment */
46 id->npda = id->npdg;
47 /* NOWS = Namespace Optimal Write Size */
48 id->nows = to0based(bdev_io_opt(bdev) / bdev_logical_block_size(bdev));
49
50 /*Copy limits*/
51 if (bdev_max_copy_sectors(bdev)) {
52 id->msrc = id->msrc;
53 id->mssrl = cpu_to_le16((bdev_max_copy_sectors(bdev) <<
54 SECTOR_SHIFT) / bdev_logical_block_size(bdev));
> 55 id->mcl = cpu_to_le32(id->mssrl);
56 } else {
57 id->msrc = (u8)to0based(BIO_MAX_VECS - 1);
58 id->mssrl = cpu_to_le16((BIO_MAX_VECS << PAGE_SHIFT) /
59 bdev_logical_block_size(bdev));
60 id->mcl = cpu_to_le32(id->mssrl);
61 }
62 }
63

--
0-DAY CI Kernel Test Service
https://01.org/lkp

