2022-07-27 18:19:05

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v8 00/11] support non power of 2 zoned device

- Background and Motivation:

The zone storage implementation in Linux, introduced since v4.10, first
targetted SMR drives which have a power of 2 (po2) zone size alignment
requirement. The po2 zone size was further imposed implicitly by the
block layer's blk_queue_chunk_sectors(), used to prevent IO merging
across chunks beyond the specified size, since v3.16 through commit
762380ad9322 ("block: add notion of a chunk size for request merging").
But this same general block layer po2 requirement for blk_queue_chunk_sectors()
was removed on v5.10 through commit 07d098e6bbad ("block: allow 'chunk_sectors'
to be non-power-of-2").

NAND, which is the media used in newer zoned storage devices, does not
naturally align to po2. In these devices, zone cap is not the same as the
po2 zone size. When the zone cap != zone size, then unmapped LBAs are
introduced to cover the space between the zone cap and zone size. po2
requirement does not make sense for these type of zone storage devices.
This patch series aims to remove these unmapped LBAs for zoned devices when
zone cap is npo2. This is done by relaxing the po2 zone size constraint
in the kernel and allowing zoned device with npo2 zone sizes if zone cap
== zone size.

Removing the po2 requirement from zone storage should be possible
now provided that no userspace regression and no performance regressions are
introduced. Stop-gap patches have been already merged into f2fs-tools to
proactively not allow npo2 zone sizes until proper support is added [1].

There were two efforts previously to add support to npo2 devices: 1) via
device level emulation [2] but that was rejected with a final conclusion
to add support for non po2 zoned device in the complete stack[3] 2)
adding support to the complete stack by removing the constraint in the
block layer and NVMe layer with support to btrfs, zonefs, etc which was
rejected with a conclusion to add a dm target for FS support [0]
to reduce the regression impact.

This series adds support to npo2 zoned devices in the block and nvme
layer and a new **dm target** is added: dm-po2z-target. This new
target will be initially used for filesystems such as btrfs and
f2fs that does not have native npo2 zone support.

- Patchset description:
Patches 1-2 deals with removing the po2 constraint from the
block layer.

Patches 3-4 deals with removing the constraint from nvme zns.

Patch 5 removes the po2 contraint in null blk

Patch 6 adds npo2 support to zonefs

Patches 7-11 adds support for npo2 zoned devices in the DM layer and
adds a new target dm-po2z-target which converts a npo2 zoned device into
a po2 zoned target.

The patch series is based on linux-next tag: next-20220727

Testing:
The new target was tested with blktest and zonefs test suite in qemu and
on a real ZNS device.

[0] https://lore.kernel.org/lkml/PH0PR04MB74166C87F694B150A5AE0F009BD09@PH0PR04MB7416.namprd04.prod.outlook.com/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?h=dev-test&id=6afcf6493578e77528abe65ab8b12f3e1c16749f
[2] https://lore.kernel.org/all/[email protected]/T/
[3] https://lore.kernel.org/all/20220315135245.eqf4tqngxxb7ymqa@unifi/

Changes since v1:
- Put the function declaration and its usage in the same commit (Bart)
- Remove bdev_zone_aligned function (Bart)
- Change the name from blk_queue_zone_aligned to blk_queue_is_zone_start
(Damien)
- q is never null in from bdev_get_queue (Damien)
- Add condition during bringup and check for zsze == zcap for npo2
drives (Damien)
- Rounddown operation should be made generic to work in 32 bits arch
(bart)
- Add comments where generic calculation is directly used instead having
special handling for po2 zone sizes (Hannes)
- Make the minimum zone size alignment requirement for btrfs to be 1M
instead of BTRFS_STRIPE_LEN(David)

Changes since v2:
- Minor formatting changes

Changes since v3:
- Make superblock mirror align with the existing superblock log offsets
(David)
- DM change return value and remove extra newline
- Optimize null blk zone index lookup with shift for po2 zone size

Changes since v4:
- Remove direct filesystems support for npo2 devices (Johannes, Hannes,
Damien)

Changes since v5:
- Use DIV_ROUND_UP* helper instead of round_up as it breaks 32bit arch
build in null blk(kernel-test-robot, Nathan)
- Use DIV_ROUND_UP_SECTOR_T also in blkdev_nr_zones function instead of
open coding it with div64_u64
- Added extra condition in dm-zoned and in dm to reject non power of 2
zone sizes.

Changes since v6:
- Added a new dm target for non power of 2 devices
- Added support for non power of 2 devices in the DM layer.

Changes since v7:
- Improved dm target for non power of 2 zoned devices with some bug
fixes and rearrangement
- Removed some unnecessary comments.

Luis Chamberlain (1):
dm-zoned: ensure only power of 2 zone sizes are allowed

Pankaj Raghav (10):
block: make bdev_nr_zones and disk_zone_no generic for npo2 zsze
block: allow blk-zoned devices to have non-power-of-2 zone size
nvme: zns: Allow ZNS drives that have non-power_of_2 zone size
nvmet: Allow ZNS target to support non-power_of_2 zone sizes
null_blk: allow non power of 2 zoned devices
zonefs: allow non power of 2 zoned devices
dm-zone: use generic helpers to calculate offset from zone start
dm-table: allow non po2 zoned devices
dm: call dm_zone_endio after the target endio callback for zoned
devices
dm: add power-of-2 zoned target for non-power-of-2 zoned devices

block/blk-core.c | 2 +-
block/blk-zoned.c | 37 +++--
drivers/block/null_blk/main.c | 5 +-
drivers/block/null_blk/null_blk.h | 6 +
drivers/block/null_blk/zoned.c | 18 ++-
drivers/md/Kconfig | 9 ++
drivers/md/Makefile | 2 +
drivers/md/dm-po2z-target.c | 261 ++++++++++++++++++++++++++++++
drivers/md/dm-table.c | 21 ++-
drivers/md/dm-zone.c | 10 +-
drivers/md/dm-zoned-target.c | 8 +
drivers/md/dm.c | 8 +-
drivers/nvme/host/zns.c | 16 +-
drivers/nvme/target/zns.c | 3 +-
fs/zonefs/super.c | 6 +-
fs/zonefs/zonefs.h | 1 -
include/linux/blkdev.h | 91 ++++++++---
include/linux/device-mapper.h | 9 ++
18 files changed, 438 insertions(+), 75 deletions(-)
create mode 100644 drivers/md/dm-po2z-target.c

--
2.25.1


2022-07-27 18:21:28

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v8 05/11] null_blk: allow non power of 2 zoned devices

Convert the power of 2 based calculation with zone size to be generic in
null_zone_no with optimization for power of 2 based zone sizes.

The nr_zones calculation in null_init_zoned_dev has been replaced with a
division without special handling for power of 2 based zone sizes as
this function is called only during the initialization and will not
invoked in the hot path.

Performance Measurement:

Device:
zone size = 128M, blocksize=4k

FIO cmd:

fio --name=zbc --filename=/dev/nullb0 --direct=1 --zonemode=zbd --size=23G
--io_size=<iosize> --ioengine=io_uring --iodepth=<iod> --rw=<mode> --bs=4k
--loops=4

The following results are an average of 4 runs on AMD Ryzen 5 5600X with
32GB of RAM:

Sequential Write:

x-----------------x---------------------------------x---------------------------------x
| IOdepth | 8 | 16 |
x-----------------x---------------------------------x---------------------------------x
| | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
x-----------------x---------------------------------x---------------------------------x
| Without patch | 578 | 2257 | 12.80 | 576 | 2248 | 25.78 |
x-----------------x---------------------------------x---------------------------------x
| With patch | 581 | 2268 | 12.74 | 576 | 2248 | 25.85 |
x-----------------x---------------------------------x---------------------------------x

Sequential read:

x-----------------x---------------------------------x---------------------------------x
| IOdepth | 8 | 16 |
x-----------------x---------------------------------x---------------------------------x
| | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
x-----------------x---------------------------------x---------------------------------x
| Without patch | 667 | 2605 | 11.79 | 675 | 2637 | 23.49 |
x-----------------x---------------------------------x---------------------------------x
| With patch | 667 | 2605 | 11.79 | 675 | 2638 | 23.48 |
x-----------------x---------------------------------x---------------------------------x

Random read:

x-----------------x---------------------------------x---------------------------------x
| IOdepth | 8 | 16 |
x-----------------x---------------------------------x---------------------------------x
| | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
x-----------------x---------------------------------x---------------------------------x
| Without patch | 522 | 2038 | 15.05 | 514 | 2006 | 30.87 |
x-----------------x---------------------------------x---------------------------------x
| With patch | 522 | 2039 | 15.04 | 523 | 2042 | 30.33 |
x-----------------x---------------------------------x---------------------------------x

Minor variations are noticed in Sequential write with io depth 8 and
in random read with io depth 16. But overall no noticeable differences
were noticed

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed by: Adam Manzanares <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/block/null_blk/main.c | 5 ++---
drivers/block/null_blk/null_blk.h | 6 ++++++
drivers/block/null_blk/zoned.c | 18 +++++++++++-------
3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index c451c477978f..f1e0605dee94 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1976,9 +1976,8 @@ static int null_validate_conf(struct nullb_device *dev)
if (dev->queue_mode == NULL_Q_BIO)
dev->mbps = 0;

- if (dev->zoned &&
- (!dev->zone_size || !is_power_of_2(dev->zone_size))) {
- pr_err("zone_size must be power-of-two\n");
+ if (dev->zoned && !dev->zone_size) {
+ pr_err("Invalid zero zone size\n");
return -EINVAL;
}

diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 94ff68052b1e..ece6dded9508 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -83,6 +83,12 @@ struct nullb_device {
unsigned int imp_close_zone_no;
struct nullb_zone *zones;
sector_t zone_size_sects;
+ /*
+ * zone_size_sects_shift is only useful when the zone size is
+ * power of 2. This variable is set to zero when zone size is non
+ * power of 2.
+ */
+ unsigned int zone_size_sects_shift;
bool need_zone_res_mgmt;
spinlock_t zone_res_lock;

diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index 55a69e48ef8b..015f6823706c 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -16,7 +16,10 @@ static inline sector_t mb_to_sects(unsigned long mb)

static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
{
- return sect >> ilog2(dev->zone_size_sects);
+ if (dev->zone_size_sects_shift)
+ return sect >> dev->zone_size_sects_shift;
+
+ return div64_u64(sect, dev->zone_size_sects);
}

static inline void null_lock_zone_res(struct nullb_device *dev)
@@ -65,10 +68,6 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
sector_t sector = 0;
unsigned int i;

- if (!is_power_of_2(dev->zone_size)) {
- pr_err("zone_size must be power-of-two\n");
- return -EINVAL;
- }
if (dev->zone_size > dev->size) {
pr_err("Zone size larger than device capacity\n");
return -EINVAL;
@@ -86,9 +85,14 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
zone_capacity_sects = mb_to_sects(dev->zone_capacity);
dev_capacity_sects = mb_to_sects(dev->size);
dev->zone_size_sects = mb_to_sects(dev->zone_size);
- dev->nr_zones = round_up(dev_capacity_sects, dev->zone_size_sects)
- >> ilog2(dev->zone_size_sects);

+ if (is_power_of_2(dev->zone_size_sects))
+ dev->zone_size_sects_shift = ilog2(dev->zone_size_sects);
+ else
+ dev->zone_size_sects_shift = 0;
+
+ dev->nr_zones = DIV_ROUND_UP_SECTOR_T(dev_capacity_sects,
+ dev->zone_size_sects);
dev->zones = kvmalloc_array(dev->nr_zones, sizeof(struct nullb_zone),
GFP_KERNEL | __GFP_ZERO);
if (!dev->zones)
--
2.25.1

2022-07-27 18:21:36

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v8 04/11] nvmet: Allow ZNS target to support non-power_of_2 zone sizes

A generic bdev_zone_no() helper is added to calculate zone number for a
given sector in a block device. This helper internally uses disk_zone_no()
to find the zone number.

Use the helper bdev_zone_no() to calculate nr of zones. This let's us
make modifications to the math if needed in one place and adds now
support for npo2 zone devices.

Reviewed by: Adam Manzanares <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/nvme/target/zns.c | 3 +--
include/linux/blkdev.h | 5 +++++
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index c7ef69f29fe4..662f1a92f39b 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -241,8 +241,7 @@ static unsigned long nvmet_req_nr_zones_from_slba(struct nvmet_req *req)
{
unsigned int sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);

- return bdev_nr_zones(req->ns->bdev) -
- (sect >> ilog2(bdev_zone_sectors(req->ns->bdev)));
+ return bdev_nr_zones(req->ns->bdev) - bdev_zone_no(req->ns->bdev, sect);
}

static unsigned long get_nr_zones_from_buf(struct nvmet_req *req, u32 bufsize)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1be805223026..d1ef9b9552ed 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1350,6 +1350,11 @@ static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
return BLK_ZONED_NONE;
}

+static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
+{
+ return disk_zone_no(bdev->bd_disk, sec);
+}
+
static inline int queue_dma_alignment(const struct request_queue *q)
{
return q ? q->dma_alignment : 511;
--
2.25.1

2022-07-27 18:31:54

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v8 07/11] dm-zoned: ensure only power of 2 zone sizes are allowed

From: Luis Chamberlain <[email protected]>

dm-zoned relies on the assumption that the zone size is a
power-of-2(po2) and the zone capacity is same as the zone size.

Ensure only po2 devices can be used as dm-zoned target until a native
non po2 support is added.

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/md/dm-zoned-target.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 95b132b52f33..16499b75c5ee 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -792,6 +792,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
return -EINVAL;
}
zone_nr_sectors = bdev_zone_sectors(bdev);
+ if (!is_power_of_2(zone_nr_sectors)) {
+ ti->error = "Zone size is not power of 2";
+ return -EINVAL;
+ }
zoned_dev->zone_nr_sectors = zone_nr_sectors;
zoned_dev->nr_zones = bdev_nr_zones(bdev);
}
@@ -804,6 +808,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
return -EINVAL;
}
zoned_dev->zone_nr_sectors = bdev_zone_sectors(bdev);
+ if (!is_power_of_2(zoned_dev->zone_nr_sectors)) {
+ ti->error = "Zone size is not power of 2";
+ return -EINVAL;
+ }
zoned_dev->nr_zones = bdev_nr_zones(bdev);
}

--
2.25.1

2022-07-27 18:59:23

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v8 01/11] block: make bdev_nr_zones and disk_zone_no generic for npo2 zsze

Adapt bdev_nr_zones and disk_zone_no function so that it can
also work for non-power-of-2 zone sizes.

As the existing deployments of zoned devices had power-of-2
assumption, power-of-2 optimized calculation is kept for those devices.

There are no direct hot paths modified and the changes just
introduce one new branch per call.

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed-by: Adam Manzanares <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
---
block/blk-zoned.c | 13 +++++++++----
include/linux/blkdev.h | 8 +++++++-
2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index a264621d4905..dce9c95b4bcd 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -111,17 +111,22 @@ EXPORT_SYMBOL_GPL(__blk_req_zone_write_unlock);
* bdev_nr_zones - Get number of zones
* @bdev: Target device
*
- * Return the total number of zones of a zoned block device. For a block
- * device without zone capabilities, the number of zones is always 0.
+ * Return the total number of zones of a zoned block device, including the
+ * eventual small last zone if present. For a block device without zone
+ * capabilities, the number of zones is always 0.
*/
unsigned int bdev_nr_zones(struct block_device *bdev)
{
sector_t zone_sectors = bdev_zone_sectors(bdev);
+ sector_t capacity = bdev_nr_sectors(bdev);

if (!bdev_is_zoned(bdev))
return 0;
- return (bdev_nr_sectors(bdev) + zone_sectors - 1) >>
- ilog2(zone_sectors);
+
+ if (is_power_of_2(zone_sectors))
+ return (capacity + zone_sectors - 1) >> ilog2(zone_sectors);
+
+ return DIV_ROUND_UP_SECTOR_T(capacity, zone_sectors);
}
EXPORT_SYMBOL_GPL(bdev_nr_zones);

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index dccdf1551c62..85b832908f28 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -673,9 +673,15 @@ static inline unsigned int disk_nr_zones(struct gendisk *disk)

static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector)
{
+ sector_t zone_sectors = disk->queue->limits.chunk_sectors;
+
if (!blk_queue_is_zoned(disk->queue))
return 0;
- return sector >> ilog2(disk->queue->limits.chunk_sectors);
+
+ if (is_power_of_2(zone_sectors))
+ return sector >> ilog2(zone_sectors);
+
+ return div64_u64(sector, zone_sectors);
}

static inline bool disk_zone_is_seq(struct gendisk *disk, sector_t sector)
--
2.25.1

2022-07-27 22:04:48

by Chaitanya Kulkarni

[permalink] [raw]
Subject: Re: [PATCH v8 05/11] null_blk: allow non power of 2 zoned devices


> Performance Measurement:
>
> Device:
> zone size = 128M, blocksize=4k
>
> FIO cmd:
>
> fio --name=zbc --filename=/dev/nullb0 --direct=1 --zonemode=zbd --size=23G
> --io_size=<iosize> --ioengine=io_uring --iodepth=<iod> --rw=<mode> --bs=4k
> --loops=4
>
> The following results are an average of 4 runs on AMD Ryzen 5 5600X with
> 32GB of RAM:
>
> Sequential Write:
>
> x-----------------x---------------------------------x---------------------------------x
> | IOdepth | 8 | 16 |
> x-----------------x---------------------------------x---------------------------------x
> | | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
> x-----------------x---------------------------------x---------------------------------x
> | Without patch | 578 | 2257 | 12.80 | 576 | 2248 | 25.78 |
> x-----------------x---------------------------------x---------------------------------x
> | With patch | 581 | 2268 | 12.74 | 576 | 2248 | 25.85 |
> x-----------------x---------------------------------x---------------------------------x
>
> Sequential read:
>
> x-----------------x---------------------------------x---------------------------------x
> | IOdepth | 8 | 16 |
> x-----------------x---------------------------------x---------------------------------x
> | | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
> x-----------------x---------------------------------x---------------------------------x
> | Without patch | 667 | 2605 | 11.79 | 675 | 2637 | 23.49 |
> x-----------------x---------------------------------x---------------------------------x
> | With patch | 667 | 2605 | 11.79 | 675 | 2638 | 23.48 |
> x-----------------x---------------------------------x---------------------------------x
>
> Random read:
>
> x-----------------x---------------------------------x---------------------------------x
> | IOdepth | 8 | 16 |
> x-----------------x---------------------------------x---------------------------------x
> | | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
> x-----------------x---------------------------------x---------------------------------x
> | Without patch | 522 | 2038 | 15.05 | 514 | 2006 | 30.87 |
> x-----------------x---------------------------------x---------------------------------x
> | With patch | 522 | 2039 | 15.04 | 523 | 2042 | 30.33 |
> x-----------------x---------------------------------x---------------------------------x
>
> Minor variations are noticed in Sequential write with io depth 8 and
> in random read with io depth 16. But overall no noticeable differences
> were noticed

minor variations in with aspect of the performance ?
are these documented somewhere ?

move the large table of performance numbers to the cover letter it looks
ugly in the git log...


2022-07-27 23:23:07

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v8 00/11] support non power of 2 zoned device

On 7/27/22 09:22, Pankaj Raghav wrote:
> This series adds support to npo2 zoned devices in the block and nvme
> layer and a new **dm target** is added: dm-po2z-target. This new
> target will be initially used for filesystems such as btrfs and
> f2fs that does not have native npo2 zone support.

Should any SCSI changes be included in this patch series? From sd_zbc.c:

if (!is_power_of_2(zone_blocks)) {
sd_printk(KERN_ERR, sdkp,
"Zone size %llu is not a power of two.\n",
zone_blocks);
return -EINVAL;
}

Thanks,

Bart.

2022-07-28 01:57:35

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH v8 00/11] support non power of 2 zoned device

On 7/28/22 08:19, Bart Van Assche wrote:
> On 7/27/22 09:22, Pankaj Raghav wrote:
>> This series adds support to npo2 zoned devices in the block and nvme
>> layer and a new **dm target** is added: dm-po2z-target. This new
>> target will be initially used for filesystems such as btrfs and
>> f2fs that does not have native npo2 zone support.
>
> Should any SCSI changes be included in this patch series? From sd_zbc.c:
>
> if (!is_power_of_2(zone_blocks)) {
> sd_printk(KERN_ERR, sdkp,
> "Zone size %llu is not a power of two.\n",
> zone_blocks);
> return -EINVAL;
> }

There are no non-power of 2 SMR drives on the market and no plans to have
any as far as I know. Users want power of 2 zone size. So I think it is
better to leave sd_zbc & scsi_debug as is for now.

>
> Thanks,
>
> Bart.


--
Damien Le Moal
Western Digital Research

2022-07-28 03:04:34

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH v8 01/11] block: make bdev_nr_zones and disk_zone_no generic for npo2 zsze

On 7/28/22 01:22, Pankaj Raghav wrote:
> Adapt bdev_nr_zones and disk_zone_no function so that it can

s/function/functions
s/it/they

> also work for non-power-of-2 zone sizes.
>
> As the existing deployments of zoned devices had power-of-2

had power-of-2 assumption -> assume that a device zone size is a power of
2 number of sectors

Existing deployments still exist, so do not use the past tense please.

> assumption, power-of-2 optimized calculation is kept for those devices.
>
> There are no direct hot paths modified and the changes just
> introduce one new branch per call.
>
> Reviewed-by: Luis Chamberlain <[email protected]>
> Reviewed-by: Adam Manzanares <[email protected]>
> Reviewed-by: Hannes Reinecke <[email protected]>
> Signed-off-by: Pankaj Raghav <[email protected]>
> Reviewed-by: Bart Van Assche <[email protected]>
> ---
> block/blk-zoned.c | 13 +++++++++----
> include/linux/blkdev.h | 8 +++++++-
> 2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index a264621d4905..dce9c95b4bcd 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -111,17 +111,22 @@ EXPORT_SYMBOL_GPL(__blk_req_zone_write_unlock);
> * bdev_nr_zones - Get number of zones
> * @bdev: Target device
> *
> - * Return the total number of zones of a zoned block device. For a block
> - * device without zone capabilities, the number of zones is always 0.
> + * Return the total number of zones of a zoned block device, including the
> + * eventual small last zone if present. For a block device without zone
> + * capabilities, the number of zones is always 0.
> */
> unsigned int bdev_nr_zones(struct block_device *bdev)
> {
> sector_t zone_sectors = bdev_zone_sectors(bdev);
> + sector_t capacity = bdev_nr_sectors(bdev);
>
> if (!bdev_is_zoned(bdev))
> return 0;
> - return (bdev_nr_sectors(bdev) + zone_sectors - 1) >>
> - ilog2(zone_sectors);
> +
> + if (is_power_of_2(zone_sectors))
> + return (capacity + zone_sectors - 1) >> ilog2(zone_sectors);
> +
> + return DIV_ROUND_UP_SECTOR_T(capacity, zone_sectors);
> }
> EXPORT_SYMBOL_GPL(bdev_nr_zones);
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index dccdf1551c62..85b832908f28 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -673,9 +673,15 @@ static inline unsigned int disk_nr_zones(struct gendisk *disk)
>
> static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector)
> {
> + sector_t zone_sectors = disk->queue->limits.chunk_sectors;
> +
> if (!blk_queue_is_zoned(disk->queue))
> return 0;
> - return sector >> ilog2(disk->queue->limits.chunk_sectors);
> +
> + if (is_power_of_2(zone_sectors))
> + return sector >> ilog2(zone_sectors);
> +
> + return div64_u64(sector, zone_sectors);
> }
>
> static inline bool disk_zone_is_seq(struct gendisk *disk, sector_t sector)


--
Damien Le Moal
Western Digital Research

2022-07-28 03:05:01

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v8 00/11] support non power of 2 zoned device

On 7/27/22 18:52, Damien Le Moal wrote:
> On 7/28/22 08:19, Bart Van Assche wrote:
>> On 7/27/22 09:22, Pankaj Raghav wrote:
>>> This series adds support to npo2 zoned devices in the block and nvme
>>> layer and a new **dm target** is added: dm-po2z-target. This new
>>> target will be initially used for filesystems such as btrfs and
>>> f2fs that does not have native npo2 zone support.
>>
>> Should any SCSI changes be included in this patch series? From sd_zbc.c:
>>
>> if (!is_power_of_2(zone_blocks)) {
>> sd_printk(KERN_ERR, sdkp,
>> "Zone size %llu is not a power of two.\n",
>> zone_blocks);
>> return -EINVAL;
>> }
>
> There are no non-power of 2 SMR drives on the market and no plans to have
> any as far as I know. Users want power of 2 zone size. So I think it is
> better to leave sd_zbc & scsi_debug as is for now.

Zoned UFS devices will support ZBC and may have a zone size that is not
a power of two.

Thanks,

Bart.

2022-07-28 03:21:20

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH v8 05/11] null_blk: allow non power of 2 zoned devices

On 7/28/22 01:22, Pankaj Raghav wrote:
> Convert the power of 2 based calculation with zone size to be generic in
> null_zone_no with optimization for power of 2 based zone sizes.
>
> The nr_zones calculation in null_init_zoned_dev has been replaced with a
> division without special handling for power of 2 based zone sizes as
> this function is called only during the initialization and will not
> invoked in the hot path.
>
> Performance Measurement:
>
> Device:
> zone size = 128M, blocksize=4k
>
> FIO cmd:
>
> fio --name=zbc --filename=/dev/nullb0 --direct=1 --zonemode=zbd --size=23G
> --io_size=<iosize> --ioengine=io_uring --iodepth=<iod> --rw=<mode> --bs=4k
> --loops=4
>
> The following results are an average of 4 runs on AMD Ryzen 5 5600X with
> 32GB of RAM:
>
> Sequential Write:
>
> x-----------------x---------------------------------x---------------------------------x
> | IOdepth | 8 | 16 |
> x-----------------x---------------------------------x---------------------------------x
> | | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
> x-----------------x---------------------------------x---------------------------------x
> | Without patch | 578 | 2257 | 12.80 | 576 | 2248 | 25.78 |
> x-----------------x---------------------------------x---------------------------------x
> | With patch | 581 | 2268 | 12.74 | 576 | 2248 | 25.85 |
> x-----------------x---------------------------------x---------------------------------x
>
> Sequential read:
>
> x-----------------x---------------------------------x---------------------------------x
> | IOdepth | 8 | 16 |
> x-----------------x---------------------------------x---------------------------------x
> | | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
> x-----------------x---------------------------------x---------------------------------x
> | Without patch | 667 | 2605 | 11.79 | 675 | 2637 | 23.49 |
> x-----------------x---------------------------------x---------------------------------x
> | With patch | 667 | 2605 | 11.79 | 675 | 2638 | 23.48 |
> x-----------------x---------------------------------x---------------------------------x
>
> Random read:
>
> x-----------------x---------------------------------x---------------------------------x
> | IOdepth | 8 | 16 |
> x-----------------x---------------------------------x---------------------------------x
> | | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
> x-----------------x---------------------------------x---------------------------------x
> | Without patch | 522 | 2038 | 15.05 | 514 | 2006 | 30.87 |
> x-----------------x---------------------------------x---------------------------------x
> | With patch | 522 | 2039 | 15.04 | 523 | 2042 | 30.33 |
> x-----------------x---------------------------------x---------------------------------x
>
> Minor variations are noticed in Sequential write with io depth 8 and
> in random read with io depth 16. But overall no noticeable differences
> were noticed
>
> Reviewed-by: Luis Chamberlain <[email protected]>
> Reviewed by: Adam Manzanares <[email protected]>
> Reviewed-by: Hannes Reinecke <[email protected]>
> Signed-off-by: Pankaj Raghav <[email protected]>
> ---
> drivers/block/null_blk/main.c | 5 ++---
> drivers/block/null_blk/null_blk.h | 6 ++++++
> drivers/block/null_blk/zoned.c | 18 +++++++++++-------
> 3 files changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index c451c477978f..f1e0605dee94 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1976,9 +1976,8 @@ static int null_validate_conf(struct nullb_device *dev)
> if (dev->queue_mode == NULL_Q_BIO)
> dev->mbps = 0;
>
> - if (dev->zoned &&
> - (!dev->zone_size || !is_power_of_2(dev->zone_size))) {
> - pr_err("zone_size must be power-of-two\n");
> + if (dev->zoned && !dev->zone_size) {
> + pr_err("Invalid zero zone size\n");
> return -EINVAL;
> }
>
> diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
> index 94ff68052b1e..ece6dded9508 100644
> --- a/drivers/block/null_blk/null_blk.h
> +++ b/drivers/block/null_blk/null_blk.h
> @@ -83,6 +83,12 @@ struct nullb_device {
> unsigned int imp_close_zone_no;
> struct nullb_zone *zones;
> sector_t zone_size_sects;
> + /*
> + * zone_size_sects_shift is only useful when the zone size is
> + * power of 2. This variable is set to zero when zone size is non
> + * power of 2.
> + */

Simplify:

/*
* zone_size_sects_shift is used only when the zone size is a
* power of 2 number of sectors.
*/

But personally, I would simply drop this comment. The name is obvious and
the code very clear.

> + unsigned int zone_size_sects_shift;
> bool need_zone_res_mgmt;
> spinlock_t zone_res_lock;
>
> diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
> index 55a69e48ef8b..015f6823706c 100644
> --- a/drivers/block/null_blk/zoned.c
> +++ b/drivers/block/null_blk/zoned.c
> @@ -16,7 +16,10 @@ static inline sector_t mb_to_sects(unsigned long mb)
>
> static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
> {
> - return sect >> ilog2(dev->zone_size_sects);
> + if (dev->zone_size_sects_shift)
> + return sect >> dev->zone_size_sects_shift;
> +
> + return div64_u64(sect, dev->zone_size_sects);
> }
>
> static inline void null_lock_zone_res(struct nullb_device *dev)
> @@ -65,10 +68,6 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
> sector_t sector = 0;
> unsigned int i;
>
> - if (!is_power_of_2(dev->zone_size)) {
> - pr_err("zone_size must be power-of-two\n");
> - return -EINVAL;
> - }
> if (dev->zone_size > dev->size) {
> pr_err("Zone size larger than device capacity\n");
> return -EINVAL;
> @@ -86,9 +85,14 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
> zone_capacity_sects = mb_to_sects(dev->zone_capacity);
> dev_capacity_sects = mb_to_sects(dev->size);
> dev->zone_size_sects = mb_to_sects(dev->zone_size);
> - dev->nr_zones = round_up(dev_capacity_sects, dev->zone_size_sects)
> - >> ilog2(dev->zone_size_sects);
>
> + if (is_power_of_2(dev->zone_size_sects))
> + dev->zone_size_sects_shift = ilog2(dev->zone_size_sects);
> + else
> + dev->zone_size_sects_shift = 0;
> +
> + dev->nr_zones = DIV_ROUND_UP_SECTOR_T(dev_capacity_sects,
> + dev->zone_size_sects);
> dev->zones = kvmalloc_array(dev->nr_zones, sizeof(struct nullb_zone),
> GFP_KERNEL | __GFP_ZERO);
> if (!dev->zones)


--
Damien Le Moal
Western Digital Research

2022-07-28 03:35:31

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH v8 07/11] dm-zoned: ensure only power of 2 zone sizes are allowed

On 7/28/22 01:22, Pankaj Raghav wrote:
> From: Luis Chamberlain <[email protected]>
>
> dm-zoned relies on the assumption that the zone size is a
> power-of-2(po2) and the zone capacity is same as the zone size.
>
> Ensure only po2 devices can be used as dm-zoned target until a native
> non po2 support is added.
>
> Reviewed-by: Hannes Reinecke <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> Signed-off-by: Pankaj Raghav <[email protected]>
> ---
> drivers/md/dm-zoned-target.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
> index 95b132b52f33..16499b75c5ee 100644
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -792,6 +792,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
> return -EINVAL;
> }
> zone_nr_sectors = bdev_zone_sectors(bdev);
> + if (!is_power_of_2(zone_nr_sectors)) {
> + ti->error = "Zone size is not power of 2";

Let's clarify:

ti->error = "Zone size is not a power of 2 number of sectors";

> + return -EINVAL;
> + }
> zoned_dev->zone_nr_sectors = zone_nr_sectors;
> zoned_dev->nr_zones = bdev_nr_zones(bdev);
> }
> @@ -804,6 +808,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
> return -EINVAL;
> }
> zoned_dev->zone_nr_sectors = bdev_zone_sectors(bdev);
> + if (!is_power_of_2(zoned_dev->zone_nr_sectors)) {
> + ti->error = "Zone size is not power of 2";

Same comment.

> + return -EINVAL;
> + }
> zoned_dev->nr_zones = bdev_nr_zones(bdev);
> }
>


--
Damien Le Moal
Western Digital Research

2022-07-28 03:36:33

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH v8 00/11] support non power of 2 zoned device

On 7/28/22 11:58, Bart Van Assche wrote:
> On 7/27/22 18:52, Damien Le Moal wrote:
>> On 7/28/22 08:19, Bart Van Assche wrote:
>>> On 7/27/22 09:22, Pankaj Raghav wrote:
>>>> This series adds support to npo2 zoned devices in the block and nvme
>>>> layer and a new **dm target** is added: dm-po2z-target. This new
>>>> target will be initially used for filesystems such as btrfs and
>>>> f2fs that does not have native npo2 zone support.
>>>
>>> Should any SCSI changes be included in this patch series? From sd_zbc.c:
>>>
>>> if (!is_power_of_2(zone_blocks)) {
>>> sd_printk(KERN_ERR, sdkp,
>>> "Zone size %llu is not a power of two.\n",
>>> zone_blocks);
>>> return -EINVAL;
>>> }
>>
>> There are no non-power of 2 SMR drives on the market and no plans to have
>> any as far as I know. Users want power of 2 zone size. So I think it is
>> better to leave sd_zbc & scsi_debug as is for now.
>
> Zoned UFS devices will support ZBC and may have a zone size that is not
> a power of two.

OK. So the check needs to be removed then and the entire zone append
emulation checked carefully. The divisions for zone no etc on non power of
2 zone size devices in zone append emulation hot path are really not
welcome though.


>
> Thanks,
>
> Bart.


--
Damien Le Moal
Western Digital Research

2022-07-28 03:37:15

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH v8 04/11] nvmet: Allow ZNS target to support non-power_of_2 zone sizes

On 7/28/22 01:22, Pankaj Raghav wrote:
> A generic bdev_zone_no() helper is added to calculate zone number for a
> given sector in a block device. This helper internally uses disk_zone_no()
> to find the zone number.
>
> Use the helper bdev_zone_no() to calculate nr of zones. This let's us
> make modifications to the math if needed in one place and adds now
> support for npo2 zone devices.
>
> Reviewed by: Adam Manzanares <[email protected]>
> Reviewed-by: Bart Van Assche <[email protected]>
> Reviewed-by: Hannes Reinecke <[email protected]>
> Reviewed-by: Johannes Thumshirn <[email protected]>
> Signed-off-by: Luis Chamberlain <[email protected]>
> Signed-off-by: Pankaj Raghav <[email protected]>
> ---
> drivers/nvme/target/zns.c | 3 +--
> include/linux/blkdev.h | 5 +++++
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
> index c7ef69f29fe4..662f1a92f39b 100644
> --- a/drivers/nvme/target/zns.c
> +++ b/drivers/nvme/target/zns.c
> @@ -241,8 +241,7 @@ static unsigned long nvmet_req_nr_zones_from_slba(struct nvmet_req *req)
> {
> unsigned int sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
>
> - return bdev_nr_zones(req->ns->bdev) -
> - (sect >> ilog2(bdev_zone_sectors(req->ns->bdev)));
> + return bdev_nr_zones(req->ns->bdev) - bdev_zone_no(req->ns->bdev, sect);
> }

This change should go into patch 3. Otherwise, adding patch 3 only will
break the nvme target zone code.

>
> static unsigned long get_nr_zones_from_buf(struct nvmet_req *req, u32 bufsize)
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 1be805223026..d1ef9b9552ed 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1350,6 +1350,11 @@ static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
> return BLK_ZONED_NONE;
> }
>
> +static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
> +{
> + return disk_zone_no(bdev->bd_disk, sec);
> +}
> +

This should go into a prep patch before patch 3.

> static inline int queue_dma_alignment(const struct request_queue *q)
> {
> return q ? q->dma_alignment : 511;


--
Damien Le Moal
Western Digital Research

2022-07-28 12:02:19

by Pankaj Raghav

[permalink] [raw]
Subject: Re: [PATCH v8 00/11] support non power of 2 zoned device

Hi Bart,

On 2022-07-28 01:19, Bart Van Assche wrote:
> On 7/27/22 09:22, Pankaj Raghav wrote:
>> This series adds support to npo2 zoned devices in the block and nvme
>> layer and a new **dm target** is added: dm-po2z-target. This new
>> target will be initially used for filesystems such as btrfs and
>> f2fs that does not have native npo2 zone support.
>
> Should any SCSI changes be included in this patch series? From sd_zbc.c:
>
>     if (!is_power_of_2(zone_blocks)) {
>         sd_printk(KERN_ERR, sdkp,
>               "Zone size %llu is not a power of two.\n",
>               zone_blocks);
>         return -EINVAL;
>     }
>
I would keep these changes out of the current patch series because it
will also increase the test scope. I think once the block layer
constraint is removed as a part of this series, we can work on the SCSI
changes in the next cycle.
> Thanks,
>
> Bart.

2022-07-28 12:47:04

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH v8 07/11] dm-zoned: ensure only power of 2 zone sizes are allowed

On Wed, Jul 27, 2022 at 06:22:41PM +0200, Pankaj Raghav wrote:
> --- a/drivers/md/dm-zoned-target.c
> +++ b/drivers/md/dm-zoned-target.c
> @@ -792,6 +792,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
> return -EINVAL;
> }
> zone_nr_sectors = bdev_zone_sectors(bdev);
> + if (!is_power_of_2(zone_nr_sectors)) {
> + ti->error = "Zone size is not power of 2";

This could print what's the value of zone_nr_sectors

> + return -EINVAL;
> + }
> zoned_dev->zone_nr_sectors = zone_nr_sectors;
> zoned_dev->nr_zones = bdev_nr_zones(bdev);
> }
> @@ -804,6 +808,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
> return -EINVAL;
> }
> zoned_dev->zone_nr_sectors = bdev_zone_sectors(bdev);
> + if (!is_power_of_2(zoned_dev->zone_nr_sectors)) {
> + ti->error = "Zone size is not power of 2";

Same

> + return -EINVAL;
> + }
> zoned_dev->nr_zones = bdev_nr_zones(bdev);
> }
>
> --
> 2.25.1

2022-07-28 14:20:45

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v8 00/11] support non power of 2 zoned device

On 7/28/22 04:57, Pankaj Raghav wrote:
> Hi Bart,
>
> On 2022-07-28 01:19, Bart Van Assche wrote:
>> On 7/27/22 09:22, Pankaj Raghav wrote:
>>> This series adds support to npo2 zoned devices in the block and nvme
>>> layer and a new **dm target** is added: dm-po2z-target. This new
>>> target will be initially used for filesystems such as btrfs and
>>> f2fs that does not have native npo2 zone support.
>>
>> Should any SCSI changes be included in this patch series? From sd_zbc.c:
>>
>>     if (!is_power_of_2(zone_blocks)) {
>>         sd_printk(KERN_ERR, sdkp,
>>               "Zone size %llu is not a power of two.\n",
>>               zone_blocks);
>>         return -EINVAL;
>>     }
>>
> I would keep these changes out of the current patch series because it
> will also increase the test scope. I think once the block layer
> constraint is removed as a part of this series, we can work on the SCSI
> changes in the next cycle.

That's fine with me.

Thanks,

Bart.

2022-07-29 08:05:14

by Pankaj Raghav

[permalink] [raw]
Subject: Re: [PATCH v8 05/11] null_blk: allow non power of 2 zoned devices

On 2022-07-27 23:59, Chaitanya Kulkarni wrote:
>> Sequential Write:
>>
>> x-----------------x---------------------------------x---------------------------------x
>> | IOdepth | 8 | 16 |
>> x-----------------x---------------------------------x---------------------------------x
>> | | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
>> x-----------------x---------------------------------x---------------------------------x
>> | Without patch | 578 | 2257 | 12.80 | 576 | 2248 | 25.78 |
>> x-----------------x---------------------------------x---------------------------------x
>> | With patch | 581 | 2268 | 12.74 | 576 | 2248 | 25.85 |
>> x-----------------x---------------------------------x---------------------------------x
>>
>> Sequential read:
>>
>> x-----------------x---------------------------------x---------------------------------x
>> | IOdepth | 8 | 16 |
>> x-----------------x---------------------------------x---------------------------------x
>> | | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
>> x-----------------x---------------------------------x---------------------------------x
>> | Without patch | 667 | 2605 | 11.79 | 675 | 2637 | 23.49 |
>> x-----------------x---------------------------------x---------------------------------x
>> | With patch | 667 | 2605 | 11.79 | 675 | 2638 | 23.48 |
>> x-----------------x---------------------------------x---------------------------------x
>>
>> Random read:
>>
>> x-----------------x---------------------------------x---------------------------------x
>> | IOdepth | 8 | 16 |
>> x-----------------x---------------------------------x---------------------------------x
>> | | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
>> x-----------------x---------------------------------x---------------------------------x
>> | Without patch | 522 | 2038 | 15.05 | 514 | 2006 | 30.87 |
>> x-----------------x---------------------------------x---------------------------------x
>> | With patch | 522 | 2039 | 15.04 | 523 | 2042 | 30.33 |
>> x-----------------x---------------------------------x---------------------------------x
>>
>> Minor variations are noticed in Sequential write with io depth 8 and
>> in random read with io depth 16. But overall no noticeable differences
>> were noticed
>
> minor variations in with aspect of the performance ?
> are these documented somewhere ?
>
The table above documents those minor variations in performance.
> move the large table of performance numbers to the cover letter it looks
> ugly in the git log...
>
I could do that.
>

2022-07-29 08:06:08

by Pankaj Raghav

[permalink] [raw]
Subject: Re: [PATCH v8 04/11] nvmet: Allow ZNS target to support non-power_of_2 zone sizes

On 2022-07-28 05:09, Damien Le Moal wrote:

>> }
>
> This change should go into patch 3. Otherwise, adding patch 3 only will
> break the nvme target zone code.
>
Ok.
>>
>> static unsigned long get_nr_zones_from_buf(struct nvmet_req *req, u32 bufsize)
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 1be805223026..d1ef9b9552ed 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -1350,6 +1350,11 @@ static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
>> return BLK_ZONED_NONE;
>> }
>>
>> +static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
>> +{
>> + return disk_zone_no(bdev->bd_disk, sec);
>> +}
>> +
>
> This should go into a prep patch before patch 3.
>
Bart commented in the earlier versions to see the new helpers along with
its usage instead of having it separately in a prep patch.

>> static inline int queue_dma_alignment(const struct request_queue *q)
>> {
>> return q ? q->dma_alignment : 511;
>
>

2022-07-29 08:11:44

by Pankaj Raghav

[permalink] [raw]
Subject: Re: [PATCH v8 07/11] dm-zoned: ensure only power of 2 zone sizes are allowed

On 2022-07-28 14:15, David Sterba wrote:
> On Wed, Jul 27, 2022 at 06:22:41PM +0200, Pankaj Raghav wrote:
>> --- a/drivers/md/dm-zoned-target.c
>> +++ b/drivers/md/dm-zoned-target.c
>> @@ -792,6 +792,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
>> return -EINVAL;
>> }
>> zone_nr_sectors = bdev_zone_sectors(bdev);
>> + if (!is_power_of_2(zone_nr_sectors)) {
>> + ti->error = "Zone size is not power of 2";
>
> This could print what's the value of zone_nr_sectors
Ok. I will rephrase based on Damien's comment and add the
zone_nr_sectors to be included. Thanks.
>
>> + return -EINVAL;
>> + }
>> zoned_dev->zone_nr_sectors = zone_nr_sectors;
>> zoned_dev->nr_zones = bdev_nr_zones(bdev);
>> }
>> @@ -804,6 +808,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
>> return -EINVAL;
>> }
>> zoned_dev->zone_nr_sectors = bdev_zone_sectors(bdev);
>> + if (!is_power_of_2(zoned_dev->zone_nr_sectors)) {
>> + ti->error = "Zone size is not power of 2";
>
> Same
>
>> + return -EINVAL;
>> + }
>> zoned_dev->nr_zones = bdev_nr_zones(bdev);
>> }
>>
>> --
>> 2.25.1