2022-05-23 16:17:11

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v5 0/7] support non power of 2 zoned devices

Hello,
The previous revision ended up leading to a new direction to add npo2
device support as a dm target instead of adding support to filesystems
directly[0]. I would like to hear some inputs from the community,
especially from Christoph and Mike Snitzer about this approach.

- Background and Motivation:

The zone storage implementation in Linux, introduced since v4.10, first
targetted SMR drives which have a power of 2 (po2) zone size alignment
requirement. The po2 zone size was further imposed implicitly by the
block layer's blk_queue_chunk_sectors(), used to prevent IO merging
across chunks beyond the specified size, since v3.16 through commit
762380ad9322 ("block: add notion of a chunk size for request merging").
But this same general block layer po2 requirement for blk_queue_chunk_sectors()
was removed on v5.10 through commit 07d098e6bbad ("block: allow 'chunk_sectors'
to be non-power-of-2").

NAND, which is the media used in newer zoned storage devices, does not
naturally align to po2. In these devices, zone cap is not the same as the
po2 zone size. When the zone cap != zone size, then unmapped LBAs are
introduced to cover the space between the zone cap and zone size. po2
requirement does not make sense for these type of zone storage devices.
This patch series aims to remove these unmapped LBAs for zoned devices when
zone cap is npo2. This is done by relaxing the po2 zone size constraint
in the kernel and allowing zoned device with npo2 zone sizes if zone cap
== zone size.

Removing the po2 requirement from zone storage should be possible
now provided that no userspace regression and no performance regressions are
introduced. Stop-gap patches have been already merged into f2fs-tools to
proactively not allow npo2 zone sizes until proper support is added [1].

There were two efforts previously to add support to npo2 devices: 1) via
device level emulation [2] but that was rejected with a final conclusion
to add support for non po2 zoned device in the complete stack[3] 2)
adding support to the complete stack by removing the constraint in the
block layer and NVMe layer with support to btrfs, zonefs, etc which was
rejected with a conclusion to add a dm target for FS support [0]
to reduce the regression impact.

- Patchset description:

The support is planned to be added in two phases:
- Add npo2 support to block, nvme layer and necessary stop gap patches
in the filesystems
- Add dm target for npo2 devices so that they are presented as a po2
device to filesystems

This patchset addresses the first phase for adding support to npo2
devices.

Patches 1-2 deals with removing the po2 constraint from the
block layer.

Patches 3-4 deals with removing the constraint from nvme zns.

Patch 5-6 removes the po2 contraint in null blk

Patches 7 adds conditions to not allow non power of 2 devices in
DM.

The patch series is based on linux-next tag: next-20220520

- Future work
Add DM target for npo2 devices to be presented as a po2 device.

[0] https://lore.kernel.org/lkml/PH0PR04MB74166C87F694B150A5AE0F009BD09@PH0PR04MB7416.namprd04.prod.outlook.com/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?h=dev-test&id=6afcf6493578e77528abe65ab8b12f3e1c16749f
[2] https://lore.kernel.org/all/[email protected]/T/
[3] https://lore.kernel.org/all/20220315135245.eqf4tqngxxb7ymqa@unifi/

Changes since v1:
- Put the function declaration and its usage in the same commit (Bart)
- Remove bdev_zone_aligned function (Bart)
- Change the name from blk_queue_zone_aligned to blk_queue_is_zone_start
(Damien)
- q is never null in from bdev_get_queue (Damien)
- Add condition during bringup and check for zsze == zcap for npo2
drives (Damien)
- Rounddown operation should be made generic to work in 32 bits arch
(bart)
- Add comments where generic calculation is directly used instead having
special handling for po2 zone sizes (Hannes)
- Make the minimum zone size alignment requirement for btrfs to be 1M
instead of BTRFS_STRIPE_LEN(David)

Changes since v2:
- Minor formatting changes

Changes since v3:
- Make superblock mirror align with the existing superblock log offsets
(David)
- DM change return value and remove extra newline
- Optimize null blk zone index lookup with shift for po2 zone size

Changes since v4:
- Remove direct filesystems support for npo2 devices (Johannes, Hannes,
Damien)

Luis Chamberlain (1):
dm-zoned: ensure only power of 2 zone sizes are allowed

Pankaj Raghav (6):
block: make blkdev_nr_zones and blk_queue_zone_no generic for npo2
zsze
block: allow blk-zoned devices to have non-power-of-2 zone size
nvme: zns: Allow ZNS drives that have non-power_of_2 zone size
nvmet: Allow ZNS target to support non-power_of_2 zone sizes
null_blk: allow non power of 2 zoned devices
null_blk: use zone_size_sects_shift for power of 2 zoned devices

block/blk-core.c | 3 +--
block/blk-zoned.c | 38 +++++++++++++++++++++++--------
drivers/block/null_blk/main.c | 5 ++--
drivers/block/null_blk/null_blk.h | 6 +++++
drivers/block/null_blk/zoned.c | 20 ++++++++++------
drivers/md/dm-zone.c | 12 ++++++++++
drivers/nvme/host/zns.c | 21 +++++++++--------
drivers/nvme/target/zns.c | 2 +-
include/linux/blkdev.h | 36 ++++++++++++++++++++++++++++-
9 files changed, 111 insertions(+), 32 deletions(-)

--
2.25.1



2022-05-23 16:17:21

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v5 2/7] block: allow blk-zoned devices to have non-power-of-2 zone size

Checking if a given sector is aligned to a zone is a common
operation that is performed for zoned devices. Add
blk_queue_is_zone_start helper to check for this instead of opencoding it
everywhere.

Convert the calculations on zone size to be generic instead of relying on
power_of_2 based logic in the block layer using the helpers wherever
possible.

The only hot path affected by this change for power_of_2 zoned devices
is in blk_check_zone_append() but blk_queue_is_zone_start() helper is
used to optimize the calculation for po2 zone sizes. Note that the append
path cannot be accessed by direct raw access to the block device but only
through a filesystem abstraction.

Finally, allow non power of 2 zoned devices provided that their zone
capacity and zone size are equal. The main motivation to allow non
power_of_2 zoned device is to remove the unmapped LBA between zcap and
zsze for devices that cannot have a power_of_2 zcap.

Reviewed-by: Luis Chamberlain <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
block/blk-core.c | 3 +--
block/blk-zoned.c | 26 ++++++++++++++++++++------
include/linux/blkdev.h | 21 +++++++++++++++++++++
3 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index bc0506772152..bf1eae142118 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -630,8 +630,7 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
return BLK_STS_NOTSUPP;

/* The bio sector must point to the start of a sequential zone */
- if (pos & (blk_queue_zone_sectors(q) - 1) ||
- !blk_queue_zone_is_seq(q, pos))
+ if (!blk_queue_is_zone_start(q, pos) || !blk_queue_zone_is_seq(q, pos))
return BLK_STS_IOERR;

/*
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index e7eec513dd42..665993b13668 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -288,10 +288,10 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
return -EINVAL;

/* Check alignment (handle eventual smaller last zone) */
- if (sector & (zone_sectors - 1))
+ if (!blk_queue_is_zone_start(q, sector))
return -EINVAL;

- if ((nr_sectors & (zone_sectors - 1)) && end_sector != capacity)
+ if (!blk_queue_is_zone_start(q, nr_sectors) && end_sector != capacity)
return -EINVAL;

/*
@@ -489,14 +489,28 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
* smaller last zone.
*/
if (zone->start == 0) {
- if (zone->len == 0 || !is_power_of_2(zone->len)) {
- pr_warn("%s: Invalid zoned device with non power of two zone size (%llu)\n",
- disk->disk_name, zone->len);
+ if (zone->len == 0) {
+ pr_warn("%s: Invalid zone size", disk->disk_name);
+ return -ENODEV;
+ }
+
+ /*
+ * Don't allow zoned device with non power_of_2 zone size with
+ * zone capacity less than zone size.
+ */
+ if (!is_power_of_2(zone->len) &&
+ zone->capacity < zone->len) {
+ pr_warn("%s: Invalid zone capacity for non power of 2 zone size",
+ disk->disk_name);
return -ENODEV;
}

args->zone_sectors = zone->len;
- args->nr_zones = (capacity + zone->len - 1) >> ilog2(zone->len);
+ /*
+ * Division is used to calculate nr_zones for both power_of_2
+ * and non power_of_2 zone sizes as it is not in the hot path.
+ */
+ args->nr_zones = div64_u64(capacity + zone->len - 1, zone->len);
} else if (zone->start + args->zone_sectors < capacity) {
if (zone->len != args->zone_sectors) {
pr_warn("%s: Invalid zoned device with non constant zone size\n",
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c4e4c7071b7b..f5c7a41032ba 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -676,6 +676,21 @@ static inline unsigned int blk_queue_zone_no(struct request_queue *q,
return div64_u64(sector, zone_sectors);
}

+static inline bool blk_queue_is_zone_start(struct request_queue *q, sector_t sec)
+{
+ sector_t zone_sectors = blk_queue_zone_sectors(q);
+ u64 remainder = 0;
+
+ if (!blk_queue_is_zoned(q))
+ return false;
+
+ if (is_power_of_2(zone_sectors))
+ return IS_ALIGNED(sec, zone_sectors);
+
+ div64_u64_rem(sec, zone_sectors, &remainder);
+ return remainder == 0;
+}
+
static inline bool blk_queue_zone_is_seq(struct request_queue *q,
sector_t sector)
{
@@ -722,6 +737,12 @@ static inline unsigned int blk_queue_zone_no(struct request_queue *q,
{
return 0;
}
+
+static inline bool blk_queue_is_zone_start(struct request_queue *q, sector_t sec)
+{
+ return false;
+}
+
static inline unsigned int queue_max_open_zones(const struct request_queue *q)
{
return 0;
--
2.25.1


2022-05-23 16:17:23

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v5 4/7] nvmet: Allow ZNS target to support non-power_of_2 zone sizes

A generic bdev_zone_no helper is added to calculate zone number for a given
sector in a block device. This helper internally uses blk_queue_zone_no to
find the zone number.

Use the helper bdev_zone_no() to calculate nr of zones. This let's us
make modifications to the math if needed in one place and adds now
support for npo2 zone devices.

Reviewed by: Adam Manzanares <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/nvme/target/zns.c | 2 +-
include/linux/blkdev.h | 7 +++++++
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
index e34718b09550..e41b6a6ef048 100644
--- a/drivers/nvme/target/zns.c
+++ b/drivers/nvme/target/zns.c
@@ -243,7 +243,7 @@ static unsigned long nvmet_req_nr_zones_from_slba(struct nvmet_req *req)
unsigned int sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);

return blkdev_nr_zones(req->ns->bdev->bd_disk) -
- (sect >> ilog2(bdev_zone_sectors(req->ns->bdev)));
+ bdev_zone_no(req->ns->bdev, sect);
}

static unsigned long get_nr_zones_from_buf(struct nvmet_req *req, u32 bufsize)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f5c7a41032ba..ed8742a72dcb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1382,6 +1382,13 @@ static inline sector_t bdev_zone_sectors(struct block_device *bdev)
return 0;
}

+static inline unsigned int bdev_zone_no(struct block_device *bdev, sector_t sec)
+{
+ struct request_queue *q = bdev_get_queue(bdev);
+
+ return blk_queue_zone_no(q, sec);
+}
+
static inline unsigned int bdev_max_open_zones(struct block_device *bdev)
{
struct request_queue *q = bdev_get_queue(bdev);
--
2.25.1


2022-05-23 16:17:23

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v5 3/7] nvme: zns: Allow ZNS drives that have non-power_of_2 zone size

Remove the condition which disallows non-power_of_2 zone size ZNS drive
to be updated and use generic method to calculate number of zones
instead of relying on log and shift based calculation on zone size.

The power_of_2 calculation has been replaced directly with generic
calculation without special handling. Both modified functions are not
used in hot paths, they are only used during initialization &
revalidation of the ZNS device.

As rounddown macro from math.h does not work for 32 bit architectures,
round down operation is open coded.

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed by: Adam Manzanares <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/nvme/host/zns.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index 9f81beb4df4e..d92f937d5cb9 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
}

ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
- if (!is_power_of_2(ns->zsze)) {
- dev_warn(ns->ctrl->device,
- "invalid zone size:%llu for namespace:%u\n",
- ns->zsze, ns->head->ns_id);
- status = -ENODEV;
- goto free_data;
- }

blk_queue_set_zoned(ns->disk, BLK_ZONED_HM);
blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
@@ -128,8 +121,13 @@ static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns,
const size_t min_bufsize = sizeof(struct nvme_zone_report) +
sizeof(struct nvme_zone_descriptor);

+ /*
+ * Division is used to calculate nr_zones with no special handling
+ * for power of 2 zone sizes as this function is not invoked in a
+ * hot path
+ */
nr_zones = min_t(unsigned int, nr_zones,
- get_capacity(ns->disk) >> ilog2(ns->zsze));
+ div64_u64(get_capacity(ns->disk), ns->zsze));

bufsize = sizeof(struct nvme_zone_report) +
nr_zones * sizeof(struct nvme_zone_descriptor);
@@ -182,6 +180,7 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
int ret, zone_idx = 0;
unsigned int nz, i;
size_t buflen;
+ u64 remainder = 0;

if (ns->head->ids.csi != NVME_CSI_ZNS)
return -EINVAL;
@@ -197,7 +196,11 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;

- sector &= ~(ns->zsze - 1);
+ /*
+ * Round down the sector value to the nearest zone start
+ */
+ div64_u64_rem(sector, ns->zsze, &remainder);
+ sector -= remainder;
while (zone_idx < nr_zones && sector < get_capacity(ns->disk)) {
memset(report, 0, buflen);

--
2.25.1


2022-05-23 16:17:29

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v5 5/7] null_blk: allow non power of 2 zoned devices

Convert the power of 2 based calculation with zone size to be generic in
null_zone_no with optimization for power of 2 based zone sizes.

The nr_zones calculation in null_init_zoned_dev has been replaced with a
division without special handling for power of 2 based zone sizes as
this function is called only during the initialization and will not
invoked in the hot path.

Performance Measurement:

Device:
zone size = 128M, blocksize=4k

FIO cmd:

fio --name=zbc --filename=/dev/nullb0 --direct=1 --zonemode=zbd --size=23G
--io_size=<iosize> --ioengine=io_uring --iodepth=<iod> --rw=<mode> --bs=4k
--loops=4

The following results are an average of 4 runs on AMD Ryzen 5 5600X with
32GB of RAM:

Sequential Write:

x-----------------x---------------------------------x---------------------------------x
| IOdepth | 8 | 16 |
x-----------------x---------------------------------x---------------------------------x
| | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
x-----------------x---------------------------------x---------------------------------x
| Without patch | 578 | 2257 | 12.80 | 576 | 2248 | 25.78 |
x-----------------x---------------------------------x---------------------------------x
| With patch | 581 | 2268 | 12.74 | 576 | 2248 | 25.85 |
x-----------------x---------------------------------x---------------------------------x

Sequential read:

x-----------------x---------------------------------x---------------------------------x
| IOdepth | 8 | 16 |
x-----------------x---------------------------------x---------------------------------x
| | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
x-----------------x---------------------------------x---------------------------------x
| Without patch | 667 | 2605 | 11.79 | 675 | 2637 | 23.49 |
x-----------------x---------------------------------x---------------------------------x
| With patch | 667 | 2605 | 11.79 | 675 | 2638 | 23.48 |
x-----------------x---------------------------------x---------------------------------x

Random read:

x-----------------x---------------------------------x---------------------------------x
| IOdepth | 8 | 16 |
x-----------------x---------------------------------x---------------------------------x
| | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
x-----------------x---------------------------------x---------------------------------x
| Without patch | 522 | 2038 | 15.05 | 514 | 2006 | 30.87 |
x-----------------x---------------------------------x---------------------------------x
| With patch | 522 | 2039 | 15.04 | 523 | 2042 | 30.33 |
x-----------------x---------------------------------x---------------------------------x

Minor variations are noticed in Sequential write with io depth 8 and
in random read with io depth 16. But overall no noticeable differences
were noticed

Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed by: Adam Manzanares <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/block/null_blk/main.c | 5 ++---
drivers/block/null_blk/zoned.c | 14 +++++++-------
2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index c441a4972064..1dec51d69674 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1930,9 +1930,8 @@ static int null_validate_conf(struct nullb_device *dev)
if (dev->queue_mode == NULL_Q_BIO)
dev->mbps = 0;

- if (dev->zoned &&
- (!dev->zone_size || !is_power_of_2(dev->zone_size))) {
- pr_err("zone_size must be power-of-two\n");
+ if (dev->zoned && !dev->zone_size) {
+ pr_err("Invalid zero zone size\n");
return -EINVAL;
}

diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index dae54dd1aeac..00c34e65ef0a 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -13,7 +13,10 @@ static inline sector_t mb_to_sects(unsigned long mb)

static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
{
- return sect >> ilog2(dev->zone_size_sects);
+ if (is_power_of_2(dev->zone_size_sects))
+ return sect >> ilog2(dev->zone_size_sects);
+
+ return div64_u64(sect, dev->zone_size_sects);
}

static inline void null_lock_zone_res(struct nullb_device *dev)
@@ -62,10 +65,6 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
sector_t sector = 0;
unsigned int i;

- if (!is_power_of_2(dev->zone_size)) {
- pr_err("zone_size must be power-of-two\n");
- return -EINVAL;
- }
if (dev->zone_size > dev->size) {
pr_err("Zone size larger than device capacity\n");
return -EINVAL;
@@ -83,8 +82,9 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
zone_capacity_sects = mb_to_sects(dev->zone_capacity);
dev_capacity_sects = mb_to_sects(dev->size);
dev->zone_size_sects = mb_to_sects(dev->zone_size);
- dev->nr_zones = round_up(dev_capacity_sects, dev->zone_size_sects)
- >> ilog2(dev->zone_size_sects);
+ dev->nr_zones =
+ div64_u64(roundup(dev_capacity_sects, dev->zone_size_sects),
+ dev->zone_size_sects);

dev->zones = kvmalloc_array(dev->nr_zones, sizeof(struct nullb_zone),
GFP_KERNEL | __GFP_ZERO);
--
2.25.1


2022-05-23 16:17:56

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v5 7/7] dm-zoned: ensure only power of 2 zone sizes are allowed

From: Luis Chamberlain <[email protected]>

Today dm-zoned relies on the assumption that you have a zone size
with a power of 2. Even though the block layer today enforces this
requirement, these devices do exist and so provide a stop-gap measure
to ensure these devices cannot be used by mistake

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/md/dm-zone.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index 57daa86c19cf..f87a1eda8252 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -231,6 +231,18 @@ static int dm_revalidate_zones(struct mapped_device *md, struct dm_table *t)
struct request_queue *q = md->queue;
unsigned int noio_flag;
int ret;
+ struct block_device *bdev = md->disk->part0;
+ sector_t zone_sectors;
+ char bname[BDEVNAME_SIZE];
+
+ zone_sectors = bdev_zone_sectors(bdev);
+
+ if (!is_power_of_2(zone_sectors)) {
+ DMWARN("%s: %s only power of two zone size supported",
+ dm_device_name(md),
+ bdevname(bdev, bname));
+ return -EINVAL;
+ }

/*
* Check if something changed. If yes, cleanup the current resources
--
2.25.1


2022-05-23 16:18:28

by Pankaj Raghav

[permalink] [raw]
Subject: [PATCH v5 6/7] null_blk: use zone_size_sects_shift for power of 2 zoned devices

Instead of doing is_power_of_2 and ilog2 operation for every IO, cache
the zone_size_sects_shift variable and use it for power of 2 zoned
devices.

This variable will be set to zero for non power of 2 zoned devices.

Suggested-by: Damien Le Moal <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/block/null_blk/null_blk.h | 6 ++++++
drivers/block/null_blk/zoned.c | 10 ++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
index 78eb56b0ca55..3d6e41a9491f 100644
--- a/drivers/block/null_blk/null_blk.h
+++ b/drivers/block/null_blk/null_blk.h
@@ -74,6 +74,12 @@ struct nullb_device {
unsigned int imp_close_zone_no;
struct nullb_zone *zones;
sector_t zone_size_sects;
+ /*
+ * zone_size_sects_shift is only useful when the zone size is
+ * power of 2. This variable is set to zero when zone size is non
+ * power of 2.
+ */
+ unsigned int zone_size_sects_shift;
bool need_zone_res_mgmt;
spinlock_t zone_res_lock;

diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index 00c34e65ef0a..806bef98ac83 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -13,8 +13,8 @@ static inline sector_t mb_to_sects(unsigned long mb)

static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
{
- if (is_power_of_2(dev->zone_size_sects))
- return sect >> ilog2(dev->zone_size_sects);
+ if (dev->zone_size_sects_shift)
+ return sect >> dev->zone_size_sects_shift;

return div64_u64(sect, dev->zone_size_sects);
}
@@ -82,6 +82,12 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
zone_capacity_sects = mb_to_sects(dev->zone_capacity);
dev_capacity_sects = mb_to_sects(dev->size);
dev->zone_size_sects = mb_to_sects(dev->zone_size);
+
+ if (is_power_of_2(dev->zone_size_sects))
+ dev->zone_size_sects_shift = ilog2(dev->zone_size_sects);
+ else
+ dev->zone_size_sects_shift = 0;
+
dev->nr_zones =
div64_u64(roundup(dev_capacity_sects, dev->zone_size_sects),
dev->zone_size_sects);
--
2.25.1


2022-05-23 18:35:56

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH v5 6/7] null_blk: use zone_size_sects_shift for power of 2 zoned devices

On Mon, May 23, 2022 at 06:16:00PM +0200, Pankaj Raghav wrote:
> Instead of doing is_power_of_2 and ilog2 operation for every IO, cache
> the zone_size_sects_shift variable and use it for power of 2 zoned
> devices.
>
> This variable will be set to zero for non power of 2 zoned devices.
>
> Suggested-by: Damien Le Moal <[email protected]>
> Signed-off-by: Pankaj Raghav <[email protected]>

Reviewed-by: Luis Chamberlain <[email protected]>

Luis

2022-05-24 05:58:05

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v5 5/7] null_blk: allow non power of 2 zoned devices

Hi Pankaj,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on device-mapper-dm/for-next linus/master hch-configfs/for-next v5.18 next-20220523]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Pankaj-Raghav/block-make-blkdev_nr_zones-and-blk_queue_zone_no-generic-for-npo2-zsze/20220524-011616
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: hexagon-randconfig-r045-20220523 (https://download.01.org/0day-ci/archive/20220524/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 10c9ecce9f6096e18222a331c5e7d085bd813f75)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/3d3c81da0adbd40eb0d2125327b7e227582b2a37
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Pankaj-Raghav/block-make-blkdev_nr_zones-and-blk_queue_zone_no-generic-for-npo2-zsze/20220524-011616
git checkout 3d3c81da0adbd40eb0d2125327b7e227582b2a37
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "__hexagon_umoddi3" [drivers/block/null_blk/null_blk.ko] undefined!

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-05-24 07:08:56

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v5 2/7] block: allow blk-zoned devices to have non-power-of-2 zone size

On 5/23/22 18:15, Pankaj Raghav wrote:
> Checking if a given sector is aligned to a zone is a common
> operation that is performed for zoned devices. Add
> blk_queue_is_zone_start helper to check for this instead of opencoding it
> everywhere.
>
> Convert the calculations on zone size to be generic instead of relying on
> power_of_2 based logic in the block layer using the helpers wherever
> possible.
>
> The only hot path affected by this change for power_of_2 zoned devices
> is in blk_check_zone_append() but blk_queue_is_zone_start() helper is
> used to optimize the calculation for po2 zone sizes. Note that the append
> path cannot be accessed by direct raw access to the block device but only
> through a filesystem abstraction.
>
> Finally, allow non power of 2 zoned devices provided that their zone
> capacity and zone size are equal. The main motivation to allow non
> power_of_2 zoned device is to remove the unmapped LBA between zcap and
> zsze for devices that cannot have a power_of_2 zcap.
>
> Reviewed-by: Luis Chamberlain <[email protected]>
> Signed-off-by: Pankaj Raghav <[email protected]>
> ---
> block/blk-core.c | 3 +--
> block/blk-zoned.c | 26 ++++++++++++++++++++------
> include/linux/blkdev.h | 21 +++++++++++++++++++++
> 3 files changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index bc0506772152..bf1eae142118 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -630,8 +630,7 @@ static inline blk_status_t blk_check_zone_append(struct request_queue *q,
> return BLK_STS_NOTSUPP;
>
> /* The bio sector must point to the start of a sequential zone */
> - if (pos & (blk_queue_zone_sectors(q) - 1) ||
> - !blk_queue_zone_is_seq(q, pos))
> + if (!blk_queue_is_zone_start(q, pos) || !blk_queue_zone_is_seq(q, pos))
> return BLK_STS_IOERR;
>
> /*
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index e7eec513dd42..665993b13668 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -288,10 +288,10 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
> return -EINVAL;
>
> /* Check alignment (handle eventual smaller last zone) */
> - if (sector & (zone_sectors - 1))
> + if (!blk_queue_is_zone_start(q, sector))
> return -EINVAL;
>
> - if ((nr_sectors & (zone_sectors - 1)) && end_sector != capacity)
> + if (!blk_queue_is_zone_start(q, nr_sectors) && end_sector != capacity)
> return -EINVAL;
>
> /*
> @@ -489,14 +489,28 @@ static int blk_revalidate_zone_cb(struct blk_zone *zone, unsigned int idx,
> * smaller last zone.
> */
> if (zone->start == 0) {
> - if (zone->len == 0 || !is_power_of_2(zone->len)) {
> - pr_warn("%s: Invalid zoned device with non power of two zone size (%llu)\n",
> - disk->disk_name, zone->len);
> + if (zone->len == 0) {
> + pr_warn("%s: Invalid zone size", disk->disk_name);
> + return -ENODEV;
> + }
> +
> + /*
> + * Don't allow zoned device with non power_of_2 zone size with
> + * zone capacity less than zone size.
> + */
> + if (!is_power_of_2(zone->len) &&
> + zone->capacity < zone->len) {
> + pr_warn("%s: Invalid zone capacity for non power of 2 zone size",
> + disk->disk_name);
> return -ENODEV;
> }
>
> args->zone_sectors = zone->len;
> - args->nr_zones = (capacity + zone->len - 1) >> ilog2(zone->len);
> + /*
> + * Division is used to calculate nr_zones for both power_of_2
> + * and non power_of_2 zone sizes as it is not in the hot path.
> + */
> + args->nr_zones = div64_u64(capacity + zone->len - 1, zone->len);
> } else if (zone->start + args->zone_sectors < capacity) {
> if (zone->len != args->zone_sectors) {
> pr_warn("%s: Invalid zoned device with non constant zone size\n",
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c4e4c7071b7b..f5c7a41032ba 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -676,6 +676,21 @@ static inline unsigned int blk_queue_zone_no(struct request_queue *q,
> return div64_u64(sector, zone_sectors);
> }
>
> +static inline bool blk_queue_is_zone_start(struct request_queue *q, sector_t sec)
> +{
> + sector_t zone_sectors = blk_queue_zone_sectors(q);
> + u64 remainder = 0;
> +
> + if (!blk_queue_is_zoned(q))
> + return false;
> +

Not sure if we need this here; surely blk_queue_zone_sectors() will
already barf, and none of the callers did this check.

> + if (is_power_of_2(zone_sectors))
> + return IS_ALIGNED(sec, zone_sectors);
> +
> + div64_u64_rem(sec, zone_sectors, &remainder);
> + return remainder == 0;
> +}
> +
> static inline bool blk_queue_zone_is_seq(struct request_queue *q,
> sector_t sector)
> {
> @@ -722,6 +737,12 @@ static inline unsigned int blk_queue_zone_no(struct request_queue *q,
> {
> return 0;
> }
> +
> +static inline bool blk_queue_is_zone_start(struct request_queue *q, sector_t sec)
> +{
> + return false;
> +}
> +
> static inline unsigned int queue_max_open_zones(const struct request_queue *q)
> {
> return 0;
Otherwise looks good.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

2022-05-24 07:37:42

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v5 3/7] nvme: zns: Allow ZNS drives that have non-power_of_2 zone size

On 5/23/22 18:15, Pankaj Raghav wrote:
> Remove the condition which disallows non-power_of_2 zone size ZNS drive
> to be updated and use generic method to calculate number of zones
> instead of relying on log and shift based calculation on zone size.
>
> The power_of_2 calculation has been replaced directly with generic
> calculation without special handling. Both modified functions are not
> used in hot paths, they are only used during initialization &
> revalidation of the ZNS device.
>
> As rounddown macro from math.h does not work for 32 bit architectures,
> round down operation is open coded.
>
> Reviewed-by: Luis Chamberlain <[email protected]>
> Reviewed by: Adam Manzanares <[email protected]>
> Signed-off-by: Pankaj Raghav <[email protected]>
> ---
> drivers/nvme/host/zns.c | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
> index 9f81beb4df4e..d92f937d5cb9 100644
> --- a/drivers/nvme/host/zns.c
> +++ b/drivers/nvme/host/zns.c
> @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
> }
>
> ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
> - if (!is_power_of_2(ns->zsze)) {
> - dev_warn(ns->ctrl->device,
> - "invalid zone size:%llu for namespace:%u\n",
> - ns->zsze, ns->head->ns_id);
> - status = -ENODEV;
> - goto free_data;
> - }
>
> blk_queue_set_zoned(ns->disk, BLK_ZONED_HM);
> blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
> @@ -128,8 +121,13 @@ static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns,
> const size_t min_bufsize = sizeof(struct nvme_zone_report) +
> sizeof(struct nvme_zone_descriptor);
>
> + /*
> + * Division is used to calculate nr_zones with no special handling
> + * for power of 2 zone sizes as this function is not invoked in a
> + * hot path
> + */
> nr_zones = min_t(unsigned int, nr_zones,
> - get_capacity(ns->disk) >> ilog2(ns->zsze));
> + div64_u64(get_capacity(ns->disk), ns->zsze));
>
> bufsize = sizeof(struct nvme_zone_report) +
> nr_zones * sizeof(struct nvme_zone_descriptor);
> @@ -182,6 +180,7 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
> int ret, zone_idx = 0;
> unsigned int nz, i;
> size_t buflen;
> + u64 remainder = 0;
>
> if (ns->head->ids.csi != NVME_CSI_ZNS)
> return -EINVAL;
> @@ -197,7 +196,11 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
> c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
> c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;
>
> - sector &= ~(ns->zsze - 1);
> + /*
> + * Round down the sector value to the nearest zone start
> + */
> + div64_u64_rem(sector, ns->zsze, &remainder);
> + sector -= remainder;
> while (zone_idx < nr_zones && sector < get_capacity(ns->disk)) {
> memset(report, 0, buflen);
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

2022-05-24 07:56:53

by Pankaj Raghav

[permalink] [raw]
Subject: Re: [PATCH v5 5/7] null_blk: allow non power of 2 zoned devices

On 5/24/22 04:40, kernel test robot wrote:
> Hi Pankaj,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on axboe-block/for-next]
> [also build test ERROR on device-mapper-dm/for-next linus/master hch-configfs/for-next v5.18 next-20220523]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://protect2.fireeye.com/v1/url?k=8acc50e6-d557681b-8acddba9-000babff317b-0ca211d60a57a8c6&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fgithub.com%2Fintel-lab-lkp%2Flinux%2Fcommits%2FPankaj-Raghav%2Fblock-make-blkdev_nr_zones-and-blk_queue_zone_no-generic-for-npo2-zsze%2F20220524-011616
> base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
> config: hexagon-randconfig-r045-20220523 (https://protect2.fireeye.com/v1/url?k=0dc32741-52581fbc-0dc2ac0e-000babff317b-fb6f258f0c80ebb9&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fdownload.01.org%2F0day-ci%2Farchive%2F20220524%2F202205241034.izkLMTcH-lkp%40intel.com%2Fconfig)
> compiler: clang version 15.0.0 (https://protect2.fireeye.com/v1/url?k=afbb4f4f-f02077b2-afbac400-000babff317b-a4413fa4cefe1877&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fgithub.com%2Fllvm%2Fllvm-project 10c9ecce9f6096e18222a331c5e7d085bd813f75)
> reproduce (this is a W=1 build):
> wget https://protect2.fireeye.com/v1/url?k=718573dc-2e1e4b21-7184f893-000babff317b-b6c06a1c569b0d77&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fraw.githubusercontent.com%2Fintel%2Flkp-tests%2Fmaster%2Fsbin%2Fmake.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://protect2.fireeye.com/v1/url?k=247d070f-7be63ff2-247c8c40-000babff317b-6fd7f2f6a5a5bc60&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fgithub.com%2Fintel-lab-lkp%2Flinux%2Fcommit%2F3d3c81da0adbd40eb0d2125327b7e227582b2a37
> git remote add linux-review https://protect2.fireeye.com/v1/url?k=081be8db-5780d026-081a6394-000babff317b-d12fdca0fccd0493&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fgithub.com%2Fintel-lab-lkp%2Flinux
> git fetch --no-tags linux-review Pankaj-Raghav/block-make-blkdev_nr_zones-and-blk_queue_zone_no-generic-for-npo2-zsze/20220524-011616
> git checkout 3d3c81da0adbd40eb0d2125327b7e227582b2a37
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <[email protected]>
>
> All errors (new ones prefixed by >>, old ones prefixed by <<):
>
>>> ERROR: modpost: "__hexagon_umoddi3" [drivers/block/null_blk/null_blk.ko] undefined!

I am able to apply my patches cleanly against next-20220523, build it
without any errors with GCC and boot into it in my x86_64 machine. Not
sure why this error is popping up.

2022-05-24 08:55:40

by Pankaj Raghav

[permalink] [raw]
Subject: Re: [PATCH v5 2/7] block: allow blk-zoned devices to have non-power-of-2 zone size

On 5/24/22 07:19, Hannes Reinecke wrote:

>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index c4e4c7071b7b..f5c7a41032ba 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -676,6 +676,21 @@ static inline unsigned int
>> blk_queue_zone_no(struct request_queue *q,
>>       return div64_u64(sector, zone_sectors);
>>   }
>>   +static inline bool blk_queue_is_zone_start(struct request_queue *q,
>> sector_t sec)
>> +{
>> +    sector_t zone_sectors = blk_queue_zone_sectors(q);
>> +    u64 remainder = 0;
>> +
>> +    if (!blk_queue_is_zoned(q))
>> +        return false;
>> +
>
> Not sure if we need this here; surely blk_queue_zone_sectors() will
> already barf, and none of the callers did this check.
>
I totally agree with you but all the other blk_queue_* functions had
this defensive check and I didn't want to break that pattern:

static inline unsigned int blk_queue_zone_no(struct request_queue *q,
sector_t sector)
{
....
if (!blk_queue_is_zoned(q))
return 0;
....
}


>> +    if (is_power_of_2(zone_sectors))
>> +        return IS_ALIGNED(sec, zone_sectors);
>> +
And, if the chunk sectors is 0, then we will do the npo2 calculation
resulting in a divide by zero even though chances of that happening is
very very low as you pointed out.
>> +    div64_u64_rem(sec, zone_sectors, &remainder);
>> +    return remainder == 0;
>> +}
>> +
>>   static inline bool blk_queue_zone_is_seq(struct request_queue *q,
>>                        sector_t sector)
>>   {
>> @@ -722,6 +737,12 @@ static inline unsigned int
>> blk_queue_zone_no(struct request_queue *q,
>>   {
>>       return 0;
>>   }
>> +
>> +static inline bool blk_queue_is_zone_start(struct request_queue *q,
>> sector_t sec)
>> +{
>> +    return false;
>> +}
>> +
>>   static inline unsigned int queue_max_open_zones(const struct
>> request_queue *q)
>>   {
>>       return 0;
> Otherwise looks good.
>
Thanks!
> Cheers,
>
> Hannes

2022-05-24 09:48:32

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v5 6/7] null_blk: use zone_size_sects_shift for power of 2 zoned devices

On 5/23/22 18:16, Pankaj Raghav wrote:
> Instead of doing is_power_of_2 and ilog2 operation for every IO, cache
> the zone_size_sects_shift variable and use it for power of 2 zoned
> devices.
>
> This variable will be set to zero for non power of 2 zoned devices.
>
> Suggested-by: Damien Le Moal <[email protected]>
> Signed-off-by: Pankaj Raghav <[email protected]>
> ---
> drivers/block/null_blk/null_blk.h | 6 ++++++
> drivers/block/null_blk/zoned.c | 10 ++++++++--
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/null_blk/null_blk.h b/drivers/block/null_blk/null_blk.h
> index 78eb56b0ca55..3d6e41a9491f 100644
> --- a/drivers/block/null_blk/null_blk.h
> +++ b/drivers/block/null_blk/null_blk.h
> @@ -74,6 +74,12 @@ struct nullb_device {
> unsigned int imp_close_zone_no;
> struct nullb_zone *zones;
> sector_t zone_size_sects;
> + /*
> + * zone_size_sects_shift is only useful when the zone size is
> + * power of 2. This variable is set to zero when zone size is non
> + * power of 2.
> + */
> + unsigned int zone_size_sects_shift;
> bool need_zone_res_mgmt;
> spinlock_t zone_res_lock;
>
> diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
> index 00c34e65ef0a..806bef98ac83 100644
> --- a/drivers/block/null_blk/zoned.c
> +++ b/drivers/block/null_blk/zoned.c
> @@ -13,8 +13,8 @@ static inline sector_t mb_to_sects(unsigned long mb)
>
> static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
> {
> - if (is_power_of_2(dev->zone_size_sects))
> - return sect >> ilog2(dev->zone_size_sects);
> + if (dev->zone_size_sects_shift)
> + return sect >> dev->zone_size_sects_shift;
>
> return div64_u64(sect, dev->zone_size_sects);
> }
> @@ -82,6 +82,12 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
> zone_capacity_sects = mb_to_sects(dev->zone_capacity);
> dev_capacity_sects = mb_to_sects(dev->size);
> dev->zone_size_sects = mb_to_sects(dev->zone_size);
> +
> + if (is_power_of_2(dev->zone_size_sects))
> + dev->zone_size_sects_shift = ilog2(dev->zone_size_sects);
> + else
> + dev->zone_size_sects_shift = 0;
> +
> dev->nr_zones =
> div64_u64(roundup(dev_capacity_sects, dev->zone_size_sects),
> dev->zone_size_sects);
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

2022-05-24 14:04:47

by Hannes Reinecke

[permalink] [raw]
Subject: Re: [PATCH v5 2/7] block: allow blk-zoned devices to have non-power-of-2 zone size

On 5/24/22 09:10, Pankaj Raghav wrote:
> On 5/24/22 07:19, Hannes Reinecke wrote:
>
>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>> index c4e4c7071b7b..f5c7a41032ba 100644
>>> --- a/include/linux/blkdev.h
>>> +++ b/include/linux/blkdev.h
>>> @@ -676,6 +676,21 @@ static inline unsigned int
>>> blk_queue_zone_no(struct request_queue *q,
>>>       return div64_u64(sector, zone_sectors);
>>>   }
>>>   +static inline bool blk_queue_is_zone_start(struct request_queue *q,
>>> sector_t sec)
>>> +{
>>> +    sector_t zone_sectors = blk_queue_zone_sectors(q);
>>> +    u64 remainder = 0;
>>> +
>>> +    if (!blk_queue_is_zoned(q))
>>> +        return false;
>>> +
>>
>> Not sure if we need this here; surely blk_queue_zone_sectors() will
>> already barf, and none of the callers did this check.
>>
> I totally agree with you but all the other blk_queue_* functions had
> this defensive check and I didn't want to break that pattern:
>
Ah, fair enough.

Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman

2022-05-25 07:13:38

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH v5 5/7] null_blk: allow non power of 2 zoned devices

On Tue, May 24, 2022 at 09:30:41AM +0200, Pankaj Raghav wrote:
> On 5/24/22 04:40, kernel test robot wrote:
> > Hi Pankaj,
> >
> > Thank you for the patch! Yet something to improve:
> >
> > [auto build test ERROR on axboe-block/for-next]
> > [also build test ERROR on device-mapper-dm/for-next linus/master hch-configfs/for-next v5.18 next-20220523]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch]
> >
> > url: https://protect2.fireeye.com/v1/url?k=8acc50e6-d557681b-8acddba9-000babff317b-0ca211d60a57a8c6&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fgithub.com%2Fintel-lab-lkp%2Flinux%2Fcommits%2FPankaj-Raghav%2Fblock-make-blkdev_nr_zones-and-blk_queue_zone_no-generic-for-npo2-zsze%2F20220524-011616
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
> > config: hexagon-randconfig-r045-20220523 (https://protect2.fireeye.com/v1/url?k=0dc32741-52581fbc-0dc2ac0e-000babff317b-fb6f258f0c80ebb9&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fdownload.01.org%2F0day-ci%2Farchive%2F20220524%2F202205241034.izkLMTcH-lkp%40intel.com%2Fconfig)
> > compiler: clang version 15.0.0 (https://protect2.fireeye.com/v1/url?k=afbb4f4f-f02077b2-afbac400-000babff317b-a4413fa4cefe1877&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fgithub.com%2Fllvm%2Fllvm-project 10c9ecce9f6096e18222a331c5e7d085bd813f75)
> > reproduce (this is a W=1 build):
> > wget https://protect2.fireeye.com/v1/url?k=718573dc-2e1e4b21-7184f893-000babff317b-b6c06a1c569b0d77&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fraw.githubusercontent.com%2Fintel%2Flkp-tests%2Fmaster%2Fsbin%2Fmake.cross -O ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > # https://protect2.fireeye.com/v1/url?k=247d070f-7be63ff2-247c8c40-000babff317b-6fd7f2f6a5a5bc60&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fgithub.com%2Fintel-lab-lkp%2Flinux%2Fcommit%2F3d3c81da0adbd40eb0d2125327b7e227582b2a37
> > git remote add linux-review https://protect2.fireeye.com/v1/url?k=081be8db-5780d026-081a6394-000babff317b-d12fdca0fccd0493&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fgithub.com%2Fintel-lab-lkp%2Flinux
> > git fetch --no-tags linux-review Pankaj-Raghav/block-make-blkdev_nr_zones-and-blk_queue_zone_no-generic-for-npo2-zsze/20220524-011616
> > git checkout 3d3c81da0adbd40eb0d2125327b7e227582b2a37
> > # save the config file
> > mkdir build_dir && cp config build_dir/.config
> > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash
> >
> > If you fix the issue, kindly add following tag where applicable
> > Reported-by: kernel test robot <[email protected]>
> >
> > All errors (new ones prefixed by >>, old ones prefixed by <<):
> >
> >>> ERROR: modpost: "__hexagon_umoddi3" [drivers/block/null_blk/null_blk.ko] undefined!
>
> I am able to apply my patches cleanly against next-20220523, build it
> without any errors with GCC and boot into it in my x86_64 machine. Not
> sure why this error is popping up.

Do a 32-bit build and you'll see it. With GCC 12.1.0, ARCH=i386
defconfig + CONFIG_BLK_DEV_NULL_BLK=y + CONFIG_BLK_DEV_ZONED=y
reproduces it for me:

$ make -skj"$(nproc)" ARCH=i386 defconfig menuconfig all
ld: drivers/block/null_blk/zoned.o: in function `null_init_zoned_dev':
zoned.c:(.text+0x112e): undefined reference to `__umoddi3'
...

roundup() does a plain division with a 64-bit dividend, which will cause
issues with 32-bit architectures. I suspect that you should really be
using DIV_ROUND_UP_SECTOR_T() for the nr_zones calculation or maybe one
of the other rounding macros in include/linux/math.h but I am not sure.

Cheers,
Nathan

2022-05-25 07:54:01

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v5 5/7] null_blk: allow non power of 2 zoned devices

Hi Pankaj,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on axboe-block/for-next]
[also build test ERROR on device-mapper-dm/for-next linus/master hch-configfs/for-next v5.18 next-20220524]
[cannot apply to linux-nvme/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/intel-lab-lkp/linux/commits/Pankaj-Raghav/block-make-blkdev_nr_zones-and-blk_queue_zone_no-generic-for-npo2-zsze/20220524-011616
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-a015 (https://download.01.org/0day-ci/archive/20220525/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 10c9ecce9f6096e18222a331c5e7d085bd813f75)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/3d3c81da0adbd40eb0d2125327b7e227582b2a37
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Pankaj-Raghav/block-make-blkdev_nr_zones-and-blk_queue_zone_no-generic-for-npo2-zsze/20220524-011616
git checkout 3d3c81da0adbd40eb0d2125327b7e227582b2a37
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: __umoddi3
>>> referenced by zoned.c:89 (drivers/block/null_blk/zoned.c:89)
>>> block/null_blk/zoned.o:(null_init_zoned_dev) in archive drivers/built-in.a

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-05-25 10:13:24

by Pankaj Raghav

[permalink] [raw]
Subject: Re: [PATCH v5 5/7] null_blk: allow non power of 2 zoned devices

On 5/24/22 17:22, Nathan Chancellor wrote:
>>> git remote add linux-review https://protect2.fireeye.com/v1/url?k=081be8db-5780d026-081a6394-000babff317b-d12fdca0fccd0493&q=1&e=5ef82219-ef70-445f-a7d0-0ae0a30be69f&u=https%3A%2F%2Fgithub.com%2Fintel-lab-lkp%2Flinux
>>> git fetch --no-tags linux-review Pankaj-Raghav/block-make-blkdev_nr_zones-and-blk_queue_zone_no-generic-for-npo2-zsze/20220524-011616
>>> git checkout 3d3c81da0adbd40eb0d2125327b7e227582b2a37
>>> # save the config file
>>> mkdir build_dir && cp config build_dir/.config
>>> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash
>>>
>>> If you fix the issue, kindly add following tag where applicable
>>> Reported-by: kernel test robot <[email protected]>
>>>
>>> All errors (new ones prefixed by >>, old ones prefixed by <<):
>>>
>>>>> ERROR: modpost: "__hexagon_umoddi3" [drivers/block/null_blk/null_blk.ko] undefined!
>>
>> I am able to apply my patches cleanly against next-20220523, build it
>> without any errors with GCC and boot into it in my x86_64 machine. Not
>> sure why this error is popping up.
>
> Do a 32-bit build and you'll see it. With GCC 12.1.0, ARCH=i386
> defconfig + CONFIG_BLK_DEV_NULL_BLK=y + CONFIG_BLK_DEV_ZONED=y
> reproduces it for me:
>
> $ make -skj"$(nproc)" ARCH=i386 defconfig menuconfig all
> ld: drivers/block/null_blk/zoned.o: in function `null_init_zoned_dev':
> zoned.c:(.text+0x112e): undefined reference to `__umoddi3'
> ...
>
Ah.. I didn't see anything about 32 bit ARCH in the email so it didn't
strike to me that was the problem. It said ARCH=hexagon and I had no
idea what it was. Thanks for reproducing it.
> roundup() does a plain division with a 64-bit dividend, which will cause
> issues with 32-bit architectures. I suspect that you should really be
> using DIV_ROUND_UP_SECTOR_T() for the nr_zones calculation or maybe one
> of the other rounding macros in include/linux/math.h but I am not sure.
>
That must be it. I will fix it. Thanks Nathan.
> Cheers,
> Nathan