- Background and Motivation:
The zone storage implementation in Linux, introduced since v4.10, first
targetted SMR drives which have a power of 2 (po2) zone size alignment
requirement. The po2 zone size was further imposed implicitly by the
block layer's blk_queue_chunk_sectors(), used to prevent IO merging
across chunks beyond the specified size, since v3.16 through commit
762380ad9322 ("block: add notion of a chunk size for request merging").
But this same general block layer po2 requirement for blk_queue_chunk_sectors()
was removed on v5.10 through commit 07d098e6bbad ("block: allow 'chunk_sectors'
to be non-power-of-2").
NAND, which is the media used in newer zoned storage devices, does not
naturally align to po2. In these devices, zone capacity(cap) is not the
same as the po2 zone size. When the zone cap != zone size, then unmapped
LBAs are introduced to cover the space between the zone cap and zone size.
po2 requirement does not make sense for these type of zone storage devices.
This patch series aims to remove these unmapped LBAs for zoned devices when
zone cap is npo2. This is done by relaxing the po2 zone size constraint
in the kernel and allowing zoned device with npo2 zone sizes if zone cap
== zone size.
Removing the po2 requirement from zone storage should be possible
now provided that no userspace regression and no performance regressions are
introduced. Stop-gap patches have been already merged into f2fs-tools to
proactively not allow npo2 zone sizes until proper support is added [1].
There were two efforts previously to add support to npo2 devices: 1) via
device level emulation [2] but that was rejected with a final conclusion
to add support for non po2 zoned device in the complete stack[3] 2)
adding support to the complete stack by removing the constraint in the
block layer and NVMe layer with support to btrfs, zonefs, etc which was
rejected with a conclusion to add a dm target for FS support [0]
to reduce the regression impact.
This series adds support to npo2 zoned devices in the block and nvme
layer and a new **dm target** is added: dm-po2z-target. This new
target will be initially used for filesystems such as btrfs and
f2fs until native npo2 zone support is added.
- Patchset description:
Patches 1-3 deals with removing the po2 constraint from the
block layer.
Patches 4-5 deals with removing the constraint from nvme zns.
Patch 5 removes the po2 contraint in null blk
Patch 6 adds npo2 support to zonefs
Patches 7-13 adds support for npo2 zoned devices in the DM layer and
adds a new target dm-po2z-target which converts a zoned device with npo2
zone size into a zoned target with po2 zone size.
The patch series is based on linux-next tag: next-20220809
Testing:
The new target was tested with blktest and zonefs test suite in qemu and
on a real ZNS device with npo2 zone size.
Performance Measurement on a null blk:
Device:
zone size = 128M, blocksize=4k
FIO cmd:
fio --name=zbc --filename=/dev/nullb0 --direct=1 --zonemode=zbd --size=23G
--io_size=<iosize> --ioengine=io_uring --iodepth=<iod> --rw=<mode> --bs=4k
--loops=4
The following results are an average of 4 runs on AMD Ryzen 5 5600X with
32GB of RAM:
Sequential Write:
x-----------------x---------------------------------x---------------------------------x
| IOdepth | 8 | 16 |
x-----------------x---------------------------------x---------------------------------x
| | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
x-----------------x---------------------------------x---------------------------------x
| Without patch | 578 | 2257 | 12.80 | 576 | 2248 | 25.78 |
x-----------------x---------------------------------x---------------------------------x
| With patch | 581 | 2268 | 12.74 | 576 | 2248 | 25.85 |
x-----------------x---------------------------------x---------------------------------x
Sequential read:
x-----------------x---------------------------------x---------------------------------x
| IOdepth | 8 | 16 |
x-----------------x---------------------------------x---------------------------------x
| | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
x-----------------x---------------------------------x---------------------------------x
| Without patch | 667 | 2605 | 11.79 | 675 | 2637 | 23.49 |
x-----------------x---------------------------------x---------------------------------x
| With patch | 667 | 2605 | 11.79 | 675 | 2638 | 23.48 |
x-----------------x---------------------------------x---------------------------------x
Random read:
x-----------------x---------------------------------x---------------------------------x
| IOdepth | 8 | 16 |
x-----------------x---------------------------------x---------------------------------x
| | KIOPS |BW(MiB/s) | Lat(usec) | KIOPS |BW(MiB/s) | Lat(usec) |
x-----------------x---------------------------------x---------------------------------x
| Without patch | 522 | 2038 | 15.05 | 514 | 2006 | 30.87 |
x-----------------x---------------------------------x---------------------------------x
| With patch | 522 | 2039 | 15.04 | 523 | 2042 | 30.33 |
x-----------------x---------------------------------x---------------------------------x
Minor variations are noticed in Sequential write with io depth 8 and
in random read with io depth 16. But overall no noticeable differences
were noticed
[0] https://lore.kernel.org/lkml/PH0PR04MB74166C87F694B150A5AE0F009BD09@PH0PR04MB7416.namprd04.prod.outlook.com/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git/commit/?h=dev-test&id=6afcf6493578e77528abe65ab8b12f3e1c16749f
[2] https://lore.kernel.org/all/[email protected]/T/
[3] https://lore.kernel.org/all/20220315135245.eqf4tqngxxb7ymqa@unifi/
Changes since v1:
- Put the function declaration and its usage in the same commit (Bart)
- Remove bdev_zone_aligned function (Bart)
- Change the name from blk_queue_zone_aligned to blk_queue_is_zone_start
(Damien)
- q is never null in from bdev_get_queue (Damien)
- Add condition during bringup and check for zsze == zcap for npo2
drives (Damien)
- Rounddown operation should be made generic to work in 32 bits arch
(bart)
- Add comments where generic calculation is directly used instead having
special handling for po2 zone sizes (Hannes)
- Make the minimum zone size alignment requirement for btrfs to be 1M
instead of BTRFS_STRIPE_LEN(David)
Changes since v2:
- Minor formatting changes
Changes since v3:
- Make superblock mirror align with the existing superblock log offsets
(David)
- DM change return value and remove extra newline
- Optimize null blk zone index lookup with shift for po2 zone size
Changes since v4:
- Remove direct filesystems support for npo2 devices (Johannes, Hannes,
Damien)
Changes since v5:
- Use DIV_ROUND_UP* helper instead of round_up as it breaks 32bit arch
build in null blk(kernel-test-robot, Nathan)
- Use DIV_ROUND_UP_SECTOR_T also in blkdev_nr_zones function instead of
open coding it with div64_u64
- Added extra condition in dm-zoned and in dm to reject non power of 2
zone sizes.
Changes since v6:
- Added a new dm target for non power of 2 devices
- Added support for non power of 2 devices in the DM layer.
Changes since v7:
- Improved dm target for non power of 2 zoned devices with some bug
fixes and rearrangement
- Removed some unnecessary comments.
Changes since v8:
- Rename dm-po2z to dm-po2zone
- set max_io_len for the target to po2 zone size sector
- Simplify dm-po2zone target by removing some superfluous conditions
- Added documentation for the new dm-po2zone target
- Change pr_warn to pr_err for critical errors
- Split patch 2 and 11 with their corresponding prep patches
- Minor spelling and grammatical improvements
Changes since v9:
- Add a check for a zoned device in dm-po2zone ctr.
- Rephrased some commit messages and documentation for clarity
Luis Chamberlain (1):
dm-zoned: ensure only power of 2 zone sizes are allowed
Pankaj Raghav (12):
block: make bdev_nr_zones and disk_zone_no generic for npo2 zone size
block:rearrange bdev_{is_zoned,zone_sectors,get_queue} helpers in
blkdev.h
block: allow blk-zoned devices to have non-power-of-2 zone size
nvmet: Allow ZNS target to support non-power_of_2 zone sizes
nvme: zns: Allow ZNS drives that have non-power_of_2 zone size
null_blk: allow zoned devices with non power-of-2 zone sizes
zonefs: allow non power of 2 zoned devices
dm-zone: use generic helpers to calculate offset from zone start
dm-table: allow zoned devices with non power-of-2 zone sizes
dm: call dm_zone_endio after the target endio callback for zoned
devices
dm: introduce DM_EMULATED_ZONES target type
dm: add power-of-2 target for zoned devices with non power-of-2 zone
sizes
.../admin-guide/device-mapper/dm-po2zone.rst | 71 +++++
.../admin-guide/device-mapper/index.rst | 1 +
block/blk-core.c | 2 +-
block/blk-zoned.c | 37 ++-
drivers/block/null_blk/main.c | 5 +-
drivers/block/null_blk/null_blk.h | 1 +
drivers/block/null_blk/zoned.c | 18 +-
drivers/md/Kconfig | 10 +
drivers/md/Makefile | 2 +
drivers/md/dm-po2zone-target.c | 245 ++++++++++++++++++
drivers/md/dm-table.c | 20 +-
drivers/md/dm-zone.c | 8 +-
drivers/md/dm-zoned-target.c | 8 +
drivers/md/dm.c | 8 +-
drivers/nvme/host/zns.c | 16 +-
drivers/nvme/target/zns.c | 3 +-
fs/zonefs/super.c | 6 +-
fs/zonefs/zonefs.h | 1 -
include/linux/blkdev.h | 80 ++++--
include/linux/device-mapper.h | 9 +
20 files changed, 476 insertions(+), 75 deletions(-)
create mode 100644 Documentation/admin-guide/device-mapper/dm-po2zone.rst
create mode 100644 drivers/md/dm-po2zone-target.c
--
2.25.1
From: Luis Chamberlain <[email protected]>
dm-zoned relies on the assumption that the zone size is a
power-of-2(po2) and the zone capacity is same as the zone size.
Ensure only po2 devices can be used as dm-zoned target until a native
support for zoned devices with non-po2 zone size is added.
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/md/dm-zoned-target.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 95b132b52f33..9325bf5dee81 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -792,6 +792,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
return -EINVAL;
}
zone_nr_sectors = bdev_zone_sectors(bdev);
+ if (!is_power_of_2(zone_nr_sectors)) {
+ ti->error = "Zone size is not a power-of-2 number of sectors";
+ return -EINVAL;
+ }
zoned_dev->zone_nr_sectors = zone_nr_sectors;
zoned_dev->nr_zones = bdev_nr_zones(bdev);
}
@@ -804,6 +808,10 @@ static int dmz_fixup_devices(struct dm_target *ti)
return -EINVAL;
}
zoned_dev->zone_nr_sectors = bdev_zone_sectors(bdev);
+ if (!is_power_of_2(zoned_dev->zone_nr_sectors)) {
+ ti->error = "Zone size is not a power-of-2 number of sectors";
+ return -EINVAL;
+ }
zoned_dev->nr_zones = bdev_nr_zones(bdev);
}
--
2.25.1
Remove the condition which disallows non-power_of_2 zone size ZNS drive
to be updated and use generic method to calculate number of zones
instead of relying on log and shift based calculation on zone size.
The power_of_2 calculation has been replaced directly with generic
calculation without special handling. Both modified functions are not
used in hot paths, they are only used during initialization &
revalidation of the ZNS device.
As rounddown macro from math.h does not work for 32 bit architectures,
round down operation is open coded.
Reviewed-by: Luis Chamberlain <[email protected]>
Reviewed by: Adam Manzanares <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
drivers/nvme/host/zns.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index 12316ab51bda..73e4ad495ae8 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
}
ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
- if (!is_power_of_2(ns->zsze)) {
- dev_warn(ns->ctrl->device,
- "invalid zone size:%llu for namespace:%u\n",
- ns->zsze, ns->head->ns_id);
- status = -ENODEV;
- goto free_data;
- }
disk_set_zoned(ns->disk, BLK_ZONED_HM);
blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
@@ -129,7 +122,7 @@ static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns,
sizeof(struct nvme_zone_descriptor);
nr_zones = min_t(unsigned int, nr_zones,
- get_capacity(ns->disk) >> ilog2(ns->zsze));
+ div64_u64(get_capacity(ns->disk), ns->zsze));
bufsize = sizeof(struct nvme_zone_report) +
nr_zones * sizeof(struct nvme_zone_descriptor);
@@ -182,6 +175,7 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
int ret, zone_idx = 0;
unsigned int nz, i;
size_t buflen;
+ u64 remainder = 0;
if (ns->head->ids.csi != NVME_CSI_ZNS)
return -EINVAL;
@@ -197,7 +191,11 @@ int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
c.zmr.zrasf = NVME_ZRASF_ZONE_REPORT_ALL;
c.zmr.pr = NVME_REPORT_ZONE_PARTIAL;
- sector &= ~(ns->zsze - 1);
+ /*
+ * Round down the sector value to the nearest zone start
+ */
+ div64_u64_rem(sector, ns->zsze, &remainder);
+ sector -= remainder;
while (zone_idx < nr_zones && sector < get_capacity(ns->disk)) {
memset(report, 0, buflen);
--
2.25.1
Introduce a new target type DM_EMULATED_ZONES for targets with
a different number of sectors per zone (aka zone size) than the underlying
device zone size.
This target type is introduced as the existing zoned targets assume
that the target and the underlying device have the same zone size.
The new target: dm-po2zone will use this new target
type as it emulates the zone boundary that is different from the
underlying zoned device.
Signed-off-by: Pankaj Raghav <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
---
drivers/md/dm-table.c | 13 ++++++++++---
include/linux/device-mapper.h | 9 +++++++++
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 31eb1d29d136..b37991ea3ffb 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1614,13 +1614,20 @@ static bool dm_table_supports_zoned_model(struct dm_table *t,
return true;
}
-static int device_not_matches_zone_sectors(struct dm_target *ti, struct dm_dev *dev,
+/*
+ * Callback function to check for device zone sector across devices. If the
+ * DM_TARGET_EMULATED_ZONES target feature flag is not set, then the target
+ * should have the same zone sector as the underlying devices.
+ */
+static int check_valid_device_zone_sectors(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
unsigned int *zone_sectors = data;
- if (!bdev_is_zoned(dev->bdev))
+ if (!bdev_is_zoned(dev->bdev) ||
+ dm_target_supports_emulated_zones(ti->type))
return 0;
+
return bdev_zone_sectors(dev->bdev) != *zone_sectors;
}
@@ -1645,7 +1652,7 @@ static int validate_hardware_zoned_model(struct dm_table *t,
if (!zone_sectors)
return -EINVAL;
- if (dm_table_any_dev_attr(t, device_not_matches_zone_sectors, &zone_sectors)) {
+ if (dm_table_any_dev_attr(t, check_valid_device_zone_sectors, &zone_sectors)) {
DMERR("%s: zone sectors is not consistent across all zoned devices",
dm_device_name(t->md));
return -EINVAL;
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 04c6acf7faaa..83e20de264c9 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -294,6 +294,15 @@ struct target_type {
#define dm_target_supports_mixed_zoned_model(type) (false)
#endif
+#ifdef CONFIG_BLK_DEV_ZONED
+#define DM_TARGET_EMULATED_ZONES 0x00000400
+#define dm_target_supports_emulated_zones(type) \
+ ((type)->features & DM_TARGET_EMULATED_ZONES)
+#else
+#define DM_TARGET_EMULATED_ZONES 0x00000000
+#define dm_target_supports_emulated_zones(type) (false)
+#endif
+
struct dm_target {
struct dm_table *table;
struct target_type *type;
--
2.25.1
Define bdev_is_zoned(), bdev_zone_sectors() and bdev_get_queue() earlier
in the blkdev.h include file. Simplify bdev_is_zoned() by removing the
superfluous NULL check for request queue while we are at it.
This commit has no functional change, and it is a prep patch for allowing
zoned devices with non-power-of-2 zone sizes in the block layer.
Signed-off-by: Pankaj Raghav <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
---
include/linux/blkdev.h | 43 +++++++++++++++++++-----------------------
1 file changed, 19 insertions(+), 24 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ab82d1ff0cce..84e7881262e3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -635,6 +635,11 @@ static inline bool queue_is_mq(struct request_queue *q)
return q->mq_ops;
}
+static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
+{
+ return bdev->bd_queue; /* this is never NULL */
+}
+
#ifdef CONFIG_PM
static inline enum rpm_status queue_rpm_status(struct request_queue *q)
{
@@ -666,6 +671,20 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
}
}
+static inline bool bdev_is_zoned(struct block_device *bdev)
+{
+ return blk_queue_is_zoned(bdev_get_queue(bdev));
+}
+
+static inline sector_t bdev_zone_sectors(struct block_device *bdev)
+{
+ struct request_queue *q = bdev_get_queue(bdev);
+
+ if (!blk_queue_is_zoned(q))
+ return 0;
+ return q->limits.chunk_sectors;
+}
+
#ifdef CONFIG_BLK_DEV_ZONED
static inline unsigned int disk_nr_zones(struct gendisk *disk)
{
@@ -892,11 +911,6 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags);
int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
unsigned int flags);
-static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
-{
- return bdev->bd_queue; /* this is never NULL */
-}
-
/* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */
const char *blk_zone_cond_str(enum blk_zone_cond zone_cond);
@@ -1296,25 +1310,6 @@ static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
return BLK_ZONED_NONE;
}
-static inline bool bdev_is_zoned(struct block_device *bdev)
-{
- struct request_queue *q = bdev_get_queue(bdev);
-
- if (q)
- return blk_queue_is_zoned(q);
-
- return false;
-}
-
-static inline sector_t bdev_zone_sectors(struct block_device *bdev)
-{
- struct request_queue *q = bdev_get_queue(bdev);
-
- if (!blk_queue_is_zoned(q))
- return 0;
- return q->limits.chunk_sectors;
-}
-
static inline int queue_dma_alignment(const struct request_queue *q)
{
return q ? q->dma_alignment : 511;
--
2.25.1
On 2022/08/11 7:30, Pankaj Raghav wrote:
> Define bdev_is_zoned(), bdev_zone_sectors() and bdev_get_queue() earlier
> in the blkdev.h include file. Simplify bdev_is_zoned() by removing the
> superfluous NULL check for request queue while we are at it.
>
> This commit has no functional change, and it is a prep patch for allowing
> zoned devices with non-power-of-2 zone sizes in the block layer.
>
> Signed-off-by: Pankaj Raghav <[email protected]>
> Reviewed-by: Bart Van Assche <[email protected]>
> ---
> include/linux/blkdev.h | 43 +++++++++++++++++++-----------------------
> 1 file changed, 19 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index ab82d1ff0cce..84e7881262e3 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -635,6 +635,11 @@ static inline bool queue_is_mq(struct request_queue *q)
> return q->mq_ops;
> }
>
> +static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
> +{
> + return bdev->bd_queue; /* this is never NULL */
> +}
> +
> #ifdef CONFIG_PM
> static inline enum rpm_status queue_rpm_status(struct request_queue *q)
> {
> @@ -666,6 +671,20 @@ static inline bool blk_queue_is_zoned(struct request_queue *q)
> }
> }
>
> +static inline bool bdev_is_zoned(struct block_device *bdev)
> +{
> + return blk_queue_is_zoned(bdev_get_queue(bdev));
> +}
You changed this too, so drop the current reviewed-by tag please.
For the next round, feel free to add:
Reviewed-by: Damien Le Moal <[email protected]>
> +
> +static inline sector_t bdev_zone_sectors(struct block_device *bdev)
> +{
> + struct request_queue *q = bdev_get_queue(bdev);
> +
> + if (!blk_queue_is_zoned(q))
> + return 0;
> + return q->limits.chunk_sectors;
> +}
> +
> #ifdef CONFIG_BLK_DEV_ZONED
> static inline unsigned int disk_nr_zones(struct gendisk *disk)
> {
> @@ -892,11 +911,6 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags);
> int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
> unsigned int flags);
>
> -static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
> -{
> - return bdev->bd_queue; /* this is never NULL */
> -}
> -
> /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */
> const char *blk_zone_cond_str(enum blk_zone_cond zone_cond);
>
> @@ -1296,25 +1310,6 @@ static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev)
> return BLK_ZONED_NONE;
> }
>
> -static inline bool bdev_is_zoned(struct block_device *bdev)
> -{
> - struct request_queue *q = bdev_get_queue(bdev);
> -
> - if (q)
> - return blk_queue_is_zoned(q);
> -
> - return false;
> -}
> -
> -static inline sector_t bdev_zone_sectors(struct block_device *bdev)
> -{
> - struct request_queue *q = bdev_get_queue(bdev);
> -
> - if (!blk_queue_is_zoned(q))
> - return 0;
> - return q->limits.chunk_sectors;
> -}
> -
> static inline int queue_dma_alignment(const struct request_queue *q)
> {
> return q ? q->dma_alignment : 511;
--
Damien Le Moal
Western Digital Research
On Thu, Aug 11, 2022 at 04:30:35PM +0200, Pankaj Raghav wrote:
> @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
> }
>
> ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
> - if (!is_power_of_2(ns->zsze)) {
> - dev_warn(ns->ctrl->device,
> - "invalid zone size:%llu for namespace:%u\n",
> - ns->zsze, ns->head->ns_id);
> - status = -ENODEV;
> - goto free_data;
> - }
We used to bail out early if the format wasn't supported, which wouldn't create
the namespace.
Now you are relying on blk_revalidate_disk_zones() to report an error for
invalid format, but the handling for an error there is different: the namespace
still gets created. I'm not sure if that's a problem, but it's different.
Hi Keith,
On 2022-08-16 23:14, Keith Busch wrote:
> On Thu, Aug 11, 2022 at 04:30:35PM +0200, Pankaj Raghav wrote:
>> @@ -101,13 +101,6 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
>> }
>>
>> ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
>> - if (!is_power_of_2(ns->zsze)) {
>> - dev_warn(ns->ctrl->device,
>> - "invalid zone size:%llu for namespace:%u\n",
>> - ns->zsze, ns->head->ns_id);
>> - status = -ENODEV;
>> - goto free_data;
>> - }
>
> We used to bail out early if the format wasn't supported, which wouldn't create
> the namespace.
>
> Now you are relying on blk_revalidate_disk_zones() to report an error for
> invalid format, but the handling for an error there is different: the namespace
> still gets created. I'm not sure if that's a problem, but it's different.
That is right but it is not a problem. Once the block layer supports the
non-po2 zone sizes, we don't need to bail out here because it will be
compatible. If unequal zone sizes are found, block layer will report an
error during blk_revalidate_disk_zones() which is the current behavior anyway.
Other checks such as zone operation support are still in
nvme_update_zone_info() resulting in an early bail out if violated.