2022-05-09 02:33:35

by Damien Le Moal

Subject: Re: [PATCH v3 10/11] null_blk: allow non power of 2 zoned devices

On 2022/05/06 17:11, Pankaj Raghav wrote:
> Convert the power of 2 based calculation with zone size to be generic in
> null_zone_no with optimization for power of 2 based zone sizes.
>
> The nr_zones calculation in null_init_zoned_dev has been replaced with a
> division without special handling for power of 2 based zone sizes as
> this function is called only during initialization and will not be
> invoked in the hot path.
>
> Performance Measurement:
>
> Device:
> zone size = 128M, blocksize=4k
>
> FIO cmd:
>
> fio --name=zbc --filename=/dev/nullb0 --direct=1 --zonemode=zbd --size=23G
> --io_size=<iosize> --ioengine=io_uring --iodepth=<iod> --rw=<mode> --bs=4k
> --loops=4
>
> The following results are an average of 4 runs on AMD Ryzen 5 5600X with
> 32GB of RAM:
>
> Sequential Write:
>
> x-----------------x---------------------------------x---------------------------------x
> | IOdepth         | 8                               | 16                              |
> x-----------------x---------------------------------x---------------------------------x
> |                 | KIOPS | BW(MiB/s) | Lat(usec)   | KIOPS | BW(MiB/s) | Lat(usec)   |
> x-----------------x---------------------------------x---------------------------------x
> | Without patch   | 578   | 2257      | 12.80       | 576   | 2248      | 25.78       |
> x-----------------x---------------------------------x---------------------------------x
> | With patch      | 581   | 2268      | 12.74       | 576   | 2248      | 25.85       |
> x-----------------x---------------------------------x---------------------------------x
>
> Sequential read:
>
> x-----------------x---------------------------------x---------------------------------x
> | IOdepth         | 8                               | 16                              |
> x-----------------x---------------------------------x---------------------------------x
> |                 | KIOPS | BW(MiB/s) | Lat(usec)   | KIOPS | BW(MiB/s) | Lat(usec)   |
> x-----------------x---------------------------------x---------------------------------x
> | Without patch   | 667   | 2605      | 11.79       | 675   | 2637      | 23.49       |
> x-----------------x---------------------------------x---------------------------------x
> | With patch      | 667   | 2605      | 11.79       | 675   | 2638      | 23.48       |
> x-----------------x---------------------------------x---------------------------------x
>
> Random read:
>
> x-----------------x---------------------------------x---------------------------------x
> | IOdepth         | 8                               | 16                              |
> x-----------------x---------------------------------x---------------------------------x
> |                 | KIOPS | BW(MiB/s) | Lat(usec)   | KIOPS | BW(MiB/s) | Lat(usec)   |
> x-----------------x---------------------------------x---------------------------------x
> | Without patch   | 522   | 2038      | 15.05       | 514   | 2006      | 30.87       |
> x-----------------x---------------------------------x---------------------------------x
> | With patch      | 522   | 2039      | 15.04       | 523   | 2042      | 30.33       |
> x-----------------x---------------------------------x---------------------------------x
>
> Minor variations are seen in sequential write with io depth 8 and
> in random read with io depth 16, but overall there is no noticeable
> difference.
>
> Reviewed-by: Luis Chamberlain <[email protected]>
> Reviewed-by: Adam Manzanares <[email protected]>
> Reviewed-by: Hannes Reinecke <[email protected]>
> Signed-off-by: Pankaj Raghav <[email protected]>
> ---
> drivers/block/null_blk/main.c | 5 ++---
> drivers/block/null_blk/zoned.c | 14 +++++++-------
> 2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 5cb4c92cd..ed9a58201 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1929,9 +1929,8 @@ static int null_validate_conf(struct nullb_device *dev)
> if (dev->queue_mode == NULL_Q_BIO)
> dev->mbps = 0;
>
> - if (dev->zoned &&
> - (!dev->zone_size || !is_power_of_2(dev->zone_size))) {
> - pr_err("zone_size must be power-of-two\n");
> + if (dev->zoned && !dev->zone_size) {
> + pr_err("zone_size must not be zero\n");

Maybe a simpler phrasing would be better:

pr_err("Invalid zero zone size\n");

> return -EINVAL;
> }
>
> diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
> index dae54dd1a..00c34e65e 100644
> --- a/drivers/block/null_blk/zoned.c
> +++ b/drivers/block/null_blk/zoned.c
> @@ -13,7 +13,10 @@ static inline sector_t mb_to_sects(unsigned long mb)
>
> static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
> {
> - return sect >> ilog2(dev->zone_size_sects);
> + if (is_power_of_2(dev->zone_size_sects))
> + return sect >> ilog2(dev->zone_size_sects);

As a separate patch, I think we should really have ilog2(dev->zone_size_sects)
as a dev field to avoid doing this ilog2 for every call.

> +
> + return div64_u64(sect, dev->zone_size_sects);
> }
>
> static inline void null_lock_zone_res(struct nullb_device *dev)
> @@ -62,10 +65,6 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
> sector_t sector = 0;
> unsigned int i;
>
> - if (!is_power_of_2(dev->zone_size)) {
> - pr_err("zone_size must be power-of-two\n");
> - return -EINVAL;
> - }
> if (dev->zone_size > dev->size) {
> pr_err("Zone size larger than device capacity\n");
> return -EINVAL;
> @@ -83,8 +82,9 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
> zone_capacity_sects = mb_to_sects(dev->zone_capacity);
> dev_capacity_sects = mb_to_sects(dev->size);
> dev->zone_size_sects = mb_to_sects(dev->zone_size);
> - dev->nr_zones = round_up(dev_capacity_sects, dev->zone_size_sects)
> - >> ilog2(dev->zone_size_sects);
> + dev->nr_zones =
> + div64_u64(roundup(dev_capacity_sects, dev->zone_size_sects),
> + dev->zone_size_sects);
>
> dev->zones = kvmalloc_array(dev->nr_zones, sizeof(struct nullb_zone),
> GFP_KERNEL | __GFP_ZERO);
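
For readers following the nr_zones hunk above, here is a minimal userspace
sketch of why the mask-based rounding has to give way to a division-based
one once the zone size may be a non power of 2. The helper names
(round_up_po2, round_up_any, nr_zones) are hypothetical stand-ins written
for this illustration; they only mimic the arithmetic of the kernel's
round_up() and roundup() macros and are not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-based rounding: only correct when align is a power of 2. */
static uint64_t round_up_po2(uint64_t x, uint64_t align)
{
	return ((x - 1) | (align - 1)) + 1;
}

/* Division-based rounding: correct for any non-zero align. */
static uint64_t round_up_any(uint64_t x, uint64_t align)
{
	assert(align != 0);
	return ((x + align - 1) / align) * align;
}

/* Number of zones needed to cover the capacity, last zone may be partial. */
static uint64_t nr_zones(uint64_t capacity_sects, uint64_t zone_size_sects)
{
	return round_up_any(capacity_sects, zone_size_sects) / zone_size_sects;
}
```

With a power-of-2 alignment both helpers agree. With an alignment of 300
sectors, the mask-based variant does not return a multiple of the zone size,
which is why it cannot be used to derive the zone count in that case.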


--
Damien Le Moal
Western Digital Research


2022-05-09 11:55:57

by Pankaj Raghav

Subject: Re: [PATCH v3 10/11] null_blk: allow non power of 2 zoned devices


>> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
>> index 5cb4c92cd..ed9a58201 100644
>> --- a/drivers/block/null_blk/main.c
>> +++ b/drivers/block/null_blk/main.c
>> @@ -1929,9 +1929,8 @@ static int null_validate_conf(struct nullb_device *dev)
>> if (dev->queue_mode == NULL_Q_BIO)
>> dev->mbps = 0;
>>
>> - if (dev->zoned &&
>> - (!dev->zone_size || !is_power_of_2(dev->zone_size))) {
>> - pr_err("zone_size must be power-of-two\n");
>> + if (dev->zoned && !dev->zone_size) {
>> + pr_err("zone_size must not be zero\n");
>
> May be a simpler phrasing would be better:
>
> pr_err("Invalid zero zone size\n");
>
Ack. I will change this in the next rev.
>> return -EINVAL;
>> }
>>
>> diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
>> index dae54dd1a..00c34e65e 100644
>> --- a/drivers/block/null_blk/zoned.c
>> +++ b/drivers/block/null_blk/zoned.c
>> @@ -13,7 +13,10 @@ static inline sector_t mb_to_sects(unsigned long mb)
>>
>> static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
>> {
>> - return sect >> ilog2(dev->zone_size_sects);
>> + if (is_power_of_2(dev->zone_size_sects))
>> + return sect >> ilog2(dev->zone_size_sects);
>
> As a separate patch, I think we should really have ilog2(dev->zone_size_sects)
> as a dev field to avoid doing this ilog2 for every call..
>
I don't think that is possible because `zone_size_sects` can also be non
po2.

2022-05-09 12:08:17

by Damien Le Moal

Subject: Re: [PATCH v3 10/11] null_blk: allow non power of 2 zoned devices

On 2022/05/09 20:06, Pankaj Raghav wrote:
>
>>> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
>>> index 5cb4c92cd..ed9a58201 100644
>>> --- a/drivers/block/null_blk/main.c
>>> +++ b/drivers/block/null_blk/main.c
>>> @@ -1929,9 +1929,8 @@ static int null_validate_conf(struct nullb_device *dev)
>>> if (dev->queue_mode == NULL_Q_BIO)
>>> dev->mbps = 0;
>>>
>>> - if (dev->zoned &&
>>> - (!dev->zone_size || !is_power_of_2(dev->zone_size))) {
>>> - pr_err("zone_size must be power-of-two\n");
>>> + if (dev->zoned && !dev->zone_size) {
>>> + pr_err("zone_size must not be zero\n");
>>
>> May be a simpler phrasing would be better:
>>
>> pr_err("Invalid zero zone size\n");
>>
> Ack. I will change this in the next rev.
>>> return -EINVAL;
>>> }
>>>
>>> diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
>>> index dae54dd1a..00c34e65e 100644
>>> --- a/drivers/block/null_blk/zoned.c
>>> +++ b/drivers/block/null_blk/zoned.c
>>> @@ -13,7 +13,10 @@ static inline sector_t mb_to_sects(unsigned long mb)
>>>
>>> static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
>>> {
>>> - return sect >> ilog2(dev->zone_size_sects);
>>> + if (is_power_of_2(dev->zone_size_sects))
>>> + return sect >> ilog2(dev->zone_size_sects);
>>
>> As a separate patch, I think we should really have ilog2(dev->zone_size_sects)
>> as a dev field to avoid doing this ilog2 for every call..
>>
> I don't think that is possible because `zone_size_sects` can also be non
> po2.

But when it is, we can optimize that. All we need is to add a
"zone_size_sect_shift" field that is initialized when zone_size_sects is set
at device creation. Then, you can have code like:

if (dev->zone_size_sect_shift)
return sect >> dev->zone_size_sect_shift;

Which avoids both is_power_of_2() and ilog2() calls for every IO.
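
A minimal userspace sketch of that idea, under stated assumptions: struct
zdev, zdev_set_zone_size() and zdev_zone_no() are hypothetical names that
stand in for the relevant parts of struct nullb_device, the device setup
path and null_zone_no(). A shift of 0 is used here to mean "no shift
available", so a non power of 2 zone size (and the degenerate 1-sector zone)
falls back to the division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the zone size fields of the device. */
struct zdev {
	uint64_t zone_size_sects;          /* zone size in sectors */
	unsigned int zone_size_sect_shift; /* log2 of the zone size if po2, else 0 */
};

/* Set the zone size once and cache the shift for power-of-2 sizes. */
static void zdev_set_zone_size(struct zdev *dev, uint64_t zone_size_sects)
{
	assert(zone_size_sects != 0);
	dev->zone_size_sects = zone_size_sects;
	dev->zone_size_sect_shift = 0;
	if ((zone_size_sects & (zone_size_sects - 1)) == 0) {
		/* Power of 2: compute log2 here, not on every I/O. */
		while ((zone_size_sects >>= 1) != 0)
			dev->zone_size_sect_shift++;
	}
}

/* Hot path: one test, then either a shift or a 64-bit division. */
static unsigned int zdev_zone_no(const struct zdev *dev, uint64_t sect)
{
	if (dev->zone_size_sect_shift)
		return (unsigned int)(sect >> dev->zone_size_sect_shift);
	return (unsigned int)(sect / dev->zone_size_sects);
}
```

For a 128M zone (262144 sectors of 512 bytes) the cached shift is 18, while
a 96M zone (196608 sectors) leaves the shift at 0 and takes the division
path.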

--
Damien Le Moal
Western Digital Research

2022-05-09 12:24:41

by Pankaj Raghav

Subject: Re: [PATCH v3 10/11] null_blk: allow non power of 2 zoned devices


On 2022-05-09 13:31, Damien Le Moal wrote:
>>>> diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
>>>> index dae54dd1a..00c34e65e 100644
>>>> --- a/drivers/block/null_blk/zoned.c
>>>> +++ b/drivers/block/null_blk/zoned.c
>>>> @@ -13,7 +13,10 @@ static inline sector_t mb_to_sects(unsigned long mb)
>>>>
>>>> static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
>>>> {
>>>> - return sect >> ilog2(dev->zone_size_sects);
>>>> + if (is_power_of_2(dev->zone_size_sects))
>>>> + return sect >> ilog2(dev->zone_size_sects);
>>>
>>> As a separate patch, I think we should really have ilog2(dev->zone_size_sects)
>>> as a dev field to avoid doing this ilog2 for every call..
>>>
>> I don't think that is possible because `zone_size_sects` can also be non
>> po2.
>
> But when it is we can optimize that. All we need is add a "zone_size_sect_shift"
> field that is initialized when zone_size_sects is set when the device is
> created. Then, you can have code like:
>
> if (dev->zone_size_sect_shift))
> return sect >> dev->zone_size_sect_shift;
>
My only concern was confusing people reading the code, who might
implicitly assume that the zone size can only be po2 because we have
zone_size_sect_shift.

Even though I am not sure this optimization will directly add value,
judging by my experiments with the current change, I can fold it in
with a comment on top of the zone_size_sect_shift variable stating that
the size can be npo2 and that this variable is only meaningful for the
po2 size scenario.

2022-05-13 07:34:48

by Bart Van Assche

Subject: Re: [PATCH v3 10/11] null_blk: allow non power of 2 zoned devices

On 5/9/22 04:56, Pankaj Raghav wrote:
> Even though I am not sure if this optimization will directly add value
> looking at my experiments with the current change, I can fold this in
> with a comment on top of zone_size_sect_shifts variable stating that
> size can be npo2 and this variable is only meaningful for the po2 size
> scenario.

Have these experiments perhaps been run on an x86_64 CPU? These CPUs
only need a single instruction to calculate ilog2(). No equivalent of
that instruction is available on ARM CPUs as far as I know. I think the
optimization Damien proposed will help on ARM CPUs.

Thanks,

Bart.