2022-10-26 15:38:24

by Dawei Li

Subject: [PATCH] block: simplify blksize_bits() implementation

Convert current looping-based implementation into bit operation,
which can bring improvement for:

1) bitops is more efficient for its arch-level optimization.

2) Given that blksize_bits() is inline, _if_ @size is compile-time
constant, it's possible that order_base_2() _may_ make output
compile-time evaluated, depending on code context and compiler behavior.

Signed-off-by: Dawei Li <[email protected]>
---
include/linux/blkdev.h | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50e358a19d98..117061c8b9a1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1349,12 +1349,7 @@ static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr,
 /* assumes size > 256 */
 static inline unsigned int blksize_bits(unsigned int size)
 {
-	unsigned int bits = 8;
-	do {
-		bits++;
-		size >>= 1;
-	} while (size > 256);
-	return bits;
+	return size > 512 ? order_base_2(size) : 9;
 }
 
 static inline unsigned int block_size(struct block_device *bdev)
--
2.25.1
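
For readers who want to check the change outside the kernel tree, here is a minimal userspace sketch of the two versions in the hunk above. order_base_2_sketch() is a hypothetical stand-in, not the kernel macro: it assumes order_base_2(n) is the smallest exponent whose power of two is >= n, and 0 for n <= 1. The helper name check_patched_matches_loop() is likewise made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-in for the kernel's order_base_2():
 * smallest bits such that (1u << bits) >= n, and 0 for n <= 1.
 */
static inline unsigned int order_base_2_sketch(unsigned int n)
{
	unsigned int bits = 0;

	while (bits < 32 && (1u << bits) < n)
		bits++;
	return bits;
}

/* The loop removed by the patch. */
static inline unsigned int blksize_bits_loop(unsigned int size)
{
	unsigned int bits = 8;

	do {
		bits++;
		size >>= 1;
	} while (size > 256);
	return bits;
}

/* The replacement added by the patch, using the stand-in above. */
static inline unsigned int blksize_bits_patched(unsigned int size)
{
	return size > 512 ? order_base_2_sketch(size) : 9;
}

/* Both versions should agree for power-of-two sizes from 512 B to 1 GiB. */
static inline void check_patched_matches_loop(void)
{
	unsigned int size;

	for (size = 512; size <= (1u << 30); size <<= 1)
		assert(blksize_bits_loop(size) == blksize_bits_patched(size));
}
```

Under this stand-in the two versions differ for sizes that are not powers of two: 513 gives 9 from the loop and 10 from the ternary form. That should only matter if a caller can pass a non-power-of-two size.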



2022-10-26 16:52:13

by Keith Busch

Subject: Re: [PATCH] block: simplify blksize_bits() implementation

On Wed, Oct 26, 2022 at 09:29:21AM -0700, Bart Van Assche wrote:
> On 10/26/22 08:14, Dawei Li wrote:
> > Convert current looping-based implementation into bit operation,
> > which can bring improvement for:
> >
> > 1) bitops is more efficient for its arch-level optimization.
>
> As far as I know blksize_bits() is not used in the hot path so performance
> of this function is not critical.

blksize_bits() is used on every IO going through iomap_dio_bio_iter(),
though the usage there is completely unnecessary and can be removed.

2022-10-26 17:13:50

by Bart Van Assche

Subject: Re: [PATCH] block: simplify blksize_bits() implementation

On 10/26/22 08:14, Dawei Li wrote:
> Convert current looping-based implementation into bit operation,
> which can bring improvement for:
>
> 1) bitops is more efficient for its arch-level optimization.

As far as I know blksize_bits() is not used in the hot path so
performance of this function is not critical.

> 2) Given that blksize_bits() is inline, _if_ @size is compile-time
> constant, it's possible that order_base_2() _may_ make output
> compile-time evaluated, depending on code context and compiler behavior.
>
> Signed-off-by: Dawei Li <[email protected]>
> ---
> include/linux/blkdev.h | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 50e358a19d98..117061c8b9a1 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1349,12 +1349,7 @@ static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr,
>  /* assumes size > 256 */
>  static inline unsigned int blksize_bits(unsigned int size)
>  {
> -	unsigned int bits = 8;
> -	do {
> -		bits++;
> -		size >>= 1;
> -	} while (size > 256);
> -	return bits;
> +	return size > 512 ? order_base_2(size) : 9;
>  }

How about optimizing this function even further by eliminating the
ternary operator, e.g. as follows (untested)?

return order_base_2(size >> SECTOR_SHIFT) + SECTOR_SHIFT;

Thanks,

Bart.
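
The shift-based one-liner suggested above is marked untested, so a small userspace sketch may help when evaluating it. The assumptions: SECTOR_SHIFT is 9, and order_base_2_sketch() is a hypothetical stand-in for the kernel macro that returns the smallest exponent whose power of two is >= n, and 0 for n <= 1. blksize_bits_shift() and check_shift_variant() are illustrative names, not kernel symbols.

```c
#include <assert.h>

#define SECTOR_SHIFT 9	/* assumed: 512-byte sectors */

/*
 * Hypothetical userspace stand-in for the kernel's order_base_2():
 * smallest bits such that (1u << bits) >= n, and 0 for n <= 1.
 */
static inline unsigned int order_base_2_sketch(unsigned int n)
{
	unsigned int bits = 0;

	while (bits < 32 && (1u << bits) < n)
		bits++;
	return bits;
}

/* The ternary-free variant proposed in this reply. */
static inline unsigned int blksize_bits_shift(unsigned int size)
{
	return order_base_2_sketch(size >> SECTOR_SHIFT) + SECTOR_SHIFT;
}

/* Spot checks against the expected bit counts for common block sizes. */
static inline void check_shift_variant(void)
{
	assert(blksize_bits_shift(512) == 9);
	assert(blksize_bits_shift(1024) == 10);
	assert(blksize_bits_shift(4096) == 12);
	assert(blksize_bits_shift(65536) == 16);
	/* size < 512 shifts down to 0; the stand-in maps 0 to 0, giving 9 */
	assert(blksize_bits_shift(256) == 9);
}
```

In this sketch a size below 512 still yields 9, because the shifted value is 0 and the stand-in returns 0 for it. Whether the real macro behaves the same way for a zero argument should be confirmed against include/linux/log2.h before relying on it.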