2021-10-18 17:41:40

by Jens Axboe

Subject: Re: don't use ->bd_inode to access the block device size v3

On 10/18/21 11:18 AM, Christoph Hellwig wrote:
> On Mon, Oct 18, 2021 at 11:16:08AM -0600, Jens Axboe wrote:
>> This looks good to me. Followup question, as it's related - I've got a
>> hacky patch that caches the inode size in the bdev:
>>
>> https://git.kernel.dk/cgit/linux-block/commit/?h=perf-wip&id=c754951eb7193258c35a574bd1ccccb7c4946ee4
>>
>> so we don't have to dip into the inode itself for the fast path. While
>> it's obviously not something being proposed for inclusion right now, is
>> there a world in which we can make something like that work?
>
> There's just two places that update i_size for block devices:
> set_capacity and bdev_set_nr_sectors. So you just need to update
> bd_nr_sectors there and you're done.

This on top of your patches should do the trick, then.


commit eebb7c5048163985fb21d6cb740ebac78cb46051
Author: Jens Axboe <[email protected]>
Date: Mon Oct 18 11:39:45 2021 -0600

block: cache inode size in bdev

Reading the inode size brings in a new cacheline for IO submit, and
it's in the hot path being checked for every single IO. When doing
millions of IOs per core per second, this is noticeable overhead.

Cache the nr_sectors in the bdev itself.

Signed-off-by: Jens Axboe <[email protected]>

diff --git a/block/genhd.c b/block/genhd.c
index 759bc06810f8..53495e3391e3 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -58,6 +58,7 @@ void set_capacity(struct gendisk *disk, sector_t sectors)

 	spin_lock(&bdev->bd_size_lock);
 	i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
+	bdev->bd_nr_sectors = sectors;
 	spin_unlock(&bdev->bd_size_lock);
 }
 EXPORT_SYMBOL(set_capacity);
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 9dbddc355b40..66ef9bc6d6a1 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -91,6 +91,7 @@ static void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors)
 {
 	spin_lock(&bdev->bd_size_lock);
 	i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
+	bdev->bd_nr_sectors = sectors;
 	spin_unlock(&bdev->bd_size_lock);
 }

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 472e55e0e94f..fe065c394fff 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -39,6 +39,7 @@ struct bio_crypt_ctx;

 struct block_device {
 	sector_t bd_start_sect;
+	sector_t bd_nr_sectors;
 	struct disk_stats __percpu *bd_stats;
 	unsigned long bd_stamp;
 	bool bd_read_only; /* read-only policy */
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 7b0326661a1e..001f617f82da 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -238,7 +238,7 @@ static inline sector_t get_start_sect(struct block_device *bdev)

 static inline loff_t bdev_nr_bytes(struct block_device *bdev)
 {
-	return i_size_read(bdev->bd_inode);
+	return bdev->bd_nr_sectors;
 }

static inline sector_t bdev_nr_sectors(struct block_device *bdev)

--
Jens Axboe


2021-10-18 17:49:43

by Christoph Hellwig

Subject: Re: don't use ->bd_inode to access the block device size v3

On Mon, Oct 18, 2021 at 11:40:51AM -0600, Jens Axboe wrote:
> static inline loff_t bdev_nr_bytes(struct block_device *bdev)
> {
> - return i_size_read(bdev->bd_inode);
> + return bdev->bd_nr_sectors;

This hunk needs to go into bdev_nr_sectors, and the bdev_nr_bytes
probably wants to call bdev_nr_sectors and do the shifting.
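
For illustration only, here is a minimal userspace sketch of the arrangement suggested above. It is not the kernel code and not the patch that was eventually applied. The typedef for sector_t, the use of int64_t in place of loff_t, the trimmed-down struct block_device, and the helper model_set_nr_sectors() are all assumptions made so the example compiles outside the kernel; locking via bd_size_lock and the i_size_write() update are deliberately left out.

```c
#include <stdint.h>

/* Stand-ins for kernel types; assumptions for this sketch only. */
typedef uint64_t sector_t;
#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Trimmed-down model of struct block_device with the cached size. */
struct block_device {
	sector_t bd_start_sect;
	sector_t bd_nr_sectors;	/* cached size, in sectors */
};

/* The cached field is in sectors, so it is returned from here. */
static inline sector_t bdev_nr_sectors(struct block_device *bdev)
{
	return bdev->bd_nr_sectors;
}

/* Bytes are derived from sectors by shifting (int64_t models loff_t). */
static inline int64_t bdev_nr_bytes(struct block_device *bdev)
{
	return (int64_t)bdev_nr_sectors(bdev) << SECTOR_SHIFT;
}

/*
 * Hypothetical helper modelling the two update sites named in the
 * thread (set_capacity and bdev_set_nr_sectors), without the lock.
 */
static inline void model_set_nr_sectors(struct block_device *bdev,
					sector_t sectors)
{
	bdev->bd_nr_sectors = sectors;
}
```

With this layout the submission fast path would only need to read bd_nr_sectors from the bdev, and callers that want a byte count pay for one extra shift.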