2020-09-21 08:10:55

by Christoph Hellwig

Subject: bdi cleanups v6

Hi Jens,

this series contains a bunch of different BDI cleanups. The biggest item
is to isolate block drivers from the BDI in preparation of changing the
lifetime of the block device BDI in a follow up series.

Changes since v5:
- improve a commit message
- improve the stable_writes deprecation printk
- drop "drbd: remove RB_CONGESTED_REMOTE"
- drop a few hunks that add a local variable in an otherwise unchanged
file due to changes in the previous revisions
- keep updating ->io_pages in queue_max_sectors_store
- set an optimal I/O size in aoe
- inherit the optimal I/O size in bcache

Changes since v4:
- add back a prematurely removed assignment in dm-table.c
- pick up a few reviews from Johannes that got lost

Changes since v3:
- rebased on the latest block tree, which has some of the prep
changes merged
- extend the ->ra_pages changes to ->io_pages
- move initializing ->ra_pages and ->io_pages for block devices to
blk_register_queue

Changes since v2:
- fix a rw_page return value check
- fix up various changelogs

Changes since v1:
- rebased to the for-5.9/block-merge branch
- explicitly set the readahead to 0 for ubifs, vboxsf and mtd
- split the zram block_device operations
- let rw_page users fall back to bios in swap_readpage


Diffstat:


2020-09-21 08:11:44

by Christoph Hellwig

Subject: [PATCH 12/13] bdi: invert BDI_CAP_NO_ACCT_WB

Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to
make the checks more obvious. Also remove the pointless
bdi_cap_account_writeback wrapper that just obfuscates the check.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
---
fs/fuse/inode.c | 3 ++-
include/linux/backing-dev.h | 13 +++----------
mm/backing-dev.c | 1 +
mm/page-writeback.c | 4 ++--
4 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 17b00670fb539e..581329203d6860 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1050,7 +1050,8 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
return err;

/* fuse does it's own writeback accounting */
- sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
+ sb->s_bdi->capabilities &= ~BDI_CAP_WRITEBACK_ACCT;
+ sb->s_bdi->capabilities |= BDI_CAP_STRICTLIMIT;

/*
* For a single fuse filesystem use max 1% of dirty +
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 5da4ea3dd0cc5c..b217344a2c63be 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -120,17 +120,17 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
*
* BDI_CAP_NO_ACCT_DIRTY: Dirty pages shouldn't contribute to accounting
* BDI_CAP_NO_WRITEBACK: Don't write pages back
- * BDI_CAP_NO_ACCT_WB: Don't automatically account writeback pages
+ * BDI_CAP_WRITEBACK_ACCT: Automatically account writeback pages
* BDI_CAP_STRICTLIMIT: Keep number of dirty pages below bdi threshold.
*/
#define BDI_CAP_NO_ACCT_DIRTY 0x00000001
#define BDI_CAP_NO_WRITEBACK 0x00000002
-#define BDI_CAP_NO_ACCT_WB 0x00000004
+#define BDI_CAP_WRITEBACK_ACCT 0x00000004
#define BDI_CAP_STRICTLIMIT 0x00000010
#define BDI_CAP_CGROUP_WRITEBACK 0x00000020

#define BDI_CAP_NO_ACCT_AND_WRITEBACK \
- (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_ACCT_WB)
+ (BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY)

extern struct backing_dev_info noop_backing_dev_info;

@@ -179,13 +179,6 @@ static inline bool bdi_cap_account_dirty(struct backing_dev_info *bdi)
return !(bdi->capabilities & BDI_CAP_NO_ACCT_DIRTY);
}

-static inline bool bdi_cap_account_writeback(struct backing_dev_info *bdi)
-{
- /* Paranoia: BDI_CAP_NO_WRITEBACK implies BDI_CAP_NO_ACCT_WB */
- return !(bdi->capabilities & (BDI_CAP_NO_ACCT_WB |
- BDI_CAP_NO_WRITEBACK));
-}
-
static inline bool mapping_cap_writeback_dirty(struct address_space *mapping)
{
return bdi_cap_writeback_dirty(inode_to_bdi(mapping->host));
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8e3802bf03a968..df18f0088dd3f5 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -745,6 +745,7 @@ struct backing_dev_info *bdi_alloc(int node_id)
kfree(bdi);
return NULL;
}
+ bdi->capabilities = BDI_CAP_WRITEBACK_ACCT;
bdi->ra_pages = VM_READAHEAD_PAGES;
bdi->io_pages = VM_READAHEAD_PAGES;
return bdi;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e9c36521461aaa..0139f9622a92da 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2738,7 +2738,7 @@ int test_clear_page_writeback(struct page *page)
if (ret) {
__xa_clear_mark(&mapping->i_pages, page_index(page),
PAGECACHE_TAG_WRITEBACK);
- if (bdi_cap_account_writeback(bdi)) {
+ if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT) {
struct bdi_writeback *wb = inode_to_wb(inode);

dec_wb_stat(wb, WB_WRITEBACK);
@@ -2791,7 +2791,7 @@ int __test_set_page_writeback(struct page *page, bool keep_write)
PAGECACHE_TAG_WRITEBACK);

xas_set_mark(&xas, PAGECACHE_TAG_WRITEBACK);
- if (bdi_cap_account_writeback(bdi))
+ if (bdi->capabilities & BDI_CAP_WRITEBACK_ACCT)
inc_wb_stat(inode_to_wb(inode), WB_WRITEBACK);

/*
--
2.28.0

2020-09-21 08:11:59

by Christoph Hellwig

Subject: [PATCH 03/13] bcache: inherit the optimal I/O size

Inherit the optimal I/O size setting just like the readahead window,
as any reason to do larger I/O does not apply to just readahead.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/md/bcache/super.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1bbdc410ee3c51..48113005ed86ad 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1430,6 +1430,8 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
dc->disk.disk->queue->backing_dev_info->ra_pages =
max(dc->disk.disk->queue->backing_dev_info->ra_pages,
q->backing_dev_info->ra_pages);
+ blk_queue_io_opt(dc->disk.disk->queue,
+ max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));

atomic_set(&dc->io_errors, 0);
dc->io_disable = false;
--
2.28.0

2020-09-21 08:12:05

by Christoph Hellwig

Subject: [PATCH 05/13] bdi: initialize ->ra_pages and ->io_pages in bdi_init

Set up a readahead size by default, as very few users have a good
reason to change it.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: David Sterba <[email protected]> [btrfs]
Acked-by: Richard Weinberger <[email protected]> [ubifs, mtd]
---
block/blk-core.c | 2 --
drivers/mtd/mtdcore.c | 2 ++
fs/9p/vfs_super.c | 6 ++++--
fs/afs/super.c | 1 -
fs/btrfs/disk-io.c | 1 -
fs/fuse/inode.c | 1 -
fs/nfs/super.c | 9 +--------
fs/ubifs/super.c | 2 ++
fs/vboxsf/super.c | 2 ++
mm/backing-dev.c | 2 ++
10 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ca3f0f00c9435f..865d39e5be2b28 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -538,8 +538,6 @@ struct request_queue *blk_alloc_queue(int node_id)
if (!q->stats)
goto fail_stats;

- q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
- q->backing_dev_info->io_pages = VM_READAHEAD_PAGES;
q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
q->node = node_id;

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 7d930569a7dfb7..b5e5d3140f578e 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -2196,6 +2196,8 @@ static struct backing_dev_info * __init mtd_bdi_init(char *name)
bdi = bdi_alloc(NUMA_NO_NODE);
if (!bdi)
return ERR_PTR(-ENOMEM);
+ bdi->ra_pages = 0;
+ bdi->io_pages = 0;

/*
* We put '-0' suffix to the name to get the same name format as we
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 74df32be4c6a52..e34fa20acf612e 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -80,8 +80,10 @@ v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses,
if (ret)
return ret;

- if (v9ses->cache)
- sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
+ if (!v9ses->cache) {
+ sb->s_bdi->ra_pages = 0;
+ sb->s_bdi->io_pages = 0;
+ }

sb->s_flags |= SB_ACTIVE | SB_DIRSYNC;
if (!v9ses->cache)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index b552357b1d1379..3a40ee752c1e3f 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -456,7 +456,6 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
ret = super_setup_bdi(sb);
if (ret)
return ret;
- sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;

/* allocate the root inode and dentry */
if (as->dyn_root) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6bba7eb1fa171..047934cea25efa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3092,7 +3092,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
}

sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
- sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index bba747520e9b08..17b00670fb539e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1049,7 +1049,6 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
if (err)
return err;

- sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
/* fuse does it's own writeback accounting */
sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7a70287f21a2c1..f943e37853fa25 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1200,13 +1200,6 @@ static void nfs_get_cache_cookie(struct super_block *sb,
}
#endif

-static void nfs_set_readahead(struct backing_dev_info *bdi,
- unsigned long iomax_pages)
-{
- bdi->ra_pages = VM_READAHEAD_PAGES;
- bdi->io_pages = iomax_pages;
-}
-
int nfs_get_tree_common(struct fs_context *fc)
{
struct nfs_fs_context *ctx = nfs_fc2context(fc);
@@ -1251,7 +1244,7 @@ int nfs_get_tree_common(struct fs_context *fc)
MINOR(server->s_dev));
if (error)
goto error_splat_super;
- nfs_set_readahead(s->s_bdi, server->rpages);
+ s->s_bdi->io_pages = server->rpages;
server->super = s;
}

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index a2420c900275a8..fbddb2a1c03f5e 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2177,6 +2177,8 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
c->vi.vol_id);
if (err)
goto out_close;
+ sb->s_bdi->ra_pages = 0;
+ sb->s_bdi->io_pages = 0;

sb->s_fs_info = c;
sb->s_magic = UBIFS_SUPER_MAGIC;
diff --git a/fs/vboxsf/super.c b/fs/vboxsf/super.c
index 8fe03b4a0d2b03..8e3792177a8523 100644
--- a/fs/vboxsf/super.c
+++ b/fs/vboxsf/super.c
@@ -167,6 +167,8 @@ static int vboxsf_fill_super(struct super_block *sb, struct fs_context *fc)
err = super_setup_bdi_name(sb, "vboxsf-%d", sbi->bdi_id);
if (err)
goto fail_free;
+ sb->s_bdi->ra_pages = 0;
+ sb->s_bdi->io_pages = 0;

/* Turn source into a shfl_string and map the folder */
size = strlen(fc->source) + 1;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8e8b00627bb2d8..2dac3be6127127 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -746,6 +746,8 @@ struct backing_dev_info *bdi_alloc(int node_id)
kfree(bdi);
return NULL;
}
+ bdi->ra_pages = VM_READAHEAD_PAGES;
+ bdi->io_pages = VM_READAHEAD_PAGES;
return bdi;
}
EXPORT_SYMBOL(bdi_alloc);
--
2.28.0

2020-09-21 08:12:29

by Christoph Hellwig

Subject: [PATCH 04/13] aoe: set an optimal I/O size

aoe forces a larger readahead size, but any reason to do larger I/O
is not limited to readahead. Also set the optimal I/O size, and
remove the local constants in favor of just using SZ_2M.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/block/aoe/aoeblk.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 5ca7216e9e01f3..d8cfc233e64b93 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
mempool_t *mp;
struct request_queue *q;
struct blk_mq_tag_set *set;
- enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
int late = 0;
int err;
@@ -407,7 +406,8 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
- q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
+ q->backing_dev_info->ra_pages = SZ_2M / PAGE_SIZE;
+ blk_queue_io_opt(q, SZ_2M);
d->bufpool = mp;
d->blkq = gd->queue = q;
q->queuedata = d;
--
2.28.0

2020-09-21 09:56:35

by Coly Li

Subject: Re: [PATCH 03/13] bcache: inherit the optimal I/O size

On 2020/9/21 16:07, Christoph Hellwig wrote:
> Inherit the optimal I/O size setting just like the readahead window,
> as any reason to do larger I/O does not apply to just readahead.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/md/bcache/super.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 1bbdc410ee3c51..48113005ed86ad 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1430,6 +1430,8 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
> dc->disk.disk->queue->backing_dev_info->ra_pages =
> max(dc->disk.disk->queue->backing_dev_info->ra_pages,
> q->backing_dev_info->ra_pages);
> + blk_queue_io_opt(dc->disk.disk->queue,
> + max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));
>

Hi Christoph,

I am not sure whether the virtual bcache device's optimal request size
can simply be set like this.

Most of the time inheriting the backing device's optimal request size is
fine, but there are two exceptions,
- a read request hits on the cache device
- the user sets sequential_cutoff to 0, so all writes may go to the cache
device first.
For the above two conditions, all I/Os go to the cache device, so using
the optimal request size of the backing device might be improper.

Just a guess, is it OK to set the optimal request size of the virtual
bcache device to the least common multiple of the cache device's and the
backing device's optimal request sizes?
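
The least-common-multiple idea could be sketched roughly as the following
userspace model (the gcd/lcm helpers and the combined_io_opt() name are
illustrative only, not actual bcache code; in the kernel the existing
lcm() helper from <linux/lcm.h> would be used instead):

```c
#include <assert.h>

/* Classic Euclidean algorithm. */
static unsigned int gcd(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;
		a = b;
		b = t;
	}
	return a;
}

static unsigned int lcm(unsigned int a, unsigned int b)
{
	if (!a || !b)
		return a | b;	/* io_opt == 0 means "no hint": treat as neutral */
	return a / gcd(a, b) * b;
}

/*
 * Hypothetical combined optimal I/O size for a virtual device stacked on
 * both a backing device and a cache device: one request of this size is
 * aligned for both underlying devices.
 */
static unsigned int combined_io_opt(unsigned int backing_io_opt,
				    unsigned int cache_io_opt)
{
	return lcm(backing_io_opt, cache_io_opt);
}
```

For example, a 64 KiB backing stripe and a 4 KiB cache block would combine
to 64 KiB, while two incommensurate hints grow to their least common
multiple rather than silently losing one of them as max() would.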


[snipped]

Thanks.

Coly Li

2020-09-21 14:04:21

by Christoph Hellwig

Subject: Re: [PATCH 03/13] bcache: inherit the optimal I/O size

On Mon, Sep 21, 2020 at 05:54:59PM +0800, Coly Li wrote:
> I am not sure whether the virtual bcache device's optimal request size
> can simply be set like this.
>
> Most of the time inheriting the backing device's optimal request size is
> fine, but there are two exceptions,
> - a read request hits on the cache device
> - the user sets sequential_cutoff to 0, so all writes may go to the cache
> device first.
> For the above two conditions, all I/Os go to the cache device, so using
> the optimal request size of the backing device might be improper.
>
> Just a guess, is it OK to set the optimal request size of the virtual
> bcache device to the least common multiple of the cache device's and the
> backing device's optimal request sizes?

Well, if the optimal I/O size is wrong, the read ahead size also is
wrong. Can we just drop the setting?

2020-09-21 15:11:29

by Coly Li

Subject: Re: [PATCH 03/13] bcache: inherit the optimal I/O size

On 2020/9/21 22:00, Christoph Hellwig wrote:
> On Mon, Sep 21, 2020 at 05:54:59PM +0800, Coly Li wrote:
>> I am not sure whether the virtual bcache device's optimal request size
>> can simply be set like this.
>>
>> Most of the time inheriting the backing device's optimal request size is
>> fine, but there are two exceptions,
>> - a read request hits on the cache device
>> - the user sets sequential_cutoff to 0, so all writes may go to the cache
>> device first.
>> For the above two conditions, all I/Os go to the cache device, so using
>> the optimal request size of the backing device might be improper.
>>
>> Just a guess, is it OK to set the optimal request size of the virtual
>> bcache device to the least common multiple of the cache device's and the
>> backing device's optimal request sizes?
>
> Well, if the optimal I/O size is wrong, the read ahead size also is
> wrong. Can we just drop the setting?
>

I feel this is something that should be fixed. Indeed I had overlooked it
until you pointed out the issue just now.

The optimal request size and readahead pages hints are necessary, but the
current initialization is simplistic. A better way might be to set them
dynamically depending on the cache mode and some special configuration.

Inspired by you, I want to ACK your original patch although it doesn't
work well for all conditions. Then we will know these two settings
(ra_pages and queue_io_opt) should be improved for more situations. At
least for most situations they provide proper hints.

What do you think of the above idea?

Coly Li

2020-09-21 18:22:17

by Christoph Hellwig

Subject: Re: [PATCH 03/13] bcache: inherit the optimal I/O size

On Mon, Sep 21, 2020 at 11:09:48PM +0800, Coly Li wrote:
> I feel this is something that should be fixed. Indeed I had overlooked it
> until you pointed out the issue just now.
>
> The optimal request size and readahead pages hints are necessary, but the
> current initialization is simplistic. A better way might be to set them
> dynamically depending on the cache mode and some special configuration.
>
> Inspired by you, I want to ACK your original patch although it doesn't
> work well for all conditions. Then we will know these two settings
> (ra_pages and queue_io_opt) should be improved for more situations. At
> least for most situations they provide proper hints.
>
> What do you think of the above idea?

Sounds like a plan. I'd really like to get this series in to get
some soaking before the end of the merge window, but we should still
have plenty of time for localized bcache updates.

2020-09-22 08:45:58

by Jan Kara

Subject: Re: [PATCH 03/13] bcache: inherit the optimal I/O size

On Mon 21-09-20 10:07:24, Christoph Hellwig wrote:
> Inherit the optimal I/O size setting just like the readahead window,
> as any reason to do larger I/O does not apply to just readahead.
>
> Signed-off-by: Christoph Hellwig <[email protected]>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> drivers/md/bcache/super.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 1bbdc410ee3c51..48113005ed86ad 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1430,6 +1430,8 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
> dc->disk.disk->queue->backing_dev_info->ra_pages =
> max(dc->disk.disk->queue->backing_dev_info->ra_pages,
> q->backing_dev_info->ra_pages);
> + blk_queue_io_opt(dc->disk.disk->queue,
> + max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));
>
> atomic_set(&dc->io_errors, 0);
> dc->io_disable = false;
> --
> 2.28.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-09-22 08:47:16

by Jan Kara

Subject: Re: [PATCH 04/13] aoe: set an optimal I/O size

On Mon 21-09-20 10:07:25, Christoph Hellwig wrote:
> aoe forces a larger readahead size, but any reason to do larger I/O
> is not limited to readahead. Also set the optimal I/O size, and
> remove the local constants in favor of just using SZ_2M.
>
> Signed-off-by: Christoph Hellwig <[email protected]>

Looks good. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> drivers/block/aoe/aoeblk.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
> index 5ca7216e9e01f3..d8cfc233e64b93 100644
> --- a/drivers/block/aoe/aoeblk.c
> +++ b/drivers/block/aoe/aoeblk.c
> @@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
> mempool_t *mp;
> struct request_queue *q;
> struct blk_mq_tag_set *set;
> - enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
> ulong flags;
> int late = 0;
> int err;
> @@ -407,7 +406,8 @@ aoeblk_gdalloc(void *vp)
> WARN_ON(d->gd);
> WARN_ON(d->flags & DEVFL_UP);
> blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
> - q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
> + q->backing_dev_info->ra_pages = SZ_2M / PAGE_SIZE;
> + blk_queue_io_opt(q, SZ_2M);
> d->bufpool = mp;
> d->blkq = gd->queue = q;
> q->queuedata = d;
> --
> 2.28.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-09-22 08:51:35

by Jan Kara

Subject: Re: [PATCH 05/13] bdi: initialize ->ra_pages and ->io_pages in bdi_init

On Mon 21-09-20 10:07:26, Christoph Hellwig wrote:
> Set up a readahead size by default, as very few users have a good
> reason to change it.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Acked-by: David Sterba <[email protected]> [btrfs]
> Acked-by: Richard Weinberger <[email protected]> [ubifs, mtd]

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <[email protected]>

I'd just prefer if the changelog explicitly mentioned that this patch
results in enabling readahead for coda, ecryptfs, and orangefs... Just in
case someone bisects some issue down to this patch :).

Honza

> ---
> block/blk-core.c | 2 --
> drivers/mtd/mtdcore.c | 2 ++
> fs/9p/vfs_super.c | 6 ++++--
> fs/afs/super.c | 1 -
> fs/btrfs/disk-io.c | 1 -
> fs/fuse/inode.c | 1 -
> fs/nfs/super.c | 9 +--------
> fs/ubifs/super.c | 2 ++
> fs/vboxsf/super.c | 2 ++
> mm/backing-dev.c | 2 ++
> 10 files changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ca3f0f00c9435f..865d39e5be2b28 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -538,8 +538,6 @@ struct request_queue *blk_alloc_queue(int node_id)
> if (!q->stats)
> goto fail_stats;
>
> - q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
> - q->backing_dev_info->io_pages = VM_READAHEAD_PAGES;
> q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
> q->node = node_id;
>
> diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> index 7d930569a7dfb7..b5e5d3140f578e 100644
> --- a/drivers/mtd/mtdcore.c
> +++ b/drivers/mtd/mtdcore.c
> @@ -2196,6 +2196,8 @@ static struct backing_dev_info * __init mtd_bdi_init(char *name)
> bdi = bdi_alloc(NUMA_NO_NODE);
> if (!bdi)
> return ERR_PTR(-ENOMEM);
> + bdi->ra_pages = 0;
> + bdi->io_pages = 0;
>
> /*
> * We put '-0' suffix to the name to get the same name format as we
> diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
> index 74df32be4c6a52..e34fa20acf612e 100644
> --- a/fs/9p/vfs_super.c
> +++ b/fs/9p/vfs_super.c
> @@ -80,8 +80,10 @@ v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses,
> if (ret)
> return ret;
>
> - if (v9ses->cache)
> - sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
> + if (!v9ses->cache) {
> + sb->s_bdi->ra_pages = 0;
> + sb->s_bdi->io_pages = 0;
> + }
>
> sb->s_flags |= SB_ACTIVE | SB_DIRSYNC;
> if (!v9ses->cache)
> diff --git a/fs/afs/super.c b/fs/afs/super.c
> index b552357b1d1379..3a40ee752c1e3f 100644
> --- a/fs/afs/super.c
> +++ b/fs/afs/super.c
> @@ -456,7 +456,6 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
> ret = super_setup_bdi(sb);
> if (ret)
> return ret;
> - sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
>
> /* allocate the root inode and dentry */
> if (as->dyn_root) {
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index f6bba7eb1fa171..047934cea25efa 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3092,7 +3092,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
> }
>
> sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
> - sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
> sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
> sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index bba747520e9b08..17b00670fb539e 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1049,7 +1049,6 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
> if (err)
> return err;
>
> - sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
> /* fuse does it's own writeback accounting */
> sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;
>
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 7a70287f21a2c1..f943e37853fa25 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -1200,13 +1200,6 @@ static void nfs_get_cache_cookie(struct super_block *sb,
> }
> #endif
>
> -static void nfs_set_readahead(struct backing_dev_info *bdi,
> - unsigned long iomax_pages)
> -{
> - bdi->ra_pages = VM_READAHEAD_PAGES;
> - bdi->io_pages = iomax_pages;
> -}
> -
> int nfs_get_tree_common(struct fs_context *fc)
> {
> struct nfs_fs_context *ctx = nfs_fc2context(fc);
> @@ -1251,7 +1244,7 @@ int nfs_get_tree_common(struct fs_context *fc)
> MINOR(server->s_dev));
> if (error)
> goto error_splat_super;
> - nfs_set_readahead(s->s_bdi, server->rpages);
> + s->s_bdi->io_pages = server->rpages;
> server->super = s;
> }
>
> diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
> index a2420c900275a8..fbddb2a1c03f5e 100644
> --- a/fs/ubifs/super.c
> +++ b/fs/ubifs/super.c
> @@ -2177,6 +2177,8 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
> c->vi.vol_id);
> if (err)
> goto out_close;
> + sb->s_bdi->ra_pages = 0;
> + sb->s_bdi->io_pages = 0;
>
> sb->s_fs_info = c;
> sb->s_magic = UBIFS_SUPER_MAGIC;
> diff --git a/fs/vboxsf/super.c b/fs/vboxsf/super.c
> index 8fe03b4a0d2b03..8e3792177a8523 100644
> --- a/fs/vboxsf/super.c
> +++ b/fs/vboxsf/super.c
> @@ -167,6 +167,8 @@ static int vboxsf_fill_super(struct super_block *sb, struct fs_context *fc)
> err = super_setup_bdi_name(sb, "vboxsf-%d", sbi->bdi_id);
> if (err)
> goto fail_free;
> + sb->s_bdi->ra_pages = 0;
> + sb->s_bdi->io_pages = 0;
>
> /* Turn source into a shfl_string and map the folder */
> size = strlen(fc->source) + 1;
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 8e8b00627bb2d8..2dac3be6127127 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -746,6 +746,8 @@ struct backing_dev_info *bdi_alloc(int node_id)
> kfree(bdi);
> return NULL;
> }
> + bdi->ra_pages = VM_READAHEAD_PAGES;
> + bdi->io_pages = VM_READAHEAD_PAGES;
> return bdi;
> }
> EXPORT_SYMBOL(bdi_alloc);
> --
> 2.28.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-09-22 09:40:59

by Coly Li

Subject: Re: [PATCH 03/13] bcache: inherit the optimal I/O size

On 2020/9/21 16:07, Christoph Hellwig wrote:
> Inherit the optimal I/O size setting just like the readahead window,
> as any reason to do larger I/O does not apply to just readahead.
>
> Signed-off-by: Christoph Hellwig <[email protected]>

Acked-by: Coly Li <[email protected]>

Thanks.

Coly Li

> ---
> drivers/md/bcache/super.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 1bbdc410ee3c51..48113005ed86ad 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1430,6 +1430,8 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
> dc->disk.disk->queue->backing_dev_info->ra_pages =
> max(dc->disk.disk->queue->backing_dev_info->ra_pages,
> q->backing_dev_info->ra_pages);
> + blk_queue_io_opt(dc->disk.disk->queue,
> + max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));
>
> atomic_set(&dc->io_errors, 0);
> dc->io_disable = false;
>

2020-09-23 15:19:09

by Christoph Hellwig

Subject: Re: [PATCH 05/13] bdi: initialize ->ra_pages and ->io_pages in bdi_init

On Tue, Sep 22, 2020 at 10:49:54AM +0200, Jan Kara wrote:
> On Mon 21-09-20 10:07:26, Christoph Hellwig wrote:
> > Set up a readahead size by default, as very few users have a good
> > reason to change it.
> >
> > Signed-off-by: Christoph Hellwig <[email protected]>
> > Acked-by: David Sterba <[email protected]> [btrfs]
> > Acked-by: Richard Weinberger <[email protected]> [ubifs, mtd]
>
> The patch looks good to me. You can add:
>
> Reviewed-by: Jan Kara <[email protected]>
>
> I'd just prefer if the changelog explicitly mentioned that this patch
> results in enabling readahead for coda, ecryptfs, and orangefs... Just in
> case someone bisects some issue down to this patch :).

Ok, I've updated the changelog.