2020-09-24 06:55:39

by Christoph Hellwig

Subject: bdi cleanups v7

Hi Jens,

this series contains a bunch of different BDI cleanups. The biggest item
is to isolate block drivers from the BDI in preparation for changing the
lifetime of the block device BDI in a follow up series.

Changes since v6:
- add a new blk_queue_update_readahead helper and use it in stacking
drivers (a rough sketch of such a helper follows the changelog)
- improve another commit log

Changes since v5:
- improve a commit message
- improve the stable_writes deprecation printk
- drop "drbd: remove RB_CONGESTED_REMOTE"
- drop a few hunks that add a local variable in an otherwise unchanged
file due to changes in the previous revisions
- keep updating ->io_pages in queue_max_sectors_store
- set an optimal I/O size in aoe
- inherit the optimal I/O size in bcache

Changes since v4:
- add back a prematurely removed assignment in dm-table.c
- pick up a few reviews from Johannes that got lost

Changes since v3:
- rebased on the latest block tree, which has some of the prep
changes merged
- extend the ->ra_pages changes to ->io_pages
- move initializing ->ra_pages and ->io_pages for block devices to
blk_register_queue

Changes since v2:
- fix a rw_page return value check
- fix up various changelogs

Changes since v1:
- rebased to the for-5.9/block-merge branch
- explicitly set the readahead to 0 for ubifs, vboxsf and mtd
- split the zram block_device operations
- let rw_page users fall back to bios in swap_readpage
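
For reference on the blk_queue_update_readahead item mentioned above,
here is a rough sketch of what such a helper could boil down to,
deriving the readahead window from the optimal I/O size (illustrative
only; the actual helper added by the series may differ in detail):

        void blk_queue_update_readahead(struct request_queue *q)
        {
                /*
                 * For readahead of large files to be effective, the
                 * window needs to cover at least twice the optimal
                 * I/O size.
                 */
                q->backing_dev_info->ra_pages =
                        max(queue_io_opt(q) * 2 / PAGE_SIZE,
                            VM_READAHEAD_PAGES);
                q->backing_dev_info->io_pages =
                        queue_max_sectors(q) >> (PAGE_SHIFT - 9);
        }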


Diffstat:


2020-09-24 06:55:44

by Christoph Hellwig

Subject: [PATCH 04/13] aoe: set an optimal I/O size

aoe forces a larger readahead size, but any reason to do larger I/O
is not limited to readahead. Also set the optimal I/O size, and
remove the local constants in favor of just using SZ_2M.
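
For scale (assuming 4 KiB pages, which is not spelled out in the patch
itself), the constant works out to:

        ra_pages = SZ_2M / PAGE_SIZE = 2097152 / 4096 = 512 pages

so the existing 2 MiB readahead window is kept as-is, while the same
2 MiB is now also advertised as the optimal I/O size via
blk_queue_io_opt().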

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
drivers/block/aoe/aoeblk.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 5ca7216e9e01f3..d8cfc233e64b93 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -347,7 +347,6 @@ aoeblk_gdalloc(void *vp)
mempool_t *mp;
struct request_queue *q;
struct blk_mq_tag_set *set;
- enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
ulong flags;
int late = 0;
int err;
@@ -407,7 +406,8 @@ aoeblk_gdalloc(void *vp)
WARN_ON(d->gd);
WARN_ON(d->flags & DEVFL_UP);
blk_queue_max_hw_sectors(q, BLK_DEF_MAX_SECTORS);
- q->backing_dev_info->ra_pages = READ_AHEAD / PAGE_SIZE;
+ q->backing_dev_info->ra_pages = SZ_2M / PAGE_SIZE;
+ blk_queue_io_opt(q, SZ_2M);
d->bufpool = mp;
d->blkq = gd->queue = q;
q->queuedata = d;
--
2.28.0

2020-09-24 06:56:05

by Christoph Hellwig

Subject: [PATCH 03/13] bcache: inherit the optimal I/O size

Inherit the optimal I/O size setting just like the readahead window,
as any reason to do larger I/O is not limited to readahead.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: Coly Li <[email protected]>
---
drivers/md/bcache/super.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1bbdc410ee3c51..48113005ed86ad 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1430,6 +1430,8 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
dc->disk.disk->queue->backing_dev_info->ra_pages =
max(dc->disk.disk->queue->backing_dev_info->ra_pages,
q->backing_dev_info->ra_pages);
+ blk_queue_io_opt(dc->disk.disk->queue,
+ max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));

atomic_set(&dc->io_errors, 0);
dc->io_disable = false;
--
2.28.0

2020-09-24 06:56:07

by Christoph Hellwig

Subject: [PATCH 02/13] drbd: remove dead code in device_to_statistics

Ever since the switch to blk-mq, a lower device not used for VM
writeback will not be marked congested, so the check will never
trigger.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 6 ------
1 file changed, 6 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 43c8ae4d9fca81..aaff5bde391506 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -3370,7 +3370,6 @@ static void device_to_statistics(struct device_statistics *s,
if (get_ldev(device)) {
struct drbd_md *md = &device->ldev->md;
u64 *history_uuids = (u64 *)s->history_uuids;
- struct request_queue *q;
int n;

spin_lock_irq(&md->uuid_lock);
@@ -3384,11 +3383,6 @@ static void device_to_statistics(struct device_statistics *s,
spin_unlock_irq(&md->uuid_lock);

s->dev_disk_flags = md->flags;
- q = bdev_get_queue(device->ldev->backing_bdev);
- s->dev_lower_blocked =
- bdi_congested(q->backing_dev_info,
- (1 << WB_async_congested) |
- (1 << WB_sync_congested));
put_ldev(device);
}
s->dev_size = drbd_get_capacity(device->this_bdev);
--
2.28.0

2020-09-24 06:56:25

by Christoph Hellwig

Subject: [PATCH 06/13] md: update the optimal I/O size on reshape

The raid5 and raid10 drivers currently update the read-ahead size,
but not the optimal I/O size on reshape. To prepare for deriving the
read-ahead size from the optimal I/O size, make sure it is updated
as well.
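
As a worked example of the arithmetic in the new helpers (the numbers
are illustrative, not taken from the patch): a raid5 array with
chunk_sectors = 1024 (512 KiB chunks), raid_disks = 5 and
max_degraded = 1 gets

        io_opt = (1024 << 9) * (5 - 1) = 2 MiB   /* one full data stripe */

from raid5_set_io_opt(), and a raid10 array with the same chunk size,
raid_disks = 4 and near_copies = 2 gets

        io_opt = (1024 << 9) * (4 / 2) = 1 MiB

from raid10_set_io_opt().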

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Song Liu <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
---
drivers/md/raid10.c | 22 ++++++++++++++--------
drivers/md/raid5.c | 10 ++++++++--
2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e8fa327339171c..9956a04ac13bd6 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3703,10 +3703,20 @@ static struct r10conf *setup_conf(struct mddev *mddev)
return ERR_PTR(err);
}

+static void raid10_set_io_opt(struct r10conf *conf)
+{
+ int raid_disks = conf->geo.raid_disks;
+
+ if (!(conf->geo.raid_disks % conf->geo.near_copies))
+ raid_disks /= conf->geo.near_copies;
+ blk_queue_io_opt(conf->mddev->queue, (conf->mddev->chunk_sectors << 9) *
+ raid_disks);
+}
+
static int raid10_run(struct mddev *mddev)
{
struct r10conf *conf;
- int i, disk_idx, chunk_size;
+ int i, disk_idx;
struct raid10_info *disk;
struct md_rdev *rdev;
sector_t size;
@@ -3742,18 +3752,13 @@ static int raid10_run(struct mddev *mddev)
mddev->thread = conf->thread;
conf->thread = NULL;

- chunk_size = mddev->chunk_sectors << 9;
if (mddev->queue) {
blk_queue_max_discard_sectors(mddev->queue,
mddev->chunk_sectors);
blk_queue_max_write_same_sectors(mddev->queue, 0);
blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
- blk_queue_io_min(mddev->queue, chunk_size);
- if (conf->geo.raid_disks % conf->geo.near_copies)
- blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks);
- else
- blk_queue_io_opt(mddev->queue, chunk_size *
- (conf->geo.raid_disks / conf->geo.near_copies));
+ blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
+ raid10_set_io_opt(conf);
}

rdev_for_each(rdev, mddev) {
@@ -4727,6 +4732,7 @@ static void end_reshape(struct r10conf *conf)
stripe /= conf->geo.near_copies;
if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
+ raid10_set_io_opt(conf);
}
conf->fullsync = 0;
}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 225380efd1e24f..9a7d1250894ef1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7232,6 +7232,12 @@ static int only_parity(int raid_disk, int algo, int raid_disks, int max_degraded
return 0;
}

+static void raid5_set_io_opt(struct r5conf *conf)
+{
+ blk_queue_io_opt(conf->mddev->queue, (conf->chunk_sectors << 9) *
+ (conf->raid_disks - conf->max_degraded));
+}
+
static int raid5_run(struct mddev *mddev)
{
struct r5conf *conf;
@@ -7521,8 +7527,7 @@ static int raid5_run(struct mddev *mddev)

chunk_size = mddev->chunk_sectors << 9;
blk_queue_io_min(mddev->queue, chunk_size);
- blk_queue_io_opt(mddev->queue, chunk_size *
- (conf->raid_disks - conf->max_degraded));
+ raid5_set_io_opt(conf);
mddev->queue->limits.raid_partial_stripes_expensive = 1;
/*
* We can only discard a whole stripe. It doesn't make sense to
@@ -8115,6 +8120,7 @@ static void end_reshape(struct r5conf *conf)
/ PAGE_SIZE);
if (conf->mddev->queue->backing_dev_info->ra_pages < 2 * stripe)
conf->mddev->queue->backing_dev_info->ra_pages = 2 * stripe;
+ raid5_set_io_opt(conf);
}
}
}
--
2.28.0

2020-09-24 06:56:51

by Christoph Hellwig

Subject: [PATCH 11/13] bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag

BDI_CAP_STABLE_WRITES is one of the few bits of information in the
backing_dev_info shared between the block drivers and the writeback code.
To help untangle the dependency, replace it with a queue flag and a
superblock flag derived from it. This also helps with the case of e.g.
a file system requiring stable writes due to its own checksumming, but
not forcing it on other users of the block device like the swap code.

One downside is that we can't support the stable_pages_required bdi
attribute in sysfs anymore. It is replaced with a queue attribute,
which is also writable for easier testing.
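
To illustrate the filesystem side (this sketch is not part of the
patch, and examplefs is a made-up name): a filesystem that checksums
its own data could request stable pages just for itself by setting the
new superblock flag, without turning on QUEUE_FLAG_STABLE_WRITES for
the whole device and thus without affecting other users such as swap:

        static int examplefs_fill_super(struct super_block *sb,
                                        void *data, int silent)
        {
                /*
                 * Pages must not be modified while our checksums are
                 * computed and the data is under writeback.
                 */
                sb->s_iflags |= SB_I_STABLE_WRITES;
                return 0;
        }

For testing, the new queue attribute can also be toggled from user
space, e.g. through /sys/block/<dev>/queue/stable_writes.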

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
---
block/blk-integrity.c | 4 ++--
block/blk-mq-debugfs.c | 1 +
block/blk-sysfs.c | 3 +++
drivers/block/rbd.c | 2 +-
drivers/block/zram/zram_drv.c | 2 +-
drivers/md/dm-table.c | 6 +++---
drivers/md/raid5.c | 8 ++++----
drivers/mmc/core/queue.c | 3 +--
drivers/nvme/host/core.c | 3 +--
drivers/nvme/host/multipath.c | 10 +++-------
drivers/scsi/iscsi_tcp.c | 4 ++--
fs/super.c | 2 ++
include/linux/backing-dev.h | 6 ------
include/linux/blkdev.h | 3 +++
include/linux/fs.h | 1 +
mm/backing-dev.c | 7 +++----
mm/page-writeback.c | 2 +-
mm/swapfile.c | 2 +-
18 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index c03705cbb9c9f2..2b36a8f9b81390 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -408,7 +408,7 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
bi->tuple_size = template->tuple_size;
bi->tag_size = template->tag_size;

- disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);

#ifdef CONFIG_BLK_INLINE_ENCRYPTION
if (disk->queue->ksm) {
@@ -428,7 +428,7 @@ EXPORT_SYMBOL(blk_integrity_register);
*/
void blk_integrity_unregister(struct gendisk *disk)
{
- disk->queue->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, disk->queue);
memset(&disk->queue->integrity, 0, sizeof(struct blk_integrity));
}
EXPORT_SYMBOL(blk_integrity_unregister);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 645b7f800cb827..3094542e12ae0f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -116,6 +116,7 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(DEAD),
QUEUE_FLAG_NAME(INIT_DONE),
+ QUEUE_FLAG_NAME(STABLE_WRITES),
QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(WC),
QUEUE_FLAG_NAME(FUA),
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 869ed21a9edcab..76b54c7750b07e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -287,6 +287,7 @@ queue_##name##_store(struct request_queue *q, const char *page, size_t count) \
QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
+QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
#undef QUEUE_SYSFS_BIT_FNS

static ssize_t queue_zoned_show(struct request_queue *q, char *page)
@@ -613,6 +614,7 @@ static struct queue_sysfs_entry queue_hw_sector_size_entry = {
QUEUE_RW_ENTRY(queue_nonrot, "rotational");
QUEUE_RW_ENTRY(queue_iostats, "iostats");
QUEUE_RW_ENTRY(queue_random, "add_random");
+QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");

static struct attribute *queue_attrs[] = {
&queue_requests_entry.attr,
@@ -645,6 +647,7 @@ static struct attribute *queue_attrs[] = {
&queue_nomerges_entry.attr,
&queue_rq_affinity_entry.attr,
&queue_iostats_entry.attr,
+ &queue_stable_writes_entry.attr,
&queue_random_entry.attr,
&queue_poll_entry.attr,
&queue_wc_entry.attr,
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 5d3923c0997ce0..cf5b016358cdab 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -5022,7 +5022,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
}

if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
- q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);

/*
* disk_release() expects a queue ref from add_disk() and will
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e21ca844d7c291..bff3d4021c18e1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1955,7 +1955,7 @@ static int zram_add(void)
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);

- zram->disk->queue->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
device_add_disk(NULL, zram->disk, zram_disk_attr_groups);

strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index ef2757012f59d5..405d7cf10eb973 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1815,7 +1815,7 @@ static int device_requires_stable_pages(struct dm_target *ti,
{
struct request_queue *q = bdev_get_queue(dev->bdev);

- return q && bdi_cap_stable_pages_required(q->backing_dev_info);
+ return q && blk_queue_stable_writes(q);
}

/*
@@ -1900,9 +1900,9 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
* because they do their own checksumming.
*/
if (dm_table_requires_stable_pages(t))
- q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
else
- q->backing_dev_info->capabilities &= ~BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q);

/*
* Determine whether or not this queue's I/O timings contribute
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7ace1f76b14736..d589d26c86ea3f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6638,14 +6638,14 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
if (!conf)
err = -ENODEV;
else if (new != conf->skip_copy) {
+ struct request_queue *q = mddev->queue;
+
mddev_suspend(mddev);
conf->skip_copy = new;
if (new)
- mddev->queue->backing_dev_info->capabilities |=
- BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
else
- mddev->queue->backing_dev_info->capabilities &=
- ~BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q);
mddev_resume(mddev);
}
mddev_unlock(mddev);
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 6c022ef0f84d72..80fe3852ce0f75 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -472,8 +472,7 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
}

if (mmc_host_is_spi(host) && host->use_spi_crc)
- mq->queue->backing_dev_info->capabilities |=
- BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, mq->queue);

mq->queue->queuedata = mq;
blk_queue_rq_timeout(mq->queue, 60 * HZ);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 741c9bfa8e14c7..c190c56bf7020d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3926,8 +3926,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
goto out_free_ns;

if (ctrl->opts && ctrl->opts->data_digest)
- ns->queue->backing_dev_info->capabilities
- |= BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);

blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index d4ba736c6c8905..74896be40c1769 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -673,13 +673,9 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id)
nvme_mpath_set_live(ns);
}

- if (bdi_cap_stable_pages_required(ns->queue->backing_dev_info)) {
- struct gendisk *disk = ns->head->disk;
-
- if (disk)
- disk->queue->backing_dev_info->capabilities |=
- BDI_CAP_STABLE_WRITES;
- }
+ if (blk_queue_stable_writes(ns->queue) && ns->head->disk)
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES,
+ ns->head->disk->queue);
}

void nvme_mpath_remove_disk(struct nvme_ns_head *head)
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index b5dd1caae5e92d..a622f334c933f5 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -962,8 +962,8 @@ static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
struct iscsi_conn *conn = session->leadconn;

if (conn->datadgst_en)
- sdev->request_queue->backing_dev_info->capabilities
- |= BDI_CAP_STABLE_WRITES;
+ blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES,
+ sdev->request_queue);
blk_queue_dma_alignment(sdev->request_queue, 0);
return 0;
}
diff --git a/fs/super.c b/fs/super.c
index 904459b3511995..a51c2083cd6b18 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1256,6 +1256,8 @@ static int set_bdev_super(struct super_block *s, void *data)
s->s_dev = s->s_bdev->bd_dev;
s->s_bdi = bdi_get(s->s_bdev->bd_bdi);

+ if (blk_queue_stable_writes(s->s_bdev->bd_disk->queue))
+ s->s_iflags |= SB_I_STABLE_WRITES;
return 0;
}

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 860ea33571bce5..5da4ea3dd0cc5c 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -126,7 +126,6 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
#define BDI_CAP_NO_ACCT_DIRTY 0x00000001
#define BDI_CAP_NO_WRITEBACK 0x00000002
#define BDI_CAP_NO_ACCT_WB 0x00000004
-#define BDI_CAP_STABLE_WRITES 0x00000008
#define BDI_CAP_STRICTLIMIT 0x00000010
#define BDI_CAP_CGROUP_WRITEBACK 0x00000020

@@ -170,11 +169,6 @@ static inline int wb_congested(struct bdi_writeback *wb, int cong_bits)
long congestion_wait(int sync, long timeout);
long wait_iff_congested(int sync, long timeout);

-static inline bool bdi_cap_stable_pages_required(struct backing_dev_info *bdi)
-{
- return bdi->capabilities & BDI_CAP_STABLE_WRITES;
-}
-
static inline bool bdi_cap_writeback_dirty(struct backing_dev_info *bdi)
{
return !(bdi->capabilities & BDI_CAP_NO_WRITEBACK);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 282f5ca424f14a..8e77f12de52214 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -606,6 +606,7 @@ struct request_queue {
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
#define QUEUE_FLAG_DEAD 13 /* queue tear-down finished */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
+#define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */
#define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */
#define QUEUE_FLAG_WC 17 /* Write back caching */
#define QUEUE_FLAG_FUA 18 /* device supports FUA writes */
@@ -635,6 +636,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
#define blk_queue_noxmerges(q) \
test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags)
#define blk_queue_nonrot(q) test_bit(QUEUE_FLAG_NONROT, &(q)->queue_flags)
+#define blk_queue_stable_writes(q) \
+ test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags)
#define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
#define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
#define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fbd74df5ce5f34..222465b7cf4178 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1385,6 +1385,7 @@ extern int send_sigurg(struct fown_struct *fown);
#define SB_I_CGROUPWB 0x00000001 /* cgroup-aware writeback enabled */
#define SB_I_NOEXEC 0x00000002 /* Ignore executables on this fs */
#define SB_I_NODEV 0x00000004 /* Ignore devices on this fs */
+#define SB_I_STABLE_WRITES 0x00000008 /* don't modify blks until WB is done */

/* sb->s_iflags to limit user namespace mounts */
#define SB_I_USERNS_VISIBLE 0x00000010 /* fstype already mounted */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 2dac3be6127127..8e3802bf03a968 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -204,10 +204,9 @@ static ssize_t stable_pages_required_show(struct device *dev,
struct device_attribute *attr,
char *page)
{
- struct backing_dev_info *bdi = dev_get_drvdata(dev);
-
- return snprintf(page, PAGE_SIZE-1, "%d\n",
- bdi_cap_stable_pages_required(bdi) ? 1 : 0);
+ dev_warn_once(dev,
+ "the stable_pages_required attribute has been removed. Use the stable_writes queue attribute instead.\n");
+ return snprintf(page, PAGE_SIZE-1, "%d\n", 0);
}
static DEVICE_ATTR_RO(stable_pages_required);

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4e4ddd67b71e58..e9c36521461aaa 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2849,7 +2849,7 @@ EXPORT_SYMBOL_GPL(wait_on_page_writeback);
*/
void wait_for_stable_page(struct page *page)
{
- if (bdi_cap_stable_pages_required(inode_to_bdi(page->mapping->host)))
+ if (page->mapping->host->i_sb->s_iflags & SB_I_STABLE_WRITES)
wait_on_page_writeback(page);
}
EXPORT_SYMBOL_GPL(wait_for_stable_page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 96a7c47dd533e4..65ef407512b5b0 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3237,7 +3237,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
goto bad_swap_unlock_inode;
}

- if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
+ if (p->bdev && blk_queue_stable_writes(p->bdev->bd_disk->queue))
p->flags |= SWP_STABLE_WRITES;

if (p->bdev && p->bdev->bd_disk->fops->rw_page)
--
2.28.0

2020-09-24 06:56:58

by Christoph Hellwig

Subject: [PATCH 05/13] bdi: initialize ->ra_pages and ->io_pages in bdi_init

Set up a readahead size by default, as very few users have a good
reason to change it. This means coda, ecryptfs, and orangefs now pick
up the default values they were previously missing, while ubifs, mtd
and vboxsf manually set the readahead to 0 to avoid it.
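
The default in question is VM_READAHEAD_PAGES; assuming its usual
definition (an assumption, not something this patch spells out):

        VM_READAHEAD_PAGES = SZ_128K / PAGE_SIZE = 32 pages (4 KiB pages)

i.e. a 128 KiB readahead window for every newly allocated bdi.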

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Acked-by: David Sterba <[email protected]> [btrfs]
Acked-by: Richard Weinberger <[email protected]> [ubifs, mtd]
---
block/blk-core.c | 2 --
drivers/mtd/mtdcore.c | 2 ++
fs/9p/vfs_super.c | 6 ++++--
fs/afs/super.c | 1 -
fs/btrfs/disk-io.c | 1 -
fs/fuse/inode.c | 1 -
fs/nfs/super.c | 9 +--------
fs/ubifs/super.c | 2 ++
fs/vboxsf/super.c | 2 ++
mm/backing-dev.c | 2 ++
10 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ca3f0f00c9435f..865d39e5be2b28 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -538,8 +538,6 @@ struct request_queue *blk_alloc_queue(int node_id)
if (!q->stats)
goto fail_stats;

- q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
- q->backing_dev_info->io_pages = VM_READAHEAD_PAGES;
q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
q->node = node_id;

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 7d930569a7dfb7..b5e5d3140f578e 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -2196,6 +2196,8 @@ static struct backing_dev_info * __init mtd_bdi_init(char *name)
bdi = bdi_alloc(NUMA_NO_NODE);
if (!bdi)
return ERR_PTR(-ENOMEM);
+ bdi->ra_pages = 0;
+ bdi->io_pages = 0;

/*
* We put '-0' suffix to the name to get the same name format as we
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 74df32be4c6a52..e34fa20acf612e 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -80,8 +80,10 @@ v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses,
if (ret)
return ret;

- if (v9ses->cache)
- sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
+ if (!v9ses->cache) {
+ sb->s_bdi->ra_pages = 0;
+ sb->s_bdi->io_pages = 0;
+ }

sb->s_flags |= SB_ACTIVE | SB_DIRSYNC;
if (!v9ses->cache)
diff --git a/fs/afs/super.c b/fs/afs/super.c
index b552357b1d1379..3a40ee752c1e3f 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -456,7 +456,6 @@ static int afs_fill_super(struct super_block *sb, struct afs_fs_context *ctx)
ret = super_setup_bdi(sb);
if (ret)
return ret;
- sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;

/* allocate the root inode and dentry */
if (as->dyn_root) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6bba7eb1fa171..047934cea25efa 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3092,7 +3092,6 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
}

sb->s_bdi->capabilities |= BDI_CAP_CGROUP_WRITEBACK;
- sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index bba747520e9b08..17b00670fb539e 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1049,7 +1049,6 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
if (err)
return err;

- sb->s_bdi->ra_pages = VM_READAHEAD_PAGES;
/* fuse does it's own writeback accounting */
sb->s_bdi->capabilities = BDI_CAP_NO_ACCT_WB | BDI_CAP_STRICTLIMIT;

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7a70287f21a2c1..f943e37853fa25 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1200,13 +1200,6 @@ static void nfs_get_cache_cookie(struct super_block *sb,
}
#endif

-static void nfs_set_readahead(struct backing_dev_info *bdi,
- unsigned long iomax_pages)
-{
- bdi->ra_pages = VM_READAHEAD_PAGES;
- bdi->io_pages = iomax_pages;
-}
-
int nfs_get_tree_common(struct fs_context *fc)
{
struct nfs_fs_context *ctx = nfs_fc2context(fc);
@@ -1251,7 +1244,7 @@ int nfs_get_tree_common(struct fs_context *fc)
MINOR(server->s_dev));
if (error)
goto error_splat_super;
- nfs_set_readahead(s->s_bdi, server->rpages);
+ s->s_bdi->io_pages = server->rpages;
server->super = s;
}

diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index a2420c900275a8..fbddb2a1c03f5e 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -2177,6 +2177,8 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
c->vi.vol_id);
if (err)
goto out_close;
+ sb->s_bdi->ra_pages = 0;
+ sb->s_bdi->io_pages = 0;

sb->s_fs_info = c;
sb->s_magic = UBIFS_SUPER_MAGIC;
diff --git a/fs/vboxsf/super.c b/fs/vboxsf/super.c
index 8fe03b4a0d2b03..8e3792177a8523 100644
--- a/fs/vboxsf/super.c
+++ b/fs/vboxsf/super.c
@@ -167,6 +167,8 @@ static int vboxsf_fill_super(struct super_block *sb, struct fs_context *fc)
err = super_setup_bdi_name(sb, "vboxsf-%d", sbi->bdi_id);
if (err)
goto fail_free;
+ sb->s_bdi->ra_pages = 0;
+ sb->s_bdi->io_pages = 0;

/* Turn source into a shfl_string and map the folder */
size = strlen(fc->source) + 1;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8e8b00627bb2d8..2dac3be6127127 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -746,6 +746,8 @@ struct backing_dev_info *bdi_alloc(int node_id)
kfree(bdi);
return NULL;
}
+ bdi->ra_pages = VM_READAHEAD_PAGES;
+ bdi->io_pages = VM_READAHEAD_PAGES;
return bdi;
}
EXPORT_SYMBOL(bdi_alloc);
--
2.28.0

2020-09-24 15:51:54

by Martin K. Petersen

Subject: Re: [PATCH 03/13] bcache: inherit the optimal I/O size


Christoph,

> Inherit the optimal I/O size setting just like the readahead window,
> as any reason to do larger I/O is not limited to readahead.

Looks fine.

Reviewed-by: Martin K. Petersen <[email protected]>

--
Martin K. Petersen Oracle Linux Engineering

2020-09-24 15:54:59

by Martin K. Petersen

Subject: Re: [PATCH 04/13] aoe: set an optimal I/O size


Christoph,

> aoe forces a larger readahead size, but any reason to do larger I/O is
> not limited to readahead. Also set the optimal I/O size, and remove
> the local constants in favor of just using SZ_2M.

Reviewed-by: Martin K. Petersen <[email protected]>

--
Martin K. Petersen Oracle Linux Engineering

2020-09-24 15:59:36

by Martin K. Petersen

Subject: Re: [PATCH 06/13] md: update the optimal I/O size on reshape


Christoph,

> The raid5 and raid10 drivers currently update the read-ahead size, but
> not the optimal I/O size on reshape. To prepare for deriving the
> read-ahead size from the optimal I/O size, make sure it is updated as
> well.

Reviewed-by: Martin K. Petersen <[email protected]>

--
Martin K. Petersen Oracle Linux Engineering

2020-09-24 19:48:30

by Jens Axboe

Subject: Re: bdi cleanups v7

On 9/24/20 12:51 AM, Christoph Hellwig wrote:
> Hi Jens,
>
> this series contains a bunch of different BDI cleanups. The biggest item
> is to isolate block drivers from the BDI in preparation for changing the
> lifetime of the block device BDI in a follow up series.

Applied, thanks.

--
Jens Axboe