2017-06-26 12:11:32

by Ming Lei

Subject: [PATCH v2 00/51] block: support multipage bvec

Hi,

This patchset brings multipage bvec into the block layer:

1) what is multipage bvec?

Multipage bvec means that one 'struct bio_vec' can hold
multiple physically contiguous pages, instead of the
single page it has held in the Linux kernel for a long time.
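
As a rough userspace illustration (not kernel code), the idea can be
modeled with a simplified vector type. The names model_bvec,
MODEL_PAGE_SIZE and model_bvec_pages() are hypothetical, and a page
frame number stands in for the kernel's 'struct page *':

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the real value is architecture dependent. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Hypothetical userspace stand-in for 'struct bio_vec'. A page frame
 * number replaces the 'struct page *' pointer used by the kernel.
 */
struct model_bvec {
	unsigned long first_pfn; /* first page of the contiguous range */
	unsigned int len;        /* bytes covered, may exceed one page */
	unsigned int offset;     /* byte offset into the first page */
};

/* Number of pages touched by one (possibly multipage) vector. */
static unsigned int model_bvec_pages(const struct model_bvec *bv)
{
	assert(bv->len > 0);
	return (bv->offset + bv->len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

With a single-page bvec this count is always 1 or 2 at most; with a
multipage bvec one vector can stand for many pages, which is why code
that equates the number of vectors with the number of pages needs care.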

2) why is multipage bvec introduced?

Kent proposed the idea[1] first.

As system RAM becomes much bigger than before, and
huge pages, transparent huge pages and memory compaction
are widely used, it is now fairly common to see physically
contiguous pages coming from the fs in I/O.
From the block layer's view, on the other hand, it isn't
necessary to store each intermediate page in a bvec;
it is enough to store the physically contiguous
'segment' in each io vector.
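
One possible way to build such segments, sketched in userspace C, is
to grow the last vector whenever the next page directly follows it in
physical memory. This is only an assumption about how pages could be
merged, not the code of this patchset; model_bio, model_bvec,
model_bio_add_page() and the table size are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */
#define MODEL_MAX_VECS  4u    /* arbitrary small table for the sketch */

struct model_bvec {
	unsigned long first_pfn; /* first page frame of the range */
	unsigned int len;        /* bytes covered by the vector */
	unsigned int offset;     /* byte offset into the first page */
};

struct model_bio {
	struct model_bvec vecs[MODEL_MAX_VECS];
	unsigned int vcnt;
};

/*
 * Add one whole page. If it directly follows the last vector in
 * physical memory, grow that vector; otherwise start a new one.
 * Returns false when the table is full.
 */
static bool model_bio_add_page(struct model_bio *bio, unsigned long pfn)
{
	if (bio->vcnt > 0) {
		struct model_bvec *last = &bio->vecs[bio->vcnt - 1];
		unsigned long end = last->offset + last->len;

		/*
		 * Contiguous only if the last vector ends on a page
		 * boundary and the new page is the next frame.
		 */
		if (end % MODEL_PAGE_SIZE == 0 &&
		    last->first_pfn + end / MODEL_PAGE_SIZE == pfn) {
			last->len += MODEL_PAGE_SIZE;
			return true;
		}
	}
	if (bio->vcnt == MODEL_MAX_VECS)
		return false;
	bio->vecs[bio->vcnt++] =
		(struct model_bvec){ pfn, MODEL_PAGE_SIZE, 0 };
	return true;
}
```

In this model, adding frames 10, 11 and 12 leaves a single vector of
three pages, while a following frame 20 opens a second vector.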

Also, huge pages are being brought to filesystems and swap
[2][6], so we can do IO on a whole hugepage at a time[3], which
requires that one bio can transfer at least one huge page
at a time. It turns out that simply changing BIO_MAX_PAGES
isn't flexible enough[3][5]. Multipage bvec fits this case very well.

With multipage bvec:

- segment handling in the block layer can be improved a lot
in the future, since it should be quite easy to convert a
multipage bvec into a segment. For example, we might
just store a segment in each bvec directly in the future.

- bio size can be increased, which should in theory improve
some high-bandwidth IO cases[4].

- inside the block layer, both bio splitting and sg mapping can
become more efficient than before by just traversing the
physically contiguous 'segment' instead of each page.

- there is an opportunity in the future to improve the memory
footprint of bvecs.

3) how is multipage bvec implemented in this patchset?

The first 18 patches comment on, or deal with, some special
cases of direct access to the bvec table.

The 2nd part(19~29) implements multipage bvec in the block layer:

- put all tricks into bvec/bio/rq iterators, and as long as
drivers and fs use these standard iterators, they are happy
with multipage bvec

- use multipage bvec to split bio and map sg

- bio_for_each_segment_all() changes
this helper passes a pointer to each bvec directly to the user, so
it has to be changed. Two new helpers(bio_for_each_segment_all_sp()
and bio_for_each_segment_all_mp()) are introduced.

The 3rd part(30~49) converts current users of bio_for_each_segment_all()
to bio_for_each_segment_all_sp()/bio_for_each_segment_all_mp().

The last part(50~51) enables multipage bvec.

These patches can be found in the following git tree:

https://github.com/ming1/linux/commits/mp-bvec-1.4-v4.12-rc

Thanks to Christoph for looking at the early version and providing
very good suggestions, such as: introduce bio_init_with_vec_table(),
remove other unnecessary helpers for cleanup, and so on.

Any comments are welcome!

V2:
- direct access to the bvec table in raid has been cleaned up, so
the NO_MP flag is dropped
- rebase on Neil Brown's recent changes to bio and bounce code
- reorganize the patchset

V1:
- against v4.10-rc1; some cleanups from V0 are in -linus already
- handle queue_virt_boundary() in mp bvec change and make NVMe happy
- further BTRFS cleanup
- remove QUEUE_FLAG_SPLIT_MP
- rename the two new helpers of bio_for_each_segment_all()
- fix bounce conversion
- address comments in V0

[1], http://marc.info/?l=linux-kernel&m=141680246629547&w=2
[2], https://patchwork.kernel.org/patch/9451523/
[3], http://marc.info/?t=147735447100001&r=1&w=2
[4], http://marc.info/?l=linux-mm&m=147745525801433&w=2
[5], http://marc.info/?t=149569484500007&r=1&w=2
[6], http://marc.info/?t=149820215300004&r=1&w=2

Ming Lei (51):
block: drbd: comment on direct access bvec table
block: loop: comment on direct access to bvec table
kernel/power/swap.c: comment on direct access to bvec table
mm: page_io.c: comment on direct access to bvec table
fs/buffer: comment on direct access to bvec table
f2fs: f2fs_read_end_io: comment on direct access to bvec table
bcache: comment on direct access to bvec table
block: comment on bio_alloc_pages()
block: comment on bio_iov_iter_get_pages()
dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE
md: raid1: initialize bvec table via bio_add_page()
md: raid10: avoid to access bvec table directly
btrfs: avoid access to .bi_vcnt directly
btrfs: avoid to access bvec table directly for a cloned bio
btrfs: comment on direct access bvec table
block: bounce: avoid direct access to bvec table
bvec_iter: introduce BVEC_ITER_ALL_INIT
block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq
block: comments on bio_for_each_segment[_all]
block: introduce multipage/single page bvec helpers
block: implement sp version of bvec iterator helpers
block: introduce bio_for_each_segment_mp()
blk-merge: compute bio->bi_seg_front_size efficiently
block: blk-merge: try to make front segments in full size
block: blk-merge: remove unnecessary check
block: use bio_for_each_segment_mp() to compute segments count
block: use bio_for_each_segment_mp() to map sg
block: introduce bvec_for_each_sp_bvec()
block: bio: introduce single/multi page version of
bio_for_each_segment_all()
block: introduce bvec_get_last_page()
fs/buffer.c: use bvec iterator to truncate the bio
btrfs: use bvec_get_last_page to get bio's last page
block: deal with dirtying pages for multipage bvec
block: convert to singe/multi page version of
bio_for_each_segment_all()
bcache: convert to bio_for_each_segment_all_sp()
md: raid1: convert to bio_for_each_segment_all_sp()
dm-crypt: don't clear bvec->bv_page in crypt_free_buffer_pages()
dm-crypt: convert to bio_for_each_segment_all_sp()
fs/mpage: convert to bio_for_each_segment_all_sp()
fs/block: convert to bio_for_each_segment_all_sp()
fs/iomap: convert to bio_for_each_segment_all_sp()
ext4: convert to bio_for_each_segment_all_sp()
xfs: convert to bio_for_each_segment_all_sp()
gfs2: convert to bio_for_each_segment_all_sp()
f2fs: convert to bio_for_each_segment_all_sp()
exofs: convert to bio_for_each_segment_all_sp()
fs: crypto: convert to bio_for_each_segment_all_sp()
fs/btrfs: convert to bio_for_each_segment_all_sp()
fs/direct-io: convert to bio_for_each_segment_all_sp()
block: enable multipage bvecs
block: bio: pass segments to bio if bio_add_page() is bypassed

block/bio.c | 137 ++++++++++++++++++++----
block/blk-merge.c | 226 +++++++++++++++++++++++++++++++--------
block/blk-zoned.c | 5 +-
block/bounce.c | 35 +++---
drivers/block/drbd/drbd_bitmap.c | 1 +
drivers/block/loop.c | 5 +
drivers/md/bcache/btree.c | 4 +-
drivers/md/bcache/super.c | 6 ++
drivers/md/bcache/util.c | 7 ++
drivers/md/dm-crypt.c | 4 +-
drivers/md/dm.c | 11 +-
drivers/md/raid1.c | 30 +++---
drivers/md/raid10.c | 22 +++-
fs/block_dev.c | 6 +-
fs/btrfs/compression.c | 12 ++-
fs/btrfs/disk-io.c | 3 +-
fs/btrfs/extent_io.c | 39 +++++--
fs/btrfs/extent_io.h | 2 +-
fs/btrfs/inode.c | 22 +++-
fs/btrfs/raid56.c | 6 +-
fs/buffer.c | 11 +-
fs/crypto/bio.c | 3 +-
fs/direct-io.c | 4 +-
fs/exofs/ore.c | 3 +-
fs/exofs/ore_raid.c | 3 +-
fs/ext4/page-io.c | 3 +-
fs/ext4/readpage.c | 3 +-
fs/f2fs/data.c | 13 ++-
fs/gfs2/lops.c | 3 +-
fs/gfs2/meta_io.c | 3 +-
fs/iomap.c | 3 +-
fs/mpage.c | 3 +-
fs/xfs/xfs_aops.c | 3 +-
include/linux/bio.h | 72 +++++++++++--
include/linux/blk_types.h | 6 ++
include/linux/bvec.h | 142 ++++++++++++++++++++++--
kernel/power/swap.c | 2 +
mm/page_io.c | 2 +
38 files changed, 714 insertions(+), 151 deletions(-)

--
2.9.4


2017-06-26 12:11:47

by Ming Lei

Subject: [PATCH v2 01/51] block: drbd: comment on direct access bvec table

Signed-off-by: Ming Lei <[email protected]>
---
drivers/block/drbd/drbd_bitmap.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 809fd245c3dc..70890d950dc9 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -953,6 +953,7 @@ static void drbd_bm_endio(struct bio *bio)
struct drbd_bm_aio_ctx *ctx = bio->bi_private;
struct drbd_device *device = ctx->device;
struct drbd_bitmap *b = device->bitmap;
+ /* single page bio, safe for multipage bvec */
unsigned int idx = bm_page_to_idx(bio->bi_io_vec[0].bv_page);

if ((ctx->flags & BM_AIO_COPY_PAGES) == 0 &&
--
2.9.4

2017-06-26 12:12:10

by Ming Lei

Subject: [PATCH v2 02/51] block: loop: comment on direct access to bvec table

Signed-off-by: Ming Lei <[email protected]>
---
drivers/block/loop.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0de11444e317..88063ab17e9a 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -487,6 +487,11 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
/* nomerge for loop request queue */
WARN_ON(cmd->rq->bio != cmd->rq->biotail);

+ /*
+ * For multipage bvec support, it is safe to pass the bvec
+ * table to iov iterator, because iov iter still uses bvec
+ * iter helpers to traverse bvec.
+ */
bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
iov_iter_bvec(&iter, ITER_BVEC | rw, bvec,
bio_segments(bio), blk_rq_bytes(cmd->rq));
--
2.9.4

2017-06-26 12:12:41

by Ming Lei

Subject: [PATCH v2 03/51] kernel/power/swap.c: comment on direct access to bvec table

Cc: "Rafael J. Wysocki" <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
kernel/power/swap.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 57d22571f306..aa52ccc03fcc 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -238,6 +238,8 @@ static void hib_init_batch(struct hib_bio_batch *hb)
static void hib_end_io(struct bio *bio)
{
struct hib_bio_batch *hb = bio->bi_private;
+
+ /* single page bio, safe for multipage bvec */
struct page *page = bio->bi_io_vec[0].bv_page;

if (bio->bi_status) {
--
2.9.4

2017-06-26 12:12:55

by Ming Lei

Subject: [PATCH v2 04/51] mm: page_io.c: comment on direct access to bvec table

Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
mm/page_io.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/page_io.c b/mm/page_io.c
index b6c4ac388209..11c6f4a9a25b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -43,6 +43,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,

void end_swap_bio_write(struct bio *bio)
{
+ /* single page bio, safe for multipage bvec */
struct page *page = bio->bi_io_vec[0].bv_page;

if (bio->bi_status) {
@@ -116,6 +117,7 @@ static void swap_slot_free_notify(struct page *page)

static void end_swap_bio_read(struct bio *bio)
{
+ /* single page bio, safe for multipage bvec */
struct page *page = bio->bi_io_vec[0].bv_page;
struct task_struct *waiter = bio->bi_private;

--
2.9.4

2017-06-26 12:13:11

by Ming Lei

Subject: [PATCH v2 05/51] fs/buffer: comment on direct access to bvec table

Signed-off-by: Ming Lei <[email protected]>
---
fs/buffer.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 4d5d03b42e11..1910f539770b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3052,8 +3052,13 @@ static void end_bio_bh_io_sync(struct bio *bio)
void guard_bio_eod(int op, struct bio *bio)
{
sector_t maxsector;
- struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];
unsigned truncated_bytes;
+ /*
+ * It is safe to truncate the last bvec in the following way
+ * even though multipage bvec is supported, but we need to
+ * fix the parameters passed to zero_user().
+ */
+ struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];

maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
if (!maxsector)
--
2.9.4

2017-06-26 12:13:49

by Ming Lei

Subject: [PATCH v2 06/51] f2fs: f2fs_read_end_io: comment on direct access to bvec table

Cc: Jaegeuk Kim <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
fs/f2fs/data.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 7697d03e8a98..622c44a1be78 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -56,6 +56,10 @@ static void f2fs_read_end_io(struct bio *bio)
int i;

#ifdef CONFIG_F2FS_FAULT_INJECTION
+ /*
+ * It is still safe to retrieve the 1st page of the bio
+ * in this way after supporting multipage bvec.
+ */
if (time_to_inject(F2FS_P_SB(bio->bi_io_vec->bv_page), FAULT_IO)) {
f2fs_show_injection_info(FAULT_IO);
bio->bi_status = BLK_STS_IOERR;
--
2.9.4

2017-06-26 12:14:02

by Ming Lei

Subject: [PATCH v2 07/51] bcache: comment on direct access to bvec table

All of these direct accesses look safe once multipage bvec is supported.

Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
drivers/md/bcache/btree.c | 1 +
drivers/md/bcache/super.c | 6 ++++++
drivers/md/bcache/util.c | 7 +++++++
3 files changed, 14 insertions(+)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 866dcf78ff8e..3da595ae565b 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -431,6 +431,7 @@ static void do_btree_node_write(struct btree *b)

continue_at(cl, btree_node_write_done, NULL);
} else {
+ /* No harm for multipage bvec since the new is just allocated */
b->bio->bi_vcnt = 0;
bch_bio_map(b->bio, i);

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 8352fad765f6..6808f548cd13 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -208,6 +208,7 @@ static void write_bdev_super_endio(struct bio *bio)

static void __write_super(struct cache_sb *sb, struct bio *bio)
{
+ /* single page bio, safe for multipage bvec */
struct cache_sb *out = page_address(bio->bi_io_vec[0].bv_page);
unsigned i;

@@ -1154,6 +1155,8 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
dc->bdev->bd_holder = dc;

bio_init(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1);
+
+ /* single page bio, safe for multipage bvec */
dc->sb_bio.bi_io_vec[0].bv_page = sb_page;
get_page(sb_page);

@@ -1799,6 +1802,7 @@ void bch_cache_release(struct kobject *kobj)
for (i = 0; i < RESERVE_NR; i++)
free_fifo(&ca->free[i]);

+ /* single page bio, safe for multipage bvec */
if (ca->sb_bio.bi_inline_vecs[0].bv_page)
put_page(ca->sb_bio.bi_io_vec[0].bv_page);

@@ -1854,6 +1858,8 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page,
ca->bdev->bd_holder = ca;

bio_init(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1);
+
+ /* single page bio, safe for multipage bvec */
ca->sb_bio.bi_io_vec[0].bv_page = sb_page;
get_page(sb_page);

diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 8c3a938f4bf0..11b4230ea6ad 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -223,6 +223,13 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
: 0;
}

+/*
+ * Generally it isn't good to access .bi_io_vec and .bi_vcnt
+ * directly, the preferred way is bio_add_page, but in
+ * this case, bch_bio_map() supposes that the bvec table
+ * is empty, so it is safe to access .bi_vcnt & .bi_io_vec
+ * in this way even after multipage bvec is supported.
+ */
void bch_bio_map(struct bio *bio, void *base)
{
size_t size = bio->bi_iter.bi_size;
--
2.9.4

2017-06-26 12:14:14

by Ming Lei

Subject: [PATCH v2 08/51] block: comment on bio_alloc_pages()

This patch adds a comment on the usage of bio_alloc_pages().

Signed-off-by: Ming Lei <[email protected]>
---
block/bio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index 89a51bd49ab7..a5db117e8dfa 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -972,7 +972,9 @@ EXPORT_SYMBOL(bio_advance);
* @bio: bio to allocate pages for
* @gfp_mask: flags for allocation
*
- * Allocates pages up to @bio->bi_vcnt.
+ * Allocates pages up to @bio->bi_vcnt, and this function should only
+ * be called on a new initialized bio, which means all pages aren't added
+ * to the bio via bio_add_page() yet.
*
* Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages are
* freed.
--
2.9.4

2017-06-26 12:14:23

by Ming Lei

Subject: [PATCH v2 09/51] block: comment on bio_iov_iter_get_pages()

bio_iov_iter_get_pages() uses the unused bvec space to
store a page pointer array temporarily, and this patch
comments on this usage with respect to multipage bvec support.

Signed-off-by: Ming Lei <[email protected]>
---
block/bio.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index a5db117e8dfa..bf7f25889f6e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -870,6 +870,10 @@ EXPORT_SYMBOL(bio_add_page);
*
* Pins as many pages from *iter and appends them to @bio's bvec array. The
* pages will have to be released using put_page() when done.
+ *
+ * The hacking way of using bvec table as page pointer array is safe
+ * even after multipage bvec is introduced because that space can be
+ * thought as unused by bio_add_page().
*/
int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
{
--
2.9.4

2017-06-26 12:14:50

by Ming Lei

Subject: [PATCH v2 10/51] dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE

For BIO based DM, some targets, such as the crypt target, aren't
ready to deal with an incoming bio bigger than 1Mbyte.

Cc: Mike Snitzer <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
drivers/md/dm.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 96bd13e581cd..49583c623cdd 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -921,7 +921,16 @@ int dm_set_target_max_io_len(struct dm_target *ti, sector_t len)
return -EINVAL;
}

- ti->max_io_len = (uint32_t) len;
+ /*
+ * BIO based queue uses its own splitting. When multipage bvecs
+ * is switched on, size of the incoming bio may be too big to
+ * be handled in some targets, such as crypt.
+ *
+ * When these targets are ready for the big bio, we can remove
+ * the limit.
+ */
+ ti->max_io_len = min_t(uint32_t, len,
+ (BIO_MAX_PAGES * PAGE_SIZE));

return 0;
}
--
2.9.4

2017-06-26 12:15:06

by Ming Lei

Subject: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()

We will support multipage bvec soon, so initialize the bvec
table in the standard way instead of writing the
table directly. Otherwise it won't work any more once
multipage bvec is enabled.

Cc: Shaohua Li <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
drivers/md/raid1.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3febfc8391fb..835c42396861 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2086,10 +2086,8 @@ static void process_checks(struct r1bio *r1_bio)
/* Fix variable parts of all bios */
vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
for (i = 0; i < conf->raid_disks * 2; i++) {
- int j;
int size;
blk_status_t status;
- struct bio_vec *bi;
struct bio *b = r1_bio->bios[i];
struct resync_pages *rp = get_resync_pages(b);
if (b->bi_end_io != end_sync_read)
@@ -2098,8 +2096,6 @@ static void process_checks(struct r1bio *r1_bio)
status = b->bi_status;
bio_reset(b);
b->bi_status = status;
- b->bi_vcnt = vcnt;
- b->bi_iter.bi_size = r1_bio->sectors << 9;
b->bi_iter.bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
b->bi_bdev = conf->mirrors[i].rdev->bdev;
@@ -2107,15 +2103,20 @@ static void process_checks(struct r1bio *r1_bio)
rp->raid_bio = r1_bio;
b->bi_private = rp;

- size = b->bi_iter.bi_size;
- bio_for_each_segment_all(bi, b, j) {
- bi->bv_offset = 0;
- if (size > PAGE_SIZE)
- bi->bv_len = PAGE_SIZE;
- else
- bi->bv_len = size;
- size -= PAGE_SIZE;
- }
+ /* initialize bvec table again */
+ rp->idx = 0;
+ size = r1_bio->sectors << 9;
+ do {
+ struct page *page = resync_fetch_page(rp, rp->idx++);
+ int len = min_t(int, size, PAGE_SIZE);
+
+ /*
+ * won't fail because the vec table is big
+ * enough to hold all these pages
+ */
+ bio_add_page(b, page, len, 0);
+ size -= len;
+ } while (rp->idx < RESYNC_PAGES && size > 0);
}
for (primary = 0; primary < conf->raid_disks * 2; primary++)
if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
--
2.9.4

2017-06-26 12:15:15

by Ming Lei

Subject: [PATCH v2 12/51] md: raid10: avoid to access bvec table directly

Inside sync_request_write(), .bi_vcnt is written after the bio
is reset. This won't work any more after multipage bvec
is enabled.

So reset_bvec_table() is introduced to re-add these pages to the
bio, and .bi_vcnt no longer needs to be touched.

Cc: Shaohua Li <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
drivers/md/raid10.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5026e7ad51d3..2fca1fe67092 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1995,6 +1995,24 @@ static void end_sync_write(struct bio *bio)
end_sync_request(r10_bio);
}

+/* called after bio_reset() */
+static void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size)
+{
+ /* initialize bvec table again */
+ rp->idx = 0;
+ do {
+ struct page *page = resync_fetch_page(rp, rp->idx++);
+ int len = min_t(int, size, PAGE_SIZE);
+
+ /*
+ * won't fail because the vec table is big
+ * enough to hold all these pages
+ */
+ bio_add_page(bio, page, len, 0);
+ size -= len;
+ } while (rp->idx < RESYNC_PAGES && size > 0);
+}
+
/*
* Note: sync and recover and handled very differently for raid10
* This code is for resync.
@@ -2087,8 +2105,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
rp = get_resync_pages(tbio);
bio_reset(tbio);

- tbio->bi_vcnt = vcnt;
- tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
+ reset_bvec_table(tbio, rp, fbio->bi_iter.bi_size);
+
rp->raid_bio = r10_bio;
tbio->bi_private = rp;
tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
--
2.9.4

2017-06-26 12:15:21

by Ming Lei

Subject: [PATCH v2 13/51] btrfs: avoid access to .bi_vcnt directly

BTRFS uses bio->bi_vcnt to figure out the number of pages. This
is no longer correct once we start to enable multipage
bvec.

So use bio_for_each_segment_all() to do that instead.

Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
fs/btrfs/extent_io.c | 21 +++++++++++++++++----
fs/btrfs/extent_io.h | 2 +-
2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0863164d97d2..5b453cada1ea 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2258,7 +2258,7 @@ int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
return 0;
}

-int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
+int btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
struct io_failure_record *failrec, int failed_mirror)
{
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2282,7 +2282,7 @@ int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
* a) deliver good data to the caller
* b) correct the bad sectors on disk
*/
- if (failed_bio->bi_vcnt > 1) {
+ if (failed_bio_pages > 1) {
/*
* to fulfill b), we need to know the exact failing sectors, as
* we don't want to rewrite any more than the failed ones. thus,
@@ -2355,6 +2355,17 @@ struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
return bio;
}

+static unsigned int get_bio_pages(struct bio *bio)
+{
+ unsigned i;
+ struct bio_vec *bv;
+
+ bio_for_each_segment_all(bv, bio, i)
+ ;
+
+ return i;
+}
+
/*
* this is a generic handler for readpage errors (default
* readpage_io_failed_hook). if other copies exist, read those and write back
@@ -2375,6 +2386,7 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
int read_mode = 0;
blk_status_t status;
int ret;
+ unsigned failed_bio_pages = get_bio_pages(failed_bio);

BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);

@@ -2382,13 +2394,14 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
if (ret)
return ret;

- ret = btrfs_check_repairable(inode, failed_bio, failrec, failed_mirror);
+ ret = btrfs_check_repairable(inode, failed_bio_pages, failrec,
+ failed_mirror);
if (!ret) {
free_io_failure(failure_tree, tree, failrec);
return -EIO;
}

- if (failed_bio->bi_vcnt > 1)
+ if (failed_bio_pages > 1)
read_mode |= REQ_FAILFAST_DEV;

phy_offset >>= inode->i_sb->s_blocksize_bits;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d4942d94a16b..90681d1f0786 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -539,7 +539,7 @@ void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start,
u64 end);
int btrfs_get_io_failure_record(struct inode *inode, u64 start, u64 end,
struct io_failure_record **failrec_ret);
-int btrfs_check_repairable(struct inode *inode, struct bio *failed_bio,
+int btrfs_check_repairable(struct inode *inode, unsigned failed_bio_pages,
struct io_failure_record *failrec, int fail_mirror);
struct bio *btrfs_create_repair_bio(struct inode *inode, struct bio *failed_bio,
struct io_failure_record *failrec,
--
2.9.4

2017-06-26 12:15:39

by Ming Lei

Subject: [PATCH v2 15/51] btrfs: comment on direct access bvec table

Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
fs/btrfs/compression.c | 4 ++++
fs/btrfs/inode.c | 12 ++++++++++++
2 files changed, 16 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 2c0b7b57fcd5..5972f74354ca 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -541,6 +541,10 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,

/* we need the actual starting offset of this extent in the file */
read_lock(&em_tree->lock);
+ /*
+ * It is still safe to retrieve the 1st page of the bio
+ * in this way after supporting multipage bvec.
+ */
em = lookup_extent_mapping(em_tree,
page_offset(bio->bi_io_vec->bv_page),
PAGE_SIZE);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4ab02b34f029..7e725d84917b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8055,6 +8055,12 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
if (bio->bi_status)
goto end;

+ /*
+ * WARNING:
+ *
+ * With multipage bvec, the following way of direct access to
+ * bvec table is only safe if the bio includes single page.
+ */
ASSERT(bio->bi_vcnt == 1);
io_tree = &BTRFS_I(inode)->io_tree;
failure_tree = &BTRFS_I(inode)->io_failure_tree;
@@ -8146,6 +8152,12 @@ static void btrfs_retry_endio(struct bio *bio)

uptodate = 1;

+ /*
+ * WARNING:
+ *
+ * With multipage bvec, the following way of direct access to
+ * bvec table is only safe if the bio includes single page.
+ */
ASSERT(bio->bi_vcnt == 1);
ASSERT(bio->bi_io_vec->bv_len == btrfs_inode_sectorsize(done->inode));

--
2.9.4

2017-06-26 12:15:52

by Ming Lei

Subject: [PATCH v2 16/51] block: bounce: avoid direct access to bvec table

We will support multipage bvecs in the future, so switch to the
iterator for getting the bv_page of each bvec from the original bio.
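
The reason index arithmetic on the bvec table stops working is that
the i-th page of one bio need not live in the i-th vector of another
once vectors can span several pages. A byte-based iterator avoids the
problem. The following userspace sketch models that idea only;
model_bvec, model_iter and model_iter_advance() are hypothetical names
and not the kernel's bvec_iter implementation:

```c
#include <assert.h>

struct model_bvec {
	unsigned int len; /* bytes covered by this vector */
};

struct model_iter {
	unsigned int size; /* bytes remaining in the whole bio */
	unsigned int idx;  /* index of the current vector */
	unsigned int done; /* bytes already consumed in that vector */
};

/* Advance the iterator by 'bytes', crossing vector boundaries as needed. */
static void model_iter_advance(const struct model_bvec *table,
			       struct model_iter *it, unsigned int bytes)
{
	assert(bytes <= it->size);
	it->size -= bytes;
	while (bytes) {
		unsigned int left = table[it->idx].len - it->done;
		unsigned int step = bytes < left ? bytes : left;

		bytes -= step;
		it->done += step;
		if (it->done == table[it->idx].len) {
			/* Current vector exhausted, move to the next one. */
			it->done = 0;
			it->idx++;
		}
	}
}
```

Advancing by one page at a time then lands in the right vector no
matter how many pages each vector holds.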

Signed-off-by: Ming Lei <[email protected]>
---
block/bounce.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index 916ee9a9a216..4eea1b2d8618 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -135,21 +135,22 @@ static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
static void bounce_end_io(struct bio *bio, mempool_t *pool)
{
struct bio *bio_orig = bio->bi_private;
- struct bio_vec *bvec, *org_vec;
+ struct bio_vec *bvec, orig_vec;
int i;
- int start = bio_orig->bi_iter.bi_idx;
+ struct bvec_iter orig_iter = bio_orig->bi_iter;

/*
* free up bounce indirect pages used
*/
bio_for_each_segment_all(bvec, bio, i) {
- org_vec = bio_orig->bi_io_vec + i + start;
-
- if (bvec->bv_page == org_vec->bv_page)
- continue;
+ orig_vec = bio_iter_iovec(bio_orig, orig_iter);
+ if (bvec->bv_page == orig_vec.bv_page)
+ goto next;

dec_zone_page_state(bvec->bv_page, NR_BOUNCE);
mempool_free(bvec->bv_page, pool);
+ next:
+ bio_advance_iter(bio_orig, &orig_iter, orig_vec.bv_len);
}

bio_orig->bi_status = bio->bi_status;
--
2.9.4

2017-06-26 12:16:11

by Ming Lei

Subject: [PATCH v2 19/51] block: comments on bio_for_each_segment[_all]

This patch clarifies the fact that even though both
bio_for_each_segment() and bio_for_each_segment_all()
are named _segment/_segment_all, they still return
one page in each vector, instead of a real segment(multipage bvec).

With the coming multipage bvec, both helpers could
return a real segment(multipage bvec),
but the callers(users) of the two helpers may not be
capable of handling a multipage bvec or real
segment, so we keep the interfaces of the helpers
unchanged. New helpers for returning a multipage bvec(real segment)
will be introduced later.

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/bio.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 4907bea03908..d425be4d1ced 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -155,7 +155,10 @@ static inline void *bio_data(struct bio *bio)

/*
* drivers should _never_ use the all version - the bio may have been split
- * before it got to the driver and the driver won't own all of it
+ * before it got to the driver and the driver won't own all of it.
+ *
+ * Even though the helper is named as _segment_all, it still returns
+ * page one by one instead of real segment.
*/
#define bio_for_each_segment_all(bvl, bio, i) \
for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)
@@ -177,6 +180,10 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
((bvl = bio_iter_iovec((bio), (iter))), 1); \
bio_advance_iter((bio), &(iter), (bvl).bv_len))

+/*
+ * Even though the helper is named as _segment, it still returns
+ * page one by one instead of real segment.
+ */
#define bio_for_each_segment(bvl, bio, iter) \
__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)

--
2.9.4

2017-06-26 12:16:31

by Ming Lei

Subject: [PATCH v2 21/51] block: implement sp version of bvec iterator helpers

This patch implements the singlepage version of the following
3 helpers:
- bvec_iter_offset_sp()
- bvec_iter_len_sp()
- bvec_iter_page_sp()

So that one multipage bvec can be split into singlepage
bvecs, which keeps users of the current bvec iterator happy.
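
The arithmetic behind the three helpers can be tried out in userspace.
The sketch below is a simplified model, assuming 4 KiB pages and
ignoring the iterator's remaining-size limit; model_bvec, model_sp and
model_sp_at() are hypothetical names, and a page frame number stands
in for 'struct page *':

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

struct model_bvec {
	unsigned long first_pfn; /* first page frame of the range */
	unsigned int len;        /* bytes covered by the vector */
	unsigned int offset;     /* byte offset into the first page */
};

/* One single-page piece of a multipage vector. */
struct model_sp {
	unsigned long pfn;
	unsigned int len;
	unsigned int offset;
};

/*
 * Return the single-page piece that starts 'done' bytes into 'bv',
 * following the offset/len/page arithmetic of the *_sp() helpers.
 */
static struct model_sp model_sp_at(const struct model_bvec *bv,
				   unsigned int done)
{
	struct model_sp sp;
	unsigned int mp_offset, mp_len;

	assert(done < bv->len);
	mp_offset = bv->offset + done; /* offset within the whole vector */
	mp_len = bv->len - done;       /* bytes left in the vector */

	sp.pfn = bv->first_pfn + mp_offset / MODEL_PAGE_SIZE;
	sp.offset = mp_offset % MODEL_PAGE_SIZE;
	sp.len = MODEL_PAGE_SIZE - sp.offset;
	if (sp.len > mp_len)
		sp.len = mp_len;
	return sp;
}
```

Calling model_sp_at() repeatedly, each time adding the returned len to
'done', walks the multipage vector one page-sized piece at a time.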

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/bvec.h | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index f52587e283d4..61632e9db3b8 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -22,6 +22,7 @@

#include <linux/kernel.h>
#include <linux/bug.h>
+#include <linux/mm.h>

/*
* What is multipage bvecs(segment)?
@@ -95,14 +96,25 @@ struct bvec_iter {
#define bvec_iter_offset_mp(bvec, iter) \
(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)

+#define bvec_iter_page_idx_mp(bvec, iter) \
+ (bvec_iter_offset_mp((bvec), (iter)) / PAGE_SIZE)
+
+
/*
* <page, offset,length> of singlepage(sp) segment.
*
* This helpers will be implemented for building sp bvec in flight.
*/
-#define bvec_iter_offset_sp(bvec, iter) bvec_iter_offset_mp((bvec), (iter))
-#define bvec_iter_len_sp(bvec, iter) bvec_iter_len_mp((bvec), (iter))
-#define bvec_iter_page_sp(bvec, iter) bvec_iter_page_mp((bvec), (iter))
+#define bvec_iter_offset_sp(bvec, iter) \
+ (bvec_iter_offset_mp((bvec), (iter)) % PAGE_SIZE)
+
+#define bvec_iter_len_sp(bvec, iter) \
+ min_t(unsigned, bvec_iter_len_mp((bvec), (iter)), \
+ (PAGE_SIZE - (bvec_iter_offset_sp((bvec), (iter)))))
+
+#define bvec_iter_page_sp(bvec, iter) \
+ nth_page(bvec_iter_page_mp((bvec), (iter)), \
+ bvec_iter_page_idx_mp((bvec), (iter)))

/* current interfaces support sp style at default */
#define bvec_iter_page(bvec, iter) bvec_iter_page_sp((bvec), (iter))
--
2.9.4
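
As an illustration of the arithmetic behind the three sp helpers in this
patch, here is a minimal userspace sketch. It is not kernel code: struct
mp_bvec, sp_view() and DEMO_PAGE_SIZE are hypothetical names, a page index
stands in for 'struct page *', and the clamp against iter.bi_size done by
bvec_iter_len_mp() is left out for brevity.

```c
#define DEMO_PAGE_SIZE 4096u	/* assumed page size, for the example only */

/* Hypothetical stand-in for struct bio_vec; page_idx replaces bv_page. */
struct mp_bvec {
	unsigned page_idx;
	unsigned len;
	unsigned offset;
};

/*
 * Return the singlepage piece that starts 'done' bytes into the
 * multipage bvec 'mp' ('done' plays the role of iter.bi_bvec_done).
 */
static struct mp_bvec sp_view(const struct mp_bvec *mp, unsigned done)
{
	unsigned off_mp = mp->offset + done;	/* like bvec_iter_offset_mp() */
	unsigned remain = mp->len - done;	/* bytes left in the mp bvec */
	struct mp_bvec sp;

	/* like bvec_iter_page_sp(): step to the page holding off_mp */
	sp.page_idx = mp->page_idx + off_mp / DEMO_PAGE_SIZE;
	/* like bvec_iter_offset_sp(): offset inside that page */
	sp.offset = off_mp % DEMO_PAGE_SIZE;
	/* like bvec_iter_len_sp(): never cross the page boundary */
	sp.len = DEMO_PAGE_SIZE - sp.offset;
	if (remain < sp.len)
		sp.len = remain;
	return sp;
}
```

Calling sp_view() repeatedly, each time advancing 'done' by the returned
length, walks the multipage bvec one page-bounded piece at a time.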

2017-06-26 12:16:48

by Ming Lei

Subject: [PATCH v2 23/51] blk-merge: compute bio->bi_seg_front_size efficiently

It is enough to check and compute bio->bi_seg_front_size just
after the 1st segment is found, but the current code checks it
for every bvec, which is inefficient.

This patch follows the approach used in __blk_recalc_rq_segments()
for computing bio->bi_seg_front_size, which is more efficient
and makes the code more readable too.

Signed-off-by: Ming Lei <[email protected]>
---
block/blk-merge.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5df13041b851..821b9c206308 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -145,22 +145,21 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
bvprvp = &bvprv;
sectors += bv.bv_len >> 9;

- if (nsegs == 1 && seg_size > front_seg_size)
- front_seg_size = seg_size;
continue;
}
new_segment:
if (nsegs == queue_max_segments(q))
goto split;

+ if (nsegs == 1 && seg_size > front_seg_size)
+ front_seg_size = seg_size;
+
nsegs++;
bvprv = bv;
bvprvp = &bvprv;
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;

- if (nsegs == 1 && seg_size > front_seg_size)
- front_seg_size = seg_size;
}

do_split = false;
@@ -173,6 +172,8 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
bio = new;
}

+ if (nsegs == 1 && seg_size > front_seg_size)
+ front_seg_size = seg_size;
bio->bi_seg_front_size = front_seg_size;
if (seg_size > bio->bi_seg_back_size)
bio->bi_seg_back_size = seg_size;
--
2.9.4

2017-06-26 12:16:54

by Ming Lei

Subject: [PATCH v2 24/51] block: blk-merge: try to make front segments in full size

When merging one bvec into a segment, if the bvec is too big
to merge, the current policy is to move the whole bvec into a
new segment.

This patch changes the policy to maximize the size of front
segments: in the above situation, part of the bvec is merged
into the current segment, and the remainder is put into the
next segment.

This patch prepares for multipage bvec support, because this
case can become quite common then and we should try to make
front segments full size.

Signed-off-by: Ming Lei <[email protected]>
---
block/blk-merge.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 49 insertions(+), 5 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 821b9c206308..bf7a0fa0199f 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -108,6 +108,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
bool do_split = true;
struct bio *new = NULL;
const unsigned max_sectors = get_max_io_size(q, bio);
+ unsigned advance = 0;

bio_for_each_segment(bv, bio, iter) {
/*
@@ -133,12 +134,32 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
}

if (bvprvp && blk_queue_cluster(q)) {
- if (seg_size + bv.bv_len > queue_max_segment_size(q))
- goto new_segment;
if (!BIOVEC_PHYS_MERGEABLE(bvprvp, &bv))
goto new_segment;
if (!BIOVEC_SEG_BOUNDARY(q, bvprvp, &bv))
goto new_segment;
+ if (seg_size + bv.bv_len > queue_max_segment_size(q)) {
+ /*
+ * On assumption is that initial value of
+ * @seg_size(equals to bv.bv_len) won't be
+ * bigger than max segment size, but will
+ * becomes false after multipage bvec comes.
+ */
+ advance = queue_max_segment_size(q) - seg_size;
+
+ if (advance > 0) {
+ seg_size += advance;
+ sectors += advance >> 9;
+ bv.bv_len -= advance;
+ bv.bv_offset += advance;
+ }
+
+ /*
+ * Still need to put remainder of current
+ * bvec into a new segment.
+ */
+ goto new_segment;
+ }

seg_size += bv.bv_len;
bvprv = bv;
@@ -160,6 +181,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;

+ /* restore the bvec for iterator */
+ if (advance) {
+ bv.bv_len += advance;
+ bv.bv_offset -= advance;
+ advance = 0;
+ }
}

do_split = false;
@@ -360,16 +387,29 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
{

int nbytes = bvec->bv_len;
+ unsigned advance = 0;

if (*sg && *cluster) {
- if ((*sg)->length + nbytes > queue_max_segment_size(q))
- goto new_segment;
-
if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
goto new_segment;
if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
goto new_segment;

+ /*
+ * try best to merge part of the bvec into previous
+ * segment and follow same policy with
+ * blk_bio_segment_split()
+ */
+ if ((*sg)->length + nbytes > queue_max_segment_size(q)) {
+ advance = queue_max_segment_size(q) - (*sg)->length;
+ if (advance) {
+ (*sg)->length += advance;
+ bvec->bv_offset += advance;
+ bvec->bv_len -= advance;
+ }
+ goto new_segment;
+ }
+
(*sg)->length += nbytes;
} else {
new_segment:
@@ -392,6 +432,10 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,

sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
(*nsegs)++;
+
+ /* for making iterator happy */
+ bvec->bv_offset -= advance;
+ bvec->bv_len += advance;
}
*bvprv = *bvec;
}
--
2.9.4
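
The 'advance' computation in this patch can be sketched in isolation as
follows. This is a userspace illustration under stated assumptions, not the
kernel code: struct front_fill and fill_front() are hypothetical names, and
the sketch assumes seg_size never exceeds max_seg_size on entry.

```c
/* Hypothetical result type: how a bvec is divided between two segments. */
struct front_fill {
	unsigned merged;	/* bytes appended to the current segment */
	unsigned remainder;	/* bytes that start the next segment */
};

/*
 * Fill the current segment up to max_seg_size and push only the
 * remainder of the bvec into a new segment.
 */
static struct front_fill fill_front(unsigned seg_size, unsigned bv_len,
				    unsigned max_seg_size)
{
	struct front_fill r = { bv_len, 0 };

	if (seg_size + bv_len > max_seg_size) {
		/* corresponds to 'advance' in the patch */
		r.merged = max_seg_size - seg_size;
		r.remainder = bv_len - r.merged;
	}
	return r;
}
```

With a 64KB segment limit, a current segment of 60KB and an 8KB bvec, the
old policy would start a new 8KB segment; under the new policy 4KB tops up
the current segment and the other 4KB opens the next one.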

2017-06-26 12:17:06

by Ming Lei

Subject: [PATCH v2 25/51] block: blk-merge: remove unnecessary check

In this case, 'sectors' can't be zero, so remove the check
and let the bio be split.

Signed-off-by: Ming Lei <[email protected]>
---
block/blk-merge.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index bf7a0fa0199f..c6fcc49b9aea 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -128,9 +128,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
nsegs++;
sectors = max_sectors;
}
- if (sectors)
- goto split;
- /* Make this single bvec as the 1st segment */
+ goto split;
}

if (bvprvp && blk_queue_cluster(q)) {
--
2.9.4

2017-06-26 12:16:03

by Ming Lei

Subject: [PATCH v2 17/51] bvec_iter: introduce BVEC_ITER_ALL_INIT

Introduce BVEC_ITER_ALL_INIT for iterating one bio
from start to end.

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/bvec.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 89b65b82d98f..162ca7caf510 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -94,4 +94,13 @@ static inline void bvec_iter_advance(const struct bio_vec *bv,
((bvl = bvec_iter_bvec((bio_vec), (iter))), 1); \
bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))

+/* for iterating one bio from start to end */
+#define BVEC_ITER_ALL_INIT (struct bvec_iter) \
+{ \
+ .bi_sector = 0, \
+ .bi_size = UINT_MAX, \
+ .bi_idx = 0, \
+ .bi_bvec_done = 0, \
+}
+
#endif /* __LINUX_BVEC_ITER_H */
--
2.9.4

2017-06-26 12:17:52

by Ming Lei

Subject: [PATCH v2 28/51] block: introduce bvec_for_each_sp_bvec()

This helper can be used to iterate over each singlepage bvec
in one multipage bvec.

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/bvec.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 5c51c58fe202..7addceea9828 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -192,4 +192,18 @@ static inline void bvec_iter_advance_mp(const struct bio_vec *bv,
.bi_bvec_done = 0, \
}

+/*
+ * This helper iterates over the multipage bvec of @mp_bvec and
+ * returns each singlepage bvec via @sp_bvl.
+ */
+#define __bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter, start) \
+ for (iter = start, \
+ (iter).bi_size = (mp_bvec)->bv_len - (iter).bi_bvec_done; \
+ (iter).bi_size && \
+ ((sp_bvl = bvec_iter_bvec((mp_bvec), (iter))), 1); \
+ bvec_iter_advance((mp_bvec), &(iter), (sp_bvl).bv_len))
+
+#define bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter) \
+ __bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter, BVEC_ITER_ALL_INIT)
+
#endif /* __LINUX_BVEC_ITER_H */
--
2.9.4

2017-06-26 12:18:10

by Ming Lei

Subject: [PATCH v2 29/51] block: bio: introduce single/multi page version of bio_for_each_segment_all()

This patch introduces bio_for_each_segment_all_sp() and
bio_for_each_segment_all_mp().

bio_for_each_segment_all_sp() replaces bio_for_each_segment_all()
in cases where the returned bvec has to be a singlepage bvec.

bio_for_each_segment_all_mp() replaces bio_for_each_segment_all()
in cases where the user wants to update the returned bvec via the pointer.

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/bio.h | 24 ++++++++++++++++++++++++
include/linux/blk_types.h | 6 ++++++
2 files changed, 30 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index bdbc9480229d..a4bb694b4da5 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -216,6 +216,30 @@ static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
#define bio_for_each_segment_mp(bvl, bio, iter) \
__bio_for_each_segment_mp(bvl, bio, iter, (bio)->bi_iter)

+/*
+ * This helper returns each bvec stored in bvec table directly,
+ * so the returned bvec points to one multipage bvec in the table
+ * and caller can update the bvec via the returned pointer.
+ */
+#define bio_for_each_segment_all_mp(bvl, bio, i) \
+ bio_for_each_segment_all((bvl), (bio), (i))
+
+/*
+ * This helper returns singlepage bvec to caller, and the sp bvec
+ * is generated in-flight from multipage bvec stored in bvec table.
+ * So we can _not_ change the bvec stored in bio->bi_io_vec[] via
+ * this helper.
+ *
+ * If someone needs to update bvec in the table, please use
+ * bio_for_each_segment_all_mp() and make sure it is correctly used
+ * since the bvec points to one multipage bvec.
+ */
+#define bio_for_each_segment_all_sp(bvl, bio, i, bi) \
+ for ((bi).iter = BVEC_ITER_ALL_INIT, i = 0, bvl = &(bi).bv; \
+ (bi).iter.bi_idx < (bio)->bi_vcnt && \
+ (((bi).bv = bio_iter_iovec((bio), (bi).iter)), 1); \
+ bio_advance_iter((bio), &(bi).iter, (bi).bv.bv_len), i++)
+
#define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)

static inline unsigned bio_segments(struct bio *bio)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index e210da6d14b8..3650932f2080 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -118,6 +118,12 @@ struct bio {

#define BIO_RESET_BYTES offsetof(struct bio, bi_max_vecs)

+/* this iter is only for implementing bio_for_each_segment_all_sp() */
+struct bvec_iter_all {
+ struct bvec_iter iter;
+ struct bio_vec bv; /* in-flight singlepage bvec */
+};
+
/*
* bio flags
*/
--
2.9.4

2017-06-26 12:18:19

by Ming Lei

Subject: [PATCH v2 30/51] block: introduce bvec_get_last_page()

BTRFS and guard_bio_eod() need to get the last page, so introduce
this helper to make them happy.

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/bvec.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 7addceea9828..6673e3c0b7eb 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -206,4 +206,18 @@ static inline void bvec_iter_advance_mp(const struct bio_vec *bv,
#define bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter) \
__bvec_for_each_sp_bvec(sp_bvl, mp_bvec, iter, BVEC_ITER_ALL_INIT)

+/*
+ * get the last page from the multipage bvec and store it
+ * in @sp_bv
+ */
+static inline void bvec_get_last_page(struct bio_vec *mp_bv,
+ struct bio_vec *sp_bv)
+{
+ struct bvec_iter iter;
+
+ *sp_bv = *mp_bv;
+ bvec_for_each_sp_bvec(*sp_bv, mp_bv, iter)
+ ;
+}
+
#endif /* __LINUX_BVEC_ITER_H */
--
2.9.4
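
What bvec_get_last_page() computes can be illustrated with a small userspace
sketch. The names here (struct lp_bvec, last_page_piece(), LP_PAGE_SIZE) are
hypothetical, a page index stands in for 'struct page *', and the loop is a
simplified model of walking the singlepage pieces, not the kernel iterator.

```c
#define LP_PAGE_SIZE 4096u	/* assumed page size, for the example only */

/* Hypothetical stand-in for struct bio_vec; page_idx replaces bv_page. */
struct lp_bvec {
	unsigned page_idx;
	unsigned len;
	unsigned offset;
};

/*
 * Walk every singlepage piece of 'mp' and return the last one,
 * which is the idea behind iterating with an empty loop body.
 */
static struct lp_bvec last_page_piece(const struct lp_bvec *mp)
{
	struct lp_bvec sp = *mp;	/* same initial copy as the helper */
	unsigned done = 0;

	while (done < mp->len) {
		unsigned off = mp->offset + done;
		unsigned room = LP_PAGE_SIZE - off % LP_PAGE_SIZE;
		unsigned remain = mp->len - done;

		sp.page_idx = mp->page_idx + off / LP_PAGE_SIZE;
		sp.offset = off % LP_PAGE_SIZE;
		sp.len = remain < room ? remain : room;
		done += sp.len;
	}
	return sp;
}
```

For a bvec that already fits in one page the result equals the input, so a
caller such as guard_bio_eod() behaves as before in the singlepage case.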

2017-06-26 12:18:28

by Ming Lei

Subject: [PATCH v2 27/51] block: use bio_for_each_segment_mp() to map sg

It is more efficient to use bio_for_each_segment_mp()
for mapping sg. Meanwhile, we have to consider splitting
a multipage bvec, as done in blk_bio_segment_split().

Signed-off-by: Ming Lei <[email protected]>
---
block/blk-merge.c | 72 +++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8d2c2d763456..894dcd017b56 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -439,6 +439,56 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
return 0;
}

+static inline struct scatterlist *blk_next_sg(struct scatterlist **sg,
+ struct scatterlist *sglist)
+{
+ if (!*sg)
+ return sglist;
+ else {
+ /*
+ * If the driver previously mapped a shorter
+ * list, we could see a termination bit
+ * prematurely unless it fully inits the sg
+ * table on each mapping. We KNOW that there
+ * must be more entries here or the driver
+ * would be buggy, so force clear the
+ * termination bit to avoid doing a full
+ * sg_init_table() in drivers for each command.
+ */
+ sg_unmark_end(*sg);
+ return sg_next(*sg);
+ }
+}
+
+static inline unsigned
+blk_bvec_map_sg(struct request_queue *q, struct bio_vec *bvec,
+ struct scatterlist *sglist, struct scatterlist **sg)
+{
+ unsigned nbytes = bvec->bv_len;
+ unsigned nsegs = 0, total = 0;
+
+ while (nbytes > 0) {
+ unsigned seg_size;
+ struct page *pg;
+ unsigned offset, idx;
+
+ *sg = blk_next_sg(sg, sglist);
+
+ seg_size = min(nbytes, queue_max_segment_size(q));
+ offset = (total + bvec->bv_offset) % PAGE_SIZE;
+ idx = (total + bvec->bv_offset) / PAGE_SIZE;
+ pg = nth_page(bvec->bv_page, idx);
+
+ sg_set_page(*sg, pg, seg_size, offset);
+
+ total += seg_size;
+ nbytes -= seg_size;
+ nsegs++;
+ }
+
+ return nsegs;
+}
+
static inline void
__blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
struct scatterlist *sglist, struct bio_vec *bvprv,
@@ -472,25 +522,7 @@ __blk_segment_map_sg(struct request_queue *q, struct bio_vec *bvec,
(*sg)->length += nbytes;
} else {
new_segment:
- if (!*sg)
- *sg = sglist;
- else {
- /*
- * If the driver previously mapped a shorter
- * list, we could see a termination bit
- * prematurely unless it fully inits the sg
- * table on each mapping. We KNOW that there
- * must be more entries here or the driver
- * would be buggy, so force clear the
- * termination bit to avoid doing a full
- * sg_init_table() in drivers for each command.
- */
- sg_unmark_end(*sg);
- *sg = sg_next(*sg);
- }
-
- sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset);
- (*nsegs)++;
+ (*nsegs) += blk_bvec_map_sg(q, bvec, sglist, sg);

/* for making iterator happy */
bvec->bv_offset -= advance;
@@ -516,7 +548,7 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
int cluster = blk_queue_cluster(q), nsegs = 0;

for_each_bio(bio)
- bio_for_each_segment(bvec, bio, iter)
+ bio_for_each_segment_mp(bvec, bio, iter)
__blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
&nsegs, &cluster);

--
2.9.4
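
The per-entry arithmetic of blk_bvec_map_sg() can be sketched in userspace
as below. This is an illustration only: struct demo_sg, map_bvec() and
SG_PAGE_SIZE are hypothetical names, page indices are relative to the first
page of the bvec, and a capacity argument replaces the kernel's sg chaining.

```c
#define SG_PAGE_SIZE 4096u	/* assumed page size, for the example only */

/* Hypothetical stand-in for one scatterlist entry. */
struct demo_sg {
	unsigned page_idx;	/* page index relative to bv_page */
	unsigned offset;	/* offset inside that page */
	unsigned length;
};

/*
 * Cut one multipage bvec into entries of at most max_seg_size bytes
 * and return how many entries were written to 'out'.
 */
static unsigned map_bvec(unsigned bv_offset, unsigned bv_len,
			 unsigned max_seg_size,
			 struct demo_sg *out, unsigned cap)
{
	unsigned total = 0, nsegs = 0;

	while (bv_len > 0 && nsegs < cap) {
		unsigned seg = bv_len < max_seg_size ? bv_len : max_seg_size;
		unsigned pos = total + bv_offset;

		out[nsegs].page_idx = pos / SG_PAGE_SIZE;
		out[nsegs].offset = pos % SG_PAGE_SIZE;
		out[nsegs].length = seg;

		total += seg;
		bv_len -= seg;
		nsegs++;
	}
	return nsegs;
}
```

Each entry starts at the page that contains its first byte, which is why
the patch uses nth_page() instead of reusing bvec->bv_page for every entry.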

2017-06-26 12:18:37

by Ming Lei

Subject: [PATCH v2 26/51] block: use bio_for_each_segment_mp() to compute segments count

Firstly, it is more efficient to use bio_for_each_segment_mp()
in both blk_bio_segment_split() and __blk_recalc_rq_segments()
to compute how many segments there are in the bio.

Secondly, once bio_for_each_segment_mp() is used, a bvec
may need to be split because its length can exceed the max
segment size, so we have to support splitting one bvec into
several segments.

Thirdly, while splitting an mp bvec into segments, the max
segment number may be reached, and then the bio needs to be
split.

Signed-off-by: Ming Lei <[email protected]>
---
block/blk-merge.c | 97 ++++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 79 insertions(+), 18 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index c6fcc49b9aea..8d2c2d763456 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -96,6 +96,62 @@ static inline unsigned get_max_io_size(struct request_queue *q,
return sectors;
}

+/*
+ * Split the bvec @bv into segments, and update all kinds of
+ * variables.
+ */
+static bool bvec_split_segs(struct request_queue *q, struct bio_vec *bv,
+ unsigned *nsegs, unsigned *last_seg_size,
+ unsigned *front_seg_size, unsigned *sectors)
+{
+ bool need_split = false;
+ unsigned len = bv->bv_len;
+ unsigned total_len = 0;
+ unsigned new_nsegs = 0, seg_size = 0;
+
+ if ((*nsegs >= queue_max_segments(q)) || !len)
+ return need_split;
+
+ /*
+ * Multipage bvec may be too big to hold in one segment,
+ * so the current bvec has to be split into multiple
+ * segments.
+ */
+ while (new_nsegs + *nsegs < queue_max_segments(q)) {
+ seg_size = min(queue_max_segment_size(q), len);
+
+ new_nsegs++;
+ total_len += seg_size;
+ len -= seg_size;
+
+ if ((queue_virt_boundary(q) && ((bv->bv_offset +
+ total_len) & queue_virt_boundary(q))) || !len)
+ break;
+ }
+
+ /* split in the middle of the bvec */
+ if (len)
+ need_split = true;
+
+ /* update front segment size */
+ if (!*nsegs) {
+ unsigned first_seg_size = seg_size;
+
+ if (new_nsegs > 1)
+ first_seg_size = queue_max_segment_size(q);
+ if (*front_seg_size < first_seg_size)
+ *front_seg_size = first_seg_size;
+ }
+
+ /* update other variables */
+ *last_seg_size = seg_size;
+ *nsegs += new_nsegs;
+ if (sectors)
+ *sectors += total_len >> 9;
+
+ return need_split;
+}
+
static struct bio *blk_bio_segment_split(struct request_queue *q,
struct bio *bio,
struct bio_set *bs,
@@ -110,7 +166,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
const unsigned max_sectors = get_max_io_size(q, bio);
unsigned advance = 0;

- bio_for_each_segment(bv, bio, iter) {
+ bio_for_each_segment_mp(bv, bio, iter) {
/*
* If the queue doesn't support SG gaps and adding this
* offset would create a gap, disallow it.
@@ -125,8 +181,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
*/
if (nsegs < queue_max_segments(q) &&
sectors < max_sectors) {
- nsegs++;
- sectors = max_sectors;
+ /* split in the middle of bvec */
+ bv.bv_len = (max_sectors - sectors) << 9;
+ bvec_split_segs(q, &bv, &nsegs,
+ &seg_size,
+ &front_seg_size,
+ &sectors);
}
goto split;
}
@@ -138,10 +198,9 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
goto new_segment;
if (seg_size + bv.bv_len > queue_max_segment_size(q)) {
/*
- * On assumption is that initial value of
- * @seg_size(equals to bv.bv_len) won't be
- * bigger than max segment size, but will
- * becomes false after multipage bvec comes.
+ * The initial value of @seg_size won't be
+ * bigger than max segment size, because we
+ * split the bvec via bvec_split_segs().
*/
advance = queue_max_segment_size(q) - seg_size;

@@ -173,11 +232,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
if (nsegs == 1 && seg_size > front_seg_size)
front_seg_size = seg_size;

- nsegs++;
bvprv = bv;
bvprvp = &bvprv;
- seg_size = bv.bv_len;
- sectors += bv.bv_len >> 9;
+
+ if (bvec_split_segs(q, &bv, &nsegs, &seg_size,
+ &front_seg_size, &sectors))
+ goto split;

/* restore the bvec for iterator */
if (advance) {
@@ -251,6 +311,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
struct bio_vec bv, bvprv = { NULL };
int cluster, prev = 0;
unsigned int seg_size, nr_phys_segs;
+ unsigned front_seg_size = bio->bi_seg_front_size;
struct bio *fbio, *bbio;
struct bvec_iter iter;

@@ -271,7 +332,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
seg_size = 0;
nr_phys_segs = 0;
for_each_bio(bio) {
- bio_for_each_segment(bv, bio, iter) {
+ bio_for_each_segment_mp(bv, bio, iter) {
/*
* If SG merging is disabled, each bio vector is
* a segment
@@ -293,20 +354,20 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
continue;
}
new_segment:
- if (nr_phys_segs == 1 && seg_size >
- fbio->bi_seg_front_size)
- fbio->bi_seg_front_size = seg_size;
+ if (nr_phys_segs == 1 && seg_size > front_seg_size)
+ front_seg_size = seg_size;

- nr_phys_segs++;
bvprv = bv;
prev = 1;
- seg_size = bv.bv_len;
+ bvec_split_segs(q, &bv, &nr_phys_segs, &seg_size,
+ &front_seg_size, NULL);
}
bbio = bio;
}

- if (nr_phys_segs == 1 && seg_size > fbio->bi_seg_front_size)
- fbio->bi_seg_front_size = seg_size;
+ if (nr_phys_segs == 1 && seg_size > front_seg_size)
+ front_seg_size = seg_size;
+ fbio->bi_seg_front_size = front_seg_size;
if (seg_size > bbio->bi_seg_back_size)
bbio->bi_seg_back_size = seg_size;

--
2.9.4
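
The counting loop of bvec_split_segs() can be modelled in userspace as shown
below. This is a simplified sketch, not the kernel function: struct
demo_limits and split_segs() are hypothetical names, and both the
queue_virt_boundary() check and the front segment size bookkeeping are
omitted.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two queue limits used here. */
struct demo_limits {
	unsigned max_segments;
	unsigned max_segment_size;
};

/*
 * Account one bvec of 'len' bytes as one or more segments.
 * Returns true when the segment limit is hit in the middle of
 * the bvec, i.e. when the bio would have to be split.
 */
static bool split_segs(const struct demo_limits *lim, unsigned len,
		       unsigned *nsegs, unsigned *last_seg_size,
		       unsigned *bytes)
{
	unsigned total = 0, new_nsegs = 0, seg_size = 0;

	/* mirrors the early return in the patch */
	if (*nsegs >= lim->max_segments || !len)
		return false;

	while (len && new_nsegs + *nsegs < lim->max_segments) {
		seg_size = len < lim->max_segment_size ?
				len : lim->max_segment_size;
		new_nsegs++;
		total += seg_size;
		len -= seg_size;
	}

	*last_seg_size = seg_size;
	*nsegs += new_nsegs;
	*bytes += total;

	return len != 0;	/* leftover bytes: split needed */
}
```

The caller keeps running totals in nsegs and bytes across bvecs, so the
return value only reports whether this particular bvec overflowed the
segment budget.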

2017-06-26 12:18:41

by Ming Lei

Subject: [PATCH v2 31/51] fs/buffer.c: use bvec iterator to truncate the bio

Once multipage bvec is enabled, the last bvec may include
more than one page, so this patch uses bvec_get_last_page()
to truncate the bio.

Signed-off-by: Ming Lei <[email protected]>
---
fs/buffer.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 1910f539770b..53b8a29f4525 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3055,8 +3055,7 @@ void guard_bio_eod(int op, struct bio *bio)
unsigned truncated_bytes;
/*
* It is safe to truncate the last bvec in the following way
- * even though multipage bvec is supported, but we need to
- * fix the parameters passed to zero_user().
+ * even though multipage bvec is supported.
*/
struct bio_vec *bvec = &bio->bi_io_vec[bio->bi_vcnt - 1];

@@ -3085,7 +3084,10 @@ void guard_bio_eod(int op, struct bio *bio)

/* ..and clear the end of the buffer for reads */
if (op == REQ_OP_READ) {
- zero_user(bvec->bv_page, bvec->bv_offset + bvec->bv_len,
+ struct bio_vec bv;
+
+ bvec_get_last_page(bvec, &bv);
+ zero_user(bv.bv_page, bv.bv_offset + bv.bv_len,
truncated_bytes);
}
}
--
2.9.4

2017-06-26 12:19:06

by Ming Lei

Subject: [PATCH v2 34/51] block: convert to single/multi page version of bio_for_each_segment_all()

Signed-off-by: Ming Lei <[email protected]>
---
block/bio.c | 17 +++++++++++------
block/blk-zoned.c | 5 +++--
block/bounce.c | 6 ++++--
3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 22e5deec7ec7..c460888f14b5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -988,7 +988,7 @@ int bio_alloc_pages(struct bio *bio, gfp_t gfp_mask)
int i;
struct bio_vec *bv;

- bio_for_each_segment_all(bv, bio, i) {
+ bio_for_each_segment_all_mp(bv, bio, i) {
bv->bv_page = alloc_page(gfp_mask);
if (!bv->bv_page) {
while (--bv >= bio->bi_io_vec)
@@ -1089,8 +1089,9 @@ static int bio_copy_from_iter(struct bio *bio, struct iov_iter iter)
{
int i;
struct bio_vec *bvec;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
ssize_t ret;

ret = copy_page_from_iter(bvec->bv_page,
@@ -1120,8 +1121,9 @@ static int bio_copy_to_iter(struct bio *bio, struct iov_iter iter)
{
int i;
struct bio_vec *bvec;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
ssize_t ret;

ret = copy_page_to_iter(bvec->bv_page,
@@ -1143,8 +1145,9 @@ void bio_free_pages(struct bio *bio)
{
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i)
+ bio_for_each_segment_all_sp(bvec, bio, i, bia)
__free_page(bvec->bv_page);
}
EXPORT_SYMBOL(bio_free_pages);
@@ -1435,11 +1438,12 @@ static void __bio_unmap_user(struct bio *bio)
{
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

/*
* make sure we dirty pages we wrote to
*/
- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
if (bio_data_dir(bio) == READ)
set_page_dirty_lock(bvec->bv_page);

@@ -1531,8 +1535,9 @@ static void bio_copy_kern_endio_read(struct bio *bio)
char *p = bio->bi_private;
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
memcpy(p, page_address(bvec->bv_page), bvec->bv_len);
p += bvec->bv_len;
}
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 3bd15d8095b1..558b84ae2d86 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -81,6 +81,7 @@ int blkdev_report_zones(struct block_device *bdev,
unsigned int ofst;
void *addr;
int ret;
+ struct bvec_iter_all bia;

if (!q)
return -ENXIO;
@@ -148,7 +149,7 @@ int blkdev_report_zones(struct block_device *bdev,
n = 0;
nz = 0;
nr_rep = 0;
- bio_for_each_segment_all(bv, bio, i) {
+ bio_for_each_segment_all_sp(bv, bio, i, bia) {

if (!bv->bv_page)
break;
@@ -181,7 +182,7 @@ int blkdev_report_zones(struct block_device *bdev,

*nr_zones = nz;
out:
- bio_for_each_segment_all(bv, bio, i)
+ bio_for_each_segment_all_sp(bv, bio, i, bia)
__free_page(bv->bv_page);
bio_put(bio);

diff --git a/block/bounce.c b/block/bounce.c
index 590dcdb1de76..1f46ba9535c1 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -144,11 +144,12 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool)
struct bio_vec *bvec, orig_vec;
int i;
struct bvec_iter orig_iter = bio_orig->bi_iter;
+ struct bvec_iter_all bia;

/*
* free up bounce indirect pages used
*/
- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
orig_vec = bio_iter_iovec(bio_orig, orig_iter);
if (bvec->bv_page == orig_vec.bv_page)
goto next;
@@ -205,6 +206,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
unsigned i = 0;
bool bounce = false;
int sectors = 0;
+ struct bvec_iter_all bia;

bio_for_each_segment(from, *bio_orig, iter) {
if (i++ < BIO_MAX_PAGES)
@@ -223,7 +225,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
}
bio = bio_clone_bioset(*bio_orig, GFP_NOIO, bounce_bio_set);

- bio_for_each_segment_all(to, bio, i) {
+ bio_for_each_segment_all_sp(to, bio, i, bia) {
struct page *page = to->bv_page;

if (page_to_pfn(page) <= queue_bounce_pfn(q))
--
2.9.4

2017-06-26 12:19:31

by Ming Lei

Subject: [PATCH v2 35/51] bcache: convert to bio_for_each_segment_all_sp()

Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
drivers/md/bcache/btree.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 3da595ae565b..74cbb7387dc5 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -422,8 +422,9 @@ static void do_btree_node_write(struct btree *b)
int j;
struct bio_vec *bv;
void *base = (void *) ((unsigned long) i & ~(PAGE_SIZE - 1));
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bv, b->bio, j)
+ bio_for_each_segment_all_sp(bv, b->bio, j, bia)
memcpy(page_address(bv->bv_page),
base + j * PAGE_SIZE, PAGE_SIZE);

--
2.9.4

2017-06-26 12:19:35

by Ming Lei

Subject: [PATCH v2 38/51] dm-crypt: convert to bio_for_each_segment_all_sp()

Cc: Mike Snitzer <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
drivers/md/dm-crypt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 664ba3504f48..0f2f44a73a32 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1446,8 +1446,9 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone)
{
unsigned int i;
struct bio_vec *bv;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bv, clone, i) {
+ bio_for_each_segment_all_sp(bv, clone, i, bia) {
BUG_ON(!bv->bv_page);
mempool_free(bv->bv_page, cc->page_pool);
}
--
2.9.4

2017-06-26 12:19:56

by Ming Lei

Subject: [PATCH v2 39/51] fs/mpage: convert to bio_for_each_segment_all_sp()

Signed-off-by: Ming Lei <[email protected]>
---
fs/mpage.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/mpage.c b/fs/mpage.c
index 0da38f401564..bdb4692ae30c 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -46,9 +46,10 @@
static void mpage_end_io(struct bio *bio)
{
struct bio_vec *bv;
+ struct bvec_iter_all bia;
int i;

- bio_for_each_segment_all(bv, bio, i) {
+ bio_for_each_segment_all_sp(bv, bio, i, bia) {
struct page *page = bv->bv_page;
page_endio(page, op_is_write(bio_op(bio)),
blk_status_to_errno(bio->bi_status));
--
2.9.4

2017-06-26 12:20:08

by Ming Lei

Subject: [PATCH v2 41/51] fs/iomap: convert to bio_for_each_segment_all_sp()

Signed-off-by: Ming Lei <[email protected]>
---
fs/iomap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index c71a64b97fba..4319284c1fbd 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -696,8 +696,9 @@ static void iomap_dio_bio_end_io(struct bio *bio)
} else {
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i)
+ bio_for_each_segment_all_sp(bvec, bio, i, bia)
put_page(bvec->bv_page);
bio_put(bio);
}
--
2.9.4

2017-06-26 12:20:24

by Ming Lei

Subject: [PATCH v2 43/51] xfs: convert to bio_for_each_segment_all_sp()

Cc: "Darrick J. Wong" <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
fs/xfs/xfs_aops.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 11ef989b8629..621efe71c70a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -139,6 +139,7 @@ xfs_destroy_ioend(
for (bio = &ioend->io_inline_bio; bio; bio = next) {
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

/*
* For the last bio, bi_private points to the ioend, so we
@@ -150,7 +151,7 @@ xfs_destroy_ioend(
next = bio->bi_private;

/* walk each page on bio, ending page IO on them */
- bio_for_each_segment_all(bvec, bio, i)
+ bio_for_each_segment_all_sp(bvec, bio, i, bia)
xfs_finish_page_writeback(inode, bvec, error);

bio_put(bio);
--
2.9.4

2017-06-26 12:20:13

by Ming Lei

Subject: [PATCH v2 42/51] ext4: convert to bio_for_each_segment_all_sp()

Cc: "Theodore Ts'o" <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
fs/ext4/page-io.c | 3 ++-
fs/ext4/readpage.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 930ca0fc9a0f..0e59404fc530 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -62,8 +62,9 @@ static void ext4_finish_bio(struct bio *bio)
{
int i;
struct bio_vec *bvec;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
#ifdef CONFIG_EXT4_FS_ENCRYPTION
struct page *data_page = NULL;
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 40a5497b0f60..6bd33c4c1f7f 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -71,6 +71,7 @@ static void mpage_end_io(struct bio *bio)
{
struct bio_vec *bv;
int i;
+ struct bvec_iter_all bia;

if (ext4_bio_encrypted(bio)) {
if (bio->bi_status) {
@@ -80,7 +81,7 @@ static void mpage_end_io(struct bio *bio)
return;
}
}
- bio_for_each_segment_all(bv, bio, i) {
+ bio_for_each_segment_all_sp(bv, bio, i, bia) {
struct page *page = bv->bv_page;

if (!bio->bi_status) {
--
2.9.4

2017-06-26 12:20:11

by Ming Lei

Subject: [PATCH v2 40/51] fs/block: convert to bio_for_each_segment_all_sp()

Signed-off-by: Ming Lei <[email protected]>
---
fs/block_dev.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index a57c26bcb970..d82e43bd8e82 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -209,6 +209,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
ssize_t ret;
blk_qc_t qc;
int i;
+ struct bvec_iter_all bia;

if ((pos | iov_iter_alignment(iter)) &
(bdev_logical_block_size(bdev) - 1))
@@ -253,7 +254,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
}
__set_current_state(TASK_RUNNING);

- bio_for_each_segment_all(bvec, &bio, i) {
+ bio_for_each_segment_all_sp(bvec, &bio, i, bia) {
if (should_dirty && !PageCompound(bvec->bv_page))
set_page_dirty_lock(bvec->bv_page);
put_page(bvec->bv_page);
@@ -317,8 +318,9 @@ static void blkdev_bio_end_io(struct bio *bio)
} else {
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i)
+ bio_for_each_segment_all_sp(bvec, bio, i, bia)
put_page(bvec->bv_page);
bio_put(bio);
}
--
2.9.4

2017-06-26 12:21:12

by Ming Lei

Subject: [PATCH v2 44/51] gfs2: convert to bio_for_each_segment_all_sp()

Cc: Steven Whitehouse <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
fs/gfs2/lops.c | 3 ++-
fs/gfs2/meta_io.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index d62939f00d53..294f1926d9be 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -206,11 +206,12 @@ static void gfs2_end_log_write(struct bio *bio)
struct bio_vec *bvec;
struct page *page;
int i;
+ struct bvec_iter_all bia;

if (bio->bi_status)
fs_err(sdp, "Error %d writing to log\n", bio->bi_status);

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
page = bvec->bv_page;
if (page_has_buffers(page))
gfs2_end_log_write_bh(sdp, bvec, bio->bi_status);
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index fabe1614f879..6879b0103539 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -190,8 +190,9 @@ static void gfs2_meta_read_endio(struct bio *bio)
{
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct buffer_head *bh = page_buffers(page);
unsigned int len = bvec->bv_len;
--
2.9.4

2017-06-26 12:21:24

by Ming Lei

Subject: [PATCH v2 46/51] exofs: convert to bio_for_each_segment_all_sp()

Cc: Boaz Harrosh <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
fs/exofs/ore.c | 3 ++-
fs/exofs/ore_raid.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index 8bb72807e70d..38a7d8bfdd4c 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -406,8 +406,9 @@ static void _clear_bio(struct bio *bio)
{
struct bio_vec *bv;
unsigned i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bv, bio, i) {
+ bio_for_each_segment_all_sp(bv, bio, i, bia) {
unsigned this_count = bv->bv_len;

if (likely(PAGE_SIZE == this_count))
diff --git a/fs/exofs/ore_raid.c b/fs/exofs/ore_raid.c
index 27cbdb697649..37c0a9aa2ec2 100644
--- a/fs/exofs/ore_raid.c
+++ b/fs/exofs/ore_raid.c
@@ -429,6 +429,7 @@ static void _mark_read4write_pages_uptodate(struct ore_io_state *ios, int ret)
{
struct bio_vec *bv;
unsigned i, d;
+ struct bvec_iter_all bia;

/* loop on all devices all pages */
for (d = 0; d < ios->numdevs; d++) {
@@ -437,7 +438,7 @@ static void _mark_read4write_pages_uptodate(struct ore_io_state *ios, int ret)
if (!bio)
continue;

- bio_for_each_segment_all(bv, bio, i) {
+ bio_for_each_segment_all_sp(bv, bio, i, bia) {
struct page *page = bv->bv_page;

SetPageUptodate(page);
--
2.9.4

2017-06-26 12:22:27

by Ming Lei

Subject: [PATCH v2 48/51] fs/btrfs: convert to bio_for_each_segment_all_sp()

Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
fs/btrfs/compression.c | 3 ++-
fs/btrfs/disk-io.c | 3 ++-
fs/btrfs/extent_io.c | 12 ++++++++----
fs/btrfs/inode.c | 6 ++++--
fs/btrfs/raid56.c | 6 ++++--
5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index fdab5b821aa8..9d1693ecf468 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -147,12 +147,13 @@ static void end_compressed_bio_read(struct bio *bio)
} else {
int i;
struct bio_vec *bvec;
+ struct bvec_iter_all bia;

/*
* we have verified the checksum already, set page
* checked so the end_io handlers know about it
*/
- bio_for_each_segment_all(bvec, cb->orig_bio, i)
+ bio_for_each_segment_all_sp(bvec, cb->orig_bio, i, bia)
SetPageChecked(bvec->bv_page);

bio_endio(cb->orig_bio);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f4f54d13db6d..e7efbaa3566c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -963,8 +963,9 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
struct bio_vec *bvec;
struct btrfs_root *root;
int i, ret = 0;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
root = BTRFS_I(bvec->bv_page->mapping->host)->root;
ret = csum_dirty_buffer(root->fs_info, bvec->bv_page);
if (ret)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7cc6c8a52e49..8e51452894ba 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2359,8 +2359,9 @@ static unsigned int get_bio_pages(struct bio *bio)
{
unsigned i;
struct bio_vec *bv;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bv, bio, i)
+ bio_for_each_segment_all_sp(bv, bio, i, bia)
;

return i;
@@ -2468,8 +2469,9 @@ static void end_bio_extent_writepage(struct bio *bio)
u64 start;
u64 end;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct inode *inode = page->mapping->host;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2538,8 +2540,9 @@ static void end_bio_extent_readpage(struct bio *bio)
int mirror;
int ret;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
struct inode *inode = page->mapping->host;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -3695,8 +3698,9 @@ static void end_bio_extent_buffer_writepage(struct bio *bio)
struct bio_vec *bvec;
struct extent_buffer *eb;
int i, done;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;

eb = (struct extent_buffer *)page->private;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7e725d84917b..61cc6d899ae5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8051,6 +8051,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
struct bio_vec *bvec;
struct extent_io_tree *io_tree, *failure_tree;
int i;
+ struct bvec_iter_all bia;

if (bio->bi_status)
goto end;
@@ -8067,7 +8068,7 @@ static void btrfs_retry_endio_nocsum(struct bio *bio)
ASSERT(bio->bi_io_vec->bv_len == btrfs_inode_sectorsize(inode));

done->uptodate = 1;
- bio_for_each_segment_all(bvec, bio, i)
+ bio_for_each_segment_all_sp(bvec, bio, i, bia)
clean_io_failure(BTRFS_I(inode)->root->fs_info, failure_tree,
io_tree, done->start, bvec->bv_page,
btrfs_ino(BTRFS_I(inode)), 0);
@@ -8146,6 +8147,7 @@ static void btrfs_retry_endio(struct bio *bio)
int uptodate;
int ret;
int i;
+ struct bvec_iter_all bia;

if (bio->bi_status)
goto end;
@@ -8164,7 +8166,7 @@ static void btrfs_retry_endio(struct bio *bio)
io_tree = &BTRFS_I(inode)->io_tree;
failure_tree = &BTRFS_I(inode)->io_failure_tree;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
ret = __readpage_endio_check(inode, io_bio, i, bvec->bv_page,
bvec->bv_offset, done->start,
bvec->bv_len);
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 6f845d219cd6..9c68f1c88b40 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1141,6 +1141,7 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
unsigned long stripe_offset;
unsigned long page_index;
int i;
+ struct bvec_iter_all bia;

spin_lock_irq(&rbio->bio_list_lock);
bio_list_for_each(bio, &rbio->bio_list) {
@@ -1148,7 +1149,7 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
stripe_offset = start - rbio->bbio->raid_map[0];
page_index = stripe_offset >> PAGE_SHIFT;

- bio_for_each_segment_all(bvec, bio, i)
+ bio_for_each_segment_all_sp(bvec, bio, i, bia)
rbio->bio_pages[page_index + i] = bvec->bv_page;
}
spin_unlock_irq(&rbio->bio_list_lock);
@@ -1425,8 +1426,9 @@ static void set_bio_pages_uptodate(struct bio *bio)
{
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i)
+ bio_for_each_segment_all_sp(bvec, bio, i, bia)
SetPageUptodate(bvec->bv_page);
}

--
2.9.4

2017-06-26 12:22:26

by Ming Lei

Subject: [PATCH v2 49/51] fs/direct-io: convert to bio_for_each_segment_all_sp()

Signed-off-by: Ming Lei <[email protected]>
---
fs/direct-io.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index c87077d1dc33..a139b3bbad8e 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -489,7 +489,9 @@ static blk_status_t dio_bio_complete(struct dio *dio, struct bio *bio)
if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) {
bio_check_pages_dirty(bio); /* transfers ownership */
} else {
- bio_for_each_segment_all(bvec, bio, i) {
+ struct bvec_iter_all bia;
+
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;

if (dio->op == REQ_OP_READ && !PageCompound(page) &&
--
2.9.4

2017-06-26 12:22:24

by Ming Lei

Subject: [PATCH v2 50/51] block: enable multipage bvecs

This patch pulls the trigger for multipage bvecs.

From now on, any request queue that has clustering enabled
(see blk_queue_cluster()) will see multipage bvecs.

Signed-off-by: Ming Lei <[email protected]>
---
block/bio.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index c460888f14b5..436305cde045 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -839,6 +839,11 @@ int bio_add_page(struct bio *bio, struct page *page,
* a consecutive offset. Optimize this special case.
*/
if (bio->bi_vcnt > 0) {
+ struct request_queue *q = NULL;
+
+ if (bio->bi_bdev)
+ q = bdev_get_queue(bio->bi_bdev);
+
bv = &bio->bi_io_vec[bio->bi_vcnt - 1];

if (page == bv->bv_page &&
@@ -846,6 +851,14 @@ int bio_add_page(struct bio *bio, struct page *page,
bv->bv_len += len;
goto done;
}
+
+ /* disable multipage bvec too if cluster isn't enabled */
+ if (q && blk_queue_cluster(q) &&
+ (bvec_to_phys(bv) + bv->bv_len ==
+ page_to_phys(page) + offset)) {
+ bv->bv_len += len;
+ goto done;
+ }
}

if (bio->bi_vcnt >= bio->bi_max_vecs)
--
2.9.4
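
For readers following the thread, the contiguity test that patch 50/51
adds to bio_add_page() can be modelled in user space roughly as below.
This is only a sketch under stated assumptions: struct model_bvec,
MODEL_PAGE_SIZE and model_try_merge() are made-up names, pages are
represented by plain frame numbers rather than struct page, and the
`cluster` flag stands in for blk_queue_cluster(q). The existing
same-page merge that the patch leaves in place is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Hypothetical stand-in for struct bio_vec. */
struct model_bvec {
	uint64_t pfn;		/* frame number of the first page */
	uint32_t len;		/* bytes covered, may span several pages */
	uint32_t offset;	/* byte offset into the first page */
};

/* Physical address of the first byte after the segment. */
static uint64_t model_bvec_end_phys(const struct model_bvec *bv)
{
	return bv->pfn * MODEL_PAGE_SIZE + bv->offset + bv->len;
}

/*
 * Extend @bv by @len bytes if the new data at (@pfn, @offset) starts
 * exactly where @bv ends physically and clustering is enabled.
 * Returns true when the data was merged into @bv.
 */
static bool model_try_merge(struct model_bvec *bv, bool cluster,
			    uint64_t pfn, uint32_t offset, uint32_t len)
{
	assert(offset < MODEL_PAGE_SIZE);

	if (!cluster)
		return false;
	if (model_bvec_end_phys(bv) != pfn * MODEL_PAGE_SIZE + offset)
		return false;

	bv->len += len;
	return true;
}
```

When the merge is refused, the caller would fall back to opening a new
bvec, which is what the unchanged tail of bio_add_page() does.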

2017-06-26 12:21:39

by Ming Lei

Subject: [PATCH v2 47/51] fs: crypto: convert to bio_for_each_segment_all_sp()

Signed-off-by: Ming Lei <[email protected]>
---
fs/crypto/bio.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 6181e9526860..d5516ed19166 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -36,8 +36,9 @@ static void completion_pages(struct work_struct *work)
struct bio *bio = ctx->r.bio;
struct bio_vec *bv;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bv, bio, i) {
+ bio_for_each_segment_all_sp(bv, bio, i, bia) {
struct page *page = bv->bv_page;
int ret = fscrypt_decrypt_page(page->mapping->host, page,
PAGE_SIZE, 0, page->index);
--
2.9.4

2017-06-26 12:22:21

by Ming Lei

Subject: [PATCH v2 51/51] block: bio: pass segments to bio if bio_add_page() is bypassed

In some situations, such as block direct I/O, bio_add_page()
can't be used to merge pages into multipage bvecs. So
implement a new function which converts the page array into
a segment array, so that these cases can benefit from
multipage bvecs too.

Signed-off-by: Ming Lei <[email protected]>
---
block/bio.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 436305cde045..e2bcbb842982 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -876,6 +876,41 @@ int bio_add_page(struct bio *bio, struct page *page,
}
EXPORT_SYMBOL(bio_add_page);

+static unsigned convert_to_segs(struct bio* bio, struct page **pages,
+ unsigned char *page_cnt,
+ unsigned nr_pages)
+{
+
+ unsigned idx;
+ unsigned nr_seg = 0;
+ struct request_queue *q = NULL;
+
+ if (bio->bi_bdev)
+ q = bdev_get_queue(bio->bi_bdev);
+
+ if (!q || !blk_queue_cluster(q)) {
+ memset(page_cnt, 0, nr_pages);
+ return nr_pages;
+ }
+
+ page_cnt[nr_seg] = 0;
+ for (idx = 1; idx < nr_pages; idx++) {
+ struct page *pg_s = pages[nr_seg];
+ struct page *pg = pages[idx];
+
+ if (page_to_pfn(pg_s) + page_cnt[nr_seg] + 1 ==
+ page_to_pfn(pg)) {
+ page_cnt[nr_seg]++;
+ } else {
+ page_cnt[++nr_seg] = 0;
+ if (nr_seg < idx)
+ pages[nr_seg] = pg;
+ }
+ }
+
+ return nr_seg + 1;
+}
+
/**
* bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio
* @bio: bio to add pages to
@@ -895,6 +930,8 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
struct page **pages = (struct page **)bv;
size_t offset, diff;
ssize_t size;
+ unsigned short nr_segs;
+ unsigned char page_cnt[nr_pages]; /* at most 256 pages */

size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
if (unlikely(size <= 0))
@@ -910,13 +947,18 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
* need to be reflected here as well.
*/
bio->bi_iter.bi_size += size;
- bio->bi_vcnt += nr_pages;
-
diff = (nr_pages * PAGE_SIZE - offset) - size;
- while (nr_pages--) {
- bv[nr_pages].bv_page = pages[nr_pages];
- bv[nr_pages].bv_len = PAGE_SIZE;
- bv[nr_pages].bv_offset = 0;
+
+ /* convert into segments */
+ nr_segs = convert_to_segs(bio, pages, page_cnt, nr_pages);
+ bio->bi_vcnt += nr_segs;
+
+ while (nr_segs--) {
+ unsigned cnt = (unsigned)page_cnt[nr_segs] + 1;
+
+ bv[nr_segs].bv_page = pages[nr_segs];
+ bv[nr_segs].bv_len = PAGE_SIZE * cnt;
+ bv[nr_segs].bv_offset = 0;
}

bv[0].bv_offset += offset;
--
2.9.4
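
The in-place compaction done by convert_to_segs() in patch 51/51 may be
easier to follow in a small user-space model. The sketch below is an
approximation, not the kernel code: model_convert_to_segs() is a
made-up name, pages are represented by their frame numbers, and the
queue-cluster check is left out. As in the patch, extra[i] holds the
number of additional pages that follow the first page of segment i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Collapse runs of consecutive frame numbers in @pfns into segments.
 * On return, pfns[i] is the first frame of segment i and extra[i] the
 * number of further pages in it. Returns the number of segments.
 */
static size_t model_convert_to_segs(uint64_t *pfns, unsigned char *extra,
				    size_t nr_pages)
{
	size_t idx, seg = 0;

	/* extra[] is a byte per segment, so at most 256 pages per call */
	assert(nr_pages > 0 && nr_pages <= 256);

	extra[0] = 0;
	for (idx = 1; idx < nr_pages; idx++) {
		if (pfns[seg] + extra[seg] + 1 == pfns[idx]) {
			/* next page is physically adjacent: grow segment */
			extra[seg]++;
		} else {
			/* gap: start a new segment at this page */
			seg++;
			extra[seg] = 0;
			pfns[seg] = pfns[idx];
		}
	}

	return seg + 1;
}
```

For the frames {10, 11, 12, 20, 21, 30} this yields three segments
starting at 10, 20 and 30 with 2, 1 and 0 extra pages, so the caller
would fill three bvecs of 3, 2 and 1 pages.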

2017-06-26 12:24:10

by Ming Lei

Subject: [PATCH v2 45/51] f2fs: convert to bio_for_each_segment_all_sp()

Cc: Jaegeuk Kim <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
fs/f2fs/data.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 622c44a1be78..57d5a2760bf1 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -54,6 +54,7 @@ static void f2fs_read_end_io(struct bio *bio)
{
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

#ifdef CONFIG_F2FS_FAULT_INJECTION
/*
@@ -75,7 +76,7 @@ static void f2fs_read_end_io(struct bio *bio)
}
}

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;

if (!bio->bi_status) {
@@ -95,8 +96,9 @@ static void f2fs_write_end_io(struct bio *bio)
struct f2fs_sb_info *sbi = bio->bi_private;
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;
enum count_type type = WB_DATA_TYPE(page);

@@ -256,6 +258,7 @@ static bool __has_merged_page(struct f2fs_bio_info *io,
struct bio_vec *bvec;
struct page *target;
int i;
+ struct bvec_iter_all bia;

if (!io->bio)
return false;
@@ -263,7 +266,7 @@ static bool __has_merged_page(struct f2fs_bio_info *io,
if (!inode && !ino)
return true;

- bio_for_each_segment_all(bvec, io->bio, i) {
+ bio_for_each_segment_all_sp(bvec, io->bio, i, bia) {

if (bvec->bv_page->mapping)
target = bvec->bv_page;
--
2.9.4

2017-06-26 12:19:23

by Ming Lei

Subject: [PATCH v2 36/51] md: raid1: convert to bio_for_each_segment_all_sp()

Cc: Shaohua Li <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
drivers/md/raid1.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 835c42396861..ca4b9ff8d39b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2135,13 +2135,14 @@ static void process_checks(struct r1bio *r1_bio)
struct page **spages = get_resync_pages(sbio)->pages;
struct bio_vec *bi;
int page_len[RESYNC_PAGES] = { 0 };
+ struct bvec_iter_all bia;

if (sbio->bi_end_io != end_sync_read)
continue;
/* Now we can 'fixup' the error value */
sbio->bi_status = 0;

- bio_for_each_segment_all(bi, sbio, j)
+ bio_for_each_segment_all_sp(bi, sbio, j, bia)
page_len[j] = bi->bv_len;

if (!status) {
--
2.9.4

2017-06-26 12:24:44

by Ming Lei

Subject: [PATCH v2 37/51] dm-crypt: don't clear bvec->bv_page in crypt_free_buffer_pages()

The bio is always freed after running crypt_free_buffer_pages(),
so it isn't necessary to clear the bv->bv_page.

Cc: Mike Snitzer <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
drivers/md/dm-crypt.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index cdf6b1e12460..664ba3504f48 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1450,7 +1450,6 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone)
bio_for_each_segment_all(bv, clone, i) {
BUG_ON(!bv->bv_page);
mempool_free(bv->bv_page, cc->page_pool);
- bv->bv_page = NULL;
}
}

--
2.9.4

2017-06-26 12:25:35

by Ming Lei

Subject: [PATCH v2 33/51] block: deal with dirtying pages for multipage bvec

In bio_check_pages_dirty(), bvec->bv_page is used as a flag
marking whether the page has been dirtied & released; if
not, it will be dirtied in the deferred workqueue.

With multipage bvecs we can't do that any more, so change
the logic to check all pages in one mp bvec, and release
these pages only if all of them are dirtied; otherwise dirty
them all in the deferred workqueue.

Signed-off-by: Ming Lei <[email protected]>
---
block/bio.c | 45 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index bf7f25889f6e..22e5deec7ec7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1641,8 +1641,9 @@ void bio_set_pages_dirty(struct bio *bio)
{
struct bio_vec *bvec;
int i;
+ struct bvec_iter_all bia;

- bio_for_each_segment_all(bvec, bio, i) {
+ bio_for_each_segment_all_sp(bvec, bio, i, bia) {
struct page *page = bvec->bv_page;

if (page && !PageCompound(page))
@@ -1650,16 +1651,26 @@ void bio_set_pages_dirty(struct bio *bio)
}
}

+static inline void release_mp_bvec_pages(struct bio_vec *bvec)
+{
+ struct bio_vec bv;
+ struct bvec_iter iter;
+
+ bvec_for_each_sp_bvec(bv, bvec, iter)
+ put_page(bv.bv_page);
+}
+
static void bio_release_pages(struct bio *bio)
{
struct bio_vec *bvec;
int i;

- bio_for_each_segment_all(bvec, bio, i) {
+ /* iterate each mp bvec */
+ bio_for_each_segment_all_mp(bvec, bio, i) {
struct page *page = bvec->bv_page;

if (page)
- put_page(page);
+ release_mp_bvec_pages(bvec);
}
}

@@ -1703,20 +1714,38 @@ static void bio_dirty_fn(struct work_struct *work)
}
}

+static inline void check_mp_bvec_pages(struct bio_vec *bvec,
+ int *nr_dirty, int *nr_pages)
+{
+ struct bio_vec bv;
+ struct bvec_iter iter;
+
+ bvec_for_each_sp_bvec(bv, bvec, iter) {
+ struct page *page = bv.bv_page;
+
+ if (PageDirty(page) || PageCompound(page))
+ (*nr_dirty)++;
+ (*nr_pages)++;
+ }
+}
+
void bio_check_pages_dirty(struct bio *bio)
{
struct bio_vec *bvec;
int nr_clean_pages = 0;
int i;

- bio_for_each_segment_all(bvec, bio, i) {
- struct page *page = bvec->bv_page;
+ bio_for_each_segment_all_mp(bvec, bio, i) {
+ int nr_dirty = 0, nr_pages = 0;
+
+ check_mp_bvec_pages(bvec, &nr_dirty, &nr_pages);

- if (PageDirty(page) || PageCompound(page)) {
- put_page(page);
+ /* release all pages in the mp bvec if all are dirtied */
+ if (nr_dirty == nr_pages) {
+ release_mp_bvec_pages(bvec);
bvec->bv_page = NULL;
} else {
- nr_clean_pages++;
+ nr_clean_pages += nr_pages;
}
}

--
2.9.4
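
The all-or-nothing rule that patch 33/51 introduces in
bio_check_pages_dirty() can be sketched in user space as follows. This
is a simplified model with made-up names (struct model_mp_bvec,
model_check_pages_dirty()): each page is reduced to a single boolean,
which in the kernel corresponds to PageDirty() || PageCompound(), and
"released" stands for put_page() on every page plus clearing
bvec->bv_page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one multipage bvec. */
struct model_mp_bvec {
	const bool *dirty;	/* one flag per page in the bvec */
	size_t nr_pages;
	bool released;		/* set when all pages were released */
};

/*
 * Release a bvec only if every page in it is already dirty.
 * Otherwise count all of its pages as still to be dirtied by the
 * deferred work. Returns that count.
 */
static size_t model_check_pages_dirty(struct model_mp_bvec *bvecs, size_t nr)
{
	size_t nr_clean_pages = 0;
	size_t i, p;

	for (i = 0; i < nr; i++) {
		size_t nr_dirty = 0;

		assert(bvecs[i].nr_pages > 0);
		for (p = 0; p < bvecs[i].nr_pages; p++)
			if (bvecs[i].dirty[p])
				nr_dirty++;

		if (nr_dirty == bvecs[i].nr_pages)
			bvecs[i].released = true;
		else
			nr_clean_pages += bvecs[i].nr_pages;
	}

	return nr_clean_pages;
}
```

Note that, as in the patch, a partially dirty bvec contributes all of
its pages to the count, including the ones that are already dirty.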

2017-06-26 12:26:14

by Ming Lei

Subject: [PATCH v2 32/51] btrfs: use bvec_get_last_page to get bio's last page

Preparing for supporting multipage bvec.

Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: [email protected]
Signed-off-by: Ming Lei <[email protected]>
---
fs/btrfs/compression.c | 5 ++++-
fs/btrfs/extent_io.c | 8 ++++++--
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5972f74354ca..fdab5b821aa8 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -391,8 +391,11 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
static u64 bio_end_offset(struct bio *bio)
{
struct bio_vec *last = &bio->bi_io_vec[bio->bi_vcnt - 1];
+ struct bio_vec bv;

- return page_offset(last->bv_page) + last->bv_len + last->bv_offset;
+ bvec_get_last_page(last, &bv);
+
+ return page_offset(bv.bv_page) + bv.bv_len + bv.bv_offset;
}

static noinline int add_ra_bio_pages(struct inode *inode,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5b453cada1ea..7cc6c8a52e49 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2741,11 +2741,15 @@ static int __must_check submit_one_bio(struct bio *bio, int mirror_num,
{
blk_status_t ret = 0;
struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
- struct page *page = bvec->bv_page;
struct extent_io_tree *tree = bio->bi_private;
+ struct bio_vec bv;
+ struct page *page;
u64 start;

- start = page_offset(page) + bvec->bv_offset;
+ bvec_get_last_page(bvec, &bv);
+ page = bv.bv_page;
+
+ start = page_offset(page) + bv.bv_offset;

bio->bi_private = NULL;
bio_get(bio);
--
2.9.4
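
bvec_get_last_page() itself is not part of this message, so the
following is only a guess at the arithmetic such a helper needs,
written as a user-space model. The names model_bvec,
model_get_last_page() and MODEL_PAGE_SIZE are hypothetical, pages are
represented by frame numbers, and the offset is assumed to lie inside
the first page.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Hypothetical stand-in for struct bio_vec. */
struct model_bvec {
	uint64_t pfn;		/* frame number of the first page */
	uint32_t len;		/* bytes covered */
	uint32_t offset;	/* byte offset into the first page */
};

/* Fill @last with the single-page piece that ends the segment @mp. */
static void model_get_last_page(const struct model_bvec *mp,
				struct model_bvec *last)
{
	uint32_t total, last_idx;

	assert(mp->len > 0 && mp->offset < MODEL_PAGE_SIZE);

	total = mp->offset + mp->len;
	last_idx = (total - 1) / MODEL_PAGE_SIZE;

	last->pfn = mp->pfn + last_idx;
	if (last_idx == 0) {
		/* the whole segment sits inside one page */
		last->offset = mp->offset;
		last->len = mp->len;
	} else {
		/* the last piece starts at the page boundary */
		last->offset = 0;
		last->len = total - last_idx * MODEL_PAGE_SIZE;
	}
}
```

With such a piece in hand, an end offset like the one computed in
bio_end_offset() is the page's file offset plus the piece's offset and
length, which stays correct when the bvec spans more than one page.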

2017-06-26 12:16:47

by Ming Lei

Subject: [PATCH v2 22/51] block: introduce bio_for_each_segment_mp()

This helper iterates over multipage bvecs, one whole segment
at a time, and it is required in bio_clone().

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/bio.h | 39 ++++++++++++++++++++++++++++++++++-----
include/linux/bvec.h | 37 ++++++++++++++++++++++++++++++++-----
2 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index d425be4d1ced..bdbc9480229d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -68,6 +68,9 @@
#define bio_data_dir(bio) \
(op_is_write(bio_op(bio)) ? WRITE : READ)

+#define bio_iter_iovec_mp(bio, iter) \
+ bvec_iter_bvec_mp((bio)->bi_io_vec, (iter))
+
/*
* Check whether this bio carries any data or not. A NULL bio is allowed.
*/
@@ -163,15 +166,31 @@ static inline void *bio_data(struct bio *bio)
#define bio_for_each_segment_all(bvl, bio, i) \
for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)

-static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
- unsigned bytes)
+static inline void __bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+ unsigned bytes, bool mp)
{
iter->bi_sector += bytes >> 9;

- if (bio_no_advance_iter(bio))
+ if (bio_no_advance_iter(bio)) {
iter->bi_size -= bytes;
- else
- bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+ } else {
+ if (!mp)
+ bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+ else
+ bvec_iter_advance_mp(bio->bi_io_vec, iter, bytes);
+ }
+}
+
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+ unsigned bytes)
+{
+ __bio_advance_iter(bio, iter, bytes, false);
+}
+
+static inline void bio_advance_iter_mp(struct bio *bio, struct bvec_iter *iter,
+ unsigned bytes)
+{
+ __bio_advance_iter(bio, iter, bytes, true);
}

#define __bio_for_each_segment(bvl, bio, iter, start) \
@@ -187,6 +206,16 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
#define bio_for_each_segment(bvl, bio, iter) \
__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)

+#define __bio_for_each_segment_mp(bvl, bio, iter, start) \
+ for (iter = (start); \
+ (iter).bi_size && \
+ ((bvl = bio_iter_iovec_mp((bio), (iter))), 1); \
+ bio_advance_iter_mp((bio), &(iter), (bvl).bv_len))
+
+/* returns one real segment(multipage bvec) each time */
+#define bio_for_each_segment_mp(bvl, bio, iter) \
+ __bio_for_each_segment_mp(bvl, bio, iter, (bio)->bi_iter)
+
#define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)

static inline unsigned bio_segments(struct bio *bio)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 61632e9db3b8..5c51c58fe202 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -128,16 +128,29 @@ struct bvec_iter {
.bv_offset = bvec_iter_offset((bvec), (iter)), \
})

-static inline void bvec_iter_advance(const struct bio_vec *bv,
- struct bvec_iter *iter,
- unsigned bytes)
+#define bvec_iter_bvec_mp(bvec, iter) \
+((struct bio_vec) { \
+ .bv_page = bvec_iter_page_mp((bvec), (iter)), \
+ .bv_len = bvec_iter_len_mp((bvec), (iter)), \
+ .bv_offset = bvec_iter_offset_mp((bvec), (iter)), \
+})
+
+static inline void __bvec_iter_advance(const struct bio_vec *bv,
+ struct bvec_iter *iter,
+ unsigned bytes, bool mp)
{
WARN_ONCE(bytes > iter->bi_size,
"Attempted to advance past end of bvec iter\n");

while (bytes) {
- unsigned iter_len = bvec_iter_len(bv, *iter);
- unsigned len = min(bytes, iter_len);
+ unsigned len;
+
+ if (mp)
+ len = bvec_iter_len_mp(bv, *iter);
+ else
+ len = bvec_iter_len_sp(bv, *iter);
+
+ len = min(bytes, len);

bytes -= len;
iter->bi_size -= len;
@@ -150,6 +163,20 @@ static inline void bvec_iter_advance(const struct bio_vec *bv,
}
}

+static inline void bvec_iter_advance(const struct bio_vec *bv,
+ struct bvec_iter *iter,
+ unsigned bytes)
+{
+ __bvec_iter_advance(bv, iter, bytes, false);
+}
+
+static inline void bvec_iter_advance_mp(const struct bio_vec *bv,
+ struct bvec_iter *iter,
+ unsigned bytes)
+{
+ __bvec_iter_advance(bv, iter, bytes, true);
+}
+
#define for_each_bvec(bvl, bio_vec, iter, start) \
for (iter = (start); \
(iter).bi_size && \
--
2.9.4
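
To illustrate what iterating "in multipage style" means in patch 22/51,
here is a user-space model of the iterator advance. It is a sketch
only: struct model_bvec, struct model_iter and the model_*() functions
are made-up names that mirror bi_size, bi_idx and bi_bvec_done, and the
page and offset fields are dropped because only lengths matter for the
walk. Every bvec is assumed to have a non-zero length.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bvec model: only the length matters here. */
struct model_bvec {
	uint32_t len;
};

/* Mirrors the bi_size, bi_idx and bi_bvec_done fields of bvec_iter. */
struct model_iter {
	uint32_t size;	/* bytes left in the whole walk */
	uint32_t idx;	/* current bvec in the table */
	uint32_t done;	/* bytes already consumed in that bvec */
};

/* Bytes left in the current multipage bvec, capped by iter size. */
static uint32_t model_len_mp(const struct model_bvec *tbl,
			     const struct model_iter *it)
{
	uint32_t left = tbl[it->idx].len - it->done;

	return it->size < left ? it->size : left;
}

/* Advance @it by @bytes, stepping over whole multipage bvecs. */
static void model_advance_mp(const struct model_bvec *tbl,
			     struct model_iter *it, uint32_t bytes)
{
	assert(bytes <= it->size);

	while (bytes) {
		uint32_t len = model_len_mp(tbl, it);

		if (len > bytes)
			len = bytes;

		bytes -= len;
		it->size -= len;
		it->done += len;
		if (it->done == tbl[it->idx].len) {
			it->done = 0;
			it->idx++;
		}
	}
}

/* Count how many multipage segments a full walk visits. */
static unsigned model_count_mp_segments(const struct model_bvec *tbl,
					struct model_iter it)
{
	unsigned n = 0;

	while (it.size) {
		n++;
		model_advance_mp(tbl, &it, model_len_mp(tbl, &it));
	}

	return n;
}
```

A table holding one 12288-byte bvec and one 4096-byte bvec is visited
in two steps here, whereas a singlepage walk over the same data would
take four.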

2017-06-26 12:27:39

by Ming Lei

Subject: [PATCH v2 20/51] block: introduce multipage/single page bvec helpers

This patch introduces helpers suffixed with _mp and _sp
for multipage bvec/segment support.

The _mp helpers are the interfaces that treat one
bvec/segment as a real multipage one; for example, .bv_len
is the total length of the multipage segment.

The _sp helpers are the interfaces that support the current
bvec iterator, which drivers, fs, dm etc. assume to be
singlepage only. The _sp helpers are introduced to build
singlepage bvecs on the fly, so that users of the bio/bvec
iterator keep working without any change even though
we store multipage bvecs.

Signed-off-by: Ming Lei <[email protected]>
---
include/linux/bvec.h | 56 +++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 162ca7caf510..f52587e283d4 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -24,6 +24,42 @@
#include <linux/bug.h>

/*
+ * What is multipage bvecs(segment)?
+ *
+ * - bvec stored in bio->bi_io_vec is always multipage style vector
+ *
+ * - bvec(struct bio_vec) represents one physically contiguous I/O
+ * buffer, now the buffer may include more than one pages since
+ * multipage(mp) bvec is supported, and all these pages represented
+ * by one bvec is physically contiguous. Before mp support, at most
+ * one page can be included in one bvec, we call it singlepage(sp)
+ * bvec.
+ *
+ * - .bv_page of the bvec represents the 1st page in the mp segment
+ *
+ * - .bv_offset of the bvec represents offset of the buffer in the bvec
+ *
+ * The effect on the current drivers/filesystem/dm/bcache/...:
+ *
+ * - almost everyone supposes that one bvec only includes one single
+ * page, so we keep the sp interface not changed, for example,
+ * bio_for_each_segment() still returns bvec with single page
+ *
+ * - bio_for_each_segment_all() will be changed to return singlepage
+ * bvec too
+ *
+ * - during iterating, iterator variable(struct bvec_iter) is always
+ * updated in multipage bvec style and that means bvec_iter_advance()
+ * is kept not changed
+ *
+ * - returned(copied) singlepage bvec is generated in flight by bvec
+ * helpers from the stored mp bvec
+ *
+ * - In case that some components(such as iov_iter) need to support mp
+ * segment, we introduce new helpers(suffixed with _mp) for them.
+ */
+
+/*
* was unsigned short, but we might as well be ready for > 64kB I/O pages
*/
struct bio_vec {
@@ -49,16 +85,30 @@ struct bvec_iter {
*/
#define __bvec_iter_bvec(bvec, iter) (&(bvec)[(iter).bi_idx])

-#define bvec_iter_page(bvec, iter) \
+#define bvec_iter_page_mp(bvec, iter) \
(__bvec_iter_bvec((bvec), (iter))->bv_page)

-#define bvec_iter_len(bvec, iter) \
+#define bvec_iter_len_mp(bvec, iter) \
min((iter).bi_size, \
__bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done)

-#define bvec_iter_offset(bvec, iter) \
+#define bvec_iter_offset_mp(bvec, iter) \
(__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)

+/*
+ * <page, offset,length> of singlepage(sp) segment.
+ *
+ * These helpers will be implemented for building sp bvec in flight.
+ */
+#define bvec_iter_offset_sp(bvec, iter) bvec_iter_offset_mp((bvec), (iter))
+#define bvec_iter_len_sp(bvec, iter) bvec_iter_len_mp((bvec), (iter))
+#define bvec_iter_page_sp(bvec, iter) bvec_iter_page_mp((bvec), (iter))
+
+/* current interfaces support sp style at default */
+#define bvec_iter_page(bvec, iter) bvec_iter_page_sp((bvec), (iter))
+#define bvec_iter_len(bvec, iter) bvec_iter_len_sp((bvec), (iter))
+#define bvec_iter_offset(bvec, iter) bvec_iter_offset_sp((bvec), (iter))
+
#define bvec_iter_bvec(bvec, iter) \
((struct bio_vec) { \
.bv_page = bvec_iter_page((bvec), (iter)), \
--
2.9.4
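
In patch 20/51 the _sp helpers are still plain aliases of the _mp ones.
As a rough idea of what building singlepage bvecs from a stored
multipage bvec involves, consider the user-space sketch below. It is
an assumption about the arithmetic, not code from the series:
model_bvec, model_split_sp() and MODEL_PAGE_SIZE are made-up names and
pages are represented by frame numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Hypothetical stand-in for struct bio_vec. */
struct model_bvec {
	uint64_t pfn;		/* frame number of the first page */
	uint32_t len;		/* bytes covered */
	uint32_t offset;	/* byte offset into the first page */
};

/*
 * Split the multipage bvec @mp into singlepage pieces, writing at most
 * @max_out of them to @out. Returns the number of pieces written.
 */
static size_t model_split_sp(const struct model_bvec *mp,
			     struct model_bvec *out, size_t max_out)
{
	uint32_t done = 0;
	size_t n = 0;

	assert(mp->offset < MODEL_PAGE_SIZE);

	while (done < mp->len && n < max_out) {
		uint32_t pos = mp->offset + done;
		uint32_t in_page = pos % MODEL_PAGE_SIZE;
		uint32_t len = MODEL_PAGE_SIZE - in_page;

		/* never run past the end of the segment */
		if (len > mp->len - done)
			len = mp->len - done;

		out[n].pfn = mp->pfn + pos / MODEL_PAGE_SIZE;
		out[n].offset = in_page;
		out[n].len = len;

		done += len;
		n++;
	}

	return n;
}
```

Only the first piece can carry a non-zero offset and only the last can
be shorter than what remains of its page, which is why existing
singlepage users could keep working on the derived pieces.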

2017-06-26 12:16:09

by Ming Lei

Subject: [PATCH v2 18/51] block: bounce: don't access bio->bi_io_vec in copy_to_high_bio_irq

As we need to support multipage bvecs, don't access bio->bi_io_vec
directly in copy_to_high_bio_irq(); just use the standard iterator
instead.

Signed-off-by: Ming Lei <[email protected]>
---
block/bounce.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/bounce.c b/block/bounce.c
index 4eea1b2d8618..590dcdb1de76 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -111,24 +111,30 @@ int init_emergency_isa_pool(void)
static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
{
unsigned char *vfrom;
- struct bio_vec tovec, *fromvec = from->bi_io_vec;
+ struct bio_vec tovec, fromvec;
struct bvec_iter iter;
+ /*
+ * The bio of @from is created by bounce, so we can iterate
+ * its bvec from start to end, but the @from->bi_iter can't be
+ * trusted because it might be changed by splitting.
+ */
+ struct bvec_iter from_iter = BVEC_ITER_ALL_INIT;

bio_for_each_segment(tovec, to, iter) {
- if (tovec.bv_page != fromvec->bv_page) {
+ fromvec = bio_iter_iovec(from, from_iter);
+ if (tovec.bv_page != fromvec.bv_page) {
/*
* fromvec->bv_offset and fromvec->bv_len might have
* been modified by the block layer, so use the original
* copy, bounce_copy_vec already uses tovec->bv_len
*/
- vfrom = page_address(fromvec->bv_page) +
+ vfrom = page_address(fromvec.bv_page) +
tovec.bv_offset;

bounce_copy_vec(&tovec, vfrom);
flush_dcache_page(tovec.bv_page);
}
-
- fromvec++;
+ bio_advance_iter(from, &from_iter, tovec.bv_len);
}
}

--
2.9.4

2017-06-26 12:15:19

by Ming Lei

Subject: [PATCH v2 14/51] btrfs: avoid to access bvec table directly for a cloned bio

Commit 17347cec15f919901c90 (Btrfs: change how we iterate bios in endio)
mentioned that for dio the submitted bio may be fast cloned. We
can't access the bvec table directly for a cloned bio, so use
bio_get_first_bvec() to retrieve the 1st bvec.

Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: [email protected]
Cc: Liu Bo <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
fs/btrfs/inode.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 06dea7c89bbd..4ab02b34f029 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7993,6 +7993,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
int read_mode = 0;
int segs;
int ret;
+ struct bio_vec bvec;

BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);

@@ -8008,8 +8009,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
}

segs = bio_segments(failed_bio);
+ bio_get_first_bvec(failed_bio, &bvec);
if (segs > 1 ||
- (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
+ (bvec.bv_len > btrfs_inode_sectorsize(inode)))
read_mode |= REQ_FAILFAST_DEV;

isector = start - btrfs_io_bio(failed_bio)->logical;
--
2.9.4
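
The point of the change above can be shown with a small userspace model. This is only a sketch under assumptions: the model_* names are hypothetical stand-ins and not kernel API, and it assumes that bio_get_first_bvec() returns the vector at the bio's current iterator position, trimmed to what the iterator still covers. A fast clone shares the parent's table, so table[0] may describe data that does not belong to the clone at all.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct bio_vec (hypothetical, not the kernel type). */
struct model_vec {
	int page;            /* stands in for struct page * */
	unsigned int len;    /* bytes described by this vector */
	unsigned int offset; /* byte offset inside the page */
};

/* Simplified stand-in for struct bvec_iter (hypothetical). */
struct model_iter {
	size_t idx;          /* index of the current vector in the table */
	unsigned int done;   /* bytes already consumed in that vector */
	unsigned int size;   /* bytes remaining in the whole bio */
};

/*
 * Return the first vector as seen through the iterator: start at
 * table[it.idx], skip the bytes already consumed, and clamp the length
 * to what the bio still covers.
 */
static struct model_vec model_first_vec(const struct model_vec *table,
					struct model_iter it)
{
	struct model_vec v = table[it.idx];
	unsigned int left;

	assert(it.done <= v.len);
	left = v.len - it.done;
	v.offset += it.done;
	v.len = left < it.size ? left : it.size;
	return v;
}
```

For a clone whose iterator starts 512 bytes into the second vector, the model returns that second vector with the adjusted offset and length, while table[0] still describes the parent's first page. That is the case in which reading bi_io_vec->bv_len directly would give the wrong answer.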

2017-06-26 16:44:03

by David Sterba

Subject: Re: [PATCH v2 00/51] block: support multipage bvec

On Mon, Jun 26, 2017 at 08:09:43PM +0800, Ming Lei wrote:
> btrfs: avoid access to .bi_vcnt directly
> btrfs: avoid to access bvec table directly for a cloned bio
> btrfs: comment on direct access bvec table
> btrfs: use bvec_get_last_page to get bio's last page
> fs/btrfs: convert to bio_for_each_segment_all_sp()

Acked-by: David Sterba <[email protected]>

for all the btrfs patches.

2017-06-26 16:46:45

by Jens Axboe

Subject: Re: [PATCH v2 00/51] block: support multipage bvec

On 06/26/2017 06:09 AM, Ming Lei wrote:
> Hi,
>
> This patchset brings multipage bvec into block layer:
>
> 1) what is multipage bvec?
>
> Multipage bvecs means that one 'struct bio_vec' can hold
> multiple pages which are physically contiguous instead
> of one single page used in linux kernel for long time.
>
> 2) why is multipage bvec introduced?
>
> Kent proposed the idea[1] first.
>
> As system's RAM becomes much bigger than before, and
> at the same time huge page, transparent huge page and
> memory compaction are widely used, it is a bit easy now
> to see physically contiguous pages from fs in I/O.
> On the other hand, from block layer's view, it isn't
> necessary to store intermediate pages into bvec, and
> it is enough to just store the physically contiguous
> 'segment' in each io vector.
>
> Also huge pages are being brought to filesystem and swap
> [2][6], we can do IO on a hugepage each time[3], which
> requires that one bio can transfer at least one huge page
> one time. Turns out it isn't flexible to change BIO_MAX_PAGES
> simply[3][5]. Multipage bvec can fit in this case very well.
>
> With multipage bvec:
>
> - segment handling in block layer can be improved much
> in future since it should be quite easy to convert
> multipage bvec into segment easily. For example, we might
> just store segment in each bvec directly in future.
>
> - bio size can be increased and it should improve some
> high-bandwidth IO case in theory[4].
>
> - Inside block layer, both bio splitting and sg map can
> become more efficient than before by just traversing the
> physically contiguous 'segment' instead of each page.
>
> - there is opportunity in future to improve memory footprint
> of bvecs.
>
> 3) how is multipage bvec implemented in this patchset?
>
> The 1st 18 patches comment on some special cases and deal with
> some special cases of direct access to bvec table.
>
> The 2nd part(19~29) implements multipage bvec in block layer:
>
> - put all tricks into bvec/bio/rq iterators, and as far as
> drivers and fs use these standard iterators, they are happy
> with multipage bvec
>
> - use multipage bvec to split bio and map sg
>
> - bio_for_each_segment_all() changes
> this helper pass pointer of each bvec directly to user, and
> it has to be changed. Two new helpers(bio_for_each_segment_all_sp()
> and bio_for_each_segment_all_mp()) are introduced.
>
> The 3rd part(30~49) convert current users of bio_for_each_segment_all()
> to bio_for_each_segment_all_sp()/bio_for_each_segment_all_mp().
>
> The last part(50~51) enables multipage bvec.
>
> These patches can be found in the following git tree:
>
> https://github.com/ming1/linux/commits/mp-bvec-1.4-v4.12-rc
>
> Thanks Christoph for looking at the early version and providing
> very good suggestions, such as: introduce bio_init_with_vec_table(),
> remove another unnecessary helpers for cleanup and so on.
>
> Any comments are welcome!

I'll take some time to review this over the next week or so. In any
case, it's a little late to stuff into 4.13 and get a decent amount
of exposure and testing on it. A 4.14 target for this would be the
way to go, imho.


--
Jens Axboe

2017-06-26 18:08:27

by Liu Bo

Subject: Re: [PATCH v2 14/51] btrfs: avoid to access bvec table directly for a cloned bio

On Mon, Jun 26, 2017 at 08:09:57PM +0800, Ming Lei wrote:
> Commit 17347cec15f919901c90(Btrfs: change how we iterate bios in endio)
> mentioned that for dio the submitted bio may be fast cloned, we
> can't access the bvec table directly for a cloned bio, so use
> bio_get_first_bvec() to retrieve the 1st bvec.
>

Looks good to me.

Reviewed-by: Liu Bo <[email protected]>

-liubo
> Cc: Chris Mason <[email protected]>
> Cc: Josef Bacik <[email protected]>
> Cc: David Sterba <[email protected]>
> Cc: [email protected]
> Cc: Liu Bo <[email protected]>
> Signed-off-by: Ming Lei <[email protected]>
> ---
> fs/btrfs/inode.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 06dea7c89bbd..4ab02b34f029 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7993,6 +7993,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
> int read_mode = 0;
> int segs;
> int ret;
> + struct bio_vec bvec;
>
> BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
>
> @@ -8008,8 +8009,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
> }
>
> segs = bio_segments(failed_bio);
> + bio_get_first_bvec(failed_bio, &bvec);
> if (segs > 1 ||
> - (failed_bio->bi_io_vec->bv_len > btrfs_inode_sectorsize(inode)))
> + (bvec.bv_len > btrfs_inode_sectorsize(inode)))
> read_mode |= REQ_FAILFAST_DEV;
>
> isector = start - btrfs_io_bio(failed_bio)->logical;
> --
> 2.9.4
>

2017-06-27 06:12:22

by Matthew Wilcox

Subject: Re: [PATCH v2 16/51] block: bounce: avoid direct access to bvec table

On Mon, Jun 26, 2017 at 08:09:59PM +0800, Ming Lei wrote:
> bio_for_each_segment_all(bvec, bio, i) {
> - org_vec = bio_orig->bi_io_vec + i + start;
> -
> - if (bvec->bv_page == org_vec->bv_page)
> - continue;
> + orig_vec = bio_iter_iovec(bio_orig, orig_iter);
> + if (bvec->bv_page == orig_vec.bv_page)
> + goto next;
>
> dec_zone_page_state(bvec->bv_page, NR_BOUNCE);
> mempool_free(bvec->bv_page, pool);
> + next:
> + bio_advance_iter(bio_orig, &orig_iter, orig_vec.bv_len);
> }
>

I think this might be written more clearly as:

bio_for_each_segment_all(bvec, bio, i) {
orig_vec = bio_iter_iovec(bio_orig, orig_iter);
if (bvec->bv_page != orig_vec.bv_page) {
dec_zone_page_state(bvec->bv_page, NR_BOUNCE);
mempool_free(bvec->bv_page, pool);
}
bio_advance_iter(bio_orig, &orig_iter, orig_vec.bv_len);
}

What do you think?
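
For reference, the same control flow can be tried out in userspace. This is a rough model only: the model_* types and helpers are hypothetical and merely imitate bio_iter_iovec() and bio_advance_iter(), and it counts the pages that would be freed instead of freeing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: a page id and a byte length. */
struct model_vec {
	int page;
	unsigned int len;
};

/* Hypothetical stand-in for struct bvec_iter. */
struct model_iter {
	size_t idx;        /* current vector in the original table */
	unsigned int done; /* bytes already consumed in that vector */
};

/* Advance the iterator by `bytes`, stepping over vectors as they are used up. */
static void model_advance(const struct model_vec *table, size_t n,
			  struct model_iter *it, unsigned int bytes)
{
	while (bytes > 0 && it->idx < n) {
		unsigned int left = table[it->idx].len - it->done;
		unsigned int step = bytes < left ? bytes : left;

		it->done += step;
		bytes -= step;
		if (it->done == table[it->idx].len) {
			it->idx++;
			it->done = 0;
		}
	}
}

/* Count the vectors in `bounce` whose page differs from the original's page. */
static size_t model_count_bounced(const struct model_vec *bounce,
				  const struct model_vec *orig, size_t n)
{
	struct model_iter it = { 0, 0 };
	size_t i, bounced = 0;

	for (i = 0; i < n; i++) {
		assert(it.idx < n);
		if (bounce[i].page != orig[it.idx].page)
			bounced++; /* the kernel code would free the bounce page here */
		/* runs on every iteration, so no goto or continue is needed */
		model_advance(orig, n, &it, orig[it.idx].len - it.done);
	}
	return bounced;
}
```

With the advance as the last statement of the loop body, both the "same page" and the "bounced page" cases reach it on the same path, which is what makes the label unnecessary.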

2017-06-27 07:35:27

by Ming Lei

Subject: Re: [PATCH v2 16/51] block: bounce: avoid direct access to bvec table

On Mon, Jun 26, 2017 at 11:12:11PM -0700, Matthew Wilcox wrote:
> On Mon, Jun 26, 2017 at 08:09:59PM +0800, Ming Lei wrote:
> > bio_for_each_segment_all(bvec, bio, i) {
> > - org_vec = bio_orig->bi_io_vec + i + start;
> > -
> > - if (bvec->bv_page == org_vec->bv_page)
> > - continue;
> > + orig_vec = bio_iter_iovec(bio_orig, orig_iter);
> > + if (bvec->bv_page == orig_vec.bv_page)
> > + goto next;
> >
> > dec_zone_page_state(bvec->bv_page, NR_BOUNCE);
> > mempool_free(bvec->bv_page, pool);
> > + next:
> > + bio_advance_iter(bio_orig, &orig_iter, orig_vec.bv_len);
> > }
> >
>
> I think this might be written more clearly as:
>
> bio_for_each_segment_all(bvec, bio, i) {
> orig_vec = bio_iter_iovec(bio_orig, orig_iter);
> if (bvec->bv_page != orig_vec.bv_page) {
> dec_zone_page_state(bvec->bv_page, NR_BOUNCE);
> mempool_free(bvec->bv_page, pool);
> }
> bio_advance_iter(bio_orig, &orig_iter, orig_vec.bv_len);
> }
>
> What do you think?

Yeah, the above code looks cleaner. I will do it in V3.

thanks,
Ming

2017-06-27 09:37:13

by Guoqing Jiang

Subject: Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()



On 06/26/2017 08:09 PM, Ming Lei wrote:
> We will support multipage bvec soon, so initialize bvec
> table using the standard way instead of writing the
> table directly. Otherwise it won't work any more once
> multipage bvec is enabled.
>
> Cc: Shaohua Li <[email protected]>
> Cc: [email protected]
> Signed-off-by: Ming Lei <[email protected]>
> ---
> drivers/md/raid1.c | 27 ++++++++++++++-------------
> 1 file changed, 14 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3febfc8391fb..835c42396861 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2086,10 +2086,8 @@ static void process_checks(struct r1bio *r1_bio)
> /* Fix variable parts of all bios */
> vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
> for (i = 0; i < conf->raid_disks * 2; i++) {
> - int j;
> int size;
> blk_status_t status;
> - struct bio_vec *bi;
> struct bio *b = r1_bio->bios[i];
> struct resync_pages *rp = get_resync_pages(b);
> if (b->bi_end_io != end_sync_read)
> @@ -2098,8 +2096,6 @@ static void process_checks(struct r1bio *r1_bio)
> status = b->bi_status;
> bio_reset(b);
> b->bi_status = status;
> - b->bi_vcnt = vcnt;
> - b->bi_iter.bi_size = r1_bio->sectors << 9;
> b->bi_iter.bi_sector = r1_bio->sector +
> conf->mirrors[i].rdev->data_offset;
> b->bi_bdev = conf->mirrors[i].rdev->bdev;
> @@ -2107,15 +2103,20 @@ static void process_checks(struct r1bio *r1_bio)
> rp->raid_bio = r1_bio;
> b->bi_private = rp;
>
> - size = b->bi_iter.bi_size;
> - bio_for_each_segment_all(bi, b, j) {
> - bi->bv_offset = 0;
> - if (size > PAGE_SIZE)
> - bi->bv_len = PAGE_SIZE;
> - else
> - bi->bv_len = size;
> - size -= PAGE_SIZE;
> - }
> + /* initialize bvec table again */
> + rp->idx = 0;
> + size = r1_bio->sectors << 9;
> + do {
> + struct page *page = resync_fetch_page(rp, rp->idx++);
> + int len = min_t(int, size, PAGE_SIZE);
> +
> + /*
> + * won't fail because the vec table is big
> + * enough to hold all these pages
> + */
> + bio_add_page(b, page, len, 0);
> + size -= len;
> + } while (rp->idx < RESYNC_PAGES && size > 0);
> }

The section above looks similar to reset_bvec_table(), which is introduced in
the next patch. Why is there a difference between raid1 and raid10? Maybe it
would be better to add reset_bvec_table() to md.c and call it from both raid1
and raid10. Just my 2 cents.

Thanks,
Guoqing

2017-06-27 16:23:24

by Ming Lei

Subject: Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()

On Tue, Jun 27, 2017 at 05:36:39PM +0800, Guoqing Jiang wrote:
>
>
> On 06/26/2017 08:09 PM, Ming Lei wrote:
> > We will support multipage bvec soon, so initialize bvec
> > table using the standard way instead of writing the
> > table directly. Otherwise it won't work any more once
> > multipage bvec is enabled.
> >
> > Cc: Shaohua Li <[email protected]>
> > Cc: [email protected]
> > Signed-off-by: Ming Lei <[email protected]>
> > ---
> > drivers/md/raid1.c | 27 ++++++++++++++-------------
> > 1 file changed, 14 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 3febfc8391fb..835c42396861 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -2086,10 +2086,8 @@ static void process_checks(struct r1bio *r1_bio)
> > /* Fix variable parts of all bios */
> > vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
> > for (i = 0; i < conf->raid_disks * 2; i++) {
> > - int j;
> > int size;
> > blk_status_t status;
> > - struct bio_vec *bi;
> > struct bio *b = r1_bio->bios[i];
> > struct resync_pages *rp = get_resync_pages(b);
> > if (b->bi_end_io != end_sync_read)
> > @@ -2098,8 +2096,6 @@ static void process_checks(struct r1bio *r1_bio)
> > status = b->bi_status;
> > bio_reset(b);
> > b->bi_status = status;
> > - b->bi_vcnt = vcnt;
> > - b->bi_iter.bi_size = r1_bio->sectors << 9;
> > b->bi_iter.bi_sector = r1_bio->sector +
> > conf->mirrors[i].rdev->data_offset;
> > b->bi_bdev = conf->mirrors[i].rdev->bdev;
> > @@ -2107,15 +2103,20 @@ static void process_checks(struct r1bio *r1_bio)
> > rp->raid_bio = r1_bio;
> > b->bi_private = rp;
> > - size = b->bi_iter.bi_size;
> > - bio_for_each_segment_all(bi, b, j) {
> > - bi->bv_offset = 0;
> > - if (size > PAGE_SIZE)
> > - bi->bv_len = PAGE_SIZE;
> > - else
> > - bi->bv_len = size;
> > - size -= PAGE_SIZE;
> > - }
> > + /* initialize bvec table again */
> > + rp->idx = 0;
> > + size = r1_bio->sectors << 9;
> > + do {
> > + struct page *page = resync_fetch_page(rp, rp->idx++);
> > + int len = min_t(int, size, PAGE_SIZE);
> > +
> > + /*
> > + * won't fail because the vec table is big
> > + * enough to hold all these pages
> > + */
> > + bio_add_page(b, page, len, 0);
> > + size -= len;
> > + } while (rp->idx < RESYNC_PAGES && size > 0);
> > }
>
> Seems above section is similar as reset_bvec_table introduced in next patch,
> why there is difference between raid1 and raid10? Maybe add reset_bvec_table
> into md.c, then call it in raid1 or raid10 is better, just my 2 cents.

Hi Guoqing,

I think it is a good idea, and even the two patches can be put into
one, so how about the following patch?

Shaohua, what do you think of this one?

---
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3d957ac1e109..7ffc622dd7fa 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9126,6 +9126,24 @@ void md_reload_sb(struct mddev *mddev, int nr)
}
EXPORT_SYMBOL(md_reload_sb);

+/* generally called after bio_reset() for resetting bvec */
+void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size)
+{
+ /* initialize bvec table again */
+ rp->idx = 0;
+ do {
+ struct page *page = resync_fetch_page(rp, rp->idx++);
+ int len = min_t(int, size, PAGE_SIZE);
+
+ /*
+ * won't fail because the vec table is big
+ * enough to hold all these pages
+ */
+ bio_add_page(bio, page, len, 0);
+ size -= len;
+ } while (rp->idx < RESYNC_PAGES && size > 0);
+}
+
#ifndef MODULE

/*
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 991f0fe2dcc6..f569831b1de9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -783,4 +783,6 @@ static inline struct page *resync_fetch_page(struct resync_pages *rp,
return NULL;
return rp->pages[idx];
}
+
+void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size);
#endif /* _MD_MD_H */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 3febfc8391fb..6ae2665ecd58 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2086,10 +2086,7 @@ static void process_checks(struct r1bio *r1_bio)
/* Fix variable parts of all bios */
vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
for (i = 0; i < conf->raid_disks * 2; i++) {
- int j;
- int size;
blk_status_t status;
- struct bio_vec *bi;
struct bio *b = r1_bio->bios[i];
struct resync_pages *rp = get_resync_pages(b);
if (b->bi_end_io != end_sync_read)
@@ -2098,8 +2095,6 @@ static void process_checks(struct r1bio *r1_bio)
status = b->bi_status;
bio_reset(b);
b->bi_status = status;
- b->bi_vcnt = vcnt;
- b->bi_iter.bi_size = r1_bio->sectors << 9;
b->bi_iter.bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
b->bi_bdev = conf->mirrors[i].rdev->bdev;
@@ -2107,15 +2102,8 @@ static void process_checks(struct r1bio *r1_bio)
rp->raid_bio = r1_bio;
b->bi_private = rp;

- size = b->bi_iter.bi_size;
- bio_for_each_segment_all(bi, b, j) {
- bi->bv_offset = 0;
- if (size > PAGE_SIZE)
- bi->bv_len = PAGE_SIZE;
- else
- bi->bv_len = size;
- size -= PAGE_SIZE;
- }
+ /* initialize bvec table again */
+ reset_bvec_table(b, rp, r1_bio->sectors << 9);
}
for (primary = 0; primary < conf->raid_disks * 2; primary++)
if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5026e7ad51d3..72f4fa07449b 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2087,8 +2087,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
rp = get_resync_pages(tbio);
bio_reset(tbio);

- tbio->bi_vcnt = vcnt;
- tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
+ reset_bvec_table(tbio, rp, fbio->bi_iter.bi_size);
+
rp->raid_bio = r10_bio;
tbio->bi_private = rp;
tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;

Thanks,
Ming
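
To illustrate how the loop in reset_bvec_table() splits the resync size into per-page lengths, here is a minimal userspace sketch. It makes several assumptions: 4 KiB pages, a RESYNC_PAGES value of 16, and hypothetical MODEL_*/model_* names. It records the length that would be passed to each bio_add_page() call instead of touching a bio.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT   12                      /* assumed: 4 KiB pages */
#define MODEL_PAGE_SIZE    (1 << MODEL_PAGE_SHIFT)
#define MODEL_RESYNC_PAGES 16                      /* assumed table capacity */

/*
 * Split `size` bytes into page-sized lengths, the same way the do/while
 * loop in the patch does. Returns the number of entries written to lens[].
 */
static size_t model_reset_table(int size, int lens[MODEL_RESYNC_PAGES])
{
	size_t idx = 0;

	assert(size > 0);
	do {
		int len = size < MODEL_PAGE_SIZE ? size : MODEL_PAGE_SIZE;

		lens[idx++] = len; /* the kernel code calls bio_add_page() here */
		size -= len;
	} while (idx < MODEL_RESYNC_PAGES && size > 0);
	return idx;
}

/* The vcnt formula from process_checks(): sectors rounded up to whole pages. */
static int model_vcnt(int sectors)
{
	return (sectors + MODEL_PAGE_SIZE / 512 - 1) >> (MODEL_PAGE_SHIFT - 9);
}
```

For 20 sectors (10240 bytes) the loop produces two full pages and one 2048-byte tail, which matches the three vectors the old vcnt formula computes. The loop also stops at the table capacity even if the size has not been used up.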

2017-06-29 01:37:02

by Guoqing Jiang

Subject: Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()



On 06/28/2017 12:22 AM, Ming Lei wrote:
>
>> Seems above section is similar as reset_bvec_table introduced in next patch,
>> why there is difference between raid1 and raid10? Maybe add reset_bvec_table
>> into md.c, then call it in raid1 or raid10 is better, just my 2 cents.
> Hi Guoqing,
>
> I think it is a good idea, and even the two patches can be put into
> one, so how about the following patch?

Looks good.

Acked-by: Guoqing Jiang <[email protected]>

Thanks,
Guoqing

>
> Shaohua, what do you think of this one?
>
> ---
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3d957ac1e109..7ffc622dd7fa 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9126,6 +9126,24 @@ void md_reload_sb(struct mddev *mddev, int nr)
> }
> EXPORT_SYMBOL(md_reload_sb);
>
> +/* generally called after bio_reset() for resetting bvec */
> +void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size)
> +{
> + /* initialize bvec table again */
> + rp->idx = 0;
> + do {
> + struct page *page = resync_fetch_page(rp, rp->idx++);
> + int len = min_t(int, size, PAGE_SIZE);
> +
> + /*
> + * won't fail because the vec table is big
> + * enough to hold all these pages
> + */
> + bio_add_page(bio, page, len, 0);
> + size -= len;
> + } while (rp->idx < RESYNC_PAGES && size > 0);
> +}
> +
> #ifndef MODULE
>
> /*
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 991f0fe2dcc6..f569831b1de9 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -783,4 +783,6 @@ static inline struct page *resync_fetch_page(struct resync_pages *rp,
> return NULL;
> return rp->pages[idx];
> }
> +
> +void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size);
> #endif /* _MD_MD_H */
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3febfc8391fb..6ae2665ecd58 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2086,10 +2086,7 @@ static void process_checks(struct r1bio *r1_bio)
> /* Fix variable parts of all bios */
> vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
> for (i = 0; i < conf->raid_disks * 2; i++) {
> - int j;
> - int size;
> blk_status_t status;
> - struct bio_vec *bi;
> struct bio *b = r1_bio->bios[i];
> struct resync_pages *rp = get_resync_pages(b);
> if (b->bi_end_io != end_sync_read)
> @@ -2098,8 +2095,6 @@ static void process_checks(struct r1bio *r1_bio)
> status = b->bi_status;
> bio_reset(b);
> b->bi_status = status;
> - b->bi_vcnt = vcnt;
> - b->bi_iter.bi_size = r1_bio->sectors << 9;
> b->bi_iter.bi_sector = r1_bio->sector +
> conf->mirrors[i].rdev->data_offset;
> b->bi_bdev = conf->mirrors[i].rdev->bdev;
> @@ -2107,15 +2102,8 @@ static void process_checks(struct r1bio *r1_bio)
> rp->raid_bio = r1_bio;
> b->bi_private = rp;
>
> - size = b->bi_iter.bi_size;
> - bio_for_each_segment_all(bi, b, j) {
> - bi->bv_offset = 0;
> - if (size > PAGE_SIZE)
> - bi->bv_len = PAGE_SIZE;
> - else
> - bi->bv_len = size;
> - size -= PAGE_SIZE;
> - }
> + /* initialize bvec table again */
> + reset_bvec_table(b, rp, r1_bio->sectors << 9);
> }
> for (primary = 0; primary < conf->raid_disks * 2; primary++)
> if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 5026e7ad51d3..72f4fa07449b 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2087,8 +2087,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
> rp = get_resync_pages(tbio);
> bio_reset(tbio);
>
> - tbio->bi_vcnt = vcnt;
> - tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
> + reset_bvec_table(tbio, rp, fbio->bi_iter.bi_size);
> +
> rp->raid_bio = r10_bio;
> tbio->bi_private = rp;
> tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
>
> Thanks,
> Ming
>

2017-06-29 19:00:46

by Shaohua Li

Subject: Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()

On Wed, Jun 28, 2017 at 12:22:51AM +0800, Ming Lei wrote:
> On Tue, Jun 27, 2017 at 05:36:39PM +0800, Guoqing Jiang wrote:
> >
> >
> > On 06/26/2017 08:09 PM, Ming Lei wrote:
> > > We will support multipage bvec soon, so initialize bvec
> > > table using the standard way instead of writing the
> > > table directly. Otherwise it won't work any more once
> > > multipage bvec is enabled.
> > >
> > > Cc: Shaohua Li <[email protected]>
> > > Cc: [email protected]
> > > Signed-off-by: Ming Lei <[email protected]>
> > > ---
> > > drivers/md/raid1.c | 27 ++++++++++++++-------------
> > > 1 file changed, 14 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > > index 3febfc8391fb..835c42396861 100644
> > > --- a/drivers/md/raid1.c
> > > +++ b/drivers/md/raid1.c
> > > @@ -2086,10 +2086,8 @@ static void process_checks(struct r1bio *r1_bio)
> > > /* Fix variable parts of all bios */
> > > vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
> > > for (i = 0; i < conf->raid_disks * 2; i++) {
> > > - int j;
> > > int size;
> > > blk_status_t status;
> > > - struct bio_vec *bi;
> > > struct bio *b = r1_bio->bios[i];
> > > struct resync_pages *rp = get_resync_pages(b);
> > > if (b->bi_end_io != end_sync_read)
> > > @@ -2098,8 +2096,6 @@ static void process_checks(struct r1bio *r1_bio)
> > > status = b->bi_status;
> > > bio_reset(b);
> > > b->bi_status = status;
> > > - b->bi_vcnt = vcnt;
> > > - b->bi_iter.bi_size = r1_bio->sectors << 9;
> > > b->bi_iter.bi_sector = r1_bio->sector +
> > > conf->mirrors[i].rdev->data_offset;
> > > b->bi_bdev = conf->mirrors[i].rdev->bdev;
> > > @@ -2107,15 +2103,20 @@ static void process_checks(struct r1bio *r1_bio)
> > > rp->raid_bio = r1_bio;
> > > b->bi_private = rp;
> > > - size = b->bi_iter.bi_size;
> > > - bio_for_each_segment_all(bi, b, j) {
> > > - bi->bv_offset = 0;
> > > - if (size > PAGE_SIZE)
> > > - bi->bv_len = PAGE_SIZE;
> > > - else
> > > - bi->bv_len = size;
> > > - size -= PAGE_SIZE;
> > > - }
> > > + /* initialize bvec table again */
> > > + rp->idx = 0;
> > > + size = r1_bio->sectors << 9;
> > > + do {
> > > + struct page *page = resync_fetch_page(rp, rp->idx++);
> > > + int len = min_t(int, size, PAGE_SIZE);
> > > +
> > > + /*
> > > + * won't fail because the vec table is big
> > > + * enough to hold all these pages
> > > + */
> > > + bio_add_page(b, page, len, 0);
> > > + size -= len;
> > > + } while (rp->idx < RESYNC_PAGES && size > 0);
> > > }
> >
> > Seems above section is similar as reset_bvec_table introduced in next patch,
> > why there is difference between raid1 and raid10? Maybe add reset_bvec_table
> > into md.c, then call it in raid1 or raid10 is better, just my 2 cents.
>
> Hi Guoqing,
>
> I think it is a good idea, and even the two patches can be put into
> one, so how about the following patch?
>
> Shaohua, what do you think of this one?

Generally looks good.

> ---
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3d957ac1e109..7ffc622dd7fa 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9126,6 +9126,24 @@ void md_reload_sb(struct mddev *mddev, int nr)
> }
> EXPORT_SYMBOL(md_reload_sb);
>
> +/* generally called after bio_reset() for resetting bvec */
> +void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size)
> +{
> + /* initialize bvec table again */
> + rp->idx = 0;
> + do {
> + struct page *page = resync_fetch_page(rp, rp->idx++);
> + int len = min_t(int, size, PAGE_SIZE);
> +
> + /*
> + * won't fail because the vec table is big
> + * enough to hold all these pages
> + */
> + bio_add_page(bio, page, len, 0);
> + size -= len;
> + } while (rp->idx < RESYNC_PAGES && size > 0);
> +}
> +

This is used in raid1/10, so the symbol must be exported. Also, please don't
pollute the global namespace; maybe call it md_bio_reset_resync_pages?

> #ifndef MODULE
>
> /*
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 991f0fe2dcc6..f569831b1de9 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -783,4 +783,6 @@ static inline struct page *resync_fetch_page(struct resync_pages *rp,
> return NULL;
> return rp->pages[idx];
> }
> +
> +void reset_bvec_table(struct bio *bio, struct resync_pages *rp, int size);
> #endif /* _MD_MD_H */
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 3febfc8391fb..6ae2665ecd58 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2086,10 +2086,7 @@ static void process_checks(struct r1bio *r1_bio)
> /* Fix variable parts of all bios */
> vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9);
> for (i = 0; i < conf->raid_disks * 2; i++) {
> - int j;
> - int size;
> blk_status_t status;
> - struct bio_vec *bi;
> struct bio *b = r1_bio->bios[i];
> struct resync_pages *rp = get_resync_pages(b);
> if (b->bi_end_io != end_sync_read)
> @@ -2098,8 +2095,6 @@ static void process_checks(struct r1bio *r1_bio)
> status = b->bi_status;
> bio_reset(b);
> b->bi_status = status;
> - b->bi_vcnt = vcnt;
> - b->bi_iter.bi_size = r1_bio->sectors << 9;
> b->bi_iter.bi_sector = r1_bio->sector +
> conf->mirrors[i].rdev->data_offset;
> b->bi_bdev = conf->mirrors[i].rdev->bdev;
> @@ -2107,15 +2102,8 @@ static void process_checks(struct r1bio *r1_bio)
> rp->raid_bio = r1_bio;
> b->bi_private = rp;
>
> - size = b->bi_iter.bi_size;
> - bio_for_each_segment_all(bi, b, j) {
> - bi->bv_offset = 0;
> - if (size > PAGE_SIZE)
> - bi->bv_len = PAGE_SIZE;
> - else
> - bi->bv_len = size;
> - size -= PAGE_SIZE;
> - }
> + /* initialize bvec table again */
> + reset_bvec_table(b, rp, r1_bio->sectors << 9);
> }
> for (primary = 0; primary < conf->raid_disks * 2; primary++)
> if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 5026e7ad51d3..72f4fa07449b 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -2087,8 +2087,8 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
> rp = get_resync_pages(tbio);
> bio_reset(tbio);
>
> - tbio->bi_vcnt = vcnt;
> - tbio->bi_iter.bi_size = fbio->bi_iter.bi_size;
> + reset_bvec_table(tbio, rp, fbio->bi_iter.bi_size);
> +
> rp->raid_bio = r10_bio;
> tbio->bi_private = rp;
> tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;
>
> Thanks,
> Ming