2013-08-07 21:54:57

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 0/22] Immutable biovecs, block layer changes

Jens - here's the immutable biovec patch series. I'd really like to get
this stuff into 3.12 if at all possible, and I think this series ought
to be ready - and, the dio rewrite and various other fun stuff is gated
on this stuff.

Not much in these patches has changed over the past six months or so,
besides rebasing. The first two patches are new, but pretty trivial.

Testing wise - I've exercised the core changes pretty thoroughly with
bcache, but the driver changes are pretty hard for me to test - even if
I managed to get test environments rigged up for everything, I wouldn't
trust myself to get useful code coverage - so this is going to need help
from the various driver maintainers testing wise. Fortunately, most of
the driver changes are pretty mechanical.

I _did_ test the md changes as well as I could think of, to hopefully
avoid a similar fiasco as the one in 3.10 with the bio_copy_data().
Uhh, apologies for sticking you with that Neil, hopefully this series
goes smoother :)

Jens - I'm sure you'll want to performance test this stuff before it
goes in, and it would be warrented; this does inflate struct bio.

_However_, I have more patches (that depend on this patch series) to get
that back - segment merging improvements that get rid of
bi_seg_front_size, bi_seg_back_size, and bi_phys_segments. Once all that
is in it should be a net reduction to the size of struct bio.

Patch series is available in git -
git://evilpiepirate.org/~kent/linux-bcache.git for-jens


2013-08-07 21:55:03

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 02/22] block: Consolidate duplicated bio_trim() implementations

Someone cut and pasted md's md_trim_bio() into xen-blkfront.c. Come on,
we should know better than this.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
---
drivers/block/xen-blkfront.c | 53 +-------------------------------------------
drivers/md/md.c | 40 ---------------------------------
drivers/md/md.h | 1 -
drivers/md/raid1.c | 10 ++++-----
drivers/md/raid10.c | 18 +++++++--------
fs/bio.c | 46 ++++++++++++++++++++++++++++++++++++++
include/linux/bio.h | 1 +
7 files changed, 61 insertions(+), 108 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index a4660bb..8d53ed2 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1336,57 +1336,6 @@ static int blkfront_probe(struct xenbus_device *dev,
return 0;
}

-/*
- * This is a clone of md_trim_bio, used to split a bio into smaller ones
- */
-static void trim_bio(struct bio *bio, int offset, int size)
-{
- /* 'bio' is a cloned bio which we need to trim to match
- * the given offset and size.
- * This requires adjusting bi_sector, bi_size, and bi_io_vec
- */
- int i;
- struct bio_vec *bvec;
- int sofar = 0;
-
- size <<= 9;
- if (offset == 0 && size == bio->bi_size)
- return;
-
- bio->bi_sector += offset;
- bio->bi_size = size;
- offset <<= 9;
- clear_bit(BIO_SEG_VALID, &bio->bi_flags);
-
- while (bio->bi_idx < bio->bi_vcnt &&
- bio->bi_io_vec[bio->bi_idx].bv_len <= offset) {
- /* remove this whole bio_vec */
- offset -= bio->bi_io_vec[bio->bi_idx].bv_len;
- bio->bi_idx++;
- }
- if (bio->bi_idx < bio->bi_vcnt) {
- bio->bi_io_vec[bio->bi_idx].bv_offset += offset;
- bio->bi_io_vec[bio->bi_idx].bv_len -= offset;
- }
- /* avoid any complications with bi_idx being non-zero*/
- if (bio->bi_idx) {
- memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_idx,
- (bio->bi_vcnt - bio->bi_idx) * sizeof(struct bio_vec));
- bio->bi_vcnt -= bio->bi_idx;
- bio->bi_idx = 0;
- }
- /* Make sure vcnt and last bv are not too big */
- bio_for_each_segment(bvec, bio, i) {
- if (sofar + bvec->bv_len > size)
- bvec->bv_len = size - sofar;
- if (bvec->bv_len == 0) {
- bio->bi_vcnt = i;
- break;
- }
- sofar += bvec->bv_len;
- }
-}
-
static void split_bio_end(struct bio *bio, int error)
{
struct split_bio *split_bio = bio->bi_private;
@@ -1522,7 +1471,7 @@ static int blkif_recover(struct blkfront_info *info)
(unsigned int)(bio->bi_size >> 9) - offset);
cloned_bio = bio_clone(bio, GFP_NOIO);
BUG_ON(cloned_bio == NULL);
- trim_bio(cloned_bio, offset, size);
+ bio_trim(cloned_bio, offset, size);
cloned_bio->bi_private = split_bio;
cloned_bio->bi_end_io = split_bio_end;
submit_bio(cloned_bio->bi_rw, cloned_bio);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9f13e13..60e10f1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -183,46 +183,6 @@ struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
}
EXPORT_SYMBOL_GPL(bio_clone_mddev);

-void md_trim_bio(struct bio *bio, int offset, int size)
-{
- /* 'bio' is a cloned bio which we need to trim to match
- * the given offset and size.
- * This requires adjusting bi_sector, bi_size, and bi_io_vec
- */
- int i;
- struct bio_vec *bvec;
- int sofar = 0;
-
- size <<= 9;
- if (offset == 0 && size == bio->bi_size)
- return;
-
- clear_bit(BIO_SEG_VALID, &bio->bi_flags);
-
- bio_advance(bio, offset << 9);
-
- bio->bi_size = size;
-
- /* avoid any complications with bi_idx being non-zero*/
- if (bio->bi_idx) {
- memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_idx,
- (bio->bi_vcnt - bio->bi_idx) * sizeof(struct bio_vec));
- bio->bi_vcnt -= bio->bi_idx;
- bio->bi_idx = 0;
- }
- /* Make sure vcnt and last bv are not too big */
- bio_for_each_segment(bvec, bio, i) {
- if (sofar + bvec->bv_len > size)
- bvec->bv_len = size - sofar;
- if (bvec->bv_len == 0) {
- bio->bi_vcnt = i;
- break;
- }
- sofar += bvec->bv_len;
- }
-}
-EXPORT_SYMBOL_GPL(md_trim_bio);
-
/*
* We have a system wide 'event count' that is incremented
* on any 'interesting' event, and readers of /proc/mdstat
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 20f02c0..01401bc 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -613,7 +613,6 @@ extern struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
struct mddev *mddev);
extern struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
struct mddev *mddev);
-extern void md_trim_bio(struct bio *bio, int offset, int size);

extern void md_unplug(struct blk_plug_cb *cb, bool from_schedule);
static inline int mddev_check_plugged(struct mddev *mddev)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d60412c..fc817ff 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1097,8 +1097,8 @@ read_again:
r1_bio->read_disk = rdisk;

read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(read_bio, r1_bio->sector - bio->bi_sector,
- max_sectors);
+ bio_trim(read_bio, r1_bio->sector - bio->bi_sector,
+ max_sectors);

r1_bio->bios[rdisk] = read_bio;

@@ -1266,7 +1266,7 @@ read_again:
continue;

mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(mbio, r1_bio->sector - bio->bi_sector, max_sectors);
+ bio_trim(mbio, r1_bio->sector - bio->bi_sector, max_sectors);

if (first_clone) {
/* do behind I/O ?
@@ -2125,7 +2125,7 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
wbio->bi_sector = r1_bio->sector;
wbio->bi_size = r1_bio->sectors << 9;

- md_trim_bio(wbio, sector - r1_bio->sector, sectors);
+ bio_trim(wbio, sector - r1_bio->sector, sectors);
wbio->bi_sector += rdev->data_offset;
wbio->bi_bdev = rdev->bdev;
if (submit_bio_wait(WRITE, wbio) == 0)
@@ -2240,7 +2240,7 @@ read_more:
}
r1_bio->read_disk = disk;
bio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, mddev);
- md_trim_bio(bio, r1_bio->sector - bio->bi_sector, max_sectors);
+ bio_trim(bio, r1_bio->sector - bio->bi_sector, max_sectors);
r1_bio->bios[r1_bio->read_disk] = bio;
rdev = conf->mirrors[disk].rdev;
printk_ratelimited(KERN_ERR
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index df7b0a0..a582235 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1302,8 +1302,8 @@ read_again:
slot = r10_bio->read_slot;

read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(read_bio, r10_bio->sector - bio->bi_sector,
- max_sectors);
+ bio_trim(read_bio, r10_bio->sector - bio->bi_sector,
+ max_sectors);

r10_bio->devs[slot].bio = read_bio;
r10_bio->devs[slot].rdev = rdev;
@@ -1510,8 +1510,8 @@ retry_write:
if (r10_bio->devs[i].bio) {
struct md_rdev *rdev = conf->mirrors[d].rdev;
mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
- max_sectors);
+ bio_trim(mbio, r10_bio->sector - bio->bi_sector,
+ max_sectors);
r10_bio->devs[i].bio = mbio;

mbio->bi_sector = (r10_bio->devs[i].addr+
@@ -1553,8 +1553,8 @@ retry_write:
rdev = conf->mirrors[d].rdev;
}
mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
- max_sectors);
+ bio_trim(mbio, r10_bio->sector - bio->bi_sector,
+ max_sectors);
r10_bio->devs[i].repl_bio = mbio;

mbio->bi_sector = (r10_bio->devs[i].addr +
@@ -2613,7 +2613,7 @@ static int narrow_write_error(struct r10bio *r10_bio, int i)
sectors = sect_to_write;
/* Write at 'sector' for 'sectors' */
wbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(wbio, sector - bio->bi_sector, sectors);
+ bio_trim(wbio, sector - bio->bi_sector, sectors);
wbio->bi_sector = (r10_bio->devs[i].addr+
choose_data_offset(r10_bio, rdev) +
(sector - r10_bio->sector));
@@ -2686,9 +2686,7 @@ read_more:
(unsigned long long)r10_bio->sector);
bio = bio_clone_mddev(r10_bio->master_bio,
GFP_NOIO, mddev);
- md_trim_bio(bio,
- r10_bio->sector - bio->bi_sector,
- max_sectors);
+ bio_trim(bio, r10_bio->sector - bio->bi_sector, max_sectors);
r10_bio->devs[slot].bio = bio;
r10_bio->devs[slot].rdev = rdev;
bio->bi_sector = r10_bio->devs[slot].addr
diff --git a/fs/bio.c b/fs/bio.c
index 94bbc04..2b6a816 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1795,6 +1795,52 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
EXPORT_SYMBOL(bio_split);

/**
+ * bio_trim - trim a bio
+ * @bio: bio to trim
+ * @offset: number of sectors to trim from the front of @bio
+ * @size: size we want to trim @bio to, in sectors
+ */
+void bio_trim(struct bio *bio, int offset, int size)
+{
+ /* 'bio' is a cloned bio which we need to trim to match
+ * the given offset and size.
+ * This requires adjusting bi_sector, bi_size, and bi_io_vec
+ */
+ int i;
+ struct bio_vec *bvec;
+ int sofar = 0;
+
+ size <<= 9;
+ if (offset == 0 && size == bio->bi_size)
+ return;
+
+ clear_bit(BIO_SEG_VALID, &bio->bi_flags);
+
+ bio_advance(bio, offset << 9);
+
+ bio->bi_size = size;
+
+ /* avoid any complications with bi_idx being non-zero*/
+ if (bio->bi_idx) {
+ memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_idx,
+ (bio->bi_vcnt - bio->bi_idx) * sizeof(struct bio_vec));
+ bio->bi_vcnt -= bio->bi_idx;
+ bio->bi_idx = 0;
+ }
+ /* Make sure vcnt and last bv are not too big */
+ bio_for_each_segment(bvec, bio, i) {
+ if (sofar + bvec->bv_len > size)
+ bvec->bv_len = size - sofar;
+ if (bvec->bv_len == 0) {
+ bio->bi_vcnt = i;
+ break;
+ }
+ sofar += bvec->bv_len;
+ }
+}
+EXPORT_SYMBOL_GPL(bio_trim);
+
+/**
* bio_sector_offset - Find hardware sector offset in bio
* @bio: bio to inspect
* @index: bio_vec index
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ec48bac..162036a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -218,6 +218,7 @@ struct bio_pair {
};
extern struct bio_pair *bio_split(struct bio *bi, int first_sectors);
extern void bio_pair_release(struct bio_pair *dbio);
+extern void bio_trim(struct bio *bio, int offset, int size);

extern struct bio_set *bioset_create(unsigned int, unsigned int);
extern void bioset_free(struct bio_set *);
--
1.8.4.rc1

2013-08-07 21:55:22

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 05/22] dm: Use bvec_iter for dm_bio_record()

This patch doesn't itself have any functional changes, but immutable
biovecs are going to add a bi_bvec_done member to bi_iter, which will
need to be saved too here.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
---
drivers/md/dm-bio-record.h | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm-bio-record.h b/drivers/md/dm-bio-record.h
index 5ace48e..4f46e8e 100644
--- a/drivers/md/dm-bio-record.h
+++ b/drivers/md/dm-bio-record.h
@@ -28,11 +28,9 @@ struct dm_bio_vec_details {
};

struct dm_bio_details {
- sector_t bi_sector;
struct block_device *bi_bdev;
- unsigned int bi_size;
- unsigned short bi_idx;
unsigned long bi_flags;
+ struct bvec_iter bi_iter;
struct dm_bio_vec_details bi_io_vec[BIO_MAX_PAGES];
};

@@ -40,11 +38,9 @@ static inline void dm_bio_record(struct dm_bio_details *bd, struct bio *bio)
{
unsigned i;

- bd->bi_sector = bio->bi_iter.bi_sector;
bd->bi_bdev = bio->bi_bdev;
- bd->bi_size = bio->bi_iter.bi_size;
- bd->bi_idx = bio->bi_iter.bi_idx;
bd->bi_flags = bio->bi_flags;
+ bd->bi_iter = bio->bi_iter;

for (i = 0; i < bio->bi_vcnt; i++) {
bd->bi_io_vec[i].bv_len = bio->bi_io_vec[i].bv_len;
@@ -56,11 +52,9 @@ static inline void dm_bio_restore(struct dm_bio_details *bd, struct bio *bio)
{
unsigned i;

- bio->bi_iter.bi_sector = bd->bi_sector;
bio->bi_bdev = bd->bi_bdev;
- bio->bi_iter.bi_size = bd->bi_size;
- bio->bi_iter.bi_idx = bd->bi_idx;
bio->bi_flags = bd->bi_flags;
+ bio->bi_iter = bd->bi_iter;

for (i = 0; i < bio->bi_vcnt; i++) {
bio->bi_io_vec[i].bv_len = bd->bi_io_vec[i].bv_len;
--
1.8.4.rc1

2013-08-07 21:55:36

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 14/22] block: Kill bio_iovec_idx(), __bio_iovec()

bio_iovec_idx() and __bio_iovec() don't have any valid uses anymore -
previous users have been converted to bio_iovec_iter() or other methods.

__BVEC_END() has to go too - the bvec array can't be used directly for
the last biovec because we might only be using the first portion of it,
we have to iterate over the bvec array with bio_for_each_segment() which
checks against the current value of bi_iter.bi_size.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
block/blk-merge.c | 13 +++++++++++--
include/linux/bio.h | 26 ++++++++------------------
2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8940562..b53ddac 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -88,6 +88,9 @@ EXPORT_SYMBOL(blk_recount_segments);
static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
struct bio *nxt)
{
+ struct bio_vec end_bv, nxt_bv;
+ struct bvec_iter iter;
+
if (!blk_queue_cluster(q))
return 0;

@@ -98,14 +101,20 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
if (!bio_has_data(bio))
return 1;

- if (!BIOVEC_PHYS_MERGEABLE(__BVEC_END(bio), __BVEC_START(nxt)))
+ bio_for_each_segment(end_bv, bio, iter)
+ if (end_bv.bv_len == iter.bi_size)
+ break;
+
+ nxt_bv = bio_iovec(nxt);
+
+ if (!BIOVEC_PHYS_MERGEABLE(&end_bv, &nxt_bv))
return 0;

/*
* bio and nxt are contiguous in memory; check if the queue allows
* these two to be merged into one
*/
- if (BIO_SEG_BOUNDARY(q, bio, nxt))
+ if (BIOVEC_SEG_BOUNDARY(q, &end_bv, &nxt_bv))
return 1;

return 0;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 9736683..486a997 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -61,9 +61,6 @@
* various member access, note that bio_data should of course not be used
* on highmem page vectors
*/
-#define bio_iovec_idx(bio, idx) (&((bio)->bi_io_vec[(idx)]))
-#define __bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
-
#define __bvec_iter_bvec(bvec, iter) (&(bvec)[(iter).bi_idx])

#define bvec_iter_page(bvec, iter) \
@@ -143,19 +140,16 @@ static inline void *bio_data(struct bio *bio)
* permanent PIO fall back, user is probably better off disabling highmem
* I/O completely on that queue (see ide-dma for example)
*/
-#define __bio_kmap_atomic(bio, idx) \
- (kmap_atomic(bio_iovec_idx((bio), (idx))->bv_page) + \
- bio_iovec_idx((bio), (idx))->bv_offset)
+#define __bio_kmap_atomic(bio, iter) \
+ (kmap_atomic(bio_iter_iovec((bio), (iter)).bv_page) + \
+ bio_iter_iovec((bio), (iter)).bv_offset)

-#define __bio_kunmap_atomic(addr) kunmap_atomic(addr)
+#define __bio_kunmap_atomic(addr) kunmap_atomic(addr)

/*
* merge helpers etc
*/

-#define __BVEC_END(bio) bio_iovec_idx((bio), (bio)->bi_vcnt - 1)
-#define __BVEC_START(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
-
/* Default implementation of BIOVEC_PHYS_MERGEABLE */
#define __BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2)))
@@ -172,8 +166,6 @@ static inline void *bio_data(struct bio *bio)
(((addr1) | (mask)) == (((addr2) - 1) | (mask)))
#define BIOVEC_SEG_BOUNDARY(q, b1, b2) \
__BIO_SEG_BOUNDARY(bvec_to_phys((b1)), bvec_to_phys((b2)) + (b2)->bv_len, queue_segment_boundary((q)))
-#define BIO_SEG_BOUNDARY(q, b1, b2) \
- BIOVEC_SEG_BOUNDARY((q), __BVEC_END((b1)), __BVEC_START((b2)))

#define bio_io_error(bio) bio_endio((bio), -EIO)

@@ -182,9 +174,7 @@ static inline void *bio_data(struct bio *bio)
* before it got to the driver and the driver won't own all of it
*/
#define bio_for_each_segment_all(bvl, bio, i) \
- for (i = 0; \
- bvl = bio_iovec_idx((bio), (i)), i < (bio)->bi_vcnt; \
- i++)
+ for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)

static inline void bvec_iter_advance(struct bio_vec *bv, struct bvec_iter *iter,
unsigned bytes)
@@ -449,15 +439,15 @@ static inline void bvec_kunmap_irq(char *buffer, unsigned long *flags)
}
#endif

-static inline char *__bio_kmap_irq(struct bio *bio, unsigned short idx,
+static inline char *__bio_kmap_irq(struct bio *bio, struct bvec_iter iter,
unsigned long *flags)
{
- return bvec_kmap_irq(bio_iovec_idx(bio, idx), flags);
+ return bvec_kmap_irq(&bio_iter_iovec(bio, iter), flags);
}
#define __bio_kunmap_irq(buf, flags) bvec_kunmap_irq(buf, flags)

#define bio_kmap_irq(bio, flags) \
- __bio_kmap_irq((bio), (bio)->bi_iter.bi_idx, (flags))
+ __bio_kmap_irq((bio), (bio)->bi_iter, (flags))
#define bio_kunmap_irq(buf,flags) __bio_kunmap_irq(buf, flags)

static inline bool bio_is_rw(struct bio *bio)
--
1.8.4.rc1

2013-08-07 21:55:39

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 15/22] rbd: Refactor bio cloning, don't clone biovecs

Now that we've got drivers converted to the new immutable bvec
primitives, bio splitting becomes much easier. In a few patches,
bio_clone() will be changed to share the old bio's bvec instead of
copying it, and bio_split() will do exactly what's being done here.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Yehuda Sadeh <[email protected]>
Cc: Alex Elder <[email protected]>
Cc: [email protected]
---
drivers/block/rbd.c | 64 ++---------------------------------------------------
1 file changed, 2 insertions(+), 62 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index f99f347..efb082c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1167,73 +1167,13 @@ static struct bio *bio_clone_range(struct bio *bio_src,
unsigned int len,
gfp_t gfpmask)
{
- struct bio_vec bv;
- struct bvec_iter iter;
- struct bvec_iter end_iter;
- unsigned int resid;
- unsigned int voff;
- unsigned short vcnt;
struct bio *bio;

- /* Handle the easy case for the caller */
-
- if (!offset && len == bio_src->bi_iter.bi_size)
- return bio_clone(bio_src, gfpmask);
-
- if (WARN_ON_ONCE(!len))
- return NULL;
- if (WARN_ON_ONCE(len > bio_src->bi_iter.bi_size))
- return NULL;
- if (WARN_ON_ONCE(offset > bio_src->bi_iter.bi_size - len))
- return NULL;
-
- /* Find first affected segment... */
-
- resid = offset;
- bio_for_each_segment(bv, bio_src, iter) {
- if (resid < bv.bv_len)
- break;
- resid -= bv.bv_len;
- }
- voff = resid;
-
- /* ...and the last affected segment */
-
- resid += len;
- __bio_for_each_segment(bv, bio_src, end_iter, iter) {
- if (resid <= bv.bv_len)
- break;
- resid -= bv.bv_len;
- }
- vcnt = end_iter.bi_idx = iter.bi_idx + 1;
-
- /* Build the clone */
-
- bio = bio_alloc(gfpmask, (unsigned int) vcnt);
+ bio = bio_clone(bio_src, gfpmask);
if (!bio)
return NULL; /* ENOMEM */

- bio->bi_bdev = bio_src->bi_bdev;
- bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector +
- (offset >> SECTOR_SHIFT);
- bio->bi_rw = bio_src->bi_rw;
- bio->bi_flags |= 1 << BIO_CLONED;
-
- /*
- * Copy over our part of the bio_vec, then update the first
- * and last (or only) entries.
- */
- memcpy(&bio->bi_io_vec[0], &bio_src->bi_io_vec[iter.bi_idx],
- vcnt * sizeof (struct bio_vec));
- bio->bi_io_vec[0].bv_offset += voff;
- if (vcnt > 1) {
- bio->bi_io_vec[0].bv_len -= voff;
- bio->bi_io_vec[vcnt - 1].bv_len = resid;
- } else {
- bio->bi_io_vec[0].bv_len = len;
- }
-
- bio->bi_vcnt = vcnt;
+ bio_advance(bio, offset);
bio->bi_iter.bi_size = len;

return bio;
--
1.8.4.rc1

2013-08-07 21:56:23

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 22/22] block: Don't save/copy bvec array anymore, share when cloning

Now that drivers have been converted to the bvec_iter primitives, they
shouldn't be modifying the biovec anymore and thus saving it is
unnecessary - code that was previously making a backup of the bvec array
can now just save bio->bi_iter.

Also, when cloning bios we can usually just reuse the original bio's
bvec array. For code that does need to modify the clone's biovec (the
bounce buffer code, mainly), add bio_clone_biovec().

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
---
drivers/md/bcache/request.c | 2 -
drivers/md/bcache/request.h | 1 -
drivers/md/dm-bio-record.h | 25 -------
fs/bio-integrity.c | 12 +---
fs/bio.c | 154 +++++++++++++++++++-------------------------
include/linux/bio.h | 1 +
mm/bounce.c | 1 +
7 files changed, 71 insertions(+), 125 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 542ae36..8d0880f 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -681,8 +681,6 @@ static void do_bio_hook(struct search *s)
struct bio *bio = &s->bio.bio;

bio_init(bio);
- bio->bi_io_vec = s->bv;
- bio->bi_max_vecs = BIO_MAX_PAGES;
__bio_clone(bio, s->orig_bio);
bio->bi_end_io = request_endio;
bio->bi_private = &s->cl;
diff --git a/drivers/md/bcache/request.h b/drivers/md/bcache/request.h
index bee95a9..6bcdf33 100644
--- a/drivers/md/bcache/request.h
+++ b/drivers/md/bcache/request.h
@@ -26,7 +26,6 @@ struct search {

/* Anything past op->keys won't get zeroed in do_bio_hook */
struct btree_op op;
- struct bio_vec bv[BIO_MAX_PAGES];
};

void bch_cache_read_endio(struct bio *, int);
diff --git a/drivers/md/dm-bio-record.h b/drivers/md/dm-bio-record.h
index 4f46e8e..dd36461 100644
--- a/drivers/md/dm-bio-record.h
+++ b/drivers/md/dm-bio-record.h
@@ -17,49 +17,24 @@
* original bio state.
*/

-struct dm_bio_vec_details {
-#if PAGE_SIZE < 65536
- __u16 bv_len;
- __u16 bv_offset;
-#else
- unsigned bv_len;
- unsigned bv_offset;
-#endif
-};
-
struct dm_bio_details {
struct block_device *bi_bdev;
unsigned long bi_flags;
struct bvec_iter bi_iter;
- struct dm_bio_vec_details bi_io_vec[BIO_MAX_PAGES];
};

static inline void dm_bio_record(struct dm_bio_details *bd, struct bio *bio)
{
- unsigned i;
-
bd->bi_bdev = bio->bi_bdev;
bd->bi_flags = bio->bi_flags;
bd->bi_iter = bio->bi_iter;
-
- for (i = 0; i < bio->bi_vcnt; i++) {
- bd->bi_io_vec[i].bv_len = bio->bi_io_vec[i].bv_len;
- bd->bi_io_vec[i].bv_offset = bio->bi_io_vec[i].bv_offset;
- }
}

static inline void dm_bio_restore(struct dm_bio_details *bd, struct bio *bio)
{
- unsigned i;
-
bio->bi_bdev = bd->bi_bdev;
bio->bi_flags = bd->bi_flags;
bio->bi_iter = bd->bi_iter;
-
- for (i = 0; i < bio->bi_vcnt; i++) {
- bio->bi_io_vec[i].bv_len = bd->bi_io_vec[i].bv_len;
- bio->bi_io_vec[i].bv_offset = bd->bi_io_vec[i].bv_offset;
- }
}

#endif
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 72fa942..0c466e6 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -594,17 +594,11 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
struct bio_integrity_payload *bip_src = bio_src->bi_integrity;
struct bio_integrity_payload *bip;

- BUG_ON(bip_src == NULL);
-
- bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_vcnt);
-
+ bip = bio_integrity_alloc(bio, gfp_mask, 0);
if (bip == NULL)
- return -EIO;
-
- memcpy(bip->bip_vec, bip_src->bip_vec,
- bip_src->bip_vcnt * sizeof(struct bio_vec));
+ return -ENOMEM;

- bip->bip_vcnt = bip_src->bip_vcnt;
+ bip->bip_vec = bip_src->bip_vec;
bip->bip_iter = bip_src->bip_iter;

return 0;
diff --git a/fs/bio.c b/fs/bio.c
index 7cc64c4..498ca24 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -544,8 +544,7 @@ EXPORT_SYMBOL(bio_phys_segments);
*/
void __bio_clone(struct bio *bio, struct bio *bio_src)
{
- memcpy(bio->bi_io_vec, bio_src->bi_io_vec,
- bio_src->bi_max_vecs * sizeof(struct bio_vec));
+ BUG_ON(bio->bi_pool && BIO_POOL_IDX(bio) != BIO_POOL_NONE);

/*
* most users will be overriding ->bi_bdev with a new target,
@@ -554,8 +553,8 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
bio->bi_bdev = bio_src->bi_bdev;
bio->bi_flags |= 1 << BIO_CLONED;
bio->bi_rw = bio_src->bi_rw;
- bio->bi_vcnt = bio_src->bi_vcnt;
bio->bi_iter = bio_src->bi_iter;
+ bio->bi_io_vec = bio_src->bi_io_vec;
}
EXPORT_SYMBOL(__bio_clone);

@@ -572,7 +571,7 @@ struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
{
struct bio *b;

- b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
+ b = bio_alloc_bioset(gfp_mask, 0, bs);
if (!b)
return NULL;

@@ -594,6 +593,50 @@ struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
EXPORT_SYMBOL(bio_clone_bioset);

/**
+ * bio_clone_biovec: Given a cloned bio, give the clone its own copy of the
+ * biovec
+ * @bio: cloned bio
+ *
+ * @bio must have been allocated from a bioset - i.e. returned from
+ * bio_clone_bioset()
+ */
+int bio_clone_biovec(struct bio *bio, gfp_t gfp_mask)
+{
+ unsigned long idx = BIO_POOL_NONE;
+ unsigned nr_iovecs = 0;
+ struct bio_vec bv, *bvl = NULL;
+ struct bvec_iter iter;
+
+ BUG_ON(!bio->bi_pool);
+ BUG_ON(BIO_POOL_IDX(bio) != BIO_POOL_NONE);
+
+ bio_for_each_segment(bv, bio, iter)
+ nr_iovecs++;
+
+ if (nr_iovecs > BIO_INLINE_VECS) {
+ bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx,
+ bio->bi_pool->bvec_pool);
+ if (!bvl)
+ return -ENOMEM;
+ } else if (nr_iovecs) {
+ bvl = bio->bi_inline_vecs;
+ }
+
+ bio_for_each_segment(bv, bio, iter)
+ bvl[bio->bi_vcnt++] = bv;
+
+ bio->bi_io_vec = bvl;
+ bio->bi_iter.bi_idx = 0;
+ bio->bi_iter.bi_bvec_done = 0;
+
+ bio->bi_flags &= BIO_POOL_MASK - 1;
+ bio->bi_flags |= idx << BIO_POOL_OFFSET;
+
+ return 0;
+}
+EXPORT_SYMBOL(bio_clone_biovec);
+
+/**
* bio_get_nr_vecs - return approx number of vecs
* @bdev: I/O target
*
@@ -918,60 +961,33 @@ void bio_copy_data(struct bio *dst, struct bio *src)
EXPORT_SYMBOL(bio_copy_data);

struct bio_map_data {
- struct bio_vec *iovecs;
- struct sg_iovec *sgvecs;
int nr_sgvecs;
int is_our_pages;
+ struct sg_iovec sgvecs[];
};

static void bio_set_map_data(struct bio_map_data *bmd, struct bio *bio,
struct sg_iovec *iov, int iov_count,
int is_our_pages)
{
- memcpy(bmd->iovecs, bio->bi_io_vec, sizeof(struct bio_vec) * bio->bi_vcnt);
memcpy(bmd->sgvecs, iov, sizeof(struct sg_iovec) * iov_count);
bmd->nr_sgvecs = iov_count;
bmd->is_our_pages = is_our_pages;
bio->bi_private = bmd;
}

-static void bio_free_map_data(struct bio_map_data *bmd)
-{
- kfree(bmd->iovecs);
- kfree(bmd->sgvecs);
- kfree(bmd);
-}
-
static struct bio_map_data *bio_alloc_map_data(int nr_segs,
unsigned int iov_count,
gfp_t gfp_mask)
{
- struct bio_map_data *bmd;
-
if (iov_count > UIO_MAXIOV)
return NULL;

- bmd = kmalloc(sizeof(*bmd), gfp_mask);
- if (!bmd)
- return NULL;
-
- bmd->iovecs = kmalloc(sizeof(struct bio_vec) * nr_segs, gfp_mask);
- if (!bmd->iovecs) {
- kfree(bmd);
- return NULL;
- }
-
- bmd->sgvecs = kmalloc(sizeof(struct sg_iovec) * iov_count, gfp_mask);
- if (bmd->sgvecs)
- return bmd;
-
- kfree(bmd->iovecs);
- kfree(bmd);
- return NULL;
+ return kmalloc(sizeof(struct bio_map_data) +
+ sizeof(struct sg_iovec) * iov_count, gfp_mask);
}

-static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
- struct sg_iovec *iov, int iov_count,
+static int __bio_copy_iov(struct bio *bio, struct sg_iovec *iov, int iov_count,
int to_user, int from_user, int do_free_page)
{
int ret = 0, i;
@@ -981,7 +997,7 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,

bio_for_each_segment_all(bvec, bio, i) {
char *bv_addr = page_address(bvec->bv_page);
- unsigned int bv_len = iovecs[i].bv_len;
+ unsigned int bv_len = bvec->bv_len;

while (bv_len && iov_idx < iov_count) {
unsigned int bytes;
@@ -1035,10 +1051,10 @@ int bio_uncopy_user(struct bio *bio)
int ret = 0;

if (!bio_flagged(bio, BIO_NULL_MAPPED))
- ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
- bmd->nr_sgvecs, bio_data_dir(bio) == READ,
+ ret = __bio_copy_iov(bio, bmd->sgvecs, bmd->nr_sgvecs,
+ bio_data_dir(bio) == READ,
0, bmd->is_our_pages);
- bio_free_map_data(bmd);
+ kfree(bmd);
bio_put(bio);
return ret;
}
@@ -1152,7 +1168,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
*/
if ((!write_to_vm && (!map_data || !map_data->null_mapped)) ||
(map_data && map_data->from_user)) {
- ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, iov_count, 0, 1, 0);
+ ret = __bio_copy_iov(bio, iov, iov_count, 0, 1, 0);
if (ret)
goto cleanup;
}
@@ -1166,7 +1182,7 @@ cleanup:

bio_put(bio);
out_bmd:
- bio_free_map_data(bmd);
+ kfree(bmd);
return ERR_PTR(ret);
}

@@ -1483,16 +1499,15 @@ static void bio_copy_kern_endio(struct bio *bio, int err)

bio_for_each_segment_all(bvec, bio, i) {
char *addr = page_address(bvec->bv_page);
- int len = bmd->iovecs[i].bv_len;

if (read)
- memcpy(p, addr, len);
+ memcpy(p, addr, bvec->bv_len);

__free_page(bvec->bv_page);
- p += len;
+ p += bvec->bv_len;
}

- bio_free_map_data(bmd);
+ kfree(bmd);
bio_put(bio);
}

@@ -1720,62 +1735,25 @@ EXPORT_SYMBOL(bio_endio);
* Allocates and returns a new bio which represents @sectors from the start of
* @bio, and updates @bio to represent the remaining sectors.
*
- * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
- * unchanged.
+ * The newly allocated bio will point to @bio's bi_io_vec; it is the caller's
+ * responsibility to ensure that @bio is not freed before the split.
*/
struct bio *bio_split(struct bio *bio, int sectors,
gfp_t gfp, struct bio_set *bs)
{
- unsigned vcnt = 0, nbytes = sectors << 9;
- struct bio_vec bv;
- struct bvec_iter iter;
struct bio *split = NULL;

BUG_ON(sectors <= 0);
BUG_ON(sectors >= bio_sectors(bio));

- if (bio->bi_rw & REQ_DISCARD) {
- split = bio_alloc_bioset(gfp, 1, bs);
- if (!split)
- return NULL;
- goto out;
- }
-
- bio_for_each_segment(bv, bio, iter) {
- vcnt++;
-
- if (nbytes <= bv.bv_len)
- break;
-
- nbytes -= bv.bv_len;
- }
-
- split = bio_alloc_bioset(gfp, vcnt, bs);
+ split = bio_clone_bioset(bio, gfp, bs);
if (!split)
return NULL;

- bio_for_each_segment(bv, bio, iter) {
- split->bi_io_vec[split->bi_vcnt++] = bv;
-
- if (split->bi_vcnt == vcnt)
- break;
- }
+ split->bi_iter.bi_size = sectors << 9;

- split->bi_io_vec[split->bi_vcnt - 1].bv_len = nbytes;
-out:
- split->bi_bdev = bio->bi_bdev;
- split->bi_iter.bi_sector = bio->bi_iter.bi_sector;
- split->bi_iter.bi_size = sectors << 9;
- split->bi_rw = bio->bi_rw;
-
- if (bio_integrity(bio)) {
- if (bio_integrity_clone(split, bio, gfp)) {
- bio_put(split);
- return NULL;
- }
-
- bio_integrity_trim(split, 0, bio_sectors(split));
- }
+ if (bio_integrity(split))
+ bio_integrity_trim(split, 0, sectors);

bio_advance(bio, split->bi_iter.bi_size);

diff --git a/include/linux/bio.h b/include/linux/bio.h
index c9c2bd0..d65cf85 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -310,6 +310,7 @@ extern void bio_put(struct bio *);

extern void __bio_clone(struct bio *, struct bio *);
extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);
+extern int bio_clone_biovec(struct bio *bio, gfp_t gfp_mask);

extern struct bio_set *fs_bio_set;

diff --git a/mm/bounce.c b/mm/bounce.c
index 4525e3d..985fe23 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -209,6 +209,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
return;
bounce:
bio = bio_clone_bioset(*bio_orig, GFP_NOIO, fs_bio_set);
+ bio_clone_biovec(bio, GFP_NOIO);

bio_for_each_segment_all(to, bio, i) {
struct page *page = to->bv_page;
--
1.8.4.rc1

2013-08-07 21:55:33

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 13/22] ceph: Convert to immutable biovecs

Now that we've got a mechanism for immutable biovecs -
bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
respect it instead of using the bvec array directly.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Sage Weil <[email protected]>
Cc: [email protected]
---
include/linux/ceph/messenger.h | 4 ++--
net/ceph/messenger.c | 43 +++++++++++++++++-------------------------
2 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index 7c1420b..091fdb6 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -1,6 +1,7 @@
#ifndef __FS_CEPH_MESSENGER_H
#define __FS_CEPH_MESSENGER_H

+#include <linux/blk_types.h>
#include <linux/kref.h>
#include <linux/mutex.h>
#include <linux/net.h>
@@ -119,8 +120,7 @@ struct ceph_msg_data_cursor {
#ifdef CONFIG_BLOCK
struct { /* bio */
struct bio *bio; /* bio from list */
- unsigned int vector_index; /* vector from bio */
- unsigned int vector_offset; /* bytes from vector */
+ struct bvec_iter bvec_iter;
};
#endif /* CONFIG_BLOCK */
struct { /* pages */
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index eb0a46a..86d336e 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -777,13 +777,12 @@ static void ceph_msg_data_bio_cursor_init(struct ceph_msg_data_cursor *cursor,

bio = data->bio;
BUG_ON(!bio);
- BUG_ON(!bio->bi_vcnt);

cursor->resid = min(length, data->bio_length);
cursor->bio = bio;
- cursor->vector_index = 0;
- cursor->vector_offset = 0;
- cursor->last_piece = length <= bio->bi_io_vec[0].bv_len;
+ cursor->bvec_iter = bio->bi_iter;
+ cursor->last_piece =
+ cursor->resid <= bio_iter_len(bio, cursor->bvec_iter);
}

static struct page *ceph_msg_data_bio_next(struct ceph_msg_data_cursor *cursor,
@@ -792,71 +791,63 @@ static struct page *ceph_msg_data_bio_next(struct ceph_msg_data_cursor *cursor,
{
struct ceph_msg_data *data = cursor->data;
struct bio *bio;
- struct bio_vec *bio_vec;
- unsigned int index;
+ struct bio_vec bio_vec;

BUG_ON(data->type != CEPH_MSG_DATA_BIO);

bio = cursor->bio;
BUG_ON(!bio);

- index = cursor->vector_index;
- BUG_ON(index >= (unsigned int) bio->bi_vcnt);
+ bio_vec = bio_iter_iovec(bio, cursor->bvec_iter);

- bio_vec = &bio->bi_io_vec[index];
- BUG_ON(cursor->vector_offset >= bio_vec->bv_len);
- *page_offset = (size_t) (bio_vec->bv_offset + cursor->vector_offset);
+ *page_offset = (size_t) bio_vec.bv_offset;
BUG_ON(*page_offset >= PAGE_SIZE);
if (cursor->last_piece) /* pagelist offset is always 0 */
*length = cursor->resid;
else
- *length = (size_t) (bio_vec->bv_len - cursor->vector_offset);
+ *length = (size_t) bio_vec.bv_len;
BUG_ON(*length > cursor->resid);
BUG_ON(*page_offset + *length > PAGE_SIZE);

- return bio_vec->bv_page;
+ return bio_vec.bv_page;
}

static bool ceph_msg_data_bio_advance(struct ceph_msg_data_cursor *cursor,
size_t bytes)
{
struct bio *bio;
- struct bio_vec *bio_vec;
- unsigned int index;
+ struct bio_vec bio_vec;

BUG_ON(cursor->data->type != CEPH_MSG_DATA_BIO);

bio = cursor->bio;
BUG_ON(!bio);

- index = cursor->vector_index;
- BUG_ON(index >= (unsigned int) bio->bi_vcnt);
- bio_vec = &bio->bi_io_vec[index];
+ bio_vec = bio_iter_iovec(bio, cursor->bvec_iter);

/* Advance the cursor offset */

BUG_ON(cursor->resid < bytes);
cursor->resid -= bytes;
- cursor->vector_offset += bytes;
- if (cursor->vector_offset < bio_vec->bv_len)
+
+ bio_advance_iter(bio, &cursor->bvec_iter, bytes);
+
+ if (bytes < bio_vec.bv_len)
return false; /* more bytes to process in this segment */
- BUG_ON(cursor->vector_offset != bio_vec->bv_len);

/* Move on to the next segment, and possibly the next bio */

- if (++index == (unsigned int) bio->bi_vcnt) {
+ if (!cursor->bvec_iter.bi_size) {
bio = bio->bi_next;
- index = 0;
+ cursor->bvec_iter = bio->bi_iter;
}
cursor->bio = bio;
- cursor->vector_index = index;
- cursor->vector_offset = 0;

if (!cursor->last_piece) {
BUG_ON(!cursor->resid);
BUG_ON(!bio);
/* A short read is OK, so use <= rather than == */
- if (cursor->resid <= bio->bi_io_vec[index].bv_len)
+ if (cursor->resid <= bio_iter_len(bio, cursor->bvec_iter))
cursor->last_piece = true;
}

--
1.8.4.rc1

2013-08-07 21:57:06

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 21/22] block: Kill bio_pair_split()

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
fs/bio-integrity.c | 45 ---------------------------
fs/bio.c | 90 -----------------------------------------------------
include/linux/bio.h | 30 ------------------
3 files changed, 165 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 61f41ff..72fa942 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -581,51 +581,6 @@ void bio_integrity_trim(struct bio *bio, unsigned int offset,
EXPORT_SYMBOL(bio_integrity_trim);

/**
- * bio_integrity_split - Split integrity metadata
- * @bio: Protected bio
- * @bp: Resulting bio_pair
- * @sectors: Offset
- *
- * Description: Splits an integrity page into a bio_pair.
- */
-void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
-{
- struct blk_integrity *bi;
- struct bio_integrity_payload *bip = bio->bi_integrity;
- unsigned int nr_sectors;
-
- if (bio_integrity(bio) == 0)
- return;
-
- bi = bdev_get_integrity(bio->bi_bdev);
- BUG_ON(bi == NULL);
- BUG_ON(bip->bip_vcnt != 1);
-
- nr_sectors = bio_integrity_hw_sectors(bi, sectors);
-
- bp->bio1.bi_integrity = &bp->bip1;
- bp->bio2.bi_integrity = &bp->bip2;
-
- bp->iv1 = bip->bip_vec[bip->bip_iter.bi_idx];
- bp->iv2 = bip->bip_vec[bip->bip_iter.bi_idx];
-
- bp->bip1.bip_vec = &bp->iv1;
- bp->bip2.bip_vec = &bp->iv2;
-
- bp->iv1.bv_len = sectors * bi->tuple_size;
- bp->iv2.bv_offset += sectors * bi->tuple_size;
- bp->iv2.bv_len -= sectors * bi->tuple_size;
-
- bp->bip1.bip_iter.bi_sector = bio->bi_integrity->bip_iter.bi_sector;
- bp->bip2.bip_iter.bi_sector =
- bio->bi_integrity->bip_iter.bi_sector + nr_sectors;
-
- bp->bip1.bip_vcnt = bp->bip2.bip_vcnt = 1;
- bp->bip1.bip_iter.bi_idx = bp->bip2.bip_iter.bi_idx = 0;
-}
-EXPORT_SYMBOL(bio_integrity_split);
-
-/**
* bio_integrity_clone - Callback for cloning bios with integrity metadata
* @bio: New bio
* @bio_src: Original bio
diff --git a/fs/bio.c b/fs/bio.c
index 6e4185a..7cc64c4 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -38,8 +38,6 @@
*/
#define BIO_INLINE_VECS 4

-static mempool_t *bio_split_pool __read_mostly;
-
/*
* if you change this list, also change bvec_alloc or things will
* break badly! cannot be bigger than what you can fit into an
@@ -1785,89 +1783,6 @@ out:
}
EXPORT_SYMBOL(bio_split);

-void bio_pair_release(struct bio_pair *bp)
-{
- if (atomic_dec_and_test(&bp->cnt)) {
- struct bio *master = bp->bio1.bi_private;
-
- bio_endio(master, bp->error);
- mempool_free(bp, bp->bio2.bi_private);
- }
-}
-EXPORT_SYMBOL(bio_pair_release);
-
-static void bio_pair_end_1(struct bio *bi, int err)
-{
- struct bio_pair *bp = container_of(bi, struct bio_pair, bio1);
-
- if (err)
- bp->error = err;
-
- bio_pair_release(bp);
-}
-
-static void bio_pair_end_2(struct bio *bi, int err)
-{
- struct bio_pair *bp = container_of(bi, struct bio_pair, bio2);
-
- if (err)
- bp->error = err;
-
- bio_pair_release(bp);
-}
-
-/*
- * split a bio - only worry about a bio with a single page in its iovec
- */
-struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
-{
- struct bio_pair *bp = mempool_alloc(bio_split_pool, GFP_NOIO);
-
- if (!bp)
- return bp;
-
- trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
- bi->bi_iter.bi_sector + first_sectors);
-
- BUG_ON(bio_multiple_segments(bi));
- atomic_set(&bp->cnt, 3);
- bp->error = 0;
- bp->bio1 = *bi;
- bp->bio2 = *bi;
- bp->bio2.bi_iter.bi_sector += first_sectors;
- bp->bio2.bi_iter.bi_size -= first_sectors << 9;
- bp->bio1.bi_iter.bi_size = first_sectors << 9;
-
- if (bi->bi_vcnt != 0) {
- bp->bv1 = bio_iovec(bi);
- bp->bv2 = bio_iovec(bi);
-
- if (bio_is_rw(bi)) {
- bp->bv2.bv_offset += first_sectors << 9;
- bp->bv2.bv_len -= first_sectors << 9;
- bp->bv1.bv_len = first_sectors << 9;
- }
-
- bp->bio1.bi_io_vec = &bp->bv1;
- bp->bio2.bi_io_vec = &bp->bv2;
-
- bp->bio1.bi_max_vecs = 1;
- bp->bio2.bi_max_vecs = 1;
- }
-
- bp->bio1.bi_end_io = bio_pair_end_1;
- bp->bio2.bi_end_io = bio_pair_end_2;
-
- bp->bio1.bi_private = bi;
- bp->bio2.bi_private = bio_split_pool;
-
- if (bio_integrity(bi))
- bio_integrity_split(bi, bp, first_sectors);
-
- return bp;
-}
-EXPORT_SYMBOL(bio_pair_split);
-
/**
* bio_trim - trim a bio
* @bio: bio to trim
@@ -2069,11 +1984,6 @@ static int __init init_bio(void)
if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE))
panic("bio: can't create integrity pool\n");

- bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES,
- sizeof(struct bio_pair));
- if (!bio_split_pool)
- panic("bio: can't create split pool\n");
-
return 0;
}
subsys_initcall(init_bio);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3ed3312..c9c2bd0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -278,30 +278,7 @@ struct bio_integrity_payload {
};
#endif /* CONFIG_BLK_DEV_INTEGRITY */

-/*
- * A bio_pair is used when we need to split a bio.
- * This can only happen for a bio that refers to just one
- * page of data, and in the unusual situation when the
- * page crosses a chunk/device boundary
- *
- * The address of the master bio is stored in bio1.bi_private
- * The address of the pool the pair was allocated from is stored
- * in bio2.bi_private
- */
-struct bio_pair {
- struct bio bio1, bio2;
- struct bio_vec bv1, bv2;
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
- struct bio_integrity_payload bip1, bip2;
- struct bio_vec iv1, iv2;
-#endif
- atomic_t cnt;
- int error;
-};
-extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
-extern void bio_pair_release(struct bio_pair *dbio);
extern void bio_trim(struct bio *bio, int offset, int size);
-
extern struct bio *bio_split(struct bio *bio, int sectors,
gfp_t gfp, struct bio_set *bs);

@@ -673,7 +650,6 @@ extern int bio_integrity_prep(struct bio *);
extern void bio_integrity_endio(struct bio *, int);
extern void bio_integrity_advance(struct bio *, unsigned int);
extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
-extern void bio_integrity_split(struct bio *, struct bio_pair *, int);
extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t);
extern int bioset_integrity_create(struct bio_set *, int);
extern void bioset_integrity_free(struct bio_set *);
@@ -717,12 +693,6 @@ static inline int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
return 0;
}

-static inline void bio_integrity_split(struct bio *bio, struct bio_pair *bp,
- int sectors)
-{
- return;
-}
-
static inline void bio_integrity_advance(struct bio *bio,
unsigned int bytes_done)
{
--
1.8.4.rc1

2013-08-07 21:57:37

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 20/22] block: Introduce new bio_split()

The new bio_split() can split arbitrary bios - it's not restricted to
single page bios, like the old bio_split() (previously renamed to
bio_pair_split()). It also has different semantics - it doesn't allocate
a struct bio_pair, leaving it up to the caller to handle completions.

Then convert the existing bio_pair_split() users to the new bio_split()
- and also nvme, which was open coding bio splitting.

(We have to take that BUG_ON() out of bio_integrity_trim() because this
bio_split() needs to use it, and there's no reason it has to be used on
bios marked as cloned; BIO_CLONED doesn't seem to have clearly
documented semantics anyways.)

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Neil Brown <[email protected]>
---
drivers/block/nvme-core.c | 108 +++--------------------------------
drivers/block/pktcdvd.c | 135 ++++++++++++++++++++++++--------------------
drivers/md/bcache/bcache.h | 1 -
drivers/md/bcache/btree.c | 2 +-
drivers/md/bcache/io.c | 82 +--------------------------
drivers/md/bcache/request.c | 4 +-
drivers/md/linear.c | 96 +++++++++++++++----------------
drivers/md/raid0.c | 77 +++++++++----------------
drivers/md/raid10.c | 113 +++++++++++++++---------------------
fs/bio.c | 73 ++++++++++++++++++++++++
include/linux/bio.h | 22 ++++++++
11 files changed, 306 insertions(+), 407 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 1982f346..d38b9b9 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -406,106 +406,19 @@ int nvme_setup_prps(struct nvme_dev *dev, struct nvme_common_command *cmd,
return total_len;
}

-struct nvme_bio_pair {
- struct bio b1, b2, *parent;
- struct bio_vec *bv1, *bv2;
- int err;
- atomic_t cnt;
-};
-
-static void nvme_bio_pair_endio(struct bio *bio, int err)
-{
- struct nvme_bio_pair *bp = bio->bi_private;
-
- if (err)
- bp->err = err;
-
- if (atomic_dec_and_test(&bp->cnt)) {
- bio_endio(bp->parent, bp->err);
- if (bp->bv1)
- kfree(bp->bv1);
- if (bp->bv2)
- kfree(bp->bv2);
- kfree(bp);
- }
-}
-
-static struct nvme_bio_pair *nvme_bio_split(struct bio *bio, int idx,
- int len, int offset)
-{
- struct nvme_bio_pair *bp;
-
- BUG_ON(len > bio->bi_iter.bi_size);
- BUG_ON(idx > bio->bi_vcnt);
-
- bp = kmalloc(sizeof(*bp), GFP_ATOMIC);
- if (!bp)
- return NULL;
- bp->err = 0;
-
- bp->b1 = *bio;
- bp->b2 = *bio;
-
- bp->b1.bi_iter.bi_size = len;
- bp->b2.bi_iter.bi_size -= len;
- bp->b1.bi_vcnt = idx;
- bp->b2.bi_iter.bi_idx = idx;
- bp->b2.bi_iter.bi_sector += len >> 9;
-
- if (offset) {
- bp->bv1 = kmalloc(bio->bi_max_vecs * sizeof(struct bio_vec),
- GFP_ATOMIC);
- if (!bp->bv1)
- goto split_fail_1;
-
- bp->bv2 = kmalloc(bio->bi_max_vecs * sizeof(struct bio_vec),
- GFP_ATOMIC);
- if (!bp->bv2)
- goto split_fail_2;
-
- memcpy(bp->bv1, bio->bi_io_vec,
- bio->bi_max_vecs * sizeof(struct bio_vec));
- memcpy(bp->bv2, bio->bi_io_vec,
- bio->bi_max_vecs * sizeof(struct bio_vec));
-
- bp->b1.bi_io_vec = bp->bv1;
- bp->b2.bi_io_vec = bp->bv2;
- bp->b2.bi_io_vec[idx].bv_offset += offset;
- bp->b2.bi_io_vec[idx].bv_len -= offset;
- bp->b1.bi_io_vec[idx].bv_len = offset;
- bp->b1.bi_vcnt++;
- } else
- bp->bv1 = bp->bv2 = NULL;
-
- bp->b1.bi_private = bp;
- bp->b2.bi_private = bp;
-
- bp->b1.bi_end_io = nvme_bio_pair_endio;
- bp->b2.bi_end_io = nvme_bio_pair_endio;
-
- bp->parent = bio;
- atomic_set(&bp->cnt, 2);
-
- return bp;
-
- split_fail_2:
- kfree(bp->bv1);
- split_fail_1:
- kfree(bp);
- return NULL;
-}
-
static int nvme_split_and_submit(struct bio *bio, struct nvme_queue *nvmeq,
- int idx, int len, int offset)
+ int len)
{
- struct nvme_bio_pair *bp = nvme_bio_split(bio, idx, len, offset);
- if (!bp)
+ struct bio *split = bio_split(bio, len >> 9, GFP_ATOMIC, NULL);
+ if (!split)
return -ENOMEM;

+ bio_chain(split, bio);
+
if (bio_list_empty(&nvmeq->sq_cong))
add_wait_queue(&nvmeq->sq_full, &nvmeq->sq_cong_wait);
- bio_list_add(&nvmeq->sq_cong, &bp->b1);
- bio_list_add(&nvmeq->sq_cong, &bp->b2);
+ bio_list_add(&nvmeq->sq_cong, split);
+ bio_list_add(&nvmeq->sq_cong, bio);

return 0;
}
@@ -535,8 +448,7 @@ static int nvme_map_bio(struct nvme_queue *nvmeq, struct nvme_iod *iod,
} else {
if (!first && BIOVEC_NOT_VIRT_MERGEABLE(&bvprv, &bvec))
return nvme_split_and_submit(bio, nvmeq,
- iter.bi_idx,
- length, 0);
+ length);

sg = sg ? sg + 1 : iod->sg;
sg_set_page(sg, bvec.bv_page,
@@ -545,9 +457,7 @@ static int nvme_map_bio(struct nvme_queue *nvmeq, struct nvme_iod *iod,
}

if (split_len - length < bvec.bv_len)
- return nvme_split_and_submit(bio, nvmeq, iter.bi_idx,
- split_len,
- split_len - length);
+ return nvme_split_and_submit(bio, nvmeq, split_len);
length += bvec.bv_len;
bvprv = bvec;
first = 0;
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index c4c8871..d4d2da6 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2353,74 +2353,29 @@ static void pkt_end_io_read_cloned(struct bio *bio, int err)
pkt_bio_finished(pd);
}

-static void pkt_make_request(struct request_queue *q, struct bio *bio)
+static void pkt_make_request_read(struct pktcdvd_device *pd, struct bio *bio)
{
- struct pktcdvd_device *pd;
- char b[BDEVNAME_SIZE];
+ struct bio *cloned_bio = bio_clone(bio, GFP_NOIO);
+ struct packet_stacked_data *psd = mempool_alloc(psd_pool, GFP_NOIO);
+
+ psd->pd = pd;
+ psd->bio = bio;
+ cloned_bio->bi_bdev = pd->bdev;
+ cloned_bio->bi_private = psd;
+ cloned_bio->bi_end_io = pkt_end_io_read_cloned;
+ pd->stats.secs_r += bio_sectors(bio);
+ pkt_queue_bio(pd, cloned_bio);
+}
+
+static void pkt_make_request_write(struct request_queue *q, struct bio *bio)
+{
+ struct pktcdvd_device *pd = q->queuedata;
sector_t zone;
struct packet_data *pkt;
int was_empty, blocked_bio;
struct pkt_rb_node *node;

- pd = q->queuedata;
- if (!pd) {
- printk(DRIVER_NAME": %s incorrect request queue\n", bdevname(bio->bi_bdev, b));
- goto end_io;
- }
-
- /*
- * Clone READ bios so we can have our own bi_end_io callback.
- */
- if (bio_data_dir(bio) == READ) {
- struct bio *cloned_bio = bio_clone(bio, GFP_NOIO);
- struct packet_stacked_data *psd = mempool_alloc(psd_pool, GFP_NOIO);
-
- psd->pd = pd;
- psd->bio = bio;
- cloned_bio->bi_bdev = pd->bdev;
- cloned_bio->bi_private = psd;
- cloned_bio->bi_end_io = pkt_end_io_read_cloned;
- pd->stats.secs_r += bio_sectors(bio);
- pkt_queue_bio(pd, cloned_bio);
- return;
- }
-
- if (!test_bit(PACKET_WRITABLE, &pd->flags)) {
- printk(DRIVER_NAME": WRITE for ro device %s (%llu)\n",
- pd->name, (unsigned long long)bio->bi_iter.bi_sector);
- goto end_io;
- }
-
- if (!bio->bi_iter.bi_size || (bio->bi_iter.bi_size % CD_FRAMESIZE)) {
- printk(DRIVER_NAME": wrong bio size\n");
- goto end_io;
- }
-
- blk_queue_bounce(q, &bio);
-
zone = ZONE(bio->bi_iter.bi_sector, pd);
- VPRINTK("pkt_make_request: start = %6llx stop = %6llx\n",
- (unsigned long long)bio->bi_sector,
- (unsigned long long)bio_end_sector(bio));
-
- /* Check if we have to split the bio */
- {
- struct bio_pair *bp;
- sector_t last_zone;
- int first_sectors;
-
- last_zone = ZONE(bio_end_sector(bio) - 1, pd);
- if (last_zone != zone) {
- BUG_ON(last_zone != zone + pd->settings.size);
- first_sectors = last_zone - bio->bi_iter.bi_sector;
- bp = bio_pair_split(bio, first_sectors);
- BUG_ON(!bp);
- pkt_make_request(q, &bp->bio1);
- pkt_make_request(q, &bp->bio2);
- bio_pair_release(bp);
- return;
- }
- }

/*
* If we find a matching packet in state WAITING or READ_WAIT, we can
@@ -2494,6 +2449,64 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
*/
wake_up(&pd->wqueue);
}
+}
+
+static void pkt_make_request(struct request_queue *q, struct bio *bio)
+{
+ struct pktcdvd_device *pd;
+ char b[BDEVNAME_SIZE];
+ struct bio *split;
+
+ pd = q->queuedata;
+ if (!pd) {
+ printk(DRIVER_NAME": %s incorrect request queue\n",
+ bdevname(bio->bi_bdev, b));
+ goto end_io;
+ }
+
+ VPRINTK("pkt_make_request: start = %6llx stop = %6llx\n",
+ (unsigned long long)bio->bi_sector,
+ (unsigned long long)(bio->bi_sector + bio_sectors(bio)));
+
+ /*
+ * Clone READ bios so we can have our own bi_end_io callback.
+ */
+ if (bio_data_dir(bio) == READ) {
+ pkt_make_request_read(pd, bio);
+ return;
+ }
+
+ if (!test_bit(PACKET_WRITABLE, &pd->flags)) {
+ printk(DRIVER_NAME": WRITE for ro device %s (%llu)\n",
+ pd->name, (unsigned long long)bio->bi_iter.bi_sector);
+ goto end_io;
+ }
+
+ if (!bio->bi_iter.bi_size || (bio->bi_iter.bi_size % CD_FRAMESIZE)) {
+ printk(DRIVER_NAME": wrong bio size\n");
+ goto end_io;
+ }
+
+ blk_queue_bounce(q, &bio);
+
+ do {
+ sector_t zone = ZONE(bio->bi_iter.bi_sector, pd);
+ sector_t last_zone = ZONE(bio_end_sector(bio) - 1, pd);
+
+ if (last_zone != zone) {
+ BUG_ON(last_zone != zone + pd->settings.size);
+
+ split = bio_split(bio, last_zone -
+ bio->bi_iter.bi_sector,
+ GFP_NOIO, fs_bio_set);
+ bio_chain(split, bio);
+ } else {
+ split = bio;
+ }
+
+ pkt_make_request_write(q, split);
+ } while (split != bio);
+
return;
end_io:
bio_io_error(bio);
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index ec5f17c..691bfda 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1164,7 +1164,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
void bch_bbio_free(struct bio *, struct cache_set *);
struct bio *bch_bbio_alloc(struct cache_set *);

-struct bio *bch_bio_split(struct bio *, int, gfp_t, struct bio_set *);
void bch_generic_make_request(struct bio *, struct bio_split_pool *);
void __bch_submit_bbio(struct bio *, struct cache_set *);
void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 72c86ed..4402966 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2195,7 +2195,7 @@ static int submit_partial_cache_hit(struct btree *b, struct btree_op *op,
unsigned sectors = min_t(uint64_t, INT_MAX,
KEY_OFFSET(k) - bio->bi_iter.bi_sector);

- n = bch_bio_split(bio, sectors, GFP_NOIO, s->d->bio_split);
+ n = bio_next_split(bio, sectors, GFP_NOIO, s->d->bio_split);
if (n == bio)
op->lookup_done = true;

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 10f6065..e38bc8f 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,84 +11,6 @@

#include <linux/blkdev.h>

-/**
- * bch_bio_split - split a bio
- * @bio: bio to split
- * @sectors: number of sectors to split from the front of @bio
- * @gfp: gfp mask
- * @bs: bio set to allocate from
- *
- * Allocates and returns a new bio which represents @sectors from the start of
- * @bio, and updates @bio to represent the remaining sectors.
- *
- * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
- * unchanged.
- *
- * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
- * bvec boundry; it is the caller's responsibility to ensure that @bio is not
- * freed before the split.
- */
-struct bio *bch_bio_split(struct bio *bio, int sectors,
- gfp_t gfp, struct bio_set *bs)
-{
- unsigned vcnt = 0, nbytes = sectors << 9;
- struct bio_vec bv;
- struct bvec_iter iter;
- struct bio *ret = NULL;
-
- BUG_ON(sectors <= 0);
-
- if (sectors >= bio_sectors(bio))
- return bio;
-
- if (bio->bi_rw & REQ_DISCARD) {
- ret = bio_alloc_bioset(gfp, 1, bs);
- if (!ret)
- return NULL;
- goto out;
- }
-
- bio_for_each_segment(bv, bio, iter) {
- vcnt++;
-
- if (nbytes <= bv.bv_len)
- break;
-
- nbytes -= bv.bv_len;
- }
-
- ret = bio_alloc_bioset(gfp, vcnt, bs);
- if (!ret)
- return NULL;
-
- bio_for_each_segment(bv, bio, iter) {
- ret->bi_io_vec[ret->bi_vcnt++] = bv;
-
- if (ret->bi_vcnt == vcnt)
- break;
- }
-
- ret->bi_io_vec[ret->bi_vcnt - 1].bv_len = nbytes;
-out:
- ret->bi_bdev = bio->bi_bdev;
- ret->bi_iter.bi_sector = bio->bi_iter.bi_sector;
- ret->bi_iter.bi_size = sectors << 9;
- ret->bi_rw = bio->bi_rw;
-
- if (bio_integrity(bio)) {
- if (bio_integrity_clone(ret, bio, gfp)) {
- bio_put(ret);
- return NULL;
- }
-
- bio_integrity_trim(ret, 0, bio_sectors(ret));
- }
-
- bio_advance(bio, ret->bi_iter.bi_size);
-
- return ret;
-}
-
static unsigned bch_bio_max_sectors(struct bio *bio)
{
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
@@ -172,8 +94,8 @@ void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
bio_get(bio);

do {
- n = bch_bio_split(bio, bch_bio_max_sectors(bio),
- GFP_NOIO, s->p->bio_split);
+ n = bio_next_split(bio, bch_bio_max_sectors(bio),
+ GFP_NOIO, s->p->bio_split);

n->bi_end_io = bch_bio_submit_split_endio;
n->bi_private = &s->cl;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 39cffcc..542ae36 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -514,7 +514,7 @@ static void bch_insert_data_loop(struct closure *cl)
if (!bch_alloc_sectors(k, bio_sectors(bio), s))
goto err;

- n = bch_bio_split(bio, KEY_SIZE(k), GFP_NOIO, split);
+ n = bio_next_split(bio, KEY_SIZE(k), GFP_NOIO, split);

n->bi_end_io = bch_insert_data_endio;
n->bi_private = cl;
@@ -854,7 +854,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
struct bio *miss;

- miss = bch_bio_split(bio, sectors, GFP_NOIO, s->d->bio_split);
+ miss = bio_next_split(bio, sectors, GFP_NOIO, s->d->bio_split);
if (miss == bio)
s->op.lookup_done = true;

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index e9b53e9..56f534b 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -288,65 +288,65 @@ static int linear_stop (struct mddev *mddev)

static void linear_make_request(struct mddev *mddev, struct bio *bio)
{
+ char b[BDEVNAME_SIZE];
struct dev_info *tmp_dev;
- sector_t start_sector;
+ struct bio *split;
+ sector_t start_sector, end_sector, data_offset;

if (unlikely(bio->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bio);
return;
}

- rcu_read_lock();
- tmp_dev = which_dev(mddev, bio->bi_iter.bi_sector);
- start_sector = tmp_dev->end_sector - tmp_dev->rdev->sectors;
-
-
- if (unlikely(bio->bi_iter.bi_sector >= (tmp_dev->end_sector)
- || (bio->bi_iter.bi_sector < start_sector))) {
- char b[BDEVNAME_SIZE];
-
- printk(KERN_ERR
- "md/linear:%s: make_request: Sector %llu out of bounds on "
- "dev %s: %llu sectors, offset %llu\n",
- mdname(mddev),
- (unsigned long long)bio->bi_iter.bi_sector,
- bdevname(tmp_dev->rdev->bdev, b),
- (unsigned long long)tmp_dev->rdev->sectors,
- (unsigned long long)start_sector);
- rcu_read_unlock();
- bio_io_error(bio);
- return;
- }
- if (unlikely(bio_end_sector(bio) > tmp_dev->end_sector)) {
- /* This bio crosses a device boundary, so we have to
- * split it.
- */
- struct bio_pair *bp;
- sector_t end_sector = tmp_dev->end_sector;
+ do {
+ rcu_read_lock();

- rcu_read_unlock();
-
- bp = bio_pair_split(bio, end_sector - bio->bi_iter.bi_sector);
+ tmp_dev = which_dev(mddev, bio->bi_iter.bi_sector);
+ start_sector = tmp_dev->end_sector - tmp_dev->rdev->sectors;
+ end_sector = tmp_dev->end_sector;
+ data_offset = tmp_dev->rdev->data_offset;
+ bio->bi_bdev = tmp_dev->rdev->bdev;

- linear_make_request(mddev, &bp->bio1);
- linear_make_request(mddev, &bp->bio2);
- bio_pair_release(bp);
- return;
- }
-
- bio->bi_bdev = tmp_dev->rdev->bdev;
- bio->bi_iter.bi_sector = bio->bi_iter.bi_sector - start_sector
- + tmp_dev->rdev->data_offset;
- rcu_read_unlock();
+ rcu_read_unlock();

- if (unlikely((bio->bi_rw & REQ_DISCARD) &&
- !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
- /* Just ignore it */
- bio_endio(bio, 0);
- return;
- }
+ if (unlikely(bio->bi_iter.bi_sector >= end_sector ||
+ bio->bi_iter.bi_sector < start_sector))
+ goto out_of_bounds;
+
+ if (unlikely(bio_end_sector(bio) > end_sector)) {
+ /* This bio crosses a device boundary, so we have to
+ * split it.
+ */
+ split = bio_split(bio, end_sector -
+ bio->bi_iter.bi_sector,
+ GFP_NOIO, fs_bio_set);
+ bio_chain(split, bio);
+ } else {
+ split = bio;
+ }

- generic_make_request(bio);
+ split->bi_iter.bi_sector = split->bi_iter.bi_sector -
+ start_sector + data_offset;
+
+ if (unlikely((split->bi_rw & REQ_DISCARD) &&
+ !blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
+ /* Just ignore it */
+ bio_endio(split, 0);
+ } else
+ generic_make_request(split);
+ } while (split != bio);
+ return;
+
+out_of_bounds:
+ printk(KERN_ERR
+ "md/linear:%s: make_request: Sector %llu out of bounds on "
+ "dev %s: %llu sectors, offset %llu\n",
+ mdname(mddev),
+ (unsigned long long)bio->bi_iter.bi_sector,
+ bdevname(tmp_dev->rdev->bdev, b),
+ (unsigned long long)tmp_dev->rdev->sectors,
+ (unsigned long long)start_sector);
+ bio_io_error(bio);
}

static void linear_status (struct seq_file *seq, struct mddev *mddev)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index ea754dd..407a99e 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -513,65 +513,44 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,

static void raid0_make_request(struct mddev *mddev, struct bio *bio)
{
- unsigned int chunk_sects;
- sector_t sector_offset;
struct strip_zone *zone;
struct md_rdev *tmp_dev;
+ struct bio *split;

if (unlikely(bio->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bio);
return;
}

- chunk_sects = mddev->chunk_sectors;
- if (unlikely(!is_io_in_chunk_boundary(mddev, chunk_sects, bio))) {
+ do {
sector_t sector = bio->bi_iter.bi_sector;
- struct bio_pair *bp;
- /* Sanity check -- queue functions should prevent this happening */
- if (bio_multiple_segments(bio))
- goto bad_map;
- /* This is a one page bio that upper layers
- * refuse to split for us, so we need to split it.
- */
- if (likely(is_power_of_2(chunk_sects)))
- bp = bio_pair_split(bio, chunk_sects - (sector &
- (chunk_sects-1)));
- else
- bp = bio_pair_split(bio, chunk_sects -
- sector_div(sector, chunk_sects));
- raid0_make_request(mddev, &bp->bio1);
- raid0_make_request(mddev, &bp->bio2);
- bio_pair_release(bp);
- return;
- }
-
- sector_offset = bio->bi_iter.bi_sector;
- zone = find_zone(mddev->private, &sector_offset);
- tmp_dev = map_sector(mddev, zone, bio->bi_iter.bi_sector,
- &sector_offset);
- bio->bi_bdev = tmp_dev->bdev;
- bio->bi_iter.bi_sector = sector_offset + zone->dev_start +
- tmp_dev->data_offset;
-
- if (unlikely((bio->bi_rw & REQ_DISCARD) &&
- !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
- /* Just ignore it */
- bio_endio(bio, 0);
- return;
- }
-
- generic_make_request(bio);
- return;
-
-bad_map:
- printk("md/raid0:%s: make_request bug: can't convert block across chunks"
- " or bigger than %dk %llu %d\n",
- mdname(mddev), chunk_sects / 2,
- (unsigned long long)bio->bi_iter.bi_sector,
- bio_sectors(bio) / 2);
+ unsigned chunk_sects = mddev->chunk_sectors;
+
+ unsigned sectors = chunk_sects -
+ (likely(is_power_of_2(chunk_sects))
+ ? (sector & (chunk_sects-1))
+ : sector_div(sector, chunk_sects));
+
+ if (sectors < bio_sectors(bio)) {
+ split = bio_split(bio, sectors, GFP_NOIO, fs_bio_set);
+ bio_chain(split, bio);
+ } else {
+ split = bio;
+ }

- bio_io_error(bio);
- return;
+ zone = find_zone(mddev->private, &sector);
+ tmp_dev = map_sector(mddev, zone, sector, &sector);
+ split->bi_bdev = tmp_dev->bdev;
+ split->bi_iter.bi_sector = sector + zone->dev_start +
+ tmp_dev->data_offset;
+
+ if (unlikely((split->bi_rw & REQ_DISCARD) &&
+ !blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
+ /* Just ignore it */
+ bio_endio(split, 0);
+ } else
+ generic_make_request(split);
+ } while (split != bio);
}

static void raid0_status(struct seq_file *seq, struct mddev *mddev)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7e49cbe..9870e45 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1152,14 +1152,12 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
kfree(plug);
}

-static void make_request(struct mddev *mddev, struct bio * bio)
+static void __make_request(struct mddev *mddev, struct bio *bio)
{
struct r10conf *conf = mddev->private;
struct r10bio *r10_bio;
struct bio *read_bio;
int i;
- sector_t chunk_mask = (conf->geo.chunk_mask & conf->prev.chunk_mask);
- int chunk_sects = chunk_mask + 1;
const int rw = bio_data_dir(bio);
const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
const unsigned long do_fua = (bio->bi_rw & REQ_FUA);
@@ -1174,69 +1172,6 @@ static void make_request(struct mddev *mddev, struct bio * bio)
int max_sectors;
int sectors;

- if (unlikely(bio->bi_rw & REQ_FLUSH)) {
- md_flush_request(mddev, bio);
- return;
- }
-
- /* If this request crosses a chunk boundary, we need to
- * split it. This will only happen for 1 PAGE (or less) requests.
- */
- if (unlikely((bio->bi_iter.bi_sector & chunk_mask) + bio_sectors(bio)
- > chunk_sects
- && (conf->geo.near_copies < conf->geo.raid_disks
- || conf->prev.near_copies < conf->prev.raid_disks))) {
- struct bio_pair *bp;
- /* Sanity check -- queue functions should prevent this happening */
- if (bio_multiple_segments(bio))
- goto bad_map;
- /* This is a one page bio that upper layers
- * refuse to split for us, so we need to split it.
- */
- bp = bio_pair_split(bio, chunk_sects -
- (bio->bi_iter.bi_sector & (chunk_sects - 1)));
-
- /* Each of these 'make_request' calls will call 'wait_barrier'.
- * If the first succeeds but the second blocks due to the resync
- * thread raising the barrier, we will deadlock because the
- * IO to the underlying device will be queued in generic_make_request
- * and will never complete, so will never reduce nr_pending.
- * So increment nr_waiting here so no new raise_barriers will
- * succeed, and so the second wait_barrier cannot block.
- */
- spin_lock_irq(&conf->resync_lock);
- conf->nr_waiting++;
- spin_unlock_irq(&conf->resync_lock);
-
- make_request(mddev, &bp->bio1);
- make_request(mddev, &bp->bio2);
-
- spin_lock_irq(&conf->resync_lock);
- conf->nr_waiting--;
- wake_up(&conf->wait_barrier);
- spin_unlock_irq(&conf->resync_lock);
-
- bio_pair_release(bp);
- return;
- bad_map:
- printk("md/raid10:%s: make_request bug: can't convert block across chunks"
- " or bigger than %dk %llu %d\n", mdname(mddev), chunk_sects/2,
- (unsigned long long)bio->bi_iter.bi_sector,
- bio_sectors(bio) / 2);
-
- bio_io_error(bio);
- return;
- }
-
- md_write_start(mddev, bio);
-
- /*
- * Register the new request and wait if the reconstruction
- * thread has put up a bar for new requests.
- * Continue immediately if no resync is active currently.
- */
- wait_barrier(conf);
-
sectors = bio_sectors(bio);
while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
bio->bi_iter.bi_sector < conf->reshape_progress &&
@@ -1600,6 +1535,52 @@ retry_write:
goto retry_write;
}
one_write_done(r10_bio);
+}
+
+static void make_request(struct mddev *mddev, struct bio *bio)
+{
+ struct r10conf *conf = mddev->private;
+ sector_t chunk_mask = (conf->geo.chunk_mask & conf->prev.chunk_mask);
+ int chunk_sects = chunk_mask + 1;
+
+ struct bio *split;
+
+ if (unlikely(bio->bi_rw & REQ_FLUSH)) {
+ md_flush_request(mddev, bio);
+ return;
+ }
+
+ md_write_start(mddev, bio);
+
+ /*
+ * Register the new request and wait if the reconstruction
+ * thread has put up a bar for new requests.
+ * Continue immediately if no resync is active currently.
+ */
+ wait_barrier(conf);
+
+ do {
+
+ /*
+ * If this request crosses a chunk boundary, we need to split
+ * it.
+ */
+ if (unlikely((bio->bi_iter.bi_sector & chunk_mask) +
+ bio_sectors(bio) > chunk_sects
+ && (conf->geo.near_copies < conf->geo.raid_disks
+ || conf->prev.near_copies <
+ conf->prev.raid_disks))) {
+ split = bio_split(bio, chunk_sects -
+ (bio->bi_iter.bi_sector &
+ (chunk_sects - 1)),
+ GFP_NOIO, fs_bio_set);
+ bio_chain(split, bio);
+ } else {
+ split = bio;
+ }
+
+ __make_request(mddev, split);
+ } while (split != bio);

/* In case raid10d snuck in to freeze_array */
wake_up(&conf->wait_barrier);
diff --git a/fs/bio.c b/fs/bio.c
index 5767f97..6e4185a 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1712,6 +1712,79 @@ void bio_endio(struct bio *bio, int error)
}
EXPORT_SYMBOL(bio_endio);

+/**
+ * bio_split - split a bio
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
+ * unchanged.
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+ gfp_t gfp, struct bio_set *bs)
+{
+ unsigned vcnt = 0, nbytes = sectors << 9;
+ struct bio_vec bv;
+ struct bvec_iter iter;
+ struct bio *split = NULL;
+
+ BUG_ON(sectors <= 0);
+ BUG_ON(sectors >= bio_sectors(bio));
+
+ if (bio->bi_rw & REQ_DISCARD) {
+ split = bio_alloc_bioset(gfp, 1, bs);
+ if (!split)
+ return NULL;
+ goto out;
+ }
+
+ bio_for_each_segment(bv, bio, iter) {
+ vcnt++;
+
+ if (nbytes <= bv.bv_len)
+ break;
+
+ nbytes -= bv.bv_len;
+ }
+
+ split = bio_alloc_bioset(gfp, vcnt, bs);
+ if (!split)
+ return NULL;
+
+ bio_for_each_segment(bv, bio, iter) {
+ split->bi_io_vec[split->bi_vcnt++] = bv;
+
+ if (split->bi_vcnt == vcnt)
+ break;
+ }
+
+ split->bi_io_vec[split->bi_vcnt - 1].bv_len = nbytes;
+out:
+ split->bi_bdev = bio->bi_bdev;
+ split->bi_iter.bi_sector = bio->bi_iter.bi_sector;
+ split->bi_iter.bi_size = sectors << 9;
+ split->bi_rw = bio->bi_rw;
+
+ if (bio_integrity(bio)) {
+ if (bio_integrity_clone(split, bio, gfp)) {
+ bio_put(split);
+ return NULL;
+ }
+
+ bio_integrity_trim(split, 0, bio_sectors(split));
+ }
+
+ bio_advance(bio, split->bi_iter.bi_size);
+
+ return split;
+}
+EXPORT_SYMBOL(bio_split);
+
void bio_pair_release(struct bio_pair *bp)
{
if (atomic_dec_and_test(&bp->cnt)) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index bc3fa72..3ed3312 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -302,6 +302,28 @@ extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
extern void bio_pair_release(struct bio_pair *dbio);
extern void bio_trim(struct bio *bio, int offset, int size);

+extern struct bio *bio_split(struct bio *bio, int sectors,
+ gfp_t gfp, struct bio_set *bs);
+
+/**
+ * bio_next_split - get next @sectors from a bio, splitting if necessary
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Returns a bio representing the next @sectors of @bio - if the bio is smaller
+ * than @sectors, returns the original bio unchanged.
+ */
+static inline struct bio *bio_next_split(struct bio *bio, int sectors,
+ gfp_t gfp, struct bio_set *bs)
+{
+ if (sectors >= bio_sectors(bio))
+ return bio;
+
+ return bio_split(bio, sectors, gfp, bs);
+}
+
extern struct bio_set *bioset_create(unsigned int, unsigned int);
extern void bioset_free(struct bio_set *);
extern mempool_t *biovec_create_pool(struct bio_set *bs, int pool_entries);
--
1.8.4.rc1

2013-08-07 21:58:11

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

We need to convert the dm code to the new bvec_iter primitives which
respect bi_bvec_done; they also allow us to drastically simplify dm's
bio splitting code.

Also kill bio_sector_offset(), dm was the only user and it doesn't make
much sense anymore.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
---
drivers/md/dm.c | 170 ++++++----------------------------------------------
fs/bio.c | 38 ------------
include/linux/bio.h | 1 -
3 files changed, 18 insertions(+), 191 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 5544af7..696269d 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1050,7 +1050,6 @@ struct clone_info {
struct dm_io *io;
sector_t sector;
sector_t sector_count;
- unsigned short idx;
};

static void bio_setup_sector(struct bio *bio, sector_t sector, sector_t len)
@@ -1059,68 +1058,24 @@ static void bio_setup_sector(struct bio *bio, sector_t sector, sector_t len)
bio->bi_iter.bi_size = to_bytes(len);
}

-static void bio_setup_bv(struct bio *bio, unsigned short idx, unsigned short bv_count)
-{
- bio->bi_iter.bi_idx = idx;
- bio->bi_vcnt = idx + bv_count;
- bio->bi_flags &= ~(1 << BIO_SEG_VALID);
-}
-
-static void clone_bio_integrity(struct bio *bio, struct bio *clone,
- unsigned short idx, unsigned len, unsigned offset,
- unsigned trim)
-{
- if (!bio_integrity(bio))
- return;
-
- bio_integrity_clone(clone, bio, GFP_NOIO);
-
- if (trim)
- bio_integrity_trim(clone, bio_sector_offset(bio, idx, offset), len);
-}
-
-/*
- * Creates a little bio that just does part of a bvec.
- */
-static void clone_split_bio(struct dm_target_io *tio, struct bio *bio,
- sector_t sector, unsigned short idx,
- unsigned offset, unsigned len)
-{
- struct bio *clone = &tio->clone;
- struct bio_vec *bv = bio->bi_io_vec + idx;
-
- *clone->bi_io_vec = *bv;
-
- bio_setup_sector(clone, sector, len);
-
- clone->bi_bdev = bio->bi_bdev;
- clone->bi_rw = bio->bi_rw;
- clone->bi_vcnt = 1;
- clone->bi_io_vec->bv_offset = offset;
- clone->bi_io_vec->bv_len = clone->bi_iter.bi_size;
- clone->bi_flags |= 1 << BIO_CLONED;
-
- clone_bio_integrity(bio, clone, idx, len, offset, 1);
-}
-
/*
* Creates a bio that consists of range of complete bvecs.
*/
static void clone_bio(struct dm_target_io *tio, struct bio *bio,
- sector_t sector, unsigned short idx,
- unsigned short bv_count, unsigned len)
+ sector_t sector, unsigned len)
{
struct bio *clone = &tio->clone;
- unsigned trim = 0;

__bio_clone(clone, bio);
- bio_setup_sector(clone, sector, len);
- bio_setup_bv(clone, idx, bv_count);

- if (idx != bio->bi_iter.bi_idx ||
- clone->bi_iter.bi_size < bio->bi_iter.bi_size)
- trim = 1;
- clone_bio_integrity(bio, clone, idx, len, 0, trim);
+ if (bio_integrity(bio))
+ bio_integrity_clone(clone, bio, GFP_NOIO);
+
+ bio_advance(clone, (sector - clone->bi_iter.bi_sector) << 9);
+ bio->bi_iter.bi_size = len << 9;
+
+ if (bio_integrity(bio))
+ bio_integrity_trim(clone, 0, len);
}

static struct dm_target_io *alloc_tio(struct clone_info *ci,
@@ -1182,10 +1137,7 @@ static int __send_empty_flush(struct clone_info *ci)
}

static void __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
- sector_t sector, int nr_iovecs,
- unsigned short idx, unsigned short bv_count,
- unsigned offset, unsigned len,
- unsigned split_bvec)
+ sector_t sector, unsigned len)
{
struct bio *bio = ci->bio;
struct dm_target_io *tio;
@@ -1199,11 +1151,8 @@ static void __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti
num_target_bios = ti->num_write_bios(ti, bio);

for (target_bio_nr = 0; target_bio_nr < num_target_bios; target_bio_nr++) {
- tio = alloc_tio(ci, ti, nr_iovecs, target_bio_nr);
- if (split_bvec)
- clone_split_bio(tio, bio, sector, idx, offset, len);
- else
- clone_bio(tio, bio, sector, idx, bv_count, len);
+ tio = alloc_tio(ci, ti, 0, target_bio_nr);
+ clone_bio(tio, bio, sector, len);
__map_bio(tio);
}
}
@@ -1275,68 +1224,13 @@ static int __send_write_same(struct clone_info *ci)
}

/*
- * Find maximum number of sectors / bvecs we can process with a single bio.
- */
-static sector_t __len_within_target(struct clone_info *ci, sector_t max, int *idx)
-{
- struct bio *bio = ci->bio;
- sector_t bv_len, total_len = 0;
-
- for (*idx = ci->idx; max && (*idx < bio->bi_vcnt); (*idx)++) {
- bv_len = to_sector(bio->bi_io_vec[*idx].bv_len);
-
- if (bv_len > max)
- break;
-
- max -= bv_len;
- total_len += bv_len;
- }
-
- return total_len;
-}
-
-static int __split_bvec_across_targets(struct clone_info *ci,
- struct dm_target *ti, sector_t max)
-{
- struct bio *bio = ci->bio;
- struct bio_vec *bv = bio->bi_io_vec + ci->idx;
- sector_t remaining = to_sector(bv->bv_len);
- unsigned offset = 0;
- sector_t len;
-
- do {
- if (offset) {
- ti = dm_table_find_target(ci->map, ci->sector);
- if (!dm_target_is_valid(ti))
- return -EIO;
-
- max = max_io_len(ci->sector, ti);
- }
-
- len = min(remaining, max);
-
- __clone_and_map_data_bio(ci, ti, ci->sector, 1, ci->idx, 0,
- bv->bv_offset + offset, len, 1);
-
- ci->sector += len;
- ci->sector_count -= len;
- offset += to_bytes(len);
- } while (remaining -= len);
-
- ci->idx++;
-
- return 0;
-}
-
-/*
* Select the correct strategy for processing a non-flush bio.
*/
static int __split_and_process_non_flush(struct clone_info *ci)
{
struct bio *bio = ci->bio;
struct dm_target *ti;
- sector_t len, max;
- int idx;
+ unsigned len;

if (unlikely(bio->bi_rw & REQ_DISCARD))
return __send_discard(ci);
@@ -1347,41 +1241,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)
if (!dm_target_is_valid(ti))
return -EIO;

- max = max_io_len(ci->sector, ti);
-
- /*
- * Optimise for the simple case where we can do all of
- * the remaining io with a single clone.
- */
- if (ci->sector_count <= max) {
- __clone_and_map_data_bio(ci, ti, ci->sector, bio->bi_max_vecs,
- ci->idx, bio->bi_vcnt - ci->idx, 0,
- ci->sector_count, 0);
- ci->sector_count = 0;
- return 0;
- }
-
- /*
- * There are some bvecs that don't span targets.
- * Do as many of these as possible.
- */
- if (to_sector(bio->bi_io_vec[ci->idx].bv_len) <= max) {
- len = __len_within_target(ci, max, &idx);
-
- __clone_and_map_data_bio(ci, ti, ci->sector, bio->bi_max_vecs,
- ci->idx, idx - ci->idx, 0, len, 0);
+ len = min_t(unsigned, max_io_len(ci->sector, ti), bio_sectors(bio));

- ci->sector += len;
- ci->sector_count -= len;
- ci->idx = idx;
+ __clone_and_map_data_bio(ci, ti, ci->sector, len);

- return 0;
- }
+ ci->sector += len;
+ ci->sector_count -= len;

- /*
- * Handle a bvec that must be split between two or more targets.
- */
- return __split_bvec_across_targets(ci, ti, max);
+ return 0;
}

/*
@@ -1407,7 +1274,6 @@ static void __split_and_process_bio(struct mapped_device *md,
ci.io->md = md;
spin_lock_init(&ci.io->endio_lock);
ci.sector = bio->bi_iter.bi_sector;
- ci.idx = bio->bi_iter.bi_idx;

start_io_acct(ci.io);

diff --git a/fs/bio.c b/fs/bio.c
index 46cf8a6..bb62198 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1809,44 +1809,6 @@ void bio_trim(struct bio *bio, int offset, int size)
}
EXPORT_SYMBOL_GPL(bio_trim);

-/**
- * bio_sector_offset - Find hardware sector offset in bio
- * @bio: bio to inspect
- * @index: bio_vec index
- * @offset: offset in bv_page
- *
- * Return the number of hardware sectors between beginning of bio
- * and an end point indicated by a bio_vec index and an offset
- * within that vector's page.
- */
-sector_t bio_sector_offset(struct bio *bio, unsigned short index,
- unsigned int offset)
-{
- unsigned int sector_sz;
- struct bio_vec *bv;
- sector_t sectors;
- int i;
-
- sector_sz = queue_logical_block_size(bio->bi_bdev->bd_disk->queue);
- sectors = 0;
-
- if (index >= bio->bi_iter.bi_idx)
- index = bio->bi_vcnt - 1;
-
- bio_for_each_segment_all(bv, bio, i) {
- if (i == index) {
- if (offset > bv->bv_offset)
- sectors += (offset - bv->bv_offset) / sector_sz;
- break;
- }
-
- sectors += bv->bv_len / sector_sz;
- }
-
- return sectors;
-}
-EXPORT_SYMBOL(bio_sector_offset);
-
/*
* create memory pools for biovec's in a bio_set.
* use the global biovec slabs created for general use.
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 486a997..e9a4fce 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -349,7 +349,6 @@ extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
unsigned int, unsigned int);
extern int bio_get_nr_vecs(struct block_device *);
-extern sector_t bio_sector_offset(struct bio *, unsigned short, unsigned int);
extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
unsigned long, unsigned int, int, gfp_t);
struct sg_iovec;
--
1.8.4.rc1

2013-08-07 21:58:08

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 17/22] block: Remove bi_idx hacks

Now that drivers have been converted to the new bvec_iter primitives,
there's no need to trim the bvec before we submit it; and we can't trim
it once we start sharing bvecs.

It used to be that passing a partially completed bio (i.e. one with
nonzero bi_idx) to generic_make_request() was a dangerous thing -
various drivers would choke on such things. But with immutable biovecs
and our new bio splitting that shares the biovecs, submitting partially
completed bios has to work (and should work, now that all the drivers
have been completed to the new primitives)

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Neil Brown <[email protected]>
---
drivers/md/bcache/io.c | 47 ++---------------------------------------------
fs/bio.c | 23 -----------------------
2 files changed, 2 insertions(+), 68 deletions(-)

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 6e04f3b..0f0ab65 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,49 +11,6 @@

#include <linux/blkdev.h>

-static void bch_bi_idx_hack_endio(struct bio *bio, int error)
-{
- struct bio *p = bio->bi_private;
-
- bio_endio(p, error);
- bio_put(bio);
-}
-
-static void bch_generic_make_request_hack(struct bio *bio)
-{
- if (bio->bi_iter.bi_idx) {
- struct bio_vec bv;
- struct bvec_iter iter;
- unsigned segs = bio_segments(bio);
- struct bio *clone = bio_alloc(GFP_NOIO, segs);
-
- bio_for_each_segment(bv, bio, iter)
- clone->bi_io_vec[clone->bi_vcnt++] = bv;
-
- clone->bi_iter.bi_sector = bio->bi_iter.bi_sector;
- clone->bi_bdev = bio->bi_bdev;
- clone->bi_rw = bio->bi_rw;
- clone->bi_vcnt = segs;
- clone->bi_iter.bi_size = bio->bi_iter.bi_size;
-
- clone->bi_private = bio;
- clone->bi_end_io = bch_bi_idx_hack_endio;
-
- bio = clone;
- }
-
- /*
- * Hack, since drivers that clone bios clone up to bi_max_vecs, but our
- * bios might have had more than that (before we split them per device
- * limitations).
- *
- * To be taken out once immutable bvec stuff is in.
- */
- bio->bi_max_vecs = bio->bi_vcnt;
-
- generic_make_request(bio);
-}
-
/**
* bch_bio_split - split a bio
* @bio: bio to split
@@ -222,12 +179,12 @@ void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
n->bi_private = &s->cl;

closure_get(&s->cl);
- bch_generic_make_request_hack(n);
+ generic_make_request(n);
} while (n != bio);

continue_at(&s->cl, bch_bio_submit_split_done, NULL);
submit:
- bch_generic_make_request_hack(bio);
+ generic_make_request(bio);
}

/* Bios with headers */
diff --git a/fs/bio.c b/fs/bio.c
index bb62198..7d14b79 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1772,11 +1772,7 @@ void bio_trim(struct bio *bio, int offset, int size)
{
/* 'bio' is a cloned bio which we need to trim to match
* the given offset and size.
- * This requires adjusting bi_sector, bi_size, and bi_io_vec
*/
- int i;
- struct bio_vec *bvec;
- int sofar = 0;

size <<= 9;
if (offset == 0 && size == bio->bi_iter.bi_size)
@@ -1787,25 +1783,6 @@ void bio_trim(struct bio *bio, int offset, int size)
bio_advance(bio, offset << 9);

bio->bi_iter.bi_size = size;
-
- /* avoid any complications with bi_idx being non-zero*/
- if (bio->bi_iter.bi_idx) {
- memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_iter.bi_idx,
- (bio->bi_vcnt - bio->bi_iter.bi_idx) *
- sizeof(struct bio_vec));
- bio->bi_vcnt -= bio->bi_iter.bi_idx;
- bio->bi_iter.bi_idx = 0;
- }
- /* Make sure vcnt and last bv are not too big */
- bio_for_each_segment_all(bvec, bio, i) {
- if (sofar + bvec->bv_len > size)
- bvec->bv_len = size - sofar;
- if (bvec->bv_len == 0) {
- bio->bi_vcnt = i;
- break;
- }
- sofar += bvec->bv_len;
- }
}
EXPORT_SYMBOL_GPL(bio_trim);

--
1.8.4.rc1

2013-08-07 21:58:05

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 18/22] block: Generic bio chaining

This adds a generic mechanism for chaining bio completions. This is
going to be used for a bio_split() replacement, and some other things in
the future.

This is implemented with a new bio flag that bio_endio() checks; it
would definitely be cleaner to implement chaining with a bi_end_io
function, but since there's no limits on the depth of a bio chain (and
with arbitrary bio splitting coming this is going to be a real issue)
using an endio function would lead to unbounded stack usage.

Tail call optimization could solve that, but CONFIG_FRAME_POINTER
disables gcc's tail call optimization (-fno-optimize-sibling-calls) - so
we do it the hacky but safe way.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
drivers/md/bcache/io.c | 2 +-
fs/bio.c | 45 +++++++++++++++++++++++++++++++++++++++------
include/linux/bio.h | 1 +
include/linux/blk_types.h | 7 +++++--
4 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 0f0ab65..10f6065 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -133,7 +133,7 @@ static void bch_bio_submit_split_done(struct closure *cl)

s->bio->bi_end_io = s->bi_end_io;
s->bio->bi_private = s->bi_private;
- bio_endio(s->bio, 0);
+ s->bio->bi_end_io(s->bio, 0);

closure_debug_destroy(&s->cl);
mempool_free(s, s->p->bio_split_hook);
diff --git a/fs/bio.c b/fs/bio.c
index 7d14b79..7737984 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -273,6 +273,7 @@ void bio_init(struct bio *bio)
{
memset(bio, 0, sizeof(*bio));
bio->bi_flags = 1 << BIO_UPTODATE;
+ atomic_set(&bio->bi_remaining, 1);
atomic_set(&bio->bi_cnt, 1);
}
EXPORT_SYMBOL(bio_init);
@@ -295,9 +296,29 @@ void bio_reset(struct bio *bio)

memset(bio, 0, BIO_RESET_BYTES);
bio->bi_flags = flags|(1 << BIO_UPTODATE);
+ atomic_set(&bio->bi_remaining, 1);
}
EXPORT_SYMBOL(bio_reset);

+/**
+ * bio_chain - chain bio completions
+ *
+ * The caller won't have a bi_end_io called when @bio completes - instead,
+ * @parent's bi_end_io won't be called until both @parent and @bio have
+ * completed.
+ *
+ * The caller must not set bi_private or bi_end_io in @bio.
+ */
+void bio_chain(struct bio *bio, struct bio *parent)
+{
+ BUG_ON(bio->bi_private || bio->bi_end_io);
+
+ bio->bi_flags |= 1 << BIO_CHAINED;
+ bio->bi_private = parent;
+ atomic_inc(&parent->bi_remaining);
+}
+EXPORT_SYMBOL(bio_chain);
+
static void bio_alloc_rescue(struct work_struct *work)
{
struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
@@ -1669,13 +1690,25 @@ EXPORT_SYMBOL(bio_flush_dcache_pages);
**/
void bio_endio(struct bio *bio, int error)
{
- if (error)
- clear_bit(BIO_UPTODATE, &bio->bi_flags);
- else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
- error = -EIO;
+ while (bio) {
+ BUG_ON(atomic_read(&bio->bi_remaining) <= 0);
+
+ if (error)
+ clear_bit(BIO_UPTODATE, &bio->bi_flags);
+ else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
+ error = -EIO;
+
+ if (!atomic_dec_and_test(&bio->bi_remaining))
+ return;

- if (bio->bi_end_io)
- bio->bi_end_io(bio, error);
+ if (bio_flagged(bio, BIO_CHAINED)) {
+ bio = bio->bi_private;
+ } else {
+ if (bio->bi_end_io)
+ bio->bi_end_io(bio, error);
+ bio = NULL;
+ }
+ }
}
EXPORT_SYMBOL(bio_endio);

diff --git a/include/linux/bio.h b/include/linux/bio.h
index e9a4fce..1b06bcb 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -344,6 +344,7 @@ extern void bio_advance(struct bio *, unsigned);

extern void bio_init(struct bio *);
extern void bio_reset(struct bio *);
+void bio_chain(struct bio *, struct bio *);

extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 72f1274..69f5c0d 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -64,6 +64,8 @@ struct bio {
unsigned int bi_seg_front_size;
unsigned int bi_seg_back_size;

+ atomic_t bi_remaining;
+
bio_end_io_t *bi_end_io;

void *bi_private;
@@ -119,13 +121,14 @@ struct bio {
#define BIO_QUIET 10 /* Make BIO Quiet */
#define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
#define BIO_SNAP_STABLE 12 /* bio data must be snapshotted during write */
+#define BIO_CHAINED 13 /* bi_private points to a parent bio */

/*
* Flags starting here get preserved by bio_reset() - this includes
* BIO_POOL_IDX()
*/
-#define BIO_RESET_BITS 13
-#define BIO_OWNS_VEC 13 /* bio_free() should free bvec */
+#define BIO_RESET_BITS 14
+#define BIO_OWNS_VEC 14 /* bio_free() should free bvec */

#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))

--
1.8.4.rc1

2013-08-07 21:58:03

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 19/22] block: Rename bio_split() -> bio_pair_split()

This is prep work for introducing a more general bio_split().

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: Peter Osterlund <[email protected]>
Cc: Sage Weil <[email protected]>
---
drivers/block/pktcdvd.c | 2 +-
drivers/md/linear.c | 2 +-
drivers/md/raid0.c | 6 +++---
drivers/md/raid10.c | 2 +-
fs/bio.c | 4 ++--
include/linux/bio.h | 2 +-
6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index f16dfca..c4c8871 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2413,7 +2413,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
if (last_zone != zone) {
BUG_ON(last_zone != zone + pd->settings.size);
first_sectors = last_zone - bio->bi_iter.bi_sector;
- bp = bio_split(bio, first_sectors);
+ bp = bio_pair_split(bio, first_sectors);
BUG_ON(!bp);
pkt_make_request(q, &bp->bio1);
pkt_make_request(q, &bp->bio2);
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index fb3b0d0..e9b53e9 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -326,7 +326,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)

rcu_read_unlock();

- bp = bio_split(bio, end_sector - bio->bi_iter.bi_sector);
+ bp = bio_pair_split(bio, end_sector - bio->bi_iter.bi_sector);

linear_make_request(mddev, &bp->bio1);
linear_make_request(mddev, &bp->bio2);
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 8ee1a6c..ea754dd 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -534,11 +534,11 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
* refuse to split for us, so we need to split it.
*/
if (likely(is_power_of_2(chunk_sects)))
- bp = bio_split(bio, chunk_sects - (sector &
+ bp = bio_pair_split(bio, chunk_sects - (sector &
(chunk_sects-1)));
else
- bp = bio_split(bio, chunk_sects -
- sector_div(sector, chunk_sects));
+ bp = bio_pair_split(bio, chunk_sects -
+ sector_div(sector, chunk_sects));
raid0_make_request(mddev, &bp->bio1);
raid0_make_request(mddev, &bp->bio2);
bio_pair_release(bp);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 2303f59..7e49cbe 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1193,7 +1193,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
/* This is a one page bio that upper layers
* refuse to split for us, so we need to split it.
*/
- bp = bio_split(bio, chunk_sects -
+ bp = bio_pair_split(bio, chunk_sects -
(bio->bi_iter.bi_sector & (chunk_sects - 1)));

/* Each of these 'make_request' calls will call 'wait_barrier'.
diff --git a/fs/bio.c b/fs/bio.c
index 7737984..5767f97 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1746,7 +1746,7 @@ static void bio_pair_end_2(struct bio *bi, int err)
/*
* split a bio - only worry about a bio with a single page in its iovec
*/
-struct bio_pair *bio_split(struct bio *bi, int first_sectors)
+struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
{
struct bio_pair *bp = mempool_alloc(bio_split_pool, GFP_NOIO);

@@ -1793,7 +1793,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)

return bp;
}
-EXPORT_SYMBOL(bio_split);
+EXPORT_SYMBOL(bio_pair_split);

/**
* bio_trim - trim a bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1b06bcb..bc3fa72 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -298,7 +298,7 @@ struct bio_pair {
atomic_t cnt;
int error;
};
-extern struct bio_pair *bio_split(struct bio *bi, int first_sectors);
+extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
extern void bio_pair_release(struct bio_pair *dbio);
extern void bio_trim(struct bio *bio, int offset, int size);

--
1.8.4.rc1

2013-08-07 21:59:33

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 12/22] block: Convert drivers to immutable biovecs

Now that we've got a mechanism for immutable biovecs -
bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
respect it instead of using the bvec array directly.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: "Ed L. Cashin" <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
---
drivers/block/aoe/aoe.h | 10 +---
drivers/block/aoe/aoecmd.c | 127 +++++++++++++++++----------------------------
drivers/block/umem.c | 50 ++++++++----------
drivers/md/dm-crypt.c | 52 ++++++++-----------
drivers/md/dm-io.c | 31 ++++++-----
drivers/md/dm-raid1.c | 8 +--
drivers/md/dm-verity.c | 52 +++++--------------
include/linux/dm-io.h | 4 +-
8 files changed, 131 insertions(+), 203 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 025c41d..c5e1e9b 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -100,11 +100,8 @@ enum {

struct buf {
ulong nframesout;
- ulong resid;
- ulong bv_resid;
- sector_t sector;
struct bio *bio;
- struct bio_vec *bv;
+ struct bvec_iter iter;
struct request *rq;
};

@@ -120,13 +117,10 @@ struct frame {
ulong waited;
ulong waited_total;
struct aoetgt *t; /* parent target I belong to */
- sector_t lba;
struct sk_buff *skb; /* command skb freed on module exit */
struct sk_buff *r_skb; /* response skb for async processing */
struct buf *buf;
- struct bio_vec *bv;
- ulong bcnt;
- ulong bv_off;
+ struct bvec_iter iter;
char flags;
};

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index f17260b..23c644e 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -196,8 +196,7 @@ aoe_freetframe(struct frame *f)

t = f->t;
f->buf = NULL;
- f->lba = 0;
- f->bv = NULL;
+ memset(&f->iter, 0, sizeof(f->iter));
f->r_skb = NULL;
f->flags = 0;
list_add(&f->head, &t->ffree);
@@ -295,21 +294,17 @@ newframe(struct aoedev *d)
}

static void
-skb_fillup(struct sk_buff *skb, struct bio_vec *bv, ulong off, ulong cnt)
+skb_fillup(struct sk_buff *skb, struct bio *bio, struct bvec_iter *iter)
{
int frag = 0;
- ulong fcnt;
-loop:
- fcnt = bv->bv_len - (off - bv->bv_offset);
- if (fcnt > cnt)
- fcnt = cnt;
- skb_fill_page_desc(skb, frag++, bv->bv_page, off, fcnt);
- cnt -= fcnt;
- if (cnt <= 0)
- return;
- bv++;
- off = bv->bv_offset;
- goto loop;
+
+ while (iter->bi_size) {
+ struct bio_vec bv = bio_iter_iovec(bio, *iter);
+
+ skb_fill_page_desc(skb, frag++, bv.bv_page,
+ bv.bv_offset, bv.bv_len);
+ bio_advance_iter(bio, iter, bv.bv_len);
+ }
}

static void
@@ -346,12 +341,10 @@ ata_rw_frameinit(struct frame *f)
t->nout++;
f->waited = 0;
f->waited_total = 0;
- if (f->buf)
- f->lba = f->buf->sector;

/* set up ata header */
- ah->scnt = f->bcnt >> 9;
- put_lba(ah, f->lba);
+ ah->scnt = f->iter.bi_size >> 9;
+ put_lba(ah, f->iter.bi_sector);
if (t->d->flags & DEVFL_EXT) {
ah->aflags |= AOEAFL_EXT;
} else {
@@ -360,11 +353,11 @@ ata_rw_frameinit(struct frame *f)
ah->lba3 |= 0xe0; /* LBA bit + obsolete 0xa0 */
}
if (f->buf && bio_data_dir(f->buf->bio) == WRITE) {
- skb_fillup(skb, f->bv, f->bv_off, f->bcnt);
+ skb->len += f->iter.bi_size;
+ skb->data_len = f->iter.bi_size;
+ skb->truesize += f->iter.bi_size;
+ skb_fillup(skb, f->buf->bio, &f->iter);
ah->aflags |= AOEAFL_WRITE;
- skb->len += f->bcnt;
- skb->data_len = f->bcnt;
- skb->truesize += f->bcnt;
t->wpkts++;
} else {
t->rpkts++;
@@ -383,7 +376,7 @@ aoecmd_ata_rw(struct aoedev *d)
struct aoetgt *t;
struct sk_buff *skb;
struct sk_buff_head queue;
- ulong bcnt, fbcnt;
+ ulong bcnt;

buf = nextbuf(d);
if (buf == NULL)
@@ -395,36 +388,19 @@ aoecmd_ata_rw(struct aoedev *d)
bcnt = d->maxbcnt;
if (bcnt == 0)
bcnt = DEFAULTBCNT;
- if (bcnt > buf->resid)
- bcnt = buf->resid;
- fbcnt = bcnt;
- f->bv = buf->bv;
- f->bv_off = f->bv->bv_offset + (f->bv->bv_len - buf->bv_resid);
- do {
- if (fbcnt < buf->bv_resid) {
- buf->bv_resid -= fbcnt;
- buf->resid -= fbcnt;
- break;
- }
- fbcnt -= buf->bv_resid;
- buf->resid -= buf->bv_resid;
- if (buf->resid == 0) {
- d->ip.buf = NULL;
- break;
- }
- buf->bv++;
- buf->bv_resid = buf->bv->bv_len;
- WARN_ON(buf->bv_resid == 0);
- } while (fbcnt);
+ if (bcnt > buf->iter.bi_size)
+ bcnt = buf->iter.bi_size;
+
+ bio_advance_iter(buf->bio, &buf->iter, bcnt);

/* initialize the headers & frame */
f->buf = buf;
- f->bcnt = bcnt;
+ f->iter = buf->iter;
+ f->iter.bi_size = bcnt;
ata_rw_frameinit(f);

/* mark all tracking fields and load out */
buf->nframesout += 1;
- buf->sector += bcnt >> 9;

skb = skb_clone(f->skb, GFP_ATOMIC);
if (skb) {
@@ -617,10 +593,7 @@ reassign_frame(struct frame *f)
skb = nf->skb;
nf->skb = f->skb;
nf->buf = f->buf;
- nf->bcnt = f->bcnt;
- nf->lba = f->lba;
- nf->bv = f->bv;
- nf->bv_off = f->bv_off;
+ nf->iter = f->iter;
nf->waited = 0;
nf->waited_total = f->waited_total;
nf->sent = f->sent;
@@ -639,6 +612,7 @@ probe(struct aoetgt *t)
struct sk_buff_head queue;
size_t n, m;
int frag;
+ ulong bcnt;

d = t->d;
f = newtframe(d, t);
@@ -652,19 +626,20 @@ probe(struct aoetgt *t)
}
f->flags |= FFL_PROBE;
ifrotate(t);
- f->bcnt = t->d->maxbcnt ? t->d->maxbcnt : DEFAULTBCNT;
+ bcnt = t->d->maxbcnt ? t->d->maxbcnt : DEFAULTBCNT;
+ f->iter.bi_size = bcnt;
ata_rw_frameinit(f);
skb = f->skb;
- for (frag = 0, n = f->bcnt; n > 0; ++frag, n -= m) {
+ for (frag = 0, n = bcnt; n > 0; ++frag, n -= m) {
if (n < PAGE_SIZE)
m = n;
else
m = PAGE_SIZE;
skb_fill_page_desc(skb, frag, empty_page, 0, m);
}
- skb->len += f->bcnt;
- skb->data_len = f->bcnt;
- skb->truesize += f->bcnt;
+ skb->len += bcnt;
+ skb->data_len = bcnt;
+ skb->truesize += bcnt;

skb = skb_clone(f->skb, GFP_ATOMIC);
if (skb) {
@@ -936,12 +911,8 @@ bufinit(struct buf *buf, struct request *rq, struct bio *bio)
memset(buf, 0, sizeof(*buf));
buf->rq = rq;
buf->bio = bio;
- buf->resid = bio->bi_iter.bi_size;
- buf->sector = bio->bi_iter.bi_sector;
+ buf->iter = bio->bi_iter;
bio_pageinc(bio);
- buf->bv = __bio_iovec(bio);
- buf->bv_resid = buf->bv->bv_len;
- WARN_ON(buf->bv_resid == 0);
}

static struct buf *
@@ -1126,24 +1097,23 @@ gettgt(struct aoedev *d, char *addr)
}

static void
-bvcpy(struct bio_vec *bv, ulong off, struct sk_buff *skb, long cnt)
+bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter *iter, long cnt)
{
- ulong fcnt;
char *p;
int soff = 0;
-loop:
- fcnt = bv->bv_len - (off - bv->bv_offset);
- if (fcnt > cnt)
- fcnt = cnt;
- p = page_address(bv->bv_page) + off;
- skb_copy_bits(skb, soff, p, fcnt);
- soff += fcnt;
- cnt -= fcnt;
- if (cnt <= 0)
- return;
- bv++;
- off = bv->bv_offset;
- goto loop;
+
+ do {
+ struct bio_vec bv = bio_iter_iovec(bio, *iter);
+
+ p = page_address(bv.bv_page) + bv.bv_offset;
+ skb_copy_bits(skb, soff, p, bv.bv_len);
+
+ bio_advance_iter(bio, iter, bv.bv_len);
+ soff += bv.bv_len;
+ cnt -= bv.bv_len;
+ if (cnt <= 0)
+ return;
+ } while (cnt > 0);
}

void
@@ -1236,7 +1206,7 @@ noskb: if (buf)
clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
break;
}
- bvcpy(f->bv, f->bv_off, skb, n);
+ bvcpy(skb, f->buf->bio, &f->iter, n);
case ATA_CMD_PIO_WRITE:
case ATA_CMD_PIO_WRITE_EXT:
spin_lock_irq(&d->lock);
@@ -1279,7 +1249,7 @@ out:

aoe_freetframe(f);

- if (buf && --buf->nframesout == 0 && buf->resid == 0)
+ if (buf && --buf->nframesout == 0 && buf->iter.bi_size == 0)
aoe_end_buf(d, buf);

spin_unlock_irq(&d->lock);
@@ -1734,7 +1704,6 @@ aoe_failbuf(struct aoedev *d, struct buf *buf)
{
if (buf == NULL)
return;
- buf->resid = 0;
clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
if (buf->nframesout == 0)
aoe_end_buf(d, buf);
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index dab4f1a..4cf81b5 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -108,8 +108,7 @@ struct cardinfo {
* have been written
*/
struct bio *bio, *currentbio, **biotail;
- int current_idx;
- sector_t current_sector;
+ struct bvec_iter current_iter;

struct request_queue *queue;

@@ -118,7 +117,7 @@ struct cardinfo {
struct mm_dma_desc *desc;
int cnt, headcnt;
struct bio *bio, **biotail;
- int idx;
+ struct bvec_iter iter;
} mm_pages[2];
#define DESC_PER_PAGE ((PAGE_SIZE*2)/sizeof(struct mm_dma_desc))

@@ -344,16 +343,13 @@ static int add_bio(struct cardinfo *card)
dma_addr_t dma_handle;
int offset;
struct bio *bio;
- struct bio_vec *vec;
- int idx;
+ struct bio_vec vec;
int rw;
- int len;

bio = card->currentbio;
if (!bio && card->bio) {
card->currentbio = card->bio;
- card->current_idx = card->bio->bi_iter.bi_idx;
- card->current_sector = card->bio->bi_iter.bi_sector;
+ card->current_iter = card->bio->bi_iter;
card->bio = card->bio->bi_next;
if (card->bio == NULL)
card->biotail = &card->bio;
@@ -362,18 +358,17 @@ static int add_bio(struct cardinfo *card)
}
if (!bio)
return 0;
- idx = card->current_idx;

rw = bio_rw(bio);
if (card->mm_pages[card->Ready].cnt >= DESC_PER_PAGE)
return 0;

- vec = bio_iovec_idx(bio, idx);
- len = vec->bv_len;
+ vec = bio_iter_iovec(bio, card->current_iter);
+
dma_handle = pci_map_page(card->dev,
- vec->bv_page,
- vec->bv_offset,
- len,
+ vec.bv_page,
+ vec.bv_offset,
+ vec.bv_len,
(rw == READ) ?
PCI_DMA_FROMDEVICE : PCI_DMA_TODEVICE);

@@ -381,7 +376,7 @@ static int add_bio(struct cardinfo *card)
desc = &p->desc[p->cnt];
p->cnt++;
if (p->bio == NULL)
- p->idx = idx;
+ p->iter = card->current_iter;
if ((p->biotail) != &bio->bi_next) {
*(p->biotail) = bio;
p->biotail = &(bio->bi_next);
@@ -391,8 +386,8 @@ static int add_bio(struct cardinfo *card)
desc->data_dma_handle = dma_handle;

desc->pci_addr = cpu_to_le64((u64)desc->data_dma_handle);
- desc->local_addr = cpu_to_le64(card->current_sector << 9);
- desc->transfer_size = cpu_to_le32(len);
+ desc->local_addr = cpu_to_le64(card->current_iter.bi_sector << 9);
+ desc->transfer_size = cpu_to_le32(vec.bv_len);
offset = (((char *)&desc->sem_control_bits) - ((char *)p->desc));
desc->sem_addr = cpu_to_le64((u64)(p->page_dma+offset));
desc->zero1 = desc->zero2 = 0;
@@ -407,10 +402,9 @@ static int add_bio(struct cardinfo *card)
desc->control_bits |= cpu_to_le32(DMASCR_TRANSFER_READ);
desc->sem_control_bits = desc->control_bits;

- card->current_sector += (len >> 9);
- idx++;
- card->current_idx = idx;
- if (idx >= bio->bi_vcnt)
+
+ bio_advance_iter(bio, &card->current_iter, vec.bv_len);
+ if (!card->current_iter.bi_size)
card->currentbio = NULL;

return 1;
@@ -439,23 +433,25 @@ static void process_page(unsigned long data)
struct mm_dma_desc *desc = &page->desc[page->headcnt];
int control = le32_to_cpu(desc->sem_control_bits);
int last = 0;
- int idx;
+ struct bio_vec vec;

if (!(control & DMASCR_DMA_COMPLETE)) {
control = dma_status;
last = 1;
}
+
page->headcnt++;
- idx = page->idx;
- page->idx++;
- if (page->idx >= bio->bi_vcnt) {
+ vec = bio_iter_iovec(bio, page->iter);
+ bio_advance_iter(bio, &page->iter, vec.bv_len);
+
+ if (!page->iter.bi_size) {
page->bio = bio->bi_next;
if (page->bio)
- page->idx = page->bio->bi_iter.bi_idx;
+ page->iter = page->bio->bi_iter;
}

pci_unmap_page(card->dev, desc->data_dma_handle,
- bio_iovec_idx(bio, idx)->bv_len,
+ vec.bv_len,
(control & DMASCR_TRANSFER_READ) ?
PCI_DMA_TODEVICE : PCI_DMA_FROMDEVICE);
if (control & DMASCR_HARD_ERROR) {
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index fca3bba..456867b 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -38,10 +38,8 @@ struct convert_context {
struct completion restart;
struct bio *bio_in;
struct bio *bio_out;
- unsigned int offset_in;
- unsigned int offset_out;
- unsigned int idx_in;
- unsigned int idx_out;
+ struct bvec_iter iter_in;
+ struct bvec_iter iter_out;
sector_t cc_sector;
atomic_t cc_pending;
};
@@ -650,10 +648,12 @@ static void crypt_convert_init(struct crypt_config *cc,
{
ctx->bio_in = bio_in;
ctx->bio_out = bio_out;
- ctx->offset_in = 0;
- ctx->offset_out = 0;
- ctx->idx_in = bio_in ? bio_in->bi_iter.bi_idx : 0;
- ctx->idx_out = bio_out ? bio_out->bi_iter.bi_idx : 0;
+
+ if (bio_in)
+ ctx->iter_in = bio_in->bi_iter;
+ if (bio_out)
+ ctx->iter_out = bio_out->bi_iter;
+
ctx->cc_sector = sector + cc->iv_offset;
init_completion(&ctx->restart);
}
@@ -681,8 +681,8 @@ static int crypt_convert_block(struct crypt_config *cc,
struct convert_context *ctx,
struct ablkcipher_request *req)
{
- struct bio_vec *bv_in = bio_iovec_idx(ctx->bio_in, ctx->idx_in);
- struct bio_vec *bv_out = bio_iovec_idx(ctx->bio_out, ctx->idx_out);
+ struct bio_vec bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
+ struct bio_vec bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
struct dm_crypt_request *dmreq;
u8 *iv;
int r;
@@ -693,24 +693,15 @@ static int crypt_convert_block(struct crypt_config *cc,
dmreq->iv_sector = ctx->cc_sector;
dmreq->ctx = ctx;
sg_init_table(&dmreq->sg_in, 1);
- sg_set_page(&dmreq->sg_in, bv_in->bv_page, 1 << SECTOR_SHIFT,
- bv_in->bv_offset + ctx->offset_in);
+ sg_set_page(&dmreq->sg_in, bv_in.bv_page, 1 << SECTOR_SHIFT,
+ bv_in.bv_offset);

sg_init_table(&dmreq->sg_out, 1);
- sg_set_page(&dmreq->sg_out, bv_out->bv_page, 1 << SECTOR_SHIFT,
- bv_out->bv_offset + ctx->offset_out);
+ sg_set_page(&dmreq->sg_out, bv_out.bv_page, 1 << SECTOR_SHIFT,
+ bv_out.bv_offset);

- ctx->offset_in += 1 << SECTOR_SHIFT;
- if (ctx->offset_in >= bv_in->bv_len) {
- ctx->offset_in = 0;
- ctx->idx_in++;
- }
-
- ctx->offset_out += 1 << SECTOR_SHIFT;
- if (ctx->offset_out >= bv_out->bv_len) {
- ctx->offset_out = 0;
- ctx->idx_out++;
- }
+ bio_advance_iter(ctx->bio_in, &ctx->iter_in, 1 << SECTOR_SHIFT);
+ bio_advance_iter(ctx->bio_out, &ctx->iter_out, 1 << SECTOR_SHIFT);

if (cc->iv_gen_ops) {
r = cc->iv_gen_ops->generator(cc, iv, dmreq);
@@ -761,8 +752,8 @@ static int crypt_convert(struct crypt_config *cc,

atomic_set(&ctx->cc_pending, 1);

- while(ctx->idx_in < ctx->bio_in->bi_vcnt &&
- ctx->idx_out < ctx->bio_out->bi_vcnt) {
+ while (ctx->iter_in.bi_size &&
+ ctx->iter_out.bi_size) {

crypt_alloc_req(cc, ctx);

@@ -1031,7 +1022,7 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
}

/* crypt_convert should have filled the clone bio */
- BUG_ON(io->ctx.idx_out < clone->bi_vcnt);
+ BUG_ON(io->ctx.iter_out.bi_size);

clone->bi_iter.bi_sector = cc->start + io->sector;

@@ -1070,7 +1061,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
}

io->ctx.bio_out = clone;
- io->ctx.idx_out = 0;
+ io->ctx.iter_out = clone->bi_iter;

remaining -= clone->bi_iter.bi_size;
sector += bio_sectors(clone);
@@ -1114,8 +1105,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
crypt_inc_pending(new_io);
crypt_convert_init(cc, &new_io->ctx, NULL,
io->base_bio, sector);
- new_io->ctx.idx_in = io->ctx.idx_in;
- new_io->ctx.offset_in = io->ctx.offset_in;
+ new_io->ctx.iter_in = io->ctx.iter_in;

/*
* Fragments after the first use the base_io
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index a6de5c9..c2a6c34 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -202,26 +202,29 @@ static void list_dp_init(struct dpages *dp, struct page_list *pl, unsigned offse
/*
* Functions for getting the pages from a bvec.
*/
-static void bvec_get_page(struct dpages *dp,
+static void bio_get_page(struct dpages *dp,
struct page **p, unsigned long *len, unsigned *offset)
{
- struct bio_vec *bvec = (struct bio_vec *) dp->context_ptr;
- *p = bvec->bv_page;
- *len = bvec->bv_len;
- *offset = bvec->bv_offset;
+ struct bio *bio = dp->context_ptr;
+ struct bio_vec bvec = bio_iovec(bio);
+ *p = bvec.bv_page;
+ *len = bvec.bv_len;
+ *offset = bvec.bv_offset;
}

-static void bvec_next_page(struct dpages *dp)
+static void bio_next_page(struct dpages *dp)
{
- struct bio_vec *bvec = (struct bio_vec *) dp->context_ptr;
- dp->context_ptr = bvec + 1;
+ struct bio *bio = dp->context_ptr;
+ struct bio_vec bvec = bio_iovec(bio);
+
+ bio_advance(bio, bvec.bv_len);
}

-static void bvec_dp_init(struct dpages *dp, struct bio_vec *bvec)
+static void bio_dp_init(struct dpages *dp, struct bio *bio)
{
- dp->get_page = bvec_get_page;
- dp->next_page = bvec_next_page;
- dp->context_ptr = bvec;
+ dp->get_page = bio_get_page;
+ dp->next_page = bio_next_page;
+ dp->context_ptr = bio;
}

/*
@@ -459,8 +462,8 @@ static int dp_init(struct dm_io_request *io_req, struct dpages *dp,
list_dp_init(dp, io_req->mem.ptr.pl, io_req->mem.offset);
break;

- case DM_IO_BVEC:
- bvec_dp_init(dp, io_req->mem.ptr.bvec);
+ case DM_IO_BIO:
+ bio_dp_init(dp, io_req->mem.ptr.bio);
break;

case DM_IO_VMA:
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index e3efb91..56e8844 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -526,8 +526,8 @@ static void read_async_bio(struct mirror *m, struct bio *bio)
struct dm_io_region io;
struct dm_io_request io_req = {
.bi_rw = READ,
- .mem.type = DM_IO_BVEC,
- .mem.ptr.bvec = bio->bi_io_vec + bio->bi_iter.bi_idx,
+ .mem.type = DM_IO_BIO,
+ .mem.ptr.bio = bio,
.notify.fn = read_callback,
.notify.context = bio,
.client = m->ms->io_client,
@@ -629,8 +629,8 @@ static void do_write(struct mirror_set *ms, struct bio *bio)
struct mirror *m;
struct dm_io_request io_req = {
.bi_rw = WRITE | (bio->bi_rw & WRITE_FLUSH_FUA),
- .mem.type = DM_IO_BVEC,
- .mem.ptr.bvec = bio->bi_io_vec + bio->bi_iter.bi_idx,
+ .mem.type = DM_IO_BIO,
+ .mem.ptr.bio = bio,
.notify.fn = write_callback,
.notify.context = bio,
.client = ms->io_client,
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index 27f9705..64d829a 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -73,15 +73,10 @@ struct dm_verity_io {
sector_t block;
unsigned n_blocks;

- /* saved bio vector */
- struct bio_vec *io_vec;
- unsigned io_vec_size;
+ struct bvec_iter iter;

struct work_struct work;

- /* A space for short vectors; longer vectors are allocated separately. */
- struct bio_vec io_vec_inline[DM_VERITY_IO_VEC_INLINE];
-
/*
* Three variably-size fields follow this struct:
*
@@ -284,9 +279,10 @@ release_ret_r:
static int verity_verify_io(struct dm_verity_io *io)
{
struct dm_verity *v = io->v;
+ struct bio *bio = dm_bio_from_per_bio_data(io,
+ v->ti->per_bio_data_size);
unsigned b;
int i;
- unsigned vector = 0, offset = 0;

for (b = 0; b < io->n_blocks; b++) {
struct shash_desc *desc;
@@ -336,31 +332,22 @@ test_block_hash:
}

todo = 1 << v->data_dev_block_bits;
- do {
- struct bio_vec *bv;
+ while (io->iter.bi_size) {
u8 *page;
- unsigned len;
-
- BUG_ON(vector >= io->io_vec_size);
- bv = &io->io_vec[vector];
- page = kmap_atomic(bv->bv_page);
- len = bv->bv_len - offset;
- if (likely(len >= todo))
- len = todo;
- r = crypto_shash_update(desc,
- page + bv->bv_offset + offset, len);
+ struct bio_vec bv = bio_iter_iovec(bio, io->iter);
+
+ page = kmap_atomic(bv.bv_page);
+ r = crypto_shash_update(desc, page + bv.bv_offset,
+ bv.bv_len);
kunmap_atomic(page);
+
if (r < 0) {
DMERR("crypto_shash_update failed: %d", r);
return r;
}
- offset += len;
- if (likely(offset == bv->bv_len)) {
- offset = 0;
- vector++;
- }
- todo -= len;
- } while (todo);
+
+ bio_advance_iter(bio, &io->iter, bv.bv_len);
+ }

if (!v->version) {
r = crypto_shash_update(desc, v->salt, v->salt_size);
@@ -383,8 +370,6 @@ test_block_hash:
return -EIO;
}
}
- BUG_ON(vector != io->io_vec_size);
- BUG_ON(offset);

return 0;
}
@@ -400,9 +385,6 @@ static void verity_finish_io(struct dm_verity_io *io, int error)
bio->bi_end_io = io->orig_bi_end_io;
bio->bi_private = io->orig_bi_private;

- if (io->io_vec != io->io_vec_inline)
- mempool_free(io->io_vec, v->vec_mempool);
-
bio_endio(bio, error);
}

@@ -520,13 +502,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio)

bio->bi_end_io = verity_end_io;
bio->bi_private = io;
- io->io_vec_size = bio_segments(bio);
- if (io->io_vec_size < DM_VERITY_IO_VEC_INLINE)
- io->io_vec = io->io_vec_inline;
- else
- io->io_vec = mempool_alloc(v->vec_mempool, GFP_NOIO);
- memcpy(io->io_vec, __bio_iovec(bio),
- io->io_vec_size * sizeof(struct bio_vec));
+ io->iter = bio->bi_iter;

verity_submit_prefetch(v, io);

diff --git a/include/linux/dm-io.h b/include/linux/dm-io.h
index f4b0aa3..6cf1f62 100644
--- a/include/linux/dm-io.h
+++ b/include/linux/dm-io.h
@@ -29,7 +29,7 @@ typedef void (*io_notify_fn)(unsigned long error, void *context);

enum dm_io_mem_type {
DM_IO_PAGE_LIST,/* Page list */
- DM_IO_BVEC, /* Bio vector */
+ DM_IO_BIO,
DM_IO_VMA, /* Virtual memory area */
DM_IO_KMEM, /* Kernel memory */
};
@@ -41,7 +41,7 @@ struct dm_io_memory {

union {
struct page_list *pl;
- struct bio_vec *bvec;
+ struct bio *bio;
void *vma;
void *addr;
} ptr;
--
1.8.4.rc1

2013-08-07 22:00:31

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 09/22] block: Convert bio_copy_data() to bvec_iter

Our fancy new bvec iterator makes code like this much easier to write.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
fs/bio.c | 60 +++++++++++++++++++++++++-----------------------------------
1 file changed, 25 insertions(+), 35 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 1c10ccc..eab8487 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -852,58 +852,48 @@ EXPORT_SYMBOL(bio_alloc_pages);
*/
void bio_copy_data(struct bio *dst, struct bio *src)
{
- struct bio_vec *src_bv, *dst_bv;
- unsigned src_offset, dst_offset, bytes;
+ struct bvec_iter src_iter, dst_iter;
+ struct bio_vec src_bv, dst_bv;
void *src_p, *dst_p;
+ unsigned bytes;

- src_bv = __bio_iovec(src);
- dst_bv = __bio_iovec(dst);
-
- src_offset = src_bv->bv_offset;
- dst_offset = dst_bv->bv_offset;
+ src_iter = src->bi_iter;
+ dst_iter = dst->bi_iter;

while (1) {
- if (src_offset == src_bv->bv_offset + src_bv->bv_len) {
- src_bv++;
- if (src_bv == bio_iovec_idx(src, src->bi_vcnt)) {
- src = src->bi_next;
- if (!src)
- break;
-
- src_bv = __bio_iovec(src);
- }
+ if (!src_iter.bi_size) {
+ src = src->bi_next;
+ if (!src)
+ break;

- src_offset = src_bv->bv_offset;
+ src_iter = src->bi_iter;
}

- if (dst_offset == dst_bv->bv_offset + dst_bv->bv_len) {
- dst_bv++;
- if (dst_bv == bio_iovec_idx(dst, dst->bi_vcnt)) {
- dst = dst->bi_next;
- if (!dst)
- break;
-
- dst_bv = __bio_iovec(dst);
- }
+ if (!dst_iter.bi_size) {
+ dst = dst->bi_next;
+ if (!dst)
+ break;

- dst_offset = dst_bv->bv_offset;
+ dst_iter = dst->bi_iter;
}

- bytes = min(dst_bv->bv_offset + dst_bv->bv_len - dst_offset,
- src_bv->bv_offset + src_bv->bv_len - src_offset);
+ src_bv = bio_iter_iovec(src, src_iter);
+ dst_bv = bio_iter_iovec(dst, dst_iter);
+
+ bytes = min(src_bv.bv_len, dst_bv.bv_len);

- src_p = kmap_atomic(src_bv->bv_page);
- dst_p = kmap_atomic(dst_bv->bv_page);
+ src_p = kmap_atomic(src_bv.bv_page);
+ dst_p = kmap_atomic(dst_bv.bv_page);

- memcpy(dst_p + dst_bv->bv_offset,
- src_p + src_bv->bv_offset,
+ memcpy(dst_p + dst_bv.bv_offset,
+ src_p + src_bv.bv_offset,
bytes);

kunmap_atomic(dst_p);
kunmap_atomic(src_p);

- src_offset += bytes;
- dst_offset += bytes;
+ bio_advance_iter(src, &src_iter, bytes);
+ bio_advance_iter(dst, &dst_iter, bytes);
}
}
EXPORT_SYMBOL(bio_copy_data);
--
1.8.4.rc1

2013-08-07 22:00:28

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 10/22] bio-integrity: Convert to bvec_iter

The bio integrity is also stored in a bvec array, so if we use the bvec
iter code we just added, the integrity code won't need to implement its
own iteration stuff (bio_integrity_mark_head(), bio_integrity_mark_tail())

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
---
block/blk-integrity.c | 40 ++++++++++---------
drivers/scsi/sd_dif.c | 30 +++++++-------
fs/bio-integrity.c | 108 ++++++++++++--------------------------------------
include/linux/bio.h | 19 ++++-----
4 files changed, 71 insertions(+), 126 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 03cf717..861fcae 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -43,30 +43,32 @@ static const char *bi_unsupported_name = "unsupported";
*/
int blk_rq_count_integrity_sg(struct request_queue *q, struct bio *bio)
{
- struct bio_vec *iv, *ivprv = NULL;
+ struct bio_vec iv, ivprv;
unsigned int segments = 0;
unsigned int seg_size = 0;
- unsigned int i = 0;
+ struct bvec_iter iter;
+ int prev = 0;

- bio_for_each_integrity_vec(iv, bio, i) {
+ bio_for_each_integrity_vec(iv, bio, iter) {

- if (ivprv) {
- if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))
+ if (prev) {
+ if (!BIOVEC_PHYS_MERGEABLE(&ivprv, &iv))
goto new_segment;

- if (!BIOVEC_SEG_BOUNDARY(q, ivprv, iv))
+ if (!BIOVEC_SEG_BOUNDARY(q, &ivprv, &iv))
goto new_segment;

- if (seg_size + iv->bv_len > queue_max_segment_size(q))
+ if (seg_size + iv.bv_len > queue_max_segment_size(q))
goto new_segment;

- seg_size += iv->bv_len;
+ seg_size += iv.bv_len;
} else {
new_segment:
segments++;
- seg_size = iv->bv_len;
+ seg_size = iv.bv_len;
}

+ prev = 1;
ivprv = iv;
}

@@ -87,24 +89,25 @@ EXPORT_SYMBOL(blk_rq_count_integrity_sg);
int blk_rq_map_integrity_sg(struct request_queue *q, struct bio *bio,
struct scatterlist *sglist)
{
- struct bio_vec *iv, *ivprv = NULL;
+ struct bio_vec iv, ivprv;
struct scatterlist *sg = NULL;
unsigned int segments = 0;
- unsigned int i = 0;
+ struct bvec_iter iter;
+ int prev = 0;

- bio_for_each_integrity_vec(iv, bio, i) {
+ bio_for_each_integrity_vec(iv, bio, iter) {

- if (ivprv) {
- if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))
+ if (prev) {
+ if (!BIOVEC_PHYS_MERGEABLE(&ivprv, &iv))
goto new_segment;

- if (!BIOVEC_SEG_BOUNDARY(q, ivprv, iv))
+ if (!BIOVEC_SEG_BOUNDARY(q, &ivprv, &iv))
goto new_segment;

- if (sg->length + iv->bv_len > queue_max_segment_size(q))
+ if (sg->length + iv.bv_len > queue_max_segment_size(q))
goto new_segment;

- sg->length += iv->bv_len;
+ sg->length += iv.bv_len;
} else {
new_segment:
if (!sg)
@@ -114,10 +117,11 @@ new_segment:
sg = sg_next(sg);
}

- sg_set_page(sg, iv->bv_page, iv->bv_len, iv->bv_offset);
+ sg_set_page(sg, iv.bv_page, iv.bv_len, iv.bv_offset);
segments++;
}

+ prev = 1;
ivprv = iv;
}

diff --git a/drivers/scsi/sd_dif.c b/drivers/scsi/sd_dif.c
index 6174ca4..a7a691d 100644
--- a/drivers/scsi/sd_dif.c
+++ b/drivers/scsi/sd_dif.c
@@ -365,7 +365,6 @@ void sd_dif_prepare(struct request *rq, sector_t hw_sector,
struct bio *bio;
struct scsi_disk *sdkp;
struct sd_dif_tuple *sdt;
- unsigned int i, j;
u32 phys, virt;

sdkp = rq->bio->bi_bdev->bd_disk->private_data;
@@ -376,19 +375,21 @@ void sd_dif_prepare(struct request *rq, sector_t hw_sector,
phys = hw_sector & 0xffffffff;

__rq_for_each_bio(bio, rq) {
- struct bio_vec *iv;
+ struct bio_vec iv;
+ struct bvec_iter iter;
+ unsigned int j;

/* Already remapped? */
if (bio_flagged(bio, BIO_MAPPED_INTEGRITY))
break;

- virt = bio->bi_integrity->bip_sector & 0xffffffff;
+ virt = bio->bi_integrity->bip_iter.bi_sector & 0xffffffff;

- bip_for_each_vec(iv, bio->bi_integrity, i) {
- sdt = kmap_atomic(iv->bv_page)
- + iv->bv_offset;
+ bip_for_each_vec(iv, bio->bi_integrity, iter) {
+ sdt = kmap_atomic(iv.bv_page)
+ + iv.bv_offset;

- for (j = 0 ; j < iv->bv_len ; j += tuple_sz, sdt++) {
+ for (j = 0; j < iv.bv_len; j += tuple_sz, sdt++) {

if (be32_to_cpu(sdt->ref_tag) == virt)
sdt->ref_tag = cpu_to_be32(phys);
@@ -414,7 +415,7 @@ void sd_dif_complete(struct scsi_cmnd *scmd, unsigned int good_bytes)
struct scsi_disk *sdkp;
struct bio *bio;
struct sd_dif_tuple *sdt;
- unsigned int i, j, sectors, sector_sz;
+ unsigned int j, sectors, sector_sz;
u32 phys, virt;

sdkp = scsi_disk(scmd->request->rq_disk);
@@ -430,15 +431,16 @@ void sd_dif_complete(struct scsi_cmnd *scmd, unsigned int good_bytes)
phys >>= 3;

__rq_for_each_bio(bio, scmd->request) {
- struct bio_vec *iv;
+ struct bio_vec iv;
+ struct bvec_iter iter;

- virt = bio->bi_integrity->bip_sector & 0xffffffff;
+ virt = bio->bi_integrity->bip_iter.bi_sector & 0xffffffff;

- bip_for_each_vec(iv, bio->bi_integrity, i) {
- sdt = kmap_atomic(iv->bv_page)
- + iv->bv_offset;
+ bip_for_each_vec(iv, bio->bi_integrity, iter) {
+ sdt = kmap_atomic(iv.bv_page)
+ + iv.bv_offset;

- for (j = 0 ; j < iv->bv_len ; j += tuple_sz, sdt++) {
+ for (j = 0; j < iv.bv_len; j += tuple_sz, sdt++) {

if (sectors == 0) {
kunmap_atomic(sdt);
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 4220b96..61f41ff 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -134,8 +134,7 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
return 0;
}

- iv = bip_vec_idx(bip, bip->bip_vcnt);
- BUG_ON(iv == NULL);
+ iv = bip->bip_vec + bip->bip_vcnt;

iv->bv_page = page;
iv->bv_len = len;
@@ -203,6 +202,12 @@ static inline unsigned int bio_integrity_hw_sectors(struct blk_integrity *bi,
return sectors;
}

+static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
+ unsigned int sectors)
+{
+ return bio_integrity_hw_sectors(bi, sectors) * bi->tuple_size;
+}
+
/**
* bio_integrity_tag_size - Retrieve integrity tag space
* @bio: bio to inspect
@@ -235,9 +240,9 @@ int bio_integrity_tag(struct bio *bio, void *tag_buf, unsigned int len, int set)
nr_sectors = bio_integrity_hw_sectors(bi,
DIV_ROUND_UP(len, bi->tag_size));

- if (nr_sectors * bi->tuple_size > bip->bip_size) {
- printk(KERN_ERR "%s: tag too big for bio: %u > %u\n",
- __func__, nr_sectors * bi->tuple_size, bip->bip_size);
+ if (nr_sectors * bi->tuple_size > bip->bip_iter.bi_size) {
+ printk(KERN_ERR "%s: tag too big for bio: %u > %u\n", __func__,
+ nr_sectors * bi->tuple_size, bip->bip_iter.bi_size);
return -1;
}

@@ -322,7 +327,7 @@ static void bio_integrity_generate(struct bio *bio)
sector += sectors;
prot_buf += sectors * bi->tuple_size;
total += sectors * bi->tuple_size;
- BUG_ON(total > bio->bi_integrity->bip_size);
+ BUG_ON(total > bio->bi_integrity->bip_iter.bi_size);

kunmap_atomic(kaddr);
}
@@ -387,8 +392,8 @@ int bio_integrity_prep(struct bio *bio)

bip->bip_owns_buf = 1;
bip->bip_buf = buf;
- bip->bip_size = len;
- bip->bip_sector = bio->bi_iter.bi_sector;
+ bip->bip_iter.bi_size = len;
+ bip->bip_iter.bi_sector = bio->bi_iter.bi_sector;

/* Map it */
offset = offset_in_page(buf);
@@ -444,7 +449,7 @@ static int bio_integrity_verify(struct bio *bio)
struct blk_integrity_exchg bix;
struct bio_vec bv;
struct bvec_iter iter;
- sector_t sector = bio->bi_integrity->bip_sector;
+ sector_t sector = bio->bi_integrity->bip_iter.bi_sector;
unsigned int sectors, total, ret;
void *prot_buf = bio->bi_integrity->bip_buf;

@@ -470,7 +475,7 @@ static int bio_integrity_verify(struct bio *bio)
sector += sectors;
prot_buf += sectors * bi->tuple_size;
total += sectors * bi->tuple_size;
- BUG_ON(total > bio->bi_integrity->bip_size);
+ BUG_ON(total > bio->bi_integrity->bip_iter.bi_size);

kunmap_atomic(kaddr);
}
@@ -535,56 +540,6 @@ void bio_integrity_endio(struct bio *bio, int error)
EXPORT_SYMBOL(bio_integrity_endio);

/**
- * bio_integrity_mark_head - Advance bip_vec skip bytes
- * @bip: Integrity vector to advance
- * @skip: Number of bytes to advance it
- */
-void bio_integrity_mark_head(struct bio_integrity_payload *bip,
- unsigned int skip)
-{
- struct bio_vec *iv;
- unsigned int i;
-
- bip_for_each_vec(iv, bip, i) {
- if (skip == 0) {
- bip->bip_idx = i;
- return;
- } else if (skip >= iv->bv_len) {
- skip -= iv->bv_len;
- } else { /* skip < iv->bv_len) */
- iv->bv_offset += skip;
- iv->bv_len -= skip;
- bip->bip_idx = i;
- return;
- }
- }
-}
-
-/**
- * bio_integrity_mark_tail - Truncate bip_vec to be len bytes long
- * @bip: Integrity vector to truncate
- * @len: New length of integrity vector
- */
-void bio_integrity_mark_tail(struct bio_integrity_payload *bip,
- unsigned int len)
-{
- struct bio_vec *iv;
- unsigned int i;
-
- bip_for_each_vec(iv, bip, i) {
- if (len == 0) {
- bip->bip_vcnt = i;
- return;
- } else if (len >= iv->bv_len) {
- len -= iv->bv_len;
- } else { /* len < iv->bv_len) */
- iv->bv_len = len;
- len = 0;
- }
- }
-}
-
-/**
* bio_integrity_advance - Advance integrity vector
* @bio: bio whose integrity vector to update
* @bytes_done: number of data bytes that have been completed
@@ -597,13 +552,9 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{
struct bio_integrity_payload *bip = bio->bi_integrity;
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
- unsigned int nr_sectors;
+ unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);

- BUG_ON(bip == NULL);
- BUG_ON(bi == NULL);
-
- nr_sectors = bio_integrity_hw_sectors(bi, bytes_done >> 9);
- bio_integrity_mark_head(bip, nr_sectors * bi->tuple_size);
+ bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}
EXPORT_SYMBOL(bio_integrity_advance);

@@ -623,16 +574,9 @@ void bio_integrity_trim(struct bio *bio, unsigned int offset,
{
struct bio_integrity_payload *bip = bio->bi_integrity;
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
- unsigned int nr_sectors;

- BUG_ON(bip == NULL);
- BUG_ON(bi == NULL);
- BUG_ON(!bio_flagged(bio, BIO_CLONED));
-
- nr_sectors = bio_integrity_hw_sectors(bi, sectors);
- bip->bip_sector = bip->bip_sector + offset;
- bio_integrity_mark_head(bip, offset * bi->tuple_size);
- bio_integrity_mark_tail(bip, sectors * bi->tuple_size);
+ bio_integrity_advance(bio, offset << 9);
+ bip->bip_iter.bi_size = bio_integrity_bytes(bi, sectors);
}
EXPORT_SYMBOL(bio_integrity_trim);

@@ -662,8 +606,8 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
bp->bio1.bi_integrity = &bp->bip1;
bp->bio2.bi_integrity = &bp->bip2;

- bp->iv1 = bip->bip_vec[bip->bip_idx];
- bp->iv2 = bip->bip_vec[bip->bip_idx];
+ bp->iv1 = bip->bip_vec[bip->bip_iter.bi_idx];
+ bp->iv2 = bip->bip_vec[bip->bip_iter.bi_idx];

bp->bip1.bip_vec = &bp->iv1;
bp->bip2.bip_vec = &bp->iv2;
@@ -672,11 +616,12 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
bp->iv2.bv_offset += sectors * bi->tuple_size;
bp->iv2.bv_len -= sectors * bi->tuple_size;

- bp->bip1.bip_sector = bio->bi_integrity->bip_sector;
- bp->bip2.bip_sector = bio->bi_integrity->bip_sector + nr_sectors;
+ bp->bip1.bip_iter.bi_sector = bio->bi_integrity->bip_iter.bi_sector;
+ bp->bip2.bip_iter.bi_sector =
+ bio->bi_integrity->bip_iter.bi_sector + nr_sectors;

bp->bip1.bip_vcnt = bp->bip2.bip_vcnt = 1;
- bp->bip1.bip_idx = bp->bip2.bip_idx = 0;
+ bp->bip1.bip_iter.bi_idx = bp->bip2.bip_iter.bi_idx = 0;
}
EXPORT_SYMBOL(bio_integrity_split);

@@ -704,9 +649,8 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
memcpy(bip->bip_vec, bip_src->bip_vec,
bip_src->bip_vcnt * sizeof(struct bio_vec));

- bip->bip_sector = bip_src->bip_sector;
bip->bip_vcnt = bip_src->bip_vcnt;
- bip->bip_idx = bip_src->bip_idx;
+ bip->bip_iter = bip_src->bip_iter;

return 0;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 151868e..57a6f40 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -244,16 +244,15 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
struct bio_integrity_payload {
struct bio *bip_bio; /* parent bio */

- sector_t bip_sector; /* virtual start sector */
+ struct bvec_iter bip_iter;

+ /* kill - should just use bip_vec */
void *bip_buf; /* generated integrity data */
- bio_end_io_t *bip_end_io; /* saved I/O completion fn */

- unsigned int bip_size;
+ bio_end_io_t *bip_end_io; /* saved I/O completion fn */

unsigned short bip_slab; /* slab the bip came from */
unsigned short bip_vcnt; /* # of integrity bio_vecs */
- unsigned short bip_idx; /* current bip_vec index */
unsigned bip_owns_buf:1; /* should free bip_buf */

struct work_struct bip_work; /* I/O completion */
@@ -624,16 +623,12 @@ struct biovec_slab {

#if defined(CONFIG_BLK_DEV_INTEGRITY)

-#define bip_vec_idx(bip, idx) (&(bip->bip_vec[(idx)]))
-#define bip_vec(bip) bip_vec_idx(bip, 0)

-#define __bip_for_each_vec(bvl, bip, i, start_idx) \
- for (bvl = bip_vec_idx((bip), (start_idx)), i = (start_idx); \
- i < (bip)->bip_vcnt; \
- bvl++, i++)

-#define bip_for_each_vec(bvl, bip, i) \
- __bip_for_each_vec(bvl, bip, i, (bip)->bip_idx)
+#define bip_vec_idx(bip, idx) (&(bip->bip_vec[(idx)]))
+
+#define bip_for_each_vec(bvl, bip, iter) \
+ for_each_bvec(bvl, (bip)->bip_vec, iter, (bip)->bip_iter)

#define bio_for_each_integrity_vec(_bvl, _bio, _iter) \
for_each_bio(_bio) \
--
1.8.4.rc1

2013-08-07 22:00:20

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 11/22] block: Kill bio_segments()/bi_vcnt usage

When we start sharing biovecs, keeping bi_vcnt accurate for splits is
going to be error prone - and unnecessary, if we refactor some code.

So bio_segments() has to go - but most of the existing users just needed
to know if the bio had multiple segments, which is easier - add a
bio_multiple_segments() for them.

(Two of the current uses of bio_segments() are going to go away in a
couple patches, but the current implementation of bio_segments() is
unsafe as soon as we start doing driver conversions for immutable
biovecs - so implement a dumb version for bisectability, it'll go away
in a couple patches)

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Nagalakshmi Nandigama <[email protected]>
Cc: Sreekanth Reddy <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
---
drivers/block/ps3disk.c | 7 ++---
drivers/md/bcache/io.c | 53 ++++++++++++++------------------
drivers/md/raid0.c | 2 +-
drivers/md/raid10.c | 2 +-
drivers/message/fusion/mptsas.c | 8 ++---
drivers/scsi/libsas/sas_expander.c | 8 ++---
drivers/scsi/mpt2sas/mpt2sas_transport.c | 10 +++---
drivers/scsi/mpt3sas/mpt3sas_transport.c | 8 ++---
fs/bio.c | 2 +-
include/linux/bio.h | 43 +++++++++++++++++---------
10 files changed, 75 insertions(+), 68 deletions(-)

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 464be78..8d1a19c 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -101,10 +101,9 @@ static void ps3disk_scatter_gather(struct ps3_storage_device *dev,

rq_for_each_segment(bvec, req, iter) {
unsigned long flags;
- dev_dbg(&dev->sbd.core,
- "%s:%u: bio %u: %u segs %u sectors from %lu\n",
- __func__, __LINE__, i, bio_segments(iter.bio),
- bio_sectors(iter.bio), iter.bio->bi_iter.bi_sector);
+ dev_dbg(&dev->sbd.core, "%s:%u: bio %u: %u sectors from %lu\n",
+ __func__, __LINE__, i, bio_sectors(iter.bio),
+ iter.bio->bi_iter.bi_sector);

size = bvec->bv_len;
buf = bvec_kmap_irq(bvec, &flags);
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 9b5b6a4..6e04f3b 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -24,7 +24,8 @@ static void bch_generic_make_request_hack(struct bio *bio)
if (bio->bi_iter.bi_idx) {
struct bio_vec bv;
struct bvec_iter iter;
- struct bio *clone = bio_alloc(GFP_NOIO, bio_segments(bio));
+ unsigned segs = bio_segments(bio);
+ struct bio *clone = bio_alloc(GFP_NOIO, segs);

bio_for_each_segment(bv, bio, iter)
clone->bi_io_vec[clone->bi_vcnt++] = bv;
@@ -32,7 +33,7 @@ static void bch_generic_make_request_hack(struct bio *bio)
clone->bi_iter.bi_sector = bio->bi_iter.bi_sector;
clone->bi_bdev = bio->bi_bdev;
clone->bi_rw = bio->bi_rw;
- clone->bi_vcnt = bio_segments(bio);
+ clone->bi_vcnt = segs;
clone->bi_iter.bi_size = bio->bi_iter.bi_size;

clone->bi_private = bio;
@@ -133,40 +134,32 @@ out:

static unsigned bch_bio_max_sectors(struct bio *bio)
{
- unsigned ret = bio_sectors(bio);
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
- unsigned max_segments = min_t(unsigned, BIO_MAX_PAGES,
- queue_max_segments(q));
+ struct bio_vec bv;
+ struct bvec_iter iter;
+ unsigned ret = 0, seg = 0;

if (bio->bi_rw & REQ_DISCARD)
- return min(ret, q->limits.max_discard_sectors);
-
- if (bio_segments(bio) > max_segments ||
- q->merge_bvec_fn) {
- struct bio_vec bv;
- struct bvec_iter iter;
- unsigned seg = 0;
-
- ret = 0;
+ return min(bio_sectors(bio), q->limits.max_discard_sectors);

- bio_for_each_segment(bv, bio, iter) {
- struct bvec_merge_data bvm = {
- .bi_bdev = bio->bi_bdev,
- .bi_sector = bio->bi_iter.bi_sector,
- .bi_size = ret << 9,
- .bi_rw = bio->bi_rw,
- };
-
- if (seg == max_segments)
- break;
+ bio_for_each_segment(bv, bio, iter) {
+ struct bvec_merge_data bvm = {
+ .bi_bdev = bio->bi_bdev,
+ .bi_sector = bio->bi_iter.bi_sector,
+ .bi_size = ret << 9,
+ .bi_rw = bio->bi_rw,
+ };
+
+ if (seg == min_t(unsigned, BIO_MAX_PAGES,
+ queue_max_segments(q)))
+ break;

- if (q->merge_bvec_fn &&
- q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
- break;
+ if (q->merge_bvec_fn &&
+ q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
+ break;

- seg++;
- ret += bv.bv_len >> 9;
- }
+ seg++;
+ ret += bv.bv_len >> 9;
}

ret = min(ret, queue_max_sectors(q));
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index e38d1d3..8ee1a6c 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -528,7 +528,7 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
sector_t sector = bio->bi_iter.bi_sector;
struct bio_pair *bp;
/* Sanity check -- queue functions should prevent this happening */
- if (bio_segments(bio) > 1)
+ if (bio_multiple_segments(bio))
goto bad_map;
/* This is a one page bio that upper layers
* refuse to split for us, so we need to split it.
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index fca8887..2303f59 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1188,7 +1188,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
|| conf->prev.near_copies < conf->prev.raid_disks))) {
struct bio_pair *bp;
/* Sanity check -- queue functions should prevent this happening */
- if (bio_segments(bio) > 1)
+ if (bio_multiple_segments(bio))
goto bad_map;
/* This is a one page bio that upper layers
* refuse to split for us, so we need to split it.
diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index dd239bd..00d339c 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -2235,10 +2235,10 @@ static int mptsas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
}

/* do we need to support multiple segments? */
- if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
- printk(MYIOC_s_ERR_FMT "%s: multiple segments req %u %u, rsp %u %u\n",
- ioc->name, __func__, bio_segments(req->bio), blk_rq_bytes(req),
- bio_segments(rsp->bio), blk_rq_bytes(rsp));
+ if (bio_multiple_segments(req->bio) ||
+ bio_multiple_segments(rsp->bio)) {
+ printk(MYIOC_s_ERR_FMT "%s: multiple segments req %u, rsp %u\n",
+ ioc->name, __func__, blk_rq_bytes(req), blk_rq_bytes(rsp));
return -EINVAL;
}

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 446b851..0cac7d8 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2163,10 +2163,10 @@ int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
}

/* do we need to support multiple segments? */
- if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
- printk("%s: multiple segments req %u %u, rsp %u %u\n",
- __func__, bio_segments(req->bio), blk_rq_bytes(req),
- bio_segments(rsp->bio), blk_rq_bytes(rsp));
+ if (bio_multiple_segments(req->bio) ||
+ bio_multiple_segments(rsp->bio)) {
+ printk("%s: multiple segments req %u, rsp %u\n",
+ __func__, blk_rq_bytes(req), blk_rq_bytes(rsp));
return -EINVAL;
}

diff --git a/drivers/scsi/mpt2sas/mpt2sas_transport.c b/drivers/scsi/mpt2sas/mpt2sas_transport.c
index 2c2e01e..d2224b5 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_transport.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_transport.c
@@ -1940,7 +1940,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
ioc->transport_cmds.status = MPT2_CMD_PENDING;

/* Check if the request is split across multiple segments */
- if (bio_segments(req->bio) > 1) {
+ if (bio_multiple_segments(req->bio)) {
u32 offset = 0;

/* Allocate memory and copy the request */
@@ -1972,7 +1972,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,

/* Check if the response needs to be populated across
* multiple segments */
- if (bio_segments(rsp->bio) > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
pci_addr_in = pci_alloc_consistent(ioc->pdev, blk_rq_bytes(rsp),
&pci_dma_in);
if (!pci_addr_in) {
@@ -2039,7 +2039,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_HOST_TO_IOC);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
- if (bio_segments(req->bio) > 1) {
+ if (bio_multiple_segments(req->bio)) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(req) - 4), pci_dma_out);
} else {
@@ -2055,7 +2055,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
MPI2_SGE_FLAGS_END_OF_LIST);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
- if (bio_segments(rsp->bio) > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(rsp) + 4), pci_dma_in);
} else {
@@ -2100,7 +2100,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
le16_to_cpu(mpi_reply->ResponseDataLength);
/* check if the resp needs to be copied from the allocated
* pci mem */
- if (bio_segments(rsp->bio) > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
u32 offset = 0;
u32 bytes_to_copy =
le16_to_cpu(mpi_reply->ResponseDataLength);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index f279f46..55aa597 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -1923,7 +1923,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
ioc->transport_cmds.status = MPT3_CMD_PENDING;

/* Check if the request is split across multiple segments */
- if (req->bio->bi_vcnt > 1) {
+ if (bio_multiple_segments(req->bio)) {
u32 offset = 0;

/* Allocate memory and copy the request */
@@ -1955,7 +1955,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,

/* Check if the response needs to be populated across
* multiple segments */
- if (rsp->bio->bi_vcnt > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
pci_addr_in = pci_alloc_consistent(ioc->pdev, blk_rq_bytes(rsp),
&pci_dma_in);
if (!pci_addr_in) {
@@ -2016,7 +2016,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
mpi_request->RequestDataLength = cpu_to_le16(blk_rq_bytes(req) - 4);
psge = &mpi_request->SGL;

- if (req->bio->bi_vcnt > 1)
+ if (bio_multiple_segments(req->bio))
ioc->build_sg(ioc, psge, pci_dma_out, (blk_rq_bytes(req) - 4),
pci_dma_in, (blk_rq_bytes(rsp) + 4));
else
@@ -2061,7 +2061,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,

/* check if the resp needs to be copied from the allocated
* pci mem */
- if (rsp->bio->bi_vcnt > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
u32 offset = 0;
u32 bytes_to_copy =
le16_to_cpu(mpi_reply->ResponseDataLength);
diff --git a/fs/bio.c b/fs/bio.c
index eab8487..46cf8a6 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1723,7 +1723,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
bi->bi_iter.bi_sector + first_sectors);

- BUG_ON(bio_segments(bi) > 1);
+ BUG_ON(bio_multiple_segments(bi));
atomic_set(&bp->cnt, 3);
bp->error = 0;
bp->bio1 = *bi;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 57a6f40..9736683 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -97,13 +97,27 @@
#define bio_offset(bio) bio_iter_offset((bio), (bio)->bi_iter)
#define bio_iovec(bio) bio_iter_iovec((bio), (bio)->bi_iter)

-#define bio_segments(bio) ((bio)->bi_vcnt - (bio)->bi_iter.bi_idx)
+#define bio_multiple_segments(bio) \
+ ((bio)->bi_iter.bi_size != bio_iovec(bio).bv_len)
#define bio_sectors(bio) ((bio)->bi_iter.bi_size >> 9)
#define bio_end_sector(bio) ((bio)->bi_iter.bi_sector + bio_sectors((bio)))

+/*
+ * Check whether this bio carries any data or not. A NULL bio is allowed.
+ */
+static inline bool bio_has_data(struct bio *bio)
+{
+ if (bio &&
+ bio->bi_iter.bi_size &&
+ !(bio->bi_rw & BIO_NO_ADVANCE_ITER_MASK))
+ return true;
+
+ return false;
+}
+
static inline unsigned int bio_cur_bytes(struct bio *bio)
{
- if (bio->bi_vcnt)
+ if (bio_has_data(bio))
return bio_iovec(bio).bv_len;
else /* dataless requests such as discard */
return bio->bi_iter.bi_size;
@@ -111,7 +125,7 @@ static inline unsigned int bio_cur_bytes(struct bio *bio)

static inline void *bio_data(struct bio *bio)
{
- if (bio->bi_vcnt)
+ if (bio_has_data(bio))
return page_address(bio_page(bio)) + bio_offset(bio);

return NULL;
@@ -221,6 +235,18 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,

#define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)

+static inline unsigned bio_segments(struct bio *bio)
+{
+ unsigned segs = 0;
+ struct bio_vec bv;
+ struct bvec_iter iter;
+
+ bio_for_each_segment(bv, bio, iter)
+ segs++;
+
+ return segs;
+}
+
/*
* get a reference to a bio, so it won't disappear. the intended use is
* something like:
@@ -434,17 +460,6 @@ static inline char *__bio_kmap_irq(struct bio *bio, unsigned short idx,
__bio_kmap_irq((bio), (bio)->bi_iter.bi_idx, (flags))
#define bio_kunmap_irq(buf,flags) __bio_kunmap_irq(buf, flags)

-/*
- * Check whether this bio carries any data or not. A NULL bio is allowed.
- */
-static inline bool bio_has_data(struct bio *bio)
-{
- if (bio && bio->bi_vcnt)
- return true;
-
- return false;
-}
-
static inline bool bio_is_rw(struct bio *bio)
{
if (!bio_has_data(bio))
--
1.8.4.rc1

2013-08-07 22:01:44

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 08/22] block: Immutable bio vecs

This adds a mechanism by which we can advance a bio by an arbitrary
number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done
indicates the number of bytes completed in the current bvec.

Various driver code still needs to be updated to not refer to the bvec
directly before we can use this for interesting things, like efficient
bio splitting.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: Paul Clements <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
drivers/block/drbd/drbd_main.c | 4 +--
drivers/block/nbd.c | 2 +-
fs/bio.c | 27 ++------------
include/linux/bio.h | 81 +++++++++++++++++++++++++++++++++++++-----
include/linux/blk_types.h | 2 ++
include/linux/blkdev.h | 4 +--
6 files changed, 82 insertions(+), 38 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 1589ea4..fee8b51 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1546,7 +1546,7 @@ static int _drbd_send_bio(struct drbd_conf *mdev, struct bio *bio)

err = _drbd_no_send_page(mdev, bvec.bv_page,
bvec.bv_offset, bvec.bv_len,
- bio_iter_last(bio, iter)
+ bio_iter_last(bvec, iter)
? 0 : MSG_MORE);
if (err)
return err;
@@ -1565,7 +1565,7 @@ static int _drbd_send_zc_bio(struct drbd_conf *mdev, struct bio *bio)

err = _drbd_send_page(mdev, bvec.bv_page,
bvec.bv_offset, bvec.bv_len,
- bio_iter_last(bio, iter) ? 0 : MSG_MORE);
+ bio_iter_last(bvec, iter) ? 0 : MSG_MORE);
if (err)
return err;
}
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index aa362f4..55298db 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -278,7 +278,7 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req)
*/
rq_for_each_segment(bvec, req, iter) {
flags = 0;
- if (!rq_iter_last(req, iter))
+ if (!rq_iter_last(bvec, iter))
flags = MSG_MORE;
dprintk(DBG_TX, "%s: request %p: sending %d bytes data\n",
nbd->disk->disk_name, req, bvec.bv_len);
diff --git a/fs/bio.c b/fs/bio.c
index 805befd..1c10ccc 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -532,13 +532,11 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
* most users will be overriding ->bi_bdev with a new target,
* so we don't set nor calculate new physical/hw segment counts here
*/
- bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
bio->bi_bdev = bio_src->bi_bdev;
bio->bi_flags |= 1 << BIO_CLONED;
bio->bi_rw = bio_src->bi_rw;
bio->bi_vcnt = bio_src->bi_vcnt;
- bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
- bio->bi_iter.bi_idx = bio_src->bi_iter.bi_idx;
+ bio->bi_iter = bio_src->bi_iter;
}
EXPORT_SYMBOL(__bio_clone);

@@ -808,28 +806,7 @@ void bio_advance(struct bio *bio, unsigned bytes)
if (bio_integrity(bio))
bio_integrity_advance(bio, bytes);

- bio->bi_iter.bi_sector += bytes >> 9;
- bio->bi_iter.bi_size -= bytes;
-
- if (bio->bi_rw & BIO_NO_ADVANCE_ITER_MASK)
- return;
-
- while (bytes) {
- if (unlikely(bio->bi_iter.bi_idx >= bio->bi_vcnt)) {
- WARN_ONCE(1, "bio idx %d >= vcnt %d\n",
- bio->bi_iter.bi_idx, bio->bi_vcnt);
- break;
- }
-
- if (bytes >= bio_iovec(bio).bv_len) {
- bytes -= bio_iovec(bio).bv_len;
- bio->bi_iter.bi_idx++;
- } else {
- bio_iovec(bio).bv_len -= bytes;
- bio_iovec(bio).bv_offset += bytes;
- bytes = 0;
- }
- }
+ bio_advance_iter(bio, &bio->bi_iter, bytes);
}
EXPORT_SYMBOL(bio_advance);

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5724feb..151868e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -64,11 +64,38 @@
#define bio_iovec_idx(bio, idx) (&((bio)->bi_io_vec[(idx)]))
#define __bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)

-#define bio_iter_iovec(bio, iter) ((bio)->bi_io_vec[(iter).bi_idx])
+#define __bvec_iter_bvec(bvec, iter) (&(bvec)[(iter).bi_idx])

-#define bio_page(bio) (bio_iovec((bio)).bv_page)
-#define bio_offset(bio) (bio_iovec((bio)).bv_offset)
-#define bio_iovec(bio) (*__bio_iovec(bio))
+#define bvec_iter_page(bvec, iter) \
+ (__bvec_iter_bvec((bvec), (iter))->bv_page)
+
+#define bvec_iter_len(bvec, iter) \
+ min((iter).bi_size, \
+ __bvec_iter_bvec((bvec), (iter))->bv_len - (iter).bi_bvec_done)
+
+#define bvec_iter_offset(bvec, iter) \
+ (__bvec_iter_bvec((bvec), (iter))->bv_offset + (iter).bi_bvec_done)
+
+#define bvec_iter_bvec(bvec, iter) \
+((struct bio_vec) { \
+ .bv_page = bvec_iter_page((bvec), (iter)), \
+ .bv_len = bvec_iter_len((bvec), (iter)), \
+ .bv_offset = bvec_iter_offset((bvec), (iter)), \
+})
+
+#define bio_iter_iovec(bio, iter) \
+ bvec_iter_bvec((bio)->bi_io_vec, (iter))
+
+#define bio_iter_page(bio, iter) \
+ bvec_iter_page((bio)->bi_io_vec, (iter))
+#define bio_iter_len(bio, iter) \
+ bvec_iter_len((bio)->bi_io_vec, (iter))
+#define bio_iter_offset(bio, iter) \
+ bvec_iter_offset((bio)->bi_io_vec, (iter))
+
+#define bio_page(bio) bio_iter_page((bio), (bio)->bi_iter)
+#define bio_offset(bio) bio_iter_offset((bio), (bio)->bi_iter)
+#define bio_iovec(bio) bio_iter_iovec((bio), (bio)->bi_iter)

#define bio_segments(bio) ((bio)->bi_vcnt - (bio)->bi_iter.bi_idx)
#define bio_sectors(bio) ((bio)->bi_iter.bi_size >> 9)
@@ -145,16 +172,54 @@ static inline void *bio_data(struct bio *bio)
bvl = bio_iovec_idx((bio), (i)), i < (bio)->bi_vcnt; \
i++)

+static inline void bvec_iter_advance(struct bio_vec *bv, struct bvec_iter *iter,
+ unsigned bytes)
+{
+ WARN_ONCE(bytes > iter->bi_size,
+ "Attempted to advance past end of bvec iter\n");
+
+ while (bytes) {
+ unsigned len = min(bytes, bvec_iter_len(bv, *iter));
+
+ bytes -= len;
+ iter->bi_size -= len;
+ iter->bi_bvec_done += len;
+
+ if (iter->bi_bvec_done == __bvec_iter_bvec(bv, *iter)->bv_len) {
+ iter->bi_bvec_done = 0;
+ iter->bi_idx++;
+ }
+ }
+}
+
+#define for_each_bvec(bvl, bio_vec, iter, start) \
+ for ((iter) = start; \
+ (bvl) = bvec_iter_bvec((bio_vec), (iter)), \
+ (iter).bi_size; \
+ bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
+
+
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+ unsigned bytes)
+{
+ iter->bi_sector += bytes >> 9;
+
+ if (bio->bi_rw & BIO_NO_ADVANCE_ITER_MASK)
+ iter->bi_size -= bytes;
+ else
+ bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+}
+
#define __bio_for_each_segment(bvl, bio, iter, start) \
for (iter = (start); \
- bvl = bio_iter_iovec((bio), (iter)), \
- (iter).bi_idx < (bio)->bi_vcnt; \
- (iter).bi_idx++)
+ (iter).bi_size && \
+ ((bvl = bio_iter_iovec((bio), (iter))), 1); \
+ bio_advance_iter((bio), &(iter), (bvl).bv_len))

#define bio_for_each_segment(bvl, bio, iter) \
__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)

-#define bio_iter_last(bio, iter) ((iter).bi_idx == (bio)->bi_vcnt - 1)
+#define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)

/*
* get a reference to a bio, so it won't disappear. the intended use is
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d46e8a6..72f1274 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -34,6 +34,8 @@ struct bvec_iter {
unsigned int bi_size; /* residual I/O count */

unsigned int bi_idx; /* current index into bvl_vec */
+
+ unsigned int bi_bvec_done;
};

/*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1b9d47b..2a16de2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -714,9 +714,9 @@ struct req_iterator {
__rq_for_each_bio(_iter.bio, _rq) \
bio_for_each_segment(bvl, _iter.bio, _iter.iter)

-#define rq_iter_last(rq, _iter) \
+#define rq_iter_last(bvec, _iter) \
(_iter.bio->bi_next == NULL && \
- bio_iter_last(_iter.bio, _iter.iter))
+ bio_iter_last(bvec, _iter.iter))

#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
# error "You should define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE for your platform"
--
1.8.4.rc1

2013-08-07 22:01:58

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 06/22] block: Convert bio_iovec() to bvec_iter

For immutable biovecs, we'll be introducing a new bio_iovec() that uses
our new bvec iterator to construct a biovec, taking into account
bvec_iter->bi_bvec_done - this patch updates existing users for the new
usage.

Some of the existing users really do need a pointer into the bvec array
- those uses are all going to be removed, but we'll need the
functionality from immutable to remove them - so for now rename the
existing bio_iovec() -> __bio_iovec(), and it'll be removed in a couple
patches.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Ed L. Cashin" <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
Cc: "James E.J. Bottomley" <[email protected]>
---
drivers/block/aoe/aoecmd.c | 2 +-
drivers/md/bcache/io.c | 13 +++++++------
drivers/md/dm-verity.c | 2 +-
drivers/scsi/sd.c | 2 +-
fs/bio.c | 20 ++++++++++----------
include/linux/bio.h | 10 ++++++----
6 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 3f1a192..9ec8c2d 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -939,7 +939,7 @@ bufinit(struct buf *buf, struct request *rq, struct bio *bio)
buf->resid = bio->bi_iter.bi_size;
buf->sector = bio->bi_iter.bi_sector;
bio_pageinc(bio);
- buf->bv = bio_iovec(bio);
+ buf->bv = __bio_iovec(bio);
buf->bv_resid = buf->bv->bv_len;
WARN_ON(buf->bv_resid == 0);
}
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index cc4ba2d..dc44f06 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -22,11 +22,12 @@ static void bch_bi_idx_hack_endio(struct bio *bio, int error)
static void bch_generic_make_request_hack(struct bio *bio)
{
if (bio->bi_iter.bi_idx) {
+ int i;
+ struct bio_vec *bv;
struct bio *clone = bio_alloc(GFP_NOIO, bio_segments(bio));

- memcpy(clone->bi_io_vec,
- bio_iovec(bio),
- bio_segments(bio) * sizeof(struct bio_vec));
+ bio_for_each_segment(bv, bio, i)
+ clone->bi_io_vec[clone->bi_vcnt++] = *bv;

clone->bi_iter.bi_sector = bio->bi_iter.bi_sector;
clone->bi_bdev = bio->bi_bdev;
@@ -97,7 +98,7 @@ struct bio *bch_bio_split(struct bio *bio, int sectors,
if (!ret)
return NULL;

- memcpy(ret->bi_io_vec, bio_iovec(bio),
+ memcpy(ret->bi_io_vec, __bio_iovec(bio),
sizeof(struct bio_vec) * vcnt);

break;
@@ -106,7 +107,7 @@ struct bio *bch_bio_split(struct bio *bio, int sectors,
if (!ret)
return NULL;

- memcpy(ret->bi_io_vec, bio_iovec(bio),
+ memcpy(ret->bi_io_vec, __bio_iovec(bio),
sizeof(struct bio_vec) * vcnt);

ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
@@ -182,7 +183,7 @@ static unsigned bch_bio_max_sectors(struct bio *bio)
ret = min(ret, queue_max_sectors(q));

WARN_ON(!ret);
- ret = max_t(int, ret, bio_iovec(bio)->bv_len >> 9);
+ ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);

return ret;
}
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index bf84bb3..27f9705 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -525,7 +525,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
io->io_vec = io->io_vec_inline;
else
io->io_vec = mempool_alloc(v->vec_mempool, GFP_NOIO);
- memcpy(io->io_vec, bio_iovec(bio),
+ memcpy(io->io_vec, __bio_iovec(bio),
io->io_vec_size * sizeof(struct bio_vec));

verity_submit_prefetch(v, io);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 86fcf2c..f7d8e3a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -799,7 +799,7 @@ static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)
if (sdkp->device->no_write_same)
return BLKPREP_KILL;

- BUG_ON(bio_offset(bio) || bio_iovec(bio)->bv_len != sdp->sector_size);
+ BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);

sector >>= ilog2(sdp->sector_size) - 9;
nr_sectors >>= ilog2(sdp->sector_size) - 9;
diff --git a/fs/bio.c b/fs/bio.c
index 0a86e2c..d10952c 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -821,12 +821,12 @@ void bio_advance(struct bio *bio, unsigned bytes)
break;
}

- if (bytes >= bio_iovec(bio)->bv_len) {
- bytes -= bio_iovec(bio)->bv_len;
+ if (bytes >= bio_iovec(bio).bv_len) {
+ bytes -= bio_iovec(bio).bv_len;
bio->bi_iter.bi_idx++;
} else {
- bio_iovec(bio)->bv_len -= bytes;
- bio_iovec(bio)->bv_offset += bytes;
+ bio_iovec(bio).bv_len -= bytes;
+ bio_iovec(bio).bv_offset += bytes;
bytes = 0;
}
}
@@ -879,8 +879,8 @@ void bio_copy_data(struct bio *dst, struct bio *src)
unsigned src_offset, dst_offset, bytes;
void *src_p, *dst_p;

- src_bv = bio_iovec(src);
- dst_bv = bio_iovec(dst);
+ src_bv = __bio_iovec(src);
+ dst_bv = __bio_iovec(dst);

src_offset = src_bv->bv_offset;
dst_offset = dst_bv->bv_offset;
@@ -893,7 +893,7 @@ void bio_copy_data(struct bio *dst, struct bio *src)
if (!src)
break;

- src_bv = bio_iovec(src);
+ src_bv = __bio_iovec(src);
}

src_offset = src_bv->bv_offset;
@@ -906,7 +906,7 @@ void bio_copy_data(struct bio *dst, struct bio *src)
if (!dst)
break;

- dst_bv = bio_iovec(dst);
+ dst_bv = __bio_iovec(dst);
}

dst_offset = dst_bv->bv_offset;
@@ -1766,8 +1766,8 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
bp->bio1.bi_iter.bi_size = first_sectors << 9;

if (bi->bi_vcnt != 0) {
- bp->bv1 = *bio_iovec(bi);
- bp->bv2 = *bio_iovec(bi);
+ bp->bv1 = bio_iovec(bi);
+ bp->bv2 = bio_iovec(bi);

if (bio_is_rw(bi)) {
bp->bv2.bv_offset += first_sectors << 9;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5f440f0..58d5647 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -62,9 +62,11 @@
* on highmem page vectors
*/
#define bio_iovec_idx(bio, idx) (&((bio)->bi_io_vec[(idx)]))
-#define bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
-#define bio_page(bio) bio_iovec((bio))->bv_page
-#define bio_offset(bio) bio_iovec((bio))->bv_offset
+#define __bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
+#define bio_iovec(bio) (*__bio_iovec(bio))
+
+#define bio_page(bio) (bio_iovec((bio)).bv_page)
+#define bio_offset(bio) (bio_iovec((bio)).bv_offset)
#define bio_segments(bio) ((bio)->bi_vcnt - (bio)->bi_iter.bi_idx)
#define bio_sectors(bio) ((bio)->bi_iter.bi_size >> 9)
#define bio_end_sector(bio) ((bio)->bi_iter.bi_sector + bio_sectors((bio)))
@@ -72,7 +74,7 @@
static inline unsigned int bio_cur_bytes(struct bio *bio)
{
if (bio->bi_vcnt)
- return bio_iovec(bio)->bv_len;
+ return bio_iovec(bio).bv_len;
else /* dataless requests such as discard */
return bio->bi_iter.bi_size;
}
--
1.8.4.rc1

2013-08-07 22:02:24

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 03/22] bcache: Kill unaligned bvec hack

Bcache has a hack to avoid cloning the biovec if it's all full pages -
but with immutable biovecs coming this won't be necessary anymore.

For now, we remove the special case and always clone the bvec array so
that the immutable biovec patches are simpler.

Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/md/bcache/bcache.h | 1 -
drivers/md/bcache/debug.c | 4 ----
drivers/md/bcache/request.c | 32 +++++---------------------------
drivers/md/bcache/request.h | 2 +-
drivers/md/bcache/super.c | 4 ----
5 files changed, 6 insertions(+), 37 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index b39f6f0..ec5f17c 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -443,7 +443,6 @@ struct bcache_device {
unsigned long sectors_dirty_last;
long sectors_dirty_derivative;

- mempool_t *unaligned_bvec;
struct bio_set *bio_split;

unsigned data_csum:1;
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 88e6411..545680b 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -191,10 +191,6 @@ void bch_data_verify(struct search *s)
struct bio_vec *bv;
int i;

- if (!s->unaligned_bvec)
- bio_for_each_segment(bv, s->orig_bio, i)
- bv->bv_offset = 0, bv->bv_len = PAGE_SIZE;
-
check = bio_clone(s->orig_bio, GFP_NOIO);
if (!check)
return;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 786a1a4..2c2e1c1 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -679,10 +679,14 @@ static void bio_complete(struct search *s)
static void do_bio_hook(struct search *s)
{
struct bio *bio = &s->bio.bio;
- memcpy(bio, s->orig_bio, sizeof(struct bio));

+ bio_init(bio);
+ bio->bi_io_vec = s->bv;
+ bio->bi_max_vecs = BIO_MAX_PAGES;
+ __bio_clone(bio, s->orig_bio);
bio->bi_end_io = request_endio;
bio->bi_private = &s->cl;
+
atomic_set(&bio->bi_cnt, 3);
}

@@ -694,16 +698,12 @@ static void search_free(struct closure *cl)
if (s->op.cache_bio)
bio_put(s->op.cache_bio);

- if (s->unaligned_bvec)
- mempool_free(s->bio.bio.bi_io_vec, s->d->unaligned_bvec);
-
closure_debug_destroy(cl);
mempool_free(s, s->d->c->search);
}

static struct search *search_alloc(struct bio *bio, struct bcache_device *d)
{
- struct bio_vec *bv;
struct search *s = mempool_alloc(d->c->search, GFP_NOIO);
memset(s, 0, offsetof(struct search, op.keys));

@@ -722,15 +722,6 @@ static struct search *search_alloc(struct bio *bio, struct bcache_device *d)
s->start_time = jiffies;
do_bio_hook(s);

- if (bio->bi_size != bio_segments(bio) * PAGE_SIZE) {
- bv = mempool_alloc(d->unaligned_bvec, GFP_NOIO);
- memcpy(bv, bio_iovec(bio),
- sizeof(struct bio_vec) * bio_segments(bio));
-
- s->bio.bio.bi_io_vec = bv;
- s->unaligned_bvec = 1;
- }
-
return s;
}

@@ -780,26 +771,13 @@ static void cached_dev_read_complete(struct closure *cl)
static void request_read_error(struct closure *cl)
{
struct search *s = container_of(cl, struct search, cl);
- struct bio_vec *bv;
- int i;

if (s->recoverable) {
/* Retry from the backing device: */
trace_bcache_read_retry(s->orig_bio);

s->error = 0;
- bv = s->bio.bio.bi_io_vec;
do_bio_hook(s);
- s->bio.bio.bi_io_vec = bv;
-
- if (!s->unaligned_bvec)
- bio_for_each_segment(bv, s->orig_bio, i)
- bv->bv_offset = 0, bv->bv_len = PAGE_SIZE;
- else
- memcpy(s->bio.bio.bi_io_vec,
- bio_iovec(s->orig_bio),
- sizeof(struct bio_vec) *
- bio_segments(s->orig_bio));

/* XXX: invalidate cache */

diff --git a/drivers/md/bcache/request.h b/drivers/md/bcache/request.h
index 57dc478..bee95a9 100644
--- a/drivers/md/bcache/request.h
+++ b/drivers/md/bcache/request.h
@@ -16,7 +16,6 @@ struct search {
unsigned cache_bio_sectors;

unsigned recoverable:1;
- unsigned unaligned_bvec:1;

unsigned write:1;
unsigned writeback:1;
@@ -27,6 +26,7 @@ struct search {

/* Anything past op->keys won't get zeroed in do_bio_hook */
struct btree_op op;
+ struct bio_vec bv[BIO_MAX_PAGES];
};

void bch_cache_read_endio(struct bio *, int);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 547c4c5..dc073eb 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -743,8 +743,6 @@ static void bcache_device_free(struct bcache_device *d)
put_disk(d->disk);

bio_split_pool_free(&d->bio_split_hook);
- if (d->unaligned_bvec)
- mempool_destroy(d->unaligned_bvec);
if (d->bio_split)
bioset_free(d->bio_split);
if (is_vmalloc_addr(d->stripe_sectors_dirty))
@@ -778,8 +776,6 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size,
return -ENOMEM;

if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio))) ||
- !(d->unaligned_bvec = mempool_create_kmalloc_pool(1,
- sizeof(struct bio_vec) * BIO_MAX_PAGES)) ||
bio_split_pool_init(&d->bio_split_hook) ||
!(d->disk = alloc_disk(1)) ||
!(q = blk_alloc_queue(GFP_KERNEL)))
--
1.8.4.rc1

2013-08-07 22:03:05

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 01/22] block: Use rw_copy_check_uvector()

No need for silly open coding - and struct sg_iovec has exactly the same
layout as struct iovec...

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
block/scsi_ioctl.c | 39 ++++++++++-----------------------------
1 file changed, 10 insertions(+), 29 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index a5ffcc9..625e3e4 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -286,7 +286,8 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
struct sg_io_hdr *hdr, fmode_t mode)
{
unsigned long start_time;
- int writing = 0, ret = 0;
+ ssize_t ret = 0;
+ int writing = 0;
struct request *rq;
char sense[SCSI_SENSE_BUFFERSIZE];
struct bio *bio;
@@ -321,37 +322,16 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
}

if (hdr->iovec_count) {
- const int size = sizeof(struct sg_iovec) * hdr->iovec_count;
size_t iov_data_len;
- struct sg_iovec *sg_iov;
struct iovec *iov;
- int i;

- sg_iov = kmalloc(size, GFP_KERNEL);
- if (!sg_iov) {
- ret = -ENOMEM;
+ ret = rw_copy_check_uvector(-1, hdr->dxferp, hdr->iovec_count,
+ 0, NULL, &iov);
+ if (ret < 0)
goto out;
- }
-
- if (copy_from_user(sg_iov, hdr->dxferp, size)) {
- kfree(sg_iov);
- ret = -EFAULT;
- goto out;
- }

- /*
- * Sum up the vecs, making sure they don't overflow
- */
- iov = (struct iovec *) sg_iov;
- iov_data_len = 0;
- for (i = 0; i < hdr->iovec_count; i++) {
- if (iov_data_len + iov[i].iov_len < iov_data_len) {
- kfree(sg_iov);
- ret = -EINVAL;
- goto out;
- }
- iov_data_len += iov[i].iov_len;
- }
+ iov_data_len = ret;
+ ret = 0;

/* SG_IO howto says that the shorter of the two wins */
if (hdr->dxfer_len < iov_data_len) {
@@ -361,9 +341,10 @@ static int sg_io(struct request_queue *q, struct gendisk *bd_disk,
iov_data_len = hdr->dxfer_len;
}

- ret = blk_rq_map_user_iov(q, rq, NULL, sg_iov, hdr->iovec_count,
+ ret = blk_rq_map_user_iov(q, rq, NULL, (struct sg_iovec *) iov,
+ hdr->iovec_count,
iov_data_len, GFP_KERNEL);
- kfree(sg_iov);
+ kfree(iov);
} else if (hdr->dxfer_len)
ret = blk_rq_map_user(q, rq, NULL, hdr->dxferp, hdr->dxfer_len,
GFP_KERNEL);
--
1.8.4.rc1

2013-08-08 15:10:04

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/22] Immutable biovecs, block layer changes

On Wed, Aug 07, 2013 at 02:54:09PM -0700, Kent Overstreet wrote:
> _However_, I have more patches (that depend on this patch series) to get
> that back - segment merging improvements that get rid of
> bi_seg_front_size, bi_seg_back_size, and bi_phys_segments. Once all that
> is in it should be a net reduction to the size of struct bio.

What is preventing you from sending those out as well? While it's not
absolutely nessecary it would certainly be good if we'd avoid a struct
bio size regression.

2013-08-08 21:15:33

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 0/22] Immutable biovecs, block layer changes

On Thu, Aug 08, 2013 at 08:09:54AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 07, 2013 at 02:54:09PM -0700, Kent Overstreet wrote:
> > _However_, I have more patches (that depend on this patch series) to get
> > that back - segment merging improvements that get rid of
> > bi_seg_front_size, bi_seg_back_size, and bi_phys_segments. Once all that
> > is in it should be a net reduction to the size of struct bio.
>
> What is preventing you from sending those out as well? While it's not
> absolutely nessecary it would certainly be good if we'd avoid a struct
> bio size regression.

There's still some fairly significant changes, and I don't want to make
too many invasive changes at once.

Main thing is making generic_make_request() take arbitrary size bios.
After this series that's just two simple patches, but then the changes
to make use of that will be changing behaviour in non obvious ways.

The way the merging changes work is it enables multi page bvecs - so a
bvec can point to many physically contiguous pages. This moves segment
merging to bio_add_page(), and gets rid of bi_phys_segments - now
bi_vcnt == bi_phys_segments, we just split the bio if it's too big.

Then, with bio_add_page() building up large bios and the block layer
splitting them as necessary, there shouldn't be any need for segment
merging across bios anymore (because generally when that would've
happened before, now we'd just be sending one larger bio down).

The remaining patches aren't terribly complicated though (less
complicated than this patch series). Trickiest bit is multipage bvecs,
and that's mostly just lots of code auditing - the way I convert
existing code is by adding bio_for_each_page() - analagous to
bio_for_each_segment, but giving you bvecs that point to single pages.
So it's an easy conversion, just have to make sure nothing's missed.

2013-08-10 07:38:58

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 18/22] block: Generic bio chaining

On Wed, Aug 07, 2013 at 02:54:27PM -0700, Kent Overstreet wrote:
> This adds a generic mechanism for chaining bio completions. This is
> going to be used for a bio_split() replacement, and some other things in
> the future.
>
> This is implemented with a new bio flag that bio_endio() checks; it
> would definitely be cleaner to implement chaining with a bi_end_io
> function, but since there's no limits on the depth of a bio chain (and
> with arbitrary bio splitting coming this is going to be a real issue)
> using an endio function would lead to unbounded stack usage.
>
> Tail call optimization could solve that, but CONFIG_FRAME_POINTER
> disables gcc's tail call optimization (-fno-optimize-sibling-calls) - so
> we do it the hacky but safe way.

Btw, if you saw this patch and went "Wtf? What's the justification for
inflating struct bio and sticking another atomic op in the fast path?" -
here's the justification: The below patch gets me a 5% increase in
throughput (doing 4k random reads, and on one core on an old gulftown so
cpu bound).

(it also considerably simplifies a lot of random code, but there's a
real performance win to drivers handling arbitrary size bios so upper
layers don't have to care).

>From a6b23c56c722ffbf30ca78c14d21dd8615e11474 Mon Sep 17 00:00:00 2001
From: Kent Overstreet <[email protected]>
Date: Sat, 10 Aug 2013 00:14:03 -0700
Subject: [PATCH] mtip32xx: handle arbitrary size bios


diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 3ea8234..058d86c 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -2645,24 +2645,6 @@ static void mtip_hw_submit_io(struct driver_data *dd, sector_t sector,
}

/*
- * Release a command slot.
- *
- * @dd Pointer to the driver data structure.
- * @tag Slot tag
- *
- * return value
- * None
- */
-static void mtip_hw_release_scatterlist(struct driver_data *dd, int tag,
- int unaligned)
-{
- struct semaphore *sem = unaligned ? &dd->port->cmd_slot_unal :
- &dd->port->cmd_slot;
- release_slot(dd->port, tag);
- up(sem);
-}
-
-/*
* Obtain a command slot and return its associated scatter list.
*
* @dd Pointer to the driver data structure.
@@ -3913,21 +3895,22 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)

sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
if (likely(sg != NULL)) {
- if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
- dev_warn(&dd->pdev->dev,
- "Maximum number of SGL entries exceeded\n");
- bio_io_error(bio);
- mtip_hw_release_scatterlist(dd, tag, unaligned);
- return;
- }
-
/* Create the scatter list for this bio. */
bio_for_each_segment(bvec, bio, iter) {
- sg_set_page(&sg[nents],
- bvec.bv_page,
- bvec.bv_len,
- bvec.bv_offset);
- nents++;
+ if (unlikely(nents == MTIP_MAX_SG)) {
+ struct bio *split = bio_clone(bio, GFP_NOIO);
+
+ split->bi_iter = iter;
+ bio->bi_iter.bi_size -= iter.bi_size;
+ bio_chain(split, bio);
+ generic_make_request(split);
+ break;
+ }
+
+ sg_set_page(&sg[nents++],
+ bvec.bv_page,
+ bvec.bv_len,
+ bvec.bv_offset);
}

/* Issue the read/write. */
@@ -4040,6 +4023,7 @@ skip_create_disk:
blk_queue_max_hw_sectors(dd->queue, 0xffff);
blk_queue_max_segment_size(dd->queue, 0x400000);
blk_queue_io_min(dd->queue, 4096);
+ set_bit(QUEUE_FLAG_LARGEBIOS, &dd->queue->queue_flags);

/*
* write back cache is not supported in the device. FUA depends on

2013-09-24 11:00:20

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/22] Immutable biovecs, block layer changes

Just curious, what's the state of the remaining immutable bio work?

On Thu, Aug 08, 2013 at 02:15:29PM -0700, Kent Overstreet wrote:
> > What is preventing you from sending those out as well? While it's not
> > absolutely nessecary it would certainly be good if we'd avoid a struct
> > bio size regression.
>
> There's still some fairly significant changes, and I don't want to make
> too many invasive changes at once.
>
> Main thing is making generic_make_request() take arbitrary size bios.
> After this series that's just two simple patches, but then the changes
> to make use of that will be changing behaviour in non obvious ways.
>
> The way the merging changes work is it enables multi page bvecs - so a
> bvec can point to many physically contiguous pages. This moves segment
> merging to bio_add_page(), and gets rid of bi_phys_segments - now
> bi_vcnt == bi_phys_segments, we just split the bio if it's too big.
>
> Then, with bio_add_page() building up large bios and the block layer
> splitting them as necessary, there shouldn't be any need for segment
> merging across bios anymore (because generally when that would've
> happened before, now we'd just be sending one larger bio down).
>
> The remaining patches aren't terribly complicated though (less
> complicated than this patch series). Trickiest bit is multipage bvecs,
> and that's mostly just lots of code auditing - the way I convert
> existing code is by adding bio_for_each_page() - analagous to
> bio_for_each_segment, but giving you bvecs that point to single pages.
> So it's an easy conversion, just have to make sure nothing's missed.
---end quoted text---

2013-09-24 13:20:46

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 0/22] Immutable biovecs, block layer changes

On Tue, Sep 24 2013 at 7:00am -0400,
Christoph Hellwig <[email protected]> wrote:

> Just curious, what's the state of the remaining immutable bio work?

Hey Christoph,

Have you been over the patchset? Looks sane to you?

Given how disruptive this patchset is to the block layer I'm wondering
how painful this change will be in combination with Jens' blk-mq
changes. I'd prefer to see blk-mq before immutable biovecs; but I have
my own selfish reasons for that.

I'm also concerned about DM regressions given that a lot of DM code to
handle splitting a bio that spans target boundaries is ripped out, see:
https://patchwork.kernel.org/patch/2840657/

But that is just handwaving/FUD at this point cause I haven't put the
time to looking at this patchset close enough to feel comfortable. I
can make reviewing this patchset a priority this week though.

Kent, have you rebased this patchset at all? Do you have a git tree I
can clone to save the pain of pulling these patches out my my mbox,
etc.

Mike

2013-09-24 14:29:42

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 0/22] Immutable biovecs, block layer changes

On Tue, Sep 24, 2013 at 09:20:14AM -0400, Mike Snitzer wrote:
> Have you been over the patchset? Looks sane to you?

I looked over it, although I didn't dig into the details of all driver
patches, and I like what I see. As said in the previous mail I'd love
to see the patches to shrink struct bio again, too.

> Given how disruptive this patchset is to the block layer I'm wondering
> how painful this change will be in combination with Jens' blk-mq
> changes. I'd prefer to see blk-mq before immutable biovecs; but I have
> my own selfish reasons for that.

I don't see too much conflicts with blk-multiqueue as that is operating
at the request layer. Blk-multiqueue defintively is a higher priority
for me, but as it already looks fairly good I have no idea what we're
blocked on for it anyway.

2013-09-24 19:16:40

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 0/22] Immutable biovecs, block layer changes

On Tue, Sep 24, 2013 at 04:00:12AM -0700, Christoph Hellwig wrote:
> Just curious, what's the state of the remaining immutable bio work?

Mostly just waiting for Jens to pull it, though I did discover an issue
with the "generic bio chaining" patch last time I was testing it - but
that's towards the end of the series and the rest could be pulled now.

Going to work on the series more today - the reworked segment merging
patches is what lets us shrink struct bio; those patches aren't ready
yet (and they're more invasive than some other cleanup I was planning on
getting in first) but I think they're logically independent from most of
the other patches, I'll see if I can mail out a short series on top of
this one for you to look at.

>
> On Thu, Aug 08, 2013 at 02:15:29PM -0700, Kent Overstreet wrote:
> > > What is preventing you from sending those out as well? While it's not
> > > absolutely nessecary it would certainly be good if we'd avoid a struct
> > > bio size regression.
> >
> > There's still some fairly significant changes, and I don't want to make
> > too many invasive changes at once.
> >
> > Main thing is making generic_make_request() take arbitrary size bios.
> > After this series that's just two simple patches, but then the changes
> > to make use of that will be changing behaviour in non obvious ways.
> >
> > The way the merging changes work is it enables multi page bvecs - so a
> > bvec can point to many physically contiguous pages. This moves segment
> > merging to bio_add_page(), and gets rid of bi_phys_segments - now
> > bi_vcnt == bi_phys_segments, we just split the bio if it's too big.
> >
> > Then, with bio_add_page() building up large bios and the block layer
> > splitting them as necessary, there shouldn't be any need for segment
> > merging across bios anymore (because generally when that would've
> > happened before, now we'd just be sending one larger bio down).
> >
> > The remaining patches aren't terribly complicated though (less
> > complicated than this patch series). Trickiest bit is multipage bvecs,
> > and that's mostly just lots of code auditing - the way I convert
> > existing code is by adding bio_for_each_page() - analagous to
> > bio_for_each_segment, but giving you bvecs that point to single pages.
> > So it's an easy conversion, just have to make sure nothing's missed.
> ---end quoted text---

2013-09-24 19:20:08

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 0/22] Immutable biovecs, block layer changes

On Tue, Sep 24, 2013 at 09:20:14AM -0400, Mike Snitzer wrote:
> On Tue, Sep 24 2013 at 7:00am -0400,
> Christoph Hellwig <[email protected]> wrote:
>
> > Just curious, what's the state of the remaining immutable bio work?
>
> Hey Christoph,
>
> Have you been over the patchset? Looks sane to you?
>
> Given how disruptive this patchset is to the block layer I'm wondering
> how painful this change will be in combination with Jens' blk-mq
> changes. I'd prefer to see blk-mq before immutable biovecs; but I have
> my own selfish reasons for that.
>
> I'm also concerned about DM regressions given that a lot of DM code to
> handle splitting a bio that spans target boundaries is ripped out, see:
> https://patchwork.kernel.org/patch/2840657/

Oh good, you had seen that :)

Are you worried about performance regressions or actual bugs?

I _have_ explicitly tested that code path, but I don't know what corner
cases I may have missed so it definitely needs your eyeballs and testing
too.

As for performance - this'll be more efficient than what the dm code was
doing before. Especially when the multipage bvec patches go in.

> But that is just handwaving/FUD at this point cause I haven't put the
> time to looking at this patchset close enough to feel comfortable. I
> can make reviewing this patchset a priority this week though.
>
> Kent, have you rebased this patchset at all? Do you have a git tree I
> can clone to save the pain of pulling these patches out my my mbox,
> etc.

Yeah: git://evilpiepirate.org/~kent/linux-bcache.git for-jens

2013-09-27 18:39:23

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 0/22] Immutable biovecs, block layer changes

On Tue, Sep 24 2013 at 3:19pm -0400,
Kent Overstreet <[email protected]> wrote:

> On Tue, Sep 24, 2013 at 09:20:14AM -0400, Mike Snitzer wrote:
> > On Tue, Sep 24 2013 at 7:00am -0400,
> > Christoph Hellwig <[email protected]> wrote:
> >
> > > Just curious, what's the state of the remaining immutable bio work?
> >
> > Hey Christoph,
> >
> > Have you been over the patchset? Looks sane to you?
> >
> > Given how disruptive this patchset is to the block layer I'm wondering
> > how painful this change will be in combination with Jens' blk-mq
> > changes. I'd prefer to see blk-mq before immutable biovecs; but I have
> > my own selfish reasons for that.
> >
> > I'm also concerned about DM regressions given that a lot of DM code to
> > handle splitting a bio that spans target boundaries is ripped out, see:
> > https://patchwork.kernel.org/patch/2840657/
>
> Oh good, you had seen that :)
>
> Are you worried about performance regressions or actual bugs?

Actual bugs.

> I _have_ explicitly tested that code path, but I don't know what corner
> cases I may have missed so it definitely needs your eyeballs and testing
> too.
>
> As for performance - this'll be more efficient than what the dm code was
> doing before. Especially when the multipage bvec patches go in.

Sure, I can appreciate that.

> > But that is just handwaving/FUD at this point cause I haven't put the
> > time to looking at this patchset close enough to feel comfortable. I
> > can make reviewing this patchset a priority this week though.
> >
> > Kent, have you rebased this patchset at all? Do you have a git tree I
> > can clone to save the pain of pulling these patches out my my mbox,
> > etc.
>
> Yeah: git://evilpiepirate.org/~kent/linux-bcache.git for-jens

Looking now, will build and kickoff some regression tests before I then
jump into reviewing the code.

2013-09-28 04:59:15

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Wed, Aug 07 2013 at 5:54pm -0400,
Kent Overstreet <[email protected]> wrote:

> We need to convert the dm code to the new bvec_iter primitives which
> respect bi_bvec_done; they also allow us to drastically simplify dm's
> bio splitting code.
>
> Also kill bio_sector_offset(), dm was the only user and it doesn't make
> much sense anymore.
>
> Signed-off-by: Kent Overstreet <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: Alasdair Kergon <[email protected]>
> Cc: [email protected]
> ---
> drivers/md/dm.c | 170 ++++++----------------------------------------------
> fs/bio.c | 38 ------------
> include/linux/bio.h | 1 -
> 3 files changed, 18 insertions(+), 191 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 5544af7..696269d 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1050,7 +1050,6 @@ struct clone_info {

<snip>

> /*
> * Creates a bio that consists of range of complete bvecs.
> */
> static void clone_bio(struct dm_target_io *tio, struct bio *bio,
> - sector_t sector, unsigned short idx,
> - unsigned short bv_count, unsigned len)
> + sector_t sector, unsigned len)
> {
> struct bio *clone = &tio->clone;
> - unsigned trim = 0;
>
> __bio_clone(clone, bio);
> - bio_setup_sector(clone, sector, len);
> - bio_setup_bv(clone, idx, bv_count);
>
> - if (idx != bio->bi_iter.bi_idx ||
> - clone->bi_iter.bi_size < bio->bi_iter.bi_size)
> - trim = 1;
> - clone_bio_integrity(bio, clone, idx, len, 0, trim);
> + if (bio_integrity(bio))
> + bio_integrity_clone(clone, bio, GFP_NOIO);
> +
> + bio_advance(clone, (sector - clone->bi_iter.bi_sector) << 9);
> + bio->bi_iter.bi_size = len << 9;
> +
> + if (bio_integrity(bio))
> + bio_integrity_trim(clone, 0, len);
> }
>
> static struct dm_target_io *alloc_tio(struct clone_info *ci,
> @@ -1182,10 +1137,7 @@ static int __send_empty_flush(struct clone_info *ci)
> }
>
> static void __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
> - sector_t sector, int nr_iovecs,
> - unsigned short idx, unsigned short bv_count,
> - unsigned offset, unsigned len,
> - unsigned split_bvec)
> + sector_t sector, unsigned len)
> {
> struct bio *bio = ci->bio;
> struct dm_target_io *tio;
> @@ -1199,11 +1151,8 @@ static void __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti
> num_target_bios = ti->num_write_bios(ti, bio);
>
> for (target_bio_nr = 0; target_bio_nr < num_target_bios; target_bio_nr++) {
> - tio = alloc_tio(ci, ti, nr_iovecs, target_bio_nr);
> - if (split_bvec)
> - clone_split_bio(tio, bio, sector, idx, offset, len);
> - else
> - clone_bio(tio, bio, sector, idx, bv_count, len);
> + tio = alloc_tio(ci, ti, 0, target_bio_nr);
> + clone_bio(tio, bio, sector, len);
> __map_bio(tio);
> }
> }

Hey Kent,

I haven't been able to pinpoint the issue yet, but using your for-jens
branch, if I create a dm-thin volume with this lvm command:
lvcreate -L20G -V20G -T vg/pool --name thinlv

and try to format /dev/vg/thinlv with XFS the kernel warns and then
hangs with the following:

WARNING: CPU: 0 PID: 11789 at include/linux/bio.h:202 bio_advance+0xd0/0xe0()
Attempted to advance past end of bvec iter
Modules linked in: dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c skd(O) ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4 target_core_i
block target_core_file target_core_pscsi target_core_mod configfs bnx2fc fcoe libfcoe 8021q libfc garp stp llc scsi_transport_fc scsi_tgt sunrpc cpufreq_ondemand ipt_R
EJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables bnx2i cni
c uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan vhost tun
kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio dm_mod ses e
nclosure sg acpi_cpufreq freq_table ext4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic ata_piix sd_mod crc_t10dif crct10dif_common megaraid_sas
CPU: 0 PID: 11789 Comm: mkfs.xfs Tainted: G W O 3.12.0-rc2.snitm+ #74
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
00000000000000ca ffff8803313156a8 ffffffff8151e8e8 00000000000000ca
ffff8803313156f8 ffff8803313156e8 ffffffff8104c23c ffff880300000000
ffff8802dd524220 0000000000000400 ffff8802ddfb9680 ffff8802dd524200
Call Trace:
[<ffffffff8151e8e8>] dump_stack+0x49/0x61
[<ffffffff8104c23c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff8104c326>] warn_slowpath_fmt+0x46/0x50
[<ffffffff811b1e40>] bio_advance+0xd0/0xe0
[<ffffffffa015c63e>] __clone_and_map_data_bio+0xce/0x110 [dm_mod]
[<ffffffffa015c706>] __split_and_process_non_flush+0x86/0xd0 [dm_mod]
[<ffffffffa015c8ff>] __split_and_process_bio+0x1af/0x200 [dm_mod]
[<ffffffffa015ca72>] _dm_request+0x122/0x190 [dm_mod]
[<ffffffffa015cb08>] dm_request+0x28/0x40 [dm_mod]
[<ffffffff81247040>] generic_make_request+0xc0/0x100
[<ffffffff81247100>] submit_bio+0x80/0x170
[<ffffffff811b7d9a>] do_direct_IO+0x6ea/0x10f0
[<ffffffff811b8cf6>] do_blockdev_direct_IO+0x556/0x980
[<ffffffff811b3d60>] ? I_BDEV+0x10/0x10
[<ffffffff811b9175>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811b3d60>] ? I_BDEV+0x10/0x10
[<ffffffff8112a53e>] ? lru_cache_add+0xe/0x10
[<ffffffff811b4ce6>] blkdev_direct_IO+0x56/0x60
[<ffffffff811b3d60>] ? I_BDEV+0x10/0x10
[<ffffffff8111c5f2>] generic_file_direct_write+0xc2/0x180
[<ffffffff81195b93>] ? file_update_time+0xa3/0xe0
[<ffffffff8111ddf0>] __generic_file_aio_write+0x2d0/0x3b0
[<ffffffff811b42e6>] blkdev_aio_write+0x56/0xa0
[<ffffffff8117b8af>] do_sync_write+0x5f/0xa0
[<ffffffff8117bb4d>] ? rw_verify_area+0x5d/0xe0
[<ffffffff8117bc98>] vfs_write+0xc8/0x170
[<ffffffff8117c2af>] SyS_write+0x5f/0xb0
[<ffffffff8117c24e>] ? SyS_lseek+0x7e/0x80
[<ffffffff8152b252>] system_call_fastpath+0x16/0x1b
---[ end trace 06fd13242c0bb957 ]---

Looks to be stuck in bvec_iter_advance's while loop?

BUG: soft lockup - CPU#0 stuck for 22s! [mkfs.xfs:11641]
Modules linked in: dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c dm_mod ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge autofs4 target_core_i
block target_core_file target_core_pscsi target_core_mod configfs bnx2fc fcoe libfcoe libfc 8021q garp scsi_transport_fc stp scsi_tgt llc sunrpc cpufreq_ondemand ipt_R
EJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables bnx2i cni
c uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net macvtap macvlan vhost tun kvm_intel kvm iTCO_wdt iTCO_vend
or_support microcode i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_core mdio sg ses enclosure acpi_cpufreq freq_table ext
4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic ata_piix sd_mod crc_t10dif crct10dif_common megaraid_sas [last unloaded: dm_mod]
CPU: 0 PID: 11641 Comm: mkfs.xfs Tainted: G W 3.12.0-rc2.snitm+ #74
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
task: ffff88032c626040 ti: ffff88032418e000 task.ti: ffff88032418e000
RIP: 0010:[<ffffffff811b1dd4>] [<ffffffff811b1dd4>] bio_advance+0x64/0xe0
RSP: 0018:ffff88032418f768 EFLAGS: 00000206
RAX: 0000000000000e00 RBX: ffff880321cec800 RCX: 0000000000000400
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
RBP: ffff88032418f788 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000006bf R11: 00000000000006bf R12: ffff88032efeec60
R13: ffff88032418f6e8 R14: ffffffff8104c24f R15: ffff88032418f6f8
FS: 00007fdd500a2740(0000) GS:ffff88033fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdd4a210000 CR3: 0000000324296000 CR4: 00000000000007f0
Stack:
0000000000000000 ffff8802cbc5c5c0 0000000000000088 0000000000001000
ffff88032418f7e8 ffffffffa036663c 0000000100000008 ffffc90006618040
ffff88032418f838 ffff88032efeec40 ffff880300000000 ffff88032418f838
Call Trace:
[<ffffffffa036663c>] __clone_and_map_data_bio+0xcc/0x110 [dm_mod]
[<ffffffffa0366706>] __split_and_process_non_flush+0x86/0xd0 [dm_mod]
[<ffffffffa03668ff>] __split_and_process_bio+0x1af/0x200 [dm_mod]
[<ffffffffa0366a72>] _dm_request+0x122/0x190 [dm_mod]
[<ffffffffa0366b08>] dm_request+0x28/0x40 [dm_mod]
[<ffffffff81247040>] generic_make_request+0xc0/0x100
[<ffffffff81247100>] submit_bio+0x80/0x170
[<ffffffff811b7d9a>] do_direct_IO+0x6ea/0x10f0
[<ffffffff811b8cf6>] do_blockdev_direct_IO+0x556/0x980
[<ffffffff811b3d60>] ? I_BDEV+0x10/0x10
[<ffffffff8112b179>] ? invalidate_inode_pages2_range+0x229/0x2c0
[<ffffffff811b9175>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811b3d60>] ? I_BDEV+0x10/0x10
[<ffffffff811b4ce6>] blkdev_direct_IO+0x56/0x60
[<ffffffff811b3d60>] ? I_BDEV+0x10/0x10
[<ffffffff8111c5f2>] generic_file_direct_write+0xc2/0x180
[<ffffffff8111ddf0>] __generic_file_aio_write+0x2d0/0x3b0
[<ffffffff811b42e6>] blkdev_aio_write+0x56/0xa0
[<ffffffff8152667c>] ? __do_page_fault+0x25c/0x4b0
[<ffffffff8117b8af>] do_sync_write+0x5f/0xa0
[<ffffffff8117bb4d>] ? rw_verify_area+0x5d/0xe0
[<ffffffff8117bc98>] vfs_write+0xc8/0x170
[<ffffffff8117c89f>] SyS_pwrite64+0x9f/0xb0
[<ffffffff8152b252>] system_call_fastpath+0x16/0x1b
Code: 8b 6e 78 77 67 45 85 e4 74 4a 8b 53 08 eb 05 45 85 e4 74 40 8b 4b 0c 48 c1 e1 04 42 8b 44 29 08 2b 43 10 39 d0 0f 47 c2 44 39 e0 <41> 0f 47 c4 29 c2 41 29 c4 03
43 10 89 53 08 89 43 10 42 3b 44

2013-10-03 03:17:37

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Sat, Sep 28, 2013 at 12:59:09AM -0400, Mike Snitzer wrote:
> Hey Kent,
>
> I haven't been able to pinpoint the issue yet, but using your for-jens
> branch, if I create a dm-thin volume with this lvm command:
> lvcreate -L20G -V20G -T vg/pool --name thinlv
>
> and try to format /dev/vg/thinlv with XFS the kernel warns and then
> hangs with the following:

Thanks, for whatever reason ext4 wasn't tickling that codepath but xfs
does. I folded the fix into my for-jens branch, here's what changed:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b60b350..79cee1a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1098,7 +1098,7 @@ static void clone_bio(struct dm_target_io *tio, struct bio *bio,
bio_integrity_clone(clone, bio, GFP_NOIO);

bio_advance(clone, to_bytes(sector - clone->bi_iter.bi_sector));
- bio->bi_iter.bi_size = to_bytes(len);
+ clone->bi_iter.bi_size = to_bytes(len);

if (bio_integrity(bio))
bio_integrity_trim(clone, 0, len);
@@ -1267,7 +1267,7 @@ static int __split_and_process_non_flush(struct clone_info *ci)
if (!dm_target_is_valid(ti))
return -EIO;

- len = min_t(unsigned, max_io_len(ci->sector, ti), bio_sectors(bio));
+ len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);

__clone_and_map_data_bio(ci, ti, ci->sector, len);

2013-10-03 03:23:26

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Wed, Oct 02 2013 at 11:17pm -0400,
Kent Overstreet <[email protected]> wrote:

> On Sat, Sep 28, 2013 at 12:59:09AM -0400, Mike Snitzer wrote:
> > Hey Kent,
> >
> > I haven't been able to pinpoint the issue yet, but using your for-jens
> > branch, if I create a dm-thin volume with this lvm command:
> > lvcreate -L20G -V20G -T vg/pool --name thinlv
> >
> > and try to format /dev/vg/thinlv with XFS the kernel warns and then
> > hangs with the following:
>
> Thanks, for whatever reason ext4 wasn't tickling that codepath but xfs
> does. I folded the fix into my for-jens branch, here's what changed:
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index b60b350..79cee1a 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1098,7 +1098,7 @@ static void clone_bio(struct dm_target_io *tio, struct bio *bio,
> bio_integrity_clone(clone, bio, GFP_NOIO);
>
> bio_advance(clone, to_bytes(sector - clone->bi_iter.bi_sector));
> - bio->bi_iter.bi_size = to_bytes(len);
> + clone->bi_iter.bi_size = to_bytes(len);
>
> if (bio_integrity(bio))
> bio_integrity_trim(clone, 0, len);

The above change matches my findings during review (I clearly should've
shared it sooner)

> @@ -1267,7 +1267,7 @@ static int __split_and_process_non_flush(struct clone_info *ci)
> if (!dm_target_is_valid(ti))
> return -EIO;
>
> - len = min_t(unsigned, max_io_len(ci->sector, ti), bio_sectors(bio));
> + len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
>
> __clone_and_map_data_bio(ci, ti, ci->sector, len);
>

Great, thanks for finding this, I'll test and review the code further.

2013-10-03 21:45:24

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Wed, Oct 02, 2013 at 11:23:21PM -0400, Mike Snitzer wrote:
> Great, thanks for finding this, I'll test and review the code further.

Cool - let me know if you find anything else (how thoroughly do you
think you've tested it so far?)

2013-10-03 22:50:23

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Thu, Oct 03 2013 at 5:45pm -0400,
Kent Overstreet <[email protected]> wrote:

> On Wed, Oct 02, 2013 at 11:23:21PM -0400, Mike Snitzer wrote:
> > Great, thanks for finding this, I'll test and review the code further.
>
> Cool - let me know if you find anything else (how thoroughly do you
> think you've tested it so far?)

Not very, the issue you resolved was a blocker for DM. But I haven't
even verified your fix yet though... will do tomorrow with more
extensive testing using the dm-thinp and dm-cache test-suite.

2013-10-04 17:07:39

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Thu, Oct 03 2013 at 6:50pm -0400,
Mike Snitzer <[email protected]> wrote:

> On Thu, Oct 03 2013 at 5:45pm -0400,
> Kent Overstreet <[email protected]> wrote:
>
> > On Wed, Oct 02, 2013 at 11:23:21PM -0400, Mike Snitzer wrote:
> > > Great, thanks for finding this, I'll test and review the code further.
> >
> > Cool - let me know if you find anything else (how thoroughly do you
> > think you've tested it so far?)
>
> Not very, the issue you resolved was a blocker for DM. But I haven't
> even verified your fix yet though... will do tomorrow with more
> extensive testing using the dm-thinp and dm-cache test-suite.

With your latest fix I was able to create a thin device and format with
XFS. Unfortunately, when I tried to run the thinp-test-suite the very
first BasicTests test (test_dd_benchmark) fails -- need to look closer
but it would seem to me the thinp saved bio_endio path isn't happy. We
likely need an appropriately placed atomic_inc(&bio->bi_remaining); like
you did in dm-cache-target.c

------------[ cut here ]------------
kernel BUG at fs/bio.c:1722!
invalid opcode: 0000 [#1] SMP
Modules linked in: dm_cache_cleaner dm_cache_mq dm_cache dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mod xfs exportfs libcrc32c ebtable_nat ebtables xt_C
HECKSUM iptable_mangle bridge autofs4 target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc fcoe 8021q libfcoe libfc garp stp llc scsi_
transport_fc scsi_tgt sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_stat
e nf_conntrack ip6table_filter ip6_tables bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vhost_net macvtap
macvlan vhost tun kvm_intel kvm iTCO_wdt iTCO_vendor_support microcode i2c_i801 lpc_ich mfd_core igb i2c_algo_bit i2c_core i7core_edac edac_core ixgbe dca ptp pps_cor
e mdio sg ses enclosure acpi_cpufreq freq_table ext4 jbd2 mbcache sr_mod cdrom pata_acpi ata_generic ata_piix sd_mod crc_t10dif crct10dif_common megaraid_sas [last unl
oaded: dm_cache_mq]
CPU: 0 PID: 342 Comm: kworker/u24:7 Tainted: G W 3.12.0-rc2.snitm+ #76
Hardware name: FUJITSU PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.10.2619.N1 05/24/2011
Workqueue: dm-thin do_worker [dm_thin_pool]
task: ffff88032cd88ab0 ti: ffff88032c5e6000 task.ti: ffff88032c5e6000
RIP: 0010:[<ffffffff811b1464>] [<ffffffff811b1464>] bio_endio+0x74/0x80
RSP: 0018:ffff88032c5e7d48 EFLAGS: 00010246
RAX: ffff88032f10a784 RBX: ffff88032d097450 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88032f10a740
RBP: ffff88032c5e7d48 R08: ffff88033fc12d00 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000002 R12: ffff88032f10a740
R13: ffff88032def5880 R14: ffff88032d0c7c00 R15: ffff88032d457a05
FS: 0000000000000000(0000) GS:ffff88033fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003bfea13000 CR3: 0000000001a0b000 CR4: 00000000000007f0
Stack:
ffff88032c5e7d98 ffffffffa05f2ef9 ffff88032c5e7da8 0000000000000000
ffff88032d0c7c00 ffff88032c5e7da8 ffff88032d0c7d98 ffff88032d0974a8
ffff88032d0c7d14 ffff88032d457a05 ffff88032c5e7dd8 ffffffffa05f2ba7
Call Trace:
[<ffffffffa05f2ef9>] process_prepared_mapping+0x79/0x150 [dm_thin_pool]
[<ffffffffa05f2ba7>] process_prepared+0x87/0xa0 [dm_thin_pool]
[<ffffffffa05f58f3>] do_worker+0x33/0x60 [dm_thin_pool]
[<ffffffff81067862>] process_one_work+0x182/0x3b0
[<ffffffff81068c80>] worker_thread+0x120/0x3a0
[<ffffffff81068b60>] ? manage_workers+0x160/0x160
[<ffffffff8106eace>] kthread+0xce/0xe0
[<ffffffff8106ea00>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8152b1ac>] ret_from_fork+0x7c/0xb0
[<ffffffff8106ea00>] ? kthread_freezable_should_stop+0x70/0x70
Code: 1f 84 00 00 00 00 00 48 8b 57 10 83 e2 01 0f 44 f1 eb cd 0f 1f 40 00 48 8b 7f 50 48 85 ff 74 dd 8b 57 44 48 8d 47 44 85 d2 7f ac <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90
RIP [<ffffffff811b1464>] bio_endio+0x74/0x80
RSP <ffff88032c5e7d48>
---[ end trace acb5a7d638591b7b ]---

2013-10-07 00:14:16

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Fri, Oct 04 2013 at 1:07pm -0400,
Mike Snitzer <[email protected]> wrote:

> With your latest fix I was able to create a thin device and format with
> XFS. Unfortunately, when I tried to run the thinp-test-suite the very
> first BasicTests test (test_dd_benchmark) fails -- need to look closer
> but it would seem to me the thinp saved bio_endio path isn't happy. We
> likely need an appropriately placed atomic_inc(&bio->bi_remaining); like
> you did in dm-cache-target.c
>
> ------------[ cut here ]------------
> kernel BUG at fs/bio.c:1722!
...
> Call Trace:
> [<ffffffffa05f2ef9>] process_prepared_mapping+0x79/0x150 [dm_thin_pool]
> [<ffffffffa05f2ba7>] process_prepared+0x87/0xa0 [dm_thin_pool]
> [<ffffffffa05f58f3>] do_worker+0x33/0x60 [dm_thin_pool]
> [<ffffffff81067862>] process_one_work+0x182/0x3b0
> [<ffffffff81068c80>] worker_thread+0x120/0x3a0
> [<ffffffff81068b60>] ? manage_workers+0x160/0x160
> [<ffffffff8106eace>] kthread+0xce/0xe0
> [<ffffffff8106ea00>] ? kthread_freezable_should_stop+0x70/0x70
> [<ffffffff8152b1ac>] ret_from_fork+0x7c/0xb0
> [<ffffffff8106ea00>] ? kthread_freezable_should_stop+0x70/0x70
> Code: 1f 84 00 00 00 00 00 48 8b 57 10 83 e2 01 0f 44 f1 eb cd 0f 1f 40 00 48 8b 7f 50 48 85 ff 74 dd 8b 57 44 48 8d 47 44 85 d2 7f ac <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90
> RIP [<ffffffff811b1464>] bio_endio+0x74/0x80
> RSP <ffff88032c5e7d48>
> ---[ end trace acb5a7d638591b7b ]---

Please fold this fix into your for-jens branch, thanks. (Could be that
by the time Jens takes your immutable biovec changes we'll need to
rebase but at least it won't slip through the cracks).

---
drivers/md/dm-thin.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index a654024..1abb4a2 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -611,8 +611,10 @@ static void cell_defer_no_holder(struct thin_c *tc, struct dm_bio_prison_cell *c

static void process_prepared_mapping_fail(struct dm_thin_new_mapping *m)
{
- if (m->bio)
+ if (m->bio) {
m->bio->bi_end_io = m->saved_bi_end_io;
+ atomic_inc(&m->bio->bi_remaining);
+ }
cell_error(m->tc->pool, m->cell);
list_del(&m->list);
mempool_free(m, m->tc->pool->mapping_pool);
@@ -626,8 +628,10 @@ static void process_prepared_mapping(struct dm_thin_new_mapping *m)
int r;

bio = m->bio;
- if (bio)
+ if (bio) {
bio->bi_end_io = m->saved_bi_end_io;
+ atomic_inc(&bio->bi_remaining);
+ }

if (m->err) {
cell_error(pool, m->cell);

2013-10-11 04:13:12

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Sun, Oct 06, 2013 at 08:14:10PM -0400, Mike Snitzer wrote:
> On Fri, Oct 04 2013 at 1:07pm -0400,
> Mike Snitzer <[email protected]> wrote:
>
> > With your latest fix I was able to create a thin device and format with
> > XFS. Unfortunately, when I tried to run the thinp-test-suite the very
> > first BasicTests test (test_dd_benchmark) fails -- need to look closer
> > but it would seem to me the thinp saved bio_endio path isn't happy. We
> > likely need an appropriately placed atomic_inc(&bio->bi_remaining); like
> > you did in dm-cache-target.c
> >
> > ------------[ cut here ]------------
> > kernel BUG at fs/bio.c:1722!
> ...
> > Call Trace:
> > [<ffffffffa05f2ef9>] process_prepared_mapping+0x79/0x150 [dm_thin_pool]
> > [<ffffffffa05f2ba7>] process_prepared+0x87/0xa0 [dm_thin_pool]
> > [<ffffffffa05f58f3>] do_worker+0x33/0x60 [dm_thin_pool]
> > [<ffffffff81067862>] process_one_work+0x182/0x3b0
> > [<ffffffff81068c80>] worker_thread+0x120/0x3a0
> > [<ffffffff81068b60>] ? manage_workers+0x160/0x160
> > [<ffffffff8106eace>] kthread+0xce/0xe0
> > [<ffffffff8106ea00>] ? kthread_freezable_should_stop+0x70/0x70
> > [<ffffffff8152b1ac>] ret_from_fork+0x7c/0xb0
> > [<ffffffff8106ea00>] ? kthread_freezable_should_stop+0x70/0x70
> > Code: 1f 84 00 00 00 00 00 48 8b 57 10 83 e2 01 0f 44 f1 eb cd 0f 1f 40 00 48 8b 7f 50 48 85 ff 74 dd 8b 57 44 48 8d 47 44 85 d2 7f ac <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90
> > RIP [<ffffffff811b1464>] bio_endio+0x74/0x80
> > RSP <ffff88032c5e7d48>
> > ---[ end trace acb5a7d638591b7b ]---
>
> Please fold this fix into your for-jens branch, thanks. (Could be that
> by the time Jens takes your immutable biovec changes we'll need to
> rebase but at least it won't slip through the cracks).

Thanks! I knew that bio chaining patch was going to cause a few issues
like this, but it seems to useful to pass up. Anything else come up/any
comments?

2013-10-11 21:16:48

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Fri, Oct 11 2013 at 12:13am -0400,
Kent Overstreet <[email protected]> wrote:

> On Sun, Oct 06, 2013 at 08:14:10PM -0400, Mike Snitzer wrote:
> >
> > Please fold this fix into your for-jens branch, thanks. (Could be that
> > by the time Jens takes your immutable biovec changes we'll need to
> > rebase but at least it won't slip through the cracks).
>
> Thanks! I knew that bio chaining patch was going to cause a few issues
> like this, but it seems to useful to pass up. Anything else come up/any
> comments?

I'm wondering if there might be a cleaner way to hide the quirky nature
of the bio chaining with saved/restored bi_end_io. Joe Thornber
implemented utility wrappers local to dm-cache, see:
http://people.redhat.com/msnitzer/patches/upstream/dm-cache-for-v3.13/dm-cache-utils-for-hooking-bios.patch

would be natural to elevate something like this to the block layer and
then bury the quirk of needing to bump bi_remaining when the bi_end_io
is restored in unhook_bio().

Aside from that idea, your immutable biovec changes look sane to me. I
like how much was able to be removed from DM core. I don't see any
remaining problems that stand out. Feel free to add the following to
your DM patches in the series:

Reviewed-by: Mike Snitzer <[email protected]>

However, I did collect various style nits in the DM code during my
review. I'd like to see these changes applied:

Author: Mike Snitzer <[email protected]>
Date: Fri Sep 27 18:14:38 2013 -0400

dm: style nits and comment for immutable biovec changes
---
drivers/md/dm-cache-target.c | 16 ++++++++++------
drivers/md/dm-crypt.c | 8 ++------
drivers/md/dm-io.c | 3 +--
drivers/md/dm-snap.c | 8 ++++----
drivers/md/dm-thin.c | 3 +--
drivers/md/dm-verity.c | 3 +--
drivers/md/dm.c | 6 ++----
include/linux/dm-io.h | 2 +-
8 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index a9acec6..1ccbfff 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -571,13 +571,13 @@ static void remap_to_cache(struct cache *cache, struct bio *bio,

bio->bi_bdev = cache->cache_dev->bdev;
if (!block_size_is_power_of_two(cache))
- bio->bi_iter.bi_sector = (from_cblock(cblock) *
- cache->sectors_per_block) +
- sector_div(bi_sector, cache->sectors_per_block);
+ bio->bi_iter.bi_sector =
+ (from_cblock(cblock) * cache->sectors_per_block) +
+ sector_div(bi_sector, cache->sectors_per_block);
else
- bio->bi_iter.bi_sector = (from_cblock(cblock) <<
- cache->sectors_per_block_shift) |
- (bi_sector & (cache->sectors_per_block - 1));
+ bio->bi_iter.bi_sector =
+ (from_cblock(cblock) << cache->sectors_per_block_shift) |
+ (bi_sector & (cache->sectors_per_block - 1));
}

static void check_if_tick_bio_needed(struct cache *cache, struct bio *bio)
@@ -666,6 +666,10 @@ static void writethrough_endio(struct bio *bio, int err)
struct per_bio_data *pb = get_per_bio_data(bio, PB_DATA_SIZE_WT);
bio->bi_end_io = pb->saved_bi_end_io;

+ /*
+ * Must bump bi_remaining to allow bio to complete with
+ * restored bi_end_io.
+ */
atomic_inc(&bio->bi_remaining);

if (err) {
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index e0b902f..4db7e48 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -648,12 +648,10 @@ static void crypt_convert_init(struct crypt_config *cc,
{
ctx->bio_in = bio_in;
ctx->bio_out = bio_out;
-
if (bio_in)
ctx->iter_in = bio_in->bi_iter;
if (bio_out)
ctx->iter_out = bio_out->bi_iter;
-
ctx->cc_sector = sector + cc->iv_offset;
init_completion(&ctx->restart);
}
@@ -752,8 +750,7 @@ static int crypt_convert(struct crypt_config *cc,

atomic_set(&ctx->cc_pending, 1);

- while (ctx->iter_in.bi_size &&
- ctx->iter_out.bi_size) {
+ while (ctx->iter_in.bi_size && ctx->iter_out.bi_size) {

crypt_alloc_req(cc, ctx);

@@ -1676,8 +1673,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
}

- io = crypt_io_alloc(cc, bio,
- dm_target_offset(ti, bio->bi_iter.bi_sector));
+ io = crypt_io_alloc(cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));

if (bio_data_dir(io->base_bio) == READ) {
if (kcryptd_io_read(io, GFP_NOWAIT))
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 9cc1f3a..b2b8a10 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -307,8 +307,7 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
dm_sector_div_up(remaining, (PAGE_SIZE >> SECTOR_SHIFT)));

bio = bio_alloc_bioset(GFP_NOIO, num_bvecs, io->client->bios);
- bio->bi_iter.bi_sector = where->sector +
- (where->count - remaining);
+ bio->bi_iter.bi_sector = where->sector + (where->count - remaining);
bio->bi_bdev = where->bdev;
bio->bi_end_io = endio;
store_io_and_region_in_bio(bio, io, region);
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index 2e55bda..b7a5053 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1563,10 +1563,10 @@ static void remap_exception(struct dm_snapshot *s, struct dm_exception *e,
{
bio->bi_bdev = s->cow->bdev;
bio->bi_iter.bi_sector = chunk_to_sector(s->store,
- dm_chunk_number(e->new_chunk) +
- (chunk - e->old_chunk)) +
- (bio->bi_iter.bi_sector &
- s->store->chunk_mask);
+ dm_chunk_number(e->new_chunk) +
+ (chunk - e->old_chunk)) +
+ (bio->bi_iter.bi_sector &
+ s->store->chunk_mask);
}

static int snapshot_map(struct dm_target *ti, struct bio *bio)
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 6742927..a654024 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1255,8 +1255,7 @@ static void process_bio_read_only(struct thin_c *tc, struct bio *bio)
r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
switch (r) {
case 0:
- if (lookup_result.shared &&
- (rw == WRITE) && bio->bi_iter.bi_size)
+ if (lookup_result.shared && (rw == WRITE) && bio->bi_iter.bi_size)
bio_io_error(bio);
else {
inc_all_io_entry(tc->pool, bio);
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index 64d829a..ac35e95 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -496,8 +496,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
io->v = v;
io->orig_bi_end_io = bio->bi_end_io;
io->orig_bi_private = bio->bi_private;
- io->block = bio->bi_iter.bi_sector >>
- (v->data_dev_block_bits - SECTOR_SHIFT);
+ io->block = bio->bi_iter.bi_sector >> (v->data_dev_block_bits - SECTOR_SHIFT);
io->n_blocks = bio->bi_iter.bi_size >> v->data_dev_block_bits;

bio->bi_end_io = verity_end_io;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a396137..5846801 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -539,8 +539,7 @@ static void start_io_acct(struct dm_io *io)
atomic_inc_return(&md->pending[rw]));

if (unlikely(dm_stats_used(&md->stats)))
- dm_stats_account_io(&md->stats, bio->bi_rw,
- bio->bi_iter.bi_sector,
+ dm_stats_account_io(&md->stats, bio->bi_rw, bio->bi_iter.bi_sector,
bio_sectors(bio), false, 0, &io->stats_aux);
}

@@ -558,8 +557,7 @@ static void end_io_acct(struct dm_io *io)
part_stat_unlock();

if (unlikely(dm_stats_used(&md->stats)))
- dm_stats_account_io(&md->stats, bio->bi_rw,
- bio->bi_iter.bi_sector,
+ dm_stats_account_io(&md->stats, bio->bi_rw, bio->bi_iter.bi_sector,
bio_sectors(bio), true, duration, &io->stats_aux);

/*
diff --git a/include/linux/dm-io.h b/include/linux/dm-io.h
index 6cf1f62..a68cbe5 100644
--- a/include/linux/dm-io.h
+++ b/include/linux/dm-io.h
@@ -29,7 +29,7 @@ typedef void (*io_notify_fn)(unsigned long error, void *context);

enum dm_io_mem_type {
DM_IO_PAGE_LIST,/* Page list */
- DM_IO_BIO,
+ DM_IO_BIO, /* Bio vector */
DM_IO_VMA, /* Virtual memory area */
DM_IO_KMEM, /* Kernel memory */
};

2013-10-11 22:11:29

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 16/22] dm: Refactor for new bio cloning/splitting

On Fri, Oct 11, 2013 at 05:16:42PM -0400, Mike Snitzer wrote:
> On Fri, Oct 11 2013 at 12:13am -0400,
> Kent Overstreet <[email protected]> wrote:
>
> > On Sun, Oct 06, 2013 at 08:14:10PM -0400, Mike Snitzer wrote:
> > >
> > > Please fold this fix into your for-jens branch, thanks. (Could be that
> > > by the time Jens takes your immutable biovec changes we'll need to
> > > rebase but at least it won't slip through the cracks).
> >
> > Thanks! I knew that bio chaining patch was going to cause a few issues
> > like this, but it seems to useful to pass up. Anything else come up/any
> > comments?
>
> I'm wondering if there might be a cleaner way to hide the quirky nature
> of the bio chaining with saved/restored bi_end_io. Joe Thornber
> implemented utility wrappers local to dm-cache, see:
> http://people.redhat.com/msnitzer/patches/upstream/dm-cache-for-v3.13/dm-cache-utils-for-hooking-bios.patch
>
> would be natural to elevate something like this to the block layer and
> then bury the quirk of needing to bump bi_remaining when the bi_end_io
> is restored in unhook_bio().

Hmm, that might not be a bad idea. I suppose there are a decent number
of places scattered around (like the bio integrity code) that are simple
enough that there's no point in cloning the bio, they just want to hook
into into the completion...

If I get time I may grep around and see what sort of consolidation
that'd enable.

> Aside from that idea, your immutable biovec changes look sane to me. I
> like how much was able to be removed from DM core. I don't see any
> remaining problems that stand out. Feel free to add the following to
> your DM patches in the series:
>
> Reviewed-by: Mike Snitzer <[email protected]>

Great, thanks!

> However, I did collect various style nits in the DM code during my
> review. I'd like to see these changes applied:

I'll fold them in :)

>
> Author: Mike Snitzer <[email protected]>
> Date: Fri Sep 27 18:14:38 2013 -0400
>
> dm: style nits and comment for immutable biovec changes
> ---
> drivers/md/dm-cache-target.c | 16 ++++++++++------
> drivers/md/dm-crypt.c | 8 ++------
> drivers/md/dm-io.c | 3 +--
> drivers/md/dm-snap.c | 8 ++++----
> drivers/md/dm-thin.c | 3 +--
> drivers/md/dm-verity.c | 3 +--
> drivers/md/dm.c | 6 ++----
> include/linux/dm-io.h | 2 +-
> 8 files changed, 22 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
> index a9acec6..1ccbfff 100644
> --- a/drivers/md/dm-cache-target.c
> +++ b/drivers/md/dm-cache-target.c
> @@ -571,13 +571,13 @@ static void remap_to_cache(struct cache *cache, struct bio *bio,
>
> bio->bi_bdev = cache->cache_dev->bdev;
> if (!block_size_is_power_of_two(cache))
> - bio->bi_iter.bi_sector = (from_cblock(cblock) *
> - cache->sectors_per_block) +
> - sector_div(bi_sector, cache->sectors_per_block);
> + bio->bi_iter.bi_sector =
> + (from_cblock(cblock) * cache->sectors_per_block) +
> + sector_div(bi_sector, cache->sectors_per_block);
> else
> - bio->bi_iter.bi_sector = (from_cblock(cblock) <<
> - cache->sectors_per_block_shift) |
> - (bi_sector & (cache->sectors_per_block - 1));
> + bio->bi_iter.bi_sector =
> + (from_cblock(cblock) << cache->sectors_per_block_shift) |
> + (bi_sector & (cache->sectors_per_block - 1));
> }
>
> static void check_if_tick_bio_needed(struct cache *cache, struct bio *bio)
> @@ -666,6 +666,10 @@ static void writethrough_endio(struct bio *bio, int err)
> struct per_bio_data *pb = get_per_bio_data(bio, PB_DATA_SIZE_WT);
> bio->bi_end_io = pb->saved_bi_end_io;
>
> + /*
> + * Must bump bi_remaining to allow bio to complete with
> + * restored bi_end_io.
> + */
> atomic_inc(&bio->bi_remaining);
>
> if (err) {
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index e0b902f..4db7e48 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -648,12 +648,10 @@ static void crypt_convert_init(struct crypt_config *cc,
> {
> ctx->bio_in = bio_in;
> ctx->bio_out = bio_out;
> -
> if (bio_in)
> ctx->iter_in = bio_in->bi_iter;
> if (bio_out)
> ctx->iter_out = bio_out->bi_iter;
> -
> ctx->cc_sector = sector + cc->iv_offset;
> init_completion(&ctx->restart);
> }
> @@ -752,8 +750,7 @@ static int crypt_convert(struct crypt_config *cc,
>
> atomic_set(&ctx->cc_pending, 1);
>
> - while (ctx->iter_in.bi_size &&
> - ctx->iter_out.bi_size) {
> + while (ctx->iter_in.bi_size && ctx->iter_out.bi_size) {
>
> crypt_alloc_req(cc, ctx);
>
> @@ -1676,8 +1673,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
> return DM_MAPIO_REMAPPED;
> }
>
> - io = crypt_io_alloc(cc, bio,
> - dm_target_offset(ti, bio->bi_iter.bi_sector));
> + io = crypt_io_alloc(cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector));
>
> if (bio_data_dir(io->base_bio) == READ) {
> if (kcryptd_io_read(io, GFP_NOWAIT))
> diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
> index 9cc1f3a..b2b8a10 100644
> --- a/drivers/md/dm-io.c
> +++ b/drivers/md/dm-io.c
> @@ -307,8 +307,7 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
> dm_sector_div_up(remaining, (PAGE_SIZE >> SECTOR_SHIFT)));
>
> bio = bio_alloc_bioset(GFP_NOIO, num_bvecs, io->client->bios);
> - bio->bi_iter.bi_sector = where->sector +
> - (where->count - remaining);
> + bio->bi_iter.bi_sector = where->sector + (where->count - remaining);
> bio->bi_bdev = where->bdev;
> bio->bi_end_io = endio;
> store_io_and_region_in_bio(bio, io, region);
> diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
> index 2e55bda..b7a5053 100644
> --- a/drivers/md/dm-snap.c
> +++ b/drivers/md/dm-snap.c
> @@ -1563,10 +1563,10 @@ static void remap_exception(struct dm_snapshot *s, struct dm_exception *e,
> {
> bio->bi_bdev = s->cow->bdev;
> bio->bi_iter.bi_sector = chunk_to_sector(s->store,
> - dm_chunk_number(e->new_chunk) +
> - (chunk - e->old_chunk)) +
> - (bio->bi_iter.bi_sector &
> - s->store->chunk_mask);
> + dm_chunk_number(e->new_chunk) +
> + (chunk - e->old_chunk)) +
> + (bio->bi_iter.bi_sector &
> + s->store->chunk_mask);
> }
>
> static int snapshot_map(struct dm_target *ti, struct bio *bio)
> diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> index 6742927..a654024 100644
> --- a/drivers/md/dm-thin.c
> +++ b/drivers/md/dm-thin.c
> @@ -1255,8 +1255,7 @@ static void process_bio_read_only(struct thin_c *tc, struct bio *bio)
> r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
> switch (r) {
> case 0:
> - if (lookup_result.shared &&
> - (rw == WRITE) && bio->bi_iter.bi_size)
> + if (lookup_result.shared && (rw == WRITE) && bio->bi_iter.bi_size)
> bio_io_error(bio);
> else {
> inc_all_io_entry(tc->pool, bio);
> diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
> index 64d829a..ac35e95 100644
> --- a/drivers/md/dm-verity.c
> +++ b/drivers/md/dm-verity.c
> @@ -496,8 +496,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
> io->v = v;
> io->orig_bi_end_io = bio->bi_end_io;
> io->orig_bi_private = bio->bi_private;
> - io->block = bio->bi_iter.bi_sector >>
> - (v->data_dev_block_bits - SECTOR_SHIFT);
> + io->block = bio->bi_iter.bi_sector >> (v->data_dev_block_bits - SECTOR_SHIFT);
> io->n_blocks = bio->bi_iter.bi_size >> v->data_dev_block_bits;
>
> bio->bi_end_io = verity_end_io;
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index a396137..5846801 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -539,8 +539,7 @@ static void start_io_acct(struct dm_io *io)
> atomic_inc_return(&md->pending[rw]));
>
> if (unlikely(dm_stats_used(&md->stats)))
> - dm_stats_account_io(&md->stats, bio->bi_rw,
> - bio->bi_iter.bi_sector,
> + dm_stats_account_io(&md->stats, bio->bi_rw, bio->bi_iter.bi_sector,
> bio_sectors(bio), false, 0, &io->stats_aux);
> }
>
> @@ -558,8 +557,7 @@ static void end_io_acct(struct dm_io *io)
> part_stat_unlock();
>
> if (unlikely(dm_stats_used(&md->stats)))
> - dm_stats_account_io(&md->stats, bio->bi_rw,
> - bio->bi_iter.bi_sector,
> + dm_stats_account_io(&md->stats, bio->bi_rw, bio->bi_iter.bi_sector,
> bio_sectors(bio), true, duration, &io->stats_aux);
>
> /*
> diff --git a/include/linux/dm-io.h b/include/linux/dm-io.h
> index 6cf1f62..a68cbe5 100644
> --- a/include/linux/dm-io.h
> +++ b/include/linux/dm-io.h
> @@ -29,7 +29,7 @@ typedef void (*io_notify_fn)(unsigned long error, void *context);
>
> enum dm_io_mem_type {
> DM_IO_PAGE_LIST,/* Page list */
> - DM_IO_BIO,
> + DM_IO_BIO, /* Bio vector */
> DM_IO_VMA, /* Virtual memory area */
> DM_IO_KMEM, /* Kernel memory */
> };