2013-06-09 02:19:40

by Kent Overstreet

Subject: Immutable biovecs, dio rewrite

Immutable biovecs: Drivers no longer modify the biovec array directly
(bv_len/bv_offset in particular) - instead we add a real iterator to
struct bio that lets drivers partially complete a bio while modifying
only the iterator. The iterator has the existing bi_sector, bi_size and
bi_idx members, plus a new bi_bvec_done member.
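
For reference, the iterator ends up looking roughly like this (a sketch
based on the members named above, not necessarily the final layout):

	struct bvec_iter {
		sector_t	bi_sector;	/* device address, in 512 byte sectors */
		unsigned int	bi_size;	/* residual I/O count, in bytes */
		unsigned int	bi_idx;		/* current index into bi_io_vec */
		unsigned int	bi_bvec_done;	/* bytes completed in current bvec */
	};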

This gets us a couple of things:
* Changing all the drivers to go through the iterator means that we can
submit a partially completed bio to generic_make_request() - this
previously worked on some drivers, but not on others.

This makes it much easier for upper layers to process bios
incrementally - not just stacking drivers; my dio rewrite relies
heavily on this strategy (see the sketch after this list).

* Previously, any code that might need to retry a bio somehow if it
errored (mainly stacking drivers) had to clone not just the bio, but
the entire biovec. The biovec array can be up to BIO_MAX_PAGES entries,
which works out to copying up to 4k per clone...

* When cloning a bio, now we don't have to clone the biovec unless we
want to modify it. Bio splitting also becomes just a special case of
cloning a bio.
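
To illustrate the first point above: with the iterator, a driver that
has completed part of a bio can resubmit the remainder directly. A
sketch, with driver specifics elided (bytes_done stands in for whatever
portion has already completed):

	/*
	 * bio_advance() now only touches bio->bi_iter, so after the
	 * first bytes_done bytes have completed, the rest of the bio
	 * can go straight back down the stack:
	 */
	bio_advance(bio, bytes_done);
	generic_make_request(bio);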

We also get to delete a lot of code. And this patch series barely
scratches the surface - I've got more patches that delete another 1.5k
lines of code, without trying all that hard.

I'd like to get as much of this into 3.11 as possible - I don't know if
the dio rewrite is a realistic possibility (it currently breaks btrfs -
we need to add a different hook for them), and it needs a lot of review
and testing from the various driver maintainers. The dio rewrite does
pass xfstests for me, though.

Patch series is on top of v3.10-rc4, and it's available in my git tree:

git://evilpiepirate.org/~kent/linux-bcache.git block
http://evilpiepirate.org/git/linux-bcache.git block

Documentation/block/biodoc.txt | 7 +-
arch/m68k/emu/nfblock.c | 13 +-
arch/powerpc/sysdev/axonram.c | 21 +-
block/blk-core.c | 74 +-
block/blk-flush.c | 2 +-
block/blk-integrity.c | 40 +-
block/blk-lib.c | 179 +---
block/blk-map.c | 6 +-
block/blk-merge.c | 193 ++++-
block/blk-throttle.c | 13 +-
block/blk.h | 3 +
block/elevator.c | 2 +-
drivers/block/aoe/aoe.h | 10 +-
drivers/block/aoe/aoecmd.c | 145 ++--
drivers/block/brd.c | 16 +-
drivers/block/drbd/drbd_actlog.c | 2 +-
drivers/block/drbd/drbd_bitmap.c | 2 +-
drivers/block/drbd/drbd_main.c | 27 +-
drivers/block/drbd/drbd_receiver.c | 19 +-
drivers/block/drbd/drbd_req.c | 6 +-
drivers/block/drbd/drbd_req.h | 2 +-
drivers/block/drbd/drbd_worker.c | 8 +-
drivers/block/floppy.c | 16 +-
drivers/block/loop.c | 27 +-
drivers/block/mtip32xx/mtip32xx.c | 22 +-
drivers/block/nbd.c | 14 +-
drivers/block/nvme-core.c | 144 +---
drivers/block/pktcdvd.c | 178 ++--
drivers/block/ps3disk.c | 7 +-
drivers/block/ps3vram.c | 10 +-
drivers/block/rbd.c | 89 +-
drivers/block/rsxx/dev.c | 4 +-
drivers/block/rsxx/dma.c | 15 +-
drivers/block/umem.c | 53 +-
drivers/block/virtio_blk.c | 4 +-
drivers/block/xen-blkback/blkback.c | 2 +-
drivers/block/xen-blkfront.c | 14 +-
drivers/md/bcache/alloc.c | 4 +-
drivers/md/bcache/bcache.h | 20 -
drivers/md/bcache/btree.c | 32 +-
drivers/md/bcache/debug.c | 33 +-
drivers/md/bcache/io.c | 260 +-----
drivers/md/bcache/journal.c | 16 +-
drivers/md/bcache/movinggc.c | 11 +-
drivers/md/bcache/request.c | 203 ++---
drivers/md/bcache/request.h | 1 -
drivers/md/bcache/super.c | 60 +-
drivers/md/bcache/util.c | 21 +-
drivers/md/bcache/util.h | 8 +-
drivers/md/bcache/writeback.c | 17 +-
drivers/md/dm-bio-record.h | 37 +-
drivers/md/dm-bufio.c | 2 +-
drivers/md/dm-cache-policy-mq.c | 4 +-
drivers/md/dm-cache-target.c | 16 +-
drivers/md/dm-crypt.c | 68 +-
drivers/md/dm-delay.c | 7 +-
drivers/md/dm-flakey.c | 7 +-
drivers/md/dm-io.c | 38 +-
drivers/md/dm-linear.c | 3 +-
drivers/md/dm-raid1.c | 20 +-
drivers/md/dm-region-hash.c | 3 +-
drivers/md/dm-snap.c | 13 +-
drivers/md/dm-stripe.c | 13 +-
drivers/md/dm-thin.c | 23 +-
drivers/md/dm-verity.c | 61 +-
drivers/md/dm.c | 181 +---
drivers/md/faulty.c | 19 +-
drivers/md/linear.c | 96 +--
drivers/md/md.c | 35 +-
drivers/md/multipath.c | 13 +-
drivers/md/raid0.c | 79 +-
drivers/md/raid1.c | 63 +-
drivers/md/raid10.c | 198 +++--
drivers/md/raid5.c | 84 +-
drivers/message/fusion/mptsas.c | 8 +-
drivers/s390/block/dcssblk.c | 19 +-
drivers/s390/block/xpram.c | 19 +-
drivers/scsi/libsas/sas_expander.c | 8 +-
drivers/scsi/mpt2sas/mpt2sas_transport.c | 41 +-
drivers/scsi/mpt3sas/mpt3sas_transport.c | 39 +-
drivers/scsi/osd/osd_initiator.c | 2 +-
drivers/scsi/sd.c | 2 +-
drivers/scsi/sd_dif.c | 30 +-
drivers/staging/zram/zram_drv.c | 31 +-
drivers/target/target_core_iblock.c | 2 +-
fs/bio-integrity.c | 189 +----
fs/bio.c | 527 ++++++------
fs/btrfs/check-integrity.c | 10 +-
fs/btrfs/compression.c | 17 +-
fs/btrfs/extent_io.c | 16 +-
fs/btrfs/file-item.c | 13 +-
fs/btrfs/inode.c | 17 +-
fs/btrfs/raid56.c | 22 +-
fs/btrfs/scrub.c | 12 +-
fs/btrfs/volumes.c | 12 +-
fs/buffer.c | 12 +-
fs/direct-io.c | 1318 ++++++++----------------------
fs/ext4/page-io.c | 4 +-
fs/f2fs/data.c | 2 +-
fs/f2fs/segment.c | 3 +-
fs/gfs2/lops.c | 2 +-
fs/gfs2/ops_fstype.c | 2 +-
fs/hfsplus/wrapper.c | 2 +-
fs/jfs/jfs_logmgr.c | 10 +-
fs/jfs/jfs_metapage.c | 9 +-
fs/logfs/dev_bdev.c | 20 +-
fs/mpage.c | 2 +-
fs/nfs/blocklayout/blocklayout.c | 9 +-
fs/nilfs2/segbuf.c | 3 +-
fs/ocfs2/cluster/heartbeat.c | 2 +-
fs/xfs/xfs_aops.c | 2 +-
fs/xfs/xfs_buf.c | 4 +-
include/linux/bio.h | 226 +++--
include/linux/blk_types.h | 28 +-
include/linux/blkdev.h | 13 +-
include/linux/dm-io.h | 4 +-
include/trace/events/bcache.h | 20 +-
include/trace/events/block.h | 26 +-
include/trace/events/f2fs.h | 4 +-
kernel/power/block_io.c | 2 +-
kernel/trace/blktrace.c | 15 +-
mm/bounce.c | 45 +-
mm/page_io.c | 10 +-
123 files changed, 2236 insertions(+), 3757 deletions(-)


2013-06-09 02:19:43

by Kent Overstreet

Subject: [PATCH 02/26] bcache: Kill unaligned bvec hack

Bcache has a hack to avoid cloning the biovec if it's all full pages -
but with immutable biovecs coming, this won't be necessary anymore.

For now, we remove the special case and always clone the bvec array so
that the immutable biovec patches are simpler.
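
For reference, "all full pages" was detected like this - the check
being deleted below:

	/* take a private copy of the bvec only when some segment
	 * isn't a whole page: */
	if (bio->bi_size != bio_segments(bio) * PAGE_SIZE)
		s->unaligned_bvec = 1;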

Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/md/bcache/bcache.h | 1 -
drivers/md/bcache/debug.c | 4 ----
drivers/md/bcache/request.c | 32 +++++---------------------------
drivers/md/bcache/request.h | 2 +-
drivers/md/bcache/super.c | 4 ----
5 files changed, 6 insertions(+), 37 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 340146d..850de67 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -443,7 +443,6 @@ struct bcache_device {
unsigned long sectors_dirty_last;
long sectors_dirty_derivative;

- mempool_t *unaligned_bvec;
struct bio_set *bio_split;

unsigned data_csum:1;
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 91cd5f8..5941ed7 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -192,10 +192,6 @@ void bch_data_verify(struct search *s)
struct bio_vec *bv;
int i;

- if (!s->unaligned_bvec)
- bio_for_each_segment(bv, s->orig_bio, i)
- bv->bv_offset = 0, bv->bv_len = PAGE_SIZE;
-
check = bio_clone(s->orig_bio, GFP_NOIO);
if (!check)
return;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index c91ef76..8e5c35d 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -675,10 +675,14 @@ static void bio_complete(struct search *s)
static void do_bio_hook(struct search *s)
{
struct bio *bio = &s->bio.bio;
- memcpy(bio, s->orig_bio, sizeof(struct bio));

+ bio_init(bio);
+ bio->bi_io_vec = s->bv;
+ bio->bi_max_vecs = BIO_MAX_PAGES;
+ __bio_clone(bio, s->orig_bio);
bio->bi_end_io = request_endio;
bio->bi_private = &s->cl;
+
atomic_set(&bio->bi_cnt, 3);
}

@@ -690,16 +694,12 @@ static void search_free(struct closure *cl)
if (s->op.cache_bio)
bio_put(s->op.cache_bio);

- if (s->unaligned_bvec)
- mempool_free(s->bio.bio.bi_io_vec, s->d->unaligned_bvec);
-
closure_debug_destroy(cl);
mempool_free(s, s->d->c->search);
}

static struct search *search_alloc(struct bio *bio, struct bcache_device *d)
{
- struct bio_vec *bv;
struct search *s = mempool_alloc(d->c->search, GFP_NOIO);
memset(s, 0, offsetof(struct search, op.keys));

@@ -718,15 +718,6 @@ static struct search *search_alloc(struct bio *bio, struct bcache_device *d)
s->start_time = jiffies;
do_bio_hook(s);

- if (bio->bi_size != bio_segments(bio) * PAGE_SIZE) {
- bv = mempool_alloc(d->unaligned_bvec, GFP_NOIO);
- memcpy(bv, bio_iovec(bio),
- sizeof(struct bio_vec) * bio_segments(bio));
-
- s->bio.bio.bi_io_vec = bv;
- s->unaligned_bvec = 1;
- }
-
return s;
}

@@ -776,8 +767,6 @@ static void cached_dev_read_complete(struct closure *cl)
static void request_read_error(struct closure *cl)
{
struct search *s = container_of(cl, struct search, cl);
- struct bio_vec *bv;
- int i;

if (s->recoverable) {
/* The cache read failed, but we can retry from the backing
@@ -787,18 +776,7 @@ static void request_read_error(struct closure *cl)
(uint64_t) s->orig_bio->bi_sector);

s->error = 0;
- bv = s->bio.bio.bi_io_vec;
do_bio_hook(s);
- s->bio.bio.bi_io_vec = bv;
-
- if (!s->unaligned_bvec)
- bio_for_each_segment(bv, s->orig_bio, i)
- bv->bv_offset = 0, bv->bv_len = PAGE_SIZE;
- else
- memcpy(s->bio.bio.bi_io_vec,
- bio_iovec(s->orig_bio),
- sizeof(struct bio_vec) *
- bio_segments(s->orig_bio));

/* XXX: invalidate cache */

diff --git a/drivers/md/bcache/request.h b/drivers/md/bcache/request.h
index 254d9ab..7873029b 100644
--- a/drivers/md/bcache/request.h
+++ b/drivers/md/bcache/request.h
@@ -16,7 +16,6 @@ struct search {
unsigned cache_bio_sectors;

unsigned recoverable:1;
- unsigned unaligned_bvec:1;

unsigned write:1;
unsigned writeback:1;
@@ -27,6 +26,7 @@ struct search {

/* Anything past op->keys won't get zeroed in do_bio_hook */
struct btree_op op;
+ struct bio_vec bv[BIO_MAX_PAGES];
};

void bch_cache_read_endio(struct bio *, int);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index c8046bc..7f7ea78 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -741,8 +741,6 @@ static void bcache_device_free(struct bcache_device *d)
put_disk(d->disk);

bio_split_pool_free(&d->bio_split_hook);
- if (d->unaligned_bvec)
- mempool_destroy(d->unaligned_bvec);
if (d->bio_split)
bioset_free(d->bio_split);

@@ -754,8 +752,6 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size)
struct request_queue *q;

if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio))) ||
- !(d->unaligned_bvec = mempool_create_kmalloc_pool(1,
- sizeof(struct bio_vec) * BIO_MAX_PAGES)) ||
bio_split_pool_init(&d->bio_split_hook))

return -ENOMEM;
--
1.8.3.rc1

2013-06-09 02:19:55

by Kent Overstreet

Subject: [PATCH 04/26] dm: Use bvec_iter for dm_bio_record()

This patch doesn't itself make any functional changes, but immutable
biovecs are going to add a bi_bvec_done member to bi_iter, which will
also need to be saved here.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
---
drivers/md/dm-bio-record.h | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm-bio-record.h b/drivers/md/dm-bio-record.h
index 5ace48e..4f46e8e 100644
--- a/drivers/md/dm-bio-record.h
+++ b/drivers/md/dm-bio-record.h
@@ -28,11 +28,9 @@ struct dm_bio_vec_details {
};

struct dm_bio_details {
- sector_t bi_sector;
struct block_device *bi_bdev;
- unsigned int bi_size;
- unsigned short bi_idx;
unsigned long bi_flags;
+ struct bvec_iter bi_iter;
struct dm_bio_vec_details bi_io_vec[BIO_MAX_PAGES];
};

@@ -40,11 +38,9 @@ static inline void dm_bio_record(struct dm_bio_details *bd, struct bio *bio)
{
unsigned i;

- bd->bi_sector = bio->bi_iter.bi_sector;
bd->bi_bdev = bio->bi_bdev;
- bd->bi_size = bio->bi_iter.bi_size;
- bd->bi_idx = bio->bi_iter.bi_idx;
bd->bi_flags = bio->bi_flags;
+ bd->bi_iter = bio->bi_iter;

for (i = 0; i < bio->bi_vcnt; i++) {
bd->bi_io_vec[i].bv_len = bio->bi_io_vec[i].bv_len;
@@ -56,11 +52,9 @@ static inline void dm_bio_restore(struct dm_bio_details *bd, struct bio *bio)
{
unsigned i;

- bio->bi_iter.bi_sector = bd->bi_sector;
bio->bi_bdev = bd->bi_bdev;
- bio->bi_iter.bi_size = bd->bi_size;
- bio->bi_iter.bi_idx = bd->bi_idx;
bio->bi_flags = bd->bi_flags;
+ bio->bi_iter = bd->bi_iter;

for (i = 0; i < bio->bi_vcnt; i++) {
bio->bi_io_vec[i].bv_len = bd->bi_io_vec[i].bv_len;
--
1.8.3.rc1

2013-06-09 02:20:05

by Kent Overstreet

Subject: [PATCH 12/26] rbd: Refactor bio cloning, don't clone biovecs

Now that we've got drivers converted to the new immutable bvec
primitives, bio splitting becomes much easier. In a few patches,
bio_clone() will be changed to share the old bio's bvec instead of
copying it, and bio_split() will do exactly what's being done here.
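
Once that lands, carving a range off a bio becomes a one-liner for
callers too - a hedged sketch of the eventual bio_split() usage:

	/*
	 * Splits the first @sectors off @bio and advances @bio past
	 * them; the split shares @bio's biovec, so @bio must not be
	 * freed before the split completes.
	 */
	split = bio_split(bio, sectors, GFP_NOIO, fs_bio_set);
	if (!split)
		return -ENOMEM;
	generic_make_request(split);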

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Yehuda Sadeh <[email protected]>
Cc: Alex Elder <[email protected]>
Cc: [email protected]
---
drivers/block/rbd.c | 64 ++---------------------------------------------------
1 file changed, 2 insertions(+), 62 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 2a27dca..ed17d33 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1170,73 +1170,13 @@ static struct bio *bio_clone_range(struct bio *bio_src,
unsigned int len,
gfp_t gfpmask)
{
- struct bio_vec bv;
- struct bvec_iter iter;
- struct bvec_iter end_iter;
- unsigned int resid;
- unsigned int voff;
- unsigned short vcnt;
struct bio *bio;

- /* Handle the easy case for the caller */
-
- if (!offset && len == bio_src->bi_iter.bi_size)
- return bio_clone(bio_src, gfpmask);
-
- if (WARN_ON_ONCE(!len))
- return NULL;
- if (WARN_ON_ONCE(len > bio_src->bi_iter.bi_size))
- return NULL;
- if (WARN_ON_ONCE(offset > bio_src->bi_iter.bi_size - len))
- return NULL;
-
- /* Find first affected segment... */
-
- resid = offset;
- bio_for_each_segment(bv, bio_src, iter) {
- if (resid < bv.bv_len)
- break;
- resid -= bv.bv_len;
- }
- voff = resid;
-
- /* ...and the last affected segment */
-
- resid += len;
- __bio_for_each_segment(bv, bio_src, end_iter, iter) {
- if (resid <= bv.bv_len)
- break;
- resid -= bv.bv_len;
- }
- vcnt = end_iter.bi_idx = iter.bi_idx + 1;
-
- /* Build the clone */
-
- bio = bio_alloc(gfpmask, (unsigned int) vcnt);
+ bio = bio_clone(bio_src, gfpmask);
if (!bio)
return NULL; /* ENOMEM */

- bio->bi_bdev = bio_src->bi_bdev;
- bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector +
- (offset >> SECTOR_SHIFT);
- bio->bi_rw = bio_src->bi_rw;
- bio->bi_flags |= 1 << BIO_CLONED;
-
- /*
- * Copy over our part of the bio_vec, then update the first
- * and last (or only) entries.
- */
- memcpy(&bio->bi_io_vec[0], &bio_src->bi_io_vec[iter.bi_idx],
- vcnt * sizeof (struct bio_vec));
- bio->bi_io_vec[0].bv_offset += voff;
- if (vcnt > 1) {
- bio->bi_io_vec[0].bv_len -= voff;
- bio->bi_io_vec[vcnt - 1].bv_len = resid;
- } else {
- bio->bi_io_vec[0].bv_len = len;
- }
-
- bio->bi_vcnt = vcnt;
+ bio_advance(bio, offset);
bio->bi_iter.bi_size = len;

return bio;
--
1.8.3.rc1

2013-06-09 02:20:18

by Kent Overstreet

Subject: [PATCH 25/26] block: Add bio_get_user_pages()

This replaces some of the code that was in __bio_map_user_iov(), and
soon we're going to use this helper in the dio code.

Note that this relies on the recent change to make
generic_make_request() accept arbitrarily sized bios - we're not using
bio_add_page() here.
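
One point worth spelling out: the helper may pin fewer bytes than asked
for once the bio fills up, so callers iterate - a hedged sketch
mirroring the __bio_map_user_iov() loop below:

	for (i = 0; i < iov_count; i++) {
		ssize_t ret = bio_get_user_pages(bio,
				(unsigned long) iov[i].iov_base,
				iov[i].iov_len, write_to_vm);
		if (ret < 0)
			return ret;	/* nothing was pinned */
		if (ret != iov[i].iov_len)
			break;		/* bio is full: partial pin */
	}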

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
fs/bio.c | 123 +++++++++++++++++++++++++++-------------------------
include/linux/bio.h | 2 +
2 files changed, 66 insertions(+), 59 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 10d71cf..fe88f6e 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1212,17 +1212,69 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
}
EXPORT_SYMBOL(bio_copy_user);

+/**
+ * bio_get_user_pages - pin user pages and add them to a biovec
+ * @bio: bio to add pages to
+ * @uaddr: start of user address
+ * @len: length in bytes
+ * @write_to_vm: bool indicating writing to pages or not
+ *
+ * Pins pages for up to @len bytes and appends them to @bio's bvec array. May
+ * pin only part of the requested pages - @bio need not have room for all the
+ * pages and can already have had pages added to it.
+ *
+ * Returns the number of bytes from @len added to @bio.
+ */
+ssize_t bio_get_user_pages(struct bio *bio, unsigned long uaddr,
+ unsigned long len, int write_to_vm)
+{
+ int ret;
+ unsigned nr_pages, bytes;
+ unsigned offset = offset_in_page(uaddr);
+ struct bio_vec *bv;
+ struct page **pages;
+
+ nr_pages = min_t(size_t,
+ DIV_ROUND_UP(len + offset, PAGE_SIZE),
+ bio->bi_max_vecs - bio->bi_vcnt);
+
+ bv = &bio->bi_io_vec[bio->bi_vcnt];
+ pages = (void *) bv;
+
+ ret = get_user_pages_fast(uaddr, nr_pages, write_to_vm, pages);
+ if (ret < 0)
+ return ret;
+
+ bio->bi_vcnt += ret;
+ bytes = ret * PAGE_SIZE - offset;
+
+ while (ret--) {
+ bv[ret].bv_page = pages[ret];
+ bv[ret].bv_len = PAGE_SIZE;
+ bv[ret].bv_offset = 0;
+ }
+
+ bv[0].bv_offset += offset;
+ bv[0].bv_len -= offset;
+
+ if (bytes > len) {
+ bio->bi_io_vec[bio->bi_vcnt - 1].bv_len -= bytes - len;
+ bytes = len;
+ }
+
+ bio->bi_iter.bi_size += bytes;
+
+ return bytes;
+}
+
static struct bio *__bio_map_user_iov(struct request_queue *q,
struct block_device *bdev,
struct sg_iovec *iov, int iov_count,
int write_to_vm, gfp_t gfp_mask)
{
- int i, j;
- int nr_pages = 0;
- struct page **pages;
+ ssize_t ret;
+ int i, nr_pages = 0;
struct bio *bio;
- int cur_page = 0;
- int ret, offset;

for (i = 0; i < iov_count; i++) {
unsigned long uaddr = (unsigned long)iov[i].iov_base;
@@ -1251,57 +1303,17 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
if (!bio)
return ERR_PTR(-ENOMEM);

- ret = -ENOMEM;
- pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
- if (!pages)
- goto out;
-
for (i = 0; i < iov_count; i++) {
- unsigned long uaddr = (unsigned long)iov[i].iov_base;
- unsigned long len = iov[i].iov_len;
- unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
- unsigned long start = uaddr >> PAGE_SHIFT;
- const int local_nr_pages = end - start;
- const int page_limit = cur_page + local_nr_pages;
-
- ret = get_user_pages_fast(uaddr, local_nr_pages,
- write_to_vm, &pages[cur_page]);
- if (ret < local_nr_pages) {
- ret = -EFAULT;
- goto out_unmap;
- }
-
- offset = uaddr & ~PAGE_MASK;
- for (j = cur_page; j < page_limit; j++) {
- unsigned int bytes = PAGE_SIZE - offset;
+ ret = bio_get_user_pages(bio, (size_t) iov[i].iov_base,
+ iov[i].iov_len,
+ write_to_vm);
+ if (ret < 0)
+ goto out;

- if (len <= 0)
- break;
-
- if (bytes > len)
- bytes = len;
-
- /*
- * sorry...
- */
- if (bio_add_pc_page(q, bio, pages[j], bytes, offset) <
- bytes)
- break;
-
- len -= bytes;
- offset = 0;
- }
-
- cur_page = j;
- /*
- * release the pages we didn't map into the bio, if any
- */
- while (j < page_limit)
- page_cache_release(pages[j++]);
+ if (ret != iov[i].iov_len)
+ break;
}

- kfree(pages);
-
/*
* set data direction, and check if mapped pages need bouncing
*/
@@ -1312,14 +1324,7 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
bio->bi_flags |= (1 << BIO_USER_MAPPED);
return bio;

- out_unmap:
- for (i = 0; i < nr_pages; i++) {
- if(!pages[i])
- break;
- page_cache_release(pages[i]);
- }
out:
- kfree(pages);
bio_put(bio);
return ERR_PTR(ret);
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 444cc91..340d859 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -333,6 +333,8 @@ extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
unsigned int, unsigned int);
extern int bio_get_nr_vecs(struct block_device *);
+extern ssize_t bio_get_user_pages(struct bio *, unsigned long,
+ unsigned long, int);
extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
unsigned long, unsigned int, int, gfp_t);
struct sg_iovec;
--
1.8.3.rc1

2013-06-09 02:20:15

by Kent Overstreet

Subject: [PATCH 20/26] block: Don't save/copy bvec array anymore, share when cloning

Now that drivers have been converted to the bvec_iter primitives, they
shouldn't be modifying the biovec anymore, so saving it is
unnecessary - code that was previously making a backup of the bvec
array can now just save bio->bi_iter.

Also, when cloning bios we can usually just reuse the original bio's
bvec array. For code that does need to modify the clone's biovec (the
bounce buffer code, mainly), add bio_clone_biovec().
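
The resulting idiom - clone cheaply, unshare only if you must - is what
the bounce buffer conversion below does (sketch):

	bio = bio_clone_bioset(bio_orig, GFP_NOIO, fs_bio_set);

	/* the bounce code swaps its own pages into the clone's bvecs,
	 * so the clone needs a private copy of the biovec first: */
	bio_clone_biovec(bio, GFP_NOIO);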

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
---
drivers/md/bcache/request.c | 2 -
drivers/md/bcache/request.h | 1 -
drivers/md/dm-bio-record.h | 25 --------
fs/bio-integrity.c | 12 +---
fs/bio.c | 152 +++++++++++++++++++-------------------------
include/linux/bio.h | 1 +
mm/bounce.c | 1 +
7 files changed, 71 insertions(+), 123 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index ca513d4..3c95b0e 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -677,8 +677,6 @@ static void do_bio_hook(struct search *s)
struct bio *bio = &s->bio.bio;

bio_init(bio);
- bio->bi_io_vec = s->bv;
- bio->bi_max_vecs = BIO_MAX_PAGES;
__bio_clone(bio, s->orig_bio);
bio->bi_end_io = request_endio;
bio->bi_private = &s->cl;
diff --git a/drivers/md/bcache/request.h b/drivers/md/bcache/request.h
index 7873029b..5c1ece2 100644
--- a/drivers/md/bcache/request.h
+++ b/drivers/md/bcache/request.h
@@ -26,7 +26,6 @@ struct search {

/* Anything past op->keys won't get zeroed in do_bio_hook */
struct btree_op op;
- struct bio_vec bv[BIO_MAX_PAGES];
};

void bch_cache_read_endio(struct bio *, int);
diff --git a/drivers/md/dm-bio-record.h b/drivers/md/dm-bio-record.h
index 4f46e8e..dd36461 100644
--- a/drivers/md/dm-bio-record.h
+++ b/drivers/md/dm-bio-record.h
@@ -17,49 +17,24 @@
* original bio state.
*/

-struct dm_bio_vec_details {
-#if PAGE_SIZE < 65536
- __u16 bv_len;
- __u16 bv_offset;
-#else
- unsigned bv_len;
- unsigned bv_offset;
-#endif
-};
-
struct dm_bio_details {
struct block_device *bi_bdev;
unsigned long bi_flags;
struct bvec_iter bi_iter;
- struct dm_bio_vec_details bi_io_vec[BIO_MAX_PAGES];
};

static inline void dm_bio_record(struct dm_bio_details *bd, struct bio *bio)
{
- unsigned i;
-
bd->bi_bdev = bio->bi_bdev;
bd->bi_flags = bio->bi_flags;
bd->bi_iter = bio->bi_iter;
-
- for (i = 0; i < bio->bi_vcnt; i++) {
- bd->bi_io_vec[i].bv_len = bio->bi_io_vec[i].bv_len;
- bd->bi_io_vec[i].bv_offset = bio->bi_io_vec[i].bv_offset;
- }
}

static inline void dm_bio_restore(struct dm_bio_details *bd, struct bio *bio)
{
- unsigned i;
-
bio->bi_bdev = bd->bi_bdev;
bio->bi_flags = bd->bi_flags;
bio->bi_iter = bd->bi_iter;
-
- for (i = 0; i < bio->bi_vcnt; i++) {
- bio->bi_io_vec[i].bv_len = bd->bi_io_vec[i].bv_len;
- bio->bi_io_vec[i].bv_offset = bd->bi_io_vec[i].bv_offset;
- }
}

#endif
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 72fa942..0c466e6 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -594,17 +594,11 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
struct bio_integrity_payload *bip_src = bio_src->bi_integrity;
struct bio_integrity_payload *bip;

- BUG_ON(bip_src == NULL);
-
- bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_vcnt);
-
+ bip = bio_integrity_alloc(bio, gfp_mask, 0);
if (bip == NULL)
- return -EIO;
-
- memcpy(bip->bip_vec, bip_src->bip_vec,
- bip_src->bip_vcnt * sizeof(struct bio_vec));
+ return -ENOMEM;

- bip->bip_vcnt = bip_src->bip_vcnt;
+ bip->bip_vec = bip_src->bip_vec;
bip->bip_iter = bip_src->bip_iter;

return 0;
diff --git a/fs/bio.c b/fs/bio.c
index ee033a7..10d71cf 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -544,8 +544,7 @@ EXPORT_SYMBOL(bio_phys_segments);
*/
void __bio_clone(struct bio *bio, struct bio *bio_src)
{
- memcpy(bio->bi_io_vec, bio_src->bi_io_vec,
- bio_src->bi_max_vecs * sizeof(struct bio_vec));
+ BUG_ON(bio->bi_pool && BIO_POOL_IDX(bio) != BIO_POOL_NONE);

/*
* most users will be overriding ->bi_bdev with a new target,
@@ -554,8 +553,8 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
bio->bi_bdev = bio_src->bi_bdev;
bio->bi_flags |= 1 << BIO_CLONED;
bio->bi_rw = bio_src->bi_rw;
- bio->bi_vcnt = bio_src->bi_vcnt;
bio->bi_iter = bio_src->bi_iter;
+ bio->bi_io_vec = bio_src->bi_io_vec;
}
EXPORT_SYMBOL(__bio_clone);

@@ -572,7 +571,7 @@ struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
{
struct bio *b;

- b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
+ b = bio_alloc_bioset(gfp_mask, 0, bs);
if (!b)
return NULL;

@@ -594,6 +593,50 @@ struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
EXPORT_SYMBOL(bio_clone_bioset);

/**
+ * bio_clone_biovec: Given a cloned bio, give the clone its own copy of the
+ * biovec
+ * @bio: cloned bio
+ *
+ * @bio must have been allocated from a bioset - i.e. returned from
+ * bio_clone_bioset()
+ */
+int bio_clone_biovec(struct bio *bio, gfp_t gfp_mask)
+{
+ unsigned long idx = BIO_POOL_NONE;
+ unsigned nr_iovecs = 0;
+ struct bio_vec bv, *bvl = NULL;
+ struct bvec_iter iter;
+
+ BUG_ON(!bio->bi_pool);
+ BUG_ON(BIO_POOL_IDX(bio) != BIO_POOL_NONE);
+
+ bio_for_each_segment(bv, bio, iter)
+ nr_iovecs++;
+
+ if (nr_iovecs > BIO_INLINE_VECS) {
+ bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx,
+ bio->bi_pool->bvec_pool);
+ if (!bvl)
+ return -ENOMEM;
+ } else if (nr_iovecs) {
+ bvl = bio->bi_inline_vecs;
+ }
+
+ bio_for_each_segment(bv, bio, iter)
+ bvl[bio->bi_vcnt++] = bv;
+
+ bio->bi_io_vec = bvl;
+ bio->bi_iter.bi_idx = 0;
+ bio->bi_iter.bi_bvec_done = 0;
+
+ bio->bi_flags &= BIO_POOL_MASK - 1;
+ bio->bi_flags |= idx << BIO_POOL_OFFSET;
+
+ return 0;
+}
+EXPORT_SYMBOL(bio_clone_biovec);
+
+/**
* bio_get_nr_vecs - return approx number of vecs
* @bdev: I/O target
*
@@ -918,60 +961,33 @@ void bio_copy_data(struct bio *dst, struct bio *src)
EXPORT_SYMBOL(bio_copy_data);

struct bio_map_data {
- struct bio_vec *iovecs;
- struct sg_iovec *sgvecs;
int nr_sgvecs;
int is_our_pages;
+ struct sg_iovec sgvecs[];
};

static void bio_set_map_data(struct bio_map_data *bmd, struct bio *bio,
struct sg_iovec *iov, int iov_count,
int is_our_pages)
{
- memcpy(bmd->iovecs, bio->bi_io_vec, sizeof(struct bio_vec) * bio->bi_vcnt);
memcpy(bmd->sgvecs, iov, sizeof(struct sg_iovec) * iov_count);
bmd->nr_sgvecs = iov_count;
bmd->is_our_pages = is_our_pages;
bio->bi_private = bmd;
}

-static void bio_free_map_data(struct bio_map_data *bmd)
-{
- kfree(bmd->iovecs);
- kfree(bmd->sgvecs);
- kfree(bmd);
-}
-
static struct bio_map_data *bio_alloc_map_data(int nr_segs,
unsigned int iov_count,
gfp_t gfp_mask)
{
- struct bio_map_data *bmd;
-
if (iov_count > UIO_MAXIOV)
return NULL;

- bmd = kmalloc(sizeof(*bmd), gfp_mask);
- if (!bmd)
- return NULL;
-
- bmd->iovecs = kmalloc(sizeof(struct bio_vec) * nr_segs, gfp_mask);
- if (!bmd->iovecs) {
- kfree(bmd);
- return NULL;
- }
-
- bmd->sgvecs = kmalloc(sizeof(struct sg_iovec) * iov_count, gfp_mask);
- if (bmd->sgvecs)
- return bmd;
-
- kfree(bmd->iovecs);
- kfree(bmd);
- return NULL;
+ return kmalloc(sizeof(struct bio_map_data) +
+ sizeof(struct sg_iovec) * iov_count, gfp_mask);
}

-static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
- struct sg_iovec *iov, int iov_count,
+static int __bio_copy_iov(struct bio *bio, struct sg_iovec *iov, int iov_count,
int to_user, int from_user, int do_free_page)
{
int ret = 0, i;
@@ -981,7 +997,7 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,

bio_for_each_segment_all(bvec, bio, i) {
char *bv_addr = page_address(bvec->bv_page);
- unsigned int bv_len = iovecs[i].bv_len;
+ unsigned int bv_len = bvec->bv_len;

while (bv_len && iov_idx < iov_count) {
unsigned int bytes;
@@ -1035,10 +1051,10 @@ int bio_uncopy_user(struct bio *bio)
int ret = 0;

if (!bio_flagged(bio, BIO_NULL_MAPPED))
- ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
- bmd->nr_sgvecs, bio_data_dir(bio) == READ,
+ ret = __bio_copy_iov(bio, bmd->sgvecs, bmd->nr_sgvecs,
+ bio_data_dir(bio) == READ,
0, bmd->is_our_pages);
- bio_free_map_data(bmd);
+ kfree(bmd);
bio_put(bio);
return ret;
}
@@ -1152,7 +1168,7 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
*/
if ((!write_to_vm && (!map_data || !map_data->null_mapped)) ||
(map_data && map_data->from_user)) {
- ret = __bio_copy_iov(bio, bio->bi_io_vec, iov, iov_count, 0, 1, 0);
+ ret = __bio_copy_iov(bio, iov, iov_count, 0, 1, 0);
if (ret)
goto cleanup;
}
@@ -1166,7 +1182,7 @@ cleanup:

bio_put(bio);
out_bmd:
- bio_free_map_data(bmd);
+ kfree(bmd);
return ERR_PTR(ret);
}

@@ -1483,16 +1499,15 @@ static void bio_copy_kern_endio(struct bio *bio, int err)

bio_for_each_segment_all(bvec, bio, i) {
char *addr = page_address(bvec->bv_page);
- int len = bmd->iovecs[i].bv_len;

if (read)
- memcpy(p, addr, len);
+ memcpy(p, addr, bvec->bv_len);

__free_page(bvec->bv_page);
- p += len;
+ p += bvec->bv_len;
}

- bio_free_map_data(bmd);
+ kfree(bmd);
bio_put(bio);
}

@@ -1720,60 +1735,25 @@ EXPORT_SYMBOL(bio_endio);
* Allocates and returns a new bio which represents @sectors from the start of
* @bio, and updates @bio to represent the remaining sectors.
*
- * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
- * unchanged.
+ * The newly allocated bio will point to @bio's bi_io_vec; it is the caller's
+ * responsibility to ensure that @bio is not freed before the split.
*/
struct bio *bio_split(struct bio *bio, int sectors,
gfp_t gfp, struct bio_set *bs)
{
- unsigned vcnt = 0, nbytes = sectors << 9;
- struct bio_vec bv;
- struct bvec_iter iter;
struct bio *split = NULL;

BUG_ON(sectors <= 0);
BUG_ON(sectors >= bio_sectors(bio));

- if (bio->bi_rw & REQ_DISCARD) {
- split = bio_alloc_bioset(gfp, 1, bs);
- goto out;
- }
-
- bio_for_each_segment(bv, bio, iter) {
- vcnt++;
-
- if (nbytes <= bv.bv_len)
- break;
-
- nbytes -= bv.bv_len;
- }
-
- split = bio_alloc_bioset(gfp, vcnt, bs);
+ split = bio_clone_bioset(bio, gfp, bs);
if (!split)
return NULL;

- bio_for_each_segment(bv, bio, iter) {
- split->bi_io_vec[split->bi_vcnt++] = bv;
-
- if (split->bi_vcnt == vcnt)
- break;
- }
+ split->bi_iter.bi_size = sectors << 9;

- split->bi_io_vec[split->bi_vcnt - 1].bv_len = nbytes;
-out:
- split->bi_bdev = bio->bi_bdev;
- split->bi_iter.bi_sector = bio->bi_iter.bi_sector;
- split->bi_iter.bi_size = sectors << 9;
- split->bi_rw = bio->bi_rw;
-
- if (bio_integrity(bio)) {
- if (bio_integrity_clone(split, bio, gfp)) {
- bio_put(split);
- return NULL;
- }
-
- bio_integrity_trim(split, 0, bio_sectors(split));
- }
+ if (bio_integrity(split))
+ bio_integrity_trim(split, 0, sectors);

bio_advance(bio, split->bi_iter.bi_size);

diff --git a/include/linux/bio.h b/include/linux/bio.h
index aa015ee..444cc91 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -293,6 +293,7 @@ extern void bio_put(struct bio *);

extern void __bio_clone(struct bio *, struct bio *);
extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);
+extern int bio_clone_biovec(struct bio *bio, gfp_t gfp_mask);

extern struct bio_set *fs_bio_set;

diff --git a/mm/bounce.c b/mm/bounce.c
index 4525e3d..985fe23 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -209,6 +209,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
return;
bounce:
bio = bio_clone_bioset(*bio_orig, GFP_NOIO, fs_bio_set);
+ bio_clone_biovec(bio, GFP_NOIO);

bio_for_each_segment_all(to, bio, i) {
struct page *page = to->bv_page;
--
1.8.3.rc1

2013-06-09 02:20:51

by Kent Overstreet

Subject: [PATCH 26/26] Apply fire to dio code

Near total rewrite of fs/direct-io.c. This makes use of our new bio
splitting functionality, and the fact that generic_make_request() will
now take arbitrarily sized bios - we allocate a bio, pin pages to it
directly, then call the getblocks() function to map it wherever the
filesystem tells us - splitting as needed.
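
In outline, the new submission path looks roughly like this (a sketch
with the hole handling, partial-block zeroing and refcounting elided -
not the literal patch):

	static int dio_send_bio(struct dio *dio, struct dio_submit *sdio,
				struct bio *bio, loff_t offset)
	{
		struct dio_mapping map;
		struct bio *split;
		int ret;

		while (bio->bi_iter.bi_size) {
			ret = get_blocks(dio, sdio, offset,
					 bio->bi_iter.bi_size, &map);
			if (ret)
				return ret;

			/* carve off the extent the fs just mapped */
			split = map.size < bio->bi_iter.bi_size
				? bio_split(bio, map.size >> 9,
					    GFP_KERNEL, dio_pool)
				: bio;

			split->bi_bdev = map.bdev;
			split->bi_iter.bi_sector = map.sector;
			dio_bio_submit(dio, sdio, split, offset);

			offset += map.size;
			if (split == bio)
				break;
		}

		return 0;
	}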

This version will break btrfs - we need to add a new kind of hook so
btrfs can do what it wants to more sanely than it did previously.

Signed-off-by: Kent Overstreet <[email protected]>
---
fs/direct-io.c | 1318 ++++++++++++++------------------------------------------
1 file changed, 331 insertions(+), 987 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 6a5de20..3c32bf3 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -8,7 +8,7 @@
* 04Jul2002 Andrew Morton
* Initial version
* 11Sep2002 [email protected]
- * added readv/writev support.
+ * added readv/writev support.
* 29Oct2002 Andrew Morton
* rewrote bio_add_page() support.
* 30Oct2002 [email protected]
@@ -39,183 +39,37 @@
#include <linux/prefetch.h>
#include <linux/aio.h>

-/*
- * How many user pages to map in one call to get_user_pages(). This determines
- * the size of a structure in the slab cache
- */
-#define DIO_PAGES 64
-
-/*
- * This code generally works in units of "dio_blocks". A dio_block is
- * somewhere between the hard sector size and the filesystem block size. it
- * is determined on a per-invocation basis. When talking to the filesystem
- * we need to convert dio_blocks to fs_blocks by scaling the dio_block quantity
- * down by dio->blkfactor. Similarly, fs-blocksize quantities are converted
- * to bio_block quantities by shifting left by blkfactor.
- *
- * If blkfactor is zero then the user's request was aligned to the filesystem's
- * blocksize.
- */
-
/* dio_state only used in the submission path */
-
struct dio_submit {
- struct bio *bio; /* bio under assembly */
- unsigned blkbits; /* doesn't change */
- unsigned blkfactor; /* When we're using an alignment which
- is finer than the filesystem's soft
- blocksize, this specifies how much
- finer. blkfactor=2 means 1/4-block
- alignment. Does not change */
- unsigned start_zero_done; /* flag: sub-blocksize zeroing has
- been performed at the start of a
- write */
- int pages_in_io; /* approximate total IO pages */
- size_t size; /* total request size (doesn't change)*/
- sector_t block_in_file; /* Current offset into the underlying
- file in dio_block units. */
- unsigned blocks_available; /* At block_in_file. changes */
- int reap_counter; /* rate limit reaping */
- sector_t final_block_in_request;/* doesn't change */
- unsigned first_block_in_page; /* doesn't change, Used only once */
- int boundary; /* prev block is at a boundary */
- get_block_t *get_block; /* block mapping function */
- dio_submit_t *submit_io; /* IO submition function */
-
- loff_t logical_offset_in_bio; /* current first logical block in bio */
- sector_t final_block_in_bio; /* current final block in bio + 1 */
- sector_t next_block_for_io; /* next block to be put under IO,
- in dio_blocks units */
-
- /*
- * Deferred addition of a page to the dio. These variables are
- * private to dio_send_cur_page(), submit_page_section() and
- * dio_bio_add_page().
- */
- struct page *cur_page; /* The page */
- unsigned cur_page_offset; /* Offset into it, in bytes */
- unsigned cur_page_len; /* Nr of bytes at cur_page_offset */
- sector_t cur_page_block; /* Where it starts */
- loff_t cur_page_fs_offset; /* Offset in file */
-
- /*
- * Page fetching state. These variables belong to dio_refill_pages().
- */
- int curr_page; /* changes */
- int total_pages; /* doesn't change */
- unsigned long curr_user_address;/* changes */
-
- /*
- * Page queue. These variables belong to dio_refill_pages() and
- * dio_get_page().
- */
- unsigned head; /* next page to process */
- unsigned tail; /* last valid page + 1 */
+ get_block_t *get_block; /* block mapping function */
+ dio_submit_t *submit_io; /* IO submition function */
+ unsigned i_blkbits;
};

/* dio_state communicated between submission path and end_io */
struct dio {
- int flags; /* doesn't change */
- int rw;
- struct inode *inode;
- loff_t i_size; /* i_size when submitted */
- dio_iodone_t *end_io; /* IO completion function */
+ int flags; /* doesn't change */
+ int rw;
+ struct inode *inode;
+ loff_t i_size; /* i_size when submitted */

- void *private; /* copy from map_bh.b_private */
+ dio_iodone_t *end_io; /* IO completion function */
+ void *private; /* copy from map_bh.b_private */

/* BIO completion state */
- spinlock_t bio_lock; /* protects BIO fields below */
- int page_errors; /* errno from get_user_pages() */
- int is_async; /* is IO async ? */
- int io_error; /* IO error in completion path */
- unsigned long refcount; /* direct_io_worker() and bios */
- struct bio *bio_list; /* singly linked via bi_private */
+ int page_error; /* errno from get_user_pages() */
+ int io_error; /* IO error in completion path */
+ atomic_long_t refcount; /* direct_io_worker() and bios */
struct task_struct *waiter; /* waiting task (NULL if none) */

/* AIO related stuff */
- struct kiocb *iocb; /* kiocb */
- ssize_t result; /* IO result */
-
- /*
- * pages[] (and any fields placed after it) are not zeroed out at
- * allocation time. Don't add new fields after pages[] unless you
- * wish that they not be zeroed.
- */
- struct page *pages[DIO_PAGES]; /* page buffer */
-} ____cacheline_aligned_in_smp;
+ struct kiocb *iocb; /* kiocb */
+ ssize_t result; /* IO result */

-static struct kmem_cache *dio_cache __read_mostly;
-
-/*
- * How many pages are in the queue?
- */
-static inline unsigned dio_pages_present(struct dio_submit *sdio)
-{
- return sdio->tail - sdio->head;
-}
-
-/*
- * Go grab and pin some userspace pages. Typically we'll get 64 at a time.
- */
-static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
-{
- int ret;
- int nr_pages;
-
- nr_pages = min(sdio->total_pages - sdio->curr_page, DIO_PAGES);
- ret = get_user_pages_fast(
- sdio->curr_user_address, /* Where from? */
- nr_pages, /* How many pages? */
- dio->rw == READ, /* Write to memory? */
- &dio->pages[0]); /* Put results here */
-
- if (ret < 0 && sdio->blocks_available && (dio->rw & WRITE)) {
- struct page *page = ZERO_PAGE(0);
- /*
- * A memory fault, but the filesystem has some outstanding
- * mapped blocks. We need to use those blocks up to avoid
- * leaking stale data in the file.
- */
- if (dio->page_errors == 0)
- dio->page_errors = ret;
- page_cache_get(page);
- dio->pages[0] = page;
- sdio->head = 0;
- sdio->tail = 1;
- ret = 0;
- goto out;
- }
-
- if (ret >= 0) {
- sdio->curr_user_address += ret * PAGE_SIZE;
- sdio->curr_page += ret;
- sdio->head = 0;
- sdio->tail = ret;
- ret = 0;
- }
-out:
- return ret;
-}
-
-/*
- * Get another userspace page. Returns an ERR_PTR on error. Pages are
- * buffered inside the dio so that we can call get_user_pages() against a
- * decent number of pages, less frequently. To provide nicer use of the
- * L1 cache.
- */
-static inline struct page *dio_get_page(struct dio *dio,
- struct dio_submit *sdio)
-{
- if (dio_pages_present(sdio) == 0) {
- int ret;
+ struct bio bio;
+};

- ret = dio_refill_pages(dio, sdio);
- if (ret)
- return ERR_PTR(ret);
- BUG_ON(dio_pages_present(sdio) == 0);
- }
- return dio->pages[sdio->head++];
-}
+static struct bio_set *dio_pool __read_mostly;

/**
* dio_complete() - called when all DIO BIO I/O has been completed
@@ -234,25 +88,18 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret, bool is
{
ssize_t transferred = 0;

- /*
- * AIO submission can race with bio completion to get here while
- * expecting to have the last io completed by bio completion.
- * In that case -EIOCBQUEUED is in fact not an error we want
- * to preserve through this call.
- */
- if (ret == -EIOCBQUEUED)
- ret = 0;
-
if (dio->result) {
transferred = dio->result;

+ /* XXX: dio_send_bio() could do this */
+
/* Check for short read case */
if ((dio->rw == READ) && ((offset + transferred) > dio->i_size))
transferred = dio->i_size - offset;
}

if (ret == 0)
- ret = dio->page_errors;
+ ret = dio->page_error;
if (ret == 0)
ret = dio->io_error;
if (ret == 0)
@@ -267,53 +114,11 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, ssize_t ret, bool is
aio_complete(dio->iocb, ret, 0);
}

+ bio_put(&dio->bio);
return ret;
}

-static int dio_bio_complete(struct dio *dio, struct bio *bio);
-/*
- * Asynchronous IO callback.
- */
-static void dio_bio_end_aio(struct bio *bio, int error)
-{
- struct dio *dio = bio->bi_private;
- unsigned long remaining;
- unsigned long flags;
-
- /* cleanup the bio */
- dio_bio_complete(dio, bio);
-
- spin_lock_irqsave(&dio->bio_lock, flags);
- remaining = --dio->refcount;
- if (remaining == 1 && dio->waiter)
- wake_up_process(dio->waiter);
- spin_unlock_irqrestore(&dio->bio_lock, flags);
-
- if (remaining == 0) {
- dio_complete(dio, dio->iocb->ki_pos, 0, true);
- kmem_cache_free(dio_cache, dio);
- }
-}
-
-/*
- * The BIO completion handler simply queues the BIO up for the process-context
- * handler.
- *
- * During I/O bi_private points at the dio. After I/O, bi_private is used to
- * implement a singly-linked list of completed BIOs, at dio->bio_list.
- */
-static void dio_bio_end_io(struct bio *bio, int error)
-{
- struct dio *dio = bio->bi_private;
- unsigned long flags;
-
- spin_lock_irqsave(&dio->bio_lock, flags);
- bio->bi_private = dio->bio_list;
- dio->bio_list = bio;
- if (--dio->refcount == 1 && dio->waiter)
- wake_up_process(dio->waiter);
- spin_unlock_irqrestore(&dio->bio_lock, flags);
-}
+#define DIO_WAKEUP (1U << 31)

/**
* dio_end_io - handle the end io action for the given bio
@@ -327,693 +132,321 @@ static void dio_bio_end_io(struct bio *bio, int error)
void dio_end_io(struct bio *bio, int error)
{
struct dio *dio = bio->bi_private;
+ unsigned long remaining;

- if (dio->is_async)
- dio_bio_end_aio(bio, error);
- else
- dio_bio_end_io(bio, error);
+ if (error)
+ dio->io_error = -EIO;
+
+ if (dio->rw == READ) {
+ bio_check_pages_dirty(bio); /* transfers ownership */
+ } else {
+ struct bio_vec *bv;
+ int i;
+
+ bio_for_each_segment_all(bv, bio, i)
+ page_cache_release(bv->bv_page);
+ bio_put(bio);
+ }
+
+ remaining = atomic_long_dec_return(&dio->refcount);
+
+ if (remaining == DIO_WAKEUP)
+ wake_up_process(dio->waiter);
+ else if (!remaining)
+ dio_complete(dio, dio->iocb->ki_pos, 0, true);
}
EXPORT_SYMBOL_GPL(dio_end_io);

-static inline void
-dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
- struct block_device *bdev,
- sector_t first_sector, int nr_vecs)
+static void dio_wait_completion(struct dio *dio)
{
- struct bio *bio;
-
- /*
- * bio_alloc() is guaranteed to return a bio when called with
- * __GFP_WAIT and we request a valid number of vectors.
- */
- bio = bio_alloc(GFP_KERNEL, nr_vecs);
+ if (atomic_long_add_return(DIO_WAKEUP - 1,
+ &dio->refcount) == DIO_WAKEUP)
+ return;

- bio->bi_bdev = bdev;
- bio->bi_iter.bi_sector = first_sector;
- if (dio->is_async)
- bio->bi_end_io = dio_bio_end_aio;
- else
- bio->bi_end_io = dio_bio_end_io;
+ while (1) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (atomic_long_read(&dio->refcount) == DIO_WAKEUP)
+ break;

- sdio->bio = bio;
- sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
+ io_schedule();
+ }
+ __set_current_state(TASK_RUNNING);
}

/*
- * In the AIO read case we speculatively dirty the pages before starting IO.
- * During IO completion, any of these pages which happen to have been written
- * back will be redirtied by bio_check_pages_dirty().
+ * For reads we speculatively dirty the pages before starting IO. During IO
+ * completion, any of these pages which happen to have been written back will be
+ * redirtied by bio_check_pages_dirty().
*
* bios hold a dio reference between submit_bio and ->end_io.
*/
-static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
+static void dio_bio_submit(struct dio *dio, struct dio_submit *sdio,
+ struct bio *bio, loff_t offset)
{
- struct bio *bio = sdio->bio;
- unsigned long flags;
-
- bio->bi_private = dio;
-
- spin_lock_irqsave(&dio->bio_lock, flags);
- dio->refcount++;
- spin_unlock_irqrestore(&dio->bio_lock, flags);
-
- if (dio->is_async && dio->rw == READ)
- bio_set_pages_dirty(bio);
+ /*
+ * Read accounting is performed in submit_bio()
+ */
+ if (dio->rw & WRITE)
+ task_io_account_write(bio->bi_iter.bi_size);

if (sdio->submit_io)
sdio->submit_io(dio->rw, bio, dio->inode,
- sdio->logical_offset_in_bio);
+ offset >> sdio->i_blkbits);
else
submit_bio(dio->rw, bio);
-
- sdio->bio = NULL;
- sdio->boundary = 0;
- sdio->logical_offset_in_bio = 0;
-}
-
-/*
- * Release any resources in case of a failure
- */
-static inline void dio_cleanup(struct dio *dio, struct dio_submit *sdio)
-{
- while (dio_pages_present(sdio))
- page_cache_release(dio_get_page(dio, sdio));
-}
-
-/*
- * Wait for the next BIO to complete. Remove it and return it. NULL is
- * returned once all BIOs have been completed. This must only be called once
- * all bios have been issued so that dio->refcount can only decrease. This
- * requires that that the caller hold a reference on the dio.
- */
-static struct bio *dio_await_one(struct dio *dio)
-{
- unsigned long flags;
- struct bio *bio = NULL;
-
- spin_lock_irqsave(&dio->bio_lock, flags);
-
- /*
- * Wait as long as the list is empty and there are bios in flight. bio
- * completion drops the count, maybe adds to the list, and wakes while
- * holding the bio_lock so we don't need set_current_state()'s barrier
- * and can call it after testing our condition.
- */
- while (dio->refcount > 1 && dio->bio_list == NULL) {
- __set_current_state(TASK_UNINTERRUPTIBLE);
- dio->waiter = current;
- spin_unlock_irqrestore(&dio->bio_lock, flags);
- io_schedule();
- /* wake up sets us TASK_RUNNING */
- spin_lock_irqsave(&dio->bio_lock, flags);
- dio->waiter = NULL;
- }
- if (dio->bio_list) {
- bio = dio->bio_list;
- dio->bio_list = bio->bi_private;
- }
- spin_unlock_irqrestore(&dio->bio_lock, flags);
- return bio;
}

/*
- * Process one completed BIO. No locks are held.
+ * Clean any dirty buffers in the blockdev mapping which alias newly-created
+ * file blocks. Only called for S_ISREG files - blockdevs do not set buffer_new
*/
-static int dio_bio_complete(struct dio *dio, struct bio *bio)
+static void clean_blockdev_aliases(struct dio *dio, struct dio_submit *sdio,
+ struct buffer_head *map_bh)
{
- const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
- struct bio_vec *bvec;
unsigned i;
+ unsigned nblocks;

- if (!uptodate)
- dio->io_error = -EIO;
+ nblocks = map_bh->b_size >> sdio->i_blkbits;

- if (dio->is_async && dio->rw == READ) {
- bio_check_pages_dirty(bio); /* transfers ownership */
- } else {
- bio_for_each_segment_all(bvec, bio, i) {
- struct page *page = bvec->bv_page;
-
- if (dio->rw == READ && !PageCompound(page))
- set_page_dirty_lock(page);
- page_cache_release(page);
- }
- bio_put(bio);
- }
- return uptodate ? 0 : -EIO;
+ for (i = 0; i < nblocks; i++)
+ unmap_underlying_metadata(map_bh->b_bdev,
+ map_bh->b_blocknr + i);
}

-/*
- * Wait on and process all in-flight BIOs. This must only be called once
- * all bios have been issued so that the refcount can only decrease.
- * This just waits for all bios to make it through dio_bio_complete. IO
- * errors are propagated through dio->io_error and should be propagated via
- * dio_complete().
- */
-static void dio_await_completion(struct dio *dio)
-{
- struct bio *bio;
- do {
- bio = dio_await_one(dio);
- if (bio)
- dio_bio_complete(dio, bio);
- } while (bio);
-}
+struct dio_mapping {
+ enum {
+ MAP_MAPPED,
+ MAP_NEW,
+ MAP_UNMAPPED,
+ } state;

-/*
- * A really large O_DIRECT read or write can generate a lot of BIOs. So
- * to keep the memory consumption sane we periodically reap any completed BIOs
- * during the BIO generation phase.
- *
- * This also helps to limit the peak amount of pinned userspace memory.
- */
-static inline int dio_bio_reap(struct dio *dio, struct dio_submit *sdio)
-{
- int ret = 0;
-
- if (sdio->reap_counter++ >= 64) {
- while (dio->bio_list) {
- unsigned long flags;
- struct bio *bio;
- int ret2;
-
- spin_lock_irqsave(&dio->bio_lock, flags);
- bio = dio->bio_list;
- dio->bio_list = bio->bi_private;
- spin_unlock_irqrestore(&dio->bio_lock, flags);
- ret2 = dio_bio_complete(dio, bio);
- if (ret == 0)
- ret = ret2;
- }
- sdio->reap_counter = 0;
- }
- return ret;
-}
+ struct block_device *bdev;
+ sector_t sector;
+ size_t size;
+};

-/*
- * Call into the fs to map some more disk blocks. We record the current number
- * of available blocks at sdio->blocks_available. These are in units of the
- * fs blocksize, (1 << inode->i_blkbits).
- *
- * The fs is allowed to map lots of blocks at once. If it wants to do that,
- * it uses the passed inode-relative block number as the file offset, as usual.
- *
- * get_block() is passed the number of i_blkbits-sized blocks which direct_io
- * has remaining to do. The fs should not map more than this number of blocks.
- *
- * If the fs has mapped a lot of blocks, it should populate bh->b_size to
- * indicate how much contiguous disk space has been made available at
- * bh->b_blocknr.
- *
- * If *any* of the mapped blocks are new, then the fs must set buffer_new().
- * This isn't very efficient...
- *
- * In the case of filesystem holes: the fs may return an arbitrarily-large
- * hole by returning an appropriate value in b_size and by clearing
- * buffer_mapped(). However the direct-io code will only process holes one
- * block at a time - it will repeatedly call get_block() as it walks the hole.
- */
-static int get_more_blocks(struct dio *dio, struct dio_submit *sdio,
- struct buffer_head *map_bh)
+static int get_blocks(struct dio *dio, struct dio_submit *sdio,
+ loff_t offset, size_t size,
+ struct dio_mapping *map)
{
- int ret;
- sector_t fs_startblk; /* Into file, in filesystem-sized blocks */
- sector_t fs_endblk; /* Into file, in filesystem-sized blocks */
- unsigned long fs_count; /* Number of filesystem-sized blocks */
- int create;
- unsigned int i_blkbits = sdio->blkbits + sdio->blkfactor;
+ struct buffer_head map_bh = { 0, };
+ int ret, create;
+ unsigned i_mask = (1 << sdio->i_blkbits) - 1;
+ unsigned fs_offset = offset & i_mask;
+ sector_t fs_block = offset >> sdio->i_blkbits;

/*
- * If there was a memory error and we've overwritten all the
- * mapped blocks then we can now return that memory error
+ * For writes inside i_size on a DIO_SKIP_HOLES filesystem we
+ * forbid block creations: only overwrites are permitted.
+ * We will return early to the caller once we see an
+ * unmapped buffer head returned, and the caller will fall
+ * back to buffered I/O.
+ *
+ * Otherwise the decision is left to the get_blocks method,
+ * which may decide to handle it or also return an unmapped
+ * buffer head.
*/
- ret = dio->page_errors;
- if (ret == 0) {
- BUG_ON(sdio->block_in_file >= sdio->final_block_in_request);
- fs_startblk = sdio->block_in_file >> sdio->blkfactor;
- fs_endblk = (sdio->final_block_in_request - 1) >>
- sdio->blkfactor;
- fs_count = fs_endblk - fs_startblk + 1;
-
- map_bh->b_state = 0;
- map_bh->b_size = fs_count << i_blkbits;
-
- /*
- * For writes inside i_size on a DIO_SKIP_HOLES filesystem we
- * forbid block creations: only overwrites are permitted.
- * We will return early to the caller once we see an
- * unmapped buffer head returned, and the caller will fall
- * back to buffered I/O.
- *
- * Otherwise the decision is left to the get_blocks method,
- * which may decide to handle it or also return an unmapped
- * buffer head.
- */
- create = dio->rw & WRITE;
- if (dio->flags & DIO_SKIP_HOLES) {
- if (sdio->block_in_file < (i_size_read(dio->inode) >>
- sdio->blkbits))
- create = 0;
- }
-
- ret = (*sdio->get_block)(dio->inode, fs_startblk,
- map_bh, create);
-
- /* Store for completion */
- dio->private = map_bh->b_private;
+ create = dio->rw & WRITE;
+ if (dio->flags & DIO_SKIP_HOLES) {
+ if (fs_block < dio->i_size >> sdio->i_blkbits)
+ create = 0;
}
- return ret;
-}

-/*
- * There is no bio. Make one now.
- */
-static inline int dio_new_bio(struct dio *dio, struct dio_submit *sdio,
- sector_t start_sector, struct buffer_head *map_bh)
-{
- sector_t sector;
- int ret, nr_pages;
+ map_bh.b_state = 0;
+ map_bh.b_size = size + fs_offset;

- ret = dio_bio_reap(dio, sdio);
+ ret = sdio->get_block(dio->inode, fs_block,
+ &map_bh, create);
if (ret)
- goto out;
- sector = start_sector << (sdio->blkbits - 9);
- nr_pages = min(sdio->pages_in_io, bio_get_nr_vecs(map_bh->b_bdev));
- nr_pages = min(nr_pages, BIO_MAX_PAGES);
- BUG_ON(nr_pages <= 0);
- dio_bio_alloc(dio, sdio, map_bh->b_bdev, sector, nr_pages);
- sdio->boundary = 0;
-out:
- return ret;
-}
+ return ret;

-/*
- * Attempt to put the current chunk of 'cur_page' into the current BIO. If
- * that was successful then update final_block_in_bio and take a ref against
- * the just-added page.
- *
- * Return zero on success. Non-zero means the caller needs to start a new BIO.
- */
-static inline int dio_bio_add_page(struct dio_submit *sdio)
-{
- int ret;
+ /* Store for completion */
+ dio->private = map_bh.b_private;

- ret = bio_add_page(sdio->bio, sdio->cur_page,
- sdio->cur_page_len, sdio->cur_page_offset);
- if (ret == sdio->cur_page_len) {
- /*
- * Decrement count only, if we are done with this page
- */
- if ((sdio->cur_page_len + sdio->cur_page_offset) == PAGE_SIZE)
- sdio->pages_in_io--;
- page_cache_get(sdio->cur_page);
- sdio->final_block_in_bio = sdio->cur_page_block +
- (sdio->cur_page_len >> sdio->blkbits);
- ret = 0;
- } else {
- ret = 1;
- }
- return ret;
-}
-
-/*
- * Put cur_page under IO. The section of cur_page which is described by
- * cur_page_offset,cur_page_len is put into a BIO. The section of cur_page
- * starts on-disk at cur_page_block.
- *
- * We take a ref against the page here (on behalf of its presence in the bio).
- *
- * The caller of this function is responsible for removing cur_page from the
- * dio, and for dropping the refcount which came from that presence.
- */
-static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
- struct buffer_head *map_bh)
-{
- int ret = 0;
+ if (!buffer_mapped(&map_bh))
+ map->state = MAP_UNMAPPED;
+ else if (buffer_new(&map_bh))
+ map->state = MAP_NEW;
+ else
+ map->state = MAP_MAPPED;

- if (sdio->bio) {
- loff_t cur_offset = sdio->cur_page_fs_offset;
- loff_t bio_next_offset = sdio->logical_offset_in_bio +
- sdio->bio->bi_iter.bi_size;
+ /* Holes always 1 block? */
+ if (map->state == MAP_UNMAPPED)
+ map_bh.b_size = 1 << sdio->i_blkbits;

- /*
- * See whether this new request is contiguous with the old.
- *
- * Btrfs cannot handle having logically non-contiguous requests
- * submitted. For example if you have
- *
- * Logical: [0-4095][HOLE][8192-12287]
- * Physical: [0-4095] [4096-8191]
- *
- * We cannot submit those pages together as one BIO. So if our
- * current logical offset in the file does not equal what would
- * be the next logical offset in the bio, submit the bio we
- * have.
- */
- if (sdio->final_block_in_bio != sdio->cur_page_block ||
- cur_offset != bio_next_offset)
- dio_bio_submit(dio, sdio);
- }
+ if (map->state == MAP_NEW)
+ clean_blockdev_aliases(dio, sdio, &map_bh);

- if (sdio->bio == NULL) {
- ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh);
- if (ret)
- goto out;
- }
+ BUG_ON(map_bh.b_size <= fs_offset);
+
+ map->bdev = map_bh.b_bdev;
+ map->sector = (map_bh.b_blocknr << (sdio->i_blkbits - 9)) +
+ (fs_offset >> 9);
+ map->size = min(map_bh.b_size - fs_offset, size);

- if (dio_bio_add_page(sdio) != 0) {
- dio_bio_submit(dio, sdio);
- ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh);
- if (ret == 0) {
- ret = dio_bio_add_page(sdio);
- BUG_ON(ret != 0);
- }
- }
-out:
return ret;
}

-/*
- * An autonomous function to put a chunk of a page under deferred IO.
- *
- * The caller doesn't actually know (or care) whether this piece of page is in
- * a BIO, or is under IO or whatever. We just take care of all possible
- * situations here. The separation between the logic of do_direct_IO() and
- * that of submit_page_section() is important for clarity. Please don't break.
- *
- * The chunk of page starts on-disk at blocknr.
- *
- * We perform deferred IO, by recording the last-submitted page inside our
- * private part of the dio structure. If possible, we just expand the IO
- * across that page here.
- *
- * If that doesn't work out then we put the old page into the bio and add this
- * page to the dio instead.
- */
-static inline int
-submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
- unsigned offset, unsigned len, sector_t blocknr,
- struct buffer_head *map_bh)
+static void dio_write_zeroes(struct dio *dio, struct bio *parent,
+ sector_t sector, size_t size)
{
- int ret = 0;
+ unsigned pages = DIV_ROUND_UP(size, PAGE_SIZE);
+ struct bio *bio = bio_alloc(GFP_KERNEL, pages);

- if (dio->rw & WRITE) {
- /*
- * Read accounting is performed in submit_bio()
- */
- task_io_account_write(len);
+ while (pages--) {
+ bio->bi_io_vec[pages].bv_page = ZERO_PAGE(0);
+ bio->bi_io_vec[pages].bv_len = PAGE_SIZE;
+ bio->bi_io_vec[pages].bv_offset = 0;
}

- /*
- * Can we just grow the current page's presence in the dio?
- */
- if (sdio->cur_page == page &&
- sdio->cur_page_offset + sdio->cur_page_len == offset &&
- sdio->cur_page_block +
- (sdio->cur_page_len >> sdio->blkbits) == blocknr) {
- sdio->cur_page_len += len;
- goto out;
- }
-
- /*
- * If there's a deferred page already there then send it.
- */
- if (sdio->cur_page) {
- ret = dio_send_cur_page(dio, sdio, map_bh);
- page_cache_release(sdio->cur_page);
- sdio->cur_page = NULL;
- if (ret)
- return ret;
- }
+ bio->bi_bdev = parent->bi_bdev;
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_iter.bi_size = size;

- page_cache_get(page); /* It is in dio */
- sdio->cur_page = page;
- sdio->cur_page_offset = offset;
- sdio->cur_page_len = len;
- sdio->cur_page_block = blocknr;
- sdio->cur_page_fs_offset = sdio->block_in_file << sdio->blkbits;
-out:
- /*
- * If sdio->boundary then we want to schedule the IO now to
- * avoid metadata seeks.
- */
- if (sdio->boundary) {
- ret = dio_send_cur_page(dio, sdio, map_bh);
- dio_bio_submit(dio, sdio);
- page_cache_release(sdio->cur_page);
- sdio->cur_page = NULL;
- }
- return ret;
+ bio_chain(bio, parent);
+ submit_bio(WRITE, bio);
}

-/*
- * Clean any dirty buffers in the blockdev mapping which alias newly-created
- * file blocks. Only called for S_ISREG files - blockdevs do not set
- * buffer_new
- */
-static void clean_blockdev_aliases(struct dio *dio, struct buffer_head *map_bh)
+static void dio_zero_partial_block(struct dio *dio, struct dio_submit *sdio,
+ struct bio *bio, loff_t offset,
+ struct dio_mapping *map)
{
- unsigned i;
- unsigned nblocks;
-
- nblocks = map_bh->b_size >> dio->inode->i_blkbits;
-
- for (i = 0; i < nblocks; i++) {
- unmap_underlying_metadata(map_bh->b_bdev,
- map_bh->b_blocknr + i);
+ if ((dio->rw & WRITE) && map->state == MAP_NEW) {
+ unsigned blksize = 1 << sdio->i_blkbits;
+ unsigned blkmask = blksize - 1;
+ unsigned front = offset & blkmask;
+ unsigned back = (offset + bio->bi_iter.bi_size) & blkmask;
+
+ if (front)
+ dio_write_zeroes(dio, bio,
+ bio->bi_iter.bi_sector - (front >> 9),
+ front);
+
+ if (back)
+ dio_write_zeroes(dio, bio, bio_end_sector(bio),
+ blksize - back);
}
}

-/*
- * If we are not writing the entire block and get_block() allocated
- * the block for us, we need to fill-in the unused portion of the
- * block with zeros. This happens only if user-buffer, fileoffset or
- * io length is not filesystem block-size multiple.
- *
- * `end' is zero if we're doing the start of the IO, 1 at the end of the
- * IO.
- */
-static inline void dio_zero_block(struct dio *dio, struct dio_submit *sdio,
- int end, struct buffer_head *map_bh)
+static int dio_send_bio(struct dio *dio, struct dio_submit *sdio,
+ struct bio *bio, loff_t offset)
{
- unsigned dio_blocks_per_fs_block;
- unsigned this_chunk_blocks; /* In dio_blocks */
- unsigned this_chunk_bytes;
- struct page *page;
+ struct dio_mapping map;
+ struct bio *split;
+ int ret;

- sdio->start_zero_done = 1;
- if (!sdio->blkfactor || !buffer_new(map_bh))
- return;
+ while (1) {
+ if (dio->rw == READ && offset >= dio->i_size)
+ break;

- dio_blocks_per_fs_block = 1 << sdio->blkfactor;
- this_chunk_blocks = sdio->block_in_file & (dio_blocks_per_fs_block - 1);
+ ret = get_blocks(dio, sdio, offset, bio->bi_iter.bi_size, &map);
+ if (ret)
+ break;

- if (!this_chunk_blocks)
- return;
+ if (map.state != MAP_UNMAPPED) {
+ split = bio_next_split(bio, map.size >> 9,
+ GFP_KERNEL, fs_bio_set);

- /*
- * We need to zero out part of an fs block. It is either at the
- * beginning or the end of the fs block.
- */
- if (end)
- this_chunk_blocks = dio_blocks_per_fs_block - this_chunk_blocks;
+ if (split != bio)
+ bio_chain(split, bio);

- this_chunk_bytes = this_chunk_blocks << sdio->blkbits;
+ split->bi_bdev = map.bdev;
+ split->bi_iter.bi_sector = map.sector;

- page = ZERO_PAGE(0);
- if (submit_page_section(dio, sdio, page, 0, this_chunk_bytes,
- sdio->next_block_for_io, map_bh))
- return;
+ dio_zero_partial_block(dio, sdio, split, offset, &map);

- sdio->next_block_for_io += this_chunk_blocks;
-}
+ dio->result += map.size;
+ dio_bio_submit(dio, sdio, split, offset);

-/*
- * Walk the user pages, and the file, mapping blocks to disk and generating
- * a sequence of (page,offset,len,block) mappings. These mappings are injected
- * into submit_page_section(), which takes care of the next stage of submission
- *
- * Direct IO against a blockdev is different from a file. Because we can
- * happily perform page-sized but 512-byte aligned IOs. It is important that
- * blockdev IO be able to have fine alignment and large sizes.
- *
- * So what we do is to permit the ->get_block function to populate bh.b_size
- * with the size of IO which is permitted at this offset and this i_blkbits.
- *
- * For best results, the blockdev should be set up with 512-byte i_blkbits and
- * it should set b_size to PAGE_SIZE or more inside get_block(). This gives
- * fine alignment but still allows this function to work in PAGE_SIZE units.
- */
-static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
- struct buffer_head *map_bh)
-{
- const unsigned blkbits = sdio->blkbits;
- const unsigned blocks_per_page = PAGE_SIZE >> blkbits;
- struct page *page;
- unsigned block_in_page;
- int ret = 0;
-
- /* The I/O can start at any block offset within the first page */
- block_in_page = sdio->first_block_in_page;
-
- while (sdio->block_in_file < sdio->final_block_in_request) {
- page = dio_get_page(dio, sdio);
- if (IS_ERR(page)) {
- ret = PTR_ERR(page);
- goto out;
- }
+ if (split == bio)
+ return 0;
+ } else {
+ /* Hole */

- while (block_in_page < blocks_per_page) {
- unsigned offset_in_page = block_in_page << blkbits;
- unsigned this_chunk_bytes; /* # of bytes mapped */
- unsigned this_chunk_blocks; /* # of blocks */
- unsigned u;
-
- if (sdio->blocks_available == 0) {
- /*
- * Need to go and map some more disk
- */
- unsigned long blkmask;
- unsigned long dio_remainder;
-
- ret = get_more_blocks(dio, sdio, map_bh);
- if (ret) {
- page_cache_release(page);
- goto out;
- }
- if (!buffer_mapped(map_bh))
- goto do_holes;
-
- sdio->blocks_available =
- map_bh->b_size >> sdio->blkbits;
- sdio->next_block_for_io =
- map_bh->b_blocknr << sdio->blkfactor;
- if (buffer_new(map_bh))
- clean_blockdev_aliases(dio, map_bh);
-
- if (!sdio->blkfactor)
- goto do_holes;
-
- blkmask = (1 << sdio->blkfactor) - 1;
- dio_remainder = (sdio->block_in_file & blkmask);
-
- /*
- * If we are at the start of IO and that IO
- * starts partway into a fs-block,
- * dio_remainder will be non-zero. If the IO
- * is a read then we can simply advance the IO
- * cursor to the first block which is to be
- * read. But if the IO is a write and the
- * block was newly allocated we cannot do that;
- * the start of the fs block must be zeroed out
- * on-disk
- */
- if (!buffer_new(map_bh))
- sdio->next_block_for_io += dio_remainder;
- sdio->blocks_available -= dio_remainder;
- }
-do_holes:
- /* Handle holes */
- if (!buffer_mapped(map_bh)) {
- loff_t i_size_aligned;
-
- /* AKPM: eargh, -ENOTBLK is a hack */
- if (dio->rw & WRITE) {
- page_cache_release(page);
- return -ENOTBLK;
- }
-
- /*
- * Be sure to account for a partial block as the
- * last block in the file
- */
- i_size_aligned = ALIGN(i_size_read(dio->inode),
- 1 << blkbits);
- if (sdio->block_in_file >=
- i_size_aligned >> blkbits) {
- /* We hit eof */
- page_cache_release(page);
- goto out;
- }
- zero_user(page, block_in_page << blkbits,
- 1 << blkbits);
- sdio->block_in_file++;
- block_in_page++;
- goto next_block;
+ /* AKPM: eargh, -ENOTBLK is a hack */
+ if (dio->rw & WRITE) {
+ ret = -ENOTBLK;
+ break;
}

- /*
- * If we're performing IO which has an alignment which
- * is finer than the underlying fs, go check to see if
- * we must zero out the start of this block.
- */
- if (unlikely(sdio->blkfactor && !sdio->start_zero_done))
- dio_zero_block(dio, sdio, 0, map_bh);
-
- /*
- * Work out, in this_chunk_blocks, how much disk we
- * can add to this page
- */
- this_chunk_blocks = sdio->blocks_available;
- u = (PAGE_SIZE - offset_in_page) >> blkbits;
- if (this_chunk_blocks > u)
- this_chunk_blocks = u;
- u = sdio->final_block_in_request - sdio->block_in_file;
- if (this_chunk_blocks > u)
- this_chunk_blocks = u;
- this_chunk_bytes = this_chunk_blocks << blkbits;
- BUG_ON(this_chunk_bytes == 0);
-
- if (this_chunk_blocks == sdio->blocks_available)
- sdio->boundary = buffer_boundary(map_bh);
- ret = submit_page_section(dio, sdio, page,
- offset_in_page,
- this_chunk_bytes,
- sdio->next_block_for_io,
- map_bh);
- if (ret) {
- page_cache_release(page);
- goto out;
- }
- sdio->next_block_for_io += this_chunk_blocks;
-
- sdio->block_in_file += this_chunk_blocks;
- block_in_page += this_chunk_blocks;
- sdio->blocks_available -= this_chunk_blocks;
-next_block:
- BUG_ON(sdio->block_in_file > sdio->final_block_in_request);
- if (sdio->block_in_file == sdio->final_block_in_request)
+ swap(bio->bi_iter.bi_size, map.size);
+ zero_fill_bio(bio);
+ swap(bio->bi_iter.bi_size, map.size);
+
+ dio->result += map.size;
+ bio_advance(bio, map.size);
+
+ if (!bio->bi_iter.bi_size)
break;
}

- /* Drop the ref which was taken in get_user_pages() */
- page_cache_release(page);
- block_in_page = 0;
+ offset += map.size;
}
-out:
+
+ bio_endio(bio, 0);
return ret;
}

-static inline int drop_refcount(struct dio *dio)
+static int dio_alloc_bios(struct dio *dio, struct dio_submit *sdio,
+ const struct iovec *iov, loff_t offset,
+ unsigned long nr_segs, unsigned nr_pages)
{
- int ret2;
- unsigned long flags;
+ ssize_t ret;
+ size_t seg_done = 0;
+ unsigned seg = 0;
+ struct bio *bio;

- /*
- * Sync will always be dropping the final ref and completing the
- * operation. AIO can if it was a broken operation described above or
- * in fact if all the bios race to complete before we get here. In
- * that case dio_complete() translates the EIOCBQUEUED into the proper
- * return code that the caller will hand to aio_complete().
- *
- * This is managed by the bio_lock instead of being an atomic_t so that
- * completion paths can drop their ref and use the remaining count to
- * decide to wake the submission path atomically.
- */
- spin_lock_irqsave(&dio->bio_lock, flags);
- ret2 = --dio->refcount;
- spin_unlock_irqrestore(&dio->bio_lock, flags);
- return ret2;
+ bio = &dio->bio;
+ bio_get(bio);
+ goto start;
+
+ while (seg < nr_segs) {
+ BUG_ON(!nr_pages);
+
+ bio = bio_alloc(GFP_KERNEL,
+ min_t(unsigned, BIO_MAX_PAGES, nr_pages));
+start:
+ bio->bi_private = dio;
+ bio->bi_end_io = dio_end_io;
+
+ while (bio->bi_vcnt < bio->bi_max_vecs &&
+ seg < nr_segs) {
+ ret = bio_get_user_pages(bio,
+ (size_t) iov[seg].iov_base + seg_done,
+ iov[seg].iov_len - seg_done,
+ dio->rw == READ);
+ if (ret < 0) {
+ struct bio_vec *bv;
+ int i;
+
+ bio_for_each_segment_all(bv, bio, i)
+ page_cache_release(bv->bv_page);
+ bio_put(bio);
+
+ dio->page_error = ret;
+ return 0;
+ }
+
+ seg_done += ret;
+
+ if (seg_done == iov[seg].iov_len) {
+ seg++;
+ seg_done = 0;
+ }
+ }
+
+ nr_pages -= bio->bi_vcnt;
+
+ if (dio->rw == READ)
+ bio_set_pages_dirty(bio);
+
+ atomic_long_inc(&dio->refcount);
+ ret = dio_send_bio(dio, sdio, bio, offset + dio->result);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
}

/*
@@ -1041,76 +474,57 @@ static inline int drop_refcount(struct dio *dio)
* individual fields and will generate much worse code. This is important
* for the whole file.
*/
-static inline ssize_t
+static ssize_t
do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
- struct block_device *bdev, const struct iovec *iov, loff_t offset,
+ struct block_device *bdev, const struct iovec *iov, loff_t offset,
unsigned long nr_segs, get_block_t get_block, dio_iodone_t end_io,
dio_submit_t submit_io, int flags)
{
- int seg;
- size_t size;
- unsigned long addr;
- unsigned i_blkbits = ACCESS_ONCE(inode->i_blkbits);
- unsigned blkbits = i_blkbits;
- unsigned blocksize_mask = (1 << blkbits) - 1;
- ssize_t retval = -EINVAL;
- loff_t end = offset;
+ unsigned nr_pages = 0, blocksize_mask;
+ size_t size = 0;
+ ssize_t retval = 0;
+ const struct iovec *v;
struct dio *dio;
- struct dio_submit sdio = { 0, };
- unsigned long user_addr;
- size_t bytes;
- struct buffer_head map_bh = { 0, };
+ struct dio_submit sdio;
struct blk_plug plug;

if (rw & WRITE)
rw = WRITE_ODIRECT;

- /*
- * Avoid references to bdev if not absolutely needed to give
- * the early prefetch in the caller enough time.
- */
+ sdio.get_block = get_block;
+ sdio.submit_io = submit_io;
+ sdio.i_blkbits = ACCESS_ONCE(inode->i_blkbits);

- if (offset & blocksize_mask) {
- if (bdev)
- blkbits = blksize_bits(bdev_logical_block_size(bdev));
- blocksize_mask = (1 << blkbits) - 1;
- if (offset & blocksize_mask)
- goto out;
- }
+ for (v = iov; v < iov + nr_segs; v++) {
+ unsigned offset = (size_t) v->iov_base & (PAGE_SIZE - 1);

- /* Check the memory alignment. Blocks cannot straddle pages */
- for (seg = 0; seg < nr_segs; seg++) {
- addr = (unsigned long)iov[seg].iov_base;
- size = iov[seg].iov_len;
- end += size;
- if (unlikely((addr & blocksize_mask) ||
- (size & blocksize_mask))) {
- if (bdev)
- blkbits = blksize_bits(
- bdev_logical_block_size(bdev));
- blocksize_mask = (1 << blkbits) - 1;
- if ((addr & blocksize_mask) || (size & blocksize_mask))
- goto out;
- }
+ nr_pages += DIV_ROUND_UP(offset + v->iov_len, PAGE_SIZE);
+ size += v->iov_len;
}

/* watch out for a 0 len io from a tricksy fs */
- if (rw == READ && end == offset)
+ if (rw == READ && !size)
return 0;

- dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
- retval = -ENOMEM;
- if (!dio)
- goto out;
+ blocksize_mask = (1 << sdio.i_blkbits) - 1;
+
/*
- * Believe it or not, zeroing out the page array caused a .5%
- * performance regression in a database benchmark. So, we take
- * care to only zero out what's needed.
+ * Avoid references to bdev if not absolutely needed to give
+ * the early prefetch in the caller enough time.
*/
- memset(dio, 0, offsetof(struct dio, pages));

- dio->flags = flags;
- if (dio->flags & DIO_LOCKING) {
+ if (unlikely((offset & blocksize_mask) ||
+ (size & blocksize_mask))) {
+ if (bdev)
+ blocksize_mask = roundup_pow_of_two(
+ bdev_logical_block_size(bdev)) - 1;
+
+ if ((offset & blocksize_mask) ||
+ (size & blocksize_mask))
+ return -EINVAL;
+ }
+
+ if (flags & DIO_LOCKING) {
if (rw == READ) {
struct address_space *mapping =
iocb->ki_filp->f_mapping;
@@ -1119,11 +533,10 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
mutex_lock(&inode->i_mutex);

retval = filemap_write_and_wait_range(mapping, offset,
- end - 1);
+ offset + size - 1);
if (retval) {
mutex_unlock(&inode->i_mutex);
- kmem_cache_free(dio_cache, dio);
- goto out;
+ return retval;
}
}
}
@@ -1133,83 +546,27 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
*/
atomic_inc(&inode->i_dio_count);

- /*
- * For file extending writes updating i_size before data
- * writeouts complete can expose uninitialized blocks. So
- * even for AIO, we need to wait for i/o to complete before
- * returning in this case.
- */
- dio->is_async = !is_sync_kiocb(iocb) && !((rw & WRITE) &&
- (end > i_size_read(inode)));
-
- retval = 0;
-
- dio->inode = inode;
- dio->rw = rw;
- sdio.blkbits = blkbits;
- sdio.blkfactor = i_blkbits - blkbits;
- sdio.block_in_file = offset >> blkbits;
-
- sdio.get_block = get_block;
- dio->end_io = end_io;
- sdio.submit_io = submit_io;
- sdio.final_block_in_bio = -1;
- sdio.next_block_for_io = -1;
-
- dio->iocb = iocb;
- dio->i_size = i_size_read(inode);
-
- spin_lock_init(&dio->bio_lock);
- dio->refcount = 1;
-
- /*
- * In case of non-aligned buffers, we may need 2 more
- * pages since we need to zero out first and last block.
- */
- if (unlikely(sdio.blkfactor))
- sdio.pages_in_io = 2;
-
- for (seg = 0; seg < nr_segs; seg++) {
- user_addr = (unsigned long)iov[seg].iov_base;
- sdio.pages_in_io +=
- ((user_addr + iov[seg].iov_len + PAGE_SIZE-1) /
- PAGE_SIZE - user_addr / PAGE_SIZE);
- }
+ dio = container_of(bio_alloc_bioset(GFP_KERNEL,
+ min_t(unsigned, BIO_MAX_PAGES, nr_pages),
+ dio_pool),
+ struct dio, bio);
+
+ dio->flags = flags;
+ dio->rw = rw;
+ dio->inode = inode;
+ dio->i_size = i_size_read(inode);
+ dio->end_io = end_io;
+ dio->private = NULL;
+ dio->page_error = 0;
+ dio->io_error = 0;
+ atomic_long_set(&dio->refcount, 1);
+ dio->waiter = current;
+ dio->iocb = iocb;
+ dio->result = 0;

blk_start_plug(&plug);

- for (seg = 0; seg < nr_segs; seg++) {
- user_addr = (unsigned long)iov[seg].iov_base;
- sdio.size += bytes = iov[seg].iov_len;
-
- /* Index into the first page of the first block */
- sdio.first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
- sdio.final_block_in_request = sdio.block_in_file +
- (bytes >> blkbits);
- /* Page fetching state */
- sdio.head = 0;
- sdio.tail = 0;
- sdio.curr_page = 0;
-
- sdio.total_pages = 0;
- if (user_addr & (PAGE_SIZE-1)) {
- sdio.total_pages++;
- bytes -= PAGE_SIZE - (user_addr & (PAGE_SIZE - 1));
- }
- sdio.total_pages += (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
- sdio.curr_user_address = user_addr;
-
- retval = do_direct_IO(dio, &sdio, &map_bh);
-
- dio->result += iov[seg].iov_len -
- ((sdio.final_block_in_request - sdio.block_in_file) <<
- blkbits);
-
- if (retval) {
- dio_cleanup(dio, &sdio);
- break;
- }
- } /* end iovec loop */
+ retval = dio_alloc_bios(dio, &sdio, iov, offset, nr_segs, nr_pages);

if (retval == -ENOTBLK) {
/*
@@ -1218,33 +575,10 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
*/
retval = 0;
}
- /*
- * There may be some unwritten disk at the end of a part-written
- * fs-block-sized block. Go zero that now.
- */
- dio_zero_block(dio, &sdio, 1, &map_bh);
-
- if (sdio.cur_page) {
- ssize_t ret2;
-
- ret2 = dio_send_cur_page(dio, &sdio, &map_bh);
- if (retval == 0)
- retval = ret2;
- page_cache_release(sdio.cur_page);
- sdio.cur_page = NULL;
- }
- if (sdio.bio)
- dio_bio_submit(dio, &sdio);

blk_finish_plug(&plug);

/*
- * It is possible that, we return short IO due to end of file.
- * In that case, we need to release all the pages we got hold on.
- */
- dio_cleanup(dio, &sdio);
-
- /*
* All block lookups have been performed. For READ requests
* we can let i_mutex go now that its achieved its purpose
* of protecting us from looking up uninitialized blocks.
@@ -1260,20 +594,28 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
* This had *better* be the only place that raises -EIOCBQUEUED.
*/
BUG_ON(retval == -EIOCBQUEUED);
- if (dio->is_async && retval == 0 && dio->result &&
- ((rw == READ) || (dio->result == sdio.size)))
- retval = -EIOCBQUEUED;

- if (retval != -EIOCBQUEUED)
- dio_await_completion(dio);
-
- if (drop_refcount(dio) == 0) {
+ /*
+ * For file extending writes updating i_size before data
+ * writeouts complete can expose uninitialized blocks. So
+ * even for AIO, we need to wait for i/o to complete before
+ * returning in this case.
+ */
+ if (!is_sync_kiocb(iocb) &&
+ retval == 0 && dio->result &&
+ ((rw == READ) ||
+ (offset + size <= dio->i_size &&
+ dio->result == size))) {
+ if (atomic_long_dec_and_test(&dio->refcount))
+ retval = dio_complete(dio, offset, retval, false);
+ else
+ retval = -EIOCBQUEUED;
+ } else {
+ dio_wait_completion(dio);
retval = dio_complete(dio, offset, retval, false);
- kmem_cache_free(dio_cache, dio);
- } else
- BUG_ON(retval != -EIOCBQUEUED);
+ BUG_ON(retval == -EIOCBQUEUED);
+ }

-out:
return retval;
}

@@ -1304,7 +646,9 @@ EXPORT_SYMBOL(__blockdev_direct_IO);

static __init int dio_init(void)
{
- dio_cache = KMEM_CACHE(dio, SLAB_PANIC);
+ dio_pool = bioset_create(4, offsetof(struct dio, bio));
+ if (!dio_pool)
+ return -ENOMEM;
return 0;
}
module_init(dio_init)
--
1.8.3.rc1
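
The allocation pattern in dio_init() above - bioset_create() with a
front_pad of offsetof(struct dio, bio), then recovering the containing
struct from the bio that bio_alloc_bioset() hands back - is worth
spelling out, since it's what lets struct dio ride along with its first
bio for free. A minimal sketch of the idiom, with hypothetical names:

struct my_req {
	int		error;		/* per-request state... */
	struct bio	bio;		/* must be last: the bio's inline
					 * biovec follows it in memory */
};

/* created once with bioset_create(4, offsetof(struct my_req, bio)) */
static struct bio_set *my_pool;

static struct my_req *my_req_alloc(unsigned nr_vecs)
{
	struct bio *bio = bio_alloc_bioset(GFP_KERNEL, nr_vecs, my_pool);

	if (!bio)
		return NULL;

	/* front_pad bytes sit just before *bio: step back to the wrapper */
	return container_of(bio, struct my_req, bio);
}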

2013-06-09 02:20:11

by Kent Overstreet

Subject: [PATCH 21/26] block: Move bouncing to generic_make_request()

The next patch is going to make generic_make_request() handle arbitrarily
sized bios by splitting them if necessary. It makes more sense to call
blk_queue_bounce() first, before the bios have been fragmented.

Also, __blk_recalc_rq_segments() no longer has to take potential bouncing
into account - by the time it runs, bouncing has already been done.
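
A simplified sketch of the resulting submission loop (illustrative only;
the blk-core.c hunk below is authoritative):

do {
	struct request_queue *q = bdev_get_queue(bio->bi_bdev);

	blk_queue_bounce(q, &bio);	/* may swap in lowmem bounce pages */

	q->make_request_fn(q, bio);	/* driver sees the final pages */

	bio = bio_list_pop(current->bio_list);
} while (bio);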

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Asai Thambi S P <[email protected]>
---
block/blk-core.c | 14 +++++++-------
block/blk-merge.c | 15 ++++-----------
drivers/block/mtip32xx/mtip32xx.c | 2 --
drivers/block/pktcdvd.c | 2 --
4 files changed, 11 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 0704c5c..4d6eb60 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1470,13 +1470,6 @@ void blk_queue_bio(struct request_queue *q, struct bio *bio)
struct request *req;
unsigned int request_count = 0;

- /*
- * low level driver can indicate that it wants pages above a
- * certain limit bounced to low memory (ie for highmem, or even
- * ISA dma in theory)
- */
- blk_queue_bounce(q, &bio);
-
if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
bio_endio(bio, -EIO);
return;
@@ -1828,6 +1821,13 @@ void generic_make_request(struct bio *bio)
do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);

+ /*
+ * low level driver can indicate that it wants pages above a
+ * certain limit bounced to low memory (ie for highmem, or even
+ * ISA dma in theory)
+ */
+ blk_queue_bounce(q, &bio);
+
q->make_request_fn(q, bio);

bio = bio_list_pop(current->bio_list);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index b53ddac..ba48830 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -13,7 +13,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
struct bio *bio)
{
struct bio_vec bv, bvprv;
- int cluster, high, highprv = 1;
+ int cluster, prev = 0;
unsigned int seg_size, nr_phys_segs;
struct bio *fbio, *bbio;
struct bvec_iter iter;
@@ -27,15 +27,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
nr_phys_segs = 0;
for_each_bio(bio) {
bio_for_each_segment(bv, bio, iter) {
- /*
- * the trick here is making sure that a high page is
- * never considered part of another segment, since that
- * might change with the bounce page.
- */
- high = page_to_pfn(bv.bv_page) > queue_bounce_pfn(q);
- if (high || highprv)
- goto new_segment;
- if (cluster) {
+ if (prev && cluster) {
if (seg_size + bv.bv_len
> queue_max_segment_size(q))
goto new_segment;
@@ -46,6 +38,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,

seg_size += bv.bv_len;
bvprv = bv;
+ prev = 1;
continue;
}
new_segment:
@@ -55,8 +48,8 @@ new_segment:

nr_phys_segs++;
bvprv = bv;
+ prev = 1;
seg_size = bv.bv_len;
- highprv = high;
}
bbio = bio;
}
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 9fd1751..089cda3 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3912,8 +3912,6 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)

sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
if (likely(sg != NULL)) {
- blk_queue_bounce(queue, &bio);
-
if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
dev_warn(&dd->pdev->dev,
"Maximum number of SGL entries exceeded\n");
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index a929817..3f6ad2b 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2486,8 +2486,6 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
goto end_io;
}

- blk_queue_bounce(q, &bio);
-
do {
sector_t zone = ZONE(bio->bi_iter.bi_sector, pd);
sector_t last_zone = ZONE(bio_end_sector(bio) - 1, pd);
--
1.8.3.rc1

2013-06-09 02:21:15

by Kent Overstreet

Subject: [PATCH 24/26] bcache: generic_make_request() handles large bios now

So we get to delete our hacky workaround.
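
The only obligation left on bcache's side is to advertise that its
queues cope with arbitrary-size bios, so generic_make_request() skips
the split entirely - that's the QUEUE_FLAG_LARGEBIOS flag from patch 22,
set in the super.c hunk below:

	set_bit(QUEUE_FLAG_LARGEBIOS, &d->disk->queue->queue_flags);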

Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/md/bcache/bcache.h | 18 --------
drivers/md/bcache/debug.c | 2 +-
drivers/md/bcache/io.c | 100 +-----------------------------------------
drivers/md/bcache/journal.c | 4 +-
drivers/md/bcache/request.c | 14 +++---
drivers/md/bcache/super.c | 40 ++---------------
drivers/md/bcache/util.h | 4 +-
drivers/md/bcache/writeback.c | 4 +-
8 files changed, 19 insertions(+), 167 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 14aaff5..384a800 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -407,19 +407,6 @@ struct keybuf {
DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
};

-struct bio_split_pool {
- struct bio_set *bio_split;
- mempool_t *bio_split_hook;
-};
-
-struct bio_split_hook {
- struct closure cl;
- struct bio_split_pool *p;
- struct bio *bio;
- bio_end_io_t *bi_end_io;
- void *bi_private;
-};
-
struct bcache_device {
struct closure cl;

@@ -450,8 +437,6 @@ struct bcache_device {
int (*cache_miss)(struct btree *, struct search *,
struct bio *, unsigned);
int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
- struct bio_split_pool bio_split_hook;
};

struct io {
@@ -639,8 +624,6 @@ struct cache {
atomic_long_t meta_sectors_written;
atomic_long_t btree_sectors_written;
atomic_long_t sectors_written;
-
- struct bio_split_pool bio_split_hook;
};

struct gc_stat {
@@ -1184,7 +1167,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
void bch_bbio_free(struct bio *, struct cache_set *);
struct bio *bch_bbio_alloc(struct cache_set *);

-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
void __bch_submit_bbio(struct bio *, struct cache_set *);
void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 2077848..3b60e60 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -204,7 +204,7 @@ void bch_data_verify(struct search *s)
check->bi_private = cl;
check->bi_end_io = data_verify_endio;

- closure_bio_submit(check, cl, &dc->disk);
+ closure_bio_submit(check, cl);
closure_sync(cl);

bio_for_each_segment(bv, s->orig_bio, iter) {
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index e3c582d..4b4d1fa 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -9,104 +9,6 @@
#include "bset.h"
#include "debug.h"

-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
- struct request_queue *q = bdev_get_queue(bio->bi_bdev);
- struct bio_vec bv;
- struct bvec_iter iter;
- unsigned ret = 0, seg = 0;
-
- if (bio->bi_rw & REQ_DISCARD)
- return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
- bio_for_each_segment(bv, bio, iter) {
- struct bvec_merge_data bvm = {
- .bi_bdev = bio->bi_bdev,
- .bi_sector = bio->bi_iter.bi_sector,
- .bi_size = ret << 9,
- .bi_rw = bio->bi_rw,
- };
-
- if (seg == min_t(unsigned, BIO_MAX_PAGES,
- queue_max_segments(q)))
- break;
-
- if (q->merge_bvec_fn &&
- q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
- break;
-
- seg++;
- ret += bv.bv_len >> 9;
- }
-
- ret = min(ret, queue_max_sectors(q));
-
- WARN_ON(!ret);
- ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
- return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
- struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
- s->bio->bi_end_io = s->bi_end_io;
- s->bio->bi_private = s->bi_private;
- s->bio->bi_end_io(s->bio, 0);
-
- closure_debug_destroy(&s->cl);
- mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
- struct closure *cl = bio->bi_private;
- struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
- if (error)
- clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
- bio_put(bio);
- closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
- struct bio_split_hook *s;
- struct bio *n;
-
- if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
- goto submit;
-
- if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
- goto submit;
-
- s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
- closure_init(&s->cl, NULL);
-
- s->bio = bio;
- s->p = p;
- s->bi_end_io = bio->bi_end_io;
- s->bi_private = bio->bi_private;
- bio_get(bio);
-
- do {
- n = bio_next_split(bio, bch_bio_max_sectors(bio),
- GFP_NOIO, s->p->bio_split);
-
- n->bi_end_io = bch_bio_submit_split_endio;
- n->bi_private = &s->cl;
-
- closure_get(&s->cl);
- generic_make_request(n);
- } while (n != bio);
-
- continue_at(&s->cl, bch_bio_submit_split_done, NULL);
-submit:
- generic_make_request(bio);
-}
-
/* Bios with headers */

void bch_bbio_free(struct bio *bio, struct cache_set *c)
@@ -136,7 +38,7 @@ void __bch_submit_bbio(struct bio *bio, struct cache_set *c)
bio->bi_bdev = PTR_CACHE(c, &b->key, 0)->bdev;

b->submit_time_us = local_clock_us();
- closure_bio_submit(bio, bio->bi_private, PTR_CACHE(c, &b->key, 0));
+ closure_bio_submit(bio, bio->bi_private);
}

void bch_submit_bbio(struct bio *bio, struct cache_set *c,
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 724cb7a..c52e1cf 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -56,7 +56,7 @@ reread: left = ca->sb.bucket_size - offset;
bio->bi_private = &op->cl;
bch_bio_map(bio, data);

- closure_bio_submit(bio, &op->cl, ca);
+ closure_bio_submit(bio, &op->cl);
closure_sync(&op->cl);

/* This function could be simpler now since we no longer write
@@ -639,7 +639,7 @@ static void journal_write_unlocked(struct closure *cl)
spin_unlock(&c->journal.lock);

while ((bio = bio_list_pop(&list)))
- closure_bio_submit(bio, cl, c->cache[0]);
+ closure_bio_submit(bio, cl);

continue_at(cl, journal_write_done, NULL);
}
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 3c95b0e..762a143 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -779,7 +779,7 @@ static void request_read_error(struct closure *cl)
/* XXX: invalidate cache */

trace_bcache_read_retry(&s->bio.bio);
- closure_bio_submit(&s->bio.bio, &s->cl, s->d);
+ closure_bio_submit(&s->bio.bio, &s->cl);
}

continue_at(cl, cached_dev_read_complete, NULL);
@@ -903,14 +903,14 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
bio_get(s->op.cache_bio);

trace_bcache_cache_miss(s->orig_bio);
- closure_bio_submit(s->op.cache_bio, &s->cl, s->d);
+ closure_bio_submit(s->op.cache_bio, &s->cl);

return ret;
out_put:
bio_put(s->op.cache_bio);
s->op.cache_bio = NULL;
out_submit:
- closure_bio_submit(miss, &s->cl, s->d);
+ closure_bio_submit(miss, &s->cl);
return ret;
}

@@ -978,7 +978,7 @@ static void request_write(struct cached_dev *dc, struct search *s)
dc->disk.bio_split);

trace_bcache_writethrough(s->orig_bio);
- closure_bio_submit(bio, cl, s->d);
+ closure_bio_submit(bio, cl);
} else {
s->op.cache_bio = bio;
trace_bcache_writeback(s->orig_bio);
@@ -997,7 +997,7 @@ skip:
!blk_queue_discard(bdev_get_queue(dc->bdev)))
goto out;

- closure_bio_submit(bio, cl, s->d);
+ closure_bio_submit(bio, cl);
goto out;
}

@@ -1014,7 +1014,7 @@ static void request_nodata(struct cached_dev *dc, struct search *s)
if (s->op.flush_journal)
bch_journal_meta(s->op.c, cl);

- closure_bio_submit(bio, cl, s->d);
+ closure_bio_submit(bio, cl);

continue_at(cl, cached_dev_bio_complete, NULL);
}
@@ -1171,7 +1171,7 @@ static void cached_dev_make_request(struct request_queue *q, struct bio *bio)
!blk_queue_discard(bdev_get_queue(dc->bdev)))
bio_endio(bio, 0);
else
- bch_generic_make_request(bio, &d->bio_split_hook);
+ generic_make_request(bio);
}
}

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 668b50e..e67b839 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -62,29 +62,6 @@ struct workqueue_struct *bcache_wq;

#define BTREE_MAX_PAGES (256 * 1024 / PAGE_SIZE)

-static void bio_split_pool_free(struct bio_split_pool *p)
-{
- if (p->bio_split_hook)
- mempool_destroy(p->bio_split_hook);
-
- if (p->bio_split)
- bioset_free(p->bio_split);
-}
-
-static int bio_split_pool_init(struct bio_split_pool *p)
-{
- p->bio_split = bioset_create(4, 0);
- if (!p->bio_split)
- return -ENOMEM;
-
- p->bio_split_hook = mempool_create_kmalloc_pool(4,
- sizeof(struct bio_split_hook));
- if (!p->bio_split_hook)
- return -ENOMEM;
-
- return 0;
-}
-
/* Superblock */

static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
@@ -515,7 +492,7 @@ static void prio_io(struct cache *ca, uint64_t bucket, unsigned long rw)
bio->bi_private = ca;
bch_bio_map(bio, ca->disk_buckets);

- closure_bio_submit(bio, &ca->prio, ca);
+ closure_bio_submit(bio, &ca->prio);
closure_sync(cl);
}

@@ -739,8 +716,6 @@ static void bcache_device_free(struct bcache_device *d)
blk_cleanup_queue(d->disk->queue);
if (d->disk)
put_disk(d->disk);
-
- bio_split_pool_free(&d->bio_split_hook);
if (d->bio_split)
bioset_free(d->bio_split);

@@ -752,12 +727,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size)
struct request_queue *q;

if (!(d->bio_split = bioset_create(4, offsetof(struct bbio, bio))) ||
- bio_split_pool_init(&d->bio_split_hook))
-
- return -ENOMEM;
-
- d->disk = alloc_disk(1);
- if (!d->disk)
+ !(d->disk = alloc_disk(1)))
return -ENOMEM;

snprintf(d->disk->disk_name, DISK_NAME_LEN, "bcache%i", bcache_minor);
@@ -785,6 +755,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned block_size)
q->limits.physical_block_size = block_size;
set_bit(QUEUE_FLAG_NONROT, &d->disk->queue->queue_flags);
set_bit(QUEUE_FLAG_DISCARD, &d->disk->queue->queue_flags);
+ set_bit(QUEUE_FLAG_LARGEBIOS, &d->disk->queue->queue_flags);

return 0;
}
@@ -1682,8 +1653,6 @@ void bch_cache_release(struct kobject *kobj)

bch_cache_allocator_exit(ca);

- bio_split_pool_free(&ca->bio_split_hook);
-
if (ca->alloc_workqueue)
destroy_workqueue(ca->alloc_workqueue);

@@ -1743,8 +1712,7 @@ static int cache_alloc(struct cache_sb *sb, struct cache *ca)
!(ca->prio_buckets = kzalloc(sizeof(uint64_t) * prio_buckets(ca) *
2, GFP_KERNEL)) ||
!(ca->disk_buckets = alloc_bucket_pages(GFP_KERNEL, ca)) ||
- !(ca->alloc_workqueue = alloc_workqueue("bch_allocator", 0, 1)) ||
- bio_split_pool_init(&ca->bio_split_hook))
+ !(ca->alloc_workqueue = alloc_workqueue("bch_allocator", 0, 1)))
goto err;

ca->prio_last_buckets = ca->prio_buckets + prio_buckets(ca);
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index c9b3806..e78d4c5 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -573,10 +573,10 @@ static inline sector_t bdev_sectors(struct block_device *bdev)
return bdev->bd_inode->i_size >> 9;
}

-#define closure_bio_submit(bio, cl, dev) \
+#define closure_bio_submit(bio, cl) \
do { \
closure_get(cl); \
- bch_generic_make_request(bio, &(dev)->bio_split_hook); \
+ generic_make_request(bio); \
} while (0)

uint64_t bch_crc64_update(uint64_t, const void *, size_t);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 61bc071..b4719f1 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -277,7 +277,7 @@ static void write_dirty(struct closure *cl)
io->bio.bi_end_io = dirty_endio;

trace_bcache_write_dirty(&io->bio);
- closure_bio_submit(&io->bio, cl, &io->dc->disk);
+ closure_bio_submit(&io->bio, cl);

continue_at(cl, write_dirty_finish, dirty_wq);
}
@@ -298,7 +298,7 @@ static void read_dirty_submit(struct closure *cl)
struct dirty_io *io = container_of(cl, struct dirty_io, cl);

trace_bcache_read_dirty(&io->bio);
- closure_bio_submit(&io->bio, cl, &io->dc->disk);
+ closure_bio_submit(&io->bio, cl);

continue_at(cl, write_dirty, dirty_wq);
}
--
1.8.3.rc1

2013-06-09 02:21:42

by Kent Overstreet

Subject: [PATCH 23/26] blk-lib.c: generic_make_request() handles large bios now

generic_make_request() will now do for us what the code in blk-lib.c was
doing manually, with the bio_batch stuff. We still need some looping, in
case we're trying to discard/zeroout more than around a gigabyte at a
time - but when we can submit that much in a single bio, doing the
submissions in parallel really shouldn't matter.
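
For illustration, a hypothetical caller - after this patch the helper is
just synchronous, chunked submission via submit_bio_wait():

	int err = blkdev_issue_discard(bdev, sector, nr_sects,
				       GFP_KERNEL, 0);
	if (err && err != -EOPNOTSUPP)
		pr_err("discard of %llu sectors failed: %d\n",
		       (unsigned long long) nr_sects, err);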

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
block/blk-lib.c | 177 ++++++++++----------------------------------------------
1 file changed, 30 insertions(+), 147 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 3250620..368c36a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -9,23 +9,6 @@

#include "blk.h"

-struct bio_batch {
- atomic_t done;
- unsigned long flags;
- struct completion *wait;
-};
-
-static void bio_batch_end_io(struct bio *bio, int err)
-{
- struct bio_batch *bb = bio->bi_private;
-
- if (err && (err != -EOPNOTSUPP))
- clear_bit(BIO_UPTODATE, &bb->flags);
- if (atomic_dec_and_test(&bb->done))
- complete(bb->wait);
- bio_put(bio);
-}
-
/**
* blkdev_issue_discard - queue a discard
* @bdev: blockdev to issue discard for
@@ -40,15 +23,10 @@ static void bio_batch_end_io(struct bio *bio, int err)
int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
{
- DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
- sector_t max_discard_sectors;
- sector_t granularity, alignment;
- struct bio_batch bb;
struct bio *bio;
int ret = 0;
- struct blk_plug plug;

if (!q)
return -ENXIO;
@@ -56,80 +34,28 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;

- /* Zero-sector (unknown) and one-sector granularities are the same. */
- granularity = max(q->limits.discard_granularity >> 9, 1U);
- alignment = bdev_discard_alignment(bdev) >> 9;
- alignment = sector_div(alignment, granularity);
-
- /*
- * Ensure that max_discard_sectors is of the proper
- * granularity, so that requests stay aligned after a split.
- */
- max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
- sector_div(max_discard_sectors, granularity);
- max_discard_sectors *= granularity;
- if (unlikely(!max_discard_sectors)) {
- /* Avoid infinite loop below. Being cautious never hurts. */
- return -EOPNOTSUPP;
- }
-
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
type |= REQ_SECURE;
}

- atomic_set(&bb.done, 1);
- bb.flags = 1 << BIO_UPTODATE;
- bb.wait = &wait;
-
- blk_start_plug(&plug);
while (nr_sects) {
- unsigned int req_sects;
- sector_t end_sect, tmp;
-
bio = bio_alloc(gfp_mask, 1);
- if (!bio) {
- ret = -ENOMEM;
- break;
- }
-
- req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
- /*
- * If splitting a request, and the next starting sector would be
- * misaligned, stop the discard at the previous aligned sector.
- */
- end_sect = sector + req_sects;
- tmp = end_sect;
- if (req_sects < nr_sects &&
- sector_div(tmp, granularity) != alignment) {
- end_sect = end_sect - alignment;
- sector_div(end_sect, granularity);
- end_sect = end_sect * granularity + alignment;
- req_sects = end_sect - sector;
- }
+ if (!bio)
+ return -ENOMEM;

- bio->bi_iter.bi_sector = sector;
- bio->bi_end_io = bio_batch_end_io;
bio->bi_bdev = bdev;
- bio->bi_private = &bb;
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_iter.bi_size = min_t(sector_t, nr_sects, 1 << 20) << 9;

- bio->bi_iter.bi_size = req_sects << 9;
- nr_sects -= req_sects;
- sector = end_sect;
+ sector += bio_sectors(bio);
+ nr_sects -= bio_sectors(bio);

- atomic_inc(&bb.done);
- submit_bio(type, bio);
+ ret = submit_bio_wait(type, bio);
+ if (ret)
+ break;
}
- blk_finish_plug(&plug);
-
- /* Wait for bios in-flight */
- if (!atomic_dec_and_test(&bb.done))
- wait_for_completion_io(&wait);
-
- if (!test_bit(BIO_UPTODATE, &bb.flags))
- ret = -EIO;

return ret;
}
@@ -150,61 +76,37 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask,
struct page *page)
{
- DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
- unsigned int max_write_same_sectors;
- struct bio_batch bb;
struct bio *bio;
int ret = 0;

if (!q)
return -ENXIO;

- max_write_same_sectors = q->limits.max_write_same_sectors;
-
- if (max_write_same_sectors == 0)
+ if (!q->limits.max_write_same_sectors)
return -EOPNOTSUPP;

- atomic_set(&bb.done, 1);
- bb.flags = 1 << BIO_UPTODATE;
- bb.wait = &wait;
-
while (nr_sects) {
bio = bio_alloc(gfp_mask, 1);
- if (!bio) {
- ret = -ENOMEM;
- break;
- }
+ if (!bio)
+ return -ENOMEM;

- bio->bi_iter.bi_sector = sector;
- bio->bi_end_io = bio_batch_end_io;
bio->bi_bdev = bdev;
- bio->bi_private = &bb;
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_iter.bi_size = min_t(sector_t, nr_sects, 1 << 20) << 9;
bio->bi_vcnt = 1;
bio->bi_io_vec->bv_page = page;
bio->bi_io_vec->bv_offset = 0;
bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);

- if (nr_sects > max_write_same_sectors) {
- bio->bi_iter.bi_size = max_write_same_sectors << 9;
- nr_sects -= max_write_same_sectors;
- sector += max_write_same_sectors;
- } else {
- bio->bi_iter.bi_size = nr_sects << 9;
- nr_sects = 0;
- }
+ sector += bio_sectors(bio);
+ nr_sects -= bio_sectors(bio);

- atomic_inc(&bb.done);
- submit_bio(REQ_WRITE | REQ_WRITE_SAME, bio);
+ ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
+ if (ret)
+ break;
}

- /* Wait for bios in-flight */
- if (!atomic_dec_and_test(&bb.done))
- wait_for_completion_io(&wait);
-
- if (!test_bit(BIO_UPTODATE, &bb.flags))
- ret = -ENOTSUPP;
-
return ret;
}
EXPORT_SYMBOL(blkdev_issue_write_same);
@@ -219,33 +121,22 @@ EXPORT_SYMBOL(blkdev_issue_write_same);
* Description:
* Generate and issue number of bios with zerofiled pages.
*/
-
int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
- sector_t nr_sects, gfp_t gfp_mask)
+ sector_t nr_sects, gfp_t gfp_mask)
{
- int ret;
+ int ret = 0;
struct bio *bio;
- struct bio_batch bb;
unsigned int sz;
- DECLARE_COMPLETION_ONSTACK(wait);
-
- atomic_set(&bb.done, 1);
- bb.flags = 1 << BIO_UPTODATE;
- bb.wait = &wait;

- ret = 0;
- while (nr_sects != 0) {
+ while (nr_sects) {
bio = bio_alloc(gfp_mask,
- min(nr_sects, (sector_t)BIO_MAX_PAGES));
- if (!bio) {
- ret = -ENOMEM;
- break;
- }
+				min(DIV_ROUND_UP(nr_sects, PAGE_SIZE >> 9),
+				    (sector_t)BIO_MAX_PAGES));
+ if (!bio)
+ return -ENOMEM;

bio->bi_iter.bi_sector = sector;
bio->bi_bdev = bdev;
- bio->bi_end_io = bio_batch_end_io;
- bio->bi_private = &bb;

while (nr_sects != 0) {
sz = min((sector_t) PAGE_SIZE >> 9 , nr_sects);
@@ -255,18 +146,11 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
if (ret < (sz << 9))
break;
}
- ret = 0;
- atomic_inc(&bb.done);
- submit_bio(WRITE, bio);
- }
-
- /* Wait for bios in-flight */
- if (!atomic_dec_and_test(&bb.done))
- wait_for_completion_io(&wait);

- if (!test_bit(BIO_UPTODATE, &bb.flags))
- /* One of bios in the batch was completed with error.*/
- ret = -EIO;
+ ret = submit_bio_wait(WRITE, bio);
+ if (ret)
+ break;
+ }

return ret;
}
@@ -281,7 +165,6 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
* Description:
* Generate and issue number of bios with zerofiled pages.
*/
-
int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
sector_t nr_sects, gfp_t gfp_mask)
{
--
1.8.3.rc1

2013-06-09 02:22:03

by Kent Overstreet

Subject: [PATCH 22/26] block: Make generic_make_request handle arbitrary sized bios

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them.
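
To make the payoff concrete, a hedged sketch of what a submitter can now
do (bdev, sector and pages[] are assumed to be in scope; the split
happens transparently in generic_make_request()):

	struct bio *bio = bio_alloc(GFP_NOIO, BIO_MAX_PAGES);
	unsigned i;
	int err;

	bio->bi_bdev = bdev;
	bio->bi_iter.bi_sector = sector;
	bio->bi_vcnt = BIO_MAX_PAGES;
	bio->bi_iter.bi_size = BIO_MAX_PAGES * PAGE_SIZE;

	for (i = 0; i < BIO_MAX_PAGES; i++) {
		bio->bi_io_vec[i].bv_page	= pages[i];
		bio->bi_io_vec[i].bv_len	= PAGE_SIZE;
		bio->bi_io_vec[i].bv_offset	= 0;
	}

	/* No queue-limit checks needed; oversized bios get split below us */
	err = submit_bio_wait(WRITE, bio);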

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
---
block/blk-core.c | 24 ++++++----
block/blk-merge.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++
block/blk.h | 3 ++
include/linux/blkdev.h | 4 ++
4 files changed, 142 insertions(+), 9 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 4d6eb60..f43bf1a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -599,6 +599,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
if (q->id < 0)
goto fail_q;

+ q->bio_split = bioset_create(4, 0);
+ if (!q->bio_split)
+		goto fail_id;
+
q->backing_dev_info.ra_pages =
(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
q->backing_dev_info.state = 0;
@@ -651,6 +655,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)

fail_id:
	ida_simple_remove(&blk_queue_ida, q->id);
+	if (q->bio_split)
+		bioset_free(q->bio_split);
fail_q:
kmem_cache_free(blk_requestq_cachep, q);
return NULL;
@@ -1687,15 +1693,6 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}

- if (likely(bio_is_rw(bio) &&
- nr_sectors > queue_max_hw_sectors(q))) {
- printk(KERN_ERR "bio too big device %s (%u > %u)\n",
- bdevname(bio->bi_bdev, b),
- bio_sectors(bio),
- queue_max_hw_sectors(q));
- goto end_io;
- }
-
part = bio->bi_bdev->bd_part;
if (should_fail_request(part, bio->bi_iter.bi_size) ||
should_fail_request(&part_to_disk(part)->part0,
@@ -1820,6 +1817,7 @@ void generic_make_request(struct bio *bio)
current->bio_list = &bio_list_on_stack;
do {
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+ struct bio *split = NULL;

/*
* low level driver can indicate that it wants pages above a
@@ -1828,6 +1826,14 @@ void generic_make_request(struct bio *bio)
*/
blk_queue_bounce(q, &bio);

+ if (!blk_queue_largebios(q))
+ split = blk_bio_segment_split(q, bio, q->bio_split);
+ if (split) {
+ bio_chain(split, bio);
+ bio_list_add(current->bio_list, bio);
+ bio = split;
+ }
+
q->make_request_fn(q, bio);

bio = bio_list_pop(current->bio_list);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index ba48830..fbbcfc5 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -9,6 +9,126 @@

#include "blk.h"

+static struct bio *blk_bio_discard_split(struct request_queue *q,
+ struct bio *bio,
+ struct bio_set *bs)
+{
+ sector_t max_discard_sectors, granularity, alignment, tmp;
+ unsigned split_sectors;
+
+ /* Zero-sector (unknown) and one-sector granularities are the same. */
+ granularity = max(q->limits.discard_granularity >> 9, 1U);
+
+ max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+ sector_div(max_discard_sectors, granularity);
+ max_discard_sectors *= granularity;
+
+ if (unlikely(!max_discard_sectors)) {
+ /* XXX: warn */
+ return NULL;
+ }
+
+ if (bio_sectors(bio) <= max_discard_sectors)
+ return NULL;
+
+ split_sectors = max_discard_sectors;
+
+ /*
+ * If splitting a request, and the next starting sector would be
+ * misaligned, stop the discard at the previous aligned sector.
+ */
+ alignment = q->limits.discard_alignment >> 9;
+ alignment = sector_div(alignment, granularity);
+
+ tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
+ tmp = sector_div(tmp, granularity);
+
+ if (split_sectors > tmp)
+ split_sectors -= tmp;
+
+ return bio_split(bio, split_sectors, GFP_NOIO, bs);
+}
+
+static struct bio *blk_bio_write_same_split(struct request_queue *q,
+ struct bio *bio,
+ struct bio_set *bs)
+{
+ if (!q->limits.max_write_same_sectors)
+ return NULL;
+
+ if (bio_sectors(bio) <= q->limits.max_write_same_sectors)
+ return NULL;
+
+ return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
+}
+
+struct bio *blk_bio_segment_split(struct request_queue *q, struct bio *bio,
+ struct bio_set *bs)
+{
+ struct bio *split;
+ struct bio_vec bv, bvprv;
+ struct bvec_iter iter;
+ unsigned seg_size = 0, nsegs = 0;
+ int prev = 0;
+
+ struct bvec_merge_data bvm = {
+ .bi_bdev = bio->bi_bdev,
+ .bi_sector = bio->bi_iter.bi_sector,
+ .bi_size = 0,
+ .bi_rw = bio->bi_rw,
+ };
+
+ if (bio->bi_rw & REQ_DISCARD)
+ return blk_bio_discard_split(q, bio, bs);
+
+ if (bio->bi_rw & REQ_WRITE_SAME)
+ return blk_bio_write_same_split(q, bio, bs);
+
+ bio_for_each_segment(bv, bio, iter) {
+ if (q->merge_bvec_fn &&
+ q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
+ goto split;
+
+ bvm.bi_size += bv.bv_len;
+
+ if (prev && blk_queue_cluster(q)) {
+ if (seg_size + bv.bv_len > queue_max_segment_size(q))
+ goto new_segment;
+ if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bv))
+ goto new_segment;
+ if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bv))
+ goto new_segment;
+
+ seg_size += bv.bv_len;
+ bvprv = bv;
+ prev = 1;
+ continue;
+ }
+new_segment:
+ if (nsegs == queue_max_segments(q))
+ goto split;
+
+ nsegs++;
+ bvprv = bv;
+ prev = 1;
+ seg_size = bv.bv_len;
+ }
+
+ return NULL;
+split:
+ split = bio_clone_bioset(bio, GFP_NOIO, bs);
+
+ split->bi_iter.bi_size -= iter.bi_size;
+ bio->bi_iter = iter;
+
+ if (bio_integrity(bio)) {
+ bio_integrity_advance(bio, split->bi_iter.bi_size);
+ bio_integrity_trim(split, 0, bio_sectors(split));
+ }
+
+ return split;
+}
+
static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
struct bio *bio)
{
diff --git a/block/blk.h b/block/blk.h
index e837b8f..387afbd 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -130,6 +130,9 @@ static inline int blk_should_fake_timeout(struct request_queue *q)
}
#endif

+struct bio *blk_bio_segment_split(struct request_queue *q, struct bio *bio,
+ struct bio_set *bs);
+
int ll_back_merge_fn(struct request_queue *q, struct request *req,
struct bio *bio);
int ll_front_merge_fn(struct request_queue *q, struct request *req,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2a16de2..9a32ed8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -445,6 +445,7 @@ struct request_queue {
struct throtl_data *td;
#endif
struct rcu_head rcu_head;
+ struct bio_set *bio_split;
};

#define QUEUE_FLAG_QUEUED 1 /* uses generic tag queueing */
@@ -467,6 +468,7 @@ struct request_queue {
#define QUEUE_FLAG_SECDISCARD 17 /* supports SECDISCARD */
#define QUEUE_FLAG_SAME_FORCE 18 /* force complete on same CPU */
#define QUEUE_FLAG_DEAD 19 /* queue tear-down finished */
+#define QUEUE_FLAG_LARGEBIOS   20	/* no limits on bio size */

#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -550,6 +552,8 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
#define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
#define blk_queue_secdiscard(q) (blk_queue_discard(q) && \
test_bit(QUEUE_FLAG_SECDISCARD, &(q)->queue_flags))
+#define blk_queue_largebios(q) \
+ test_bit(QUEUE_FLAG_LARGEBIOS, &(q)->queue_flags)

#define blk_noretry_request(rq) \
((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
--
1.8.3.rc1

2013-06-09 02:22:30

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 19/26] block: Kill bio_segments()

When we start sharing biovecs, keeping bi_vcnt accurate for splits is
going to be error prone - and unnecessary, if we refactor some code.

So bio_segments() has to go - but most of the existing users just needed
to know whether the bio had multiple segments, which is easier to answer.
Add a bio_multiple_segments() for them.
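
The new macro is also easy to reason about: with the immutable iterator,
exactly one segment remains when all outstanding bytes fit in the
current bvec. Expressed as a (hypothetical) helper:

	static inline bool multiple_segments_left(struct bio *bio)
	{
		/* more than bio_iovec(bio) left iff the sizes disagree */
		return bio->bi_iter.bi_size != bio_iovec(bio).bv_len;
	}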

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Nagalakshmi Nandigama <[email protected]>
Cc: Sreekanth Reddy <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
---
drivers/block/ps3disk.c | 7 ++---
drivers/md/bcache/io.c | 54 ++++++++++++++------------------
drivers/message/fusion/mptsas.c | 8 ++---
drivers/scsi/libsas/sas_expander.c | 8 ++---
drivers/scsi/mpt2sas/mpt2sas_transport.c | 10 +++---
drivers/scsi/mpt3sas/mpt3sas_transport.c | 8 ++---
include/linux/bio.h | 3 +-
7 files changed, 45 insertions(+), 53 deletions(-)

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 464be78..8d1a19c 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -101,10 +101,9 @@ static void ps3disk_scatter_gather(struct ps3_storage_device *dev,

rq_for_each_segment(bvec, req, iter) {
unsigned long flags;
- dev_dbg(&dev->sbd.core,
- "%s:%u: bio %u: %u segs %u sectors from %lu\n",
- __func__, __LINE__, i, bio_segments(iter.bio),
- bio_sectors(iter.bio), iter.bio->bi_iter.bi_sector);
+ dev_dbg(&dev->sbd.core, "%s:%u: bio %u: %u sectors from %lu\n",
+ __func__, __LINE__, i, bio_sectors(iter.bio),
+ iter.bio->bi_iter.bi_sector);

size = bvec->bv_len;
buf = bvec_kmap_irq(bvec, &flags);
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 4224cfd..e3c582d 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,40 +11,32 @@

static unsigned bch_bio_max_sectors(struct bio *bio)
{
- unsigned ret = bio_sectors(bio);
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
- unsigned max_segments = min_t(unsigned, BIO_MAX_PAGES,
- queue_max_segments(q));
+ struct bio_vec bv;
+ struct bvec_iter iter;
+ unsigned ret = 0, seg = 0;

if (bio->bi_rw & REQ_DISCARD)
- return min(ret, q->limits.max_discard_sectors);
-
- if (bio_segments(bio) > max_segments ||
- q->merge_bvec_fn) {
- struct bio_vec bv;
- struct bvec_iter iter;
- unsigned seg = 0;
-
- ret = 0;
-
- bio_for_each_segment(bv, bio, iter) {
- struct bvec_merge_data bvm = {
- .bi_bdev = bio->bi_bdev,
- .bi_sector = bio->bi_iter.bi_sector,
- .bi_size = ret << 9,
- .bi_rw = bio->bi_rw,
- };
-
- if (seg == max_segments)
- break;
-
- if (q->merge_bvec_fn &&
- q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
- break;
-
- seg++;
- ret += bv.bv_len >> 9;
- }
+ return min(bio_sectors(bio), q->limits.max_discard_sectors);
+
+ bio_for_each_segment(bv, bio, iter) {
+ struct bvec_merge_data bvm = {
+ .bi_bdev = bio->bi_bdev,
+ .bi_sector = bio->bi_iter.bi_sector,
+ .bi_size = ret << 9,
+ .bi_rw = bio->bi_rw,
+ };
+
+ if (seg == min_t(unsigned, BIO_MAX_PAGES,
+ queue_max_segments(q)))
+ break;
+
+ if (q->merge_bvec_fn &&
+ q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
+ break;
+
+ seg++;
+ ret += bv.bv_len >> 9;
}

ret = min(ret, queue_max_sectors(q));
diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index dd239bd..00d339c 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -2235,10 +2235,10 @@ static int mptsas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
}

/* do we need to support multiple segments? */
- if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
- printk(MYIOC_s_ERR_FMT "%s: multiple segments req %u %u, rsp %u %u\n",
- ioc->name, __func__, bio_segments(req->bio), blk_rq_bytes(req),
- bio_segments(rsp->bio), blk_rq_bytes(rsp));
+ if (bio_multiple_segments(req->bio) ||
+ bio_multiple_segments(rsp->bio)) {
+ printk(MYIOC_s_ERR_FMT "%s: multiple segments req %u, rsp %u\n",
+ ioc->name, __func__, blk_rq_bytes(req), blk_rq_bytes(rsp));
return -EINVAL;
}

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 446b851..0cac7d8 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2163,10 +2163,10 @@ int sas_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
}

/* do we need to support multiple segments? */
- if (bio_segments(req->bio) > 1 || bio_segments(rsp->bio) > 1) {
- printk("%s: multiple segments req %u %u, rsp %u %u\n",
- __func__, bio_segments(req->bio), blk_rq_bytes(req),
- bio_segments(rsp->bio), blk_rq_bytes(rsp));
+ if (bio_multiple_segments(req->bio) ||
+ bio_multiple_segments(rsp->bio)) {
+ printk("%s: multiple segments req %u, rsp %u\n",
+ __func__, blk_rq_bytes(req), blk_rq_bytes(rsp));
return -EINVAL;
}

diff --git a/drivers/scsi/mpt2sas/mpt2sas_transport.c b/drivers/scsi/mpt2sas/mpt2sas_transport.c
index 2c2e01e..d2224b5 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_transport.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_transport.c
@@ -1940,7 +1940,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
ioc->transport_cmds.status = MPT2_CMD_PENDING;

/* Check if the request is split across multiple segments */
- if (bio_segments(req->bio) > 1) {
+ if (bio_multiple_segments(req->bio)) {
u32 offset = 0;

/* Allocate memory and copy the request */
@@ -1972,7 +1972,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,

/* Check if the response needs to be populated across
* multiple segments */
- if (bio_segments(rsp->bio) > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
pci_addr_in = pci_alloc_consistent(ioc->pdev, blk_rq_bytes(rsp),
&pci_dma_in);
if (!pci_addr_in) {
@@ -2039,7 +2039,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
sgl_flags = (MPI2_SGE_FLAGS_SIMPLE_ELEMENT |
MPI2_SGE_FLAGS_END_OF_BUFFER | MPI2_SGE_FLAGS_HOST_TO_IOC);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
- if (bio_segments(req->bio) > 1) {
+ if (bio_multiple_segments(req->bio)) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(req) - 4), pci_dma_out);
} else {
@@ -2055,7 +2055,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
MPI2_SGE_FLAGS_LAST_ELEMENT | MPI2_SGE_FLAGS_END_OF_BUFFER |
MPI2_SGE_FLAGS_END_OF_LIST);
sgl_flags = sgl_flags << MPI2_SGE_FLAGS_SHIFT;
- if (bio_segments(rsp->bio) > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
ioc->base_add_sg_single(psge, sgl_flags |
(blk_rq_bytes(rsp) + 4), pci_dma_in);
} else {
@@ -2100,7 +2100,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
le16_to_cpu(mpi_reply->ResponseDataLength);
/* check if the resp needs to be copied from the allocated
* pci mem */
- if (bio_segments(rsp->bio) > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
u32 offset = 0;
u32 bytes_to_copy =
le16_to_cpu(mpi_reply->ResponseDataLength);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_transport.c b/drivers/scsi/mpt3sas/mpt3sas_transport.c
index dd15b2d..3898f51 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_transport.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_transport.c
@@ -1923,7 +1923,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
ioc->transport_cmds.status = MPT3_CMD_PENDING;

/* Check if the request is split across multiple segments */
- if (req->bio->bi_vcnt > 1) {
+ if (bio_multiple_segments(req->bio)) {
u32 offset = 0;

/* Allocate memory and copy the request */
@@ -1955,7 +1955,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,

/* Check if the response needs to be populated across
* multiple segments */
- if (rsp->bio->bi_vcnt > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
pci_addr_in = pci_alloc_consistent(ioc->pdev, blk_rq_bytes(rsp),
&pci_dma_in);
if (!pci_addr_in) {
@@ -2016,7 +2016,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,
mpi_request->RequestDataLength = cpu_to_le16(blk_rq_bytes(req) - 4);
psge = &mpi_request->SGL;

- if (req->bio->bi_vcnt > 1)
+ if (bio_multiple_segments(req->bio))
ioc->build_sg(ioc, psge, pci_dma_out, (blk_rq_bytes(req) - 4),
pci_dma_in, (blk_rq_bytes(rsp) + 4));
else
@@ -2061,7 +2061,7 @@ _transport_smp_handler(struct Scsi_Host *shost, struct sas_rphy *rphy,

/* check if the resp needs to be copied from the allocated
* pci mem */
- if (rsp->bio->bi_vcnt > 1) {
+ if (bio_multiple_segments(rsp->bio)) {
u32 offset = 0;
u32 bytes_to_copy =
le16_to_cpu(mpi_reply->ResponseDataLength);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 6767622..aa015ee 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -90,7 +90,8 @@
#define bio_offset(bio) bio_offset_iter((bio), (bio)->bi_iter)
#define bio_iovec(bio) bio_iovec_iter((bio), (bio)->bi_iter)

-#define bio_segments(bio) ((bio)->bi_vcnt - (bio)->bi_iter.bi_idx)
+#define bio_multiple_segments(bio) \
+ ((bio)->bi_iter.bi_size != bio_iovec(bio).bv_len)
#define bio_sectors(bio) ((bio)->bi_iter.bi_size >> 9)
#define bio_end_sector(bio) ((bio)->bi_iter.bi_sector + bio_sectors((bio)))

--
1.8.3.rc1

2013-06-09 02:22:48

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 17/26] block: Introduce new bio_split()

The new bio_split() can split arbitrary bios - unlike the old bio_split()
(renamed to bio_pair_split() in the previous patch), it's not restricted
to single page bios. It also has different semantics: it doesn't allocate
a struct bio_pair, leaving it up to the caller to handle completions.
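
A rough sketch of the intended calling convention (mirroring the md/linear
conversion below; "max_sectors" here stands in for whatever per-device limit
a driver enforces, so this is illustration rather than verbatim kernel code):

	struct bio *split;

	do {
		if (max_sectors < bio_sectors(bio)) {
			split = bio_split(bio, max_sectors, GFP_NOIO, fs_bio_set);
			bio_chain(split, bio);
		} else {
			split = bio;
		}

		generic_make_request(split);
	} while (split != bio);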

Then convert the existing bio_pair_split() users to the new bio_split()
- and also nvme, which was open-coding bio splitting.

(We have to take that BUG_ON() out of bio_integrity_trim() because this
bio_split() needs to use it, and there's no reason it should only be
used on bios marked as cloned; BIO_CLONED doesn't seem to have clearly
documented semantics anyway.)

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Neil Brown <[email protected]>
---
drivers/block/nvme-core.c | 108 +++--------------------------------
drivers/block/pktcdvd.c | 135 ++++++++++++++++++++++++--------------------
drivers/md/bcache/bcache.h | 1 -
drivers/md/bcache/btree.c | 2 +-
drivers/md/bcache/io.c | 80 +-------------------------
drivers/md/bcache/request.c | 4 +-
drivers/md/linear.c | 96 +++++++++++++++----------------
drivers/md/raid0.c | 77 +++++++++----------------
drivers/md/raid10.c | 113 +++++++++++++++---------------------
fs/bio.c | 71 +++++++++++++++++++++++
include/linux/bio.h | 22 ++++++++
11 files changed, 304 insertions(+), 405 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index e4f2c37..d88badc 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -406,106 +406,19 @@ int nvme_setup_prps(struct nvme_dev *dev, struct nvme_common_command *cmd,
return total_len;
}

-struct nvme_bio_pair {
- struct bio b1, b2, *parent;
- struct bio_vec *bv1, *bv2;
- int err;
- atomic_t cnt;
-};
-
-static void nvme_bio_pair_endio(struct bio *bio, int err)
-{
- struct nvme_bio_pair *bp = bio->bi_private;
-
- if (err)
- bp->err = err;
-
- if (atomic_dec_and_test(&bp->cnt)) {
- bio_endio(bp->parent, bp->err);
- if (bp->bv1)
- kfree(bp->bv1);
- if (bp->bv2)
- kfree(bp->bv2);
- kfree(bp);
- }
-}
-
-static struct nvme_bio_pair *nvme_bio_split(struct bio *bio, int idx,
- int len, int offset)
-{
- struct nvme_bio_pair *bp;
-
- BUG_ON(len > bio->bi_iter.bi_size);
- BUG_ON(idx > bio->bi_vcnt);
-
- bp = kmalloc(sizeof(*bp), GFP_ATOMIC);
- if (!bp)
- return NULL;
- bp->err = 0;
-
- bp->b1 = *bio;
- bp->b2 = *bio;
-
- bp->b1.bi_iter.bi_size = len;
- bp->b2.bi_iter.bi_size -= len;
- bp->b1.bi_vcnt = idx;
- bp->b2.bi_iter.bi_idx = idx;
- bp->b2.bi_iter.bi_sector += len >> 9;
-
- if (offset) {
- bp->bv1 = kmalloc(bio->bi_max_vecs * sizeof(struct bio_vec),
- GFP_ATOMIC);
- if (!bp->bv1)
- goto split_fail_1;
-
- bp->bv2 = kmalloc(bio->bi_max_vecs * sizeof(struct bio_vec),
- GFP_ATOMIC);
- if (!bp->bv2)
- goto split_fail_2;
-
- memcpy(bp->bv1, bio->bi_io_vec,
- bio->bi_max_vecs * sizeof(struct bio_vec));
- memcpy(bp->bv2, bio->bi_io_vec,
- bio->bi_max_vecs * sizeof(struct bio_vec));
-
- bp->b1.bi_io_vec = bp->bv1;
- bp->b2.bi_io_vec = bp->bv2;
- bp->b2.bi_io_vec[idx].bv_offset += offset;
- bp->b2.bi_io_vec[idx].bv_len -= offset;
- bp->b1.bi_io_vec[idx].bv_len = offset;
- bp->b1.bi_vcnt++;
- } else
- bp->bv1 = bp->bv2 = NULL;
-
- bp->b1.bi_private = bp;
- bp->b2.bi_private = bp;
-
- bp->b1.bi_end_io = nvme_bio_pair_endio;
- bp->b2.bi_end_io = nvme_bio_pair_endio;
-
- bp->parent = bio;
- atomic_set(&bp->cnt, 2);
-
- return bp;
-
- split_fail_2:
- kfree(bp->bv1);
- split_fail_1:
- kfree(bp);
- return NULL;
-}
-
static int nvme_split_and_submit(struct bio *bio, struct nvme_queue *nvmeq,
- int idx, int len, int offset)
+ int len)
{
- struct nvme_bio_pair *bp = nvme_bio_split(bio, idx, len, offset);
- if (!bp)
+ struct bio *split = bio_split(bio, len >> 9, GFP_ATOMIC, NULL);
+ if (!split)
return -ENOMEM;

+ bio_chain(split, bio);
+
if (bio_list_empty(&nvmeq->sq_cong))
add_wait_queue(&nvmeq->sq_full, &nvmeq->sq_cong_wait);
- bio_list_add(&nvmeq->sq_cong, &bp->b1);
- bio_list_add(&nvmeq->sq_cong, &bp->b2);
+ bio_list_add(&nvmeq->sq_cong, split);
+ bio_list_add(&nvmeq->sq_cong, bio);

return 0;
}
@@ -535,8 +448,7 @@ static int nvme_map_bio(struct nvme_queue *nvmeq, struct nvme_iod *iod,
} else {
if (!first && BIOVEC_NOT_VIRT_MERGEABLE(&bvprv, &bvec))
return nvme_split_and_submit(bio, nvmeq,
- iter.bi_idx,
- length, 0);
+ length);

sg = sg ? sg + 1 : iod->sg;
sg_set_page(sg, bvec.bv_page,
@@ -545,9 +457,7 @@ static int nvme_map_bio(struct nvme_queue *nvmeq, struct nvme_iod *iod,
}

if (split_len - length < bvec.bv_len)
- return nvme_split_and_submit(bio, nvmeq, iter.bi_idx,
- split_len,
- split_len - length);
+ return nvme_split_and_submit(bio, nvmeq, split_len);
length += bvec.bv_len;
bvprv = bvec;
first = 0;
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 6c1029c..a929817 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2352,74 +2352,29 @@ static void pkt_end_io_read_cloned(struct bio *bio, int err)
pkt_bio_finished(pd);
}

-static void pkt_make_request(struct request_queue *q, struct bio *bio)
+static void pkt_make_request_read(struct pktcdvd_device *pd, struct bio *bio)
{
- struct pktcdvd_device *pd;
- char b[BDEVNAME_SIZE];
+ struct bio *cloned_bio = bio_clone(bio, GFP_NOIO);
+ struct packet_stacked_data *psd = mempool_alloc(psd_pool, GFP_NOIO);
+
+ psd->pd = pd;
+ psd->bio = bio;
+ cloned_bio->bi_bdev = pd->bdev;
+ cloned_bio->bi_private = psd;
+ cloned_bio->bi_end_io = pkt_end_io_read_cloned;
+ pd->stats.secs_r += bio_sectors(bio);
+ pkt_queue_bio(pd, cloned_bio);
+}
+
+static void pkt_make_request_write(struct request_queue *q, struct bio *bio)
+{
+ struct pktcdvd_device *pd = q->queuedata;
sector_t zone;
struct packet_data *pkt;
int was_empty, blocked_bio;
struct pkt_rb_node *node;

- pd = q->queuedata;
- if (!pd) {
- printk(DRIVER_NAME": %s incorrect request queue\n", bdevname(bio->bi_bdev, b));
- goto end_io;
- }
-
- /*
- * Clone READ bios so we can have our own bi_end_io callback.
- */
- if (bio_data_dir(bio) == READ) {
- struct bio *cloned_bio = bio_clone(bio, GFP_NOIO);
- struct packet_stacked_data *psd = mempool_alloc(psd_pool, GFP_NOIO);
-
- psd->pd = pd;
- psd->bio = bio;
- cloned_bio->bi_bdev = pd->bdev;
- cloned_bio->bi_private = psd;
- cloned_bio->bi_end_io = pkt_end_io_read_cloned;
- pd->stats.secs_r += bio_sectors(bio);
- pkt_queue_bio(pd, cloned_bio);
- return;
- }
-
- if (!test_bit(PACKET_WRITABLE, &pd->flags)) {
- printk(DRIVER_NAME": WRITE for ro device %s (%llu)\n",
- pd->name, (unsigned long long)bio->bi_iter.bi_sector);
- goto end_io;
- }
-
- if (!bio->bi_iter.bi_size || (bio->bi_iter.bi_size % CD_FRAMESIZE)) {
- printk(DRIVER_NAME": wrong bio size\n");
- goto end_io;
- }
-
- blk_queue_bounce(q, &bio);
-
zone = ZONE(bio->bi_iter.bi_sector, pd);
- VPRINTK("pkt_make_request: start = %6llx stop = %6llx\n",
- (unsigned long long)bio->bi_sector,
- (unsigned long long)bio_end_sector(bio));
-
- /* Check if we have to split the bio */
- {
- struct bio_pair *bp;
- sector_t last_zone;
- int first_sectors;
-
- last_zone = ZONE(bio_end_sector(bio) - 1, pd);
- if (last_zone != zone) {
- BUG_ON(last_zone != zone + pd->settings.size);
- first_sectors = last_zone - bio->bi_iter.bi_sector;
- bp = bio_pair_split(bio, first_sectors);
- BUG_ON(!bp);
- pkt_make_request(q, &bp->bio1);
- pkt_make_request(q, &bp->bio2);
- bio_pair_release(bp);
- return;
- }
- }

/*
* If we find a matching packet in state WAITING or READ_WAIT, we can
@@ -2493,6 +2448,64 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
*/
wake_up(&pd->wqueue);
}
+}
+
+static void pkt_make_request(struct request_queue *q, struct bio *bio)
+{
+ struct pktcdvd_device *pd;
+ char b[BDEVNAME_SIZE];
+ struct bio *split;
+
+ pd = q->queuedata;
+ if (!pd) {
+ printk(DRIVER_NAME": %s incorrect request queue\n",
+ bdevname(bio->bi_bdev, b));
+ goto end_io;
+ }
+
+ VPRINTK("pkt_make_request: start = %6llx stop = %6llx\n",
+ (unsigned long long)bio->bi_iter.bi_sector,
+ (unsigned long long)bio_end_sector(bio));
+
+ /*
+ * Clone READ bios so we can have our own bi_end_io callback.
+ */
+ if (bio_data_dir(bio) == READ) {
+ pkt_make_request_read(pd, bio);
+ return;
+ }
+
+ if (!test_bit(PACKET_WRITABLE, &pd->flags)) {
+ printk(DRIVER_NAME": WRITE for ro device %s (%llu)\n",
+ pd->name, (unsigned long long)bio->bi_iter.bi_sector);
+ goto end_io;
+ }
+
+ if (!bio->bi_iter.bi_size || (bio->bi_iter.bi_size % CD_FRAMESIZE)) {
+ printk(DRIVER_NAME": wrong bio size\n");
+ goto end_io;
+ }
+
+ blk_queue_bounce(q, &bio);
+
+ do {
+ sector_t zone = ZONE(bio->bi_iter.bi_sector, pd);
+ sector_t last_zone = ZONE(bio_end_sector(bio) - 1, pd);
+
+ if (last_zone != zone) {
+ BUG_ON(last_zone != zone + pd->settings.size);
+
+ split = bio_split(bio, last_zone -
+ bio->bi_iter.bi_sector,
+ GFP_NOIO, fs_bio_set);
+ bio_chain(split, bio);
+ } else {
+ split = bio;
+ }
+
+ pkt_make_request_write(q, split);
+ } while (split != bio);
+
return;
end_io:
bio_io_error(bio);
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 850de67..14aaff5 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -1184,7 +1184,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
void bch_bbio_free(struct bio *, struct cache_set *);
struct bio *bch_bbio_alloc(struct cache_set *);

-struct bio *bch_bio_split(struct bio *, int, gfp_t, struct bio_set *);
void bch_generic_make_request(struct bio *, struct bio_split_pool *);
void __bch_submit_bbio(struct bio *, struct cache_set *);
void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index b6c3a05..e724b21 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2214,7 +2214,7 @@ static int submit_partial_cache_hit(struct btree *b, struct btree_op *op,
unsigned sectors = min_t(uint64_t, INT_MAX,
KEY_OFFSET(k) - bio->bi_iter.bi_sector);

- n = bch_bio_split(bio, sectors, GFP_NOIO, s->d->bio_split);
+ n = bio_next_split(bio, sectors, GFP_NOIO, s->d->bio_split);
if (n == bio)
op->lookup_done = true;

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 246b420..4224cfd 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -9,82 +9,6 @@
#include "bset.h"
#include "debug.h"

-/**
- * bch_bio_split - split a bio
- * @bio: bio to split
- * @sectors: number of sectors to split from the front of @bio
- * @gfp: gfp mask
- * @bs: bio set to allocate from
- *
- * Allocates and returns a new bio which represents @sectors from the start of
- * @bio, and updates @bio to represent the remaining sectors.
- *
- * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
- * unchanged.
- *
- * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
- * bvec boundry; it is the caller's responsibility to ensure that @bio is not
- * freed before the split.
- */
-struct bio *bch_bio_split(struct bio *bio, int sectors,
- gfp_t gfp, struct bio_set *bs)
-{
- unsigned vcnt = 0, nbytes = sectors << 9;
- struct bio_vec bv;
- struct bvec_iter iter;
- struct bio *ret = NULL;
-
- BUG_ON(sectors <= 0);
-
- if (sectors >= bio_sectors(bio))
- return bio;
-
- if (bio->bi_rw & REQ_DISCARD) {
- ret = bio_alloc_bioset(gfp, 1, bs);
- goto out;
- }
-
- bio_for_each_segment(bv, bio, iter) {
- vcnt++;
-
- if (nbytes <= bv.bv_len)
- break;
-
- nbytes -= bv.bv_len;
- }
-
- ret = bio_alloc_bioset(gfp, vcnt, bs);
- if (!ret)
- return NULL;
-
- bio_for_each_segment(bv, bio, iter) {
- ret->bi_io_vec[ret->bi_vcnt++] = bv;
-
- if (ret->bi_vcnt == vcnt)
- break;
- }
-
- ret->bi_io_vec[ret->bi_vcnt - 1].bv_len = nbytes;
-out:
- ret->bi_bdev = bio->bi_bdev;
- ret->bi_iter.bi_sector = bio->bi_iter.bi_sector;
- ret->bi_iter.bi_size = sectors << 9;
- ret->bi_rw = bio->bi_rw;
-
- if (bio_integrity(bio)) {
- if (bio_integrity_clone(ret, bio, gfp)) {
- bio_put(ret);
- return NULL;
- }
-
- bio_integrity_trim(ret, 0, bio_sectors(ret));
- }
-
- bio_advance(bio, ret->bi_iter.bi_size);
-
- return ret;
-}
-
static unsigned bch_bio_max_sectors(struct bio *bio)
{
unsigned ret = bio_sectors(bio);
@@ -176,8 +100,8 @@ void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
bio_get(bio);

do {
- n = bch_bio_split(bio, bch_bio_max_sectors(bio),
- GFP_NOIO, s->p->bio_split);
+ n = bio_next_split(bio, bch_bio_max_sectors(bio),
+ GFP_NOIO, s->p->bio_split);

n->bi_end_io = bch_bio_submit_split_endio;
n->bi_private = &s->cl;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index b0de7a43..ca513d4 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -509,7 +509,7 @@ static void bch_insert_data_loop(struct closure *cl)
if (!bch_alloc_sectors(k, bio_sectors(bio), s))
goto err;

- n = bch_bio_split(bio, KEY_SIZE(k), GFP_NOIO, split);
+ n = bio_next_split(bio, KEY_SIZE(k), GFP_NOIO, split);

n->bi_end_io = bch_insert_data_endio;
n->bi_private = cl;
@@ -853,7 +853,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
struct bio *miss;

- miss = bch_bio_split(bio, sectors, GFP_NOIO, s->d->bio_split);
+ miss = bio_next_split(bio, sectors, GFP_NOIO, s->d->bio_split);
if (miss == bio)
s->op.lookup_done = true;

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index e9b53e9..56f534b 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -288,65 +288,65 @@ static int linear_stop (struct mddev *mddev)

static void linear_make_request(struct mddev *mddev, struct bio *bio)
{
+ char b[BDEVNAME_SIZE];
struct dev_info *tmp_dev;
- sector_t start_sector;
+ struct bio *split;
+ sector_t start_sector, end_sector, data_offset;

if (unlikely(bio->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bio);
return;
}

- rcu_read_lock();
- tmp_dev = which_dev(mddev, bio->bi_iter.bi_sector);
- start_sector = tmp_dev->end_sector - tmp_dev->rdev->sectors;
-
-
- if (unlikely(bio->bi_iter.bi_sector >= (tmp_dev->end_sector)
- || (bio->bi_iter.bi_sector < start_sector))) {
- char b[BDEVNAME_SIZE];
-
- printk(KERN_ERR
- "md/linear:%s: make_request: Sector %llu out of bounds on "
- "dev %s: %llu sectors, offset %llu\n",
- mdname(mddev),
- (unsigned long long)bio->bi_iter.bi_sector,
- bdevname(tmp_dev->rdev->bdev, b),
- (unsigned long long)tmp_dev->rdev->sectors,
- (unsigned long long)start_sector);
- rcu_read_unlock();
- bio_io_error(bio);
- return;
- }
- if (unlikely(bio_end_sector(bio) > tmp_dev->end_sector)) {
- /* This bio crosses a device boundary, so we have to
- * split it.
- */
- struct bio_pair *bp;
- sector_t end_sector = tmp_dev->end_sector;
+ do {
+ rcu_read_lock();

- rcu_read_unlock();
-
- bp = bio_pair_split(bio, end_sector - bio->bi_iter.bi_sector);
+ tmp_dev = which_dev(mddev, bio->bi_iter.bi_sector);
+ start_sector = tmp_dev->end_sector - tmp_dev->rdev->sectors;
+ end_sector = tmp_dev->end_sector;
+ data_offset = tmp_dev->rdev->data_offset;
+ bio->bi_bdev = tmp_dev->rdev->bdev;

- linear_make_request(mddev, &bp->bio1);
- linear_make_request(mddev, &bp->bio2);
- bio_pair_release(bp);
- return;
- }
-
- bio->bi_bdev = tmp_dev->rdev->bdev;
- bio->bi_iter.bi_sector = bio->bi_iter.bi_sector - start_sector
- + tmp_dev->rdev->data_offset;
- rcu_read_unlock();
+ rcu_read_unlock();

- if (unlikely((bio->bi_rw & REQ_DISCARD) &&
- !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
- /* Just ignore it */
- bio_endio(bio, 0);
- return;
- }
+ if (unlikely(bio->bi_iter.bi_sector >= end_sector ||
+ bio->bi_iter.bi_sector < start_sector))
+ goto out_of_bounds;
+
+ if (unlikely(bio_end_sector(bio) > end_sector)) {
+ /* This bio crosses a device boundary, so we have to
+ * split it.
+ */
+ split = bio_split(bio, end_sector -
+ bio->bi_iter.bi_sector,
+ GFP_NOIO, fs_bio_set);
+ bio_chain(split, bio);
+ } else {
+ split = bio;
+ }

- generic_make_request(bio);
+ split->bi_iter.bi_sector = split->bi_iter.bi_sector -
+ start_sector + data_offset;
+
+ if (unlikely((split->bi_rw & REQ_DISCARD) &&
+ !blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
+ /* Just ignore it */
+ bio_endio(split, 0);
+ } else
+ generic_make_request(split);
+ } while (split != bio);
+ return;
+
+out_of_bounds:
+ printk(KERN_ERR
+ "md/linear:%s: make_request: Sector %llu out of bounds on "
+ "dev %s: %llu sectors, offset %llu\n",
+ mdname(mddev),
+ (unsigned long long)bio->bi_iter.bi_sector,
+ bdevname(tmp_dev->rdev->bdev, b),
+ (unsigned long long)tmp_dev->rdev->sectors,
+ (unsigned long long)start_sector);
+ bio_io_error(bio);
}

static void linear_status (struct seq_file *seq, struct mddev *mddev)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 5edc3f7..49d1188 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -513,65 +513,44 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,

static void raid0_make_request(struct mddev *mddev, struct bio *bio)
{
- unsigned int chunk_sects;
- sector_t sector_offset;
struct strip_zone *zone;
struct md_rdev *tmp_dev;
+ struct bio *split;

if (unlikely(bio->bi_rw & REQ_FLUSH)) {
md_flush_request(mddev, bio);
return;
}

- chunk_sects = mddev->chunk_sectors;
- if (unlikely(!is_io_in_chunk_boundary(mddev, chunk_sects, bio))) {
+ do {
sector_t sector = bio->bi_iter.bi_sector;
- struct bio_pair *bp;
- /* Sanity check -- queue functions should prevent this happening */
- if (bio_segments(bio) > 1)
- goto bad_map;
- /* This is a one page bio that upper layers
- * refuse to split for us, so we need to split it.
- */
- if (likely(is_power_of_2(chunk_sects)))
- bp = bio_pair_split(bio, chunk_sects - (sector &
- (chunk_sects-1)));
- else
- bp = bio_pair_split(bio, chunk_sects -
- sector_div(sector, chunk_sects));
- raid0_make_request(mddev, &bp->bio1);
- raid0_make_request(mddev, &bp->bio2);
- bio_pair_release(bp);
- return;
- }
-
- sector_offset = bio->bi_iter.bi_sector;
- zone = find_zone(mddev->private, &sector_offset);
- tmp_dev = map_sector(mddev, zone, bio->bi_iter.bi_sector,
- &sector_offset);
- bio->bi_bdev = tmp_dev->bdev;
- bio->bi_iter.bi_sector = sector_offset + zone->dev_start +
- tmp_dev->data_offset;
-
- if (unlikely((bio->bi_rw & REQ_DISCARD) &&
- !blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
- /* Just ignore it */
- bio_endio(bio, 0);
- return;
- }
-
- generic_make_request(bio);
- return;
-
-bad_map:
- printk("md/raid0:%s: make_request bug: can't convert block across chunks"
- " or bigger than %dk %llu %d\n",
- mdname(mddev), chunk_sects / 2,
- (unsigned long long)bio->bi_iter.bi_sector,
- bio_sectors(bio) / 2);
+ unsigned chunk_sects = mddev->chunk_sectors;
+
+ unsigned sectors = chunk_sects -
+ (likely(is_power_of_2(chunk_sects))
+ ? (sector & (chunk_sects-1))
+ : sector_div(sector, chunk_sects));
+
+ if (sectors < bio_sectors(bio)) {
+ split = bio_split(bio, sectors, GFP_NOIO, fs_bio_set);
+ bio_chain(split, bio);
+ } else {
+ split = bio;
+ }

- bio_io_error(bio);
- return;
+ zone = find_zone(mddev->private, &sector);
+ tmp_dev = map_sector(mddev, zone, sector, &sector);
+ split->bi_bdev = tmp_dev->bdev;
+ split->bi_iter.bi_sector = sector + zone->dev_start +
+ tmp_dev->data_offset;
+
+ if (unlikely((split->bi_rw & REQ_DISCARD) &&
+ !blk_queue_discard(bdev_get_queue(split->bi_bdev)))) {
+ /* Just ignore it */
+ bio_endio(split, 0);
+ } else
+ generic_make_request(split);
+ } while (split != bio);
}

static void raid0_status(struct seq_file *seq, struct mddev *mddev)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7b68d82..867a996 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1144,14 +1144,12 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
kfree(plug);
}

-static void make_request(struct mddev *mddev, struct bio * bio)
+static void __make_request(struct mddev *mddev, struct bio *bio)
{
struct r10conf *conf = mddev->private;
struct r10bio *r10_bio;
struct bio *read_bio;
int i;
- sector_t chunk_mask = (conf->geo.chunk_mask & conf->prev.chunk_mask);
- int chunk_sects = chunk_mask + 1;
const int rw = bio_data_dir(bio);
const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
const unsigned long do_fua = (bio->bi_rw & REQ_FUA);
@@ -1166,69 +1164,6 @@ static void make_request(struct mddev *mddev, struct bio * bio)
int max_sectors;
int sectors;

- if (unlikely(bio->bi_rw & REQ_FLUSH)) {
- md_flush_request(mddev, bio);
- return;
- }
-
- /* If this request crosses a chunk boundary, we need to
- * split it. This will only happen for 1 PAGE (or less) requests.
- */
- if (unlikely((bio->bi_iter.bi_sector & chunk_mask) + bio_sectors(bio)
- > chunk_sects
- && (conf->geo.near_copies < conf->geo.raid_disks
- || conf->prev.near_copies < conf->prev.raid_disks))) {
- struct bio_pair *bp;
- /* Sanity check -- queue functions should prevent this happening */
- if (bio_segments(bio) > 1)
- goto bad_map;
- /* This is a one page bio that upper layers
- * refuse to split for us, so we need to split it.
- */
- bp = bio_pair_split(bio, chunk_sects -
- (bio->bi_iter.bi_sector & (chunk_sects - 1)));
-
- /* Each of these 'make_request' calls will call 'wait_barrier'.
- * If the first succeeds but the second blocks due to the resync
- * thread raising the barrier, we will deadlock because the
- * IO to the underlying device will be queued in generic_make_request
- * and will never complete, so will never reduce nr_pending.
- * So increment nr_waiting here so no new raise_barriers will
- * succeed, and so the second wait_barrier cannot block.
- */
- spin_lock_irq(&conf->resync_lock);
- conf->nr_waiting++;
- spin_unlock_irq(&conf->resync_lock);
-
- make_request(mddev, &bp->bio1);
- make_request(mddev, &bp->bio2);
-
- spin_lock_irq(&conf->resync_lock);
- conf->nr_waiting--;
- wake_up(&conf->wait_barrier);
- spin_unlock_irq(&conf->resync_lock);
-
- bio_pair_release(bp);
- return;
- bad_map:
- printk("md/raid10:%s: make_request bug: can't convert block across chunks"
- " or bigger than %dk %llu %d\n", mdname(mddev), chunk_sects/2,
- (unsigned long long)bio->bi_iter.bi_sector,
- bio_sectors(bio) / 2);
-
- bio_io_error(bio);
- return;
- }
-
- md_write_start(mddev, bio);
-
- /*
- * Register the new request and wait if the reconstruction
- * thread has put up a bar for new requests.
- * Continue immediately if no resync is active currently.
- */
- wait_barrier(conf);
-
sectors = bio_sectors(bio);
while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
bio->bi_iter.bi_sector < conf->reshape_progress &&
@@ -1592,6 +1527,52 @@ retry_write:
goto retry_write;
}
one_write_done(r10_bio);
+}
+
+static void make_request(struct mddev *mddev, struct bio *bio)
+{
+ struct r10conf *conf = mddev->private;
+ sector_t chunk_mask = (conf->geo.chunk_mask & conf->prev.chunk_mask);
+ int chunk_sects = chunk_mask + 1;
+
+ struct bio *split;
+
+ if (unlikely(bio->bi_rw & REQ_FLUSH)) {
+ md_flush_request(mddev, bio);
+ return;
+ }
+
+ md_write_start(mddev, bio);
+
+ /*
+ * Register the new request and wait if the reconstruction
+ * thread has put up a bar for new requests.
+ * Continue immediately if no resync is active currently.
+ */
+ wait_barrier(conf);
+
+ do {
+
+ /*
+ * If this request crosses a chunk boundary, we need to split
+ * it.
+ */
+ if (unlikely((bio->bi_iter.bi_sector & chunk_mask) +
+ bio_sectors(bio) > chunk_sects
+ && (conf->geo.near_copies < conf->geo.raid_disks
+ || conf->prev.near_copies <
+ conf->prev.raid_disks))) {
+ split = bio_split(bio, chunk_sects -
+ (bio->bi_iter.bi_sector &
+ (chunk_sects - 1)),
+ GFP_NOIO, fs_bio_set);
+ bio_chain(split, bio);
+ } else {
+ split = bio;
+ }
+
+ __make_request(mddev, split);
+ } while (split != bio);

/* In case raid10d snuck in to freeze_array */
wake_up(&conf->wait_barrier);
diff --git a/fs/bio.c b/fs/bio.c
index 3968b8f..a674101 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1712,6 +1712,77 @@ void bio_endio(struct bio *bio, int error)
}
EXPORT_SYMBOL(bio_endio);

+/**
+ * bio_split - split a bio
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * @sectors must be less than bio_sectors(@bio) - callers that may need the
+ * whole bio back unchanged should use bio_next_split() instead.
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+ gfp_t gfp, struct bio_set *bs)
+{
+ unsigned vcnt = 0, nbytes = sectors << 9;
+ struct bio_vec bv;
+ struct bvec_iter iter;
+ struct bio *split = NULL;
+
+ BUG_ON(sectors <= 0);
+ BUG_ON(sectors >= bio_sectors(bio));
+
+ if (bio->bi_rw & REQ_DISCARD) {
+ split = bio_alloc_bioset(gfp, 1, bs);
+ goto out;
+ }
+
+ bio_for_each_segment(bv, bio, iter) {
+ vcnt++;
+
+ if (nbytes <= bv.bv_len)
+ break;
+
+ nbytes -= bv.bv_len;
+ }
+
+ split = bio_alloc_bioset(gfp, vcnt, bs);
+ if (!split)
+ return NULL;
+
+ bio_for_each_segment(bv, bio, iter) {
+ split->bi_io_vec[split->bi_vcnt++] = bv;
+
+ if (split->bi_vcnt == vcnt)
+ break;
+ }
+
+ split->bi_io_vec[split->bi_vcnt - 1].bv_len = nbytes;
+out:
+ split->bi_bdev = bio->bi_bdev;
+ split->bi_iter.bi_sector = bio->bi_iter.bi_sector;
+ split->bi_iter.bi_size = sectors << 9;
+ split->bi_rw = bio->bi_rw;
+
+ if (bio_integrity(bio)) {
+ if (bio_integrity_clone(split, bio, gfp)) {
+ bio_put(split);
+ return NULL;
+ }
+
+ bio_integrity_trim(split, 0, bio_sectors(split));
+ }
+
+ bio_advance(bio, split->bi_iter.bi_size);
+
+ return split;
+}
+EXPORT_SYMBOL(bio_split);
+
void bio_pair_release(struct bio_pair *bp)
{
if (atomic_dec_and_test(&bp->cnt)) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index dd70f7e..866db8a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -284,6 +284,28 @@ struct bio_pair {
extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
extern void bio_pair_release(struct bio_pair *dbio);

+extern struct bio *bio_split(struct bio *bio, int sectors,
+ gfp_t gfp, struct bio_set *bs);
+
+/**
+ * bio_next_split - get next @sectors from a bio, splitting if necessary
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Returns a bio representing the next @sectors of @bio - if the bio is smaller
+ * than @sectors, returns the original bio unchanged.
+ */
+static inline struct bio *bio_next_split(struct bio *bio, int sectors,
+ gfp_t gfp, struct bio_set *bs)
+{
+ if (sectors >= bio_sectors(bio))
+ return bio;
+
+ return bio_split(bio, sectors, gfp, bs);
+}
+
extern struct bio_set *bioset_create(unsigned int, unsigned int);
extern void bioset_free(struct bio_set *);
extern mempool_t *biovec_create_pool(struct bio_set *bs, int pool_entries);
--
1.8.3.rc1

2013-06-09 02:22:46

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 18/26] block: Kill bio_pair_split()

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
fs/bio-integrity.c | 45 ---------------------------
fs/bio.c | 90 -----------------------------------------------------
include/linux/bio.h | 30 ------------------
3 files changed, 165 deletions(-)

diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 61f41ff..72fa942 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -581,51 +581,6 @@ void bio_integrity_trim(struct bio *bio, unsigned int offset,
EXPORT_SYMBOL(bio_integrity_trim);

/**
- * bio_integrity_split - Split integrity metadata
- * @bio: Protected bio
- * @bp: Resulting bio_pair
- * @sectors: Offset
- *
- * Description: Splits an integrity page into a bio_pair.
- */
-void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
-{
- struct blk_integrity *bi;
- struct bio_integrity_payload *bip = bio->bi_integrity;
- unsigned int nr_sectors;
-
- if (bio_integrity(bio) == 0)
- return;
-
- bi = bdev_get_integrity(bio->bi_bdev);
- BUG_ON(bi == NULL);
- BUG_ON(bip->bip_vcnt != 1);
-
- nr_sectors = bio_integrity_hw_sectors(bi, sectors);
-
- bp->bio1.bi_integrity = &bp->bip1;
- bp->bio2.bi_integrity = &bp->bip2;
-
- bp->iv1 = bip->bip_vec[bip->bip_iter.bi_idx];
- bp->iv2 = bip->bip_vec[bip->bip_iter.bi_idx];
-
- bp->bip1.bip_vec = &bp->iv1;
- bp->bip2.bip_vec = &bp->iv2;
-
- bp->iv1.bv_len = sectors * bi->tuple_size;
- bp->iv2.bv_offset += sectors * bi->tuple_size;
- bp->iv2.bv_len -= sectors * bi->tuple_size;
-
- bp->bip1.bip_iter.bi_sector = bio->bi_integrity->bip_iter.bi_sector;
- bp->bip2.bip_iter.bi_sector =
- bio->bi_integrity->bip_iter.bi_sector + nr_sectors;
-
- bp->bip1.bip_vcnt = bp->bip2.bip_vcnt = 1;
- bp->bip1.bip_iter.bi_idx = bp->bip2.bip_iter.bi_idx = 0;
-}
-EXPORT_SYMBOL(bio_integrity_split);
-
-/**
* bio_integrity_clone - Callback for cloning bios with integrity metadata
* @bio: New bio
* @bio_src: Original bio
diff --git a/fs/bio.c b/fs/bio.c
index a674101..ee033a7 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -38,8 +38,6 @@
*/
#define BIO_INLINE_VECS 4

-static mempool_t *bio_split_pool __read_mostly;
-
/*
* if you change this list, also change bvec_alloc or things will
* break badly! cannot be bigger than what you can fit into an
@@ -1783,89 +1781,6 @@ out:
}
EXPORT_SYMBOL(bio_split);

-void bio_pair_release(struct bio_pair *bp)
-{
- if (atomic_dec_and_test(&bp->cnt)) {
- struct bio *master = bp->bio1.bi_private;
-
- bio_endio(master, bp->error);
- mempool_free(bp, bp->bio2.bi_private);
- }
-}
-EXPORT_SYMBOL(bio_pair_release);
-
-static void bio_pair_end_1(struct bio *bi, int err)
-{
- struct bio_pair *bp = container_of(bi, struct bio_pair, bio1);
-
- if (err)
- bp->error = err;
-
- bio_pair_release(bp);
-}
-
-static void bio_pair_end_2(struct bio *bi, int err)
-{
- struct bio_pair *bp = container_of(bi, struct bio_pair, bio2);
-
- if (err)
- bp->error = err;
-
- bio_pair_release(bp);
-}
-
-/*
- * split a bio - only worry about a bio with a single page in its iovec
- */
-struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
-{
- struct bio_pair *bp = mempool_alloc(bio_split_pool, GFP_NOIO);
-
- if (!bp)
- return bp;
-
- trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
- bi->bi_iter.bi_sector + first_sectors);
-
- BUG_ON(bio_segments(bi) > 1);
- atomic_set(&bp->cnt, 3);
- bp->error = 0;
- bp->bio1 = *bi;
- bp->bio2 = *bi;
- bp->bio2.bi_iter.bi_sector += first_sectors;
- bp->bio2.bi_iter.bi_size -= first_sectors << 9;
- bp->bio1.bi_iter.bi_size = first_sectors << 9;
-
- if (bi->bi_vcnt != 0) {
- bp->bv1 = bio_iovec(bi);
- bp->bv2 = bio_iovec(bi);
-
- if (bio_is_rw(bi)) {
- bp->bv2.bv_offset += first_sectors << 9;
- bp->bv2.bv_len -= first_sectors << 9;
- bp->bv1.bv_len = first_sectors << 9;
- }
-
- bp->bio1.bi_io_vec = &bp->bv1;
- bp->bio2.bi_io_vec = &bp->bv2;
-
- bp->bio1.bi_max_vecs = 1;
- bp->bio2.bi_max_vecs = 1;
- }
-
- bp->bio1.bi_end_io = bio_pair_end_1;
- bp->bio2.bi_end_io = bio_pair_end_2;
-
- bp->bio1.bi_private = bi;
- bp->bio2.bi_private = bio_split_pool;
-
- if (bio_integrity(bi))
- bio_integrity_split(bi, bp, first_sectors);
-
- return bp;
-}
-EXPORT_SYMBOL(bio_pair_split);
-
/*
* create memory pools for biovec's in a bio_set.
* use the global biovec slabs created for general use.
@@ -2043,11 +1958,6 @@ static int __init init_bio(void)
if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE))
panic("bio: can't create integrity pool\n");

- bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES,
- sizeof(struct bio_pair));
- if (!bio_split_pool)
- panic("bio: can't create split pool\n");
-
return 0;
}
subsys_initcall(init_bio);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 866db8a..6767622 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -261,29 +261,6 @@ struct bio_integrity_payload {
};
#endif /* CONFIG_BLK_DEV_INTEGRITY */

-/*
- * A bio_pair is used when we need to split a bio.
- * This can only happen for a bio that refers to just one
- * page of data, and in the unusual situation when the
- * page crosses a chunk/device boundary
- *
- * The address of the master bio is stored in bio1.bi_private
- * The address of the pool the pair was allocated from is stored
- * in bio2.bi_private
- */
-struct bio_pair {
- struct bio bio1, bio2;
- struct bio_vec bv1, bv2;
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
- struct bio_integrity_payload bip1, bip2;
- struct bio_vec iv1, iv2;
-#endif
- atomic_t cnt;
- int error;
-};
-extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
-extern void bio_pair_release(struct bio_pair *dbio);
-
extern struct bio *bio_split(struct bio *bio, int sectors,
gfp_t gfp, struct bio_set *bs);

@@ -655,7 +632,6 @@ extern int bio_integrity_prep(struct bio *);
extern void bio_integrity_endio(struct bio *, int);
extern void bio_integrity_advance(struct bio *, unsigned int);
extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
-extern void bio_integrity_split(struct bio *, struct bio_pair *, int);
extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t);
extern int bioset_integrity_create(struct bio_set *, int);
extern void bioset_integrity_free(struct bio_set *);
@@ -699,12 +675,6 @@ static inline int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
return 0;
}

-static inline void bio_integrity_split(struct bio *bio, struct bio_pair *bp,
- int sectors)
-{
- return;
-}
-
static inline void bio_integrity_advance(struct bio *bio,
unsigned int bytes_done)
{
--
1.8.3.rc1

2013-06-09 02:20:02

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 13/26] dm: Refactor for new bio cloning/splitting

We need to convert the dm code to the new bvec_iter primitives which
respect bi_bvec_done; they also allow us to drastically simplify dm's
bio splitting code.

Also kill bio_sector_offset(); dm was the only user, and it doesn't
make much sense anymore.
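
For reference, cloning a range of a bio now reduces to the idiom the new
clone_bio() uses below - a sketch, assuming @clone was freshly allocated
and @sector/@len describe the range within @bio:

	__bio_clone(clone, bio);
	if (bio_integrity(bio))
		bio_integrity_clone(clone, bio, GFP_NOIO);

	/* skip to the start of the range, then trim the tail */
	bio_advance(clone, (sector - clone->bi_iter.bi_sector) << 9);
	clone->bi_iter.bi_size = len << 9;

	if (bio_integrity(bio))
		bio_integrity_trim(clone, 0, len);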

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
---
drivers/md/dm.c | 170 ++++++----------------------------------------------
fs/bio.c | 38 ------------
include/linux/bio.h | 1 -
3 files changed, 18 insertions(+), 191 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d67c6a9..8aac97b 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1013,7 +1013,6 @@ struct clone_info {
struct dm_io *io;
sector_t sector;
sector_t sector_count;
- unsigned short idx;
};

static void bio_setup_sector(struct bio *bio, sector_t sector, sector_t len)
@@ -1022,68 +1021,24 @@ static void bio_setup_sector(struct bio *bio, sector_t sector, sector_t len)
bio->bi_iter.bi_size = to_bytes(len);
}

-static void bio_setup_bv(struct bio *bio, unsigned short idx, unsigned short bv_count)
-{
- bio->bi_iter.bi_idx = idx;
- bio->bi_vcnt = idx + bv_count;
- bio->bi_flags &= ~(1 << BIO_SEG_VALID);
-}
-
-static void clone_bio_integrity(struct bio *bio, struct bio *clone,
- unsigned short idx, unsigned len, unsigned offset,
- unsigned trim)
-{
- if (!bio_integrity(bio))
- return;
-
- bio_integrity_clone(clone, bio, GFP_NOIO);
-
- if (trim)
- bio_integrity_trim(clone, bio_sector_offset(bio, idx, offset), len);
-}
-
-/*
- * Creates a little bio that just does part of a bvec.
- */
-static void clone_split_bio(struct dm_target_io *tio, struct bio *bio,
- sector_t sector, unsigned short idx,
- unsigned offset, unsigned len)
-{
- struct bio *clone = &tio->clone;
- struct bio_vec *bv = bio->bi_io_vec + idx;
-
- *clone->bi_io_vec = *bv;
-
- bio_setup_sector(clone, sector, len);
-
- clone->bi_bdev = bio->bi_bdev;
- clone->bi_rw = bio->bi_rw;
- clone->bi_vcnt = 1;
- clone->bi_io_vec->bv_offset = offset;
- clone->bi_io_vec->bv_len = clone->bi_iter.bi_size;
- clone->bi_flags |= 1 << BIO_CLONED;
-
- clone_bio_integrity(bio, clone, idx, len, offset, 1);
-}
-
/*
* Creates a bio that consists of range of complete bvecs.
*/
static void clone_bio(struct dm_target_io *tio, struct bio *bio,
- sector_t sector, unsigned short idx,
- unsigned short bv_count, unsigned len)
+ sector_t sector, unsigned len)
{
struct bio *clone = &tio->clone;
- unsigned trim = 0;

__bio_clone(clone, bio);
- bio_setup_sector(clone, sector, len);
- bio_setup_bv(clone, idx, bv_count);

- if (idx != bio->bi_iter.bi_idx ||
- clone->bi_iter.bi_size < bio->bi_iter.bi_size)
- trim = 1;
- clone_bio_integrity(bio, clone, idx, len, 0, trim);
+ if (bio_integrity(bio))
+ bio_integrity_clone(clone, bio, GFP_NOIO);
+
+ bio_advance(clone, (sector - clone->bi_iter.bi_sector) << 9);
+ clone->bi_iter.bi_size = len << 9;
+
+ if (bio_integrity(bio))
+ bio_integrity_trim(clone, 0, len);
}

static struct dm_target_io *alloc_tio(struct clone_info *ci,
@@ -1145,10 +1100,7 @@ static int __send_empty_flush(struct clone_info *ci)
}

static void __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
- sector_t sector, int nr_iovecs,
- unsigned short idx, unsigned short bv_count,
- unsigned offset, unsigned len,
- unsigned split_bvec)
+ sector_t sector, unsigned len)
{
struct bio *bio = ci->bio;
struct dm_target_io *tio;
@@ -1162,11 +1114,8 @@ static void __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti
num_target_bios = ti->num_write_bios(ti, bio);

for (target_bio_nr = 0; target_bio_nr < num_target_bios; target_bio_nr++) {
- tio = alloc_tio(ci, ti, nr_iovecs, target_bio_nr);
- if (split_bvec)
- clone_split_bio(tio, bio, sector, idx, offset, len);
- else
- clone_bio(tio, bio, sector, idx, bv_count, len);
+ tio = alloc_tio(ci, ti, 0, target_bio_nr);
+ clone_bio(tio, bio, sector, len);
__map_bio(tio);
}
}
@@ -1238,68 +1187,13 @@ static int __send_write_same(struct clone_info *ci)
}

/*
- * Find maximum number of sectors / bvecs we can process with a single bio.
- */
-static sector_t __len_within_target(struct clone_info *ci, sector_t max, int *idx)
-{
- struct bio *bio = ci->bio;
- sector_t bv_len, total_len = 0;
-
- for (*idx = ci->idx; max && (*idx < bio->bi_vcnt); (*idx)++) {
- bv_len = to_sector(bio->bi_io_vec[*idx].bv_len);
-
- if (bv_len > max)
- break;
-
- max -= bv_len;
- total_len += bv_len;
- }
-
- return total_len;
-}
-
-static int __split_bvec_across_targets(struct clone_info *ci,
- struct dm_target *ti, sector_t max)
-{
- struct bio *bio = ci->bio;
- struct bio_vec *bv = bio->bi_io_vec + ci->idx;
- sector_t remaining = to_sector(bv->bv_len);
- unsigned offset = 0;
- sector_t len;
-
- do {
- if (offset) {
- ti = dm_table_find_target(ci->map, ci->sector);
- if (!dm_target_is_valid(ti))
- return -EIO;
-
- max = max_io_len(ci->sector, ti);
- }
-
- len = min(remaining, max);
-
- __clone_and_map_data_bio(ci, ti, ci->sector, 1, ci->idx, 0,
- bv->bv_offset + offset, len, 1);
-
- ci->sector += len;
- ci->sector_count -= len;
- offset += to_bytes(len);
- } while (remaining -= len);
-
- ci->idx++;
-
- return 0;
-}
-
-/*
* Select the correct strategy for processing a non-flush bio.
*/
static int __split_and_process_non_flush(struct clone_info *ci)
{
struct bio *bio = ci->bio;
struct dm_target *ti;
- sector_t len, max;
- int idx;
+ unsigned len;

if (unlikely(bio->bi_rw & REQ_DISCARD))
return __send_discard(ci);
@@ -1310,41 +1204,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)
if (!dm_target_is_valid(ti))
return -EIO;

- max = max_io_len(ci->sector, ti);
-
- /*
- * Optimise for the simple case where we can do all of
- * the remaining io with a single clone.
- */
- if (ci->sector_count <= max) {
- __clone_and_map_data_bio(ci, ti, ci->sector, bio->bi_max_vecs,
- ci->idx, bio->bi_vcnt - ci->idx, 0,
- ci->sector_count, 0);
- ci->sector_count = 0;
- return 0;
- }
-
- /*
- * There are some bvecs that don't span targets.
- * Do as many of these as possible.
- */
- if (to_sector(bio->bi_io_vec[ci->idx].bv_len) <= max) {
- len = __len_within_target(ci, max, &idx);
-
- __clone_and_map_data_bio(ci, ti, ci->sector, bio->bi_max_vecs,
- ci->idx, idx - ci->idx, 0, len, 0);
+ len = min_t(unsigned, max_io_len(ci->sector, ti), bio_sectors(bio));

- ci->sector += len;
- ci->sector_count -= len;
- ci->idx = idx;
+ __clone_and_map_data_bio(ci, ti, ci->sector, len);

- return 0;
- }
+ ci->sector += len;
+ ci->sector_count -= len;

- /*
- * Handle a bvec that must be split between two or more targets.
- */
- return __split_bvec_across_targets(ci, ti, max);
+ return 0;
}

/*
@@ -1369,7 +1236,6 @@ static void __split_and_process_bio(struct mapped_device *md, struct bio *bio)
ci.io->md = md;
spin_lock_init(&ci.io->endio_lock);
ci.sector = bio->bi_iter.bi_sector;
- ci.idx = bio->bi_iter.bi_idx;

start_io_acct(ci.io);

diff --git a/fs/bio.c b/fs/bio.c
index d4200f4..24271ce 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1762,44 +1762,6 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
}
EXPORT_SYMBOL(bio_split);

-/**
- * bio_sector_offset - Find hardware sector offset in bio
- * @bio: bio to inspect
- * @index: bio_vec index
- * @offset: offset in bv_page
- *
- * Return the number of hardware sectors between beginning of bio
- * and an end point indicated by a bio_vec index and an offset
- * within that vector's page.
- */
-sector_t bio_sector_offset(struct bio *bio, unsigned short index,
- unsigned int offset)
-{
- unsigned int sector_sz;
- struct bio_vec *bv;
- sector_t sectors;
- int i;
-
- sector_sz = queue_logical_block_size(bio->bi_bdev->bd_disk->queue);
- sectors = 0;
-
- if (index >= bio->bi_iter.bi_idx)
- index = bio->bi_vcnt - 1;
-
- bio_for_each_segment_all(bv, bio, i) {
- if (i == index) {
- if (offset > bv->bv_offset)
- sectors += (offset - bv->bv_offset) / sector_sz;
- break;
- }
-
- sectors += bv->bv_len / sector_sz;
- }
-
- return sectors;
-}
-EXPORT_SYMBOL(bio_sector_offset);
-
/*
* create memory pools for biovec's in a bio_set.
* use the global biovec slabs created for general use.
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1967b64..80ffe15 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -331,7 +331,6 @@ extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
unsigned int, unsigned int);
extern int bio_get_nr_vecs(struct block_device *);
-extern sector_t bio_sector_offset(struct bio *, unsigned short, unsigned int);
extern struct bio *bio_map_user(struct request_queue *, struct block_device *,
unsigned long, unsigned int, int, gfp_t);
struct sg_iovec;
--
1.8.3.rc1

2013-06-09 02:23:27

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 16/26] block: Rename bio_split() -> bio_pair_split()

This is prep work for introducing a more general bio_split().

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: Peter Osterlund <[email protected]>
Cc: Sage Weil <[email protected]>
---
drivers/block/pktcdvd.c | 2 +-
drivers/md/linear.c | 2 +-
drivers/md/raid0.c | 6 +++---
drivers/md/raid10.c | 2 +-
fs/bio.c | 4 ++--
include/linux/bio.h | 2 +-
6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 68a6d2a..6c1029c 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2412,7 +2412,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
if (last_zone != zone) {
BUG_ON(last_zone != zone + pd->settings.size);
first_sectors = last_zone - bio->bi_iter.bi_sector;
- bp = bio_split(bio, first_sectors);
+ bp = bio_pair_split(bio, first_sectors);
BUG_ON(!bp);
pkt_make_request(q, &bp->bio1);
pkt_make_request(q, &bp->bio2);
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index fb3b0d0..e9b53e9 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -326,7 +326,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)

rcu_read_unlock();

- bp = bio_split(bio, end_sector - bio->bi_iter.bi_sector);
+ bp = bio_pair_split(bio, end_sector - bio->bi_iter.bi_sector);

linear_make_request(mddev, &bp->bio1);
linear_make_request(mddev, &bp->bio2);
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index d8c2ec0..5edc3f7 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -534,11 +534,11 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
* refuse to split for us, so we need to split it.
*/
if (likely(is_power_of_2(chunk_sects)))
- bp = bio_split(bio, chunk_sects - (sector &
+ bp = bio_pair_split(bio, chunk_sects - (sector &
(chunk_sects-1)));
else
- bp = bio_split(bio, chunk_sects -
- sector_div(sector, chunk_sects));
+ bp = bio_pair_split(bio, chunk_sects -
+ sector_div(sector, chunk_sects));
raid0_make_request(mddev, &bp->bio1);
raid0_make_request(mddev, &bp->bio2);
bio_pair_release(bp);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5899d87..7b68d82 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1185,7 +1185,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
/* This is a one page bio that upper layers
* refuse to split for us, so we need to split it.
*/
- bp = bio_split(bio, chunk_sects -
+ bp = bio_pair_split(bio, chunk_sects -
(bio->bi_iter.bi_sector & (chunk_sects - 1)));

/* Each of these 'make_request' calls will call 'wait_barrier'.
diff --git a/fs/bio.c b/fs/bio.c
index bd25401..3968b8f 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1746,7 +1746,7 @@ static void bio_pair_end_2(struct bio *bi, int err)
/*
* split a bio - only worry about a bio with a single page in its iovec
*/
-struct bio_pair *bio_split(struct bio *bi, int first_sectors)
+struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
{
struct bio_pair *bp = mempool_alloc(bio_split_pool, GFP_NOIO);

@@ -1793,7 +1793,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)

return bp;
}
-EXPORT_SYMBOL(bio_split);
+EXPORT_SYMBOL(bio_pair_split);

/*
* create memory pools for biovec's in a bio_set.
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 55d2ce8..dd70f7e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -281,7 +281,7 @@ struct bio_pair {
atomic_t cnt;
int error;
};
-extern struct bio_pair *bio_split(struct bio *bi, int first_sectors);
+extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
extern void bio_pair_release(struct bio_pair *dbio);

extern struct bio_set *bioset_create(unsigned int, unsigned int);
--
1.8.3.rc1

2013-06-09 02:23:47

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 15/26] block: Generic bio chaining

This adds a generic mechanism for chaining bio completions. This is
going to be used for a bio_split() replacement, and some other things in
the future.

This is implemented with a new bio flag that bio_endio() checks; it
would definitely be cleaner to implement chaining with a bi_end_io
function, but since there are no limits on the depth of a bio chain (and
with arbitrary bio splitting coming this is going to be a real issue),
using an endio function would lead to unbounded stack usage.

Tail call optimization could solve that, but CONFIG_FRAME_POINTER
disables gcc's tail call optimization (-fno-optimize-sibling-calls) - so
we do it the hacky but safe way.
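
A minimal sketch of the resulting usage (bio_split() itself arrives later
in the series; "bs" is whatever bio_set the caller owns):

	struct bio *split = bio_split(bio, sectors, GFP_NOIO, bs);

	bio_chain(split, bio);		/* bio's bi_end_io now waits on both */
	generic_make_request(split);
	generic_make_request(bio);	/* the remainder; completes the chain */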

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
drivers/md/bcache/io.c | 2 +-
fs/bio.c | 45 +++++++++++++++++++++++++++++++++++++++------
include/linux/bio.h | 1 +
include/linux/blk_types.h | 7 +++++--
4 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 294b4c1..246b420 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -137,7 +137,7 @@ static void bch_bio_submit_split_done(struct closure *cl)

s->bio->bi_end_io = s->bi_end_io;
s->bio->bi_private = s->bi_private;
- bio_endio(s->bio, 0);
+ s->bio->bi_end_io(s->bio, 0);

closure_debug_destroy(&s->cl);
mempool_free(s, s->p->bio_split_hook);
diff --git a/fs/bio.c b/fs/bio.c
index 24271ce..bd25401 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -273,6 +273,7 @@ void bio_init(struct bio *bio)
{
memset(bio, 0, sizeof(*bio));
bio->bi_flags = 1 << BIO_UPTODATE;
+ atomic_set(&bio->bi_remaining, 1);
atomic_set(&bio->bi_cnt, 1);
}
EXPORT_SYMBOL(bio_init);
@@ -295,9 +296,29 @@ void bio_reset(struct bio *bio)

memset(bio, 0, BIO_RESET_BYTES);
bio->bi_flags = flags|(1 << BIO_UPTODATE);
+ atomic_set(&bio->bi_remaining, 1);
}
EXPORT_SYMBOL(bio_reset);

+/**
+ * bio_chain - chain bio completions
+ *
+ * The caller won't have a bi_end_io called when @bio completes - instead,
+ * @parent's bi_end_io won't be called until both @parent and @bio have
+ * completed.
+ *
+ * The caller must not set bi_private or bi_end_io in @bio.
+ */
+void bio_chain(struct bio *bio, struct bio *parent)
+{
+ BUG_ON(bio->bi_private || bio->bi_end_io);
+
+ bio->bi_flags |= 1 << BIO_CHAINED;
+ bio->bi_private = parent;
+ atomic_inc(&parent->bi_remaining);
+}
+EXPORT_SYMBOL(bio_chain);
+
static void bio_alloc_rescue(struct work_struct *work)
{
struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
@@ -1669,13 +1690,25 @@ EXPORT_SYMBOL(bio_flush_dcache_pages);
**/
void bio_endio(struct bio *bio, int error)
{
- if (error)
- clear_bit(BIO_UPTODATE, &bio->bi_flags);
- else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
- error = -EIO;
+ while (bio) {
+ BUG_ON(atomic_read(&bio->bi_remaining) <= 0);
+
+ if (error)
+ clear_bit(BIO_UPTODATE, &bio->bi_flags);
+ else if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
+ error = -EIO;
+
+ if (!atomic_dec_and_test(&bio->bi_remaining))
+ return;

- if (bio->bi_end_io)
- bio->bi_end_io(bio, error);
+ if (bio_flagged(bio, BIO_CHAINED)) {
+ bio = bio->bi_private;
+ } else {
+ if (bio->bi_end_io)
+ bio->bi_end_io(bio, error);
+ bio = NULL;
+ }
+ }
}
EXPORT_SYMBOL(bio_endio);

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 80ffe15..55d2ce8 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -326,6 +326,7 @@ extern void bio_advance(struct bio *, unsigned);

extern void bio_init(struct bio *);
extern void bio_reset(struct bio *);
+void bio_chain(struct bio *, struct bio *);

extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 72f1274..69f5c0d 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -64,6 +64,8 @@ struct bio {
unsigned int bi_seg_front_size;
unsigned int bi_seg_back_size;

+ atomic_t bi_remaining;
+
bio_end_io_t *bi_end_io;

void *bi_private;
@@ -119,13 +121,14 @@ struct bio {
#define BIO_QUIET 10 /* Make BIO Quiet */
#define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
#define BIO_SNAP_STABLE 12 /* bio data must be snapshotted during write */
+#define BIO_CHAINED 13 /* bi_private points to a parent bio */

/*
* Flags starting here get preserved by bio_reset() - this includes
* BIO_POOL_IDX()
*/
-#define BIO_RESET_BITS 13
-#define BIO_OWNS_VEC 13 /* bio_free() should free bvec */
+#define BIO_RESET_BITS 14
+#define BIO_OWNS_VEC 14 /* bio_free() should free bvec */

#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))

--
1.8.3.rc1

2013-06-09 02:24:13

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 11/26] block: Kill bio_iovec_idx(), __bio_iovec()

bio_iovec_idx() and __bio_iovec() don't have any valid uses anymore -
previous users have been converted to bio_iovec_iter() or other methods.

__BVEC_END() has to go too - the bvec array can't be used directly for
the last biovec because we might only be using the first portion of it,
we have to iterate over the bvec array with bio_for_each_segment() which
checks against the current value of bi_iter.bi_size.
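
The replacement idiom for finding the final segment, as used in the
blk-merge.c hunk below:

	struct bio_vec bv;
	struct bvec_iter iter;

	bio_for_each_segment(bv, bio, iter)
		if (bv.bv_len == iter.bi_size)
			break;	/* bv now holds the bio's last segment */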

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
block/blk-merge.c | 13 +++++++++++--
include/linux/bio.h | 26 ++++++++------------------
2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 8940562..b53ddac 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -88,6 +88,9 @@ EXPORT_SYMBOL(blk_recount_segments);
static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
struct bio *nxt)
{
+ struct bio_vec end_bv, nxt_bv;
+ struct bvec_iter iter;
+
if (!blk_queue_cluster(q))
return 0;

@@ -98,14 +101,20 @@ static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
if (!bio_has_data(bio))
return 1;

- if (!BIOVEC_PHYS_MERGEABLE(__BVEC_END(bio), __BVEC_START(nxt)))
+ bio_for_each_segment(end_bv, bio, iter)
+ if (end_bv.bv_len == iter.bi_size)
+ break;
+
+ nxt_bv = bio_iovec(nxt);
+
+ if (!BIOVEC_PHYS_MERGEABLE(&end_bv, &nxt_bv))
return 0;

/*
* bio and nxt are contiguous in memory; check if the queue allows
* these two to be merged into one
*/
- if (BIO_SEG_BOUNDARY(q, bio, nxt))
+ if (BIOVEC_SEG_BOUNDARY(q, &end_bv, &nxt_bv))
return 1;

return 0;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 62c7293..1967b64 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -61,9 +61,6 @@
* various member access, note that bio_data should of course not be used
* on highmem page vectors
*/
-#define bio_iovec_idx(bio, idx) (&((bio)->bi_io_vec[(idx)]))
-#define __bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
-
#define __bvec_iter_bvec(bvec, iter) (&(bvec)[(iter).bi_idx])

#define bvec_iter_page(bvec, iter) \
@@ -138,19 +135,16 @@ static inline void *bio_data(struct bio *bio)
* permanent PIO fall back, user is probably better off disabling highmem
* I/O completely on that queue (see ide-dma for example)
*/
-#define __bio_kmap_atomic(bio, idx, kmtype) \
- (kmap_atomic(bio_iovec_idx((bio), (idx))->bv_page) + \
- bio_iovec_idx((bio), (idx))->bv_offset)
+#define __bio_kmap_atomic(bio, iter) \
+ (kmap_atomic(bio_iovec_iter((bio), (iter)).bv_page) + \
+ bio_iovec_iter((bio), (iter)).bv_offset)

-#define __bio_kunmap_atomic(addr, kmtype) kunmap_atomic(addr)
+#define __bio_kunmap_atomic(addr) kunmap_atomic(addr)

/*
* merge helpers etc
*/

-#define __BVEC_END(bio) bio_iovec_idx((bio), (bio)->bi_vcnt - 1)
-#define __BVEC_START(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
-
/* Default implementation of BIOVEC_PHYS_MERGEABLE */
#define __BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
((bvec_to_phys((vec1)) + (vec1)->bv_len) == bvec_to_phys((vec2)))
@@ -167,8 +161,6 @@ static inline void *bio_data(struct bio *bio)
(((addr1) | (mask)) == (((addr2) - 1) | (mask)))
#define BIOVEC_SEG_BOUNDARY(q, b1, b2) \
__BIO_SEG_BOUNDARY(bvec_to_phys((b1)), bvec_to_phys((b2)) + (b2)->bv_len, queue_segment_boundary((q)))
-#define BIO_SEG_BOUNDARY(q, b1, b2) \
- BIOVEC_SEG_BOUNDARY((q), __BVEC_END((b1)), __BVEC_START((b2)))

#define bio_io_error(bio) bio_endio((bio), -EIO)

@@ -177,9 +169,7 @@ static inline void *bio_data(struct bio *bio)
* before it got to the driver and the driver won't own all of it
*/
#define bio_for_each_segment_all(bvl, bio, i) \
- for (i = 0; \
- bvl = bio_iovec_idx((bio), (i)), i < (bio)->bi_vcnt; \
- i++)
+ for (i = 0, bvl = (bio)->bi_io_vec; i < (bio)->bi_vcnt; i++, bvl++)

static inline void bvec_iter_advance(struct bio_vec *bv, struct bvec_iter *iter,
unsigned bytes)
@@ -431,15 +421,15 @@ static inline void bvec_kunmap_irq(char *buffer, unsigned long *flags)
}
#endif

-static inline char *__bio_kmap_irq(struct bio *bio, unsigned short idx,
+static inline char *__bio_kmap_irq(struct bio *bio, struct bvec_iter iter,
unsigned long *flags)
{
- return bvec_kmap_irq(bio_iovec_idx(bio, idx), flags);
+ return bvec_kmap_irq(&bio_iovec_iter(bio, iter), flags);
}
#define __bio_kunmap_irq(buf, flags) bvec_kunmap_irq(buf, flags)

#define bio_kmap_irq(bio, flags) \
- __bio_kmap_irq((bio), (bio)->bi_iter.bi_idx, (flags))
+ __bio_kmap_irq((bio), (bio)->bi_iter, (flags))
#define bio_kunmap_irq(buf,flags) __bio_kunmap_irq(buf, flags)

static inline bool bio_is_rw(struct bio *bio)
--
1.8.3.rc1

2013-06-09 02:24:11

by Kent Overstreet

Subject: [PATCH 14/26] md, bcache: Remove bi_idx hacks

Now that drivers have been converted to the new bvec_iter primitives,
there's no need to trim the bvec before we submit it; and we can't trim
it once we start sharing bvecs.

It used to be that passing a partially completed bio (i.e. one with
nonzero bi_idx) to generic_make_request() was a dangerous thing -
various drivers would choke on such things. But with immutable biovecs
and our new bio splitting that shares the biovecs, submitting partially
completed bios has to work (and should work, now that all the drivers
have been converted to the new primitives).
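
With the iterator, trimming a bio reduces to advancing past the front
and clamping bi_size - a sketch of what md_trim_bio() becomes after
this patch (the early-out and error handling are elided here):

        /* trim 'bio' to 'size' sectors starting 'offset' sectors in */
        bio_advance(bio, offset << 9);
        bio->bi_iter.bi_size = size << 9;

No memmove() of the biovec, no per-segment fixup loop - the array
itself is never touched.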

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Neil Brown <[email protected]>
---
drivers/md/bcache/io.c | 46 ++--------------------------------------------
drivers/md/md.c | 22 ----------------------
2 files changed, 2 insertions(+), 66 deletions(-)

diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index da198da..294b4c1 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -9,48 +9,6 @@
#include "bset.h"
#include "debug.h"

-static void bch_bi_idx_hack_endio(struct bio *bio, int error)
-{
- struct bio *p = bio->bi_private;
-
- bio_endio(p, error);
- bio_put(bio);
-}
-
-static void bch_generic_make_request_hack(struct bio *bio)
-{
- if (bio->bi_iter.bi_idx) {
- struct bio_vec bv;
- struct bvec_iter iter;
- struct bio *clone = bio_alloc(GFP_NOIO, bio_segments(bio));
-
- bio_for_each_segment(bv, bio, iter)
- clone->bi_io_vec[clone->bi_vcnt++] = bv;
-
- clone->bi_iter.bi_sector = bio->bi_iter.bi_sector;
- clone->bi_bdev = bio->bi_bdev;
- clone->bi_rw = bio->bi_rw;
- clone->bi_vcnt = bio_segments(bio);
- clone->bi_iter.bi_size = bio->bi_iter.bi_size;
-
- clone->bi_private = bio;
- clone->bi_end_io = bch_bi_idx_hack_endio;
-
- bio = clone;
- }
-
- /*
- * Hack, since drivers that clone bios clone up to bi_max_vecs, but our
- * bios might have had more than that (before we split them per device
- * limitations).
- *
- * To be taken out once immutable bvec stuff is in.
- */
- bio->bi_max_vecs = bio->bi_vcnt;
-
- generic_make_request(bio);
-}
-
/**
* bch_bio_split - split a bio
* @bio: bio to split
@@ -225,12 +183,12 @@ void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
n->bi_private = &s->cl;

closure_get(&s->cl);
- bch_generic_make_request_hack(n);
+ generic_make_request(n);
} while (n != bio);

continue_at(&s->cl, bch_bio_submit_split_done, NULL);
submit:
- bch_generic_make_request_hack(bio);
+ generic_make_request(bio);
}

/* Bios with headers */
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1ed2426..20e0408 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -189,9 +189,6 @@ void md_trim_bio(struct bio *bio, int offset, int size)
* the given offset and size.
* This requires adjusting bi_sector, bi_size, and bi_io_vec
*/
- struct bio_vec bvec;
- struct bvec_iter iter;
- int sofar = 0;

size <<= 9;
if (offset == 0 && size == bio->bi_iter.bi_size)
@@ -202,25 +199,6 @@ void md_trim_bio(struct bio *bio, int offset, int size)
bio_advance(bio, offset << 9);

bio->bi_iter.bi_size = size;
-
- /* avoid any complications with bi_idx being non-zero*/
- if (bio->bi_iter.bi_idx) {
- memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_iter.bi_idx,
- (bio->bi_vcnt - bio->bi_iter.bi_idx) *
- sizeof(struct bio_vec));
- bio->bi_vcnt -= bio->bi_iter.bi_idx;
- bio->bi_iter.bi_idx = 0;
- }
- /* Make sure vcnt and last bv are not too big */
- bio_for_each_segment(bvec, bio, iter) {
- if (sofar + bvec.bv_len > size)
- bvec.bv_len = size - sofar;
- if (bvec.bv_len == 0) {
- bio->bi_vcnt = iter.bi_idx;
- break;
- }
- sofar += bvec.bv_len;
- }
}
EXPORT_SYMBOL_GPL(md_trim_bio);

--
1.8.3.rc1

2013-06-09 02:24:52

by Kent Overstreet

Subject: [PATCH 10/26] block: Convert drivers to immutable biovecs

Now that we've got a mechanism for immutable biovecs -
bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
respect it instead of using the bvec array directly.
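
The conversion pattern is the same everywhere: per-request state that
used to cache a bio_vec pointer plus offset/residual counters now
caches a struct bvec_iter, and the driver uses the iterator
primitives. A sketch for a hypothetical driver (my_req and
my_fill_segment are illustrative names only):

        struct my_req {                 /* hypothetical per-command state */
                struct bio *bio;
                struct bvec_iter iter;  /* replaces bv / bv_off / resid fields */
        };

        static void my_fill_segment(struct my_req *rq)
        {
                /* compute the current segment from the iterator */
                struct bio_vec bv = bio_iovec_iter(rq->bio, rq->iter);

                /* ... program bv.bv_page + bv.bv_offset for bv.bv_len bytes ... */

                bio_advance_iter(rq->bio, &rq->iter, bv.bv_len);
        }

When rq->iter.bi_size hits zero the request is done - compare the aoe
and umem hunks below, which follow this shape exactly.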

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: NeilBrown <[email protected]>
Cc: "Ed L. Cashin" <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
---
drivers/block/aoe/aoe.h | 10 +---
drivers/block/aoe/aoecmd.c | 127 +++++++++++++++++----------------------------
drivers/block/umem.c | 50 ++++++++----------
drivers/md/dm-crypt.c | 52 ++++++++-----------
drivers/md/dm-io.c | 31 ++++++-----
drivers/md/dm-raid1.c | 8 +--
drivers/md/dm-verity.c | 52 +++++--------------
include/linux/dm-io.h | 4 +-
8 files changed, 131 insertions(+), 203 deletions(-)

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index 1756494..e959e6b 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -100,11 +100,8 @@ enum {

struct buf {
ulong nframesout;
- ulong resid;
- ulong bv_resid;
- sector_t sector;
struct bio *bio;
- struct bio_vec *bv;
+ struct bvec_iter iter;
struct request *rq;
};

@@ -120,13 +117,10 @@ struct frame {
ulong waited;
ulong waited_total;
struct aoetgt *t; /* parent target I belong to */
- sector_t lba;
struct sk_buff *skb; /* command skb freed on module exit */
struct sk_buff *r_skb; /* response skb for async processing */
struct buf *buf;
- struct bio_vec *bv;
- ulong bcnt;
- ulong bv_off;
+ struct bvec_iter iter;
char flags;
};

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index a52975a..0733ba1 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -183,8 +183,7 @@ aoe_freetframe(struct frame *f)

t = f->t;
f->buf = NULL;
- f->lba = 0;
- f->bv = NULL;
+ memset(&f->iter, 0, sizeof(f->iter));
f->r_skb = NULL;
f->flags = 0;
list_add(&f->head, &t->ffree);
@@ -282,21 +281,17 @@ newframe(struct aoedev *d)
}

static void
-skb_fillup(struct sk_buff *skb, struct bio_vec *bv, ulong off, ulong cnt)
+skb_fillup(struct sk_buff *skb, struct bio *bio, struct bvec_iter *iter)
{
int frag = 0;
- ulong fcnt;
-loop:
- fcnt = bv->bv_len - (off - bv->bv_offset);
- if (fcnt > cnt)
- fcnt = cnt;
- skb_fill_page_desc(skb, frag++, bv->bv_page, off, fcnt);
- cnt -= fcnt;
- if (cnt <= 0)
- return;
- bv++;
- off = bv->bv_offset;
- goto loop;
+
+ while (iter->bi_size) {
+ struct bio_vec bv = bio_iovec_iter(bio, *iter);
+
+ skb_fill_page_desc(skb, frag++, bv.bv_page,
+ bv.bv_offset, bv.bv_len);
+ bio_advance_iter(bio, iter, bv.bv_len);
+ }
}

static void
@@ -333,12 +328,10 @@ ata_rw_frameinit(struct frame *f)
t->nout++;
f->waited = 0;
f->waited_total = 0;
- if (f->buf)
- f->lba = f->buf->sector;

/* set up ata header */
- ah->scnt = f->bcnt >> 9;
- put_lba(ah, f->lba);
+ ah->scnt = f->iter.bi_size >> 9;
+ put_lba(ah, f->iter.bi_sector);
if (t->d->flags & DEVFL_EXT) {
ah->aflags |= AOEAFL_EXT;
} else {
@@ -347,11 +340,11 @@ ata_rw_frameinit(struct frame *f)
ah->lba3 |= 0xe0; /* LBA bit + obsolete 0xa0 */
}
if (f->buf && bio_data_dir(f->buf->bio) == WRITE) {
- skb_fillup(skb, f->bv, f->bv_off, f->bcnt);
+ skb->len += f->iter.bi_size;
+ skb->data_len = f->iter.bi_size;
+ skb->truesize += f->iter.bi_size;
+ skb_fillup(skb, f->buf->bio, &f->iter);
ah->aflags |= AOEAFL_WRITE;
- skb->len += f->bcnt;
- skb->data_len = f->bcnt;
- skb->truesize += f->bcnt;
t->wpkts++;
} else {
t->rpkts++;
@@ -370,7 +363,7 @@ aoecmd_ata_rw(struct aoedev *d)
struct aoetgt *t;
struct sk_buff *skb;
struct sk_buff_head queue;
- ulong bcnt, fbcnt;
+ ulong bcnt;

buf = nextbuf(d);
if (buf == NULL)
@@ -382,36 +375,19 @@ aoecmd_ata_rw(struct aoedev *d)
bcnt = d->maxbcnt;
if (bcnt == 0)
bcnt = DEFAULTBCNT;
- if (bcnt > buf->resid)
- bcnt = buf->resid;
- fbcnt = bcnt;
- f->bv = buf->bv;
- f->bv_off = f->bv->bv_offset + (f->bv->bv_len - buf->bv_resid);
- do {
- if (fbcnt < buf->bv_resid) {
- buf->bv_resid -= fbcnt;
- buf->resid -= fbcnt;
- break;
- }
- fbcnt -= buf->bv_resid;
- buf->resid -= buf->bv_resid;
- if (buf->resid == 0) {
- d->ip.buf = NULL;
- break;
- }
- buf->bv++;
- buf->bv_resid = buf->bv->bv_len;
- WARN_ON(buf->bv_resid == 0);
- } while (fbcnt);
+ if (bcnt > buf->iter.bi_size)
+ bcnt = buf->iter.bi_size;
+
+ bio_advance_iter(buf->bio, &buf->iter, bcnt);

/* initialize the headers & frame */
f->buf = buf;
- f->bcnt = bcnt;
+ f->iter = buf->iter;
+ f->iter.bi_size = bcnt;
ata_rw_frameinit(f);

/* mark all tracking fields and load out */
buf->nframesout += 1;
- buf->sector += bcnt >> 9;

skb = skb_clone(f->skb, GFP_ATOMIC);
if (skb) {
@@ -604,10 +580,7 @@ reassign_frame(struct frame *f)
skb = nf->skb;
nf->skb = f->skb;
nf->buf = f->buf;
- nf->bcnt = f->bcnt;
- nf->lba = f->lba;
- nf->bv = f->bv;
- nf->bv_off = f->bv_off;
+ nf->iter = f->iter;
nf->waited = 0;
nf->waited_total = f->waited_total;
nf->sent = f->sent;
@@ -626,6 +599,7 @@ probe(struct aoetgt *t)
struct sk_buff_head queue;
size_t n, m;
int frag;
+ ulong bcnt;

d = t->d;
f = newtframe(d, t);
@@ -639,19 +613,20 @@ probe(struct aoetgt *t)
}
f->flags |= FFL_PROBE;
ifrotate(t);
- f->bcnt = t->d->maxbcnt ? t->d->maxbcnt : DEFAULTBCNT;
+ bcnt = t->d->maxbcnt ? t->d->maxbcnt : DEFAULTBCNT;
+ f->iter.bi_size = bcnt;
ata_rw_frameinit(f);
skb = f->skb;
- for (frag = 0, n = f->bcnt; n > 0; ++frag, n -= m) {
+ for (frag = 0, n = bcnt; n > 0; ++frag, n -= m) {
if (n < PAGE_SIZE)
m = n;
else
m = PAGE_SIZE;
skb_fill_page_desc(skb, frag, empty_page, 0, m);
}
- skb->len += f->bcnt;
- skb->data_len = f->bcnt;
- skb->truesize += f->bcnt;
+ skb->len += bcnt;
+ skb->data_len = bcnt;
+ skb->truesize += bcnt;

skb = skb_clone(f->skb, GFP_ATOMIC);
if (skb) {
@@ -923,12 +898,8 @@ bufinit(struct buf *buf, struct request *rq, struct bio *bio)
memset(buf, 0, sizeof(*buf));
buf->rq = rq;
buf->bio = bio;
- buf->resid = bio->bi_iter.bi_size;
- buf->sector = bio->bi_iter.bi_sector;
+ buf->iter = bio->bi_iter;
bio_pageinc(bio);
- buf->bv = __bio_iovec(bio);
- buf->bv_resid = buf->bv->bv_len;
- WARN_ON(buf->bv_resid == 0);
}

static struct buf *
@@ -1113,24 +1084,23 @@ gettgt(struct aoedev *d, char *addr)
}

static void
-bvcpy(struct bio_vec *bv, ulong off, struct sk_buff *skb, long cnt)
+bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter *iter, long cnt)
{
- ulong fcnt;
char *p;
int soff = 0;
-loop:
- fcnt = bv->bv_len - (off - bv->bv_offset);
- if (fcnt > cnt)
- fcnt = cnt;
- p = page_address(bv->bv_page) + off;
- skb_copy_bits(skb, soff, p, fcnt);
- soff += fcnt;
- cnt -= fcnt;
- if (cnt <= 0)
- return;
- bv++;
- off = bv->bv_offset;
- goto loop;
+
+ do {
+ struct bio_vec bv = bio_iovec_iter(bio, *iter);
+
+ p = page_address(bv.bv_page) + bv.bv_offset;
+ skb_copy_bits(skb, soff, p, bv.bv_len);
+
+ bio_advance_iter(bio, iter, bv.bv_len);
+ soff += bv.bv_len;
+ cnt -= bv.bv_len;
+ if (cnt <= 0)
+ return;
+ } while (cnt > 0);
}

void
@@ -1223,7 +1193,7 @@ noskb: if (buf)
clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
break;
}
- bvcpy(f->bv, f->bv_off, skb, n);
+ bvcpy(skb, f->buf->bio, &f->iter, n);
case ATA_CMD_PIO_WRITE:
case ATA_CMD_PIO_WRITE_EXT:
spin_lock_irq(&d->lock);
@@ -1266,7 +1236,7 @@ out:

aoe_freetframe(f);

- if (buf && --buf->nframesout == 0 && buf->resid == 0)
+ if (buf && --buf->nframesout == 0 && buf->iter.bi_size == 0)
aoe_end_buf(d, buf);

spin_unlock_irq(&d->lock);
@@ -1697,7 +1667,6 @@ aoe_failbuf(struct aoedev *d, struct buf *buf)
{
if (buf == NULL)
return;
- buf->resid = 0;
clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
if (buf->nframesout == 0)
aoe_end_buf(d, buf);
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index dab4f1a..00145e8 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -108,8 +108,7 @@ struct cardinfo {
* have been written
*/
struct bio *bio, *currentbio, **biotail;
- int current_idx;
- sector_t current_sector;
+ struct bvec_iter current_iter;

struct request_queue *queue;

@@ -118,7 +117,7 @@ struct cardinfo {
struct mm_dma_desc *desc;
int cnt, headcnt;
struct bio *bio, **biotail;
- int idx;
+ struct bvec_iter iter;
} mm_pages[2];
#define DESC_PER_PAGE ((PAGE_SIZE*2)/sizeof(struct mm_dma_desc))

@@ -344,16 +343,13 @@ static int add_bio(struct cardinfo *card)
dma_addr_t dma_handle;
int offset;
struct bio *bio;
- struct bio_vec *vec;
- int idx;
+ struct bio_vec vec;
int rw;
- int len;

bio = card->currentbio;
if (!bio && card->bio) {
card->currentbio = card->bio;
- card->current_idx = card->bio->bi_iter.bi_idx;
- card->current_sector = card->bio->bi_iter.bi_sector;
+ card->current_iter = card->bio->bi_iter;
card->bio = card->bio->bi_next;
if (card->bio == NULL)
card->biotail = &card->bio;
@@ -362,18 +358,17 @@ static int add_bio(struct cardinfo *card)
}
if (!bio)
return 0;
- idx = card->current_idx;

rw = bio_rw(bio);
if (card->mm_pages[card->Ready].cnt >= DESC_PER_PAGE)
return 0;

- vec = bio_iovec_idx(bio, idx);
- len = vec->bv_len;
+ vec = bio_iovec_iter(bio, card->current_iter);
+
dma_handle = pci_map_page(card->dev,
- vec->bv_page,
- vec->bv_offset,
- len,
+ vec.bv_page,
+ vec.bv_offset,
+ vec.bv_len,
(rw == READ) ?
PCI_DMA_FROMDEVICE : PCI_DMA_TODEVICE);

@@ -381,7 +376,7 @@ static int add_bio(struct cardinfo *card)
desc = &p->desc[p->cnt];
p->cnt++;
if (p->bio == NULL)
- p->idx = idx;
+ p->iter = card->current_iter;
if ((p->biotail) != &bio->bi_next) {
*(p->biotail) = bio;
p->biotail = &(bio->bi_next);
@@ -391,8 +386,8 @@ static int add_bio(struct cardinfo *card)
desc->data_dma_handle = dma_handle;

desc->pci_addr = cpu_to_le64((u64)desc->data_dma_handle);
- desc->local_addr = cpu_to_le64(card->current_sector << 9);
- desc->transfer_size = cpu_to_le32(len);
+ desc->local_addr = cpu_to_le64(card->current_iter.bi_sector << 9);
+ desc->transfer_size = cpu_to_le32(vec.bv_len);
offset = (((char *)&desc->sem_control_bits) - ((char *)p->desc));
desc->sem_addr = cpu_to_le64((u64)(p->page_dma+offset));
desc->zero1 = desc->zero2 = 0;
@@ -407,10 +402,9 @@ static int add_bio(struct cardinfo *card)
desc->control_bits |= cpu_to_le32(DMASCR_TRANSFER_READ);
desc->sem_control_bits = desc->control_bits;

- card->current_sector += (len >> 9);
- idx++;
- card->current_idx = idx;
- if (idx >= bio->bi_vcnt)
+
+ bio_advance_iter(bio, &card->current_iter, vec.bv_len);
+ if (!card->current_iter.bi_size)
card->currentbio = NULL;

return 1;
@@ -439,23 +433,25 @@ static void process_page(unsigned long data)
struct mm_dma_desc *desc = &page->desc[page->headcnt];
int control = le32_to_cpu(desc->sem_control_bits);
int last = 0;
- int idx;
+ struct bio_vec vec;

if (!(control & DMASCR_DMA_COMPLETE)) {
control = dma_status;
last = 1;
}
+
page->headcnt++;
- idx = page->idx;
- page->idx++;
- if (page->idx >= bio->bi_vcnt) {
+ vec = bio_iovec_iter(bio, page->iter);
+ bio_advance_iter(bio, &page->iter, vec.bv_len);
+
+ if (!page->iter.bi_size) {
page->bio = bio->bi_next;
if (page->bio)
- page->idx = page->bio->bi_iter.bi_idx;
+ page->iter = page->bio->bi_iter;
}

pci_unmap_page(card->dev, desc->data_dma_handle,
- bio_iovec_idx(bio, idx)->bv_len,
+ vec.bv_len,
(control & DMASCR_TRANSFER_READ) ?
PCI_DMA_TODEVICE : PCI_DMA_FROMDEVICE);
if (control & DMASCR_HARD_ERROR) {
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index fca3bba..d97d824 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -38,10 +38,8 @@ struct convert_context {
struct completion restart;
struct bio *bio_in;
struct bio *bio_out;
- unsigned int offset_in;
- unsigned int offset_out;
- unsigned int idx_in;
- unsigned int idx_out;
+ struct bvec_iter iter_in;
+ struct bvec_iter iter_out;
sector_t cc_sector;
atomic_t cc_pending;
};
@@ -650,10 +648,12 @@ static void crypt_convert_init(struct crypt_config *cc,
{
ctx->bio_in = bio_in;
ctx->bio_out = bio_out;
- ctx->offset_in = 0;
- ctx->offset_out = 0;
- ctx->idx_in = bio_in ? bio_in->bi_iter.bi_idx : 0;
- ctx->idx_out = bio_out ? bio_out->bi_iter.bi_idx : 0;
+
+ if (bio_in)
+ ctx->iter_in = bio_in->bi_iter;
+ if (bio_out)
+ ctx->iter_out = bio_out->bi_iter;
+
ctx->cc_sector = sector + cc->iv_offset;
init_completion(&ctx->restart);
}
@@ -681,8 +681,8 @@ static int crypt_convert_block(struct crypt_config *cc,
struct convert_context *ctx,
struct ablkcipher_request *req)
{
- struct bio_vec *bv_in = bio_iovec_idx(ctx->bio_in, ctx->idx_in);
- struct bio_vec *bv_out = bio_iovec_idx(ctx->bio_out, ctx->idx_out);
+ struct bio_vec bv_in = bio_iovec_iter(ctx->bio_in, ctx->iter_in);
+ struct bio_vec bv_out = bio_iovec_iter(ctx->bio_out, ctx->iter_out);
struct dm_crypt_request *dmreq;
u8 *iv;
int r;
@@ -693,24 +693,15 @@ static int crypt_convert_block(struct crypt_config *cc,
dmreq->iv_sector = ctx->cc_sector;
dmreq->ctx = ctx;
sg_init_table(&dmreq->sg_in, 1);
- sg_set_page(&dmreq->sg_in, bv_in->bv_page, 1 << SECTOR_SHIFT,
- bv_in->bv_offset + ctx->offset_in);
+ sg_set_page(&dmreq->sg_in, bv_in.bv_page, 1 << SECTOR_SHIFT,
+ bv_in.bv_offset);

sg_init_table(&dmreq->sg_out, 1);
- sg_set_page(&dmreq->sg_out, bv_out->bv_page, 1 << SECTOR_SHIFT,
- bv_out->bv_offset + ctx->offset_out);
+ sg_set_page(&dmreq->sg_out, bv_out.bv_page, 1 << SECTOR_SHIFT,
+ bv_out.bv_offset);

- ctx->offset_in += 1 << SECTOR_SHIFT;
- if (ctx->offset_in >= bv_in->bv_len) {
- ctx->offset_in = 0;
- ctx->idx_in++;
- }
-
- ctx->offset_out += 1 << SECTOR_SHIFT;
- if (ctx->offset_out >= bv_out->bv_len) {
- ctx->offset_out = 0;
- ctx->idx_out++;
- }
+ bio_advance_iter(ctx->bio_in, &ctx->iter_in, 1 << SECTOR_SHIFT);
+ bio_advance_iter(ctx->bio_out, &ctx->iter_out, 1 << SECTOR_SHIFT);

if (cc->iv_gen_ops) {
r = cc->iv_gen_ops->generator(cc, iv, dmreq);
@@ -761,8 +752,8 @@ static int crypt_convert(struct crypt_config *cc,

atomic_set(&ctx->cc_pending, 1);

- while(ctx->idx_in < ctx->bio_in->bi_vcnt &&
- ctx->idx_out < ctx->bio_out->bi_vcnt) {
+ while (ctx->iter_in.bi_size &&
+ ctx->iter_out.bi_size) {

crypt_alloc_req(cc, ctx);

@@ -1031,7 +1022,7 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
}

/* crypt_convert should have filled the clone bio */
- BUG_ON(io->ctx.idx_out < clone->bi_vcnt);
+ BUG_ON(io->ctx.iter_out.bi_size);

clone->bi_iter.bi_sector = cc->start + io->sector;

@@ -1070,7 +1061,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
}

io->ctx.bio_out = clone;
- io->ctx.idx_out = 0;
+ io->ctx.iter_out = clone->bi_iter;

remaining -= clone->bi_iter.bi_size;
sector += bio_sectors(clone);
@@ -1114,8 +1105,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
crypt_inc_pending(new_io);
crypt_convert_init(cc, &new_io->ctx, NULL,
io->base_bio, sector);
- new_io->ctx.idx_in = io->ctx.idx_in;
- new_io->ctx.offset_in = io->ctx.offset_in;
+ new_io->ctx.iter_in = io->ctx.iter_in;

/*
* Fragments after the first use the base_io
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index a6de5c9..c2a6c34 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -202,26 +202,29 @@ static void list_dp_init(struct dpages *dp, struct page_list *pl, unsigned offse
/*
* Functions for getting the pages from a bvec.
*/
-static void bvec_get_page(struct dpages *dp,
+static void bio_get_page(struct dpages *dp,
struct page **p, unsigned long *len, unsigned *offset)
{
- struct bio_vec *bvec = (struct bio_vec *) dp->context_ptr;
- *p = bvec->bv_page;
- *len = bvec->bv_len;
- *offset = bvec->bv_offset;
+ struct bio *bio = dp->context_ptr;
+ struct bio_vec bvec = bio_iovec(bio);
+ *p = bvec.bv_page;
+ *len = bvec.bv_len;
+ *offset = bvec.bv_offset;
}

-static void bvec_next_page(struct dpages *dp)
+static void bio_next_page(struct dpages *dp)
{
- struct bio_vec *bvec = (struct bio_vec *) dp->context_ptr;
- dp->context_ptr = bvec + 1;
+ struct bio *bio = dp->context_ptr;
+ struct bio_vec bvec = bio_iovec(bio);
+
+ bio_advance(bio, bvec.bv_len);
}

-static void bvec_dp_init(struct dpages *dp, struct bio_vec *bvec)
+static void bio_dp_init(struct dpages *dp, struct bio *bio)
{
- dp->get_page = bvec_get_page;
- dp->next_page = bvec_next_page;
- dp->context_ptr = bvec;
+ dp->get_page = bio_get_page;
+ dp->next_page = bio_next_page;
+ dp->context_ptr = bio;
}

/*
@@ -459,8 +462,8 @@ static int dp_init(struct dm_io_request *io_req, struct dpages *dp,
list_dp_init(dp, io_req->mem.ptr.pl, io_req->mem.offset);
break;

- case DM_IO_BVEC:
- bvec_dp_init(dp, io_req->mem.ptr.bvec);
+ case DM_IO_BIO:
+ bio_dp_init(dp, io_req->mem.ptr.bio);
break;

case DM_IO_VMA:
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index e3efb91..56e8844 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -526,8 +526,8 @@ static void read_async_bio(struct mirror *m, struct bio *bio)
struct dm_io_region io;
struct dm_io_request io_req = {
.bi_rw = READ,
- .mem.type = DM_IO_BVEC,
- .mem.ptr.bvec = bio->bi_io_vec + bio->bi_iter.bi_idx,
+ .mem.type = DM_IO_BIO,
+ .mem.ptr.bio = bio,
.notify.fn = read_callback,
.notify.context = bio,
.client = m->ms->io_client,
@@ -629,8 +629,8 @@ static void do_write(struct mirror_set *ms, struct bio *bio)
struct mirror *m;
struct dm_io_request io_req = {
.bi_rw = WRITE | (bio->bi_rw & WRITE_FLUSH_FUA),
- .mem.type = DM_IO_BVEC,
- .mem.ptr.bvec = bio->bi_io_vec + bio->bi_iter.bi_idx,
+ .mem.type = DM_IO_BIO,
+ .mem.ptr.bio = bio,
.notify.fn = write_callback,
.notify.context = bio,
.client = ms->io_client,
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index f3a4dcb..5e82c79 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -73,15 +73,10 @@ struct dm_verity_io {
sector_t block;
unsigned n_blocks;

- /* saved bio vector */
- struct bio_vec *io_vec;
- unsigned io_vec_size;
+ struct bvec_iter iter;

struct work_struct work;

- /* A space for short vectors; longer vectors are allocated separately. */
- struct bio_vec io_vec_inline[DM_VERITY_IO_VEC_INLINE];
-
/*
* Three variably-size fields follow this struct:
*
@@ -284,9 +279,10 @@ release_ret_r:
static int verity_verify_io(struct dm_verity_io *io)
{
struct dm_verity *v = io->v;
+ struct bio *bio = dm_bio_from_per_bio_data(io,
+ v->ti->per_bio_data_size);
unsigned b;
int i;
- unsigned vector = 0, offset = 0;

for (b = 0; b < io->n_blocks; b++) {
struct shash_desc *desc;
@@ -336,31 +332,22 @@ test_block_hash:
}

todo = 1 << v->data_dev_block_bits;
- do {
- struct bio_vec *bv;
+ while (io->iter.bi_size) {
u8 *page;
- unsigned len;
-
- BUG_ON(vector >= io->io_vec_size);
- bv = &io->io_vec[vector];
- page = kmap_atomic(bv->bv_page);
- len = bv->bv_len - offset;
- if (likely(len >= todo))
- len = todo;
- r = crypto_shash_update(desc,
- page + bv->bv_offset + offset, len);
+ struct bio_vec bv = bio_iovec_iter(bio, io->iter);
+
+ page = kmap_atomic(bv.bv_page);
+ r = crypto_shash_update(desc, page + bv.bv_offset,
+ bv.bv_len);
kunmap_atomic(page);
+
if (r < 0) {
DMERR("crypto_shash_update failed: %d", r);
return r;
}
- offset += len;
- if (likely(offset == bv->bv_len)) {
- offset = 0;
- vector++;
- }
- todo -= len;
- } while (todo);
+
+ bio_advance_iter(bio, &io->iter, bv.bv_len);
+ }

if (!v->version) {
r = crypto_shash_update(desc, v->salt, v->salt_size);
@@ -383,8 +370,6 @@ test_block_hash:
return -EIO;
}
}
- BUG_ON(vector != io->io_vec_size);
- BUG_ON(offset);

return 0;
}
@@ -400,9 +385,6 @@ static void verity_finish_io(struct dm_verity_io *io, int error)
bio->bi_end_io = io->orig_bi_end_io;
bio->bi_private = io->orig_bi_private;

- if (io->io_vec != io->io_vec_inline)
- mempool_free(io->io_vec, v->vec_mempool);
-
bio_endio(bio, error);
}

@@ -520,13 +502,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio)

bio->bi_end_io = verity_end_io;
bio->bi_private = io;
- io->io_vec_size = bio_segments(bio);
- if (io->io_vec_size < DM_VERITY_IO_VEC_INLINE)
- io->io_vec = io->io_vec_inline;
- else
- io->io_vec = mempool_alloc(v->vec_mempool, GFP_NOIO);
- memcpy(io->io_vec, __bio_iovec(bio),
- io->io_vec_size * sizeof(struct bio_vec));
+ io->iter = bio->bi_iter;

verity_submit_prefetch(v, io);

diff --git a/include/linux/dm-io.h b/include/linux/dm-io.h
index f4b0aa3..6cf1f62 100644
--- a/include/linux/dm-io.h
+++ b/include/linux/dm-io.h
@@ -29,7 +29,7 @@ typedef void (*io_notify_fn)(unsigned long error, void *context);

enum dm_io_mem_type {
DM_IO_PAGE_LIST,/* Page list */
- DM_IO_BVEC, /* Bio vector */
+ DM_IO_BIO,
DM_IO_VMA, /* Virtual memory area */
DM_IO_KMEM, /* Kernel memory */
};
@@ -41,7 +41,7 @@ struct dm_io_memory {

union {
struct page_list *pl;
- struct bio_vec *bvec;
+ struct bio *bio;
void *vma;
void *addr;
} ptr;
--
1.8.3.rc1

2013-06-09 02:19:53

by Kent Overstreet

Subject: [PATCH 05/26] block: Convert bio_iovec() to bvec_iter

For immutable biovecs, we'll be introducing a new bio_iovec() that uses
our new bvec iterator to construct a biovec, taking into account
bvec_iter->bi_bvec_done - this patch updates existing users for the new
usage.

Some of the existing users really do need a pointer into the bvec array
- those uses are all going to be removed, but we'll need the
functionality from immutable biovecs to remove them - so for now rename
the existing bio_iovec() -> __bio_iovec(); it'll be removed in a couple
of patches.
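
The practical difference for callers: bio_iovec() now yields a struct
bio_vec by value rather than a pointer into the array, so member
access changes accordingly - e.g. (a sketch of the before/after):

        struct bio_vec bv = bio_iovec(bio);     /* was: struct bio_vec *bv */
        unsigned len = bv.bv_len;               /* was: bv->bv_len */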

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Ed L. Cashin" <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: [email protected]
Cc: "James E.J. Bottomley" <[email protected]>
---
drivers/block/aoe/aoecmd.c | 2 +-
drivers/md/bcache/io.c | 13 +++++++------
drivers/md/dm-verity.c | 2 +-
drivers/scsi/sd.c | 2 +-
fs/bio.c | 20 ++++++++++----------
include/linux/bio.h | 10 ++++++----
6 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 8716181..1fb2d6d 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -926,7 +926,7 @@ bufinit(struct buf *buf, struct request *rq, struct bio *bio)
buf->resid = bio->bi_iter.bi_size;
buf->sector = bio->bi_iter.bi_sector;
bio_pageinc(bio);
- buf->bv = bio_iovec(bio);
+ buf->bv = __bio_iovec(bio);
buf->bv_resid = buf->bv->bv_len;
WARN_ON(buf->bv_resid == 0);
}
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 28f06ca..13580e5 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -20,11 +20,12 @@ static void bch_bi_idx_hack_endio(struct bio *bio, int error)
static void bch_generic_make_request_hack(struct bio *bio)
{
if (bio->bi_iter.bi_idx) {
+ int i;
+ struct bio_vec *bv;
struct bio *clone = bio_alloc(GFP_NOIO, bio_segments(bio));

- memcpy(clone->bi_io_vec,
- bio_iovec(bio),
- bio_segments(bio) * sizeof(struct bio_vec));
+ bio_for_each_segment(bv, bio, i)
+ clone->bi_io_vec[clone->bi_vcnt++] = *bv;

clone->bi_iter.bi_sector = bio->bi_iter.bi_sector;
clone->bi_bdev = bio->bi_bdev;
@@ -93,7 +94,7 @@ struct bio *bch_bio_split(struct bio *bio, int sectors,
if (!ret)
return NULL;

- memcpy(ret->bi_io_vec, bio_iovec(bio),
+ memcpy(ret->bi_io_vec, __bio_iovec(bio),
sizeof(struct bio_vec) * vcnt);

break;
@@ -102,7 +103,7 @@ struct bio *bch_bio_split(struct bio *bio, int sectors,
if (!ret)
return NULL;

- memcpy(ret->bi_io_vec, bio_iovec(bio),
+ memcpy(ret->bi_io_vec, __bio_iovec(bio),
sizeof(struct bio_vec) * vcnt);

ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
@@ -178,7 +179,7 @@ static unsigned bch_bio_max_sectors(struct bio *bio)
ret = min(ret, queue_max_sectors(q));

WARN_ON(!ret);
- ret = max_t(int, ret, bio_iovec(bio)->bv_len >> 9);
+ ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);

return ret;
}
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index 90ce9e0..f3a4dcb 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -525,7 +525,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
io->io_vec = io->io_vec_inline;
else
io->io_vec = mempool_alloc(v->vec_mempool, GFP_NOIO);
- memcpy(io->io_vec, bio_iovec(bio),
+ memcpy(io->io_vec, __bio_iovec(bio),
io->io_vec_size * sizeof(struct bio_vec));

verity_submit_prefetch(v, io);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c1c5552..73aec6a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -786,7 +786,7 @@ static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)
if (sdkp->device->no_write_same)
return BLKPREP_KILL;

- BUG_ON(bio_offset(bio) || bio_iovec(bio)->bv_len != sdp->sector_size);
+ BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);

sector >>= ilog2(sdp->sector_size) - 9;
nr_sectors >>= ilog2(sdp->sector_size) - 9;
diff --git a/fs/bio.c b/fs/bio.c
index b7c02b0..f1e7c68 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -821,12 +821,12 @@ void bio_advance(struct bio *bio, unsigned bytes)
break;
}

- if (bytes >= bio_iovec(bio)->bv_len) {
- bytes -= bio_iovec(bio)->bv_len;
+ if (bytes >= bio_iovec(bio).bv_len) {
+ bytes -= bio_iovec(bio).bv_len;
bio->bi_iter.bi_idx++;
} else {
- bio_iovec(bio)->bv_len -= bytes;
- bio_iovec(bio)->bv_offset += bytes;
+ bio_iovec(bio).bv_len -= bytes;
+ bio_iovec(bio).bv_offset += bytes;
bytes = 0;
}
}
@@ -879,8 +879,8 @@ void bio_copy_data(struct bio *dst, struct bio *src)
unsigned src_offset, dst_offset, bytes;
void *src_p, *dst_p;

- src_bv = bio_iovec(src);
- dst_bv = bio_iovec(dst);
+ src_bv = __bio_iovec(src);
+ dst_bv = __bio_iovec(dst);

src_offset = src_bv->bv_offset;
dst_offset = dst_bv->bv_offset;
@@ -893,7 +893,7 @@ void bio_copy_data(struct bio *dst, struct bio *src)
if (!src)
break;

- src_bv = bio_iovec(src);
+ src_bv = __bio_iovec(src);
}

src_offset = src_bv->bv_offset;
@@ -906,7 +906,7 @@ void bio_copy_data(struct bio *dst, struct bio *src)
if (!dst)
break;

- dst_bv = bio_iovec(dst);
+ dst_bv = __bio_iovec(dst);
}

dst_offset = dst_bv->bv_offset;
@@ -1766,8 +1766,8 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
bp->bio1.bi_iter.bi_size = first_sectors << 9;

if (bi->bi_vcnt != 0) {
- bp->bv1 = *bio_iovec(bi);
- bp->bv2 = *bio_iovec(bi);
+ bp->bv1 = bio_iovec(bi);
+ bp->bv2 = bio_iovec(bi);

if (bio_is_rw(bi)) {
bp->bv2.bv_offset += first_sectors << 9;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d321e63..580c9ae 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -62,9 +62,11 @@
* on highmem page vectors
*/
#define bio_iovec_idx(bio, idx) (&((bio)->bi_io_vec[(idx)]))
-#define bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
-#define bio_page(bio) bio_iovec((bio))->bv_page
-#define bio_offset(bio) bio_iovec((bio))->bv_offset
+#define __bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
+#define bio_iovec(bio) (*__bio_iovec(bio))
+
+#define bio_page(bio) (bio_iovec((bio)).bv_page)
+#define bio_offset(bio) (bio_iovec((bio)).bv_offset)
#define bio_segments(bio) ((bio)->bi_vcnt - (bio)->bi_iter.bi_idx)
#define bio_sectors(bio) ((bio)->bi_iter.bi_size >> 9)
#define bio_end_sector(bio) ((bio)->bi_iter.bi_sector + bio_sectors((bio)))
@@ -72,7 +74,7 @@
static inline unsigned int bio_cur_bytes(struct bio *bio)
{
if (bio->bi_vcnt)
- return bio_iovec(bio)->bv_len;
+ return bio_iovec(bio).bv_len;
else /* dataless requests such as discard */
return bio->bi_iter.bi_size;
}
--
1.8.3.rc1

2013-06-09 02:25:17

by Kent Overstreet

Subject: [PATCH 09/26] bio-integrity: Convert to bvec_iter

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
---
block/blk-integrity.c | 40 ++++++++++---------
drivers/scsi/sd_dif.c | 30 +++++++-------
fs/bio-integrity.c | 108 ++++++++++++--------------------------------------
include/linux/bio.h | 19 ++++-----
4 files changed, 71 insertions(+), 126 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index 03cf717..861fcae 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -43,30 +43,32 @@ static const char *bi_unsupported_name = "unsupported";
*/
int blk_rq_count_integrity_sg(struct request_queue *q, struct bio *bio)
{
- struct bio_vec *iv, *ivprv = NULL;
+ struct bio_vec iv, ivprv;
unsigned int segments = 0;
unsigned int seg_size = 0;
- unsigned int i = 0;
+ struct bvec_iter iter;
+ int prev = 0;

- bio_for_each_integrity_vec(iv, bio, i) {
+ bio_for_each_integrity_vec(iv, bio, iter) {

- if (ivprv) {
- if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))
+ if (prev) {
+ if (!BIOVEC_PHYS_MERGEABLE(&ivprv, &iv))
goto new_segment;

- if (!BIOVEC_SEG_BOUNDARY(q, ivprv, iv))
+ if (!BIOVEC_SEG_BOUNDARY(q, &ivprv, &iv))
goto new_segment;

- if (seg_size + iv->bv_len > queue_max_segment_size(q))
+ if (seg_size + iv.bv_len > queue_max_segment_size(q))
goto new_segment;

- seg_size += iv->bv_len;
+ seg_size += iv.bv_len;
} else {
new_segment:
segments++;
- seg_size = iv->bv_len;
+ seg_size = iv.bv_len;
}

+ prev = 1;
ivprv = iv;
}

@@ -87,24 +89,25 @@ EXPORT_SYMBOL(blk_rq_count_integrity_sg);
int blk_rq_map_integrity_sg(struct request_queue *q, struct bio *bio,
struct scatterlist *sglist)
{
- struct bio_vec *iv, *ivprv = NULL;
+ struct bio_vec iv, ivprv;
struct scatterlist *sg = NULL;
unsigned int segments = 0;
- unsigned int i = 0;
+ struct bvec_iter iter;
+ int prev = 0;

- bio_for_each_integrity_vec(iv, bio, i) {
+ bio_for_each_integrity_vec(iv, bio, iter) {

- if (ivprv) {
- if (!BIOVEC_PHYS_MERGEABLE(ivprv, iv))
+ if (prev) {
+ if (!BIOVEC_PHYS_MERGEABLE(&ivprv, &iv))
goto new_segment;

- if (!BIOVEC_SEG_BOUNDARY(q, ivprv, iv))
+ if (!BIOVEC_SEG_BOUNDARY(q, &ivprv, &iv))
goto new_segment;

- if (sg->length + iv->bv_len > queue_max_segment_size(q))
+ if (sg->length + iv.bv_len > queue_max_segment_size(q))
goto new_segment;

- sg->length += iv->bv_len;
+ sg->length += iv.bv_len;
} else {
new_segment:
if (!sg)
@@ -114,10 +117,11 @@ new_segment:
sg = sg_next(sg);
}

- sg_set_page(sg, iv->bv_page, iv->bv_len, iv->bv_offset);
+ sg_set_page(sg, iv.bv_page, iv.bv_len, iv.bv_offset);
segments++;
}

+ prev = 1;
ivprv = iv;
}

diff --git a/drivers/scsi/sd_dif.c b/drivers/scsi/sd_dif.c
index 6174ca4..a7a691d 100644
--- a/drivers/scsi/sd_dif.c
+++ b/drivers/scsi/sd_dif.c
@@ -365,7 +365,6 @@ void sd_dif_prepare(struct request *rq, sector_t hw_sector,
struct bio *bio;
struct scsi_disk *sdkp;
struct sd_dif_tuple *sdt;
- unsigned int i, j;
u32 phys, virt;

sdkp = rq->bio->bi_bdev->bd_disk->private_data;
@@ -376,19 +375,21 @@ void sd_dif_prepare(struct request *rq, sector_t hw_sector,
phys = hw_sector & 0xffffffff;

__rq_for_each_bio(bio, rq) {
- struct bio_vec *iv;
+ struct bio_vec iv;
+ struct bvec_iter iter;
+ unsigned int j;

/* Already remapped? */
if (bio_flagged(bio, BIO_MAPPED_INTEGRITY))
break;

- virt = bio->bi_integrity->bip_sector & 0xffffffff;
+ virt = bio->bi_integrity->bip_iter.bi_sector & 0xffffffff;

- bip_for_each_vec(iv, bio->bi_integrity, i) {
- sdt = kmap_atomic(iv->bv_page)
- + iv->bv_offset;
+ bip_for_each_vec(iv, bio->bi_integrity, iter) {
+ sdt = kmap_atomic(iv.bv_page)
+ + iv.bv_offset;

- for (j = 0 ; j < iv->bv_len ; j += tuple_sz, sdt++) {
+ for (j = 0; j < iv.bv_len; j += tuple_sz, sdt++) {

if (be32_to_cpu(sdt->ref_tag) == virt)
sdt->ref_tag = cpu_to_be32(phys);
@@ -414,7 +415,7 @@ void sd_dif_complete(struct scsi_cmnd *scmd, unsigned int good_bytes)
struct scsi_disk *sdkp;
struct bio *bio;
struct sd_dif_tuple *sdt;
- unsigned int i, j, sectors, sector_sz;
+ unsigned int j, sectors, sector_sz;
u32 phys, virt;

sdkp = scsi_disk(scmd->request->rq_disk);
@@ -430,15 +431,16 @@ void sd_dif_complete(struct scsi_cmnd *scmd, unsigned int good_bytes)
phys >>= 3;

__rq_for_each_bio(bio, scmd->request) {
- struct bio_vec *iv;
+ struct bio_vec iv;
+ struct bvec_iter iter;

- virt = bio->bi_integrity->bip_sector & 0xffffffff;
+ virt = bio->bi_integrity->bip_iter.bi_sector & 0xffffffff;

- bip_for_each_vec(iv, bio->bi_integrity, i) {
- sdt = kmap_atomic(iv->bv_page)
- + iv->bv_offset;
+ bip_for_each_vec(iv, bio->bi_integrity, iter) {
+ sdt = kmap_atomic(iv.bv_page)
+ + iv.bv_offset;

- for (j = 0 ; j < iv->bv_len ; j += tuple_sz, sdt++) {
+ for (j = 0; j < iv.bv_len; j += tuple_sz, sdt++) {

if (sectors == 0) {
kunmap_atomic(sdt);
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 4220b96..61f41ff 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -134,8 +134,7 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
return 0;
}

- iv = bip_vec_idx(bip, bip->bip_vcnt);
- BUG_ON(iv == NULL);
+ iv = bip->bip_vec + bip->bip_vcnt;

iv->bv_page = page;
iv->bv_len = len;
@@ -203,6 +202,12 @@ static inline unsigned int bio_integrity_hw_sectors(struct blk_integrity *bi,
return sectors;
}

+static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi,
+ unsigned int sectors)
+{
+ return bio_integrity_hw_sectors(bi, sectors) * bi->tuple_size;
+}
+
/**
* bio_integrity_tag_size - Retrieve integrity tag space
* @bio: bio to inspect
@@ -235,9 +240,9 @@ int bio_integrity_tag(struct bio *bio, void *tag_buf, unsigned int len, int set)
nr_sectors = bio_integrity_hw_sectors(bi,
DIV_ROUND_UP(len, bi->tag_size));

- if (nr_sectors * bi->tuple_size > bip->bip_size) {
- printk(KERN_ERR "%s: tag too big for bio: %u > %u\n",
- __func__, nr_sectors * bi->tuple_size, bip->bip_size);
+ if (nr_sectors * bi->tuple_size > bip->bip_iter.bi_size) {
+ printk(KERN_ERR "%s: tag too big for bio: %u > %u\n", __func__,
+ nr_sectors * bi->tuple_size, bip->bip_iter.bi_size);
return -1;
}

@@ -322,7 +327,7 @@ static void bio_integrity_generate(struct bio *bio)
sector += sectors;
prot_buf += sectors * bi->tuple_size;
total += sectors * bi->tuple_size;
- BUG_ON(total > bio->bi_integrity->bip_size);
+ BUG_ON(total > bio->bi_integrity->bip_iter.bi_size);

kunmap_atomic(kaddr);
}
@@ -387,8 +392,8 @@ int bio_integrity_prep(struct bio *bio)

bip->bip_owns_buf = 1;
bip->bip_buf = buf;
- bip->bip_size = len;
- bip->bip_sector = bio->bi_iter.bi_sector;
+ bip->bip_iter.bi_size = len;
+ bip->bip_iter.bi_sector = bio->bi_iter.bi_sector;

/* Map it */
offset = offset_in_page(buf);
@@ -444,7 +449,7 @@ static int bio_integrity_verify(struct bio *bio)
struct blk_integrity_exchg bix;
struct bio_vec bv;
struct bvec_iter iter;
- sector_t sector = bio->bi_integrity->bip_sector;
+ sector_t sector = bio->bi_integrity->bip_iter.bi_sector;
unsigned int sectors, total, ret;
void *prot_buf = bio->bi_integrity->bip_buf;

@@ -470,7 +475,7 @@ static int bio_integrity_verify(struct bio *bio)
sector += sectors;
prot_buf += sectors * bi->tuple_size;
total += sectors * bi->tuple_size;
- BUG_ON(total > bio->bi_integrity->bip_size);
+ BUG_ON(total > bio->bi_integrity->bip_iter.bi_size);

kunmap_atomic(kaddr);
}
@@ -535,56 +540,6 @@ void bio_integrity_endio(struct bio *bio, int error)
EXPORT_SYMBOL(bio_integrity_endio);

/**
- * bio_integrity_mark_head - Advance bip_vec skip bytes
- * @bip: Integrity vector to advance
- * @skip: Number of bytes to advance it
- */
-void bio_integrity_mark_head(struct bio_integrity_payload *bip,
- unsigned int skip)
-{
- struct bio_vec *iv;
- unsigned int i;
-
- bip_for_each_vec(iv, bip, i) {
- if (skip == 0) {
- bip->bip_idx = i;
- return;
- } else if (skip >= iv->bv_len) {
- skip -= iv->bv_len;
- } else { /* skip < iv->bv_len) */
- iv->bv_offset += skip;
- iv->bv_len -= skip;
- bip->bip_idx = i;
- return;
- }
- }
-}
-
-/**
- * bio_integrity_mark_tail - Truncate bip_vec to be len bytes long
- * @bip: Integrity vector to truncate
- * @len: New length of integrity vector
- */
-void bio_integrity_mark_tail(struct bio_integrity_payload *bip,
- unsigned int len)
-{
- struct bio_vec *iv;
- unsigned int i;
-
- bip_for_each_vec(iv, bip, i) {
- if (len == 0) {
- bip->bip_vcnt = i;
- return;
- } else if (len >= iv->bv_len) {
- len -= iv->bv_len;
- } else { /* len < iv->bv_len) */
- iv->bv_len = len;
- len = 0;
- }
- }
-}
-
-/**
* bio_integrity_advance - Advance integrity vector
* @bio: bio whose integrity vector to update
* @bytes_done: number of data bytes that have been completed
@@ -597,13 +552,9 @@ void bio_integrity_advance(struct bio *bio, unsigned int bytes_done)
{
struct bio_integrity_payload *bip = bio->bi_integrity;
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
- unsigned int nr_sectors;
+ unsigned bytes = bio_integrity_bytes(bi, bytes_done >> 9);

- BUG_ON(bip == NULL);
- BUG_ON(bi == NULL);
-
- nr_sectors = bio_integrity_hw_sectors(bi, bytes_done >> 9);
- bio_integrity_mark_head(bip, nr_sectors * bi->tuple_size);
+ bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
}
EXPORT_SYMBOL(bio_integrity_advance);

@@ -623,16 +574,9 @@ void bio_integrity_trim(struct bio *bio, unsigned int offset,
{
struct bio_integrity_payload *bip = bio->bi_integrity;
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
- unsigned int nr_sectors;

- BUG_ON(bip == NULL);
- BUG_ON(bi == NULL);
- BUG_ON(!bio_flagged(bio, BIO_CLONED));
-
- nr_sectors = bio_integrity_hw_sectors(bi, sectors);
- bip->bip_sector = bip->bip_sector + offset;
- bio_integrity_mark_head(bip, offset * bi->tuple_size);
- bio_integrity_mark_tail(bip, sectors * bi->tuple_size);
+ bio_integrity_advance(bio, offset << 9);
+ bip->bip_iter.bi_size = bio_integrity_bytes(bi, sectors);
}
EXPORT_SYMBOL(bio_integrity_trim);

@@ -662,8 +606,8 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
bp->bio1.bi_integrity = &bp->bip1;
bp->bio2.bi_integrity = &bp->bip2;

- bp->iv1 = bip->bip_vec[bip->bip_idx];
- bp->iv2 = bip->bip_vec[bip->bip_idx];
+ bp->iv1 = bip->bip_vec[bip->bip_iter.bi_idx];
+ bp->iv2 = bip->bip_vec[bip->bip_iter.bi_idx];

bp->bip1.bip_vec = &bp->iv1;
bp->bip2.bip_vec = &bp->iv2;
@@ -672,11 +616,12 @@ void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
bp->iv2.bv_offset += sectors * bi->tuple_size;
bp->iv2.bv_len -= sectors * bi->tuple_size;

- bp->bip1.bip_sector = bio->bi_integrity->bip_sector;
- bp->bip2.bip_sector = bio->bi_integrity->bip_sector + nr_sectors;
+ bp->bip1.bip_iter.bi_sector = bio->bi_integrity->bip_iter.bi_sector;
+ bp->bip2.bip_iter.bi_sector =
+ bio->bi_integrity->bip_iter.bi_sector + nr_sectors;

bp->bip1.bip_vcnt = bp->bip2.bip_vcnt = 1;
- bp->bip1.bip_idx = bp->bip2.bip_idx = 0;
+ bp->bip1.bip_iter.bi_idx = bp->bip2.bip_iter.bi_idx = 0;
}
EXPORT_SYMBOL(bio_integrity_split);

@@ -704,9 +649,8 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
memcpy(bip->bip_vec, bip_src->bip_vec,
bip_src->bip_vcnt * sizeof(struct bio_vec));

- bip->bip_sector = bip_src->bip_sector;
bip->bip_vcnt = bip_src->bip_vcnt;
- bip->bip_idx = bip_src->bip_idx;
+ bip->bip_iter = bip_src->bip_iter;

return 0;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3c194bc..62c7293 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -253,16 +253,15 @@ static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
struct bio_integrity_payload {
struct bio *bip_bio; /* parent bio */

- sector_t bip_sector; /* virtual start sector */
+ struct bvec_iter bip_iter;

+ /* kill - should just use bip_vec */
void *bip_buf; /* generated integrity data */
- bio_end_io_t *bip_end_io; /* saved I/O completion fn */

- unsigned int bip_size;
+ bio_end_io_t *bip_end_io; /* saved I/O completion fn */

unsigned short bip_slab; /* slab the bip came from */
unsigned short bip_vcnt; /* # of integrity bio_vecs */
- unsigned short bip_idx; /* current bip_vec index */
unsigned bip_owns_buf:1; /* should free bip_buf */

struct work_struct bip_work; /* I/O completion */
@@ -621,16 +620,12 @@ struct biovec_slab {

#if defined(CONFIG_BLK_DEV_INTEGRITY)

-#define bip_vec_idx(bip, idx) (&(bip->bip_vec[(idx)]))
-#define bip_vec(bip) bip_vec_idx(bip, 0)

-#define __bip_for_each_vec(bvl, bip, i, start_idx) \
- for (bvl = bip_vec_idx((bip), (start_idx)), i = (start_idx); \
- i < (bip)->bip_vcnt; \
- bvl++, i++)

-#define bip_for_each_vec(bvl, bip, i) \
- __bip_for_each_vec(bvl, bip, i, (bip)->bip_idx)
+#define bip_vec_idx(bip, idx) (&(bip->bip_vec[(idx)]))
+
+#define bip_for_each_vec(bvl, bip, iter) \
+ for_each_bvec(bvl, (bip)->bip_vec, iter, (bip)->bip_iter)

#define bio_for_each_integrity_vec(_bvl, _bio, _iter) \
for_each_bio(_bio) \
--
1.8.3.rc1

2013-06-09 02:25:53

by Kent Overstreet

Subject: [PATCH 08/26] block: Convert bio_copy_data() to bvec_iter

Our fancy new bvec iterator makes code like this much easier to write.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
fs/bio.c | 60 +++++++++++++++++++++++++-----------------------------------
1 file changed, 25 insertions(+), 35 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 92a92bc..d4200f4 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -852,58 +852,48 @@ EXPORT_SYMBOL(bio_alloc_pages);
*/
void bio_copy_data(struct bio *dst, struct bio *src)
{
- struct bio_vec *src_bv, *dst_bv;
- unsigned src_offset, dst_offset, bytes;
+ struct bvec_iter src_iter, dst_iter;
+ struct bio_vec src_bv, dst_bv;
void *src_p, *dst_p;
+ unsigned bytes;

- src_bv = __bio_iovec(src);
- dst_bv = __bio_iovec(dst);
-
- src_offset = src_bv->bv_offset;
- dst_offset = dst_bv->bv_offset;
+ src_iter = src->bi_iter;
+ dst_iter = dst->bi_iter;

while (1) {
- if (src_offset == src_bv->bv_offset + src_bv->bv_len) {
- src_bv++;
- if (src_bv == bio_iovec_idx(src, src->bi_vcnt)) {
- src = src->bi_next;
- if (!src)
- break;
-
- src_bv = __bio_iovec(src);
- }
+ if (!src_iter.bi_size) {
+ src = src->bi_next;
+ if (!src)
+ break;

- src_offset = src_bv->bv_offset;
+ src_iter = src->bi_iter;
}

- if (dst_offset == dst_bv->bv_offset + dst_bv->bv_len) {
- dst_bv++;
- if (dst_bv == bio_iovec_idx(dst, dst->bi_vcnt)) {
- dst = dst->bi_next;
- if (!dst)
- break;
-
- dst_bv = __bio_iovec(dst);
- }
+ if (!dst_iter.bi_size) {
+ dst = dst->bi_next;
+ if (!dst)
+ break;

- dst_offset = dst_bv->bv_offset;
+ dst_iter = dst->bi_iter;
}

- bytes = min(dst_bv->bv_offset + dst_bv->bv_len - dst_offset,
- src_bv->bv_offset + src_bv->bv_len - src_offset);
+ src_bv = bio_iovec_iter(src, src_iter);
+ dst_bv = bio_iovec_iter(dst, dst_iter);
+
+ bytes = min(src_bv.bv_len, dst_bv.bv_len);

- src_p = kmap_atomic(src_bv->bv_page);
- dst_p = kmap_atomic(dst_bv->bv_page);
+ src_p = kmap_atomic(src_bv.bv_page);
+ dst_p = kmap_atomic(dst_bv.bv_page);

- memcpy(dst_p + dst_bv->bv_offset,
- src_p + src_bv->bv_offset,
+ memcpy(dst_p + dst_bv.bv_offset,
+ src_p + src_bv.bv_offset,
bytes);

kunmap_atomic(dst_p);
kunmap_atomic(src_p);

- src_offset += bytes;
- dst_offset += bytes;
+ bio_advance_iter(src, &src_iter, bytes);
+ bio_advance_iter(dst, &dst_iter, bytes);
}
}
EXPORT_SYMBOL(bio_copy_data);
--
1.8.3.rc1

2013-06-09 02:26:11

by Kent Overstreet

Subject: [PATCH 07/26] block: Immutable bio vecs

This adds a mechanism by which we can advance a bio by an arbitrary
number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done
indicates the number of bytes completed in the current bvec.

Various driver code still needs to be updated to not refer to the bvec
directly before we can use this for interesting things, like efficient
bio splitting.
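
Concretely, the bvec a driver sees is now computed rather than stored:
bi_bvec_done offsets into the underlying biovec, and bi_size clamps
its length. A worked example (a sketch, assuming a bio with a single
4096-byte bvec at offset 0 and iter.bi_size == 4096):

        bio_advance_iter(bio, &iter, 512);

        /* now iter.bi_bvec_done == 512, iter.bi_size == 3584,
         * iter.bi_sector has advanced by one sector, and
         * bio_iovec_iter(bio, iter) yields:
         *      .bv_page   = <same page>
         *      .bv_offset = 512
         *      .bv_len    = 3584
         * while the biovec array itself is untouched.
         */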

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Lars Ellenberg <[email protected]>
Cc: Paul Clements <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
drivers/block/drbd/drbd_main.c | 4 +-
drivers/block/nbd.c | 2 +-
fs/bio.c | 27 +----------
include/linux/bio.h | 108 ++++++++++++++++++++++++++++++++---------
include/linux/blk_types.h | 2 +
include/linux/blkdev.h | 4 +-
6 files changed, 95 insertions(+), 52 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 30b0f91..7309d81 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1546,7 +1546,7 @@ static int _drbd_send_bio(struct drbd_conf *mdev, struct bio *bio)

err = _drbd_no_send_page(mdev, bvec.bv_page,
bvec.bv_offset, bvec.bv_len,
- bio_iter_last(bio, iter)
+ bio_iter_last(bvec, iter)
? 0 : MSG_MORE);
if (err)
return err;
@@ -1565,7 +1565,7 @@ static int _drbd_send_zc_bio(struct drbd_conf *mdev, struct bio *bio)

err = _drbd_send_page(mdev, bvec.bv_page,
bvec.bv_offset, bvec.bv_len,
- bio_iter_last(bio, iter) ? 0 : MSG_MORE);
+ bio_iter_last(bvec, iter) ? 0 : MSG_MORE);
if (err)
return err;
}
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b446f50..3b7e5ca 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -278,7 +278,7 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req)
*/
rq_for_each_segment(bvec, req, iter) {
flags = 0;
- if (!rq_iter_last(req, iter))
+ if (!rq_iter_last(bvec, iter))
flags = MSG_MORE;
dprintk(DBG_TX, "%s: request %p: sending %d bytes data\n",
nbd->disk->disk_name, req, bvec.bv_len);
diff --git a/fs/bio.c b/fs/bio.c
index 018e3a8..92a92bc 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -532,13 +532,11 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
* most users will be overriding ->bi_bdev with a new target,
* so we don't set nor calculate new physical/hw segment counts here
*/
- bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
bio->bi_bdev = bio_src->bi_bdev;
bio->bi_flags |= 1 << BIO_CLONED;
bio->bi_rw = bio_src->bi_rw;
bio->bi_vcnt = bio_src->bi_vcnt;
- bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
- bio->bi_iter.bi_idx = bio_src->bi_iter.bi_idx;
+ bio->bi_iter = bio_src->bi_iter;
}
EXPORT_SYMBOL(__bio_clone);

@@ -808,28 +806,7 @@ void bio_advance(struct bio *bio, unsigned bytes)
if (bio_integrity(bio))
bio_integrity_advance(bio, bytes);

- bio->bi_iter.bi_sector += bytes >> 9;
- bio->bi_iter.bi_size -= bytes;
-
- if (bio->bi_rw & BIO_NO_ADVANCE_ITER_MASK)
- return;
-
- while (bytes) {
- if (unlikely(bio->bi_iter.bi_idx >= bio->bi_vcnt)) {
- WARN_ONCE(1, "bio idx %d >= vcnt %d\n",
- bio->bi_iter.bi_idx, bio->bi_vcnt);
- break;
- }
-
- if (bytes >= bio_iovec(bio).bv_len) {
- bytes -= bio_iovec(bio).bv_len;
- bio->bi_iter.bi_idx++;
- } else {
- bio_iovec(bio).bv_len -= bytes;
- bio_iovec(bio).bv_offset += bytes;
- bytes = 0;
- }
- }
+ bio_advance_iter(bio, &bio->bi_iter, bytes);
}
EXPORT_SYMBOL(bio_advance);

diff --git a/include/linux/bio.h b/include/linux/bio.h
index a31bcd2..3c194bc 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -61,21 +61,58 @@
* various member access, note that bio_data should of course not be used
* on highmem page vectors
*/
-#define bio_iovec_iter(bio, iter) ((bio)->bi_io_vec[(iter).bi_idx])
-
#define bio_iovec_idx(bio, idx) (&((bio)->bi_io_vec[(idx)]))
#define __bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
-#define bio_iovec(bio) (*__bio_iovec(bio))

-#define bio_page(bio) (bio_iovec((bio)).bv_page)
-#define bio_offset(bio) (bio_iovec((bio)).bv_offset)
+#define __bvec_iter_bvec(bvec, iter) (&(bvec)[(iter).bi_idx])
+
+#define bvec_iter_page(bvec, iter) \
+ (__bvec_iter_bvec((bvec), (iter))->bv_page)
+#define bvec_iter_len(bio, iter) \
+ min((iter).bi_size, \
+ __bvec_iter_bvec((bio), (iter))->bv_len - (iter).bi_bvec_done)
+#define bvec_iter_offset(bio, iter) \
+ (__bvec_iter_bvec((bio), (iter))->bv_offset + (iter).bi_bvec_done)
+
+#define bvec_iter_bvec(bvec, iter) \
+((struct bio_vec) { \
+ .bv_page = bvec_iter_page((bvec), (iter)), \
+ .bv_len = bvec_iter_len((bvec), (iter)), \
+ .bv_offset = bvec_iter_offset((bvec), (iter)), \
+})
+
+
+#define bio_iovec_iter(bio, iter) \
+ bvec_iter_bvec((bio)->bi_io_vec, (iter))
+#define bio_page_iter(bio, iter) \
+ bvec_iter_page((bio)->bi_io_vec, (iter))
+#define bio_offset_iter(bio, iter) \
+ bvec_iter_offset((bio)->bi_io_vec, (iter))
+
+#define bio_page(bio) bio_page_iter((bio), (bio)->bi_iter)
+#define bio_offset(bio) bio_offset_iter((bio), (bio)->bi_iter)
+#define bio_iovec(bio) bio_iovec_iter((bio), (bio)->bi_iter)
+
#define bio_segments(bio) ((bio)->bi_vcnt - (bio)->bi_iter.bi_idx)
#define bio_sectors(bio) ((bio)->bi_iter.bi_size >> 9)
#define bio_end_sector(bio) ((bio)->bi_iter.bi_sector + bio_sectors((bio)))

+/*
+ * Check whether this bio carries any data or not. A NULL bio is allowed.
+ */
+static inline bool bio_has_data(struct bio *bio)
+{
+ if (bio &&
+ bio->bi_iter.bi_size &&
+ !(bio->bi_rw & BIO_NO_ADVANCE_ITER_MASK))
+ return true;
+
+ return false;
+}
+
static inline unsigned int bio_cur_bytes(struct bio *bio)
{
- if (bio->bi_vcnt)
+ if (bio_has_data(bio))
return bio_iovec(bio).bv_len;
else /* dataless requests such as discard */
return bio->bi_iter.bi_size;
@@ -83,7 +120,7 @@ static inline unsigned int bio_cur_bytes(struct bio *bio)

static inline void *bio_data(struct bio *bio)
{
- if (bio->bi_vcnt)
+ if (bio_has_data(bio))
return page_address(bio_page(bio)) + bio_offset(bio);

return NULL;
@@ -144,16 +181,54 @@ static inline void *bio_data(struct bio *bio)
bvl = bio_iovec_idx((bio), (i)), i < (bio)->bi_vcnt; \
i++)

+static inline void bvec_iter_advance(struct bio_vec *bv, struct bvec_iter *iter,
+ unsigned bytes)
+{
+ WARN_ONCE(bytes > iter->bi_size,
+ "Attempted to advance past end of bvec iter\n");
+
+ while (bytes) {
+ unsigned len = min(bytes, bvec_iter_len(bv, *iter));
+
+ bytes -= len;
+ iter->bi_size -= len;
+ iter->bi_bvec_done += len;
+
+ if (iter->bi_bvec_done == __bvec_iter_bvec(bv, *iter)->bv_len) {
+ iter->bi_bvec_done = 0;
+ iter->bi_idx++;
+ }
+ }
+}
+
+#define for_each_bvec(bvl, bio_vec, iter, start) \
+ for ((iter) = start; \
+ (bvl) = bvec_iter_bvec((bio_vec), (iter)), \
+ (iter).bi_size; \
+ bvec_iter_advance((bio_vec), &(iter), (bvl).bv_len))
+
+
+static inline void bio_advance_iter(struct bio *bio, struct bvec_iter *iter,
+ unsigned bytes)
+{
+ iter->bi_sector += bytes >> 9;
+
+ if (bio->bi_rw & BIO_NO_ADVANCE_ITER_MASK)
+ iter->bi_size -= bytes;
+ else
+ bvec_iter_advance(bio->bi_io_vec, iter, bytes);
+}
+
#define __bio_for_each_segment(bvl, bio, iter, start) \
for (iter = (start); \
- bvl = bio_iovec_iter((bio), (iter)), \
- (iter).bi_idx < (bio)->bi_vcnt; \
- (iter).bi_idx++)
+ (iter).bi_size && \
+ ((bvl = bio_iovec_iter((bio), (iter))), 1); \
+ bio_advance_iter((bio), &(iter), (bvl).bv_len))

#define bio_for_each_segment(bvl, bio, iter) \
__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)

-#define bio_iter_last(bio, iter) ((iter).bi_idx == (bio)->bi_vcnt - 1)
+#define bio_iter_last(bvec, iter) ((iter).bi_size == (bvec).bv_len)

/*
* get a reference to a bio, so it won't disappear. the intended use is
@@ -368,17 +443,6 @@ static inline char *__bio_kmap_irq(struct bio *bio, unsigned short idx,
__bio_kmap_irq((bio), (bio)->bi_iter.bi_idx, (flags))
#define bio_kunmap_irq(buf,flags) __bio_kunmap_irq(buf, flags)

-/*
- * Check whether this bio carries any data or not. A NULL bio is allowed.
- */
-static inline bool bio_has_data(struct bio *bio)
-{
- if (bio && bio->bi_vcnt)
- return true;
-
- return false;
-}
-
static inline bool bio_is_rw(struct bio *bio)
{
if (!bio_has_data(bio))
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d46e8a6..72f1274 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -34,6 +34,8 @@ struct bvec_iter {
unsigned int bi_size; /* residual I/O count */

unsigned int bi_idx; /* current index into bvl_vec */
+
+ unsigned int bi_bvec_done;
};

/*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1b9d47b..2a16de2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -714,9 +714,9 @@ struct req_iterator {
__rq_for_each_bio(_iter.bio, _rq) \
bio_for_each_segment(bvl, _iter.bio, _iter.iter)

-#define rq_iter_last(rq, _iter) \
+#define rq_iter_last(bvec, _iter) \
(_iter.bio->bi_next == NULL && \
- bio_iter_last(_iter.bio, _iter.iter))
+ bio_iter_last(bvec, _iter.iter))

#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
# error "You should define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE for your platform"
--
1.8.3.rc1
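
To make the intended usage concrete, here is a minimal driver-side
sketch against the interface added above (illustrative only -- the
function name is made up and is not part of the series). Iteration
state lives entirely in an on-stack copy of the iterator, so neither
the bio nor its biovec is modified:

    /* sketch: assumes <linux/bio.h>, <linux/highmem.h>, <linux/string.h> */
    static void example_zero_bio(struct bio *bio)
    {
            struct bvec_iter iter;
            struct bio_vec bv;      /* segments are now yielded by value */

            bio_for_each_segment(bv, bio, iter) {
                    void *p = kmap_atomic(bv.bv_page);

                    /* bv.bv_offset/bv.bv_len already account for any
                     * partial completion recorded in bi_bvec_done */
                    memset(p + bv.bv_offset, 0, bv.bv_len);
                    kunmap_atomic(p);
            }
    }

A driver that completes a bio piecemeal advances bio->bi_iter in place
with bio_advance_iter(), which handles both the sector arithmetic and
the biovec bookkeeping (via bvec_iter_advance()) in one call.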

2013-06-09 02:26:42

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 03/26] block: Abstract out bvec iterator

Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to the struct bvec_iter introduced here; for now, this patch
effectively just renames things: bi_sector, bi_size and bi_idx move out
of struct bio and into the new iterator.
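
As a concrete sketch, the struct this patch introduces looks like the
following (illustrative, condensed rather than quoted verbatim from the
diff; the field comments follow the existing ones in blk_types.h):

    struct bvec_iter {
            sector_t        bi_sector;  /* device address in 512b sectors */
            unsigned int    bi_size;    /* residual I/O count */
            unsigned int    bi_idx;     /* current index into bvl_vec */
    };

Accesses are then spelled bio->bi_iter.bi_sector and so on, which is
the bulk of the churn below.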

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jens Axboe <[email protected]>
---
Documentation/block/biodoc.txt | 7 ++-
arch/m68k/emu/nfblock.c | 2 +-
arch/powerpc/sysdev/axonram.c | 3 +-
block/blk-core.c | 36 +++++++-------
block/blk-flush.c | 2 +-
block/blk-lib.c | 12 ++---
block/blk-map.c | 6 +--
block/blk-merge.c | 4 +-
block/blk-throttle.c | 13 ++---
block/elevator.c | 2 +-
drivers/block/aoe/aoecmd.c | 6 +--
drivers/block/brd.c | 4 +-
drivers/block/drbd/drbd_actlog.c | 2 +-
drivers/block/drbd/drbd_bitmap.c | 2 +-
drivers/block/drbd/drbd_receiver.c | 6 +--
drivers/block/drbd/drbd_req.c | 6 +--
drivers/block/drbd/drbd_req.h | 2 +-
drivers/block/floppy.c | 4 +-
drivers/block/loop.c | 4 +-
drivers/block/mtip32xx/mtip32xx.c | 7 +--
drivers/block/nvme-core.c | 25 +++++-----
drivers/block/pktcdvd.c | 51 +++++++++++---------
drivers/block/ps3disk.c | 2 +-
drivers/block/rbd.c | 21 ++++----
drivers/block/rsxx/dev.c | 4 +-
drivers/block/rsxx/dma.c | 4 +-
drivers/block/umem.c | 9 ++--
drivers/block/virtio_blk.c | 4 +-
drivers/block/xen-blkback/blkback.c | 2 +-
drivers/md/bcache/alloc.c | 4 +-
drivers/md/bcache/btree.c | 19 ++++----
drivers/md/bcache/debug.c | 2 +-
drivers/md/bcache/io.c | 26 +++++-----
drivers/md/bcache/journal.c | 12 ++---
drivers/md/bcache/movinggc.c | 4 +-
drivers/md/bcache/request.c | 52 ++++++++++----------
drivers/md/bcache/super.c | 16 +++----
drivers/md/bcache/util.c | 4 +-
drivers/md/bcache/writeback.c | 6 +--
drivers/md/dm-bio-record.h | 12 ++---
drivers/md/dm-bufio.c | 2 +-
drivers/md/dm-cache-policy-mq.c | 4 +-
drivers/md/dm-cache-target.c | 16 ++++---
drivers/md/dm-crypt.c | 20 ++++----
drivers/md/dm-delay.c | 7 +--
drivers/md/dm-flakey.c | 7 +--
drivers/md/dm-io.c | 7 +--
drivers/md/dm-linear.c | 3 +-
drivers/md/dm-raid1.c | 16 +++----
drivers/md/dm-region-hash.c | 3 +-
drivers/md/dm-snap.c | 13 ++---
drivers/md/dm-stripe.c | 13 +++--
drivers/md/dm-thin.c | 23 +++++----
drivers/md/dm-verity.c | 9 ++--
drivers/md/dm.c | 21 ++++----
drivers/md/faulty.c | 19 +++++---
drivers/md/linear.c | 12 ++---
drivers/md/md.c | 25 +++++-----
drivers/md/multipath.c | 13 ++---
drivers/md/raid0.c | 16 ++++---
drivers/md/raid1.c | 63 +++++++++++++-----------
drivers/md/raid10.c | 95 ++++++++++++++++++++-----------------
drivers/md/raid5.c | 72 ++++++++++++++--------------
drivers/s390/block/dcssblk.c | 5 +-
drivers/s390/block/xpram.c | 9 ++--
drivers/scsi/osd/osd_initiator.c | 2 +-
drivers/staging/zram/zram_drv.c | 12 +++--
drivers/target/target_core_iblock.c | 2 +-
fs/bio-integrity.c | 8 ++--
fs/bio.c | 41 ++++++++--------
fs/btrfs/check-integrity.c | 10 ++--
fs/btrfs/compression.c | 17 +++----
fs/btrfs/extent_io.c | 16 +++----
fs/btrfs/file-item.c | 13 ++---
fs/btrfs/inode.c | 17 +++----
fs/btrfs/raid56.c | 22 ++++-----
fs/btrfs/scrub.c | 12 ++---
fs/btrfs/volumes.c | 12 ++---
fs/buffer.c | 12 ++---
fs/direct-io.c | 4 +-
fs/ext4/page-io.c | 4 +-
fs/f2fs/data.c | 2 +-
fs/f2fs/segment.c | 3 +-
fs/gfs2/lops.c | 2 +-
fs/gfs2/ops_fstype.c | 2 +-
fs/hfsplus/wrapper.c | 2 +-
fs/jfs/jfs_logmgr.c | 10 ++--
fs/jfs/jfs_metapage.c | 9 ++--
fs/logfs/dev_bdev.c | 20 ++++----
fs/mpage.c | 2 +-
fs/nfs/blocklayout/blocklayout.c | 9 ++--
fs/nilfs2/segbuf.c | 3 +-
fs/ocfs2/cluster/heartbeat.c | 2 +-
fs/xfs/xfs_aops.c | 2 +-
fs/xfs/xfs_buf.c | 4 +-
include/linux/bio.h | 16 +++----
include/linux/blk_types.h | 19 +++++---
include/trace/events/bcache.h | 20 ++++----
include/trace/events/block.h | 26 +++++-----
include/trace/events/f2fs.h | 4 +-
kernel/power/block_io.c | 2 +-
kernel/trace/blktrace.c | 15 +++---
mm/page_io.c | 10 ++--
103 files changed, 682 insertions(+), 608 deletions(-)
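
The conversions below are almost entirely mechanical: field accesses
are re-spelled through bi_iter with no change in behaviour. As an
illustration (a made-up example, not one of the hunks below):

    -       sector_t sector = bio->bi_sector;
    -       unsigned bytes = bio->bi_size;
    +       sector_t sector = bio->bi_iter.bi_sector;
    +       unsigned bytes = bio->bi_iter.bi_size;

Code that splits or advances a bio changes the same way, e.g.
bio->bi_sector += len becomes bio->bi_iter.bi_sector += len.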

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index 8df5e8e..2101e71 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -447,14 +447,13 @@ struct bio_vec {
* main unit of I/O for the block layer and lower layers (ie drivers)
*/
struct bio {
- sector_t bi_sector;
struct bio *bi_next; /* request queue link */
struct block_device *bi_bdev; /* target device */
unsigned long bi_flags; /* status, command, etc */
unsigned long bi_rw; /* low bits: r/w, high: priority */

unsigned int bi_vcnt; /* how many bio_vec's */
- unsigned int bi_idx; /* current index into bio_vec array */
+ struct bvec_iter bi_iter; /* current index into bio_vec array */

unsigned int bi_size; /* total size in bytes */
unsigned short bi_phys_segments; /* segments after physaddr coalesce*/
@@ -480,7 +479,7 @@ With this multipage bio design:
- Code that traverses the req list can find all the segments of a bio
by using rq_for_each_segment. This handles the fact that a request
has multiple bios, each of which can have multiple segments.
-- Drivers which can't process a large bio in one shot can use the bi_idx
+- Drivers which can't process a large bio in one shot can use the bi_iter
field to keep track of the next bio_vec entry to process.
(e.g a 1MB bio_vec needs to be handled in max 128kB chunks for IDE)
[TBD: Should preferably also have a bi_voffset and bi_vlen to avoid modifying
@@ -589,7 +588,7 @@ driver should not modify these values. The block layer sets up the
nr_sectors and current_nr_sectors fields (based on the corresponding
hard_xxx values and the number of bytes transferred) and updates it on
every transfer that invokes end_that_request_first. It does the same for the
-buffer, bio, bio->bi_idx fields too.
+buffer, bio, bio->bi_iter fields too.

The buffer field is just a virtual address mapping of the current segment
of the i/o buffer in cases where the buffer resides in low-memory. For high
diff --git a/arch/m68k/emu/nfblock.c b/arch/m68k/emu/nfblock.c
index e301133..9070d6c 100644
--- a/arch/m68k/emu/nfblock.c
+++ b/arch/m68k/emu/nfblock.c
@@ -64,7 +64,7 @@ static void nfhd_make_request(struct request_queue *queue, struct bio *bio)
struct nfhd_device *dev = queue->queuedata;
struct bio_vec *bvec;
int i, dir, len, shift;
- sector_t sec = bio->bi_sector;
+ sector_t sec = bio->bi_iter.bi_sector;

dir = bio_data_dir(bio);
shift = dev->bshift;
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 1c16141..f33bcba 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -113,7 +113,8 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
unsigned int transfered;
unsigned short idx;

- phys_mem = bank->io_addr + (bio->bi_sector << AXON_RAM_SECTOR_SHIFT);
+ phys_mem = bank->io_addr + (bio->bi_iter.bi_sector <<
+ AXON_RAM_SECTOR_SHIFT);
phys_end = bank->io_addr + bank->size;
transfered = 0;
bio_for_each_segment(vec, bio, idx) {
diff --git a/block/blk-core.c b/block/blk-core.c
index 33c33bc..0704c5c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -166,7 +166,7 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
bio_advance(bio, nbytes);

/* don't actually finish bio if it's part of flush sequence */
- if (bio->bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
+ if (bio->bi_iter.bi_size == 0 && !(rq->cmd_flags & REQ_FLUSH_SEQ))
bio_endio(bio, error);
}

@@ -1333,7 +1333,7 @@ void blk_add_request_payload(struct request *rq, struct page *page,
bio->bi_io_vec->bv_offset = 0;
bio->bi_io_vec->bv_len = len;

- bio->bi_size = len;
+ bio->bi_iter.bi_size = len;
bio->bi_vcnt = 1;
bio->bi_phys_segments = 1;

@@ -1358,7 +1358,7 @@ static bool bio_attempt_back_merge(struct request_queue *q, struct request *req,

req->biotail->bi_next = bio;
req->biotail = bio;
- req->__data_len += bio->bi_size;
+ req->__data_len += bio->bi_iter.bi_size;
req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));

drive_stat_acct(req, 0);
@@ -1387,8 +1387,8 @@ static bool bio_attempt_front_merge(struct request_queue *q,
* not touch req->buffer either...
*/
req->buffer = bio_data(bio);
- req->__sector = bio->bi_sector;
- req->__data_len += bio->bi_size;
+ req->__sector = bio->bi_iter.bi_sector;
+ req->__data_len += bio->bi_iter.bi_size;
req->ioprio = ioprio_best(req->ioprio, bio_prio(bio));

drive_stat_acct(req, 0);
@@ -1457,7 +1457,7 @@ void init_request_from_bio(struct request *req, struct bio *bio)
req->cmd_flags |= REQ_FAILFAST_MASK;

req->errors = 0;
- req->__sector = bio->bi_sector;
+ req->__sector = bio->bi_iter.bi_sector;
req->ioprio = bio_prio(bio);
blk_rq_bio_prep(req->q, req, bio);
}
@@ -1583,12 +1583,12 @@ static inline void blk_partition_remap(struct bio *bio)
if (bio_sectors(bio) && bdev != bdev->bd_contains) {
struct hd_struct *p = bdev->bd_part;

- bio->bi_sector += p->start_sect;
+ bio->bi_iter.bi_sector += p->start_sect;
bio->bi_bdev = bdev->bd_contains;

trace_block_bio_remap(bdev_get_queue(bio->bi_bdev), bio,
bdev->bd_dev,
- bio->bi_sector - p->start_sect);
+ bio->bi_iter.bi_sector - p->start_sect);
}
}

@@ -1654,7 +1654,7 @@ static inline int bio_check_eod(struct bio *bio, unsigned int nr_sectors)
/* Test device or partition size, when known. */
maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;
if (maxsector) {
- sector_t sector = bio->bi_sector;
+ sector_t sector = bio->bi_iter.bi_sector;

if (maxsector < nr_sectors || maxsector - nr_sectors < sector) {
/*
@@ -1690,7 +1690,7 @@ generic_make_request_checks(struct bio *bio)
"generic_make_request: Trying to access "
"nonexistent block-device %s (%Lu)\n",
bdevname(bio->bi_bdev, b),
- (long long) bio->bi_sector);
+ (long long) bio->bi_iter.bi_sector);
goto end_io;
}

@@ -1704,9 +1704,9 @@ generic_make_request_checks(struct bio *bio)
}

part = bio->bi_bdev->bd_part;
- if (should_fail_request(part, bio->bi_size) ||
+ if (should_fail_request(part, bio->bi_iter.bi_size) ||
should_fail_request(&part_to_disk(part)->part0,
- bio->bi_size))
+ bio->bi_iter.bi_size))
goto end_io;

/*
@@ -1865,7 +1865,7 @@ void submit_bio(int rw, struct bio *bio)
if (rw & WRITE) {
count_vm_events(PGPGOUT, count);
} else {
- task_io_account_read(bio->bi_size);
+ task_io_account_read(bio->bi_iter.bi_size);
count_vm_events(PGPGIN, count);
}

@@ -1874,7 +1874,7 @@ void submit_bio(int rw, struct bio *bio)
printk(KERN_DEBUG "%s(%d): %s block %Lu on %s (%u sectors)\n",
current->comm, task_pid_nr(current),
(rw & WRITE) ? "WRITE" : "READ",
- (unsigned long long)bio->bi_sector,
+ (unsigned long long)bio->bi_iter.bi_sector,
bdevname(bio->bi_bdev, b),
count);
}
@@ -2007,7 +2007,7 @@ unsigned int blk_rq_err_bytes(const struct request *rq)
for (bio = rq->bio; bio; bio = bio->bi_next) {
if ((bio->bi_rw & ff) != ff)
break;
- bytes += bio->bi_size;
+ bytes += bio->bi_iter.bi_size;
}

/* this could lead to infinite loop */
@@ -2332,9 +2332,9 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
total_bytes = 0;
while (req->bio) {
struct bio *bio = req->bio;
- unsigned bio_bytes = min(bio->bi_size, nr_bytes);
+ unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);

- if (bio_bytes == bio->bi_size)
+ if (bio_bytes == bio->bi_iter.bi_size)
req->bio = bio->bi_next;

req_bio_endio(req, bio, bio_bytes, error);
@@ -2683,7 +2683,7 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
rq->nr_phys_segments = bio_phys_segments(q, bio);
rq->buffer = bio_data(bio);
}
- rq->__data_len = bio->bi_size;
+ rq->__data_len = bio->bi_iter.bi_size;
rq->bio = rq->biotail = bio;

if (bio->bi_bdev)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index cc2b827..3248998 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -444,7 +444,7 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
* copied from blk_rq_pos(rq).
*/
if (error_sector)
- *error_sector = bio->bi_sector;
+ *error_sector = bio->bi_iter.bi_sector;

if (!bio_flagged(bio, BIO_UPTODATE))
ret = -EIO;
diff --git a/block/blk-lib.c b/block/blk-lib.c
index d6f50d5..3250620 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -110,12 +110,12 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
req_sects = end_sect - sector;
}

- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
bio->bi_end_io = bio_batch_end_io;
bio->bi_bdev = bdev;
bio->bi_private = &bb;

- bio->bi_size = req_sects << 9;
+ bio->bi_iter.bi_size = req_sects << 9;
nr_sects -= req_sects;
sector = end_sect;

@@ -176,7 +176,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
break;
}

- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
bio->bi_end_io = bio_batch_end_io;
bio->bi_bdev = bdev;
bio->bi_private = &bb;
@@ -186,11 +186,11 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);

if (nr_sects > max_write_same_sectors) {
- bio->bi_size = max_write_same_sectors << 9;
+ bio->bi_iter.bi_size = max_write_same_sectors << 9;
nr_sects -= max_write_same_sectors;
sector += max_write_same_sectors;
} else {
- bio->bi_size = nr_sects << 9;
+ bio->bi_iter.bi_size = nr_sects << 9;
nr_sects = 0;
}

@@ -242,7 +242,7 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
break;
}

- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
bio->bi_bdev = bdev;
bio->bi_end_io = bio_batch_end_io;
bio->bi_private = &bb;
diff --git a/block/blk-map.c b/block/blk-map.c
index 623e1cd..ae4ae10 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -20,7 +20,7 @@ int blk_rq_append_bio(struct request_queue *q, struct request *rq,
rq->biotail->bi_next = bio;
rq->biotail = bio;

- rq->__data_len += bio->bi_size;
+ rq->__data_len += bio->bi_iter.bi_size;
}
return 0;
}
@@ -76,7 +76,7 @@ static int __blk_rq_map_user(struct request_queue *q, struct request *rq,

ret = blk_rq_append_bio(q, rq, bio);
if (!ret)
- return bio->bi_size;
+ return bio->bi_iter.bi_size;

/* if it was bounced we must call the end io function */
bio_endio(bio, 0);
@@ -220,7 +220,7 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
if (IS_ERR(bio))
return PTR_ERR(bio);

- if (bio->bi_size != len) {
+ if (bio->bi_iter.bi_size != len) {
/*
* Grab an extra reference to this bio, as bio_unmap_user()
* expects to be able to drop it twice as it happens on the
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5f24482..7750b25 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -532,9 +532,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)

int blk_try_merge(struct request *rq, struct bio *bio)
{
- if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_sector)
+ if (blk_rq_pos(rq) + blk_rq_sectors(rq) == bio->bi_iter.bi_sector)
return ELEVATOR_BACK_MERGE;
- else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_sector)
+ else if (blk_rq_pos(rq) - bio_sectors(bio) == bio->bi_iter.bi_sector)
return ELEVATOR_FRONT_MERGE;
return ELEVATOR_NO_MERGE;
}
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 3114622..fc06b58 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -590,14 +590,14 @@ static bool tg_with_in_bps_limit(struct throtl_data *td, struct throtl_grp *tg,
do_div(tmp, HZ);
bytes_allowed = tmp;

- if (tg->bytes_disp[rw] + bio->bi_size <= bytes_allowed) {
+ if (tg->bytes_disp[rw] + bio->bi_iter.bi_size <= bytes_allowed) {
if (wait)
*wait = 0;
return 1;
}

/* Calc approx time to dispatch */
- extra_bytes = tg->bytes_disp[rw] + bio->bi_size - bytes_allowed;
+ extra_bytes = tg->bytes_disp[rw] + bio->bi_iter.bi_size - bytes_allowed;
jiffy_wait = div64_u64(extra_bytes * HZ, tg->bps[rw]);

if (!jiffy_wait)
@@ -705,10 +705,11 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
bool rw = bio_data_dir(bio);

/* Charge the bio to the group */
- tg->bytes_disp[rw] += bio->bi_size;
+ tg->bytes_disp[rw] += bio->bi_iter.bi_size;
tg->io_disp[rw]++;

- throtl_update_dispatch_stats(tg_to_blkg(tg), bio->bi_size, bio->bi_rw);
+ throtl_update_dispatch_stats(tg_to_blkg(tg),
+ bio->bi_iter.bi_size, bio->bi_rw);
}

static void throtl_add_bio_tg(struct throtl_data *td, struct throtl_grp *tg,
@@ -1128,7 +1129,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio)
if (tg) {
if (tg_no_rule_group(tg, rw)) {
throtl_update_dispatch_stats(tg_to_blkg(tg),
- bio->bi_size, bio->bi_rw);
+ bio->bi_iter.bi_size, bio->bi_rw);
goto out_unlock_rcu;
}
}
@@ -1175,7 +1176,7 @@ queue_bio:
throtl_log_tg(td, tg, "[%c] bio. bdisp=%llu sz=%u bps=%llu"
" iodisp=%u iops=%u queued=%d/%d",
rw == READ ? 'R' : 'W',
- tg->bytes_disp[rw], bio->bi_size, tg->bps[rw],
+ tg->bytes_disp[rw], bio->bi_iter.bi_size, tg->bps[rw],
tg->io_disp[rw], tg->iops[rw],
tg->nr_queued[READ], tg->nr_queued[WRITE]);

diff --git a/block/elevator.c b/block/elevator.c
index eba5b04..ddb92b6 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -442,7 +442,7 @@ int elv_merge(struct request_queue *q, struct request **req, struct bio *bio)
/*
* See if our hash lookup can find a potential backmerge.
*/
- __rq = elv_rqhash_find(q, bio->bi_sector);
+ __rq = elv_rqhash_find(q, bio->bi_iter.bi_sector);
if (__rq && elv_rq_merge_ok(__rq, bio)) {
*req = __rq;
return ELEVATOR_BACK_MERGE;
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index fc803ec..8716181 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -923,8 +923,8 @@ bufinit(struct buf *buf, struct request *rq, struct bio *bio)
memset(buf, 0, sizeof(*buf));
buf->rq = rq;
buf->bio = bio;
- buf->resid = bio->bi_size;
- buf->sector = bio->bi_sector;
+ buf->resid = bio->bi_iter.bi_size;
+ buf->sector = bio->bi_iter.bi_sector;
bio_pageinc(bio);
buf->bv = bio_iovec(bio);
buf->bv_resid = buf->bv->bv_len;
@@ -1146,7 +1146,7 @@ aoe_end_request(struct aoedev *d, struct request *rq, int fastfail)
do {
bio = rq->bio;
bok = !fastfail && test_bit(BIO_UPTODATE, &bio->bi_flags);
- } while (__blk_end_request(rq, bok ? 0 : -EIO, bio->bi_size));
+ } while (__blk_end_request(rq, bok ? 0 : -EIO, bio->bi_iter.bi_size));

/* cf. http://lkml.org/lkml/2006/10/31/28 */
if (!fastfail)
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 9bf4371..e269532 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -333,13 +333,13 @@ static void brd_make_request(struct request_queue *q, struct bio *bio)
int i;
int err = -EIO;

- sector = bio->bi_sector;
+ sector = bio->bi_iter.bi_sector;
if (bio_end_sector(bio) > get_capacity(bdev->bd_disk))
goto out;

if (unlikely(bio->bi_rw & REQ_DISCARD)) {
err = 0;
- discard_from_brd(brd, sector, bio->bi_size);
+ discard_from_brd(brd, sector, bio->bi_iter.bi_size);
goto out;
}

diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index 6608076..6182761 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -159,7 +159,7 @@ static int _drbd_md_sync_page_io(struct drbd_conf *mdev,

bio = bio_alloc_drbd(GFP_NOIO);
bio->bi_bdev = bdev->md_bdev;
- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
err = -EIO;
if (bio_add_page(bio, page, size, 0) != size)
goto out;
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 64fbb83..d0d847a 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -1028,7 +1028,7 @@ static void bm_page_io_async(struct bm_aio_ctx *ctx, int page_nr, int rw) __must
} else
page = b->bm_pages[page_nr];
bio->bi_bdev = mdev->ldev->md_bdev;
- bio->bi_sector = on_disk_sector;
+ bio->bi_iter.bi_sector = on_disk_sector;
/* bio_add_page of a single page to an empty bio will always succeed,
* according to api. Do we want to assert that? */
bio_add_page(bio, page, len, 0);
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 4222aff..c342f93 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1333,7 +1333,7 @@ next_bio:
goto fail;
}
/* > peer_req->i.sector, unless this is the first bio */
- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
bio->bi_bdev = mdev->ldev->backing_bdev;
bio->bi_rw = rw;
bio->bi_private = peer_req;
@@ -1353,7 +1353,7 @@ next_bio:
dev_err(DEV,
"bio_add_page failed for len=%u, "
"bi_vcnt=0 (bi_sector=%llu)\n",
- len, (unsigned long long)bio->bi_sector);
+ len, (uint64_t)bio->bi_iter.bi_sector);
err = -ENOSPC;
goto fail;
}
@@ -1615,7 +1615,7 @@ static int recv_dless_read(struct drbd_conf *mdev, struct drbd_request *req,
mdev->recv_cnt += data_size>>9;

bio = req->master_bio;
- D_ASSERT(sector == bio->bi_sector);
+ D_ASSERT(sector == bio->bi_iter.bi_sector);

bio_for_each_segment(bvec, bio, i) {
void *mapped = kmap(bvec->bv_page) + bvec->bv_offset;
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index c24379f..a6bedaa 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -77,8 +77,8 @@ static struct drbd_request *drbd_req_new(struct drbd_conf *mdev,
req->epoch = 0;

drbd_clear_interval(&req->i);
- req->i.sector = bio_src->bi_sector;
- req->i.size = bio_src->bi_size;
+ req->i.sector = bio_src->bi_iter.bi_sector;
+ req->i.size = bio_src->bi_iter.bi_size;
req->i.local = true;
req->i.waiting = false;

@@ -1280,7 +1280,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
/*
* what we "blindly" assume:
*/
- D_ASSERT(IS_ALIGNED(bio->bi_size, 512));
+ D_ASSERT(IS_ALIGNED(bio->bi_iter.bi_size, 512));

inc_ap_bio(mdev);
__drbd_make_request(mdev, bio, start_time);
diff --git a/drivers/block/drbd/drbd_req.h b/drivers/block/drbd/drbd_req.h
index 978cb1a..28e15d9 100644
--- a/drivers/block/drbd/drbd_req.h
+++ b/drivers/block/drbd/drbd_req.h
@@ -269,7 +269,7 @@ static inline void drbd_req_make_private_bio(struct drbd_request *req, struct bi

/* Short lived temporary struct on the stack.
* We could squirrel the error to be returned into
- * bio->bi_size, or similar. But that would be too ugly. */
+ * bio->bi_iter.bi_size, or similar. But that would be too ugly. */
struct bio_and_error {
struct bio *bio;
int error;
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 04ceb7e..bf7b8b2 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3775,9 +3775,9 @@ static int __floppy_read_block_0(struct block_device *bdev)
bio_vec.bv_len = size;
bio_vec.bv_offset = 0;
bio.bi_vcnt = 1;
- bio.bi_size = size;
+ bio.bi_iter.bi_size = size;
bio.bi_bdev = bdev;
- bio.bi_sector = 0;
+ bio.bi_iter.bi_sector = 0;
bio.bi_flags = (1 << BIO_QUIET);
init_completion(&complete);
bio.bi_private = &complete;
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d92d50f..3df42e6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -415,7 +415,7 @@ static int do_bio_filebacked(struct loop_device *lo, struct bio *bio)
loff_t pos;
int ret;

- pos = ((loff_t) bio->bi_sector << 9) + lo->lo_offset;
+ pos = ((loff_t) bio->bi_iter.bi_sector << 9) + lo->lo_offset;

if (bio_rw(bio) == WRITE) {
struct file *file = lo->lo_backing_file;
@@ -444,7 +444,7 @@ static int do_bio_filebacked(struct loop_device *lo, struct bio *bio)
goto out;
}
ret = file->f_op->fallocate(file, mode, pos,
- bio->bi_size);
+ bio->bi_iter.bi_size);
if (unlikely(ret && ret != -EINVAL &&
ret != -EOPNOTSUPP))
ret = -EIO;
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 847107e..8d6729c 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3889,7 +3889,7 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
}

if (unlikely(bio->bi_rw & REQ_DISCARD)) {
- bio_endio(bio, mtip_send_trim(dd, bio->bi_sector,
+ bio_endio(bio, mtip_send_trim(dd, bio->bi_iter.bi_sector,
bio_sectors(bio)));
return;
}
@@ -3902,7 +3902,8 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)

if (bio_data_dir(bio) == WRITE && bio_sectors(bio) <= 64 &&
dd->unal_qdepth) {
- if (bio->bi_sector % 8 != 0) /* Unaligned on 4k boundaries */
+ if (bio->bi_iter.bi_sector % 8 != 0)
+ /* Unaligned on 4k boundaries */
unaligned = 1;
else if (bio_sectors(bio) % 8 != 0) /* Aligned but not 4k/8k */
unaligned = 1;
@@ -3930,7 +3931,7 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)

/* Issue the read/write. */
mtip_hw_submit_io(dd,
- bio->bi_sector,
+ bio->bi_iter.bi_sector,
bio_sectors(bio),
nents,
tag,
diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 8efdfaa..c80d308 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -435,7 +435,7 @@ static struct nvme_bio_pair *nvme_bio_split(struct bio *bio, int idx,
{
struct nvme_bio_pair *bp;

- BUG_ON(len > bio->bi_size);
+ BUG_ON(len > bio->bi_iter.bi_size);
BUG_ON(idx > bio->bi_vcnt);

bp = kmalloc(sizeof(*bp), GFP_ATOMIC);
@@ -446,11 +446,11 @@ static struct nvme_bio_pair *nvme_bio_split(struct bio *bio, int idx,
bp->b1 = *bio;
bp->b2 = *bio;

- bp->b1.bi_size = len;
- bp->b2.bi_size -= len;
+ bp->b1.bi_iter.bi_size = len;
+ bp->b2.bi_iter.bi_size -= len;
bp->b1.bi_vcnt = idx;
- bp->b2.bi_idx = idx;
- bp->b2.bi_sector += len >> 9;
+ bp->b2.bi_iter.bi_idx = idx;
+ bp->b2.bi_iter.bi_sector += len >> 9;

if (offset) {
bp->bv1 = kmalloc(bio->bi_max_vecs * sizeof(struct bio_vec),
@@ -519,11 +519,12 @@ static int nvme_map_bio(struct nvme_queue *nvmeq, struct nvme_iod *iod,
{
struct bio_vec *bvec, *bvprv = NULL;
struct scatterlist *sg = NULL;
- int i, length = 0, nsegs = 0, split_len = bio->bi_size;
+ int i, length = 0, nsegs = 0, split_len = bio->bi_iter.bi_size;

if (nvmeq->dev->stripe_size)
split_len = nvmeq->dev->stripe_size -
- ((bio->bi_sector << 9) & (nvmeq->dev->stripe_size - 1));
+ ((bio->bi_iter.bi_sector << 9) &
+ (nvmeq->dev->stripe_size - 1));

sg_init_table(iod->sg, psegs);
bio_for_each_segment(bvec, bio, i) {
@@ -551,7 +552,7 @@ static int nvme_map_bio(struct nvme_queue *nvmeq, struct nvme_iod *iod,
if (dma_map_sg(nvmeq->q_dmadev, iod->sg, iod->nents, dma_dir) == 0)
return -ENOMEM;

- BUG_ON(length != bio->bi_size);
+ BUG_ON(length != bio->bi_iter.bi_size);
return length;
}

@@ -575,8 +576,8 @@ static int nvme_submit_discard(struct nvme_queue *nvmeq, struct nvme_ns *ns,
iod->npages = 0;

range->cattr = cpu_to_le32(0);
- range->nlb = cpu_to_le32(bio->bi_size >> ns->lba_shift);
- range->slba = cpu_to_le64(nvme_block_nr(ns, bio->bi_sector));
+ range->nlb = cpu_to_le32(bio->bi_iter.bi_size >> ns->lba_shift);
+ range->slba = cpu_to_le64(nvme_block_nr(ns, bio->bi_iter.bi_sector));

memset(cmnd, 0, sizeof(*cmnd));
cmnd->dsm.opcode = nvme_cmd_dsm;
@@ -640,7 +641,7 @@ static int nvme_submit_bio_queue(struct nvme_queue *nvmeq, struct nvme_ns *ns,
return result;
}

- iod = nvme_alloc_iod(psegs, bio->bi_size, GFP_ATOMIC);
+ iod = nvme_alloc_iod(psegs, bio->bi_iter.bi_size, GFP_ATOMIC);
if (!iod)
goto nomem;
iod->private = bio;
@@ -689,7 +690,7 @@ static int nvme_submit_bio_queue(struct nvme_queue *nvmeq, struct nvme_ns *ns,
cmnd->rw.nsid = cpu_to_le32(ns->ns_id);
length = nvme_setup_prps(nvmeq->dev, &cmnd->common, iod, length,
GFP_ATOMIC);
- cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, bio->bi_sector));
+ cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, bio->bi_iter.bi_sector));
cmnd->rw.length = cpu_to_le16((length >> ns->lba_shift) - 1);
cmnd->rw.control = cpu_to_le16(control);
cmnd->rw.dsmgmt = cpu_to_le32(dsmgmt);
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 3c08983..68a6d2a 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -658,7 +658,7 @@ static struct pkt_rb_node *pkt_rbtree_find(struct pktcdvd_device *pd, sector_t s

for (;;) {
tmp = rb_entry(n, struct pkt_rb_node, rb_node);
- if (s <= tmp->bio->bi_sector)
+ if (s <= tmp->bio->bi_iter.bi_sector)
next = n->rb_left;
else
next = n->rb_right;
@@ -667,12 +667,12 @@ static struct pkt_rb_node *pkt_rbtree_find(struct pktcdvd_device *pd, sector_t s
n = next;
}

- if (s > tmp->bio->bi_sector) {
+ if (s > tmp->bio->bi_iter.bi_sector) {
tmp = pkt_rbtree_next(tmp);
if (!tmp)
return NULL;
}
- BUG_ON(s > tmp->bio->bi_sector);
+ BUG_ON(s > tmp->bio->bi_iter.bi_sector);
return tmp;
}

@@ -683,13 +683,13 @@ static void pkt_rbtree_insert(struct pktcdvd_device *pd, struct pkt_rb_node *nod
{
struct rb_node **p = &pd->bio_queue.rb_node;
struct rb_node *parent = NULL;
- sector_t s = node->bio->bi_sector;
+ sector_t s = node->bio->bi_iter.bi_sector;
struct pkt_rb_node *tmp;

while (*p) {
parent = *p;
tmp = rb_entry(parent, struct pkt_rb_node, rb_node);
- if (s < tmp->bio->bi_sector)
+ if (s < tmp->bio->bi_iter.bi_sector)
p = &(*p)->rb_left;
else
p = &(*p)->rb_right;
@@ -867,7 +867,8 @@ static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
spin_lock(&pd->iosched.lock);
bio = bio_list_peek(&pd->iosched.write_queue);
spin_unlock(&pd->iosched.lock);
- if (bio && (bio->bi_sector == pd->iosched.last_write))
+ if (bio && (bio->bi_iter.bi_sector ==
+ pd->iosched.last_write))
need_write_seek = 0;
if (need_write_seek && reads_queued) {
if (atomic_read(&pd->cdrw.pending_bios) > 0) {
@@ -898,7 +899,8 @@ static void pkt_iosched_process_queue(struct pktcdvd_device *pd)
continue;

if (bio_data_dir(bio) == READ)
- pd->iosched.successive_reads += bio->bi_size >> 10;
+ pd->iosched.successive_reads +=
+ bio->bi_iter.bi_size >> 10;
else {
pd->iosched.successive_reads = 0;
pd->iosched.last_write = bio_end_sector(bio);
@@ -987,7 +989,8 @@ static void pkt_end_io_read(struct bio *bio, int err)
BUG_ON(!pd);

VPRINTK("pkt_end_io_read: bio=%p sec0=%llx sec=%llx err=%d\n", bio,
- (unsigned long long)pkt->sector, (unsigned long long)bio->bi_sector, err);
+ (unsigned long long)pkt->sector,
+ (unsigned long long)bio->bi_iter.bi_sector, err);

if (err)
atomic_inc(&pkt->io_errors);
@@ -1035,8 +1038,9 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
memset(written, 0, sizeof(written));
spin_lock(&pkt->lock);
bio_list_for_each(bio, &pkt->orig_bios) {
- int first_frame = (bio->bi_sector - pkt->sector) / (CD_FRAMESIZE >> 9);
- int num_frames = bio->bi_size / CD_FRAMESIZE;
+ int first_frame = (bio->bi_iter.bi_sector - pkt->sector) /
+ (CD_FRAMESIZE >> 9);
+ int num_frames = bio->bi_iter.bi_size / CD_FRAMESIZE;
pd->stats.secs_w += num_frames * (CD_FRAMESIZE >> 9);
BUG_ON(first_frame < 0);
BUG_ON(first_frame + num_frames > pkt->frames);
@@ -1062,7 +1066,7 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)

bio = pkt->r_bios[f];
bio_reset(bio);
- bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
+ bio->bi_iter.bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
bio->bi_bdev = pd->bdev;
bio->bi_end_io = pkt_end_io_read;
bio->bi_private = pkt;
@@ -1159,8 +1163,8 @@ static int pkt_start_recovery(struct packet_data *pkt)
bio_reset(pkt->bio);
pkt->bio->bi_bdev = pd->bdev;
pkt->bio->bi_rw = REQ_WRITE;
- pkt->bio->bi_sector = new_sector;
- pkt->bio->bi_size = pkt->frames * CD_FRAMESIZE;
+ pkt->bio->bi_iter.bi_sector = new_sector;
+ pkt->bio->bi_iter.bi_size = pkt->frames * CD_FRAMESIZE;
pkt->bio->bi_vcnt = pkt->frames;

pkt->bio->bi_end_io = pkt_end_io_packet_write;
@@ -1223,7 +1227,7 @@ static int pkt_handle_queue(struct pktcdvd_device *pd)
node = first_node;
while (node) {
bio = node->bio;
- zone = ZONE(bio->bi_sector, pd);
+ zone = ZONE(bio->bi_iter.bi_sector, pd);
list_for_each_entry(p, &pd->cdrw.pkt_active_list, list) {
if (p->sector == zone) {
bio = NULL;
@@ -1263,13 +1267,13 @@ try_next_bio:
while ((node = pkt_rbtree_find(pd, zone)) != NULL) {
bio = node->bio;
VPRINTK("pkt_handle_queue: found zone=%llx\n",
- (unsigned long long)ZONE(bio->bi_sector, pd));
- if (ZONE(bio->bi_sector, pd) != zone)
+ (unsigned long long)ZONE(bio->bi_iter.bi_sector, pd));
+ if (ZONE(bio->bi_iter.bi_sector, pd) != zone)
break;
pkt_rbtree_erase(pd, node);
spin_lock(&pkt->lock);
bio_list_add(&pkt->orig_bios, bio);
- pkt->write_size += bio->bi_size / CD_FRAMESIZE;
+ pkt->write_size += bio->bi_iter.bi_size / CD_FRAMESIZE;
spin_unlock(&pkt->lock);
}
/* check write congestion marks, and if bio_queue_size is
@@ -1303,7 +1307,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
struct bio_vec *bvec = pkt->w_bio->bi_io_vec;

bio_reset(pkt->w_bio);
- pkt->w_bio->bi_sector = pkt->sector;
+ pkt->w_bio->bi_iter.bi_sector = pkt->sector;
pkt->w_bio->bi_bdev = pd->bdev;
pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
pkt->w_bio->bi_private = pkt;
@@ -2382,18 +2386,18 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)

if (!test_bit(PACKET_WRITABLE, &pd->flags)) {
printk(DRIVER_NAME": WRITE for ro device %s (%llu)\n",
- pd->name, (unsigned long long)bio->bi_sector);
+ pd->name, (unsigned long long)bio->bi_iter.bi_sector);
goto end_io;
}

- if (!bio->bi_size || (bio->bi_size % CD_FRAMESIZE)) {
+ if (!bio->bi_iter.bi_size || (bio->bi_iter.bi_size % CD_FRAMESIZE)) {
printk(DRIVER_NAME": wrong bio size\n");
goto end_io;
}

blk_queue_bounce(q, &bio);

- zone = ZONE(bio->bi_sector, pd);
+ zone = ZONE(bio->bi_iter.bi_sector, pd);
VPRINTK("pkt_make_request: start = %6llx stop = %6llx\n",
(unsigned long long)bio->bi_sector,
(unsigned long long)bio_end_sector(bio));
@@ -2407,7 +2411,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
last_zone = ZONE(bio_end_sector(bio) - 1, pd);
if (last_zone != zone) {
BUG_ON(last_zone != zone + pd->settings.size);
- first_sectors = last_zone - bio->bi_sector;
+ first_sectors = last_zone - bio->bi_iter.bi_sector;
bp = bio_split(bio, first_sectors);
BUG_ON(!bp);
pkt_make_request(q, &bp->bio1);
@@ -2429,7 +2433,8 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
if ((pkt->state == PACKET_WAITING_STATE) ||
(pkt->state == PACKET_READ_WAIT_STATE)) {
bio_list_add(&pkt->orig_bios, bio);
- pkt->write_size += bio->bi_size / CD_FRAMESIZE;
+ pkt->write_size +=
+ bio->bi_iter.bi_size / CD_FRAMESIZE;
if ((pkt->write_size >= pkt->frames) &&
(pkt->state == PACKET_WAITING_STATE)) {
atomic_inc(&pkt->run_sm);
diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index d754a88..464be78 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -104,7 +104,7 @@ static void ps3disk_scatter_gather(struct ps3_storage_device *dev,
dev_dbg(&dev->sbd.core,
"%s:%u: bio %u: %u segs %u sectors from %lu\n",
__func__, __LINE__, i, bio_segments(iter.bio),
- bio_sectors(iter.bio), iter.bio->bi_sector);
+ bio_sectors(iter.bio), iter.bio->bi_iter.bi_sector);

size = bvec->bv_len;
buf = bvec_kmap_irq(bvec, &flags);
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index d6d3140..ce7b1aa 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1180,14 +1180,14 @@ static struct bio *bio_clone_range(struct bio *bio_src,

/* Handle the easy case for the caller */

- if (!offset && len == bio_src->bi_size)
+ if (!offset && len == bio_src->bi_iter.bi_size)
return bio_clone(bio_src, gfpmask);

if (WARN_ON_ONCE(!len))
return NULL;
- if (WARN_ON_ONCE(len > bio_src->bi_size))
+ if (WARN_ON_ONCE(len > bio_src->bi_iter.bi_size))
return NULL;
- if (WARN_ON_ONCE(offset > bio_src->bi_size - len))
+ if (WARN_ON_ONCE(offset > bio_src->bi_iter.bi_size - len))
return NULL;

/* Find first affected segment... */
@@ -1217,7 +1217,8 @@ static struct bio *bio_clone_range(struct bio *bio_src,
return NULL; /* ENOMEM */

bio->bi_bdev = bio_src->bi_bdev;
- bio->bi_sector = bio_src->bi_sector + (offset >> SECTOR_SHIFT);
+ bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector +
+ (offset >> SECTOR_SHIFT);
bio->bi_rw = bio_src->bi_rw;
bio->bi_flags |= 1 << BIO_CLONED;

@@ -1236,8 +1237,7 @@ static struct bio *bio_clone_range(struct bio *bio_src,
}

bio->bi_vcnt = vcnt;
- bio->bi_size = len;
- bio->bi_idx = 0;
+ bio->bi_iter.bi_size = len;

return bio;
}
@@ -1268,7 +1268,7 @@ static struct bio *bio_chain_clone_range(struct bio **bio_src,

/* Build up a chain of clone bios up to the limit */

- if (!bi || off >= bi->bi_size || !len)
+ if (!bi || off >= bi->bi_iter.bi_size || !len)
return NULL; /* Nothing to clone */

end = &chain;
@@ -1280,7 +1280,7 @@ static struct bio *bio_chain_clone_range(struct bio **bio_src,
rbd_warn(NULL, "bio_chain exhausted with %u left", len);
goto out_err; /* EINVAL; ran out of bio's */
}
- bi_size = min_t(unsigned int, bi->bi_size - off, len);
+ bi_size = min_t(unsigned int, bi->bi_iter.bi_size - off, len);
bio = bio_clone_range(bi, off, bi_size, gfpmask);
if (!bio)
goto out_err; /* ENOMEM */
@@ -1289,7 +1289,7 @@ static struct bio *bio_chain_clone_range(struct bio **bio_src,
end = &bio->bi_next;

off += bi_size;
- if (off == bi->bi_size) {
+ if (off == bi->bi_iter.bi_size) {
bi = bi->bi_next;
off = 0;
}
@@ -2183,7 +2183,8 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,

if (type == OBJ_REQUEST_BIO) {
bio_list = data_desc;
- rbd_assert(img_offset == bio_list->bi_sector << SECTOR_SHIFT);
+ rbd_assert(img_offset ==
+ bio_list->bi_iter.bi_sector << SECTOR_SHIFT);
} else {
rbd_assert(type == OBJ_REQUEST_PAGES);
pages = data_desc;
diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 4346d17..44d045b 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -180,7 +180,7 @@ static void rsxx_make_request(struct request_queue *q, struct bio *bio)
goto req_err;
}

- if (bio->bi_size == 0) {
+ if (bio->bi_iter.bi_size == 0) {
dev_err(CARD_TO_DEV(card), "size zero BIO!\n");
goto req_err;
}
@@ -200,7 +200,7 @@ static void rsxx_make_request(struct request_queue *q, struct bio *bio)

dev_dbg(CARD_TO_DEV(card), "BIO[%c]: meta: %p addr8: x%llx size: %d\n",
bio_data_dir(bio) ? 'W' : 'R', bio_meta,
- (u64)bio->bi_sector << 9, bio->bi_size);
+ (u64)bio->bi_iter.bi_sector << 9, bio->bi_iter.bi_size);

st = rsxx_dma_queue_bio(card, bio, &bio_meta->pending_dmas,
bio_dma_done_cb, bio_meta);
diff --git a/drivers/block/rsxx/dma.c b/drivers/block/rsxx/dma.c
index 0607513..c9bba8b 100644
--- a/drivers/block/rsxx/dma.c
+++ b/drivers/block/rsxx/dma.c
@@ -642,7 +642,7 @@ int rsxx_dma_queue_bio(struct rsxx_cardinfo *card,
int st;
int i;

- addr8 = bio->bi_sector << 9; /* sectors are 512 bytes */
+ addr8 = bio->bi_iter.bi_sector << 9; /* sectors are 512 bytes */
atomic_set(n_dmas, 0);

for (i = 0; i < card->n_targets; i++) {
@@ -651,7 +651,7 @@ int rsxx_dma_queue_bio(struct rsxx_cardinfo *card,
}

if (bio->bi_rw & REQ_DISCARD) {
- bv_len = bio->bi_size;
+ bv_len = bio->bi_iter.bi_size;

while (bv_len > 0) {
tgt = rsxx_get_dma_tgt(card, addr8);
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index ad70868..dab4f1a 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -352,8 +352,8 @@ static int add_bio(struct cardinfo *card)
bio = card->currentbio;
if (!bio && card->bio) {
card->currentbio = card->bio;
- card->current_idx = card->bio->bi_idx;
- card->current_sector = card->bio->bi_sector;
+ card->current_idx = card->bio->bi_iter.bi_idx;
+ card->current_sector = card->bio->bi_iter.bi_sector;
card->bio = card->bio->bi_next;
if (card->bio == NULL)
card->biotail = &card->bio;
@@ -451,7 +451,7 @@ static void process_page(unsigned long data)
if (page->idx >= bio->bi_vcnt) {
page->bio = bio->bi_next;
if (page->bio)
- page->idx = page->bio->bi_idx;
+ page->idx = page->bio->bi_iter.bi_idx;
}

pci_unmap_page(card->dev, desc->data_dma_handle,
@@ -532,7 +532,8 @@ static void mm_make_request(struct request_queue *q, struct bio *bio)
{
struct cardinfo *card = q->queuedata;
pr_debug("mm_make_request %llu %u\n",
- (unsigned long long)bio->bi_sector, bio->bi_size);
+ (unsigned long long)bio->bi_iter.bi_sector,
+ bio->bi_iter.bi_size);

spin_lock_irq(&card->lock);
*card->biotail = bio;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 6472395..f61f366 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -184,7 +184,7 @@ static void virtblk_bio_send_data(struct virtblk_req *vbr)

vbr->flags &= ~VBLK_IS_FLUSH;
vbr->out_hdr.type = 0;
- vbr->out_hdr.sector = bio->bi_sector;
+ vbr->out_hdr.sector = bio->bi_iter.bi_sector;
vbr->out_hdr.ioprio = bio_prio(bio);

if (blk_bio_map_sg(vblk->disk->queue, bio, vbr->sg)) {
@@ -400,7 +400,7 @@ static void virtblk_make_request(struct request_queue *q, struct bio *bio)
vbr->flags |= VBLK_REQ_FLUSH;
if (bio->bi_rw & REQ_FUA)
vbr->flags |= VBLK_REQ_FUA;
- if (bio->bi_size)
+ if (bio->bi_iter.bi_size)
vbr->flags |= VBLK_REQ_DATA;

if (unlikely(vbr->flags & VBLK_REQ_FLUSH))
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index dd5b2fe..0f4c746 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -972,7 +972,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
bio->bi_bdev = preq.bdev;
bio->bi_private = pending_req;
bio->bi_end_io = end_block_io_op;
- bio->bi_sector = preq.sector_number;
+ bio->bi_iter.bi_sector = preq.sector_number;
}

preq.sector_number += seg[i].nsec;
diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index 048f294..cbb8ccd 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -175,12 +175,12 @@ static void do_discard(struct cache *ca, long bucket)

bio_init(&d->bio);

- d->bio.bi_sector = bucket_to_sector(ca->set, d->bucket);
+ d->bio.bi_iter.bi_sector = bucket_to_sector(ca->set, d->bucket);
d->bio.bi_bdev = ca->bdev;
d->bio.bi_rw = REQ_WRITE|REQ_DISCARD;
d->bio.bi_max_vecs = 1;
d->bio.bi_io_vec = d->bio.bi_inline_vecs;
- d->bio.bi_size = bucket_bytes(ca);
+ d->bio.bi_iter.bi_size = bucket_bytes(ca);
d->bio.bi_end_io = discard_endio;
bio_set_prio(&d->bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 6830190..d43f480 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -258,7 +258,7 @@ void bch_btree_read(struct btree *b)

btree_bio_init(b);
b->bio->bi_rw = REQ_META|READ_SYNC;
- b->bio->bi_size = KEY_SIZE(&b->key) << 9;
+ b->bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;

bch_bio_map(b->bio, b->sets[0].data);

@@ -327,7 +327,7 @@ static void do_btree_write(struct btree *b)

btree_bio_init(b);
b->bio->bi_rw = REQ_META|WRITE_SYNC;
- b->bio->bi_size = set_blocks(i, b->c) * block_bytes(b->c);
+ b->bio->bi_iter.bi_size = set_blocks(i, b->c) * block_bytes(b->c);
bch_bio_map(b->bio, i);

bkey_copy(&k.key, &b->key);
@@ -2171,11 +2171,11 @@ static int submit_partial_cache_miss(struct btree *b, struct btree_op *op,
unsigned sectors = INT_MAX;

if (KEY_INODE(k) == op->inode) {
- if (KEY_START(k) <= bio->bi_sector)
+ if (KEY_START(k) <= bio->bi_iter.bi_sector)
break;

sectors = min_t(uint64_t, sectors,
- KEY_START(k) - bio->bi_sector);
+ KEY_START(k) - bio->bi_iter.bi_sector);
}

ret = s->d->cache_miss(b, s, bio, sectors);
@@ -2207,12 +2207,12 @@ static int submit_partial_cache_hit(struct btree *b, struct btree_op *op,

while (!op->lookup_done &&
KEY_INODE(k) == op->inode &&
- bio->bi_sector < KEY_OFFSET(k)) {
+ bio->bi_iter.bi_sector < KEY_OFFSET(k)) {
struct bkey *bio_key;
sector_t sector = PTR_OFFSET(k, ptr) +
- (bio->bi_sector - KEY_START(k));
+ (bio->bi_iter.bi_sector - KEY_START(k));
unsigned sectors = min_t(uint64_t, INT_MAX,
- KEY_OFFSET(k) - bio->bi_sector);
+ KEY_OFFSET(k) - bio->bi_iter.bi_sector);

n = bch_bio_split(bio, sectors, GFP_NOIO, s->d->bio_split);
if (n == bio)
@@ -2252,10 +2252,11 @@ int bch_btree_search_recurse(struct btree *b, struct btree_op *op)
int ret = 0;
struct bkey *k;
struct btree_iter iter;
- bch_btree_iter_init(b, &iter, &KEY(op->inode, bio->bi_sector, 0));
+ bch_btree_iter_init(b, &iter, &KEY(op->inode,
+ bio->bi_iter.bi_sector, 0));

pr_debug("at %s searching for %u:%llu", pbtree(b), op->inode,
- (uint64_t) bio->bi_sector);
+ (uint64_t) bio->bi_iter.bi_sector);

do {
k = bch_btree_iter_next_filter(&iter, b, bch_ptr_bad);
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 5941ed7..4bc7c14 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -216,7 +216,7 @@ void bch_data_verify(struct search *s)
printk(KERN_ERR
"bcache (%s): verify failed at sector %llu\n",
bdevname(dc->bdev, name),
- (uint64_t) s->orig_bio->bi_sector);
+ (uint64_t) s->orig_bio->bi_iter.bi_sector);

kunmap(bv->bv_page);
kunmap(check->bi_io_vec[i].bv_page);
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index e5d27a8..28f06ca 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -19,18 +19,18 @@ static void bch_bi_idx_hack_endio(struct bio *bio, int error)

static void bch_generic_make_request_hack(struct bio *bio)
{
- if (bio->bi_idx) {
+ if (bio->bi_iter.bi_idx) {
struct bio *clone = bio_alloc(GFP_NOIO, bio_segments(bio));

memcpy(clone->bi_io_vec,
bio_iovec(bio),
bio_segments(bio) * sizeof(struct bio_vec));

- clone->bi_sector = bio->bi_sector;
+ clone->bi_iter.bi_sector = bio->bi_iter.bi_sector;
clone->bi_bdev = bio->bi_bdev;
clone->bi_rw = bio->bi_rw;
clone->bi_vcnt = bio_segments(bio);
- clone->bi_size = bio->bi_size;
+ clone->bi_iter.bi_size = bio->bi_iter.bi_size;

clone->bi_private = bio;
clone->bi_end_io = bch_bi_idx_hack_endio;
@@ -70,7 +70,7 @@ static void bch_generic_make_request_hack(struct bio *bio)
struct bio *bch_bio_split(struct bio *bio, int sectors,
gfp_t gfp, struct bio_set *bs)
{
- unsigned idx = bio->bi_idx, vcnt = 0, nbytes = sectors << 9;
+ unsigned idx = bio->bi_iter.bi_idx, vcnt = 0, nbytes = sectors << 9;
struct bio_vec *bv;
struct bio *ret = NULL;

@@ -86,7 +86,7 @@ struct bio *bch_bio_split(struct bio *bio, int sectors,
}

bio_for_each_segment(bv, bio, idx) {
- vcnt = idx - bio->bi_idx;
+ vcnt = idx - bio->bi_iter.bi_idx;

if (!nbytes) {
ret = bio_alloc_bioset(gfp, vcnt, bs);
@@ -115,15 +115,15 @@ struct bio *bch_bio_split(struct bio *bio, int sectors,
}
out:
ret->bi_bdev = bio->bi_bdev;
- ret->bi_sector = bio->bi_sector;
- ret->bi_size = sectors << 9;
+ ret->bi_iter.bi_sector = bio->bi_iter.bi_sector;
+ ret->bi_iter.bi_size = sectors << 9;
ret->bi_rw = bio->bi_rw;
ret->bi_vcnt = vcnt;
ret->bi_max_vecs = vcnt;

- bio->bi_sector += sectors;
- bio->bi_size -= sectors << 9;
- bio->bi_idx = idx;
+ bio->bi_iter.bi_sector += sectors;
+ bio->bi_iter.bi_size -= sectors << 9;
+ bio->bi_iter.bi_idx = idx;

if (bio_integrity(bio)) {
if (bio_integrity_clone(ret, bio, gfp)) {
@@ -158,7 +158,7 @@ static unsigned bch_bio_max_sectors(struct bio *bio)
bio_for_each_segment(bv, bio, i) {
struct bvec_merge_data bvm = {
.bi_bdev = bio->bi_bdev,
- .bi_sector = bio->bi_sector,
+ .bi_sector = bio->bi_iter.bi_sector,
.bi_size = ret << 9,
.bi_rw = bio->bi_rw,
};
@@ -268,8 +268,8 @@ void __bch_submit_bbio(struct bio *bio, struct cache_set *c)
{
struct bbio *b = container_of(bio, struct bbio, bio);

- bio->bi_sector = PTR_OFFSET(&b->key, 0);
- bio->bi_bdev = PTR_CACHE(c, &b->key, 0)->bdev;
+ bio->bi_iter.bi_sector = PTR_OFFSET(&b->key, 0);
+ bio->bi_bdev = PTR_CACHE(c, &b->key, 0)->bdev;

b->submit_time_us = local_clock_us();
closure_bio_submit(bio, bio->bi_private, PTR_CACHE(c, &b->key, 0));
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 8c8dfdc..724cb7a 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -47,10 +47,10 @@ reread: left = ca->sb.bucket_size - offset;
len = min_t(unsigned, left, PAGE_SECTORS * 8);

bio_reset(bio);
- bio->bi_sector = bucket + offset;
+ bio->bi_iter.bi_sector = bucket + offset;
bio->bi_bdev = ca->bdev;
bio->bi_rw = READ;
- bio->bi_size = len << 9;
+ bio->bi_iter.bi_size = len << 9;

bio->bi_end_io = journal_read_endio;
bio->bi_private = &op->cl;
@@ -438,13 +438,13 @@ static void do_journal_discard(struct cache *ca)
atomic_set(&ja->discard_in_flight, DISCARD_IN_FLIGHT);

bio_init(bio);
- bio->bi_sector = bucket_to_sector(ca->set,
+ bio->bi_iter.bi_sector = bucket_to_sector(ca->set,
ca->sb.d[ja->discard_idx]);
bio->bi_bdev = ca->bdev;
bio->bi_rw = REQ_WRITE|REQ_DISCARD;
bio->bi_max_vecs = 1;
bio->bi_io_vec = bio->bi_inline_vecs;
- bio->bi_size = bucket_bytes(ca);
+ bio->bi_iter.bi_size = bucket_bytes(ca);
bio->bi_end_io = journal_discard_endio;

closure_get(&ca->set->cl);
@@ -615,10 +615,10 @@ static void journal_write_unlocked(struct closure *cl)
atomic_long_add(sectors, &ca->meta_sectors_written);

bio_reset(bio);
- bio->bi_sector = PTR_OFFSET(k, i);
+ bio->bi_iter.bi_sector = PTR_OFFSET(k, i);
bio->bi_bdev = ca->bdev;
bio->bi_rw = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH;
- bio->bi_size = sectors << 9;
+ bio->bi_iter.bi_size = sectors << 9;

bio->bi_end_io = journal_write_endio;
bio->bi_private = w;
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index 23f19fa..3acbcec 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -81,7 +81,7 @@ static void moving_init(struct moving_io *io)
bio_get(bio);
bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));

- bio->bi_size = KEY_SIZE(&io->w->key) << 9;
+ bio->bi_iter.bi_size = KEY_SIZE(&io->w->key) << 9;
bio->bi_max_vecs = DIV_ROUND_UP(KEY_SIZE(&io->w->key),
PAGE_SECTORS);
bio->bi_private = &io->s.cl;
@@ -99,7 +99,7 @@ static void write_moving(struct closure *cl)

moving_init(io);

- io->bio.bio.bi_sector = KEY_START(&io->w->key);
+ io->bio.bio.bi_iter.bi_sector = KEY_START(&io->w->key);
s->op.lock = -1;
s->op.write_prio = 1;
s->op.cache_bio = &io->bio.bio;
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 8e5c35d..6360df5 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -220,7 +220,7 @@ static void bio_invalidate(struct closure *cl)
struct bio *bio = op->cache_bio;

pr_debug("invalidating %i sectors from %llu",
- bio_sectors(bio), (uint64_t) bio->bi_sector);
+ bio_sectors(bio), (uint64_t) bio->bi_iter.bi_sector);

while (bio_sectors(bio)) {
unsigned len = min(bio_sectors(bio), 1U << 14);
@@ -228,11 +228,11 @@ static void bio_invalidate(struct closure *cl)
if (bch_keylist_realloc(&op->keys, 0, op->c))
goto out;

- bio->bi_sector += len;
- bio->bi_size -= len << 9;
+ bio->bi_iter.bi_sector += len;
+ bio->bi_iter.bi_size -= len << 9;

bch_keylist_add(&op->keys,
- &KEY(op->inode, bio->bi_sector, len));
+ &KEY(op->inode, bio->bi_iter.bi_sector, len));
}

op->insert_data_done = true;
@@ -504,7 +504,7 @@ static void bch_insert_data_loop(struct closure *cl)
k = op->keys.top;
bkey_init(k);
SET_KEY_INODE(k, op->inode);
- SET_KEY_OFFSET(k, bio->bi_sector);
+ SET_KEY_OFFSET(k, bio->bi_iter.bi_sector);

if (!bch_alloc_sectors(k, bio_sectors(bio), s))
goto err;
@@ -529,7 +529,7 @@ static void bch_insert_data_loop(struct closure *cl)
pr_debug("%s", pkey(k));
bch_keylist_push(&op->keys);

- trace_bcache_cache_insert(n, n->bi_sector, n->bi_bdev);
+ trace_bcache_cache_insert(n, n->bi_iter.bi_sector, n->bi_bdev);
n->bi_rw |= REQ_WRITE;
bch_submit_bbio(n, op->c, k, 0);
} while (n != bio);
@@ -773,7 +773,7 @@ static void request_read_error(struct closure *cl)
* device.
*/
pr_debug("recovering at sector %llu",
- (uint64_t) s->orig_bio->bi_sector);
+ (uint64_t) s->orig_bio->bi_iter.bi_sector);

s->error = 0;
do_bio_hook(s);
@@ -802,9 +802,12 @@ static void request_read_done(struct closure *cl)

if (s->op.cache_bio) {
bio_reset(s->op.cache_bio);
- s->op.cache_bio->bi_sector = s->cache_miss->bi_sector;
- s->op.cache_bio->bi_bdev = s->cache_miss->bi_bdev;
- s->op.cache_bio->bi_size = s->cache_bio_sectors << 9;
+ s->op.cache_bio->bi_iter.bi_sector =
+ s->cache_miss->bi_iter.bi_sector;
+ s->op.cache_bio->bi_bdev =
+ s->cache_miss->bi_bdev;
+ s->op.cache_bio->bi_iter.bi_size =
+ s->cache_bio_sectors << 9;
bch_bio_map(s->op.cache_bio, NULL);

bio_copy_data(s->cache_miss, s->op.cache_bio);
@@ -882,9 +885,9 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
if (!s->op.cache_bio)
goto out_submit;

- s->op.cache_bio->bi_sector = miss->bi_sector;
+ s->op.cache_bio->bi_iter.bi_sector = miss->bi_iter.bi_sector;
s->op.cache_bio->bi_bdev = miss->bi_bdev;
- s->op.cache_bio->bi_size = s->cache_bio_sectors << 9;
+ s->op.cache_bio->bi_iter.bi_size = s->cache_bio_sectors << 9;

s->op.cache_bio->bi_end_io = request_endio;
s->op.cache_bio->bi_private = &s->cl;
@@ -950,7 +953,7 @@ static void request_write(struct cached_dev *dc, struct search *s)
struct closure *cl = &s->cl;
struct bio *bio = &s->bio.bio;
struct bkey start, end;
- start = KEY(dc->disk.id, bio->bi_sector, 0);
+ start = KEY(dc->disk.id, bio->bi_iter.bi_sector, 0);
end = KEY(dc->disk.id, bio_end_sector(bio), 0);

bch_keybuf_check_overlapping(&s->op.c->moving_gc_keys, &start, &end);
@@ -1073,8 +1076,8 @@ static void check_should_skip(struct cached_dev *dc, struct search *s)
(bio->bi_rw & REQ_WRITE)))
goto skip;

- if (bio->bi_sector & (c->sb.block_size - 1) ||
- bio_sectors(bio) & (c->sb.block_size - 1)) {
+ if (bio->bi_iter.bi_sector & (c->sb.block_size - 1) ||
+ bio_sectors(bio) & (c->sb.block_size - 1)) {
pr_debug("skipping unaligned io");
goto skip;
}
@@ -1096,8 +1099,9 @@ static void check_should_skip(struct cached_dev *dc, struct search *s)

spin_lock(&dc->io_lock);

- hlist_for_each_entry(i, iohash(dc, bio->bi_sector), hash)
- if (i->last == bio->bi_sector &&
+ hlist_for_each_entry(i, iohash(dc, bio->bi_iter.bi_sector),
+ hash)
+ if (i->last == bio->bi_iter.bi_sector &&
time_before(jiffies, i->jiffies))
goto found;

@@ -1106,8 +1110,8 @@ static void check_should_skip(struct cached_dev *dc, struct search *s)
add_sequential(s->task);
i->sequential = 0;
found:
- if (i->sequential + bio->bi_size > i->sequential)
- i->sequential += bio->bi_size;
+ if (i->sequential + bio->bi_iter.bi_size > i->sequential)
+ i->sequential += bio->bi_iter.bi_size;

i->last = bio_end_sector(bio);
i->jiffies = jiffies + msecs_to_jiffies(5000);
@@ -1119,7 +1123,7 @@ found:

spin_unlock(&dc->io_lock);
} else {
- s->task->sequential_io = bio->bi_size;
+ s->task->sequential_io = bio->bi_iter.bi_size;

add_sequential(s->task);
}
@@ -1152,7 +1156,7 @@ static void cached_dev_make_request(struct request_queue *q, struct bio *bio)
part_stat_unlock();

bio->bi_bdev = dc->bdev;
- bio->bi_sector += dc->sb.data_offset;
+ bio->bi_iter.bi_sector += dc->sb.data_offset;

if (cached_dev_get(dc)) {
s = search_alloc(bio, d);
@@ -1235,9 +1239,9 @@ static int flash_dev_cache_miss(struct btree *b, struct search *s,
sectors -= j;
}

- bio_advance(bio, min(sectors << 9, bio->bi_size));
+ bio_advance(bio, min(sectors << 9, bio->bi_iter.bi_size));

- if (!bio->bi_size)
+ if (!bio->bi_iter.bi_size)
s->op.lookup_done = true;

return 0;
@@ -1265,7 +1269,7 @@ static void flash_dev_make_request(struct request_queue *q, struct bio *bio)
closure_call(&s->op.cl, btree_read_async, NULL, cl);
} else if (bio_has_data(bio) || s->op.skip) {
bch_keybuf_check_overlapping(&s->op.c->moving_gc_keys,
- &KEY(d->id, bio->bi_sector, 0),
+ &KEY(d->id, bio->bi_iter.bi_sector, 0),
&KEY(d->id, bio_end_sector(bio), 0));

s->writeback = true;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 7f7ea78..668b50e 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -237,9 +237,9 @@ static void __write_super(struct cache_sb *sb, struct bio *bio)
struct cache_sb *out = page_address(bio->bi_io_vec[0].bv_page);
unsigned i;

- bio->bi_sector = SB_SECTOR;
- bio->bi_rw = REQ_SYNC|REQ_META;
- bio->bi_size = SB_SIZE;
+ bio->bi_iter.bi_sector = SB_SECTOR;
+ bio->bi_rw = REQ_SYNC|REQ_META;
+ bio->bi_iter.bi_size = SB_SIZE;
bch_bio_map(bio, NULL);

out->offset = cpu_to_le64(sb->offset);
@@ -350,7 +350,7 @@ static void uuid_io(struct cache_set *c, unsigned long rw,
struct bio *bio = bch_bbio_alloc(c);

bio->bi_rw = REQ_SYNC|REQ_META|rw;
- bio->bi_size = KEY_SIZE(k) << 9;
+ bio->bi_iter.bi_size = KEY_SIZE(k) << 9;

bio->bi_end_io = uuid_endio;
bio->bi_private = cl;
@@ -506,10 +506,10 @@ static void prio_io(struct cache *ca, uint64_t bucket, unsigned long rw)

closure_init_stack(cl);

- bio->bi_sector = bucket * ca->sb.bucket_size;
- bio->bi_bdev = ca->bdev;
- bio->bi_rw = REQ_SYNC|REQ_META|rw;
- bio->bi_size = bucket_bytes(ca);
+ bio->bi_iter.bi_sector = bucket * ca->sb.bucket_size;
+ bio->bi_bdev = ca->bdev;
+ bio->bi_rw = REQ_SYNC|REQ_META|rw;
+ bio->bi_iter.bi_size = bucket_bytes(ca);

bio->bi_end_io = prio_endio;
bio->bi_private = ca;
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index 98eb811..b995d5a 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -203,10 +203,10 @@ unsigned bch_next_delay(struct ratelimit *d, uint64_t done)

void bch_bio_map(struct bio *bio, void *base)
{
- size_t size = bio->bi_size;
+ size_t size = bio->bi_iter.bi_size;
struct bio_vec *bv = bio->bi_io_vec;

- BUG_ON(!bio->bi_size);
+ BUG_ON(!bio->bi_iter.bi_size);
BUG_ON(bio->bi_vcnt);

bv->bv_offset = base ? ((unsigned long) base) % PAGE_SIZE : 0;
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 8a3311a..61bc071 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -114,7 +114,7 @@ static void dirty_init(struct keybuf_key *w)
if (!io->dc->writeback_percent)
bio_set_prio(bio, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0));

- bio->bi_size = KEY_SIZE(&w->key) << 9;
+ bio->bi_iter.bi_size = KEY_SIZE(&w->key) << 9;
bio->bi_max_vecs = DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS);
bio->bi_private = w;
bio->bi_io_vec = bio->bi_inline_vecs;
@@ -272,7 +272,7 @@ static void write_dirty(struct closure *cl)

dirty_init(w);
io->bio.bi_rw = WRITE;
- io->bio.bi_sector = KEY_START(&w->key);
+ io->bio.bi_iter.bi_sector = KEY_START(&w->key);
io->bio.bi_bdev = io->dc->bdev;
io->bio.bi_end_io = dirty_endio;

@@ -344,7 +344,7 @@ static void read_dirty(struct closure *cl)
io->dc = dc;

dirty_init(w);
- io->bio.bi_sector = PTR_OFFSET(&w->key, 0);
+ io->bio.bi_iter.bi_sector = PTR_OFFSET(&w->key, 0);
io->bio.bi_bdev = PTR_CACHE(dc->disk.c,
&w->key, 0)->bdev;
io->bio.bi_rw = READ;
diff --git a/drivers/md/dm-bio-record.h b/drivers/md/dm-bio-record.h
index 3a8cfa2..5ace48e 100644
--- a/drivers/md/dm-bio-record.h
+++ b/drivers/md/dm-bio-record.h
@@ -40,10 +40,10 @@ static inline void dm_bio_record(struct dm_bio_details *bd, struct bio *bio)
{
unsigned i;

- bd->bi_sector = bio->bi_sector;
+ bd->bi_sector = bio->bi_iter.bi_sector;
bd->bi_bdev = bio->bi_bdev;
- bd->bi_size = bio->bi_size;
- bd->bi_idx = bio->bi_idx;
+ bd->bi_size = bio->bi_iter.bi_size;
+ bd->bi_idx = bio->bi_iter.bi_idx;
bd->bi_flags = bio->bi_flags;

for (i = 0; i < bio->bi_vcnt; i++) {
@@ -56,10 +56,10 @@ static inline void dm_bio_restore(struct dm_bio_details *bd, struct bio *bio)
{
unsigned i;

- bio->bi_sector = bd->bi_sector;
+ bio->bi_iter.bi_sector = bd->bi_sector;
bio->bi_bdev = bd->bi_bdev;
- bio->bi_size = bd->bi_size;
- bio->bi_idx = bd->bi_idx;
+ bio->bi_iter.bi_size = bd->bi_size;
+ bio->bi_iter.bi_idx = bd->bi_idx;
bio->bi_flags = bd->bi_flags;

for (i = 0; i < bio->bi_vcnt; i++) {
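
Since bi_sector, bi_size and bi_idx are now gathered in a single
struct, a record/restore pair like the one above could in principle
copy the iterator wholesale; a sketch, assuming the iterator type is
named struct bvec_iter (the member names are taken from the hunks
above):

struct dm_bio_details_sketch {
	struct block_device *bi_bdev;
	unsigned long bi_flags;
	struct bvec_iter bi_iter;	/* bi_sector, bi_size, bi_idx, bi_bvec_done */
};

static inline void sketch_bio_record(struct dm_bio_details_sketch *bd,
				     struct bio *bio)
{
	bd->bi_bdev  = bio->bi_bdev;
	bd->bi_flags = bio->bi_flags;
	bd->bi_iter  = bio->bi_iter;	/* one assignment replaces four */
}
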
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 0387e05..bb6d1d3 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -537,7 +537,7 @@ static void use_inline_bio(struct dm_buffer *b, int rw, sector_t block,
bio_init(&b->bio);
b->bio.bi_io_vec = b->bio_vec;
b->bio.bi_max_vecs = DM_BUFIO_INLINE_VECS;
- b->bio.bi_sector = block << b->c->sectors_per_block_bits;
+ b->bio.bi_iter.bi_sector = block << b->c->sectors_per_block_bits;
b->bio.bi_bdev = b->c->bdev;
b->bio.bi_end_io = end_io;

diff --git a/drivers/md/dm-cache-policy-mq.c b/drivers/md/dm-cache-policy-mq.c
index dc112a7..6e8aedb 100644
--- a/drivers/md/dm-cache-policy-mq.c
+++ b/drivers/md/dm-cache-policy-mq.c
@@ -85,7 +85,7 @@ static enum io_pattern iot_pattern(struct io_tracker *t)

static void iot_update_stats(struct io_tracker *t, struct bio *bio)
{
- if (bio->bi_sector == from_oblock(t->last_end_oblock) + 1)
+ if (bio->bi_iter.bi_sector == from_oblock(t->last_end_oblock) + 1)
t->nr_seq_samples++;
else {
/*
@@ -100,7 +100,7 @@ static void iot_update_stats(struct io_tracker *t, struct bio *bio)
t->nr_rand_samples++;
}

- t->last_end_oblock = to_oblock(bio->bi_sector + bio_sectors(bio) - 1);
+ t->last_end_oblock = to_oblock(bio_end_sector(bio) - 1);
}

static void iot_check_for_pattern_switch(struct io_tracker *t)
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index df44b60..59372e0 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -561,14 +561,16 @@ static void remap_to_origin(struct cache *cache, struct bio *bio)
static void remap_to_cache(struct cache *cache, struct bio *bio,
dm_cblock_t cblock)
{
- sector_t bi_sector = bio->bi_sector;
+ sector_t bi_sector = bio->bi_iter.bi_sector;

bio->bi_bdev = cache->cache_dev->bdev;
if (!block_size_is_power_of_two(cache))
- bio->bi_sector = (from_cblock(cblock) * cache->sectors_per_block) +
+ bio->bi_iter.bi_sector = (from_cblock(cblock) *
+ cache->sectors_per_block) +
sector_div(bi_sector, cache->sectors_per_block);
else
- bio->bi_sector = (from_cblock(cblock) << cache->sectors_per_block_shift) |
+ bio->bi_iter.bi_sector = (from_cblock(cblock) <<
+ cache->sectors_per_block_shift) |
(bi_sector & (cache->sectors_per_block - 1));
}

@@ -608,7 +610,7 @@ static void remap_to_cache_dirty(struct cache *cache, struct bio *bio,

static dm_oblock_t get_bio_block(struct cache *cache, struct bio *bio)
{
- sector_t block_nr = bio->bi_sector;
+ sector_t block_nr = bio->bi_iter.bi_sector;

if (!block_size_is_power_of_two(cache))
(void) sector_div(block_nr, cache->sectors_per_block);
@@ -1060,7 +1062,7 @@ static void process_flush_bio(struct cache *cache, struct bio *bio)
size_t pb_data_size = get_per_bio_data_size(cache);
struct per_bio_data *pb = get_per_bio_data(bio, pb_data_size);

- BUG_ON(bio->bi_size);
+ BUG_ON(bio->bi_iter.bi_size);
if (!pb->req_nr)
remap_to_origin(cache, bio);
else
@@ -1083,9 +1085,9 @@ static void process_flush_bio(struct cache *cache, struct bio *bio)
*/
static void process_discard_bio(struct cache *cache, struct bio *bio)
{
- dm_block_t start_block = dm_sector_div_up(bio->bi_sector,
+ dm_block_t start_block = dm_sector_div_up(bio->bi_iter.bi_sector,
cache->discard_block_size);
- dm_block_t end_block = bio->bi_sector + bio_sectors(bio);
+ dm_block_t end_block = bio_end_sector(bio);
dm_block_t b;

end_block = block_div(end_block, cache->discard_block_size);
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 6d2d41a..fca3bba 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -652,8 +652,8 @@ static void crypt_convert_init(struct crypt_config *cc,
ctx->bio_out = bio_out;
ctx->offset_in = 0;
ctx->offset_out = 0;
- ctx->idx_in = bio_in ? bio_in->bi_idx : 0;
- ctx->idx_out = bio_out ? bio_out->bi_idx : 0;
+ ctx->idx_in = bio_in ? bio_in->bi_iter.bi_idx : 0;
+ ctx->idx_out = bio_out ? bio_out->bi_iter.bi_idx : 0;
ctx->cc_sector = sector + cc->iv_offset;
init_completion(&ctx->restart);
}
@@ -845,7 +845,7 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned size,
size -= len;
}

- if (!clone->bi_size) {
+ if (!clone->bi_iter.bi_size) {
bio_put(clone);
return NULL;
}
@@ -985,7 +985,7 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
crypt_inc_pending(io);

clone_init(io, clone);
- clone->bi_sector = cc->start + io->sector;
+ clone->bi_iter.bi_sector = cc->start + io->sector;

generic_make_request(clone);
return 0;
@@ -1033,7 +1033,7 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
/* crypt_convert should have filled the clone bio */
BUG_ON(io->ctx.idx_out < clone->bi_vcnt);

- clone->bi_sector = cc->start + io->sector;
+ clone->bi_iter.bi_sector = cc->start + io->sector;

if (async)
kcryptd_queue_io(io);
@@ -1048,7 +1048,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
struct dm_crypt_io *new_io;
int crypt_finished;
unsigned out_of_pages = 0;
- unsigned remaining = io->base_bio->bi_size;
+ unsigned remaining = io->base_bio->bi_iter.bi_size;
sector_t sector = io->sector;
int r;

@@ -1072,7 +1072,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
io->ctx.bio_out = clone;
io->ctx.idx_out = 0;

- remaining -= clone->bi_size;
+ remaining -= clone->bi_iter.bi_size;
sector += bio_sectors(clone);

crypt_inc_pending(io);
@@ -1687,11 +1687,13 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
if (unlikely(bio->bi_rw & (REQ_FLUSH | REQ_DISCARD))) {
bio->bi_bdev = cc->dev->bdev;
if (bio_sectors(bio))
- bio->bi_sector = cc->start + dm_target_offset(ti, bio->bi_sector);
+ bio->bi_iter.bi_sector = cc->start +
+ dm_target_offset(ti, bio->bi_iter.bi_sector);
return DM_MAPIO_REMAPPED;
}

- io = crypt_io_alloc(cc, bio, dm_target_offset(ti, bio->bi_sector));
+ io = crypt_io_alloc(cc, bio,
+ dm_target_offset(ti, bio->bi_iter.bi_sector));

if (bio_data_dir(io->base_bio) == READ) {
if (kcryptd_io_read(io, GFP_NOWAIT))
diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index 496d5f3..84c8601 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -281,14 +281,15 @@ static int delay_map(struct dm_target *ti, struct bio *bio)
if ((bio_data_dir(bio) == WRITE) && (dc->dev_write)) {
bio->bi_bdev = dc->dev_write->bdev;
if (bio_sectors(bio))
- bio->bi_sector = dc->start_write +
- dm_target_offset(ti, bio->bi_sector);
+ bio->bi_iter.bi_sector = dc->start_write +
+ dm_target_offset(ti, bio->bi_iter.bi_sector);

return delay_bio(dc, dc->write_delay, bio);
}

bio->bi_bdev = dc->dev_read->bdev;
- bio->bi_sector = dc->start_read + dm_target_offset(ti, bio->bi_sector);
+ bio->bi_iter.bi_sector = dc->start_read +
+ dm_target_offset(ti, bio->bi_iter.bi_sector);

return delay_bio(dc, dc->read_delay, bio);
}
diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index 7fcf21c..b8785cd 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -248,7 +248,8 @@ static void flakey_map_bio(struct dm_target *ti, struct bio *bio)

bio->bi_bdev = fc->dev->bdev;
if (bio_sectors(bio))
- bio->bi_sector = flakey_map_sector(ti, bio->bi_sector);
+ bio->bi_iter.bi_sector =
+ flakey_map_sector(ti, bio->bi_iter.bi_sector);
}

static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc)
@@ -265,8 +266,8 @@ static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc)
DMDEBUG("Corrupting data bio=%p by writing %u to byte %u "
"(rw=%c bi_rw=%lu bi_sector=%llu cur_bytes=%u)\n",
bio, fc->corrupt_bio_value, fc->corrupt_bio_byte,
- (bio_data_dir(bio) == WRITE) ? 'w' : 'r',
- bio->bi_rw, (unsigned long long)bio->bi_sector, bio_bytes);
+ (bio_data_dir(bio) == WRITE) ? 'w' : 'r', bio->bi_rw,
+ (unsigned long long)bio->bi_iter.bi_sector, bio_bytes);
}
}

diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index ea49834..a6de5c9 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -305,14 +305,15 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
dm_sector_div_up(remaining, (PAGE_SIZE >> SECTOR_SHIFT)));

bio = bio_alloc_bioset(GFP_NOIO, num_bvecs, io->client->bios);
- bio->bi_sector = where->sector + (where->count - remaining);
+ bio->bi_iter.bi_sector = where->sector +
+ (where->count - remaining);
bio->bi_bdev = where->bdev;
bio->bi_end_io = endio;
store_io_and_region_in_bio(bio, io, region);

if (rw & REQ_DISCARD) {
num_sectors = min_t(sector_t, q->limits.max_discard_sectors, remaining);
- bio->bi_size = num_sectors << SECTOR_SHIFT;
+ bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;
remaining -= num_sectors;
} else if (rw & REQ_WRITE_SAME) {
/*
@@ -321,7 +322,7 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
dp->get_page(dp, &page, &len, &offset);
bio_add_page(bio, page, logical_block_size, offset);
num_sectors = min_t(sector_t, q->limits.max_write_same_sectors, remaining);
- bio->bi_size = num_sectors << SECTOR_SHIFT;
+ bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;

offset = 0;
remaining -= num_sectors;
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 4f99d26..53e848c 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -85,7 +85,8 @@ static void linear_map_bio(struct dm_target *ti, struct bio *bio)

bio->bi_bdev = lc->dev->bdev;
if (bio_sectors(bio))
- bio->bi_sector = linear_map_sector(ti, bio->bi_sector);
+ bio->bi_iter.bi_sector =
+ linear_map_sector(ti, bio->bi_iter.bi_sector);
}

static int linear_map(struct dm_target *ti, struct bio *bio)
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 699b5be..e3efb91 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -432,7 +432,7 @@ static int mirror_available(struct mirror_set *ms, struct bio *bio)
region_t region = dm_rh_bio_to_region(ms->rh, bio);

if (log->type->in_sync(log, region, 0))
- return choose_mirror(ms, bio->bi_sector) ? 1 : 0;
+ return choose_mirror(ms, bio->bi_iter.bi_sector) ? 1 : 0;

return 0;
}
@@ -442,15 +442,15 @@ static int mirror_available(struct mirror_set *ms, struct bio *bio)
*/
static sector_t map_sector(struct mirror *m, struct bio *bio)
{
- if (unlikely(!bio->bi_size))
+ if (unlikely(!bio->bi_iter.bi_size))
return 0;
- return m->offset + dm_target_offset(m->ms->ti, bio->bi_sector);
+ return m->offset + dm_target_offset(m->ms->ti, bio->bi_iter.bi_sector);
}

static void map_bio(struct mirror *m, struct bio *bio)
{
bio->bi_bdev = m->dev->bdev;
- bio->bi_sector = map_sector(m, bio);
+ bio->bi_iter.bi_sector = map_sector(m, bio);
}

static void map_region(struct dm_io_region *io, struct mirror *m,
@@ -527,7 +527,7 @@ static void read_async_bio(struct mirror *m, struct bio *bio)
struct dm_io_request io_req = {
.bi_rw = READ,
.mem.type = DM_IO_BVEC,
- .mem.ptr.bvec = bio->bi_io_vec + bio->bi_idx,
+ .mem.ptr.bvec = bio->bi_io_vec + bio->bi_iter.bi_idx,
.notify.fn = read_callback,
.notify.context = bio,
.client = m->ms->io_client,
@@ -559,7 +559,7 @@ static void do_reads(struct mirror_set *ms, struct bio_list *reads)
* We can only read balance if the region is in sync.
*/
if (likely(region_in_sync(ms, region, 1)))
- m = choose_mirror(ms, bio->bi_sector);
+ m = choose_mirror(ms, bio->bi_iter.bi_sector);
else if (m && atomic_read(&m->error_count))
m = NULL;

@@ -630,7 +630,7 @@ static void do_write(struct mirror_set *ms, struct bio *bio)
struct dm_io_request io_req = {
.bi_rw = WRITE | (bio->bi_rw & WRITE_FLUSH_FUA),
.mem.type = DM_IO_BVEC,
- .mem.ptr.bvec = bio->bi_io_vec + bio->bi_idx,
+ .mem.ptr.bvec = bio->bi_io_vec + bio->bi_iter.bi_idx,
.notify.fn = write_callback,
.notify.context = bio,
.client = ms->io_client,
@@ -1182,7 +1182,7 @@ static int mirror_map(struct dm_target *ti, struct bio *bio)
* The region is in-sync and we can perform reads directly.
* Store enough information so we can retry if it fails.
*/
- m = choose_mirror(ms, bio->bi_sector);
+ m = choose_mirror(ms, bio->bi_iter.bi_sector);
if (unlikely(!m))
return -EIO;

diff --git a/drivers/md/dm-region-hash.c b/drivers/md/dm-region-hash.c
index 69732e0..b929fd5 100644
--- a/drivers/md/dm-region-hash.c
+++ b/drivers/md/dm-region-hash.c
@@ -126,7 +126,8 @@ EXPORT_SYMBOL_GPL(dm_rh_region_to_sector);

region_t dm_rh_bio_to_region(struct dm_region_hash *rh, struct bio *bio)
{
- return dm_rh_sector_to_region(rh, bio->bi_sector - rh->target_begin);
+ return dm_rh_sector_to_region(rh, bio->bi_iter.bi_sector -
+ rh->target_begin);
}
EXPORT_SYMBOL_GPL(dm_rh_bio_to_region);

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index c434e5a..a1df777 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -1563,10 +1563,10 @@ static void remap_exception(struct dm_snapshot *s, struct dm_exception *e,
struct bio *bio, chunk_t chunk)
{
bio->bi_bdev = s->cow->bdev;
- bio->bi_sector = chunk_to_sector(s->store,
+ bio->bi_iter.bi_sector = chunk_to_sector(s->store,
dm_chunk_number(e->new_chunk) +
(chunk - e->old_chunk)) +
- (bio->bi_sector &
+ (bio->bi_iter.bi_sector &
s->store->chunk_mask);
}

@@ -1585,7 +1585,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
}

- chunk = sector_to_chunk(s->store, bio->bi_sector);
+ chunk = sector_to_chunk(s->store, bio->bi_iter.bi_sector);

/* Full snapshots are not usable */
/* To get here the table must be live so s->active is always set. */
@@ -1646,7 +1646,8 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
r = DM_MAPIO_SUBMITTED;

if (!pe->started &&
- bio->bi_size == (s->store->chunk_size << SECTOR_SHIFT)) {
+ bio->bi_iter.bi_size ==
+ (s->store->chunk_size << SECTOR_SHIFT)) {
pe->started = 1;
up_write(&s->lock);
start_full_bio(pe, bio);
@@ -1702,7 +1703,7 @@ static int snapshot_merge_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
}

- chunk = sector_to_chunk(s->store, bio->bi_sector);
+ chunk = sector_to_chunk(s->store, bio->bi_iter.bi_sector);

down_write(&s->lock);

@@ -2039,7 +2040,7 @@ static int do_origin(struct dm_dev *origin, struct bio *bio)
down_read(&_origins_lock);
o = __lookup_origin(origin->bdev);
if (o)
- r = __origin_write(&o->snapshots, bio->bi_sector, bio);
+ r = __origin_write(&o->snapshots, bio->bi_iter.bi_sector, bio);
up_read(&_origins_lock);

return r;
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index d907ca6..8045b09 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -258,13 +258,15 @@ static int stripe_map_range(struct stripe_c *sc, struct bio *bio,
{
sector_t begin, end;

- stripe_map_range_sector(sc, bio->bi_sector, target_stripe, &begin);
+ stripe_map_range_sector(sc, bio->bi_iter.bi_sector,
+ target_stripe, &begin);
stripe_map_range_sector(sc, bio_end_sector(bio),
target_stripe, &end);
if (begin < end) {
bio->bi_bdev = sc->stripe[target_stripe].dev->bdev;
- bio->bi_sector = begin + sc->stripe[target_stripe].physical_start;
- bio->bi_size = to_bytes(end - begin);
+ bio->bi_iter.bi_sector = begin +
+ sc->stripe[target_stripe].physical_start;
+ bio->bi_iter.bi_size = to_bytes(end - begin);
return DM_MAPIO_REMAPPED;
} else {
/* The range doesn't map to the target stripe */
@@ -292,9 +294,10 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return stripe_map_range(sc, bio, target_bio_nr);
}

- stripe_map_sector(sc, bio->bi_sector, &stripe, &bio->bi_sector);
+ stripe_map_sector(sc, bio->bi_iter.bi_sector,
+ &stripe, &bio->bi_iter.bi_sector);

- bio->bi_sector += sc->stripe[stripe].physical_start;
+ bio->bi_iter.bi_sector += sc->stripe[stripe].physical_start;
bio->bi_bdev = sc->stripe[stripe].dev->bdev;

return DM_MAPIO_REMAPPED;
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 88f2f80..2e2212c 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -413,7 +413,7 @@ static bool block_size_is_power_of_two(struct pool *pool)
static dm_block_t get_bio_block(struct thin_c *tc, struct bio *bio)
{
struct pool *pool = tc->pool;
- sector_t block_nr = bio->bi_sector;
+ sector_t block_nr = bio->bi_iter.bi_sector;

if (block_size_is_power_of_two(pool))
block_nr >>= pool->sectors_per_block_shift;
@@ -426,14 +426,15 @@ static dm_block_t get_bio_block(struct thin_c *tc, struct bio *bio)
static void remap(struct thin_c *tc, struct bio *bio, dm_block_t block)
{
struct pool *pool = tc->pool;
- sector_t bi_sector = bio->bi_sector;
+ sector_t bi_sector = bio->bi_iter.bi_sector;

bio->bi_bdev = tc->pool_dev->bdev;
if (block_size_is_power_of_two(pool))
- bio->bi_sector = (block << pool->sectors_per_block_shift) |
- (bi_sector & (pool->sectors_per_block - 1));
+ bio->bi_iter.bi_sector =
+ (block << pool->sectors_per_block_shift) |
+ (bi_sector & (pool->sectors_per_block - 1));
else
- bio->bi_sector = (block * pool->sectors_per_block) +
+ bio->bi_iter.bi_sector = (block * pool->sectors_per_block) +
sector_div(bi_sector, pool->sectors_per_block);
}

@@ -721,7 +722,8 @@ static void process_prepared(struct pool *pool, struct list_head *head,
*/
static int io_overlaps_block(struct pool *pool, struct bio *bio)
{
- return bio->bi_size == (pool->sectors_per_block << SECTOR_SHIFT);
+ return bio->bi_iter.bi_size ==
+ (pool->sectors_per_block << SECTOR_SHIFT);
}

static int io_overwrites_block(struct pool *pool, struct bio *bio)
@@ -1121,7 +1123,7 @@ static void process_shared_bio(struct thin_c *tc, struct bio *bio,
if (bio_detain(pool, &key, bio, &cell))
return;

- if (bio_data_dir(bio) == WRITE && bio->bi_size)
+ if (bio_data_dir(bio) == WRITE && bio->bi_iter.bi_size)
break_sharing(tc, bio, block, &key, lookup_result, cell);
else {
struct dm_thin_endio_hook *h = dm_per_bio_data(bio, sizeof(struct dm_thin_endio_hook));
@@ -1144,7 +1146,7 @@ static void provision_block(struct thin_c *tc, struct bio *bio, dm_block_t block
/*
* Remap empty bios (flushes) immediately, without provisioning.
*/
- if (!bio->bi_size) {
+ if (!bio->bi_iter.bi_size) {
inc_all_io_entry(pool, bio);
cell_defer_no_holder(tc, cell);

@@ -1244,7 +1246,8 @@ static void process_bio_read_only(struct thin_c *tc, struct bio *bio)
r = dm_thin_find_block(tc->td, block, 1, &lookup_result);
switch (r) {
case 0:
- if (lookup_result.shared && (rw == WRITE) && bio->bi_size)
+ if (lookup_result.shared &&
+ (rw == WRITE) && bio->bi_iter.bi_size)
bio_io_error(bio);
else {
inc_all_io_entry(tc->pool, bio);
@@ -2827,7 +2830,7 @@ out_unlock:

static int thin_map(struct dm_target *ti, struct bio *bio)
{
- bio->bi_sector = dm_target_offset(ti, bio->bi_sector);
+ bio->bi_iter.bi_sector = dm_target_offset(ti, bio->bi_iter.bi_sector);

return thin_bio_map(ti, bio);
}
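
The remap() arithmetic converted above reduces to "keep the offset
within the block, substitute the mapped block number"; a sketch of the
power-of-two case, reusing the pool fields and dm_block_t from the
hunks above:

static void sketch_remap_pow2(struct bio *bio, dm_block_t block,
			      struct pool *pool)
{
	sector_t within = bio->bi_iter.bi_sector &
			  (pool->sectors_per_block - 1);

	/* Replace the block number, keep the intra-block offset. */
	bio->bi_iter.bi_sector =
		(block << pool->sectors_per_block_shift) | within;
}
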
diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
index b948fd8..90ce9e0 100644
--- a/drivers/md/dm-verity.c
+++ b/drivers/md/dm-verity.c
@@ -493,9 +493,9 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
struct dm_verity_io *io;

bio->bi_bdev = v->data_dev->bdev;
- bio->bi_sector = verity_map_sector(v, bio->bi_sector);
+ bio->bi_iter.bi_sector = verity_map_sector(v, bio->bi_iter.bi_sector);

- if (((unsigned)bio->bi_sector | bio_sectors(bio)) &
+ if (((unsigned)bio->bi_iter.bi_sector | bio_sectors(bio)) &
((1 << (v->data_dev_block_bits - SECTOR_SHIFT)) - 1)) {
DMERR_LIMIT("unaligned io");
return -EIO;
@@ -514,8 +514,9 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
io->v = v;
io->orig_bi_end_io = bio->bi_end_io;
io->orig_bi_private = bio->bi_private;
- io->block = bio->bi_sector >> (v->data_dev_block_bits - SECTOR_SHIFT);
- io->n_blocks = bio->bi_size >> v->data_dev_block_bits;
+ io->block = bio->bi_iter.bi_sector >>
+ (v->data_dev_block_bits - SECTOR_SHIFT);
+ io->n_blocks = bio->bi_iter.bi_size >> v->data_dev_block_bits;

bio->bi_end_io = verity_end_io;
bio->bi_private = io;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d5370a9..d67c6a9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -600,7 +600,7 @@ static void dec_pending(struct dm_io *io, int error)
if (io_error == DM_ENDIO_REQUEUE)
return;

- if ((bio->bi_rw & REQ_FLUSH) && bio->bi_size) {
+ if ((bio->bi_rw & REQ_FLUSH) && bio->bi_iter.bi_size) {
/*
* Preflush done for flush with data, reissue
* without REQ_FLUSH.
@@ -655,7 +655,7 @@ static void end_clone_bio(struct bio *clone, int error)
struct dm_rq_clone_bio_info *info = clone->bi_private;
struct dm_rq_target_io *tio = info->tio;
struct bio *bio = info->orig;
- unsigned int nr_bytes = info->orig->bi_size;
+ unsigned int nr_bytes = info->orig->bi_iter.bi_size;

bio_put(clone);

@@ -986,7 +986,7 @@ static void __map_bio(struct dm_target_io *tio)
* this io.
*/
atomic_inc(&tio->io->io_count);
- sector = clone->bi_sector;
+ sector = clone->bi_iter.bi_sector;
r = ti->type->map(ti, clone);
if (r == DM_MAPIO_REMAPPED) {
/* the bio has been remapped so dispatch it */
@@ -1018,13 +1018,13 @@ struct clone_info {

static void bio_setup_sector(struct bio *bio, sector_t sector, sector_t len)
{
- bio->bi_sector = sector;
- bio->bi_size = to_bytes(len);
+ bio->bi_iter.bi_sector = sector;
+ bio->bi_iter.bi_size = to_bytes(len);
}

static void bio_setup_bv(struct bio *bio, unsigned short idx, unsigned short bv_count)
{
- bio->bi_idx = idx;
+ bio->bi_iter.bi_idx = idx;
bio->bi_vcnt = idx + bv_count;
bio->bi_flags &= ~(1 << BIO_SEG_VALID);
}
@@ -1060,7 +1060,7 @@ static void clone_split_bio(struct dm_target_io *tio, struct bio *bio,
clone->bi_rw = bio->bi_rw;
clone->bi_vcnt = 1;
clone->bi_io_vec->bv_offset = offset;
- clone->bi_io_vec->bv_len = clone->bi_size;
+ clone->bi_io_vec->bv_len = clone->bi_iter.bi_size;
clone->bi_flags |= 1 << BIO_CLONED;

clone_bio_integrity(bio, clone, idx, len, offset, 1);
@@ -1080,7 +1080,8 @@ static void clone_bio(struct dm_target_io *tio, struct bio *bio,
bio_setup_sector(clone, sector, len);
bio_setup_bv(clone, idx, bv_count);

- if (idx != bio->bi_idx || clone->bi_size < bio->bi_size)
+ if (idx != bio->bi_iter.bi_idx ||
+ clone->bi_iter.bi_size < bio->bi_iter.bi_size)
trim = 1;
clone_bio_integrity(bio, clone, idx, len, 0, trim);
}
@@ -1367,8 +1368,8 @@ static void __split_and_process_bio(struct mapped_device *md, struct bio *bio)
ci.io->bio = bio;
ci.io->md = md;
spin_lock_init(&ci.io->endio_lock);
- ci.sector = bio->bi_sector;
- ci.idx = bio->bi_idx;
+ ci.sector = bio->bi_iter.bi_sector;
+ ci.idx = bio->bi_iter.bi_idx;

start_io_acct(ci.io);

diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c
index 3193aef..e8b4574 100644
--- a/drivers/md/faulty.c
+++ b/drivers/md/faulty.c
@@ -74,8 +74,8 @@ static void faulty_fail(struct bio *bio, int error)
{
struct bio *b = bio->bi_private;

- b->bi_size = bio->bi_size;
- b->bi_sector = bio->bi_sector;
+ b->bi_iter.bi_size = bio->bi_iter.bi_size;
+ b->bi_iter.bi_sector = bio->bi_iter.bi_sector;

bio_put(bio);

@@ -185,26 +185,31 @@ static void make_request(struct mddev *mddev, struct bio *bio)
return;
}

- if (check_sector(conf, bio->bi_sector, bio_end_sector(bio), WRITE))
+ if (check_sector(conf, bio->bi_iter.bi_sector,
+ bio_end_sector(bio), WRITE))
failit = 1;
if (check_mode(conf, WritePersistent)) {
- add_sector(conf, bio->bi_sector, WritePersistent);
+ add_sector(conf, bio->bi_iter.bi_sector,
+ WritePersistent);
failit = 1;
}
if (check_mode(conf, WriteTransient))
failit = 1;
} else {
/* read request */
- if (check_sector(conf, bio->bi_sector, bio_end_sector(bio), READ))
+ if (check_sector(conf, bio->bi_iter.bi_sector,
+ bio_end_sector(bio), READ))
failit = 1;
if (check_mode(conf, ReadTransient))
failit = 1;
if (check_mode(conf, ReadPersistent)) {
- add_sector(conf, bio->bi_sector, ReadPersistent);
+ add_sector(conf, bio->bi_iter.bi_sector,
+ ReadPersistent);
failit = 1;
}
if (check_mode(conf, ReadFixable)) {
- add_sector(conf, bio->bi_sector, ReadFixable);
+ add_sector(conf, bio->bi_iter.bi_sector,
+ ReadFixable);
failit = 1;
}
}
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index f03fabd..fb3b0d0 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -297,19 +297,19 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
}

rcu_read_lock();
- tmp_dev = which_dev(mddev, bio->bi_sector);
+ tmp_dev = which_dev(mddev, bio->bi_iter.bi_sector);
start_sector = tmp_dev->end_sector - tmp_dev->rdev->sectors;


- if (unlikely(bio->bi_sector >= (tmp_dev->end_sector)
- || (bio->bi_sector < start_sector))) {
+ if (unlikely(bio->bi_iter.bi_sector >= (tmp_dev->end_sector)
+ || (bio->bi_iter.bi_sector < start_sector))) {
char b[BDEVNAME_SIZE];

printk(KERN_ERR
"md/linear:%s: make_request: Sector %llu out of bounds on "
"dev %s: %llu sectors, offset %llu\n",
mdname(mddev),
- (unsigned long long)bio->bi_sector,
+ (unsigned long long)bio->bi_iter.bi_sector,
bdevname(tmp_dev->rdev->bdev, b),
(unsigned long long)tmp_dev->rdev->sectors,
(unsigned long long)start_sector);
@@ -326,7 +326,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)

rcu_read_unlock();

- bp = bio_split(bio, end_sector - bio->bi_sector);
+ bp = bio_split(bio, end_sector - bio->bi_iter.bi_sector);

linear_make_request(mddev, &bp->bio1);
linear_make_request(mddev, &bp->bio2);
@@ -335,7 +335,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)
}

bio->bi_bdev = tmp_dev->rdev->bdev;
- bio->bi_sector = bio->bi_sector - start_sector
+ bio->bi_iter.bi_sector = bio->bi_iter.bi_sector - start_sector
+ tmp_dev->rdev->data_offset;
rcu_read_unlock();

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 681d109..38464ce 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -194,21 +194,22 @@ void md_trim_bio(struct bio *bio, int offset, int size)
int sofar = 0;

size <<= 9;
- if (offset == 0 && size == bio->bi_size)
+ if (offset == 0 && size == bio->bi_iter.bi_size)
return;

clear_bit(BIO_SEG_VALID, &bio->bi_flags);

bio_advance(bio, offset << 9);

- bio->bi_size = size;
+ bio->bi_iter.bi_size = size;

/* avoid any complications with bi_idx being non-zero*/
- if (bio->bi_idx) {
- memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_idx,
- (bio->bi_vcnt - bio->bi_idx) * sizeof(struct bio_vec));
- bio->bi_vcnt -= bio->bi_idx;
- bio->bi_idx = 0;
+ if (bio->bi_iter.bi_idx) {
+ memmove(bio->bi_io_vec, bio->bi_io_vec+bio->bi_iter.bi_idx,
+ (bio->bi_vcnt - bio->bi_iter.bi_idx) *
+ sizeof(struct bio_vec));
+ bio->bi_vcnt -= bio->bi_iter.bi_idx;
+ bio->bi_iter.bi_idx = 0;
}
/* Make sure vcnt and last bv are not too big */
bio_for_each_segment(bvec, bio, i) {
@@ -433,7 +434,7 @@ static void md_submit_flush_data(struct work_struct *ws)
struct mddev *mddev = container_of(ws, struct mddev, flush_work);
struct bio *bio = mddev->flush_bio;

- if (bio->bi_size == 0)
+ if (bio->bi_iter.bi_size == 0)
/* an empty barrier - all done */
bio_endio(bio, 0);
else {
@@ -785,7 +786,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev);

bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev;
- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
bio_add_page(bio, page, size, 0);
bio->bi_private = rdev;
bio->bi_end_io = super_written;
@@ -824,13 +825,13 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
bio->bi_bdev = (metadata_op && rdev->meta_bdev) ?
rdev->meta_bdev : rdev->bdev;
if (metadata_op)
- bio->bi_sector = sector + rdev->sb_start;
+ bio->bi_iter.bi_sector = sector + rdev->sb_start;
else if (rdev->mddev->reshape_position != MaxSector &&
(rdev->mddev->reshape_backwards ==
(sector >= rdev->mddev->reshape_position)))
- bio->bi_sector = sector + rdev->new_data_offset;
+ bio->bi_iter.bi_sector = sector + rdev->new_data_offset;
else
- bio->bi_sector = sector + rdev->data_offset;
+ bio->bi_iter.bi_sector = sector + rdev->data_offset;
bio_add_page(bio, page, size, 0);
init_completion(&event);
bio->bi_private = &event;
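
md_trim_bio() above shows the general trim idiom under the new scheme:
advance the front with bio_advance() (which now updates bi_iter rather
than bv_offset/bv_len), then clip bi_iter.bi_size. A condensed sketch
(md_trim_bio additionally normalizes bi_idx and the last bvec, as
shown above):

static void sketch_trim(struct bio *bio, int offset_sectors,
			int size_sectors)
{
	bio_advance(bio, offset_sectors << 9);		/* move bi_iter forward */
	bio->bi_iter.bi_size = size_sectors << 9;	/* clip the tail */
}
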
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 1642eae..849ad39 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -100,7 +100,7 @@ static void multipath_end_request(struct bio *bio, int error)
md_error (mp_bh->mddev, rdev);
printk(KERN_ERR "multipath: %s: rescheduling sector %llu\n",
bdevname(rdev->bdev,b),
- (unsigned long long)bio->bi_sector);
+ (unsigned long long)bio->bi_iter.bi_sector);
multipath_reschedule_retry(mp_bh);
} else
multipath_end_bh_io(mp_bh, error);
@@ -132,7 +132,7 @@ static void multipath_make_request(struct mddev *mddev, struct bio * bio)
multipath = conf->multipaths + mp_bh->path;

mp_bh->bio = *bio;
- mp_bh->bio.bi_sector += multipath->rdev->data_offset;
+ mp_bh->bio.bi_iter.bi_sector += multipath->rdev->data_offset;
mp_bh->bio.bi_bdev = multipath->rdev->bdev;
mp_bh->bio.bi_rw |= REQ_FAILFAST_TRANSPORT;
mp_bh->bio.bi_end_io = multipath_end_request;
@@ -355,21 +355,22 @@ static void multipathd(struct md_thread *thread)
spin_unlock_irqrestore(&conf->device_lock, flags);

bio = &mp_bh->bio;
- bio->bi_sector = mp_bh->master_bio->bi_sector;
+ bio->bi_iter.bi_sector = mp_bh->master_bio->bi_iter.bi_sector;

if ((mp_bh->path = multipath_map (conf))<0) {
printk(KERN_ALERT "multipath: %s: unrecoverable IO read"
" error for block %llu\n",
bdevname(bio->bi_bdev,b),
- (unsigned long long)bio->bi_sector);
+ (unsigned long long)bio->bi_iter.bi_sector);
multipath_end_bh_io(mp_bh, -EIO);
} else {
printk(KERN_ERR "multipath: %s: redirecting sector %llu"
" to another IO path\n",
bdevname(bio->bi_bdev,b),
- (unsigned long long)bio->bi_sector);
+ (unsigned long long)bio->bi_iter.bi_sector);
*bio = *(mp_bh->master_bio);
- bio->bi_sector += conf->multipaths[mp_bh->path].rdev->data_offset;
+ bio->bi_iter.bi_sector +=
+ conf->multipaths[mp_bh->path].rdev->data_offset;
bio->bi_bdev = conf->multipaths[mp_bh->path].rdev->bdev;
bio->bi_rw |= REQ_FAILFAST_TRANSPORT;
bio->bi_end_io = multipath_end_request;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index fcf65e5..d8c2ec0 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -501,10 +501,11 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,
unsigned int chunk_sects, struct bio *bio)
{
if (likely(is_power_of_2(chunk_sects))) {
- return chunk_sects >= ((bio->bi_sector & (chunk_sects-1))
+ return chunk_sects >=
+ ((bio->bi_iter.bi_sector & (chunk_sects-1))
+ bio_sectors(bio));
} else{
- sector_t sector = bio->bi_sector;
+ sector_t sector = bio->bi_iter.bi_sector;
return chunk_sects >= (sector_div(sector, chunk_sects)
+ bio_sectors(bio));
}
@@ -524,7 +525,7 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)

chunk_sects = mddev->chunk_sectors;
if (unlikely(!is_io_in_chunk_boundary(mddev, chunk_sects, bio))) {
- sector_t sector = bio->bi_sector;
+ sector_t sector = bio->bi_iter.bi_sector;
struct bio_pair *bp;
/* Sanity check -- queue functions should prevent this happening */
if (bio_segments(bio) > 1)
@@ -544,12 +545,12 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
return;
}

- sector_offset = bio->bi_sector;
+ sector_offset = bio->bi_iter.bi_sector;
zone = find_zone(mddev->private, &sector_offset);
- tmp_dev = map_sector(mddev, zone, bio->bi_sector,
+ tmp_dev = map_sector(mddev, zone, bio->bi_iter.bi_sector,
&sector_offset);
bio->bi_bdev = tmp_dev->bdev;
- bio->bi_sector = sector_offset + zone->dev_start +
+ bio->bi_iter.bi_sector = sector_offset + zone->dev_start +
tmp_dev->data_offset;

if (unlikely((bio->bi_rw & REQ_DISCARD) &&
@@ -566,7 +567,8 @@ bad_map:
printk("md/raid0:%s: make_request bug: can't convert block across chunks"
" or bigger than %dk %llu %d\n",
mdname(mddev), chunk_sects / 2,
- (unsigned long long)bio->bi_sector, bio_sectors(bio) / 2);
+ (unsigned long long)bio->bi_iter.bi_sector,
+ bio_sectors(bio) / 2);

bio_io_error(bio);
return;
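
is_io_in_chunk_boundary() above is the canonical check that an I/O
fits within a single chunk; for power-of-two chunk sizes it reduces to
the following (a sketch lifted directly from the hunk above):

static int sketch_fits_in_chunk(struct bio *bio, unsigned int chunk_sects)
{
	/* The offset within the chunk plus the I/O length must not
	 * exceed the chunk size. */
	return chunk_sects >= ((bio->bi_iter.bi_sector & (chunk_sects - 1))
			       + bio_sectors(bio));
}
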
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5595118..1c15387 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -255,8 +255,8 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) {
pr_debug("raid1: sync end %s on sectors %llu-%llu\n",
(bio_data_dir(bio) == WRITE) ? "write" : "read",
- (unsigned long long) bio->bi_sector,
- (unsigned long long) bio->bi_sector +
+ (unsigned long long) bio->bi_iter.bi_sector,
+ (unsigned long long) bio->bi_iter.bi_sector +
bio_sectors(bio) - 1);

call_bio_endio(r1_bio);
@@ -446,9 +446,8 @@ static void raid1_end_write_request(struct bio *bio, int error)
struct bio *mbio = r1_bio->master_bio;
pr_debug("raid1: behind end write sectors"
" %llu-%llu\n",
- (unsigned long long) mbio->bi_sector,
- (unsigned long long) mbio->bi_sector +
- bio_sectors(mbio) - 1);
+ (uint64_t) mbio->bi_iter.bi_sector,
+ bio_end_sector(mbio) - 1);
call_bio_endio(r1_bio);
}
}
@@ -935,7 +934,8 @@ do_sync_io:
if (bvecs[i].bv_page)
put_page(bvecs[i].bv_page);
kfree(bvecs);
- pr_debug("%dB behind alloc failed, doing sync I/O\n", bio->bi_size);
+ pr_debug("%dB behind alloc failed, doing sync I/O\n",
+ bio->bi_iter.bi_size);
}

struct raid1_plug_cb {
@@ -1014,7 +1014,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)

if (bio_data_dir(bio) == WRITE &&
bio_end_sector(bio) > mddev->suspend_lo &&
- bio->bi_sector < mddev->suspend_hi) {
+ bio->bi_iter.bi_sector < mddev->suspend_hi) {
/* As the suspend_* range is controlled by
* userspace, we want an interruptible
* wait.
@@ -1025,7 +1025,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
prepare_to_wait(&conf->wait_barrier,
&w, TASK_INTERRUPTIBLE);
if (bio_end_sector(bio) <= mddev->suspend_lo ||
- bio->bi_sector >= mddev->suspend_hi)
+ bio->bi_iter.bi_sector >= mddev->suspend_hi)
break;
schedule();
}
@@ -1047,7 +1047,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
r1_bio->sectors = bio_sectors(bio);
r1_bio->state = 0;
r1_bio->mddev = mddev;
- r1_bio->sector = bio->bi_sector;
+ r1_bio->sector = bio->bi_iter.bi_sector;

/* We might need to issue multiple reads to different
* devices if there are bad blocks around, so we keep
@@ -1087,12 +1087,13 @@ read_again:
r1_bio->read_disk = rdisk;

read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(read_bio, r1_bio->sector - bio->bi_sector,
+ md_trim_bio(read_bio, r1_bio->sector - bio->bi_iter.bi_sector,
max_sectors);

r1_bio->bios[rdisk] = read_bio;

- read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset;
+ read_bio->bi_iter.bi_sector = r1_bio->sector +
+ mirror->rdev->data_offset;
read_bio->bi_bdev = mirror->rdev->bdev;
read_bio->bi_end_io = raid1_end_read_request;
read_bio->bi_rw = READ | do_sync;
@@ -1104,7 +1105,7 @@ read_again:
*/

sectors_handled = (r1_bio->sector + max_sectors
- - bio->bi_sector);
+ - bio->bi_iter.bi_sector);
r1_bio->sectors = max_sectors;
spin_lock_irq(&conf->device_lock);
if (bio->bi_phys_segments == 0)
@@ -1125,7 +1126,8 @@ read_again:
r1_bio->sectors = bio_sectors(bio) - sectors_handled;
r1_bio->state = 0;
r1_bio->mddev = mddev;
- r1_bio->sector = bio->bi_sector + sectors_handled;
+ r1_bio->sector = bio->bi_iter.bi_sector +
+ sectors_handled;
goto read_again;
} else
generic_make_request(read_bio);
@@ -1244,7 +1246,7 @@ read_again:
bio->bi_phys_segments++;
spin_unlock_irq(&conf->device_lock);
}
- sectors_handled = r1_bio->sector + max_sectors - bio->bi_sector;
+ sectors_handled = r1_bio->sector + max_sectors - bio->bi_iter.bi_sector;

atomic_set(&r1_bio->remaining, 1);
atomic_set(&r1_bio->behind_remaining, 0);
@@ -1256,7 +1258,8 @@ read_again:
continue;

mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(mbio, r1_bio->sector - bio->bi_sector, max_sectors);
+ md_trim_bio(mbio, r1_bio->sector - bio->bi_iter.bi_sector,
+ max_sectors);

if (first_clone) {
/* do behind I/O ?
@@ -1290,7 +1293,7 @@ read_again:

r1_bio->bios[i] = mbio;

- mbio->bi_sector = (r1_bio->sector +
+ mbio->bi_iter.bi_sector = (r1_bio->sector +
conf->mirrors[i].rdev->data_offset);
mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
mbio->bi_end_io = raid1_end_write_request;
@@ -1330,7 +1333,7 @@ read_again:
r1_bio->sectors = bio_sectors(bio) - sectors_handled;
r1_bio->state = 0;
r1_bio->mddev = mddev;
- r1_bio->sector = bio->bi_sector + sectors_handled;
+ r1_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
goto retry_write;
}

@@ -1880,14 +1883,14 @@ static int process_checks(struct r1bio *r1_bio)
/* fixup the bio for reuse */
bio_reset(sbio);
sbio->bi_vcnt = vcnt;
- sbio->bi_size = r1_bio->sectors << 9;
- sbio->bi_sector = r1_bio->sector +
+ sbio->bi_iter.bi_size = r1_bio->sectors << 9;
+ sbio->bi_iter.bi_sector = r1_bio->sector +
conf->mirrors[i].rdev->data_offset;
sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
sbio->bi_end_io = end_sync_read;
sbio->bi_private = r1_bio;

- size = sbio->bi_size;
+ size = sbio->bi_iter.bi_size;
for (j = 0; j < vcnt ; j++) {
struct bio_vec *bi;
bi = &sbio->bi_io_vec[j];
@@ -2104,11 +2107,11 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
}

wbio->bi_rw = WRITE;
- wbio->bi_sector = r1_bio->sector;
- wbio->bi_size = r1_bio->sectors << 9;
+ wbio->bi_iter.bi_sector = r1_bio->sector;
+ wbio->bi_iter.bi_size = r1_bio->sectors << 9;

md_trim_bio(wbio, sector - r1_bio->sector, sectors);
- wbio->bi_sector += rdev->data_offset;
+ wbio->bi_iter.bi_sector += rdev->data_offset;
wbio->bi_bdev = rdev->bdev;
if (submit_bio_wait(WRITE, wbio) == 0)
/* failure! */
@@ -2222,7 +2225,8 @@ read_more:
}
r1_bio->read_disk = disk;
bio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, mddev);
- md_trim_bio(bio, r1_bio->sector - bio->bi_sector, max_sectors);
+ md_trim_bio(bio, r1_bio->sector - bio->bi_iter.bi_sector,
+ max_sectors);
r1_bio->bios[r1_bio->read_disk] = bio;
rdev = conf->mirrors[disk].rdev;
printk_ratelimited(KERN_ERR
@@ -2231,7 +2235,7 @@ read_more:
mdname(mddev),
(unsigned long long)r1_bio->sector,
bdevname(rdev->bdev, b));
- bio->bi_sector = r1_bio->sector + rdev->data_offset;
+ bio->bi_iter.bi_sector = r1_bio->sector + rdev->data_offset;
bio->bi_bdev = rdev->bdev;
bio->bi_end_io = raid1_end_read_request;
bio->bi_rw = READ | do_sync;
@@ -2240,7 +2244,7 @@ read_more:
/* Drat - have to split this up more */
struct bio *mbio = r1_bio->master_bio;
int sectors_handled = (r1_bio->sector + max_sectors
- - mbio->bi_sector);
+ - mbio->bi_iter.bi_sector);
r1_bio->sectors = max_sectors;
spin_lock_irq(&conf->device_lock);
if (mbio->bi_phys_segments == 0)
@@ -2258,7 +2262,8 @@ read_more:
r1_bio->state = 0;
set_bit(R1BIO_ReadError, &r1_bio->state);
r1_bio->mddev = mddev;
- r1_bio->sector = mbio->bi_sector + sectors_handled;
+ r1_bio->sector = mbio->bi_iter.bi_sector +
+ sectors_handled;

goto read_more;
} else
@@ -2482,7 +2487,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
}
if (bio->bi_end_io) {
atomic_inc(&rdev->nr_pending);
- bio->bi_sector = sector_nr + rdev->data_offset;
+ bio->bi_iter.bi_sector = sector_nr + rdev->data_offset;
bio->bi_bdev = rdev->bdev;
bio->bi_private = r1_bio;
}
@@ -2582,7 +2587,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
continue;
/* remove last page from this bio */
bio->bi_vcnt--;
- bio->bi_size -= len;
+ bio->bi_iter.bi_size -= len;
bio->bi_flags &= ~(1<< BIO_SEG_VALID);
}
goto bio_full;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 59d4daa..5899d87 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1174,7 +1174,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
/* If this request crosses a chunk boundary, we need to
* split it. This will only happen for 1 PAGE (or less) requests.
*/
- if (unlikely((bio->bi_sector & chunk_mask) + bio_sectors(bio)
+ if (unlikely((bio->bi_iter.bi_sector & chunk_mask) + bio_sectors(bio)
> chunk_sects
&& (conf->geo.near_copies < conf->geo.raid_disks
|| conf->prev.near_copies < conf->prev.raid_disks))) {
@@ -1185,8 +1185,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
/* This is a one page bio that upper layers
* refuse to split for us, so we need to split it.
*/
- bp = bio_split(bio,
- chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+ bp = bio_split(bio, chunk_sects -
+ (bio->bi_iter.bi_sector & (chunk_sects - 1)));

/* Each of these 'make_request' calls will call 'wait_barrier'.
* If the first succeeds but the second blocks due to the resync
@@ -1213,7 +1213,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
bad_map:
printk("md/raid10:%s: make_request bug: can't convert block across chunks"
" or bigger than %dk %llu %d\n", mdname(mddev), chunk_sects/2,
- (unsigned long long)bio->bi_sector, bio_sectors(bio) / 2);
+ (unsigned long long)bio->bi_iter.bi_sector,
+ bio_sectors(bio) / 2);

bio_io_error(bio);
return;
@@ -1230,24 +1231,25 @@ static void make_request(struct mddev *mddev, struct bio * bio)

sectors = bio_sectors(bio);
while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
- bio->bi_sector < conf->reshape_progress &&
- bio->bi_sector + sectors > conf->reshape_progress) {
+ bio->bi_iter.bi_sector < conf->reshape_progress &&
+ bio->bi_iter.bi_sector + sectors > conf->reshape_progress) {
/* IO spans the reshape position. Need to wait for
* reshape to pass
*/
allow_barrier(conf);
wait_event(conf->wait_barrier,
- conf->reshape_progress <= bio->bi_sector ||
- conf->reshape_progress >= bio->bi_sector + sectors);
+ conf->reshape_progress <= bio->bi_iter.bi_sector ||
+ conf->reshape_progress >= bio->bi_iter.bi_sector +
+ sectors);
wait_barrier(conf);
}
if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
bio_data_dir(bio) == WRITE &&
(mddev->reshape_backwards
- ? (bio->bi_sector < conf->reshape_safe &&
- bio->bi_sector + sectors > conf->reshape_progress)
- : (bio->bi_sector + sectors > conf->reshape_safe &&
- bio->bi_sector < conf->reshape_progress))) {
+ ? (bio->bi_iter.bi_sector < conf->reshape_safe &&
+ bio->bi_iter.bi_sector + sectors > conf->reshape_progress)
+ : (bio->bi_iter.bi_sector + sectors > conf->reshape_safe &&
+ bio->bi_iter.bi_sector < conf->reshape_progress))) {
/* Need to update reshape_position in metadata */
mddev->reshape_position = conf->reshape_progress;
set_bit(MD_CHANGE_DEVS, &mddev->flags);
@@ -1265,7 +1267,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
r10_bio->sectors = sectors;

r10_bio->mddev = mddev;
- r10_bio->sector = bio->bi_sector;
+ r10_bio->sector = bio->bi_iter.bi_sector;
r10_bio->state = 0;

/* We might need to issue multiple reads to different
@@ -1294,13 +1296,13 @@ read_again:
slot = r10_bio->read_slot;

read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(read_bio, r10_bio->sector - bio->bi_sector,
+ md_trim_bio(read_bio, r10_bio->sector - bio->bi_iter.bi_sector,
max_sectors);

r10_bio->devs[slot].bio = read_bio;
r10_bio->devs[slot].rdev = rdev;

- read_bio->bi_sector = r10_bio->devs[slot].addr +
+ read_bio->bi_iter.bi_sector = r10_bio->devs[slot].addr +
choose_data_offset(r10_bio, rdev);
read_bio->bi_bdev = rdev->bdev;
read_bio->bi_end_io = raid10_end_read_request;
@@ -1312,7 +1314,7 @@ read_again:
* need another r10_bio.
*/
sectors_handled = (r10_bio->sectors + max_sectors
- - bio->bi_sector);
+ - bio->bi_iter.bi_sector);
r10_bio->sectors = max_sectors;
spin_lock_irq(&conf->device_lock);
if (bio->bi_phys_segments == 0)
@@ -1333,7 +1335,8 @@ read_again:
r10_bio->sectors = bio_sectors(bio) - sectors_handled;
r10_bio->state = 0;
r10_bio->mddev = mddev;
- r10_bio->sector = bio->bi_sector + sectors_handled;
+ r10_bio->sector = bio->bi_iter.bi_sector +
+ sectors_handled;
goto read_again;
} else
generic_make_request(read_bio);
@@ -1491,7 +1494,8 @@ retry_write:
bio->bi_phys_segments++;
spin_unlock_irq(&conf->device_lock);
}
- sectors_handled = r10_bio->sector + max_sectors - bio->bi_sector;
+ sectors_handled = r10_bio->sector + max_sectors -
+ bio->bi_iter.bi_sector;

atomic_set(&r10_bio->remaining, 1);
bitmap_startwrite(mddev->bitmap, r10_bio->sector, r10_bio->sectors, 0);
@@ -1502,11 +1506,11 @@ retry_write:
if (r10_bio->devs[i].bio) {
struct md_rdev *rdev = conf->mirrors[d].rdev;
mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
- max_sectors);
+ md_trim_bio(mbio, r10_bio->sector -
+ bio->bi_iter.bi_sector, max_sectors);
r10_bio->devs[i].bio = mbio;

- mbio->bi_sector = (r10_bio->devs[i].addr+
+ mbio->bi_iter.bi_sector = (r10_bio->devs[i].addr+
choose_data_offset(r10_bio,
rdev));
mbio->bi_bdev = rdev->bdev;
@@ -1545,11 +1549,11 @@ retry_write:
rdev = conf->mirrors[d].rdev;
}
mbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(mbio, r10_bio->sector - bio->bi_sector,
- max_sectors);
+ md_trim_bio(mbio, r10_bio->sector -
+ bio->bi_iter.bi_sector, max_sectors);
r10_bio->devs[i].repl_bio = mbio;

- mbio->bi_sector = (r10_bio->devs[i].addr +
+ mbio->bi_iter.bi_sector = (r10_bio->devs[i].addr +
choose_data_offset(
r10_bio, rdev));
mbio->bi_bdev = rdev->bdev;
@@ -1583,7 +1587,7 @@ retry_write:
r10_bio->sectors = bio_sectors(bio) - sectors_handled;

r10_bio->mddev = mddev;
- r10_bio->sector = bio->bi_sector + sectors_handled;
+ r10_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
r10_bio->state = 0;
goto retry_write;
}
@@ -2085,10 +2089,10 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
bio_reset(tbio);

tbio->bi_vcnt = vcnt;
- tbio->bi_size = r10_bio->sectors << 9;
+ tbio->bi_iter.bi_size = r10_bio->sectors << 9;
tbio->bi_rw = WRITE;
tbio->bi_private = r10_bio;
- tbio->bi_sector = r10_bio->devs[i].addr;
+ tbio->bi_iter.bi_sector = r10_bio->devs[i].addr;

for (j=0; j < vcnt ; j++) {
tbio->bi_io_vec[j].bv_offset = 0;
@@ -2105,7 +2109,7 @@ static void sync_request_write(struct mddev *mddev, struct r10bio *r10_bio)
atomic_inc(&r10_bio->remaining);
md_sync_acct(conf->mirrors[d].rdev->bdev, bio_sectors(tbio));

- tbio->bi_sector += conf->mirrors[d].rdev->data_offset;
+ tbio->bi_iter.bi_sector += conf->mirrors[d].rdev->data_offset;
tbio->bi_bdev = conf->mirrors[d].rdev->bdev;
generic_make_request(tbio);
}
@@ -2569,8 +2573,8 @@ static int narrow_write_error(struct r10bio *r10_bio, int i)
sectors = sect_to_write;
/* Write at 'sector' for 'sectors' */
wbio = bio_clone_mddev(bio, GFP_NOIO, mddev);
- md_trim_bio(wbio, sector - bio->bi_sector, sectors);
- wbio->bi_sector = (r10_bio->devs[i].addr+
+ md_trim_bio(wbio, sector - bio->bi_iter.bi_sector, sectors);
+ wbio->bi_iter.bi_sector = (r10_bio->devs[i].addr+
choose_data_offset(r10_bio, rdev) +
(sector - r10_bio->sector));
wbio->bi_bdev = rdev->bdev;
@@ -2643,11 +2647,11 @@ read_more:
bio = bio_clone_mddev(r10_bio->master_bio,
GFP_NOIO, mddev);
md_trim_bio(bio,
- r10_bio->sector - bio->bi_sector,
+ r10_bio->sector - bio->bi_iter.bi_sector,
max_sectors);
r10_bio->devs[slot].bio = bio;
r10_bio->devs[slot].rdev = rdev;
- bio->bi_sector = r10_bio->devs[slot].addr
+ bio->bi_iter.bi_sector = r10_bio->devs[slot].addr
+ choose_data_offset(r10_bio, rdev);
bio->bi_bdev = rdev->bdev;
bio->bi_rw = READ | do_sync;
@@ -2658,7 +2662,7 @@ read_more:
struct bio *mbio = r10_bio->master_bio;
int sectors_handled =
r10_bio->sector + max_sectors
- - mbio->bi_sector;
+ - mbio->bi_iter.bi_sector;
r10_bio->sectors = max_sectors;
spin_lock_irq(&conf->device_lock);
if (mbio->bi_phys_segments == 0)
@@ -2676,7 +2680,7 @@ read_more:
set_bit(R10BIO_ReadError,
&r10_bio->state);
r10_bio->mddev = mddev;
- r10_bio->sector = mbio->bi_sector
+ r10_bio->sector = mbio->bi_iter.bi_sector
+ sectors_handled;

goto read_more;
@@ -3115,7 +3119,8 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
bio->bi_end_io = end_sync_read;
bio->bi_rw = READ;
from_addr = r10_bio->devs[j].addr;
- bio->bi_sector = from_addr + rdev->data_offset;
+ bio->bi_iter.bi_sector = from_addr +
+ rdev->data_offset;
bio->bi_bdev = rdev->bdev;
atomic_inc(&rdev->nr_pending);
/* and we write to 'i' (if not in_sync) */
@@ -3139,7 +3144,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
bio->bi_private = r10_bio;
bio->bi_end_io = end_sync_write;
bio->bi_rw = WRITE;
- bio->bi_sector = to_addr
+ bio->bi_iter.bi_sector = to_addr
+ rdev->data_offset;
bio->bi_bdev = rdev->bdev;
atomic_inc(&r10_bio->remaining);
@@ -3168,7 +3173,8 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
bio->bi_private = r10_bio;
bio->bi_end_io = end_sync_write;
bio->bi_rw = WRITE;
- bio->bi_sector = to_addr + rdev->data_offset;
+ bio->bi_iter.bi_sector = to_addr +
+ rdev->data_offset;
bio->bi_bdev = rdev->bdev;
atomic_inc(&r10_bio->remaining);
break;
@@ -3286,7 +3292,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
bio->bi_private = r10_bio;
bio->bi_end_io = end_sync_read;
bio->bi_rw = READ;
- bio->bi_sector = sector +
+ bio->bi_iter.bi_sector = sector +
conf->mirrors[d].rdev->data_offset;
bio->bi_bdev = conf->mirrors[d].rdev->bdev;
count++;
@@ -3308,7 +3314,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
bio->bi_private = r10_bio;
bio->bi_end_io = end_sync_write;
bio->bi_rw = WRITE;
- bio->bi_sector = sector +
+ bio->bi_iter.bi_sector = sector +
conf->mirrors[d].replacement->data_offset;
bio->bi_bdev = conf->mirrors[d].replacement->bdev;
count++;
@@ -3355,7 +3361,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
bio2 = bio2->bi_next) {
/* remove last page from this bio */
bio2->bi_vcnt--;
- bio2->bi_size -= len;
+ bio2->bi_iter.bi_size -= len;
bio2->bi_flags &= ~(1<< BIO_SEG_VALID);
}
goto bio_full;
@@ -4371,7 +4377,7 @@ read_more:
read_bio = bio_alloc_mddev(GFP_KERNEL, RESYNC_PAGES, mddev);

read_bio->bi_bdev = rdev->bdev;
- read_bio->bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
+ read_bio->bi_iter.bi_sector = (r10_bio->devs[r10_bio->read_slot].addr
+ rdev->data_offset);
read_bio->bi_private = r10_bio;
read_bio->bi_end_io = end_sync_read;
@@ -4379,7 +4385,7 @@ read_more:
read_bio->bi_flags &= ~(BIO_POOL_MASK - 1);
read_bio->bi_flags |= 1 << BIO_UPTODATE;
read_bio->bi_vcnt = 0;
- read_bio->bi_size = 0;
+ read_bio->bi_iter.bi_size = 0;
r10_bio->master_bio = read_bio;
r10_bio->read_slot = r10_bio->devs[r10_bio->read_slot].devnum;

@@ -4405,7 +4411,8 @@ read_more:

bio_reset(b);
b->bi_bdev = rdev2->bdev;
- b->bi_sector = r10_bio->devs[s/2].addr + rdev2->new_data_offset;
+ b->bi_iter.bi_sector = r10_bio->devs[s/2].addr +
+ rdev2->new_data_offset;
b->bi_private = r10_bio;
b->bi_end_io = end_reshape_write;
b->bi_rw = WRITE;
@@ -4432,7 +4439,7 @@ read_more:
bio2 = bio2->bi_next) {
/* Remove last page from this bio */
bio2->bi_vcnt--;
- bio2->bi_size -= len;
+ bio2->bi_iter.bi_size -= len;
bio2->bi_flags &= ~(1<<BIO_SEG_VALID);
}
goto bio_full;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9359828..1b87468 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -91,7 +91,7 @@ static inline struct hlist_head *stripe_hash(struct r5conf *conf, sector_t sect)
static inline struct bio *r5_next_bio(struct bio *bio, sector_t sector)
{
int sectors = bio_sectors(bio);
- if (bio->bi_sector + sectors < sector + STRIPE_SECTORS)
+ if (bio->bi_iter.bi_sector + sectors < sector + STRIPE_SECTORS)
return bio->bi_next;
else
return NULL;
@@ -183,7 +183,7 @@ static void return_io(struct bio *return_bi)

return_bi = bi->bi_next;
bi->bi_next = NULL;
- bi->bi_size = 0;
+ bi->bi_iter.bi_size = 0;
trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
bi, 0);
bio_endio(bi, 0);
@@ -656,17 +656,17 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
bi->bi_rw, i);
atomic_inc(&sh->count);
if (use_new_offset(conf, sh))
- bi->bi_sector = (sh->sector
+ bi->bi_iter.bi_sector = (sh->sector
+ rdev->new_data_offset);
else
- bi->bi_sector = (sh->sector
+ bi->bi_iter.bi_sector = (sh->sector
+ rdev->data_offset);
if (test_bit(R5_ReadNoMerge, &sh->dev[i].flags))
bi->bi_rw |= REQ_FLUSH;

bi->bi_io_vec[0].bv_len = STRIPE_SIZE;
bi->bi_io_vec[0].bv_offset = 0;
- bi->bi_size = STRIPE_SIZE;
+ bi->bi_iter.bi_size = STRIPE_SIZE;
if (rrdev)
set_bit(R5_DOUBLE_LOCKED, &sh->dev[i].flags);

@@ -696,14 +696,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
rbi->bi_rw, i);
atomic_inc(&sh->count);
if (use_new_offset(conf, sh))
- rbi->bi_sector = (sh->sector
+ rbi->bi_iter.bi_sector = (sh->sector
+ rrdev->new_data_offset);
else
- rbi->bi_sector = (sh->sector
+ rbi->bi_iter.bi_sector = (sh->sector
+ rrdev->data_offset);
rbi->bi_io_vec[0].bv_len = STRIPE_SIZE;
rbi->bi_io_vec[0].bv_offset = 0;
- rbi->bi_size = STRIPE_SIZE;
+ rbi->bi_iter.bi_size = STRIPE_SIZE;
if (conf->mddev->gendisk)
trace_block_bio_remap(bdev_get_queue(rbi->bi_bdev),
rbi, disk_devt(conf->mddev->gendisk),
@@ -732,10 +732,10 @@ async_copy_data(int frombio, struct bio *bio, struct page *page,
struct async_submit_ctl submit;
enum async_tx_flags flags = 0;

- if (bio->bi_sector >= sector)
- page_offset = (signed)(bio->bi_sector - sector) * 512;
+ if (bio->bi_iter.bi_sector >= sector)
+ page_offset = (signed)(bio->bi_iter.bi_sector - sector) * 512;
else
- page_offset = (signed)(sector - bio->bi_sector) * -512;
+ page_offset = (signed)(sector - bio->bi_iter.bi_sector) * -512;

if (frombio)
flags |= ASYNC_TX_FENCE;
@@ -802,7 +802,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
BUG_ON(!dev->read);
rbi = dev->read;
dev->read = NULL;
- while (rbi && rbi->bi_sector <
+ while (rbi && rbi->bi_iter.bi_sector <
dev->sector + STRIPE_SECTORS) {
rbi2 = r5_next_bio(rbi, dev->sector);
if (!raid5_dec_bi_active_stripes(rbi)) {
@@ -838,7 +838,7 @@ static void ops_run_biofill(struct stripe_head *sh)
dev->read = rbi = dev->toread;
dev->toread = NULL;
spin_unlock_irq(&sh->stripe_lock);
- while (rbi && rbi->bi_sector <
+ while (rbi && rbi->bi_iter.bi_sector <
dev->sector + STRIPE_SECTORS) {
tx = async_copy_data(0, rbi, dev->page,
dev->sector, tx);
@@ -1180,7 +1180,7 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
wbi = dev->written = chosen;
spin_unlock_irq(&sh->stripe_lock);

- while (wbi && wbi->bi_sector <
+ while (wbi && wbi->bi_iter.bi_sector <
dev->sector + STRIPE_SECTORS) {
if (wbi->bi_rw & REQ_FUA)
set_bit(R5_WantFUA, &dev->flags);
@@ -2382,7 +2382,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
int firstwrite=0;

pr_debug("adding bi b#%llu to stripe s#%llu\n",
- (unsigned long long)bi->bi_sector,
+ (unsigned long long)bi->bi_iter.bi_sector,
(unsigned long long)sh->sector);

/*
@@ -2400,12 +2400,12 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
firstwrite = 1;
} else
bip = &sh->dev[dd_idx].toread;
- while (*bip && (*bip)->bi_sector < bi->bi_sector) {
- if (bio_end_sector(*bip) > bi->bi_sector)
+ while (*bip && (*bip)->bi_iter.bi_sector < bi->bi_iter.bi_sector) {
+ if (bio_end_sector(*bip) > bi->bi_iter.bi_sector)
goto overlap;
bip = & (*bip)->bi_next;
}
- if (*bip && (*bip)->bi_sector < bio_end_sector(bi))
+ if (*bip && (*bip)->bi_iter.bi_sector < bio_end_sector(bi))
goto overlap;

BUG_ON(*bip && bi->bi_next && (*bip) != bi->bi_next);
@@ -2419,7 +2419,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
sector_t sector = sh->dev[dd_idx].sector;
for (bi=sh->dev[dd_idx].towrite;
sector < sh->dev[dd_idx].sector + STRIPE_SECTORS &&
- bi && bi->bi_sector <= sector;
+ bi && bi->bi_iter.bi_sector <= sector;
bi = r5_next_bio(bi, sh->dev[dd_idx].sector)) {
if (bio_end_sector(bi) >= sector)
sector = bio_end_sector(bi);
@@ -2429,7 +2429,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
}

pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
- (unsigned long long)(*bip)->bi_sector,
+ (unsigned long long)(*bip)->bi_iter.bi_sector,
(unsigned long long)sh->sector, dd_idx);
spin_unlock_irq(&sh->stripe_lock);

@@ -2504,7 +2504,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
wake_up(&conf->wait_for_overlap);

- while (bi && bi->bi_sector <
+ while (bi && bi->bi_iter.bi_sector <
sh->dev[i].sector + STRIPE_SECTORS) {
struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
clear_bit(BIO_UPTODATE, &bi->bi_flags);
@@ -2523,7 +2523,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
bi = sh->dev[i].written;
sh->dev[i].written = NULL;
if (bi) bitmap_end = 1;
- while (bi && bi->bi_sector <
+ while (bi && bi->bi_iter.bi_sector <
sh->dev[i].sector + STRIPE_SECTORS) {
struct bio *bi2 = r5_next_bio(bi, sh->dev[i].sector);
clear_bit(BIO_UPTODATE, &bi->bi_flags);
@@ -2547,7 +2547,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
spin_unlock_irq(&sh->stripe_lock);
if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
wake_up(&conf->wait_for_overlap);
- while (bi && bi->bi_sector <
+ while (bi && bi->bi_iter.bi_sector <
sh->dev[i].sector + STRIPE_SECTORS) {
struct bio *nextbi =
r5_next_bio(bi, sh->dev[i].sector);
@@ -2771,7 +2771,7 @@ static void handle_stripe_clean_event(struct r5conf *conf,
clear_bit(R5_UPTODATE, &dev->flags);
wbi = dev->written;
dev->written = NULL;
- while (wbi && wbi->bi_sector <
+ while (wbi && wbi->bi_iter.bi_sector <
dev->sector + STRIPE_SECTORS) {
wbi2 = r5_next_bio(wbi, dev->sector);
if (!raid5_dec_bi_active_stripes(wbi)) {
@@ -3846,7 +3846,7 @@ static int raid5_mergeable_bvec(struct request_queue *q,

static int in_chunk_boundary(struct mddev *mddev, struct bio *bio)
{
- sector_t sector = bio->bi_sector + get_start_sect(bio->bi_bdev);
+ sector_t sector = bio->bi_iter.bi_sector + get_start_sect(bio->bi_bdev);
unsigned int chunk_sectors = mddev->chunk_sectors;
unsigned int bio_sectors = bio_sectors(bio);

@@ -3983,9 +3983,9 @@ static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
/*
* compute position
*/
- align_bi->bi_sector = raid5_compute_sector(conf, raid_bio->bi_sector,
- 0,
- &dd_idx, NULL);
+ align_bi->bi_iter.bi_sector =
+ raid5_compute_sector(conf, raid_bio->bi_iter.bi_sector,
+ 0, &dd_idx, NULL);

end_sector = bio_end_sector(align_bi);
rcu_read_lock();
@@ -4010,7 +4010,8 @@ static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
align_bi->bi_flags &= ~(1 << BIO_SEG_VALID);

if (!bio_fits_rdev(align_bi) ||
- is_badblock(rdev, align_bi->bi_sector, bio_sectors(align_bi),
+ is_badblock(rdev, align_bi->bi_iter.bi_sector,
+ bio_sectors(align_bi),
&first_bad, &bad_sectors)) {
/* too big in some way, or has a known bad block */
bio_put(align_bi);
@@ -4019,7 +4020,7 @@ static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
}

/* No reshape active, so we can trust rdev->data_offset */
- align_bi->bi_sector += rdev->data_offset;
+ align_bi->bi_iter.bi_sector += rdev->data_offset;

spin_lock_irq(&conf->device_lock);
wait_event_lock_irq(conf->wait_for_stripe,
@@ -4031,7 +4032,7 @@ static int chunk_aligned_read(struct mddev *mddev, struct bio * raid_bio)
if (mddev->gendisk)
trace_block_bio_remap(bdev_get_queue(align_bi->bi_bdev),
align_bi, disk_devt(mddev->gendisk),
- raid_bio->bi_sector);
+ raid_bio->bi_iter.bi_sector);
generic_make_request(align_bi);
return 1;
} else {
@@ -4166,8 +4167,8 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
/* Skip discard while reshape is happening */
return;

- logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
- last_sector = bi->bi_sector + (bi->bi_size>>9);
+ logical_sector = bi->bi_iter.bi_sector & ~((sector_t)STRIPE_SECTORS-1);
+ last_sector = bi->bi_iter.bi_sector + (bi->bi_iter.bi_size>>9);

bi->bi_next = NULL;
bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
@@ -4271,7 +4272,7 @@ static void make_request(struct mddev *mddev, struct bio * bi)
return;
}

- logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
+ logical_sector = bi->bi_iter.bi_sector & ~((sector_t)STRIPE_SECTORS-1);
last_sector = bio_end_sector(bi);
bi->bi_next = NULL;
bi->bi_phys_segments = 1; /* over-loaded to count active stripes */
@@ -4735,7 +4736,8 @@ static int retry_aligned_read(struct r5conf *conf, struct bio *raid_bio)
int remaining;
int handled = 0;

- logical_sector = raid_bio->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
+ logical_sector = raid_bio->bi_iter.bi_sector &
+ ~((sector_t)STRIPE_SECTORS-1);
sector = raid5_compute_sector(conf, logical_sector,
0, &dd_idx, NULL);
last_sector = bio_end_sector(raid_bio);
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 6eca019..16814a8 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -819,7 +819,8 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
dev_info = bio->bi_bdev->bd_disk->private_data;
if (dev_info == NULL)
goto fail;
- if ((bio->bi_sector & 7) != 0 || (bio->bi_size & 4095) != 0)
+ if ((bio->bi_iter.bi_sector & 7) != 0 ||
+ (bio->bi_iter.bi_size & 4095) != 0)
/* Request is not page-aligned. */
goto fail;
if (bio_end_sector(bio) > get_capacity(bio->bi_bdev->bd_disk)) {
@@ -842,7 +843,7 @@ dcssblk_make_request(struct request_queue *q, struct bio *bio)
}
}

- index = (bio->bi_sector >> 3);
+ index = (bio->bi_iter.bi_sector >> 3);
bio_for_each_segment(bvec, bio, i) {
page_addr = (unsigned long)
page_address(bvec->bv_page) + bvec->bv_offset;
diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
index 464dd29..dd4e73f 100644
--- a/drivers/s390/block/xpram.c
+++ b/drivers/s390/block/xpram.c
@@ -190,15 +190,16 @@ static void xpram_make_request(struct request_queue *q, struct bio *bio)
unsigned long bytes;
int i;

- if ((bio->bi_sector & 7) != 0 || (bio->bi_size & 4095) != 0)
+ if ((bio->bi_iter.bi_sector & 7) != 0 ||
+ (bio->bi_iter.bi_size & 4095) != 0)
/* Request is not page-aligned. */
goto fail;
- if ((bio->bi_size >> 12) > xdev->size)
+ if ((bio->bi_iter.bi_size >> 12) > xdev->size)
/* Request size exceeds device size. */
goto fail;
- if ((bio->bi_sector >> 3) > 0xffffffffU - xdev->offset)
+ if ((bio->bi_iter.bi_sector >> 3) > 0xffffffffU - xdev->offset)
goto fail;
- index = (bio->bi_sector >> 3) + xdev->offset;
+ index = (bio->bi_iter.bi_sector >> 3) + xdev->offset;
bio_for_each_segment(bvec, bio, i) {
page_addr = (unsigned long)
kmap(bvec->bv_page) + bvec->bv_offset;
diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c
index aa66361..bac04c2 100644
--- a/drivers/scsi/osd/osd_initiator.c
+++ b/drivers/scsi/osd/osd_initiator.c
@@ -731,7 +731,7 @@ static int _osd_req_list_objects(struct osd_request *or,

bio->bi_rw &= ~REQ_WRITE;
or->in.bio = bio;
- or->in.total_bytes = bio->bi_size;
+ or->in.total_bytes = bio->bi_iter.bi_size;
return 0;
}

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index e34e3fe..7680d53 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -377,8 +377,9 @@ static void __zram_make_request(struct zram *zram, struct bio *bio, int rw)
break;
}

- index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
- offset = (bio->bi_sector & (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;
+ index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
+ offset = (bio->bi_iter.bi_sector &
+ (SECTORS_PER_PAGE - 1)) << SECTOR_SHIFT;

bio_for_each_segment(bvec, bio, i) {
int max_transfer_size = PAGE_SIZE - offset;
@@ -423,9 +424,10 @@ out:
static inline int valid_io_request(struct zram *zram, struct bio *bio)
{
if (unlikely(
- (bio->bi_sector >= (zram->disksize >> SECTOR_SHIFT)) ||
- (bio->bi_sector & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)) ||
- (bio->bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))) {
+ (bio->bi_iter.bi_sector >= (zram->disksize >> SECTOR_SHIFT)) ||
+ (bio->bi_iter.bi_sector &
+ (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)) ||
+ (bio->bi_iter.bi_size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))) {

return 0;
}
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index aa1620a..4032d6b 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -319,7 +319,7 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
bio->bi_bdev = ib_dev->ibd_bd;
bio->bi_private = cmd;
bio->bi_end_io = &iblock_bio_done;
- bio->bi_sector = lba;
+ bio->bi_iter.bi_sector = lba;

return bio;
}
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index 8fb4291..f824c30 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -215,9 +215,9 @@ unsigned int bio_integrity_tag_size(struct bio *bio)
{
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);

- BUG_ON(bio->bi_size == 0);
+ BUG_ON(bio->bi_iter.bi_size == 0);

- return bi->tag_size * (bio->bi_size / bi->sector_size);
+ return bi->tag_size * (bio->bi_iter.bi_size / bi->sector_size);
}
EXPORT_SYMBOL(bio_integrity_tag_size);

@@ -300,7 +300,7 @@ static void bio_integrity_generate(struct bio *bio)
struct blk_integrity *bi = bdev_get_integrity(bio->bi_bdev);
struct blk_integrity_exchg bix;
struct bio_vec *bv;
- sector_t sector = bio->bi_sector;
+ sector_t sector = bio->bi_iter.bi_sector;
unsigned int i, sectors, total;
void *prot_buf = bio->bi_integrity->bip_buf;

@@ -387,7 +387,7 @@ int bio_integrity_prep(struct bio *bio)
bip->bip_owns_buf = 1;
bip->bip_buf = buf;
bip->bip_size = len;
- bip->bip_sector = bio->bi_sector;
+ bip->bip_sector = bio->bi_iter.bi_sector;

/* Map it */
offset = offset_in_page(buf);
diff --git a/fs/bio.c b/fs/bio.c
index 94bbc04..b7c02b0 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -532,13 +532,13 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
* most users will be overriding ->bi_bdev with a new target,
* so we don't set nor calculate new physical/hw segment counts here
*/
- bio->bi_sector = bio_src->bi_sector;
+ bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
bio->bi_bdev = bio_src->bi_bdev;
bio->bi_flags |= 1 << BIO_CLONED;
bio->bi_rw = bio_src->bi_rw;
bio->bi_vcnt = bio_src->bi_vcnt;
- bio->bi_size = bio_src->bi_size;
- bio->bi_idx = bio_src->bi_idx;
+ bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
+ bio->bi_iter.bi_idx = bio_src->bi_iter.bi_idx;
}
EXPORT_SYMBOL(__bio_clone);

@@ -612,7 +612,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
if (unlikely(bio_flagged(bio, BIO_CLONED)))
return 0;

- if (((bio->bi_size + len) >> 9) > max_sectors)
+ if (((bio->bi_iter.bi_size + len) >> 9) > max_sectors)
return 0;

/*
@@ -635,8 +635,9 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
simulate merging updated prev_bvec
as new bvec. */
.bi_bdev = bio->bi_bdev,
- .bi_sector = bio->bi_sector,
- .bi_size = bio->bi_size - prev_bv_len,
+ .bi_sector = bio->bi_iter.bi_sector,
+ .bi_size = bio->bi_iter.bi_size -
+ prev_bv_len,
.bi_rw = bio->bi_rw,
};

@@ -684,8 +685,8 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
if (q->merge_bvec_fn) {
struct bvec_merge_data bvm = {
.bi_bdev = bio->bi_bdev,
- .bi_sector = bio->bi_sector,
- .bi_size = bio->bi_size,
+ .bi_sector = bio->bi_iter.bi_sector,
+ .bi_size = bio->bi_iter.bi_size,
.bi_rw = bio->bi_rw,
};

@@ -708,7 +709,7 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
bio->bi_vcnt++;
bio->bi_phys_segments++;
done:
- bio->bi_size += len;
+ bio->bi_iter.bi_size += len;
return len;
}

@@ -807,22 +808,22 @@ void bio_advance(struct bio *bio, unsigned bytes)
if (bio_integrity(bio))
bio_integrity_advance(bio, bytes);

- bio->bi_sector += bytes >> 9;
- bio->bi_size -= bytes;
+ bio->bi_iter.bi_sector += bytes >> 9;
+ bio->bi_iter.bi_size -= bytes;

if (bio->bi_rw & BIO_NO_ADVANCE_ITER_MASK)
return;

while (bytes) {
- if (unlikely(bio->bi_idx >= bio->bi_vcnt)) {
+ if (unlikely(bio->bi_iter.bi_idx >= bio->bi_vcnt)) {
WARN_ONCE(1, "bio idx %d >= vcnt %d\n",
- bio->bi_idx, bio->bi_vcnt);
+ bio->bi_iter.bi_idx, bio->bi_vcnt);
break;
}

if (bytes >= bio_iovec(bio)->bv_len) {
bytes -= bio_iovec(bio)->bv_len;
- bio->bi_idx++;
+ bio->bi_iter.bi_idx++;
} else {
bio_iovec(bio)->bv_len -= bytes;
bio_iovec(bio)->bv_offset += bytes;
@@ -1475,7 +1476,7 @@ struct bio *bio_map_kern(struct request_queue *q, void *data, unsigned int len,
if (IS_ERR(bio))
return bio;

- if (bio->bi_size == len)
+ if (bio->bi_iter.bi_size == len)
return bio;

/*
@@ -1753,16 +1754,16 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)
return bp;

trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
- bi->bi_sector + first_sectors);
+ bi->bi_iter.bi_sector + first_sectors);

BUG_ON(bio_segments(bi) > 1);
atomic_set(&bp->cnt, 3);
bp->error = 0;
bp->bio1 = *bi;
bp->bio2 = *bi;
- bp->bio2.bi_sector += first_sectors;
- bp->bio2.bi_size -= first_sectors << 9;
- bp->bio1.bi_size = first_sectors << 9;
+ bp->bio2.bi_iter.bi_sector += first_sectors;
+ bp->bio2.bi_iter.bi_size -= first_sectors << 9;
+ bp->bio1.bi_iter.bi_size = first_sectors << 9;

if (bi->bi_vcnt != 0) {
bp->bv1 = *bio_iovec(bi);
@@ -1815,7 +1816,7 @@ sector_t bio_sector_offset(struct bio *bio, unsigned short index,
sector_sz = queue_logical_block_size(bio->bi_bdev->bd_disk->queue);
sectors = 0;

- if (index >= bio->bi_idx)
+ if (index >= bio->bi_iter.bi_idx)
index = bio->bi_vcnt - 1;

bio_for_each_segment_all(bv, bio, i) {
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 1431a69..2ae350c 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1708,7 +1708,7 @@ static int btrfsic_read_block(struct btrfsic_state *state,
return -1;
}
bio->bi_bdev = block_ctx->dev->bdev;
- bio->bi_sector = dev_bytenr >> 9;
+ bio->bi_iter.bi_sector = dev_bytenr >> 9;
bio->bi_end_io = btrfsic_complete_bio_end_io;
bio->bi_private = &complete;

@@ -3112,16 +3112,16 @@ void btrfsic_submit_bio(int rw, struct bio *bio)
int bio_is_patched;
char **mapped_datav;

- dev_bytenr = 512 * bio->bi_sector;
+ dev_bytenr = 512 * bio->bi_iter.bi_sector;
bio_is_patched = 0;
if (dev_state->state->print_mask &
BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
printk(KERN_INFO
"submit_bio(rw=0x%x, bi_vcnt=%u,"
" bi_sector=%lu (bytenr %llu), bi_bdev=%p)\n",
- rw, bio->bi_vcnt, (unsigned long)bio->bi_sector,
- (unsigned long long)dev_bytenr,
- bio->bi_bdev);
+ rw, bio->bi_vcnt,
+ (unsigned long)bio->bi_iter.bi_sector,
+ (unsigned long long)dev_bytenr, bio->bi_bdev);

mapped_datav = kmalloc(sizeof(*mapped_datav) * bio->bi_vcnt,
GFP_NOFS);
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b189bd1..2d7515b 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -174,7 +174,8 @@ static void end_compressed_bio_read(struct bio *bio, int err)
goto out;

inode = cb->inode;
- ret = check_compressed_csum(inode, cb, (u64)bio->bi_sector << 9);
+ ret = check_compressed_csum(inode, cb,
+ (u64)bio->bi_iter.bi_sector << 9);
if (ret)
goto csum_failed;

@@ -374,7 +375,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 start,
for (pg_index = 0; pg_index < cb->nr_pages; pg_index++) {
page = compressed_pages[pg_index];
page->mapping = inode->i_mapping;
- if (bio->bi_size)
+ if (bio->bi_iter.bi_size)
ret = io_tree->ops->merge_bio_hook(WRITE, page, 0,
PAGE_CACHE_SIZE,
bio, 0);
@@ -508,7 +509,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,

if (!em || last_offset < em->start ||
(last_offset + PAGE_CACHE_SIZE > extent_map_end(em)) ||
- (em->block_start >> 9) != cb->orig_bio->bi_sector) {
+ (em->block_start >> 9) != cb->orig_bio->bi_iter.bi_sector) {
free_extent_map(em);
unlock_extent(tree, last_offset, end);
unlock_page(page);
@@ -554,7 +555,7 @@ next:
* in it. We don't actually do IO on those pages but allocate new ones
* to hold the compressed pages on disk.
*
- * bio->bi_sector points to the compressed extent on disk
+ * bio->bi_iter.bi_sector points to the compressed extent on disk
* bio->bi_io_vec points to all of the inode pages
* bio->bi_vcnt is a count of pages
*
@@ -575,7 +576,7 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
struct page *page;
struct block_device *bdev;
struct bio *comp_bio;
- u64 cur_disk_byte = (u64)bio->bi_sector << 9;
+ u64 cur_disk_byte = (u64)bio->bi_iter.bi_sector << 9;
u64 em_len;
u64 em_start;
struct extent_map *em;
@@ -657,7 +658,7 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
page->mapping = inode->i_mapping;
page->index = em_start >> PAGE_CACHE_SHIFT;

- if (comp_bio->bi_size)
+ if (comp_bio->bi_iter.bi_size)
ret = tree->ops->merge_bio_hook(READ, page, 0,
PAGE_CACHE_SIZE,
comp_bio, 0);
@@ -685,8 +686,8 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
comp_bio, sums);
BUG_ON(ret); /* -ENOMEM */
}
- sums += (comp_bio->bi_size + root->sectorsize - 1) /
- root->sectorsize;
+ sums += (comp_bio->bi_iter.bi_size +
+ root->sectorsize - 1) / root->sectorsize;

ret = btrfs_map_bio(root, READ, comp_bio,
mirror_num, 0);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e7e7afb..cdeb8d4 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2042,7 +2042,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
return -EIO;
bio->bi_private = &compl;
bio->bi_end_io = repair_io_failure_callback;
- bio->bi_size = 0;
+ bio->bi_iter.bi_size = 0;
map_length = length;

ret = btrfs_map_block(fs_info, WRITE, logical,
@@ -2053,7 +2053,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
}
BUG_ON(mirror_num != bbio->mirror_num);
sector = bbio->stripes[mirror_num-1].physical >> 9;
- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
dev = bbio->stripes[mirror_num-1].dev;
kfree(bbio);
if (!dev || !dev->bdev || !dev->writeable) {
@@ -2334,9 +2334,9 @@ static int bio_readpage_error(struct bio *failed_bio, struct page *page,
}
bio->bi_private = state;
bio->bi_end_io = failed_bio->bi_end_io;
- bio->bi_sector = failrec->logical >> 9;
+ bio->bi_iter.bi_sector = failrec->logical >> 9;
bio->bi_bdev = BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev;
- bio->bi_size = 0;
+ bio->bi_iter.bi_size = 0;

bio_add_page(bio, page, failrec->len, start - page_offset(page));

@@ -2452,7 +2452,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);

pr_debug("end_bio_extent_readpage: bi_sector=%llu, err=%d, "
- "mirror=%lu\n", (u64)bio->bi_sector, err,
+ "mirror=%lu\n", (u64)bio->bi_iter.bi_sector, err,
io_bio->mirror_num);
tree = &BTRFS_I(page->mapping->host)->io_tree;

@@ -2559,9 +2559,9 @@ btrfs_bio_alloc(struct block_device *bdev, u64 first_sector, int nr_vecs,
}

if (bio) {
- bio->bi_size = 0;
+ bio->bi_iter.bi_size = 0;
bio->bi_bdev = bdev;
- bio->bi_sector = first_sector;
+ bio->bi_iter.bi_sector = first_sector;
}
return bio;
}
@@ -2641,7 +2641,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
if (bio_ret && *bio_ret) {
bio = *bio_ret;
if (old_compressed)
- contig = bio->bi_sector == sector;
+ contig = bio->bi_iter.bi_sector == sector;
else
contig = bio_end_sector(bio) == sector;

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index b193bf3..6a5b4ce 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -175,7 +175,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
- if (bio->bi_size > PAGE_CACHE_SIZE * 8)
+ if (bio->bi_iter.bi_size > PAGE_CACHE_SIZE * 8)
path->reada = 2;

WARN_ON(bio->bi_vcnt <= 0);
@@ -191,7 +191,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
path->skip_locking = 1;
}

- disk_bytenr = (u64)bio->bi_sector << 9;
+ disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
if (dio)
offset = logical_offset;
while (bio_index < bio->bi_vcnt) {
@@ -428,13 +428,14 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
u64 disk_bytenr;

WARN_ON(bio->bi_vcnt <= 0);
- sums = kzalloc(btrfs_ordered_sum_size(root, bio->bi_size), GFP_NOFS);
+ sums = kzalloc(btrfs_ordered_sum_size(root, bio->bi_iter.bi_size),
+ GFP_NOFS);
if (!sums)
return -ENOMEM;

sector_sum = sums->sums;
- disk_bytenr = (u64)bio->bi_sector << 9;
- sums->len = bio->bi_size;
+ disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
+ sums->len = bio->bi_iter.bi_size;
INIT_LIST_HEAD(&sums->list);

if (contig)
@@ -458,7 +459,7 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
btrfs_add_ordered_sum(inode, ordered, sums);
btrfs_put_ordered_extent(ordered);

- bytes_left = bio->bi_size - total_bytes;
+ bytes_left = bio->bi_iter.bi_size - total_bytes;

sums = kzalloc(btrfs_ordered_sum_size(root, bytes_left),
GFP_NOFS);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index af978f7..3523418 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1635,7 +1635,7 @@ int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset,
unsigned long bio_flags)
{
struct btrfs_root *root = BTRFS_I(page->mapping->host)->root;
- u64 logical = (u64)bio->bi_sector << 9;
+ u64 logical = (u64)bio->bi_iter.bi_sector << 9;
u64 length = 0;
u64 map_length;
int ret;
@@ -1643,7 +1643,7 @@ int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset,
if (bio_flags & EXTENT_BIO_COMPRESSED)
return 0;

- length = bio->bi_size;
+ length = bio->bi_iter.bi_size;
map_length = length;
ret = btrfs_map_block(root->fs_info, rw, logical,
&map_length, NULL, 0);
@@ -7063,7 +7063,8 @@ static void btrfs_end_dio_bio(struct bio *bio, int err)
printk(KERN_ERR "btrfs direct IO failed ino %llu rw %lu "
"sector %#Lx len %u err no %d\n",
(unsigned long long)btrfs_ino(dip->inode), bio->bi_rw,
- (unsigned long long)bio->bi_sector, bio->bi_size, err);
+ (unsigned long long)bio->bi_iter.bi_sector,
+ bio->bi_iter.bi_size, err);
dip->errors = 1;

/*
@@ -7152,7 +7153,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
struct bio *bio;
struct bio *orig_bio = dip->orig_bio;
struct bio_vec *bvec = orig_bio->bi_io_vec;
- u64 start_sector = orig_bio->bi_sector;
+ u64 start_sector = orig_bio->bi_iter.bi_sector;
u64 file_offset = dip->logical_offset;
u64 submit_len = 0;
u64 map_length;
@@ -7160,14 +7161,14 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
int ret = 0;
int async_submit = 0;

- map_length = orig_bio->bi_size;
+ map_length = orig_bio->bi_iter.bi_size;
ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
&map_length, NULL, 0);
if (ret) {
bio_put(orig_bio);
return -EIO;
}
- if (map_length >= orig_bio->bi_size) {
+ if (map_length >= orig_bio->bi_iter.bi_size) {
bio = orig_bio;
goto submit;
}
@@ -7219,7 +7220,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
bio->bi_private = dip;
bio->bi_end_io = btrfs_end_dio_bio;

- map_length = orig_bio->bi_size;
+ map_length = orig_bio->bi_iter.bi_size;
ret = btrfs_map_block(root->fs_info, rw,
start_sector << 9,
&map_length, NULL, 0);
@@ -7292,7 +7293,7 @@ static void btrfs_submit_direct(int rw, struct bio *dio_bio,
bvec++;
} while (bvec <= (dio_bio->bi_io_vec + dio_bio->bi_vcnt - 1));

- dip->disk_bytenr = (u64)dio_bio->bi_sector << 9;
+ dip->disk_bytenr = (u64)dio_bio->bi_iter.bi_sector << 9;
io_bio->bi_private = dip;
dip->errors = 0;
dip->orig_bio = io_bio;
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0525e138..fc89bee 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1033,8 +1033,8 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,

/* see if we can add this page onto our existing bio */
if (last) {
- last_end = (u64)last->bi_sector << 9;
- last_end += last->bi_size;
+ last_end = (u64)last->bi_iter.bi_sector << 9;
+ last_end += last->bi_iter.bi_size;

/*
* we can't merge these if they are from different
@@ -1054,9 +1054,9 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
if (!bio)
return -ENOMEM;

- bio->bi_size = 0;
+ bio->bi_iter.bi_size = 0;
bio->bi_bdev = stripe->dev->bdev;
- bio->bi_sector = disk_start >> 9;
+ bio->bi_iter.bi_sector = disk_start >> 9;
set_bit(BIO_UPTODATE, &bio->bi_flags);

bio_add_page(bio, page, PAGE_CACHE_SIZE, 0);
@@ -1112,7 +1112,7 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)

spin_lock_irq(&rbio->bio_list_lock);
bio_list_for_each(bio, &rbio->bio_list) {
- start = (u64)bio->bi_sector << 9;
+ start = (u64)bio->bi_iter.bi_sector << 9;
stripe_offset = start - rbio->raid_map[0];
page_index = stripe_offset >> PAGE_CACHE_SHIFT;

@@ -1273,7 +1273,7 @@ cleanup:
static int find_bio_stripe(struct btrfs_raid_bio *rbio,
struct bio *bio)
{
- u64 physical = bio->bi_sector;
+ u64 physical = bio->bi_iter.bi_sector;
u64 stripe_start;
int i;
struct btrfs_bio_stripe *stripe;
@@ -1299,7 +1299,7 @@ static int find_bio_stripe(struct btrfs_raid_bio *rbio,
static int find_logical_bio_stripe(struct btrfs_raid_bio *rbio,
struct bio *bio)
{
- u64 logical = bio->bi_sector;
+ u64 logical = bio->bi_iter.bi_sector;
u64 stripe_start;
int i;

@@ -1601,8 +1601,8 @@ static int plug_cmp(void *priv, struct list_head *a, struct list_head *b)
plug_list);
struct btrfs_raid_bio *rb = container_of(b, struct btrfs_raid_bio,
plug_list);
- u64 a_sector = ra->bio_list.head->bi_sector;
- u64 b_sector = rb->bio_list.head->bi_sector;
+ u64 a_sector = ra->bio_list.head->bi_iter.bi_sector;
+ u64 b_sector = rb->bio_list.head->bi_iter.bi_sector;

if (a_sector < b_sector)
return -1;
@@ -1693,7 +1693,7 @@ int raid56_parity_write(struct btrfs_root *root, struct bio *bio,
return PTR_ERR(rbio);
}
bio_list_add(&rbio->bio_list, bio);
- rbio->bio_list_bytes = bio->bi_size;
+ rbio->bio_list_bytes = bio->bi_iter.bi_size;

/*
* don't plug on full rbios, just get them out the door
@@ -2047,7 +2047,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,

rbio->read_rebuild = 1;
bio_list_add(&rbio->bio_list, bio);
- rbio->bio_list_bytes = bio->bi_size;
+ rbio->bio_list_bytes = bio->bi_iter.bi_size;

rbio->faila = find_logical_bio_stripe(rbio, bio);
if (rbio->faila == -1) {
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 79bd479..1fadb35 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1303,7 +1303,7 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
continue;
}
bio->bi_bdev = page->dev->bdev;
- bio->bi_sector = page->physical >> 9;
+ bio->bi_iter.bi_sector = page->physical >> 9;
bio->bi_end_io = scrub_complete_bio_end_io;
bio->bi_private = &complete;

@@ -1435,7 +1435,7 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
if (!bio)
return -EIO;
bio->bi_bdev = page_bad->dev->bdev;
- bio->bi_sector = page_bad->physical >> 9;
+ bio->bi_iter.bi_sector = page_bad->physical >> 9;
bio->bi_end_io = scrub_complete_bio_end_io;
bio->bi_private = &complete;

@@ -1533,7 +1533,7 @@ again:
bio->bi_private = sbio;
bio->bi_end_io = scrub_wr_bio_end_io;
bio->bi_bdev = sbio->dev->bdev;
- bio->bi_sector = sbio->physical >> 9;
+ bio->bi_iter.bi_sector = sbio->physical >> 9;
sbio->err = 0;
} else if (sbio->physical + sbio->page_count * PAGE_SIZE !=
spage->physical_for_dev_replace ||
@@ -1939,7 +1939,7 @@ again:
bio->bi_private = sbio;
bio->bi_end_io = scrub_bio_end_io;
bio->bi_bdev = sbio->dev->bdev;
- bio->bi_sector = sbio->physical >> 9;
+ bio->bi_iter.bi_sector = sbio->physical >> 9;
sbio->err = 0;
} else if (sbio->physical + sbio->page_count * PAGE_SIZE !=
spage->physical ||
@@ -3316,8 +3316,8 @@ static int write_page_nocow(struct scrub_ctx *sctx,
}
bio->bi_private = &compl;
bio->bi_end_io = scrub_complete_bio_end_io;
- bio->bi_size = 0;
- bio->bi_sector = physical_for_dev_replace >> 9;
+ bio->bi_iter.bi_size = 0;
+ bio->bi_iter.bi_sector = physical_for_dev_replace >> 9;
bio->bi_bdev = dev->bdev;
ret = bio_add_page(bio, page, PAGE_CACHE_SIZE, 0);
if (ret != PAGE_CACHE_SIZE) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8bffb91..11f5ab7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5170,7 +5170,7 @@ static int bio_size_ok(struct block_device *bdev, struct bio *bio,
if (!q->merge_bvec_fn)
return 1;

- bvm.bi_size = bio->bi_size - prev->bv_len;
+ bvm.bi_size = bio->bi_iter.bi_size - prev->bv_len;
if (q->merge_bvec_fn(q, &bvm, prev) < prev->bv_len)
return 0;
return 1;
@@ -5185,7 +5185,7 @@ static void submit_stripe_bio(struct btrfs_root *root, struct btrfs_bio *bbio,
bio->bi_private = bbio;
btrfs_io_bio(bio)->stripe_index = dev_nr;
bio->bi_end_io = btrfs_end_bio;
- bio->bi_sector = physical >> 9;
+ bio->bi_iter.bi_sector = physical >> 9;
#ifdef DEBUG
{
struct rcu_string *name;
@@ -5223,7 +5223,7 @@ again:
while (bvec <= (first_bio->bi_io_vec + first_bio->bi_vcnt - 1)) {
if (bio_add_page(bio, bvec->bv_page, bvec->bv_len,
bvec->bv_offset) < bvec->bv_len) {
- u64 len = bio->bi_size;
+ u64 len = bio->bi_iter.bi_size;

atomic_inc(&bbio->stripes_pending);
submit_stripe_bio(root, bbio, bio, physical, dev_nr,
@@ -5245,7 +5245,7 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
bio->bi_private = bbio->private;
bio->bi_end_io = bbio->end_io;
btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
- bio->bi_sector = logical >> 9;
+ bio->bi_iter.bi_sector = logical >> 9;
kfree(bbio);
bio_endio(bio, -EIO);
}
@@ -5256,7 +5256,7 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
{
struct btrfs_device *dev;
struct bio *first_bio = bio;
- u64 logical = (u64)bio->bi_sector << 9;
+ u64 logical = (u64)bio->bi_iter.bi_sector << 9;
u64 length = 0;
u64 map_length;
u64 *raid_map = NULL;
@@ -5265,7 +5265,7 @@ int btrfs_map_bio(struct btrfs_root *root, int rw, struct bio *bio,
int total_devs = 1;
struct btrfs_bio *bbio = NULL;

- length = bio->bi_size;
+ length = bio->bi_iter.bi_size;
map_length = length;

ret = __btrfs_map_block(root->fs_info, rw, logical, &map_length, &bbio,
diff --git a/fs/buffer.c b/fs/buffer.c
index d2a4d1b..0f1e99b3 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2923,11 +2923,11 @@ static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh)
* let it through, and the IO layer will turn it into
* an EIO.
*/
- if (unlikely(bio->bi_sector >= maxsector))
+ if (unlikely(bio->bi_iter.bi_sector >= maxsector))
return;

- maxsector -= bio->bi_sector;
- bytes = bio->bi_size;
+ maxsector -= bio->bi_iter.bi_sector;
+ bytes = bio->bi_iter.bi_size;
if (likely((bytes >> 9) <= maxsector))
return;

@@ -2935,7 +2935,7 @@ static void guard_bh_eod(int rw, struct bio *bio, struct buffer_head *bh)
bytes = maxsector << 9;

/* Truncate the bio.. */
- bio->bi_size = bytes;
+ bio->bi_iter.bi_size = bytes;
bio->bi_io_vec[0].bv_len = bytes;

/* ..and clear the end of the buffer for reads */
@@ -2970,14 +2970,14 @@ int _submit_bh(int rw, struct buffer_head *bh, unsigned long bio_flags)
*/
bio = bio_alloc(GFP_NOIO, 1);

- bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+ bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio->bi_bdev = bh->b_bdev;
bio->bi_io_vec[0].bv_page = bh->b_page;
bio->bi_io_vec[0].bv_len = bh->b_size;
bio->bi_io_vec[0].bv_offset = bh_offset(bh);

bio->bi_vcnt = 1;
- bio->bi_size = bh->b_size;
+ bio->bi_iter.bi_size = bh->b_size;

bio->bi_end_io = end_bio_bh_io_sync;
bio->bi_private = bh;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 7ab90f5..6a5de20 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -349,7 +349,7 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
bio = bio_alloc(GFP_KERNEL, nr_vecs);

bio->bi_bdev = bdev;
- bio->bi_sector = first_sector;
+ bio->bi_iter.bi_sector = first_sector;
if (dio->is_async)
bio->bi_end_io = dio_bio_end_aio;
else
@@ -654,7 +654,7 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
if (sdio->bio) {
loff_t cur_offset = sdio->cur_page_fs_offset;
loff_t bio_next_offset = sdio->logical_offset_in_bio +
- sdio->bio->bi_size;
+ sdio->bio->bi_iter.bi_size;

/*
* See whether this new request is contiguous with the old.
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 4acf1f7..b232538 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -225,7 +225,7 @@ static void ext4_end_bio(struct bio *bio, int error)
struct inode *inode;
int i;
int blocksize;
- sector_t bi_sector = bio->bi_sector;
+ sector_t bi_sector = bio->bi_iter.bi_sector;

BUG_ON(!io_end);
inode = io_end->inode;
@@ -323,7 +323,7 @@ static int io_submit_init(struct ext4_io_submit *io,
if (!io_end)
return -ENOMEM;
bio = bio_alloc(GFP_NOIO, min(nvecs, BIO_MAX_PAGES));
- bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+ bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio->bi_bdev = bh->b_bdev;
bio->bi_private = io->io_end = io_end;
bio->bi_end_io = ext4_end_bio;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 91ff93b..1dd4f114 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -371,7 +371,7 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
bio = f2fs_bio_alloc(bdev, 1);

/* Initialize the bio */
- bio->bi_sector = SECTOR_FROM_BLOCK(sbi, blk_addr);
+ bio->bi_iter.bi_sector = SECTOR_FROM_BLOCK(sbi, blk_addr);
bio->bi_end_io = read_end_io;

if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index d8e84e4..51f17f6 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -735,7 +735,8 @@ static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
alloc_new:
if (sbi->bio[type] == NULL) {
sbi->bio[type] = f2fs_bio_alloc(bdev, max_hw_blocks(sbi));
- sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi, blk_addr);
+ sbi->bio[type]->bi_iter.bi_sector =
+ SECTOR_FROM_BLOCK(sbi, blk_addr);
/*
* The end_io will be assigned at the submission phase.
* Until then, let bio_add_page() merge consecutive IOs as much
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 68b4c8f..8bbc622 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -271,7 +271,7 @@ static struct bio *gfs2_log_alloc_bio(struct gfs2_sbd *sdp, u64 blkno)
nrvecs = max(nrvecs/2, 1U);
}

- bio->bi_sector = blkno * (sb->s_blocksize >> 9);
+ bio->bi_iter.bi_sector = blkno * (sb->s_blocksize >> 9);
bio->bi_bdev = sb->s_bdev;
bio->bi_end_io = gfs2_end_log_write;
bio->bi_private = sdp;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 60ede2a..0606aea 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -224,7 +224,7 @@ static int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector, int silent)
lock_page(page);

bio = bio_alloc(GFP_NOFS, 1);
- bio->bi_sector = sector * (sb->s_blocksize >> 9);
+ bio->bi_iter.bi_sector = sector * (sb->s_blocksize >> 9);
bio->bi_bdev = sb->s_bdev;
bio_add_page(bio, page, PAGE_SIZE, 0);

diff --git a/fs/hfsplus/wrapper.c b/fs/hfsplus/wrapper.c
index b51a607..5585a32 100644
--- a/fs/hfsplus/wrapper.c
+++ b/fs/hfsplus/wrapper.c
@@ -71,7 +71,7 @@ int hfsplus_submit_bio(struct super_block *sb, sector_t sector,
sector &= ~((io_size >> HFSPLUS_SECTOR_SHIFT) - 1);

bio = bio_alloc(GFP_NOIO, 1);
- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
bio->bi_bdev = sb->s_bdev;
bio->bi_end_io = hfsplus_end_io_sync;
bio->bi_private = &wait;
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index c57499d..c9b5cd6 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1998,14 +1998,14 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)

bio = bio_alloc(GFP_NOFS, 1);

- bio->bi_sector = bp->l_blkno << (log->l2bsize - 9);
+ bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
bio->bi_bdev = log->bdev;
bio->bi_io_vec[0].bv_page = bp->l_page;
bio->bi_io_vec[0].bv_len = LOGPSIZE;
bio->bi_io_vec[0].bv_offset = bp->l_offset;

bio->bi_vcnt = 1;
- bio->bi_size = LOGPSIZE;
+ bio->bi_iter.bi_size = LOGPSIZE;

bio->bi_end_io = lbmIODone;
bio->bi_private = bp;
@@ -2138,21 +2138,21 @@ static void lbmStartIO(struct lbuf * bp)
jfs_info("lbmStartIO\n");

bio = bio_alloc(GFP_NOFS, 1);
- bio->bi_sector = bp->l_blkno << (log->l2bsize - 9);
+ bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
bio->bi_bdev = log->bdev;
bio->bi_io_vec[0].bv_page = bp->l_page;
bio->bi_io_vec[0].bv_len = LOGPSIZE;
bio->bi_io_vec[0].bv_offset = bp->l_offset;

bio->bi_vcnt = 1;
- bio->bi_size = LOGPSIZE;
+ bio->bi_iter.bi_size = LOGPSIZE;

bio->bi_end_io = lbmIODone;
bio->bi_private = bp;

/* check if journaling to disk has been disabled */
if (log->no_integrity) {
- bio->bi_size = 0;
+ bio->bi_iter.bi_size = 0;
lbmIODone(bio, 0);
} else {
submit_bio(WRITE_SYNC, bio);
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index 6740d34..299d9cd 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -416,7 +416,7 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
* count from hitting zero before we're through
*/
inc_io(page);
- if (!bio->bi_size)
+ if (!bio->bi_iter.bi_size)
goto dump_bio;
submit_bio(WRITE, bio);
nr_underway++;
@@ -438,7 +438,7 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)

bio = bio_alloc(GFP_NOFS, 1);
bio->bi_bdev = inode->i_sb->s_bdev;
- bio->bi_sector = pblock << (inode->i_blkbits - 9);
+ bio->bi_iter.bi_sector = pblock << (inode->i_blkbits - 9);
bio->bi_end_io = metapage_write_end_io;
bio->bi_private = page;

@@ -452,7 +452,7 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
if (bio) {
if (bio_add_page(bio, page, bio_bytes, bio_offset) < bio_bytes)
goto add_failed;
- if (!bio->bi_size)
+ if (!bio->bi_iter.bi_size)
goto dump_bio;

submit_bio(WRITE, bio);
@@ -517,7 +517,8 @@ static int metapage_readpage(struct file *fp, struct page *page)

bio = bio_alloc(GFP_NOFS, 1);
bio->bi_bdev = inode->i_sb->s_bdev;
- bio->bi_sector = pblock << (inode->i_blkbits - 9);
+ bio->bi_iter.bi_sector =
+ pblock << (inode->i_blkbits - 9);
bio->bi_end_io = metapage_read_end_io;
bio->bi_private = page;
len = xlen << inode->i_blkbits;
diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index 550475c..a1b161f 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -32,9 +32,9 @@ static int sync_request(struct page *page, struct block_device *bdev, int rw)
bio_vec.bv_len = PAGE_SIZE;
bio_vec.bv_offset = 0;
bio.bi_vcnt = 1;
- bio.bi_size = PAGE_SIZE;
+ bio.bi_iter.bi_size = PAGE_SIZE;
bio.bi_bdev = bdev;
- bio.bi_sector = page->index * (PAGE_SIZE >> 9);
+ bio.bi_iter.bi_sector = page->index * (PAGE_SIZE >> 9);
init_completion(&complete);
bio.bi_private = &complete;
bio.bi_end_io = request_complete;
@@ -107,9 +107,9 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
if (i >= max_pages) {
/* Block layer cannot split bios :( */
bio->bi_vcnt = i;
- bio->bi_size = i * PAGE_SIZE;
+ bio->bi_iter.bi_size = i * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
- bio->bi_sector = ofs >> 9;
+ bio->bi_iter.bi_sector = ofs >> 9;
bio->bi_private = sb;
bio->bi_end_io = writeseg_end_io;
atomic_inc(&super->s_pending_writes);
@@ -134,9 +134,9 @@ static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
unlock_page(page);
}
bio->bi_vcnt = nr_pages;
- bio->bi_size = nr_pages * PAGE_SIZE;
+ bio->bi_iter.bi_size = nr_pages * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
- bio->bi_sector = ofs >> 9;
+ bio->bi_iter.bi_sector = ofs >> 9;
bio->bi_private = sb;
bio->bi_end_io = writeseg_end_io;
atomic_inc(&super->s_pending_writes);
@@ -199,9 +199,9 @@ static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
if (i >= max_pages) {
/* Block layer cannot split bios :( */
bio->bi_vcnt = i;
- bio->bi_size = i * PAGE_SIZE;
+ bio->bi_iter.bi_size = i * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
- bio->bi_sector = ofs >> 9;
+ bio->bi_iter.bi_sector = ofs >> 9;
bio->bi_private = sb;
bio->bi_end_io = erase_end_io;
atomic_inc(&super->s_pending_writes);
@@ -220,9 +220,9 @@ static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
bio->bi_io_vec[i].bv_offset = 0;
}
bio->bi_vcnt = nr_pages;
- bio->bi_size = nr_pages * PAGE_SIZE;
+ bio->bi_iter.bi_size = nr_pages * PAGE_SIZE;
bio->bi_bdev = super->s_bdev;
- bio->bi_sector = ofs >> 9;
+ bio->bi_iter.bi_sector = ofs >> 9;
bio->bi_private = sb;
bio->bi_end_io = erase_end_io;
atomic_inc(&super->s_pending_writes);
diff --git a/fs/mpage.c b/fs/mpage.c
index 0face1c..92b125f 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -94,7 +94,7 @@ mpage_alloc(struct block_device *bdev,

if (bio) {
bio->bi_bdev = bdev;
- bio->bi_sector = first_sector;
+ bio->bi_iter.bi_sector = first_sector;
}
return bio;
}
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 434b93e..6f4e31d 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -134,8 +134,8 @@ bl_submit_bio(int rw, struct bio *bio)
if (bio) {
get_parallel(bio->bi_private);
dprintk("%s submitting %s bio %u@%llu\n", __func__,
- rw == READ ? "read" : "write",
- bio->bi_size, (unsigned long long)bio->bi_sector);
+ rw == READ ? "read" : "write", bio->bi_iter.bi_size,
+ (unsigned long long)bio->bi_iter.bi_sector);
submit_bio(rw, bio);
}
return NULL;
@@ -156,7 +156,8 @@ static struct bio *bl_alloc_init_bio(int npg, sector_t isect,
}

if (bio) {
- bio->bi_sector = isect - be->be_f_offset + be->be_v_offset;
+ bio->bi_iter.bi_sector = isect - be->be_f_offset +
+ be->be_v_offset;
bio->bi_bdev = be->be_mdev;
bio->bi_end_io = end_io;
bio->bi_private = par;
@@ -519,7 +520,7 @@ bl_do_readpage_sync(struct page *page, struct pnfs_block_extent *be,
isect = (page->index << PAGE_CACHE_SECTOR_SHIFT) +
(offset / SECTOR_SIZE);

- bio->bi_sector = isect - be->be_f_offset + be->be_v_offset;
+ bio->bi_iter.bi_sector = isect - be->be_f_offset + be->be_v_offset;
bio->bi_bdev = be->be_mdev;
bio->bi_end_io = bl_read_single_end_io;

diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index dc9a913..85dabcd 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -417,7 +417,8 @@ static struct bio *nilfs_alloc_seg_bio(struct the_nilfs *nilfs, sector_t start,
}
if (likely(bio)) {
bio->bi_bdev = nilfs->ns_bdev;
- bio->bi_sector = start << (nilfs->ns_blocksize_bits - 9);
+ bio->bi_iter.bi_sector =
+ start << (nilfs->ns_blocksize_bits - 9);
}
return bio;
}
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 42252bf..0e7bac5 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -421,7 +421,7 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
}

/* Must put everything in 512 byte sectors for the bio... */
- bio->bi_sector = (reg->hr_start_block + cs) << (bits - 9);
+ bio->bi_iter.bi_sector = (reg->hr_start_block + cs) << (bits - 9);
bio->bi_bdev = reg->hr_bdev;
bio->bi_private = wc;
bio->bi_end_io = o2hb_bio_end_io;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 41a6950..cb4ea6a 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -415,7 +415,7 @@ xfs_alloc_ioend_bio(
struct bio *bio = bio_alloc(GFP_NOIO, nvecs);

ASSERT(bio->bi_private == NULL);
- bio->bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+ bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
bio->bi_bdev = bh->b_bdev;
return bio;
}
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1b2472a..0af7f93 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1285,7 +1285,7 @@ next_chunk:

bio = bio_alloc(GFP_NOIO, nr_pages);
bio->bi_bdev = bp->b_target->bt_bdev;
- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
bio->bi_end_io = xfs_buf_bio_end_io;
bio->bi_private = bp;

@@ -1307,7 +1307,7 @@ next_chunk:
total_nr_pages--;
}

- if (likely(bio->bi_size)) {
+ if (likely(bio->bi_iter.bi_size)) {
if (xfs_buf_is_vmapped(bp)) {
flush_kernel_vmap_range(bp->b_addr,
xfs_buf_vmap_len(bp));
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ef24466..d321e63 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -62,19 +62,19 @@
* on highmem page vectors
*/
#define bio_iovec_idx(bio, idx) (&((bio)->bi_io_vec[(idx)]))
-#define bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_idx)
+#define bio_iovec(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)
#define bio_page(bio) bio_iovec((bio))->bv_page
#define bio_offset(bio) bio_iovec((bio))->bv_offset
-#define bio_segments(bio) ((bio)->bi_vcnt - (bio)->bi_idx)
-#define bio_sectors(bio) ((bio)->bi_size >> 9)
-#define bio_end_sector(bio) ((bio)->bi_sector + bio_sectors((bio)))
+#define bio_segments(bio) ((bio)->bi_vcnt - (bio)->bi_iter.bi_idx)
+#define bio_sectors(bio) ((bio)->bi_iter.bi_size >> 9)
+#define bio_end_sector(bio) ((bio)->bi_iter.bi_sector + bio_sectors((bio)))

static inline unsigned int bio_cur_bytes(struct bio *bio)
{
if (bio->bi_vcnt)
return bio_iovec(bio)->bv_len;
else /* dataless requests such as discard */
- return bio->bi_size;
+ return bio->bi_iter.bi_size;
}

static inline void *bio_data(struct bio *bio)
@@ -108,7 +108,7 @@ static inline void *bio_data(struct bio *bio)
*/

#define __BVEC_END(bio) bio_iovec_idx((bio), (bio)->bi_vcnt - 1)
-#define __BVEC_START(bio) bio_iovec_idx((bio), (bio)->bi_idx)
+#define __BVEC_START(bio) bio_iovec_idx((bio), (bio)->bi_iter.bi_idx)

/* Default implementation of BIOVEC_PHYS_MERGEABLE */
#define __BIOVEC_PHYS_MERGEABLE(vec1, vec2) \
@@ -150,7 +150,7 @@ static inline void *bio_data(struct bio *bio)
i++)

#define bio_for_each_segment(bvl, bio, i) \
- for (i = (bio)->bi_idx; \
+ for (i = (bio)->bi_iter.bi_idx; \
bvl = bio_iovec_idx((bio), (i)), i < (bio)->bi_vcnt; \
i++)

@@ -364,7 +364,7 @@ static inline char *__bio_kmap_irq(struct bio *bio, unsigned short idx,
#define __bio_kunmap_irq(buf, flags) bvec_kunmap_irq(buf, flags)

#define bio_kmap_irq(bio, flags) \
- __bio_kmap_irq((bio), (bio)->bi_idx, (flags))
+ __bio_kmap_irq((bio), (bio)->bi_iter.bi_idx, (flags))
#define bio_kunmap_irq(buf,flags) __bio_kunmap_irq(buf, flags)

/*
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index fa1abeb..d46e8a6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -28,13 +28,19 @@ struct bio_vec {
unsigned int bv_offset;
};

+struct bvec_iter {
+ sector_t bi_sector; /* device address in 512 byte
+ sectors */
+ unsigned int bi_size; /* residual I/O count */
+
+ unsigned int bi_idx; /* current index into bvl_vec */
+};
+
/*
* main unit of I/O for the block layer and lower layers (ie drivers and
* stacking drivers)
*/
struct bio {
- sector_t bi_sector; /* device address in 512 byte
- sectors */
struct bio *bi_next; /* request queue link */
struct block_device *bi_bdev;
unsigned long bi_flags; /* status, command, etc */
@@ -42,16 +48,13 @@ struct bio {
* top bits priority
*/

- unsigned short bi_vcnt; /* how many bio_vec's */
- unsigned short bi_idx; /* current index into bvl_vec */
+ struct bvec_iter bi_iter;

/* Number of segments in this BIO after
* physical address coalescing is performed.
*/
unsigned int bi_phys_segments;

- unsigned int bi_size; /* residual I/O count */
-
/*
* To keep track of the max segment size, we account for the
* sizes of the first and last mergeable segments in this bio.
@@ -74,11 +77,13 @@ struct bio {
struct bio_integrity_payload *bi_integrity; /* data integrity */
#endif

+ unsigned short bi_vcnt; /* how many bio_vec's */
+
/*
* Everything starting with bi_max_vecs will be preserved by bio_reset()
*/

- unsigned int bi_max_vecs; /* max bvl_vecs we can hold */
+ unsigned short bi_max_vecs; /* max bvl_vecs we can hold */

atomic_t bi_cnt; /* pin count */

diff --git a/include/trace/events/bcache.h b/include/trace/events/bcache.h
index 3cc5a0b..827166f 100644
--- a/include/trace/events/bcache.h
+++ b/include/trace/events/bcache.h
@@ -29,10 +29,10 @@ DECLARE_EVENT_CLASS(bcache_request,
__entry->dev = bio->bi_bdev->bd_dev;
__entry->orig_major = s->d->disk->major;
__entry->orig_minor = s->d->disk->first_minor;
- __entry->sector = bio->bi_sector;
- __entry->orig_sector = bio->bi_sector - 16;
- __entry->nr_sector = bio->bi_size >> 9;
- blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+ __entry->sector = bio->bi_iter.bi_sector;
+ __entry->orig_sector = bio->bi_iter.bi_sector - 16;
+ __entry->nr_sector = bio_sectors(bio);
+ blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

@@ -75,9 +75,9 @@ DECLARE_EVENT_CLASS(bcache_bio,

TP_fast_assign(
__entry->dev = bio->bi_bdev->bd_dev;
- __entry->sector = bio->bi_sector;
- __entry->nr_sector = bio->bi_size >> 9;
- blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+ __entry->sector = bio->bi_iter.bi_sector;
+ __entry->nr_sector = bio_sectors(bio);
+ blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

@@ -208,10 +208,10 @@ DECLARE_EVENT_CLASS(bcache_cache_bio,
TP_fast_assign(
__entry->dev = bio->bi_bdev->bd_dev;
__entry->orig_dev = orig_bdev->bd_dev;
- __entry->sector = bio->bi_sector;
+ __entry->sector = bio->bi_iter.bi_sector;
__entry->orig_sector = orig_sector;
- __entry->nr_sector = bio->bi_size >> 9;
- blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+ __entry->nr_sector = bio_sectors(bio);
+ blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 60ae7c3..fbf9c5c 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -243,9 +243,9 @@ TRACE_EVENT(block_bio_bounce,
TP_fast_assign(
__entry->dev = bio->bi_bdev ?
bio->bi_bdev->bd_dev : 0;
- __entry->sector = bio->bi_sector;
+ __entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector = bio_sectors(bio);
- blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+ blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

@@ -280,10 +280,10 @@ TRACE_EVENT(block_bio_complete,

TP_fast_assign(
__entry->dev = bio->bi_bdev->bd_dev;
- __entry->sector = bio->bi_sector;
+ __entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector = bio_sectors(bio);
__entry->error = error;
- blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+ blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
),

TP_printk("%d,%d %s %llu + %u [%d]",
@@ -308,9 +308,9 @@ DECLARE_EVENT_CLASS(block_bio_merge,

TP_fast_assign(
__entry->dev = bio->bi_bdev->bd_dev;
- __entry->sector = bio->bi_sector;
+ __entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector = bio_sectors(bio);
- blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+ blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

@@ -375,9 +375,9 @@ TRACE_EVENT(block_bio_queue,

TP_fast_assign(
__entry->dev = bio->bi_bdev->bd_dev;
- __entry->sector = bio->bi_sector;
+ __entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector = bio_sectors(bio);
- blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+ blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

@@ -403,7 +403,7 @@ DECLARE_EVENT_CLASS(block_get_rq,

TP_fast_assign(
__entry->dev = bio ? bio->bi_bdev->bd_dev : 0;
- __entry->sector = bio ? bio->bi_sector : 0;
+ __entry->sector = bio ? bio->bi_iter.bi_sector : 0;
__entry->nr_sector = bio ? bio_sectors(bio) : 0;
blk_fill_rwbs(__entry->rwbs,
bio ? bio->bi_rw : 0, __entry->nr_sector);
@@ -538,9 +538,9 @@ TRACE_EVENT(block_split,

TP_fast_assign(
__entry->dev = bio->bi_bdev->bd_dev;
- __entry->sector = bio->bi_sector;
+ __entry->sector = bio->bi_iter.bi_sector;
__entry->new_sector = new_sector;
- blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+ blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),

@@ -579,11 +579,11 @@ TRACE_EVENT(block_bio_remap,

TP_fast_assign(
__entry->dev = bio->bi_bdev->bd_dev;
- __entry->sector = bio->bi_sector;
+ __entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector = bio_sectors(bio);
__entry->old_dev = dev;
__entry->old_sector = from;
- blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size);
+ blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_iter.bi_size);
),

TP_printk("%d,%d %s %llu + %u <- (%d,%d) %llu",
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 52ae548..a9531ad 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -611,8 +611,8 @@ TRACE_EVENT(f2fs_do_submit_bio,
__entry->dev = sb->s_dev;
__entry->btype = btype;
__entry->sync = sync;
- __entry->sector = bio->bi_sector;
- __entry->size = bio->bi_size;
+ __entry->sector = bio->bi_iter.bi_sector;
+ __entry->size = bio->bi_iter.bi_size;
),

TP_printk("dev = (%d,%d), type = %s, io = %s, sector = %lld, size = %u",
diff --git a/kernel/power/block_io.c b/kernel/power/block_io.c
index d09dd10..9a58bc2 100644
--- a/kernel/power/block_io.c
+++ b/kernel/power/block_io.c
@@ -32,7 +32,7 @@ static int submit(int rw, struct block_device *bdev, sector_t sector,
struct bio *bio;

bio = bio_alloc(__GFP_WAIT | __GFP_HIGH, 1);
- bio->bi_sector = sector;
+ bio->bi_iter.bi_sector = sector;
bio->bi_bdev = bdev;
bio->bi_end_io = end_swap_bio_read;

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index b8b8560..2814a3e 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -764,8 +764,8 @@ static void blk_add_trace_bio(struct request_queue *q, struct bio *bio,
if (!error && !bio_flagged(bio, BIO_UPTODATE))
error = EIO;

- __blk_add_trace(bt, bio->bi_sector, bio->bi_size, bio->bi_rw, what,
- error, 0, NULL);
+ __blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
+ bio->bi_rw, what, error, 0, NULL);
}

static void blk_add_trace_bio_bounce(void *ignore,
@@ -868,8 +868,9 @@ static void blk_add_trace_split(void *ignore,
if (bt) {
__be64 rpdu = cpu_to_be64(pdu);

- __blk_add_trace(bt, bio->bi_sector, bio->bi_size, bio->bi_rw,
- BLK_TA_SPLIT, !bio_flagged(bio, BIO_UPTODATE),
+ __blk_add_trace(bt, bio->bi_iter.bi_sector,
+ bio->bi_iter.bi_size, bio->bi_rw, BLK_TA_SPLIT,
+ !bio_flagged(bio, BIO_UPTODATE),
sizeof(rpdu), &rpdu);
}
}
@@ -901,9 +902,9 @@ static void blk_add_trace_bio_remap(void *ignore,
r.device_to = cpu_to_be32(bio->bi_bdev->bd_dev);
r.sector_from = cpu_to_be64(from);

- __blk_add_trace(bt, bio->bi_sector, bio->bi_size, bio->bi_rw,
- BLK_TA_REMAP, !bio_flagged(bio, BIO_UPTODATE),
- sizeof(r), &r);
+ __blk_add_trace(bt, bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
+ bio->bi_rw, BLK_TA_REMAP,
+ !bio_flagged(bio, BIO_UPTODATE), sizeof(r), &r);
}

/**
diff --git a/mm/page_io.c b/mm/page_io.c
index a8a3ef4..7617859 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -30,13 +30,13 @@ static struct bio *get_swap_bio(gfp_t gfp_flags,

bio = bio_alloc(gfp_flags, 1);
if (bio) {
- bio->bi_sector = map_swap_page(page, &bio->bi_bdev);
- bio->bi_sector <<= PAGE_SHIFT - 9;
+ bio->bi_iter.bi_sector = map_swap_page(page, &bio->bi_bdev);
+ bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
bio->bi_io_vec[0].bv_page = page;
bio->bi_io_vec[0].bv_len = PAGE_SIZE;
bio->bi_io_vec[0].bv_offset = 0;
bio->bi_vcnt = 1;
- bio->bi_size = PAGE_SIZE;
+ bio->bi_iter.bi_size = PAGE_SIZE;
bio->bi_end_io = end_io;
}
return bio;
@@ -61,7 +61,7 @@ void end_swap_bio_write(struct bio *bio, int err)
printk(KERN_ALERT "Write-error on swap-device (%u:%u:%Lu)\n",
imajor(bio->bi_bdev->bd_inode),
iminor(bio->bi_bdev->bd_inode),
- (unsigned long long)bio->bi_sector);
+ (unsigned long long)bio->bi_iter.bi_sector);
ClearPageReclaim(page);
}
end_page_writeback(page);
@@ -79,7 +79,7 @@ void end_swap_bio_read(struct bio *bio, int err)
printk(KERN_ALERT "Read-error on swap-device (%u:%u:%Lu)\n",
imajor(bio->bi_bdev->bd_inode),
iminor(bio->bi_bdev->bd_inode),
- (unsigned long long)bio->bi_sector);
+ (unsigned long long)bio->bi_iter.bi_sector);
} else {
SetPageUptodate(page);
}
--
1.8.3.rc1

2013-06-09 02:27:08

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 01/26] bcache: Use standard utility code

Some of bcache's utility code has made it into the rest of the kernel,
so drop the bcache versions.

Bcache used to have a workaround for allocating from a bio set under
generic_make_request() (if you allocated more than once, the bios you
already allocated would get stuck on current->bio_list when you
submitted, and you'd risk deadlock) - bcache would mask out __GFP_WAIT
when allocating bios under generic_make_request() so that allocation
could fail and it could retry from workqueue. But bio_alloc_bioset() has
a workaround now, so we can drop this hack and the associated error
handling.
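
For reference, the removed workaround boiled down to this pattern (a
condensed, editorial sketch of the code deleted in the bch_bio_split()
hunk below):

	/*
	 * Under generic_make_request(), current->bio_list is set, and a
	 * second blocking allocation from the same bio_set can deadlock:
	 * the first bio is parked on current->bio_list and won't complete
	 * until we return.  So bcache made the allocation non-blocking
	 * and let callers punt to a workqueue and retry on failure.
	 */
	if (current->bio_list)
		gfp &= ~__GFP_WAIT;

	bio = bio_alloc_bioset(gfp, nr_iovecs, bs);
	if (!bio)
		return NULL;	/* caller retries from a safe context */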

Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/md/bcache/btree.c | 7 +---
drivers/md/bcache/debug.c | 2 +-
drivers/md/bcache/io.c | 64 +++++++++++--------------------
drivers/md/bcache/movinggc.c | 7 ++--
drivers/md/bcache/request.c | 87 +++++++++----------------------------------
drivers/md/bcache/util.c | 17 ---------
drivers/md/bcache/util.h | 4 --
drivers/md/bcache/writeback.c | 7 ++--
8 files changed, 51 insertions(+), 144 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 7a5658f..6830190 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -333,7 +333,7 @@ static void do_btree_write(struct btree *b)
bkey_copy(&k.key, &b->key);
SET_PTR_OFFSET(&k.key, 0, PTR_OFFSET(&k.key, 0) + bset_offset(b, i));

- if (!bch_bio_alloc_pages(b->bio, GFP_NOIO)) {
+ if (!bio_alloc_pages(b->bio, GFP_NOIO)) {
int j;
struct bio_vec *bv;
void *base = (void *) ((unsigned long) i & ~(PAGE_SIZE - 1));
@@ -1896,7 +1896,7 @@ bool bch_btree_insert_check_key(struct btree *b, struct btree_op *op,
should_split(b))
goto out;

- op->replace = KEY(op->inode, bio_end(bio), bio_sectors(bio));
+ op->replace = KEY(op->inode, bio_end_sector(bio), bio_sectors(bio));

SET_KEY_PTRS(&op->replace, 1);
get_random_bytes(&op->replace.ptr[0], sizeof(uint64_t));
@@ -2215,9 +2215,6 @@ static int submit_partial_cache_hit(struct btree *b, struct btree_op *op,
KEY_OFFSET(k) - bio->bi_sector);

n = bch_bio_split(bio, sectors, GFP_NOIO, s->d->bio_split);
- if (!n)
- return -EAGAIN;
-
if (n == bio)
op->lookup_done = true;

diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 89fd520..91cd5f8 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -200,7 +200,7 @@ void bch_data_verify(struct search *s)
if (!check)
return;

- if (bch_bio_alloc_pages(check, GFP_NOIO))
+ if (bio_alloc_pages(check, GFP_NOIO))
goto out_put;

check->bi_rw = READ_SYNC;
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 48efd4d..e5d27a8 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -66,13 +66,6 @@ static void bch_generic_make_request_hack(struct bio *bio)
* The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
* bvec boundry; it is the caller's responsibility to ensure that @bio is not
* freed before the split.
- *
- * If bch_bio_split() is running under generic_make_request(), it's not safe to
- * allocate more than one bio from the same bio set. Therefore, if it is running
- * under generic_make_request() it masks out __GFP_WAIT when doing the
- * allocation. The caller must check for failure if there's any possibility of
- * it being called from under generic_make_request(); it is then the caller's
- * responsibility to retry from a safe context (by e.g. punting to workqueue).
*/
struct bio *bch_bio_split(struct bio *bio, int sectors,
gfp_t gfp, struct bio_set *bs)
@@ -83,15 +76,6 @@ struct bio *bch_bio_split(struct bio *bio, int sectors,

BUG_ON(sectors <= 0);

- /*
- * If we're being called from underneath generic_make_request() and we
- * already allocated any bios from this bio set, we risk deadlock if we
- * use the mempool. So instead, we possibly fail and let the caller punt
- * to workqueue or somesuch and retry in a safe context.
- */
- if (current->bio_list)
- gfp &= ~__GFP_WAIT;
-
if (sectors >= bio_sectors(bio))
return bio;

@@ -160,17 +144,18 @@ static unsigned bch_bio_max_sectors(struct bio *bio)
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
unsigned max_segments = min_t(unsigned, BIO_MAX_PAGES,
queue_max_segments(q));
- struct bio_vec *bv, *end = bio_iovec(bio) +
- min_t(int, bio_segments(bio), max_segments);

if (bio->bi_rw & REQ_DISCARD)
return min(ret, q->limits.max_discard_sectors);

if (bio_segments(bio) > max_segments ||
q->merge_bvec_fn) {
+ struct bio_vec *bv;
+ int i, seg = 0;
+
ret = 0;

- for (bv = bio_iovec(bio); bv < end; bv++) {
+ bio_for_each_segment(bv, bio, i) {
struct bvec_merge_data bvm = {
.bi_bdev = bio->bi_bdev,
.bi_sector = bio->bi_sector,
@@ -178,10 +163,14 @@ static unsigned bch_bio_max_sectors(struct bio *bio)
.bi_rw = bio->bi_rw,
};

+ if (seg == max_segments)
+ break;
+
if (q->merge_bvec_fn &&
q->merge_bvec_fn(q, &bvm, bv) < (int) bv->bv_len)
break;

+ seg++;
ret += bv->bv_len >> 9;
}
}
@@ -218,30 +207,10 @@ static void bch_bio_submit_split_endio(struct bio *bio, int error)
closure_put(cl);
}

-static void __bch_bio_submit_split(struct closure *cl)
-{
- struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
- struct bio *bio = s->bio, *n;
-
- do {
- n = bch_bio_split(bio, bch_bio_max_sectors(bio),
- GFP_NOIO, s->p->bio_split);
- if (!n)
- continue_at(cl, __bch_bio_submit_split, system_wq);
-
- n->bi_end_io = bch_bio_submit_split_endio;
- n->bi_private = cl;
-
- closure_get(cl);
- bch_generic_make_request_hack(n);
- } while (n != bio);
-
- continue_at(cl, bch_bio_submit_split_done, NULL);
-}
-
void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
{
struct bio_split_hook *s;
+ struct bio *n;

if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
goto submit;
@@ -250,6 +219,7 @@ void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
goto submit;

s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
+ closure_init(&s->cl, NULL);

s->bio = bio;
s->p = p;
@@ -257,8 +227,18 @@ void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
s->bi_private = bio->bi_private;
bio_get(bio);

- closure_call(&s->cl, __bch_bio_submit_split, NULL, NULL);
- return;
+ do {
+ n = bch_bio_split(bio, bch_bio_max_sectors(bio),
+ GFP_NOIO, s->p->bio_split);
+
+ n->bi_end_io = bch_bio_submit_split_endio;
+ n->bi_private = &s->cl;
+
+ closure_get(&s->cl);
+ bch_generic_make_request_hack(n);
+ } while (n != bio);
+
+ continue_at(&s->cl, bch_bio_submit_split_done, NULL);
submit:
bch_generic_make_request_hack(bio);
}
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index 8589512..23f19fa 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -44,9 +44,10 @@ static void write_moving_finish(struct closure *cl)
{
struct moving_io *io = container_of(cl, struct moving_io, s.cl);
struct bio *bio = &io->bio.bio;
- struct bio_vec *bv = bio_iovec_idx(bio, bio->bi_vcnt);
+ struct bio_vec *bv;
+ int i;

- while (bv-- != bio->bi_io_vec)
+ bio_for_each_segment_all(bv, bio, i)
__free_page(bv->bv_page);

pr_debug("%s %s", io->s.op.insert_collision
@@ -159,7 +160,7 @@ static void read_moving(struct closure *cl)
bio->bi_rw = READ;
bio->bi_end_io = read_moving_endio;

- if (bch_bio_alloc_pages(bio, GFP_KERNEL))
+ if (bio_alloc_pages(bio, GFP_KERNEL))
goto err;

pr_debug("%s", pkey(&w->key));
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index e5ff12e..c91ef76 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -510,10 +510,6 @@ static void bch_insert_data_loop(struct closure *cl)
goto err;

n = bch_bio_split(bio, KEY_SIZE(k), GFP_NOIO, split);
- if (!n) {
- __bkey_put(op->c, k);
- continue_at(cl, bch_insert_data_loop, bcache_wq);
- }

n->bi_end_io = bch_insert_data_endio;
n->bi_private = cl;
@@ -827,53 +823,13 @@ static void request_read_done(struct closure *cl)
*/

if (s->op.cache_bio) {
- struct bio_vec *src, *dst;
- unsigned src_offset, dst_offset, bytes;
- void *dst_ptr;
-
bio_reset(s->op.cache_bio);
s->op.cache_bio->bi_sector = s->cache_miss->bi_sector;
s->op.cache_bio->bi_bdev = s->cache_miss->bi_bdev;
s->op.cache_bio->bi_size = s->cache_bio_sectors << 9;
bch_bio_map(s->op.cache_bio, NULL);

- src = bio_iovec(s->op.cache_bio);
- dst = bio_iovec(s->cache_miss);
- src_offset = src->bv_offset;
- dst_offset = dst->bv_offset;
- dst_ptr = kmap(dst->bv_page);
-
- while (1) {
- if (dst_offset == dst->bv_offset + dst->bv_len) {
- kunmap(dst->bv_page);
- dst++;
- if (dst == bio_iovec_idx(s->cache_miss,
- s->cache_miss->bi_vcnt))
- break;
-
- dst_offset = dst->bv_offset;
- dst_ptr = kmap(dst->bv_page);
- }
-
- if (src_offset == src->bv_offset + src->bv_len) {
- src++;
- if (src == bio_iovec_idx(s->op.cache_bio,
- s->op.cache_bio->bi_vcnt))
- BUG();
-
- src_offset = src->bv_offset;
- }
-
- bytes = min(dst->bv_offset + dst->bv_len - dst_offset,
- src->bv_offset + src->bv_len - src_offset);
-
- memcpy(dst_ptr + dst_offset,
- page_address(src->bv_page) + src_offset,
- bytes);
-
- src_offset += bytes;
- dst_offset += bytes;
- }
+ bio_copy_data(s->cache_miss, s->op.cache_bio);

bio_put(s->cache_miss);
s->cache_miss = NULL;
@@ -917,9 +873,6 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
struct bio *miss;

miss = bch_bio_split(bio, sectors, GFP_NOIO, s->d->bio_split);
- if (!miss)
- return -EAGAIN;
-
if (miss == bio)
s->op.lookup_done = true;

@@ -938,8 +891,9 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
reada = min(dc->readahead >> 9,
sectors - bio_sectors(miss));

- if (bio_end(miss) + reada > bdev_sectors(miss->bi_bdev))
- reada = bdev_sectors(miss->bi_bdev) - bio_end(miss);
+ if (bio_end_sector(miss) + reada > bdev_sectors(miss->bi_bdev))
+ reada = bdev_sectors(miss->bi_bdev) -
+ bio_end_sector(miss);
}

s->cache_bio_sectors = bio_sectors(miss) + reada;
@@ -963,7 +917,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
goto out_put;

bch_bio_map(s->op.cache_bio, NULL);
- if (bch_bio_alloc_pages(s->op.cache_bio, __GFP_NOWARN|GFP_NOIO))
+ if (bio_alloc_pages(s->op.cache_bio, __GFP_NOWARN|GFP_NOIO))
goto out_put;

s->cache_miss = miss;
@@ -1019,7 +973,7 @@ static void request_write(struct cached_dev *dc, struct search *s)
struct bio *bio = &s->bio.bio;
struct bkey start, end;
start = KEY(dc->disk.id, bio->bi_sector, 0);
- end = KEY(dc->disk.id, bio_end(bio), 0);
+ end = KEY(dc->disk.id, bio_end_sector(bio), 0);

bch_keybuf_check_overlapping(&s->op.c->moving_gc_keys, &start, &end);

@@ -1177,7 +1131,7 @@ found:
if (i->sequential + bio->bi_size > i->sequential)
i->sequential += bio->bi_size;

- i->last = bio_end(bio);
+ i->last = bio_end_sector(bio);
i->jiffies = jiffies + msecs_to_jiffies(5000);
s->task->sequential_io = i->sequential;

@@ -1288,30 +1242,25 @@ void bch_cached_dev_request_init(struct cached_dev *dc)
static int flash_dev_cache_miss(struct btree *b, struct search *s,
struct bio *bio, unsigned sectors)
{
+ struct bio_vec *bv;
+ int i;
+
/* Zero fill bio */

- while (bio->bi_idx != bio->bi_vcnt) {
- struct bio_vec *bv = bio_iovec(bio);
+ bio_for_each_segment(bv, bio, i) {
unsigned j = min(bv->bv_len >> 9, sectors);

void *p = kmap(bv->bv_page);
memset(p + bv->bv_offset, 0, j << 9);
kunmap(bv->bv_page);

- bv->bv_len -= j << 9;
- bv->bv_offset += j << 9;
-
- if (bv->bv_len)
- return 0;
-
- bio->bi_sector += j;
- bio->bi_size -= j << 9;
-
- bio->bi_idx++;
- sectors -= j;
+ sectors -= j;
}

- s->op.lookup_done = true;
+ bio_advance(bio, min(sectors << 9, bio->bi_size));
+
+ if (!bio->bi_size)
+ s->op.lookup_done = true;

return 0;
}
@@ -1338,8 +1287,8 @@ static void flash_dev_make_request(struct request_queue *q, struct bio *bio)
closure_call(&s->op.cl, btree_read_async, NULL, cl);
} else if (bio_has_data(bio) || s->op.skip) {
bch_keybuf_check_overlapping(&s->op.c->moving_gc_keys,
- &KEY(d->id, bio->bi_sector, 0),
- &KEY(d->id, bio_end(bio), 0));
+ &KEY(d->id, bio->bi_sector, 0),
+ &KEY(d->id, bio_end_sector(bio), 0));

s->writeback = true;
s->op.cache_bio = bio;
diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
index da3a99e..98eb811 100644
--- a/drivers/md/bcache/util.c
+++ b/drivers/md/bcache/util.c
@@ -228,23 +228,6 @@ start: bv->bv_len = min_t(size_t, PAGE_SIZE - bv->bv_offset,
}
}

-int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp)
-{
- int i;
- struct bio_vec *bv;
-
- bio_for_each_segment(bv, bio, i) {
- bv->bv_page = alloc_page(gfp);
- if (!bv->bv_page) {
- while (bv-- != bio->bi_io_vec + bio->bi_idx)
- __free_page(bv->bv_page);
- return -ENOMEM;
- }
- }
-
- return 0;
-}
-
/*
* Portions Copyright (c) 1996-2001, PostgreSQL Global Development Group (Any
* use permitted, subject to terms of PostgreSQL license; see.)
diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
index 577393e..c9b3806 100644
--- a/drivers/md/bcache/util.h
+++ b/drivers/md/bcache/util.h
@@ -566,12 +566,8 @@ static inline unsigned fract_exp_two(unsigned x, unsigned fract_bits)
return x;
}

-#define bio_end(bio) ((bio)->bi_sector + bio_sectors(bio))
-
void bch_bio_map(struct bio *bio, void *base);

-int bch_bio_alloc_pages(struct bio *bio, gfp_t gfp);
-
static inline sector_t bdev_sectors(struct block_device *bdev)
{
return bdev->bd_inode->i_size >> 9;
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 93e7e31..8a3311a 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -216,9 +216,10 @@ static void write_dirty_finish(struct closure *cl)
struct dirty_io *io = container_of(cl, struct dirty_io, cl);
struct keybuf_key *w = io->bio.bi_private;
struct cached_dev *dc = io->dc;
- struct bio_vec *bv = bio_iovec_idx(&io->bio, io->bio.bi_vcnt);
+ struct bio_vec *bv;
+ int i;

- while (bv-- != io->bio.bi_io_vec)
+ bio_for_each_segment_all(bv, &io->bio, i)
__free_page(bv->bv_page);

/* This is kind of a dumb way of signalling errors. */
@@ -349,7 +350,7 @@ static void read_dirty(struct closure *cl)
io->bio.bi_rw = READ;
io->bio.bi_end_io = read_dirty_endio;

- if (bch_bio_alloc_pages(&io->bio, GFP_KERNEL))
+ if (bio_alloc_pages(&io->bio, GFP_KERNEL))
goto err_free;

pr_debug("%s", pkey(&w->key));
--
1.8.3.rc1

2013-06-09 08:34:21

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: Immutable biovecs, dio rewrite

On Sun, Jun 9, 2013 at 4:18 AM, Kent Overstreet <[email protected]> wrote:
> * Changing all the drivers to go through the iterator means that we can
> submit a partially completed bio to generic_make_request() - this
> previously worked on some drivers, but worked on others.

Some negation is missing in this sentence?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2013-06-09 08:55:20

by Kent Overstreet

[permalink] [raw]
Subject: Re: Immutable biovecs, dio rewrite

On Sun, Jun 09, 2013 at 10:34:08AM +0200, Geert Uytterhoeven wrote:
> On Sun, Jun 9, 2013 at 4:18 AM, Kent Overstreet <[email protected]> wrote:
> > * Changing all the drivers to go through the iterator means that we can
> > submit a partially completed bio to generic_make_request() - this
> > previously worked on some drivers, but worked on others.
>
> Some negation is missing in this sentence?

Heh, whoops.

Yeah, it seemed to work fine on normal request-queue-based drivers - and
expecting the driver to respect the current value of bi_idx isn't really
a crazy thing - but it breaks with other drivers. Both bcache and md
have hacks to work around this (cf. md_trim_bio()), but those hacks have
downsides (you have to either modify the biovec, or clone the whole bio
and biovec).

But if drivers are using the bvec iterator primitives so they don't
modify the biovec, we get this for free.
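
To make that concrete, here are two alternative ways to "complete the
first done bytes of a bio" - an editorial sketch, old pre-series API on
top, the iterator-based primitive from this series below:

	/*
	 * Old style: mutate the biovec in place (bi_sector/bi_size
	 * bookkeeping omitted for brevity).
	 */
	while (done) {
		struct bio_vec *bv = bio_iovec(bio); /* &bi_io_vec[bi_idx] */
		unsigned bytes = min(done, bv->bv_len);

		bv->bv_offset += bytes;	/* clobbers the biovec */
		bv->bv_len -= bytes;
		if (!bv->bv_len)
			bio->bi_idx++;
		done -= bytes;
	}

	/* New style: only bio->bi_iter moves; the biovec stays immutable. */
	bio_advance(bio, done);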

2013-06-11 05:20:20

by Dave Chinner

[permalink] [raw]
Subject: Re: Immutable biovecs, dio rewrite

On Sat, Jun 08, 2013 at 07:18:42PM -0700, Kent Overstreet wrote:
> Immutable biovecs: Drivers no longer modify the biovec array directly
> (bv_len/bv_offset in particular) - we add a real iterator to struct bio
> that lets drivers partially complete a bio while only modifying the
> iterator. The iterator has the existing bi_sector, bi_size, bi_idx
> members, and also bi_bvec_done.
>
> This gets us a couple things:
> * Changing all the drivers to go through the iterator means that we can
> submit a partially completed bio to generic_make_request() - this
> previously worked on some drivers, but worked on others.
>
> This makes it much easier for upper layers to process bios
> incrementally - not just stacking drivers, my dio rewrite relies
> heavily on this strategy.
>
> * Previously, any code that might need to retry a bio somehow if it
> errored (mainly stacking drivers) had to clone not just the bio, but
> the entire biovec. The biovec can be up to BIO_MAX_PAGES, which works
> out to 4k...
>
> * When cloning a bio, now we don't have to clone the biovec unless we
> want to modify it. Bio splitting also becomes just a special case of
> cloning a bio.
>
> We also get to delete a lot of code. And this patch series barely
> scratches the surface - I've got more patches that delete another 1.5k
> lines of code, without trying all that hard.
>
> I'd like to get as much of this into 3.11 as possible - I don't know if
> the dio rewrite is a realistic possibility (it currently breaks btrfs -
> we need to add a different hook for them) and it does need a lot of
> review and testing from the various driver maintainers. The dio rewrite
> does pass xfstests for me, though.

Please test with XFS and CONFIG_XFS_DEBUG=y - xfstests will stress
the dio subsystem a lot more when it is run on XFS. Indeed, xfstests
generic/013 assert fails almost immediately with:

[ 58.859136] XFS (vda): Mounting Filesystem
[ 58.881742] XFS (vda): Ending clean mount
[ 58.989301] XFS: Assertion failed: bh_result->b_size >= (1 << inode->i_blkbits), file: fs/xfs/xfs_aops.c, line: 1209
[ 58.992672] ------------[ cut here ]------------
[ 58.994093] kernel BUG at fs/xfs/xfs_message.c:108!
[ 58.995385] invalid opcode: 0000 [#1] SMP
[ 58.996569] Modules linked in:
[ 58.997427] CPU: 1 PID: 9529 Comm: fsstress Not tainted 3.10.0-rc4-next-20130510-dgc+ #85
[ 58.999556] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 59.001143] task: ffff880079a34530 ti: ffff880079a0a000 task.ti: ffff880079a0a000
[ 59.003130] RIP: 0010:[<ffffffff814655c2>] [<ffffffff814655c2>] assfail+0x22/0x30
[ 59.005263] RSP: 0018:ffff880079a0b998 EFLAGS: 00010292
[ 59.006334] RAX: 0000000000000068 RBX: 0000000000054000 RCX: ffff88007fd0eb68
[ 59.007676] RDX: 0000000000000000 RSI: ffff88007fd0d0d8 RDI: 0000000000000246
[ 59.009076] RBP: ffff880079a0b998 R08: 000000000000000a R09: 00000000000001fa
[ 59.010883] R10: 0000000000000000 R11: 00000000000001f9 R12: ffff880073730b50
[ 59.012855] R13: ffff88007a91e800 R14: ffff880079a0baa0 R15: 0000000000000a00
[ 59.014753] FS: 00007f784eb7b700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[ 59.016947] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 59.018174] CR2: 00007f5d4c40e1a0 CR3: 000000007b177000 CR4: 00000000000006e0
[ 59.019509] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 59.020906] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 59.022252] Stack:
[ 59.022648] ffff880079a0ba48 ffffffff81450b2e ffff88007ba2d2c0 0000000000000001
[ 59.024181] ffff880000000000 ffffffff00000000 ffff880000000008 00010000ffff159d
[ 59.025650] 0000000000000054 ffff880073730980 ffff880079a34530 0000000179a34530
[ 59.027117] Call Trace:
[ 59.027592] [<ffffffff81450b2e>] __xfs_get_blocks+0x7e/0x5d0
[ 59.028744] [<ffffffff81451094>] xfs_get_blocks_direct+0x14/0x20
[ 59.029914] [<ffffffff811c157b>] get_blocks+0x9b/0x1b0
[ 59.030903] [<ffffffff8115da92>] ? get_user_pages+0x52/0x60
[ 59.031968] [<ffffffff811c1af7>] __blockdev_direct_IO+0x367/0x850
[ 59.033191] [<ffffffff81451080>] ? __xfs_get_blocks+0x5d0/0x5d0
[ 59.034336] [<ffffffff8144f3ed>] xfs_vm_direct_IO+0x18d/0x1b0
[ 59.035436] [<ffffffff81451080>] ? __xfs_get_blocks+0x5d0/0x5d0
[ 59.036635] [<ffffffff81144b42>] ? pagevec_lookup+0x22/0x30
[ 59.037718] [<ffffffff8113a6ef>] generic_file_aio_read+0x6bf/0x710
[ 59.038899] [<ffffffff814579a2>] xfs_file_aio_read+0x152/0x320
[ 59.040089] [<ffffffff81186910>] do_sync_read+0x80/0xb0
[ 59.041100] [<ffffffff811876d5>] vfs_read+0xa5/0x160
[ 59.042098] [<ffffffff81187912>] SyS_read+0x52/0xa0
[ 59.043045] [<ffffffff81c39e99>] system_call_fastpath+0x16/0x1b
[ 59.044246] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 89 e5 48 89 fa 48 c7 c6 48 f4 fb 81 31 ff 31 c0 e8 de fb ff ff <0f> 0b 66 66 66
[ 59.049085] RIP [<ffffffff814655c2>] assfail+0x22/0x30
[ 59.050100] RSP <ffff880079a0b998>

Cheers,

Dave.
--
Dave Chinner
[email protected]

2013-06-11 17:12:45

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH 22/26] block: Make generic_make_request handle arbitrary sized bios

On Sat, Jun 08, 2013 at 07:19:04PM -0700, Kent Overstreet wrote:
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -467,6 +468,7 @@ struct request_queue {
> #define QUEUE_FLAG_SECDISCARD 17 /* supports SECDISCARD */
> #define QUEUE_FLAG_SAME_FORCE 18 /* force complete on same CPU */
> #define QUEUE_FLAG_DEAD 19 /* queue tear-down finished */
> +#define QUEUE_FLAG_LARGEBIOS 19 /* no limits on bio size */
^^
20 ?

david

2013-06-12 04:26:50

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 22/26] block: Make generic_make_request handle arbitrary sized bios

On Tue, Jun 11, 2013 at 07:12:41PM +0200, David Sterba wrote:
> On Sat, Jun 08, 2013 at 07:19:04PM -0700, Kent Overstreet wrote:
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -467,6 +468,7 @@ struct request_queue {
> > #define QUEUE_FLAG_SECDISCARD 17 /* supports SECDISCARD */
> > #define QUEUE_FLAG_SAME_FORCE 18 /* force complete on same CPU */
> > #define QUEUE_FLAG_DEAD 19 /* queue tear-down finished */
> > +#define QUEUE_FLAG_LARGEBIOS 19 /* no limits on bio size */
> ^^
> 20 ?
>
> david

Whoops! Thanks, fixed.
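
(For the archive: the fix is presumably just bumping the new flag to the
next free bit -

	#define QUEUE_FLAG_DEAD        19	/* queue tear-down finished */
	#define QUEUE_FLAG_LARGEBIOS   20	/* no limits on bio size */

- so it no longer aliases QUEUE_FLAG_DEAD.)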

2013-06-12 20:30:08

by Kent Overstreet

[permalink] [raw]
Subject: Re: Immutable biovecs, dio rewrite

On Tue, Jun 11, 2013 at 03:20:12PM +1000, Dave Chinner wrote:
> Please test with XFS and CONFIG_XFS_DEBUG=y - xfstests will stress
> the dio subsystem a lot more when it is run on XFS. Indeed, xfstests
> generic/013 assert fails almost immediately with:

Thanks - I haven't used xfstests much before. This one turned out to be
an easy fix, but once I turned on CONFIG_XFS_DEBUG I hit something else
I hadn't seen before:

[ 1343.482366] XFS (sdb2): Corruption detected. Unmount and run xfs_repair
[ 1343.482425] XFS (sdb2): bad inode magic/vsn daddr 64 #8 (magic=5858)
[ 1343.482482] XFS: Assertion failed: 0, file: /home/kent/linux/fs/xfs/xfs_inode.c, line: 417
[ 1343.482558] ------------[ cut here ]------------
[ 1343.482611] kernel BUG at /home/kent/linux/fs/xfs/xfs_message.c:108!

This one looks weirder - any thoughts on what might cause _that_?

(Don't spend any time on it, I haven't tried to debug it besides glancing at
the relevant xfs code - just curious if you have any pointers)

2013-06-28 19:44:19

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH 10/26] block: Convert drivers to immutable biovecs

Hi, Kent Overstreet.

I tried your patches in the block branch of your git://evilpiepirate.org/~kent/linux-bcache.git repository. Please let me know if I should be using a branch other than linux-bcache/block.

The AoE targets that work without the patches are not completing their initialization. It looks like they get to the part where the kernel (outside the aoe driver) attempts to read the partition table, and then there's a general protection fault in memcpy, called from skb_copy_bits. (Console messages below.)

With the upstream 3.10.0-rc5 running in this paravirtualized Xen guest, the partition tables are read within the first second after loading the aoe module, and the devices are ready for use.

I'm using a Coraid SRX, but a convenient way to do (non-performance, simple) tests with AoE targets is to create them as needed using vblade,

http://sourceforge.net/projects/aoetools/files/vblade/

... to export sparse files as block devices over AoE. A handy check is aoe-stat from the aoetools,

http://sourceforge.net/projects/aoetools/files/aoetools/35/

... which will show output with "(NA)" in the last columns if the devices can't finish initializing. If you'd like to use aoe-stat, here's the expected behavior with jumbo frames and 500GB targets, as seen with 3.10-rc5:

ecashin@tolstoy ~$ sudo aoe-stat | sed 2q
e82.0 500.107GB eth0,eth1 8192 up
e82.1 500.107GB eth0,eth1 8192 up
ecashin@tolstoy ~$

Compare to 3.10.0-rc5+ from linux-bcache/block:

ecashin@tolstoy ~$ sudo aoe-stat
e82.0 500.107GB (NA) (NA)
e82.1 500.107GB (NA) (NA)
ecashin@tolstoy ~$


Kernel 3.10.0-rc5+ on an x86_64

tolstoy.coraid.com login: ixgbe 0000:00:00.1: removed PHC on eth1
ixgbe 0000:00:00.1: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
ixgbe 0000:00:00.1: registered PHC device on eth1
ixgbe 0000:00:00.0: removed PHC on eth0
ixgbe 0000:00:00.1 eth1: detected SFP+: 4
ixgbe 0000:00:00.0: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
ixgbe 0000:00:00.0: registered PHC device on eth0
ixgbe 0000:00:00.0 eth0: detected SFP+: 3
ixgbe 0000:00:00.1 eth1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
ixgbe 0000:00:00.0 eth0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
aoe: AoE v81 initialised.
aoe: e82.23: setting 8192 byte data frames
aoe: e82.22: setting 8192 byte data frames
aoe: e82.21: setting 8192 byte data frames
aoe: e82.20: setting 8192 byte data frames
aoe: e82.19: setting 8192 byte data frames
aoe: e82.18: setting 8192 byte data frames
aoe: e82.17: setting 8192 byte data frames
aoe: e82.16: setting 8192 byte data frames
aoe: e82.15: setting 8192 byte data frames
aoe: e82.14: setting 8192 byte data frames
aoe: e82.13: setting 8192 byte data frames
aoe: e82.12: setting 8192 byte data frames
aoe: e82.11: setting 8192 byte data frames
aoe: e82.10: setting 8192 byte data frames
aoe: e82.9: setting 8192 byte data frames
aoe: e82.8: setting 8192 byte data frames
aoe: e82.7: setting 8192 byte data frames
aoe: e82.6: setting 8192 byte data frames
aoe: e82.5: setting 8192 byte data frames
aoe: e82.4: setting 8192 byte data frames
aoe: e82.3: setting 8192 byte data frames
aoe: e82.2: setting 8192 byte data frames
aoe: e82.1: setting 8192 byte data frames
aoe: e82.0: setting 8192 byte data frames
aoe: 002590643b25 e82.23 vace0 has 976772992 sectors
aoe: 002590643b24 e82.22 vace0 has 976772992 sectors
aoe: 002590643b24 e82.21 vace0 has 976772992 sectors
aoe: 002590643b25 e82.20 vace0 has 976772992 sectors
aoe: 002590643b25 e82.19 vace0 has 5860532992 sectors
aoe: 002590643b25 e82.18 vace0 has 976772992 sectors
aoe: 002590643b25 e82.17 vace0 has 976772992 sectors
aoe: 002590643b24 e82.16 vace0 has 5860532992 sectors
aoe: 002590643b24 e82.15 vace0 has 976772992 sectors
aoe: 002590643b24 e82.14 vace0 has 976772992 sectors
aoe: 002590643b24 e82.13 vace0 has 5860532992 sectors
aoe: 002590643b24 e82.12 vace0 has 976772992 sectors
aoe: 002590643b24 e82.11 vace0 has 976772992 sectors
aoe: 002590643b24 e82.10 vace0 has 976772992 sectors
aoe: 002590643b24 e82.9 vace0 has 976772992 sectors
aoe: 002590643b25 e82.8 vace0 has 5860532992 sectors
aoe: 002590643b25 e82.7 vace0 has 976772992 sectors
aoe: 002590643b25 e82.6 vace0 has 976772992 sectors
aoe: 002590643b25 e82.5 vace0 has 976772992 sectors
aoe: 002590643b25 e82.4 vace0 has 976772992 sectors
aoe: 002590643b25 e82.3 vace0 has 976772992 sectors
aoe: 002590643b25 e82.2 vace0 has 976772992 sectors
aoe: 002590643b25 e82.1 vace0 has 976772992 sectors
aoe: 002590643b25 e82.0 vace0 has 976772992 sectors
general protection fault: 0000 [#1] SMP
Modules linked in: aoe bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc garp scsi_tgt stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront coretemp microcode ixgbe dca ptp pps_core mdio pcspkr ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod
CPU: 7 PID: 1557 Comm: aoe_ktio Not tainted 3.10.0-rc5+ #4
task: ffff880076504240 ti: ffff880076554000 task.ti: ffff880076554000
RIP: e030:[<ffffffff8129908d>] [<ffffffff8129908d>] memcpy+0xd/0x110
RSP: e02b:ffff880076555ce0 EFLAGS: 00010202
RAX: 0008b045cc646fff RBX: 0000000000001000 RCX: 00000000000000fb
RDX: 0000000000000004 RSI: ffff8800737e7024 RDI: 0008b045cc646fff
RBP: ffff880076555d48 R08: 00000000000007dc R09: 0008b045cc646fff
R10: ffff88000667bbc0 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 00000000000007dc R15: 00000000000007dc
FS: 00007f7a0ddcb700(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 0000000073856000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffffffff814a8875 ffff880076555d08 ffffffff8109653f ffff880076554000
0008b045cc646fff ffff88000667bbc0 00000000000007dc ffff880076504240
ffff8800058047c0 0000000000001000 ffff880005804820 ffff880004be1180
Call Trace:
[<ffffffff814a8875>] ? skb_copy_bits+0x155/0x2b0
[<ffffffff8109653f>] ? local_clock+0x6f/0x80
[<ffffffffa035b9d8>] ktiocomplete+0x3c8/0x540 [aoe]
[<ffffffffa035a570>] ? aoe_ktstart+0xd0/0xd0 [aoe]
[<ffffffffa035a570>] ? aoe_ktstart+0xd0/0xd0 [aoe]
[<ffffffffa035bb88>] ktio+0x38/0x80 [aoe]
[<ffffffffa035a61c>] kthread+0xac/0x100 [aoe]
[<ffffffff81094f70>] ? try_to_wake_up+0x300/0x300
[<ffffffffa035a570>] ? aoe_ktstart+0xd0/0xd0 [aoe]
[<ffffffff81081c5e>] kthread+0xee/0x100
[<ffffffff810c3b5b>] ? __lock_release+0x13b/0x1b0
[<ffffffff81081b70>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8159c02c>] ret_from_fork+0x7c/0xb0
[<ffffffff81081b70>] ? __init_kthread_worker+0x70/0x70
Code: 0f b6 c0 5b c9 c3 0f 1f 84 00 00 00 00 00 e8 6b f8 ff ff 80 7b 25 00 74 c8 eb d3 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c
RIP [<ffffffff8129908d>] memcpy+0xd/0x110
RSP <ffff880076555ce0>
---[ end trace ff5308cca9a17603 ]---
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic(): 1, irqs_disabled(): 0, pid: 1557, name: aoe_ktio
INFO: lockdep is turned off.
CPU: 7 PID: 1557 Comm: aoe_ktio Tainted: G D 3.10.0-rc5+ #4
000000000000000b ffff880076555ad8 ffffffff8158e8e7 ffff880076555af8
ffffffff8108e2a5 ffffffff81a30250 ffff8800765e5128 ffff880076555b28
ffffffff8158fba6 0000000000000000 ffff880076504240 ffff880076504240
Call Trace:
[<ffffffff8158e8e7>] dump_stack+0x19/0x22
[<ffffffff8108e2a5>] __might_sleep+0xf5/0x130
[<ffffffff8158fba6>] down_read+0x26/0xa0
[<ffffffff8106e464>] exit_signals+0x24/0x140
[<ffffffff810884a6>] ? blocking_notifier_call_chain+0x16/0x20
[<ffffffff8105dac2>] do_exit+0xb2/0x480
[<ffffffff81594161>] oops_end+0xb1/0x100
[<ffffffff81017a7b>] die+0x5b/0x90
[<ffffffff81593c4c>] do_general_protection+0xdc/0x160
[<ffffffff81593223>] ? restore_args+0x30/0x30
[<ffffffff81593498>] general_protection+0x28/0x30
[<ffffffff8129908d>] ? memcpy+0xd/0x110
[<ffffffff814a8875>] ? skb_copy_bits+0x155/0x2b0
[<ffffffff8109653f>] ? local_clock+0x6f/0x80
[<ffffffffa035b9d8>] ktiocomplete+0x3c8/0x540 [aoe]
[<ffffffffa035a570>] ? aoe_ktstart+0xd0/0xd0 [aoe]
[<ffffffffa035a570>] ? aoe_ktstart+0xd0/0xd0 [aoe]
[<ffffffffa035bb88>] ktio+0x38/0x80 [aoe]
[<ffffffffa035a61c>] kthread+0xac/0x100 [aoe]
[<ffffffff81094f70>] ? try_to_wake_up+0x300/0x300
[<ffffffffa035a570>] ? aoe_ktstart+0xd0/0xd0 [aoe]
[<ffffffff81081c5e>] kthread+0xee/0x100
[<ffffffff810c3b5b>] ? __lock_release+0x13b/0x1b0
[<ffffffff81081b70>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8159c02c>] ret_from_fork+0x7c/0xb0
[<ffffffff81081b70>] ? __init_kthread_worker+0x70/0x70
BUG: scheduling while atomic: aoe_ktio/1557/0x10000002
INFO: lockdep is turned off.
Modules linked in: aoe bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc garp scsi_tgt stp llc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xen_netfront coretemp microcode ixgbe dca ptp pps_core mdio pcspkr ext4 jbd2 mbcache xen_blkfront dm_mirror dm_region_hash dm_log dm_mod
CPU: 7 PID: 1557 Comm: aoe_ktio Tainted: G D 3.10.0-rc5+ #4
0000000000000007 ffff880076555a18 ffffffff8158e8e7 ffff880076555a38
ffffffff8108ea5a ffff88007cfd4cc0 ffff88007cfd4cc0 ffff880076555ac8
ffffffff815907ee ffff880076555aa8 ffff880076554000 ffff880076555fd8
Call Trace:
[<ffffffff8158e8e7>] dump_stack+0x19/0x22
[<ffffffff8108ea5a>] __schedule_bug+0x6a/0x90
[<ffffffff815907ee>] __schedule+0x68e/0x740
[<ffffffff810938da>] __cond_resched+0x2a/0x40
[<ffffffff81590930>] _cond_resched+0x30/0x40
[<ffffffff8158fbaf>] down_read+0x2f/0xa0
[<ffffffff8106e464>] exit_signals+0x24/0x140
[<ffffffff810884a6>] ? blocking_notifier_call_chain+0x16/0x20
[<ffffffff8105dac2>] do_exit+0xb2/0x480
[<ffffffff81594161>] oops_end+0xb1/0x100
[<ffffffff81017a7b>] die+0x5b/0x90
[<ffffffff81593c4c>] do_general_protection+0xdc/0x160
[<ffffffff81593223>] ? restore_args+0x30/0x30
[<ffffffff81593498>] general_protection+0x28/0x30
[<ffffffff8129908d>] ? memcpy+0xd/0x110
[<ffffffff814a8875>] ? skb_copy_bits+0x155/0x2b0
[<ffffffff8109653f>] ? local_clock+0x6f/0x80
[<ffffffffa035b9d8>] ktiocomplete+0x3c8/0x540 [aoe]
[<ffffffffa035a570>] ? aoe_ktstart+0xd0/0xd0 [aoe]
[<ffffffffa035a570>] ? aoe_ktstart+0xd0/0xd0 [aoe]
[<ffffffffa035bb88>] ktio+0x38/0x80 [aoe]
[<ffffffffa035a61c>] kthread+0xac/0x100 [aoe]
[<ffffffff81094f70>] ? try_to_wake_up+0x300/0x300
[<ffffffffa035a570>] ? aoe_ktstart+0xd0/0xd0 [aoe]
[<ffffffff81081c5e>] kthread+0xee/0x100
[<ffffffff810c3b5b>] ? __lock_release+0x13b/0x1b0
[<ffffffff81081b70>] ? __init_kthread_worker+0x70/0x70
[<ffffffff8159c02c>] ret_from_fork+0x7c/0xb0
[<ffffffff81081b70>] ? __init_kthread_worker+0x70/0x70
note: aoe_ktio[1557] exited with preempt_count 1

On Jun 8, 2013, at 10:18 PM, Kent Overstreet wrote:

> Now that we've got a mechanism for immutable biovecs -
> bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
> respect it instead of using the bvec array directly.
>
> Signed-off-by: Kent Overstreet <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: NeilBrown <[email protected]>
> Cc: "Ed L. Cashin" <[email protected]>
> Cc: Alasdair Kergon <[email protected]>
> Cc: [email protected]
> ---
> drivers/block/aoe/aoe.h | 10 +---
> drivers/block/aoe/aoecmd.c | 127 +++++++++++++++++----------------------------
> drivers/block/umem.c | 50 ++++++++----------
> drivers/md/dm-crypt.c | 52 ++++++++-----------
> drivers/md/dm-io.c | 31 ++++++-----
> drivers/md/dm-raid1.c | 8 +--
> drivers/md/dm-verity.c | 52 +++++--------------
> include/linux/dm-io.h | 4 +-
> 8 files changed, 131 insertions(+), 203 deletions(-)
>
> diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
> index 1756494..e959e6b 100644
> --- a/drivers/block/aoe/aoe.h
> +++ b/drivers/block/aoe/aoe.h
> @@ -100,11 +100,8 @@ enum {
>
> struct buf {
> ulong nframesout;
> - ulong resid;
> - ulong bv_resid;
> - sector_t sector;
> struct bio *bio;
> - struct bio_vec *bv;
> + struct bvec_iter iter;
> struct request *rq;
> };
>
> @@ -120,13 +117,10 @@ struct frame {
> ulong waited;
> ulong waited_total;
> struct aoetgt *t; /* parent target I belong to */
> - sector_t lba;
> struct sk_buff *skb; /* command skb freed on module exit */
> struct sk_buff *r_skb; /* response skb for async processing */
> struct buf *buf;
> - struct bio_vec *bv;
> - ulong bcnt;
> - ulong bv_off;
> + struct bvec_iter iter;
> char flags;
> };
>
> diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
> index a52975a..0733ba1 100644
> --- a/drivers/block/aoe/aoecmd.c
> +++ b/drivers/block/aoe/aoecmd.c
> @@ -183,8 +183,7 @@ aoe_freetframe(struct frame *f)
>
> t = f->t;
> f->buf = NULL;
> - f->lba = 0;
> - f->bv = NULL;
> + memset(&f->iter, 0, sizeof(f->iter));
> f->r_skb = NULL;
> f->flags = 0;
> list_add(&f->head, &t->ffree);
> @@ -282,21 +281,17 @@ newframe(struct aoedev *d)
> }
>
> static void
> -skb_fillup(struct sk_buff *skb, struct bio_vec *bv, ulong off, ulong cnt)
> +skb_fillup(struct sk_buff *skb, struct bio *bio, struct bvec_iter *iter)
> {
> int frag = 0;
> - ulong fcnt;
> -loop:
> - fcnt = bv->bv_len - (off - bv->bv_offset);
> - if (fcnt > cnt)
> - fcnt = cnt;
> - skb_fill_page_desc(skb, frag++, bv->bv_page, off, fcnt);
> - cnt -= fcnt;
> - if (cnt <= 0)
> - return;
> - bv++;
> - off = bv->bv_offset;
> - goto loop;
> +
> + while (iter->bi_size) {
> + struct bio_vec bv = bio_iovec_iter(bio, *iter);
> +
> + skb_fill_page_desc(skb, frag++, bv.bv_page,
> + bv.bv_offset, bv.bv_len);
> + bio_advance_iter(bio, iter, bv.bv_len);
> + }
> }
>
> static void
> @@ -333,12 +328,10 @@ ata_rw_frameinit(struct frame *f)
> t->nout++;
> f->waited = 0;
> f->waited_total = 0;
> - if (f->buf)
> - f->lba = f->buf->sector;
>
> /* set up ata header */
> - ah->scnt = f->bcnt >> 9;
> - put_lba(ah, f->lba);
> + ah->scnt = f->iter.bi_size >> 9;
> + put_lba(ah, f->iter.bi_sector);
> if (t->d->flags & DEVFL_EXT) {
> ah->aflags |= AOEAFL_EXT;
> } else {
> @@ -347,11 +340,11 @@ ata_rw_frameinit(struct frame *f)
> ah->lba3 |= 0xe0; /* LBA bit + obsolete 0xa0 */
> }
> if (f->buf && bio_data_dir(f->buf->bio) == WRITE) {
> - skb_fillup(skb, f->bv, f->bv_off, f->bcnt);
> + skb->len += f->iter.bi_size;
> + skb->data_len = f->iter.bi_size;
> + skb->truesize += f->iter.bi_size;
> + skb_fillup(skb, f->buf->bio, &f->iter);
> ah->aflags |= AOEAFL_WRITE;
> - skb->len += f->bcnt;
> - skb->data_len = f->bcnt;
> - skb->truesize += f->bcnt;
> t->wpkts++;
> } else {
> t->rpkts++;
> @@ -370,7 +363,7 @@ aoecmd_ata_rw(struct aoedev *d)
> struct aoetgt *t;
> struct sk_buff *skb;
> struct sk_buff_head queue;
> - ulong bcnt, fbcnt;
> + ulong bcnt;
>
> buf = nextbuf(d);
> if (buf == NULL)
> @@ -382,36 +375,19 @@ aoecmd_ata_rw(struct aoedev *d)
> bcnt = d->maxbcnt;
> if (bcnt == 0)
> bcnt = DEFAULTBCNT;
> - if (bcnt > buf->resid)
> - bcnt = buf->resid;
> - fbcnt = bcnt;
> - f->bv = buf->bv;
> - f->bv_off = f->bv->bv_offset + (f->bv->bv_len - buf->bv_resid);
> - do {
> - if (fbcnt < buf->bv_resid) {
> - buf->bv_resid -= fbcnt;
> - buf->resid -= fbcnt;
> - break;
> - }
> - fbcnt -= buf->bv_resid;
> - buf->resid -= buf->bv_resid;
> - if (buf->resid == 0) {
> - d->ip.buf = NULL;
> - break;
> - }
> - buf->bv++;
> - buf->bv_resid = buf->bv->bv_len;
> - WARN_ON(buf->bv_resid == 0);
> - } while (fbcnt);
> + if (bcnt > buf->iter.bi_size)
> + bcnt = buf->iter.bi_size;
> +
> + bio_advance_iter(buf->bio, &buf->iter, bcnt);
>
> /* initialize the headers & frame */
> f->buf = buf;
> - f->bcnt = bcnt;
> + f->iter = buf->iter;
> + f->iter.bi_size = bcnt;
> ata_rw_frameinit(f);
>
> /* mark all tracking fields and load out */
> buf->nframesout += 1;
> - buf->sector += bcnt >> 9;
>
> skb = skb_clone(f->skb, GFP_ATOMIC);
> if (skb) {
> @@ -604,10 +580,7 @@ reassign_frame(struct frame *f)
> skb = nf->skb;
> nf->skb = f->skb;
> nf->buf = f->buf;
> - nf->bcnt = f->bcnt;
> - nf->lba = f->lba;
> - nf->bv = f->bv;
> - nf->bv_off = f->bv_off;
> + nf->iter = f->iter;
> nf->waited = 0;
> nf->waited_total = f->waited_total;
> nf->sent = f->sent;
> @@ -626,6 +599,7 @@ probe(struct aoetgt *t)
> struct sk_buff_head queue;
> size_t n, m;
> int frag;
> + ulong bcnt;
>
> d = t->d;
> f = newtframe(d, t);
> @@ -639,19 +613,20 @@ probe(struct aoetgt *t)
> }
> f->flags |= FFL_PROBE;
> ifrotate(t);
> - f->bcnt = t->d->maxbcnt ? t->d->maxbcnt : DEFAULTBCNT;
> + bcnt = t->d->maxbcnt ? t->d->maxbcnt : DEFAULTBCNT;
> + f->iter.bi_size = bcnt;
> ata_rw_frameinit(f);
> skb = f->skb;
> - for (frag = 0, n = f->bcnt; n > 0; ++frag, n -= m) {
> + for (frag = 0, n = bcnt; n > 0; ++frag, n -= m) {
> if (n < PAGE_SIZE)
> m = n;
> else
> m = PAGE_SIZE;
> skb_fill_page_desc(skb, frag, empty_page, 0, m);
> }
> - skb->len += f->bcnt;
> - skb->data_len = f->bcnt;
> - skb->truesize += f->bcnt;
> + skb->len += bcnt;
> + skb->data_len = bcnt;
> + skb->truesize += bcnt;
>
> skb = skb_clone(f->skb, GFP_ATOMIC);
> if (skb) {
> @@ -923,12 +898,8 @@ bufinit(struct buf *buf, struct request *rq, struct bio *bio)
> memset(buf, 0, sizeof(*buf));
> buf->rq = rq;
> buf->bio = bio;
> - buf->resid = bio->bi_iter.bi_size;
> - buf->sector = bio->bi_iter.bi_sector;
> + buf->iter = bio->bi_iter;
> bio_pageinc(bio);
> - buf->bv = __bio_iovec(bio);
> - buf->bv_resid = buf->bv->bv_len;
> - WARN_ON(buf->bv_resid == 0);
> }
>
> static struct buf *
> @@ -1113,24 +1084,23 @@ gettgt(struct aoedev *d, char *addr)
> }
>
> static void
> -bvcpy(struct bio_vec *bv, ulong off, struct sk_buff *skb, long cnt)
> +bvcpy(struct sk_buff *skb, struct bio *bio, struct bvec_iter *iter, long cnt)
> {
> - ulong fcnt;
> char *p;
> int soff = 0;
> -loop:
> - fcnt = bv->bv_len - (off - bv->bv_offset);
> - if (fcnt > cnt)
> - fcnt = cnt;
> - p = page_address(bv->bv_page) + off;
> - skb_copy_bits(skb, soff, p, fcnt);
> - soff += fcnt;
> - cnt -= fcnt;
> - if (cnt <= 0)
> - return;
> - bv++;
> - off = bv->bv_offset;
> - goto loop;
> +
> + do {
> + struct bio_vec bv = bio_iovec_iter(bio, *iter);
> +
> + p = page_address(bv.bv_page) + bv.bv_offset;
> + skb_copy_bits(skb, soff, p, bv.bv_len);
> +
> + bio_advance_iter(bio, iter, bv.bv_len);
> + soff += bv.bv_len;
> + cnt -= bv.bv_len;
> + if (cnt <= 0)
> + return;
> + } while (cnt > 0);
> }
>
> void
> @@ -1223,7 +1193,7 @@ noskb: if (buf)
> clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
> break;
> }
> - bvcpy(f->bv, f->bv_off, skb, n);
> + bvcpy(skb, f->buf->bio, &f->iter, n);
> case ATA_CMD_PIO_WRITE:
> case ATA_CMD_PIO_WRITE_EXT:
> spin_lock_irq(&d->lock);
> @@ -1266,7 +1236,7 @@ out:
>
> aoe_freetframe(f);
>
> - if (buf && --buf->nframesout == 0 && buf->resid == 0)
> + if (buf && --buf->nframesout == 0 && buf->iter.bi_size == 0)
> aoe_end_buf(d, buf);
>
> spin_unlock_irq(&d->lock);
> @@ -1697,7 +1667,6 @@ aoe_failbuf(struct aoedev *d, struct buf *buf)
> {
> if (buf == NULL)
> return;
> - buf->resid = 0;
> clear_bit(BIO_UPTODATE, &buf->bio->bi_flags);
> if (buf->nframesout == 0)
> aoe_end_buf(d, buf);
> diff --git a/drivers/block/umem.c b/drivers/block/umem.c
> index dab4f1a..00145e8 100644
> --- a/drivers/block/umem.c
> +++ b/drivers/block/umem.c
> @@ -108,8 +108,7 @@ struct cardinfo {
> * have been written
> */
> struct bio *bio, *currentbio, **biotail;
> - int current_idx;
> - sector_t current_sector;
> + struct bvec_iter current_iter;
>
> struct request_queue *queue;
>
> @@ -118,7 +117,7 @@ struct cardinfo {
> struct mm_dma_desc *desc;
> int cnt, headcnt;
> struct bio *bio, **biotail;
> - int idx;
> + struct bvec_iter iter;
> } mm_pages[2];
> #define DESC_PER_PAGE ((PAGE_SIZE*2)/sizeof(struct mm_dma_desc))
>
> @@ -344,16 +343,13 @@ static int add_bio(struct cardinfo *card)
> dma_addr_t dma_handle;
> int offset;
> struct bio *bio;
> - struct bio_vec *vec;
> - int idx;
> + struct bio_vec vec;
> int rw;
> - int len;
>
> bio = card->currentbio;
> if (!bio && card->bio) {
> card->currentbio = card->bio;
> - card->current_idx = card->bio->bi_iter.bi_idx;
> - card->current_sector = card->bio->bi_iter.bi_sector;
> + card->current_iter = card->bio->bi_iter;
> card->bio = card->bio->bi_next;
> if (card->bio == NULL)
> card->biotail = &card->bio;
> @@ -362,18 +358,17 @@ static int add_bio(struct cardinfo *card)
> }
> if (!bio)
> return 0;
> - idx = card->current_idx;
>
> rw = bio_rw(bio);
> if (card->mm_pages[card->Ready].cnt >= DESC_PER_PAGE)
> return 0;
>
> - vec = bio_iovec_idx(bio, idx);
> - len = vec->bv_len;
> + vec = bio_iovec_iter(bio, card->current_iter);
> +
> dma_handle = pci_map_page(card->dev,
> - vec->bv_page,
> - vec->bv_offset,
> - len,
> + vec.bv_page,
> + vec.bv_offset,
> + vec.bv_len,
> (rw == READ) ?
> PCI_DMA_FROMDEVICE : PCI_DMA_TODEVICE);
>
> @@ -381,7 +376,7 @@ static int add_bio(struct cardinfo *card)
> desc = &p->desc[p->cnt];
> p->cnt++;
> if (p->bio == NULL)
> - p->idx = idx;
> + p->iter = card->current_iter;
> if ((p->biotail) != &bio->bi_next) {
> *(p->biotail) = bio;
> p->biotail = &(bio->bi_next);
> @@ -391,8 +386,8 @@ static int add_bio(struct cardinfo *card)
> desc->data_dma_handle = dma_handle;
>
> desc->pci_addr = cpu_to_le64((u64)desc->data_dma_handle);
> - desc->local_addr = cpu_to_le64(card->current_sector << 9);
> - desc->transfer_size = cpu_to_le32(len);
> + desc->local_addr = cpu_to_le64(card->current_iter.bi_sector << 9);
> + desc->transfer_size = cpu_to_le32(vec.bv_len);
> offset = (((char *)&desc->sem_control_bits) - ((char *)p->desc));
> desc->sem_addr = cpu_to_le64((u64)(p->page_dma+offset));
> desc->zero1 = desc->zero2 = 0;
> @@ -407,10 +402,9 @@ static int add_bio(struct cardinfo *card)
> desc->control_bits |= cpu_to_le32(DMASCR_TRANSFER_READ);
> desc->sem_control_bits = desc->control_bits;
>
> - card->current_sector += (len >> 9);
> - idx++;
> - card->current_idx = idx;
> - if (idx >= bio->bi_vcnt)
> +
> + bio_advance_iter(bio, &card->current_iter, vec.bv_len);
> + if (!card->current_iter.bi_size)
> card->currentbio = NULL;
>
> return 1;
> @@ -439,23 +433,25 @@ static void process_page(unsigned long data)
> struct mm_dma_desc *desc = &page->desc[page->headcnt];
> int control = le32_to_cpu(desc->sem_control_bits);
> int last = 0;
> - int idx;
> + struct bio_vec vec;
>
> if (!(control & DMASCR_DMA_COMPLETE)) {
> control = dma_status;
> last = 1;
> }
> +
> page->headcnt++;
> - idx = page->idx;
> - page->idx++;
> - if (page->idx >= bio->bi_vcnt) {
> + vec = bio_iovec_iter(bio, page->iter);
> + bio_advance_iter(bio, &page->iter, vec.bv_len);
> +
> + if (!page->iter.bi_size) {
> page->bio = bio->bi_next;
> if (page->bio)
> - page->idx = page->bio->bi_iter.bi_idx;
> + page->iter = page->bio->bi_iter;
> }
>
> pci_unmap_page(card->dev, desc->data_dma_handle,
> - bio_iovec_idx(bio, idx)->bv_len,
> + vec.bv_len,
> (control & DMASCR_TRANSFER_READ) ?
> PCI_DMA_TODEVICE : PCI_DMA_FROMDEVICE);
> if (control & DMASCR_HARD_ERROR) {
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index fca3bba..d97d824 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -38,10 +38,8 @@ struct convert_context {
> struct completion restart;
> struct bio *bio_in;
> struct bio *bio_out;
> - unsigned int offset_in;
> - unsigned int offset_out;
> - unsigned int idx_in;
> - unsigned int idx_out;
> + struct bvec_iter iter_in;
> + struct bvec_iter iter_out;
> sector_t cc_sector;
> atomic_t cc_pending;
> };
> @@ -650,10 +648,12 @@ static void crypt_convert_init(struct crypt_config *cc,
> {
> ctx->bio_in = bio_in;
> ctx->bio_out = bio_out;
> - ctx->offset_in = 0;
> - ctx->offset_out = 0;
> - ctx->idx_in = bio_in ? bio_in->bi_iter.bi_idx : 0;
> - ctx->idx_out = bio_out ? bio_out->bi_iter.bi_idx : 0;
> +
> + if (bio_in)
> + ctx->iter_in = bio_in->bi_iter;
> + if (bio_out)
> + ctx->iter_out = bio_out->bi_iter;
> +
> ctx->cc_sector = sector + cc->iv_offset;
> init_completion(&ctx->restart);
> }
> @@ -681,8 +681,8 @@ static int crypt_convert_block(struct crypt_config *cc,
> struct convert_context *ctx,
> struct ablkcipher_request *req)
> {
> - struct bio_vec *bv_in = bio_iovec_idx(ctx->bio_in, ctx->idx_in);
> - struct bio_vec *bv_out = bio_iovec_idx(ctx->bio_out, ctx->idx_out);
> + struct bio_vec bv_in = bio_iovec_iter(ctx->bio_in, ctx->iter_in);
> + struct bio_vec bv_out = bio_iovec_iter(ctx->bio_out, ctx->iter_out);
> struct dm_crypt_request *dmreq;
> u8 *iv;
> int r;
> @@ -693,24 +693,15 @@ static int crypt_convert_block(struct crypt_config *cc,
> dmreq->iv_sector = ctx->cc_sector;
> dmreq->ctx = ctx;
> sg_init_table(&dmreq->sg_in, 1);
> - sg_set_page(&dmreq->sg_in, bv_in->bv_page, 1 << SECTOR_SHIFT,
> - bv_in->bv_offset + ctx->offset_in);
> + sg_set_page(&dmreq->sg_in, bv_in.bv_page, 1 << SECTOR_SHIFT,
> + bv_in.bv_offset);
>
> sg_init_table(&dmreq->sg_out, 1);
> - sg_set_page(&dmreq->sg_out, bv_out->bv_page, 1 << SECTOR_SHIFT,
> - bv_out->bv_offset + ctx->offset_out);
> + sg_set_page(&dmreq->sg_out, bv_out.bv_page, 1 << SECTOR_SHIFT,
> + bv_out.bv_offset);
>
> - ctx->offset_in += 1 << SECTOR_SHIFT;
> - if (ctx->offset_in >= bv_in->bv_len) {
> - ctx->offset_in = 0;
> - ctx->idx_in++;
> - }
> -
> - ctx->offset_out += 1 << SECTOR_SHIFT;
> - if (ctx->offset_out >= bv_out->bv_len) {
> - ctx->offset_out = 0;
> - ctx->idx_out++;
> - }
> + bio_advance_iter(ctx->bio_in, &ctx->iter_in, 1 << SECTOR_SHIFT);
> + bio_advance_iter(ctx->bio_out, &ctx->iter_out, 1 << SECTOR_SHIFT);
>
> if (cc->iv_gen_ops) {
> r = cc->iv_gen_ops->generator(cc, iv, dmreq);
> @@ -761,8 +752,8 @@ static int crypt_convert(struct crypt_config *cc,
>
> atomic_set(&ctx->cc_pending, 1);
>
> - while(ctx->idx_in < ctx->bio_in->bi_vcnt &&
> - ctx->idx_out < ctx->bio_out->bi_vcnt) {
> + while (ctx->iter_in.bi_size &&
> + ctx->iter_out.bi_size) {
>
> crypt_alloc_req(cc, ctx);
>
> @@ -1031,7 +1022,7 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
> }
>
> /* crypt_convert should have filled the clone bio */
> - BUG_ON(io->ctx.idx_out < clone->bi_vcnt);
> + BUG_ON(io->ctx.iter_out.bi_size);
>
> clone->bi_iter.bi_sector = cc->start + io->sector;
>
> @@ -1070,7 +1061,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
> }
>
> io->ctx.bio_out = clone;
> - io->ctx.idx_out = 0;
> + io->ctx.iter_out = clone->bi_iter;
>
> remaining -= clone->bi_iter.bi_size;
> sector += bio_sectors(clone);
> @@ -1114,8 +1105,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
> crypt_inc_pending(new_io);
> crypt_convert_init(cc, &new_io->ctx, NULL,
> io->base_bio, sector);
> - new_io->ctx.idx_in = io->ctx.idx_in;
> - new_io->ctx.offset_in = io->ctx.offset_in;
> + new_io->ctx.iter_in = io->ctx.iter_in;
>
> /*
> * Fragments after the first use the base_io
> diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
> index a6de5c9..c2a6c34 100644
> --- a/drivers/md/dm-io.c
> +++ b/drivers/md/dm-io.c
> @@ -202,26 +202,29 @@ static void list_dp_init(struct dpages *dp, struct page_list *pl, unsigned offse
> /*
> * Functions for getting the pages from a bvec.
> */
> -static void bvec_get_page(struct dpages *dp,
> +static void bio_get_page(struct dpages *dp,
> struct page **p, unsigned long *len, unsigned *offset)
> {
> - struct bio_vec *bvec = (struct bio_vec *) dp->context_ptr;
> - *p = bvec->bv_page;
> - *len = bvec->bv_len;
> - *offset = bvec->bv_offset;
> + struct bio *bio = dp->context_ptr;
> + struct bio_vec bvec = bio_iovec(bio);
> + *p = bvec.bv_page;
> + *len = bvec.bv_len;
> + *offset = bvec.bv_offset;
> }
>
> -static void bvec_next_page(struct dpages *dp)
> +static void bio_next_page(struct dpages *dp)
> {
> - struct bio_vec *bvec = (struct bio_vec *) dp->context_ptr;
> - dp->context_ptr = bvec + 1;
> + struct bio *bio = dp->context_ptr;
> + struct bio_vec bvec = bio_iovec(bio);
> +
> + bio_advance(bio, bvec.bv_len);
> }
>
> -static void bvec_dp_init(struct dpages *dp, struct bio_vec *bvec)
> +static void bio_dp_init(struct dpages *dp, struct bio *bio)
> {
> - dp->get_page = bvec_get_page;
> - dp->next_page = bvec_next_page;
> - dp->context_ptr = bvec;
> + dp->get_page = bio_get_page;
> + dp->next_page = bio_next_page;
> + dp->context_ptr = bio;
> }
>
> /*
> @@ -459,8 +462,8 @@ static int dp_init(struct dm_io_request *io_req, struct dpages *dp,
> list_dp_init(dp, io_req->mem.ptr.pl, io_req->mem.offset);
> break;
>
> - case DM_IO_BVEC:
> - bvec_dp_init(dp, io_req->mem.ptr.bvec);
> + case DM_IO_BIO:
> + bio_dp_init(dp, io_req->mem.ptr.bio);
> break;
>
> case DM_IO_VMA:
> diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
> index e3efb91..56e8844 100644
> --- a/drivers/md/dm-raid1.c
> +++ b/drivers/md/dm-raid1.c
> @@ -526,8 +526,8 @@ static void read_async_bio(struct mirror *m, struct bio *bio)
> struct dm_io_region io;
> struct dm_io_request io_req = {
> .bi_rw = READ,
> - .mem.type = DM_IO_BVEC,
> - .mem.ptr.bvec = bio->bi_io_vec + bio->bi_iter.bi_idx,
> + .mem.type = DM_IO_BIO,
> + .mem.ptr.bio = bio,
> .notify.fn = read_callback,
> .notify.context = bio,
> .client = m->ms->io_client,
> @@ -629,8 +629,8 @@ static void do_write(struct mirror_set *ms, struct bio *bio)
> struct mirror *m;
> struct dm_io_request io_req = {
> .bi_rw = WRITE | (bio->bi_rw & WRITE_FLUSH_FUA),
> - .mem.type = DM_IO_BVEC,
> - .mem.ptr.bvec = bio->bi_io_vec + bio->bi_iter.bi_idx,
> + .mem.type = DM_IO_BIO,
> + .mem.ptr.bio = bio,
> .notify.fn = write_callback,
> .notify.context = bio,
> .client = ms->io_client,
> diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c
> index f3a4dcb..5e82c79 100644
> --- a/drivers/md/dm-verity.c
> +++ b/drivers/md/dm-verity.c
> @@ -73,15 +73,10 @@ struct dm_verity_io {
> sector_t block;
> unsigned n_blocks;
>
> - /* saved bio vector */
> - struct bio_vec *io_vec;
> - unsigned io_vec_size;
> + struct bvec_iter iter;
>
> struct work_struct work;
>
> - /* A space for short vectors; longer vectors are allocated separately. */
> - struct bio_vec io_vec_inline[DM_VERITY_IO_VEC_INLINE];
> -
> /*
> * Three variably-size fields follow this struct:
> *
> @@ -284,9 +279,10 @@ release_ret_r:
> static int verity_verify_io(struct dm_verity_io *io)
> {
> struct dm_verity *v = io->v;
> + struct bio *bio = dm_bio_from_per_bio_data(io,
> + v->ti->per_bio_data_size);
> unsigned b;
> int i;
> - unsigned vector = 0, offset = 0;
>
> for (b = 0; b < io->n_blocks; b++) {
> struct shash_desc *desc;
> @@ -336,31 +332,22 @@ test_block_hash:
> }
>
> todo = 1 << v->data_dev_block_bits;
> - do {
> - struct bio_vec *bv;
> + while (io->iter.bi_size) {
> u8 *page;
> - unsigned len;
> -
> - BUG_ON(vector >= io->io_vec_size);
> - bv = &io->io_vec[vector];
> - page = kmap_atomic(bv->bv_page);
> - len = bv->bv_len - offset;
> - if (likely(len >= todo))
> - len = todo;
> - r = crypto_shash_update(desc,
> - page + bv->bv_offset + offset, len);
> + struct bio_vec bv = bio_iovec_iter(bio, io->iter);
> +
> + page = kmap_atomic(bv.bv_page);
> + r = crypto_shash_update(desc, page + bv.bv_offset,
> + bv.bv_len);
> kunmap_atomic(page);
> +
> if (r < 0) {
> DMERR("crypto_shash_update failed: %d", r);
> return r;
> }
> - offset += len;
> - if (likely(offset == bv->bv_len)) {
> - offset = 0;
> - vector++;
> - }
> - todo -= len;
> - } while (todo);
> +
> + bio_advance_iter(bio, &io->iter, bv.bv_len);
> + }
>
> if (!v->version) {
> r = crypto_shash_update(desc, v->salt, v->salt_size);
> @@ -383,8 +370,6 @@ test_block_hash:
> return -EIO;
> }
> }
> - BUG_ON(vector != io->io_vec_size);
> - BUG_ON(offset);
>
> return 0;
> }
> @@ -400,9 +385,6 @@ static void verity_finish_io(struct dm_verity_io *io, int error)
> bio->bi_end_io = io->orig_bi_end_io;
> bio->bi_private = io->orig_bi_private;
>
> - if (io->io_vec != io->io_vec_inline)
> - mempool_free(io->io_vec, v->vec_mempool);
> -
> bio_endio(bio, error);
> }
>
> @@ -520,13 +502,7 @@ static int verity_map(struct dm_target *ti, struct bio *bio)
>
> bio->bi_end_io = verity_end_io;
> bio->bi_private = io;
> - io->io_vec_size = bio_segments(bio);
> - if (io->io_vec_size < DM_VERITY_IO_VEC_INLINE)
> - io->io_vec = io->io_vec_inline;
> - else
> - io->io_vec = mempool_alloc(v->vec_mempool, GFP_NOIO);
> - memcpy(io->io_vec, __bio_iovec(bio),
> - io->io_vec_size * sizeof(struct bio_vec));
> + io->iter = bio->bi_iter;
>
> verity_submit_prefetch(v, io);
>
> diff --git a/include/linux/dm-io.h b/include/linux/dm-io.h
> index f4b0aa3..6cf1f62 100644
> --- a/include/linux/dm-io.h
> +++ b/include/linux/dm-io.h
> @@ -29,7 +29,7 @@ typedef void (*io_notify_fn)(unsigned long error, void *context);
>
> enum dm_io_mem_type {
> DM_IO_PAGE_LIST,/* Page list */
> - DM_IO_BVEC, /* Bio vector */
> + DM_IO_BIO,
> DM_IO_VMA, /* Virtual memory area */
> DM_IO_KMEM, /* Kernel memory */
> };
> @@ -41,7 +41,7 @@ struct dm_io_memory {
>
> union {
> struct page_list *pl;
> - struct bio_vec *bvec;
> + struct bio *bio;
> void *vma;
> void *addr;
> } ptr;
> --
> 1.8.3.rc1
>

--
Ed Cashin
[email protected]