2012-08-06 22:09:09

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 00/12] Block cleanups

Various cleanups for the generic block layer - the patches should be pretty
self explanatory.

CHANGES SINCE LAST VERSION:

Review feedback - should all be noted in the patch descriptions. Fixed
retarded rebase conflicts. Added some acked-bys.

Kent Overstreet (12):
block: Generalized bio pool freeing
dm: Use bioset's front_pad for dm_rq_clone_bio_info
block: Add bio_reset()
pktcdvd: Switch to bio_kmalloc()
block: Kill bi_destructor
block: Add an explicit bio flag for bios that own their bvec
block: Rename bio_split() -> bio_pair_split()
block: Introduce new bio_split()
block: Rework bio_pair_split()
block: Add bio_clone_kmalloc()
block: Add bio_clone_bioset()
block: Only clone bio vecs that are in use

Documentation/block/biodoc.txt | 5 -
block/blk-core.c | 10 +-
drivers/block/drbd/drbd_main.c | 13 +--
drivers/block/drbd/drbd_req.c | 18 +--
drivers/block/osdblk.c | 3 +-
drivers/block/pktcdvd.c | 73 +++-------
drivers/block/rbd.c | 8 +-
drivers/md/dm-crypt.c | 9 --
drivers/md/dm-io.c | 11 --
drivers/md/dm.c | 60 ++-------
drivers/md/linear.c | 6 +-
drivers/md/md.c | 44 +------
drivers/md/raid0.c | 8 +-
drivers/md/raid10.c | 23 +---
drivers/target/target_core_iblock.c | 9 --
fs/bio-integrity.c | 44 ------
fs/bio.c | 259 +++++++++++++++++++++++------------
fs/exofs/ore.c | 5 +-
include/linux/bio.h | 37 ++---
include/linux/blk_types.h | 12 ++-
20 files changed, 256 insertions(+), 401 deletions(-)

--
1.7.7.3


2012-08-06 22:09:23

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 08/12] block: Introduce new bio_split()

The new bio_split() can split arbitrary bios - it's not restricted to
single page bios, like the old bio_split() (previously renamed to
bio_pair_split()). It also has different semantics - it doesn't allocate
a struct bio_pair, leaving it up to the caller to handle completions.

v5: Take out current->bio_list check and make it the caller's
responsibility, per Boaz

Signed-off-by: Kent Overstreet <[email protected]>
---
fs/bio.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/bio.h | 3 ++
2 files changed, 99 insertions(+), 0 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 0470376..312e5de 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1537,6 +1537,102 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
EXPORT_SYMBOL(bio_pair_split);

/**
+ * bio_split - split a bio
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
+ * unchanged.
+ *
+ * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
+ * bvec boundry; it is the caller's responsibility to ensure that @bio is not
+ * freed before the split.
+ *
+ * BIG FAT WARNING:
+ *
+ * If you're calling this from under generic_make_request() (i.e.
+ * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
+ * workqueue if the allocation fails. Otherwise, your code will probably
+ * deadlock.
+ *
+ * You can't allocate more than once from the same bio pool without submitting
+ * the previous allocations (so they'll eventually complete and deallocate
+ * themselves), but if you're under generic_make_request() those previous
+ * allocations won't submit until you return . And if you have to split bios,
+ * you should expect that some bios will require multiple splits.
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+ gfp_t gfp, struct bio_set *bs)
+{
+ unsigned idx, vcnt = 0, nbytes = sectors << 9;
+ struct bio_vec *bv;
+ struct bio *ret = NULL;
+
+ BUG_ON(sectors <= 0);
+
+ if (sectors >= bio_sectors(bio))
+ return bio;
+
+ trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
+ bio->bi_sector + sectors);
+
+ bio_for_each_segment(bv, bio, idx) {
+ vcnt = idx - bio->bi_idx;
+
+ if (!nbytes) {
+ ret = bio_alloc_bioset(gfp, 0, bs);
+ if (!ret)
+ return NULL;
+
+ ret->bi_io_vec = bio_iovec(bio);
+ ret->bi_flags |= 1 << BIO_CLONED;
+ break;
+ } else if (nbytes < bv->bv_len) {
+ ret = bio_alloc_bioset(gfp, ++vcnt, bs);
+ if (!ret)
+ return NULL;
+
+ memcpy(ret->bi_io_vec, bio_iovec(bio),
+ sizeof(struct bio_vec) * vcnt);
+
+ ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
+ bv->bv_offset += nbytes;
+ bv->bv_len -= nbytes;
+ break;
+ }
+
+ nbytes -= bv->bv_len;
+ }
+
+ ret->bi_bdev = bio->bi_bdev;
+ ret->bi_sector = bio->bi_sector;
+ ret->bi_size = sectors << 9;
+ ret->bi_rw = bio->bi_rw;
+ ret->bi_vcnt = vcnt;
+ ret->bi_max_vecs = vcnt;
+ ret->bi_end_io = bio->bi_end_io;
+ ret->bi_private = bio->bi_private;
+
+ bio->bi_sector += sectors;
+ bio->bi_size -= sectors << 9;
+ bio->bi_idx = idx;
+
+ if (bio_integrity(bio)) {
+ bio_integrity_clone(ret, bio, gfp, bs);
+ bio_integrity_trim(ret, 0, bio_sectors(ret));
+ bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(bio_split);
+
+/**
* bio_sector_offset - Find hardware sector offset in bio
* @bio: bio to inspect
* @index: bio_vec index
diff --git a/include/linux/bio.h b/include/linux/bio.h
index fdcc8dc..2d06262 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -201,6 +201,9 @@ struct bio_pair {
atomic_t cnt;
int error;
};
+
+extern struct bio *bio_split(struct bio *bio, int sectors,
+ gfp_t gfp, struct bio_set *bs);
extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
extern void bio_pair_release(struct bio_pair *dbio);

--
1.7.7.3

2012-08-06 22:09:27

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 11/12] block: Add bio_clone_bioset()

This consolidates some code, and will help in a later patch changing how
bio cloning works.

Signed-off-by: Kent Overstreet <[email protected]>
---
block/blk-core.c | 8 +-------
drivers/md/dm.c | 4 ++--
drivers/md/md.c | 20 +-------------------
fs/bio.c | 16 ++++++++++++----
include/linux/bio.h | 1 +
5 files changed, 17 insertions(+), 32 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e9058c2..10a6e08 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2768,16 +2768,10 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
blk_rq_init(NULL, rq);

__rq_for_each_bio(bio_src, rq_src) {
- bio = bio_alloc_bioset(gfp_mask, bio_src->bi_max_vecs, bs);
+ bio = bio_clone_bioset(bio_src, gfp_mask, bs);
if (!bio)
goto free_and_out;

- __bio_clone(bio, bio_src);
-
- if (bio_integrity(bio_src) &&
- bio_integrity_clone(bio, bio_src, gfp_mask, bs))
- goto free_and_out;
-
if (bio_ctr && bio_ctr(bio, bio_src, data))
goto free_and_out;

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 4014696..3f3c26e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1101,8 +1101,8 @@ static void __issue_target_request(struct clone_info *ci, struct dm_target *ti,
* ci->bio->bi_max_vecs is BIO_INLINE_VECS anyway, for both flush
* and discard, so no need for concern about wasted bvec allocations.
*/
- clone = bio_alloc_bioset(GFP_NOIO, ci->bio->bi_max_vecs, ci->md->bs);
- __bio_clone(clone, ci->bio);
+ clone = bio_clone_bioset(ci->bio, GFP_NOIO, ci->md->bs);
+
if (len) {
clone->bi_sector = ci->sector;
clone->bi_size = to_bytes(len);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f9d16dc..069c3bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -173,28 +173,10 @@ EXPORT_SYMBOL_GPL(bio_alloc_mddev);
struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
struct mddev *mddev)
{
- struct bio *b;
-
if (!mddev || !mddev->bio_set)
return bio_clone(bio, gfp_mask);

- b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, mddev->bio_set);
- if (!b)
- return NULL;
-
- __bio_clone(b, bio);
- if (bio_integrity(bio)) {
- int ret;
-
- ret = bio_integrity_clone(b, bio, gfp_mask, mddev->bio_set);
-
- if (ret < 0) {
- bio_put(b);
- return NULL;
- }
- }
-
- return b;
+ return bio_clone_bioset(bio, gfp_mask, mddev->bio_set);
}
EXPORT_SYMBOL_GPL(bio_clone_mddev);

diff --git a/fs/bio.c b/fs/bio.c
index 77b9313..38ad026 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -467,15 +467,17 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
EXPORT_SYMBOL(__bio_clone);

/**
- * bio_clone - clone a bio
+ * bio_clone_bioset - clone a bio
* @bio: bio to clone
* @gfp_mask: allocation priority
+ * @bs: bio_set to allocate from
*
* Like __bio_clone, only also allocates the returned bio
*/
-struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
+struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
+ struct bio_set *bs)
{
- struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);
+ struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);

if (!b)
return NULL;
@@ -485,7 +487,7 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
if (bio_integrity(bio)) {
int ret;

- ret = bio_integrity_clone(b, bio, gfp_mask, fs_bio_set);
+ ret = bio_integrity_clone(b, bio, gfp_mask, bs);

if (ret < 0) {
bio_put(b);
@@ -495,6 +497,12 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)

return b;
}
+EXPORT_SYMBOL(bio_clone_bioset);
+
+struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
+{
+ return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
+}
EXPORT_SYMBOL(bio_clone);

struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
diff --git a/include/linux/bio.h b/include/linux/bio.h
index e180f1d..fb90584 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -220,6 +220,7 @@ struct request_queue;
extern int bio_phys_segments(struct request_queue *, struct bio *);

extern void __bio_clone(struct bio *, struct bio *);
+extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);
extern struct bio *bio_clone(struct bio *, gfp_t);
struct bio *bio_clone_kmalloc(struct bio *, gfp_t);

--
1.7.7.3

2012-08-06 22:09:43

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 12/12] block: Only clone bio vecs that are in use

bcache creates large bios internally, and then splits them according to
the device requirements before it sends them down. If a lower level
device tries to clone the bio, and the original bio had more than
BIO_MAX_PAGES, the clone will fail unecessarily.

We can fix this by only cloning the bio vecs that are actually in use.

Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/block/rbd.c | 2 +-
drivers/md/dm.c | 5 ++---
fs/bio.c | 13 +++++++------
3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 692cf05..21edfe5 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -714,7 +714,7 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
}

while (old_chain && (total < len)) {
- tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+ tmp = bio_kmalloc(gfpmask, bio_segments(old_chain));
if (!tmp)
goto err_out;

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3f3c26e..193fb19 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1057,11 +1057,10 @@ static struct bio *clone_bio(struct bio *bio, sector_t sector,
{
struct bio *clone;

- clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
+ clone = bio_alloc_bioset(GFP_NOIO, bv_count, bs);
__bio_clone(clone, bio);
clone->bi_sector = sector;
- clone->bi_idx = idx;
- clone->bi_vcnt = idx + bv_count;
+ clone->bi_vcnt = bv_count;
clone->bi_size = to_bytes(len);
clone->bi_flags &= ~(1 << BIO_SEG_VALID);

diff --git a/fs/bio.c b/fs/bio.c
index 38ad026..631c67e 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -449,8 +449,9 @@ EXPORT_SYMBOL(bio_phys_segments);
*/
void __bio_clone(struct bio *bio, struct bio *bio_src)
{
- memcpy(bio->bi_io_vec, bio_src->bi_io_vec,
- bio_src->bi_max_vecs * sizeof(struct bio_vec));
+ memcpy(bio->bi_io_vec,
+ bio_iovec(bio_src),
+ bio_segments(bio_src) * sizeof(struct bio_vec));

/*
* most users will be overriding ->bi_bdev with a new target,
@@ -459,10 +460,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
bio->bi_sector = bio_src->bi_sector;
bio->bi_bdev = bio_src->bi_bdev;
bio->bi_flags |= 1 << BIO_CLONED;
+ bio->bi_flags &= ~(1 << BIO_SEG_VALID);
bio->bi_rw = bio_src->bi_rw;
- bio->bi_vcnt = bio_src->bi_vcnt;
+ bio->bi_vcnt = bio_segments(bio_src);
bio->bi_size = bio_src->bi_size;
- bio->bi_idx = bio_src->bi_idx;
}
EXPORT_SYMBOL(__bio_clone);

@@ -477,7 +478,7 @@ EXPORT_SYMBOL(__bio_clone);
struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
struct bio_set *bs)
{
- struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
+ struct bio *b = bio_alloc_bioset(gfp_mask, bio_segments(bio), bs);

if (!b)
return NULL;
@@ -507,7 +508,7 @@ EXPORT_SYMBOL(bio_clone);

struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
{
- struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
+ struct bio *b = bio_kmalloc(gfp_mask, bio_segments(bio));

if (!b)
return NULL;
--
1.7.7.3

2012-08-06 22:09:21

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 03/12] block: Add bio_reset()

Reusing bios is something that's been highly frowned upon in the past,
but driver code keeps doing it anyways. If it's going to happen anyways,
we should provide a generic method.

This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
was open coding it, by doing a bio_init() and resetting bi_destructor.

v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
bio->bi_flags are saved.

Signed-off-by: Kent Overstreet <[email protected]>
Change-Id: I4eb2975bd678d3be811d5423d0620b08020be9ff
---
fs/bio.c | 9 +++++++++
include/linux/bio.h | 1 +
include/linux/blk_types.h | 9 +++++++++
3 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 1c6c8b7..c7f3442 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -261,6 +261,15 @@ void bio_init(struct bio *bio)
}
EXPORT_SYMBOL(bio_init);

+void bio_reset(struct bio *bio)
+{
+ unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
+
+ memset(bio, 0, BIO_RESET_BYTES);
+ bio->bi_flags = flags|(1 << BIO_UPTODATE);
+}
+EXPORT_SYMBOL(bio_reset);
+
/**
* bio_alloc_bioset - allocate a bio for I/O
* @gfp_mask: the GFP_ mask given to the slab allocator
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2643589..ba60319 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -226,6 +226,7 @@ extern void __bio_clone(struct bio *, struct bio *);
extern struct bio *bio_clone(struct bio *, gfp_t);

extern void bio_init(struct bio *);
+extern void bio_reset(struct bio *);

extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 293547e..401c573 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -59,6 +59,10 @@ struct bio {
unsigned int bi_seg_front_size;
unsigned int bi_seg_back_size;

+ /*
+ * Everything starting with bi_max_vecs will be preserved by bio_reset()
+ */
+
unsigned int bi_max_vecs; /* max bvl_vecs we can hold */

atomic_t bi_cnt; /* pin count */
@@ -93,6 +97,8 @@ struct bio {
struct bio_vec bi_inline_vecs[0];
};

+#define BIO_RESET_BYTES offsetof(struct bio, bi_max_vecs)
+
/*
* bio flags
*/
@@ -108,6 +114,9 @@ struct bio {
#define BIO_FS_INTEGRITY 9 /* fs owns integrity data, not block layer */
#define BIO_QUIET 10 /* Make BIO Quiet */
#define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */
+
+#define BIO_RESET_BITS 12 /* Flags starting here get preserved by bio_reset() */
+
#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))

/*
--
1.7.7.3

2012-08-06 22:10:07

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 10/12] block: Add bio_clone_kmalloc()

Acked-by: Boaz Harrosh <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/block/osdblk.c | 3 +--
fs/bio.c | 13 +++++++++++++
fs/exofs/ore.c | 5 ++---
include/linux/bio.h | 1 +
4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
index 87311eb..1bbc681 100644
--- a/drivers/block/osdblk.c
+++ b/drivers/block/osdblk.c
@@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
struct bio *tmp, *new_chain = NULL, *tail = NULL;

while (old_chain) {
- tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+ tmp = bio_clone_kmalloc(old_chain, gfpmask);
if (!tmp)
goto err_out;

- __bio_clone(tmp, old_chain);
tmp->bi_bdev = NULL;
gfpmask &= ~__GFP_WAIT;
tmp->bi_next = NULL;
diff --git a/fs/bio.c b/fs/bio.c
index f0c865b..77b9313 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -497,6 +497,19 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
}
EXPORT_SYMBOL(bio_clone);

+struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
+{
+ struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
+
+ if (!b)
+ return NULL;
+
+ __bio_clone(b, bio);
+
+ return b;
+}
+EXPORT_SYMBOL(bio_clone_kmalloc);
+
/**
* bio_get_nr_vecs - return approx number of vecs
* @bdev: I/O target
diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index 24a49d4..a8d92fc 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -814,8 +814,8 @@ static int _write_mirror(struct ore_io_state *ios, int cur_comp)
struct bio *bio;

if (per_dev != master_dev) {
- bio = bio_kmalloc(GFP_KERNEL,
- master_dev->bio->bi_max_vecs);
+ bio = bio_clone_kmalloc(master_dev->bio,
+ GFP_KERNEL);
if (unlikely(!bio)) {
ORE_DBGMSG(
"Failed to allocate BIO size=%u\n",
@@ -824,7 +824,6 @@ static int _write_mirror(struct ore_io_state *ios, int cur_comp)
goto out;
}

- __bio_clone(bio, master_dev->bio);
bio->bi_bdev = NULL;
bio->bi_next = NULL;
per_dev->offset = master_dev->offset;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 9720544..e180f1d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -221,6 +221,7 @@ extern int bio_phys_segments(struct request_queue *, struct bio *);

extern void __bio_clone(struct bio *, struct bio *);
extern struct bio *bio_clone(struct bio *, gfp_t);
+struct bio *bio_clone_kmalloc(struct bio *, gfp_t);

extern void bio_init(struct bio *);
extern void bio_reset(struct bio *);
--
1.7.7.3

2012-08-06 22:10:35

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 09/12] block: Rework bio_pair_split()

This changes bio_pair_split() to use the new bio_split() underneath,
which gets rid of the single page bio limitation. The various callers
are fixed up for the slightly different struct bio_pair, and to remove
the unnecessary checks.

v5: Move extern declaration to proper patch, per Boaz

Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/block/drbd/drbd_req.c | 16 +-------
drivers/block/pktcdvd.c | 4 +-
drivers/block/rbd.c | 7 ++--
drivers/md/linear.c | 4 +-
drivers/md/raid0.c | 6 ++--
drivers/md/raid10.c | 21 ++---------
fs/bio-integrity.c | 44 -----------------------
fs/bio.c | 78 ++++++++++++----------------------------
include/linux/bio.h | 22 ++++--------
9 files changed, 47 insertions(+), 155 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index b5cfa3b..fea4a40 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1123,18 +1123,6 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
do {
inc_ap_bio(mdev, 1);
} while (drbd_make_request_common(mdev, bio, start_time));
- return;
- }
-
- /* can this bio be split generically?
- * Maybe add our own split-arbitrary-bios function. */
- if (bio->bi_vcnt != 1 || bio->bi_idx != 0 || bio->bi_size > DRBD_MAX_BIO_SIZE) {
- /* rather error out here than BUG in bio_split */
- dev_err(DEV, "bio would need to, but cannot, be split: "
- "(vcnt=%u,idx=%u,size=%u,sector=%llu)\n",
- bio->bi_vcnt, bio->bi_idx, bio->bi_size,
- (unsigned long long)bio->bi_sector);
- bio_endio(bio, -EINVAL);
} else {
/* This bio crosses some boundary, so we have to split it. */
struct bio_pair *bp;
@@ -1161,10 +1149,10 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)

D_ASSERT(e_enr == s_enr + 1);

- while (drbd_make_request_common(mdev, &bp->bio1, start_time))
+ while (drbd_make_request_common(mdev, &bp->split, start_time))
inc_ap_bio(mdev, 1);

- while (drbd_make_request_common(mdev, &bp->bio2, start_time))
+ while (drbd_make_request_common(mdev, bio, start_time))
inc_ap_bio(mdev, 1);

dec_ap_bio(mdev);
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 18393a1..6709c1d 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2471,8 +2471,8 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
first_sectors = last_zone - bio->bi_sector;
bp = bio_pair_split(bio, first_sectors);
BUG_ON(!bp);
- pkt_make_request(q, &bp->bio1);
- pkt_make_request(q, &bp->bio2);
+ pkt_make_request(q, &bp->split);
+ pkt_make_request(q, bio);
bio_pair_release(bp);
return;
}
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e33c224..692cf05 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -732,14 +732,13 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,

/* split the bio. We'll release it either in the next
call, or it will have to be released outside */
- bp = bio_pair_split(old_chain,
- (len - total) / SECTOR_SIZE);
+ bp = bio_pair_split(old_chain, (len - total) / SECTOR_SIZE);
if (!bp)
goto err_out;

- __bio_clone(tmp, &bp->bio1);
+ __bio_clone(tmp, &bp->split);

- *next = &bp->bio2;
+ *next = bp->orig;
} else {
__bio_clone(tmp, old_chain);
*next = old_chain->bi_next;
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index e860cb9..7c6cafd 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -316,8 +316,8 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)

bp = bio_pair_split(bio, end_sector - bio->bi_sector);

- linear_make_request(mddev, &bp->bio1);
- linear_make_request(mddev, &bp->bio2);
+ linear_make_request(mddev, &bp->split);
+ linear_make_request(mddev, bio);
bio_pair_release(bp);
return;
}
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index c89c8aa..3469adf 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -520,9 +520,9 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
(chunk_sects-1)));
else
bp = bio_pair_split(bio, chunk_sects -
- sector_div(sector, chunk_sects));
- raid0_make_request(mddev, &bp->bio1);
- raid0_make_request(mddev, &bp->bio2);
+ sector_div(sector, chunk_sects));
+ raid0_make_request(mddev, &bp->split);
+ raid0_make_request(mddev, bio);
bio_pair_release(bp);
return;
}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index be75924..1a67a78 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1056,15 +1056,9 @@ static void make_request(struct mddev *mddev, struct bio * bio)
&& (conf->geo.near_copies < conf->geo.raid_disks
|| conf->prev.near_copies < conf->prev.raid_disks))) {
struct bio_pair *bp;
- /* Sanity check -- queue functions should prevent this happening */
- if (bio->bi_vcnt != 1 ||
- bio->bi_idx != 0)
- goto bad_map;
- /* This is a one page bio that upper layers
- * refuse to split for us, so we need to split it.
- */
+
bp = bio_pair_split(bio,
- chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+ chunk_sects - (bio->bi_sector & (chunk_sects - 1)));

/* Each of these 'make_request' calls will call 'wait_barrier'.
* If the first succeeds but the second blocks due to the resync
@@ -1078,8 +1072,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
conf->nr_waiting++;
spin_unlock_irq(&conf->resync_lock);

- make_request(mddev, &bp->bio1);
- make_request(mddev, &bp->bio2);
+ make_request(mddev, &bp->split);
+ make_request(mddev, bio);

spin_lock_irq(&conf->resync_lock);
conf->nr_waiting--;
@@ -1088,13 +1082,6 @@ static void make_request(struct mddev *mddev, struct bio * bio)

bio_pair_release(bp);
return;
- bad_map:
- printk("md/raid10:%s: make_request bug: can't convert block across chunks"
- " or bigger than %dk %llu %d\n", mdname(mddev), chunk_sects/2,
- (unsigned long long)bio->bi_sector, bio->bi_size >> 10);
-
- bio_io_error(bio);
- return;
}

md_write_start(mddev, bio);
diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
index e85c04b..9ed2c44 100644
--- a/fs/bio-integrity.c
+++ b/fs/bio-integrity.c
@@ -682,50 +682,6 @@ void bio_integrity_trim(struct bio *bio, unsigned int offset,
EXPORT_SYMBOL(bio_integrity_trim);

/**
- * bio_integrity_split - Split integrity metadata
- * @bio: Protected bio
- * @bp: Resulting bio_pair
- * @sectors: Offset
- *
- * Description: Splits an integrity page into a bio_pair.
- */
-void bio_integrity_split(struct bio *bio, struct bio_pair *bp, int sectors)
-{
- struct blk_integrity *bi;
- struct bio_integrity_payload *bip = bio->bi_integrity;
- unsigned int nr_sectors;
-
- if (bio_integrity(bio) == 0)
- return;
-
- bi = bdev_get_integrity(bio->bi_bdev);
- BUG_ON(bi == NULL);
- BUG_ON(bip->bip_vcnt != 1);
-
- nr_sectors = bio_integrity_hw_sectors(bi, sectors);
-
- bp->bio1.bi_integrity = &bp->bip1;
- bp->bio2.bi_integrity = &bp->bip2;
-
- bp->iv1 = bip->bip_vec[0];
- bp->iv2 = bip->bip_vec[0];
-
- bp->bip1.bip_vec[0] = bp->iv1;
- bp->bip2.bip_vec[0] = bp->iv2;
-
- bp->iv1.bv_len = sectors * bi->tuple_size;
- bp->iv2.bv_offset += sectors * bi->tuple_size;
- bp->iv2.bv_len -= sectors * bi->tuple_size;
-
- bp->bip1.bip_sector = bio->bi_integrity->bip_sector;
- bp->bip2.bip_sector = bio->bi_integrity->bip_sector + nr_sectors;
-
- bp->bip1.bip_vcnt = bp->bip2.bip_vcnt = 1;
- bp->bip1.bip_idx = bp->bip2.bip_idx = 0;
-}
-EXPORT_SYMBOL(bio_integrity_split);
-
-/**
* bio_integrity_clone - Callback for cloning bios with integrity metadata
* @bio: New bio
* @bio_src: Original bio
diff --git a/fs/bio.c b/fs/bio.c
index 312e5de..f0c865b 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -37,7 +37,7 @@
*/
#define BIO_INLINE_VECS 4

-static mempool_t *bio_split_pool __read_mostly;
+static struct bio_set *bio_split_pool __read_mostly;

/*
* if you change this list, also change bvec_alloc or things will
@@ -1460,77 +1460,48 @@ EXPORT_SYMBOL(bio_endio);
void bio_pair_release(struct bio_pair *bp)
{
if (atomic_dec_and_test(&bp->cnt)) {
- struct bio *master = bp->bio1.bi_private;
+ bp->orig->bi_end_io = bp->bi_end_io;
+ bp->orig->bi_private = bp->bi_private;

- bio_endio(master, bp->error);
- mempool_free(bp, bp->bio2.bi_private);
+ bio_endio(bp->orig, 0);
+ bio_put(&bp->split);
}
}
EXPORT_SYMBOL(bio_pair_release);

-static void bio_pair_end_1(struct bio *bi, int err)
+static void bio_pair_end(struct bio *bio, int error)
{
- struct bio_pair *bp = container_of(bi, struct bio_pair, bio1);
+ struct bio_pair *bp = bio->bi_private;

- if (err)
- bp->error = err;
+ if (error)
+ clear_bit(BIO_UPTODATE, &bp->orig->bi_flags);

bio_pair_release(bp);
}

-static void bio_pair_end_2(struct bio *bi, int err)
+struct bio_pair *bio_pair_split(struct bio *bio, int first_sectors)
{
- struct bio_pair *bp = container_of(bi, struct bio_pair, bio2);
-
- if (err)
- bp->error = err;
-
- bio_pair_release(bp);
-}
+ struct bio_pair *bp;
+ struct bio *split = bio_split(bio, first_sectors,
+ GFP_NOIO, bio_split_pool);

-/*
- * split a bio - only worry about a bio with a single page in its iovec
- */
-struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
-{
- struct bio_pair *bp = mempool_alloc(bio_split_pool, GFP_NOIO);
+ if (!split)
+ return NULL;

- if (!bp)
- return bp;
+ BUG_ON(split == bio);

- trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
- bi->bi_sector + first_sectors);
+ bp = container_of(split, struct bio_pair, split);

- BUG_ON(bi->bi_vcnt != 1);
- BUG_ON(bi->bi_idx != 0);
atomic_set(&bp->cnt, 3);
- bp->error = 0;
- bp->bio1 = *bi;
- bp->bio2 = *bi;
- bp->bio2.bi_sector += first_sectors;
- bp->bio2.bi_size -= first_sectors << 9;
- bp->bio1.bi_size = first_sectors << 9;
-
- bp->bv1 = bi->bi_io_vec[0];
- bp->bv2 = bi->bi_io_vec[0];
- bp->bv2.bv_offset += first_sectors << 9;
- bp->bv2.bv_len -= first_sectors << 9;
- bp->bv1.bv_len = first_sectors << 9;
-
- bp->bio1.bi_io_vec = &bp->bv1;
- bp->bio2.bi_io_vec = &bp->bv2;
-
- bp->bio1.bi_max_vecs = 1;
- bp->bio2.bi_max_vecs = 1;

- bp->bio1.bi_end_io = bio_pair_end_1;
- bp->bio2.bi_end_io = bio_pair_end_2;
+ bp->bi_end_io = bio->bi_end_io;
+ bp->bi_private = bio->bi_private;

- bp->bio1.bi_private = bi;
- bp->bio2.bi_private = bio_split_pool;
+ bio->bi_private = bp;
+ bio->bi_end_io = bio_pair_end;

- if (bio_integrity(bi))
- bio_integrity_split(bi, bp, first_sectors);
+ split->bi_private = bp;
+ split->bi_end_io = bio_pair_end;

return bp;
}
@@ -1841,8 +1812,7 @@ static int __init init_bio(void)
if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE))
panic("bio: can't create integrity pool\n");

- bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES,
- sizeof(struct bio_pair));
+ bio_split_pool = bioset_create(BIO_POOL_SIZE, offsetof(struct bio_pair, split));
if (!bio_split_pool)
panic("bio: can't create split pool\n");

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2d06262..9720544 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -192,14 +192,13 @@ struct bio_integrity_payload {
* in bio2.bi_private
*/
struct bio_pair {
- struct bio bio1, bio2;
- struct bio_vec bv1, bv2;
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
- struct bio_integrity_payload bip1, bip2;
- struct bio_vec iv1, iv2;
-#endif
- atomic_t cnt;
- int error;
+ atomic_t cnt;
+
+ bio_end_io_t *bi_end_io;
+ void *bi_private;
+
+ struct bio *orig;
+ struct bio split;
};

extern struct bio *bio_split(struct bio *bio, int sectors,
@@ -515,7 +514,6 @@ extern int bio_integrity_prep(struct bio *);
extern void bio_integrity_endio(struct bio *, int);
extern void bio_integrity_advance(struct bio *, unsigned int);
extern void bio_integrity_trim(struct bio *, unsigned int, unsigned int);
-extern void bio_integrity_split(struct bio *, struct bio_pair *, int);
extern int bio_integrity_clone(struct bio *, struct bio *, gfp_t, struct bio_set *);
extern int bioset_integrity_create(struct bio_set *, int);
extern void bioset_integrity_free(struct bio_set *);
@@ -559,12 +557,6 @@ static inline int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
return 0;
}

-static inline void bio_integrity_split(struct bio *bio, struct bio_pair *bp,
- int sectors)
-{
- return;
-}
-
static inline void bio_integrity_advance(struct bio *bio,
unsigned int bytes_done)
{
--
1.7.7.3

2012-08-06 22:11:05

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 07/12] block: Rename bio_split() -> bio_pair_split()

This is prep work for introducing a more general bio_split()

Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/block/drbd/drbd_req.c | 2 +-
drivers/block/pktcdvd.c | 2 +-
drivers/block/rbd.c | 3 ++-
drivers/md/linear.c | 2 +-
drivers/md/raid0.c | 6 +++---
drivers/md/raid10.c | 4 ++--
fs/bio.c | 4 ++--
include/linux/bio.h | 2 +-
8 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 8e93a6a..b5cfa3b 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1150,7 +1150,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
const int sps = 1 << HT_SHIFT; /* sectors per slot */
const int mask = sps - 1;
const sector_t first_sectors = sps - (sect & mask);
- bp = bio_split(bio, first_sectors);
+ bp = bio_pair_split(bio, first_sectors);

/* we need to get a "reference count" (ap_bio_cnt)
* to avoid races with the disconnect/reconnect/suspend code.
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index ae55f08..18393a1 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2469,7 +2469,7 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
if (last_zone != zone) {
BUG_ON(last_zone != zone + pd->settings.size);
first_sectors = last_zone - bio->bi_sector;
- bp = bio_split(bio, first_sectors);
+ bp = bio_pair_split(bio, first_sectors);
BUG_ON(!bp);
pkt_make_request(q, &bp->bio1);
pkt_make_request(q, &bp->bio2);
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 8f428a8..e33c224 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -732,7 +732,8 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,

/* split the bio. We'll release it either in the next
call, or it will have to be released outside */
- bp = bio_split(old_chain, (len - total) / SECTOR_SIZE);
+ bp = bio_pair_split(old_chain,
+ (len - total) / SECTOR_SIZE);
if (!bp)
goto err_out;

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index fa211d8..e860cb9 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -314,7 +314,7 @@ static void linear_make_request(struct mddev *mddev, struct bio *bio)

rcu_read_unlock();

- bp = bio_split(bio, end_sector - bio->bi_sector);
+ bp = bio_pair_split(bio, end_sector - bio->bi_sector);

linear_make_request(mddev, &bp->bio1);
linear_make_request(mddev, &bp->bio2);
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index de63a1f..c89c8aa 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -516,11 +516,11 @@ static void raid0_make_request(struct mddev *mddev, struct bio *bio)
* refuse to split for us, so we need to split it.
*/
if (likely(is_power_of_2(chunk_sects)))
- bp = bio_split(bio, chunk_sects - (sector &
+ bp = bio_pair_split(bio, chunk_sects - (sector &
(chunk_sects-1)));
else
- bp = bio_split(bio, chunk_sects -
- sector_div(sector, chunk_sects));
+ bp = bio_pair_split(bio, chunk_sects -
+ sector_div(sector, chunk_sects));
raid0_make_request(mddev, &bp->bio1);
raid0_make_request(mddev, &bp->bio2);
bio_pair_release(bp);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8da6282..be75924 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1063,8 +1063,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
/* This is a one page bio that upper layers
* refuse to split for us, so we need to split it.
*/
- bp = bio_split(bio,
- chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+ bp = bio_pair_split(bio,
+ chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );

/* Each of these 'make_request' calls will call 'wait_barrier'.
* If the first succeeds but the second blocks due to the resync
diff --git a/fs/bio.c b/fs/bio.c
index ff34311..0470376 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1491,7 +1491,7 @@ static void bio_pair_end_2(struct bio *bi, int err)
/*
* split a bio - only worry about a bio with a single page in its iovec
*/
-struct bio_pair *bio_split(struct bio *bi, int first_sectors)
+struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
{
struct bio_pair *bp = mempool_alloc(bio_split_pool, GFP_NOIO);

@@ -1534,7 +1534,7 @@ struct bio_pair *bio_split(struct bio *bi, int first_sectors)

return bp;
}
-EXPORT_SYMBOL(bio_split);
+EXPORT_SYMBOL(bio_pair_split);

/**
* bio_sector_offset - Find hardware sector offset in bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 484b96e..fdcc8dc 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -201,7 +201,7 @@ struct bio_pair {
atomic_t cnt;
int error;
};
-extern struct bio_pair *bio_split(struct bio *bi, int first_sectors);
+extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
extern void bio_pair_release(struct bio_pair *dbio);

extern struct bio_set *bioset_create(unsigned int, unsigned int);
--
1.7.7.3

2012-08-06 22:09:16

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 04/12] pktcdvd: Switch to bio_kmalloc()

This is prep work for killing bi_destructor - previously, pktcdvd had
its own pkt_bio_alloc which was basically duplication bio_kmalloc(),
necessitating its own bi_destructor implementation.

v5: Un-reorder some functions, to make the patch easier to review

Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/block/pktcdvd.c | 67 +++++++++++-----------------------------------
1 files changed, 16 insertions(+), 51 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index ba66e44..ae55f08 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -101,6 +101,8 @@ static struct dentry *pkt_debugfs_root = NULL; /* /sys/kernel/debug/pktcdvd */
static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev);
static int pkt_remove_dev(dev_t pkt_dev);
static int pkt_seq_show(struct seq_file *m, void *p);
+static void pkt_end_io_read(struct bio *bio, int err);
+static void pkt_end_io_packet_write(struct bio *bio, int err);



@@ -522,38 +524,6 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
}
}

-static void pkt_bio_destructor(struct bio *bio)
-{
- kfree(bio->bi_io_vec);
- kfree(bio);
-}
-
-static struct bio *pkt_bio_alloc(int nr_iovecs)
-{
- struct bio_vec *bvl = NULL;
- struct bio *bio;
-
- bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
- if (!bio)
- goto no_bio;
- bio_init(bio);
-
- bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
- if (!bvl)
- goto no_bvl;
-
- bio->bi_max_vecs = nr_iovecs;
- bio->bi_io_vec = bvl;
- bio->bi_destructor = pkt_bio_destructor;
-
- return bio;
-
- no_bvl:
- kfree(bio);
- no_bio:
- return NULL;
-}
-
/*
* Allocate a packet_data struct
*/
@@ -567,10 +537,13 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
goto no_pkt;

pkt->frames = frames;
- pkt->w_bio = pkt_bio_alloc(frames);
+ pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
if (!pkt->w_bio)
goto no_bio;

+ pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
+ pkt->w_bio->bi_private = pkt;
+
for (i = 0; i < frames / FRAMES_PER_PAGE; i++) {
pkt->pages[i] = alloc_page(GFP_KERNEL|__GFP_ZERO);
if (!pkt->pages[i])
@@ -581,9 +554,12 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
bio_list_init(&pkt->orig_bios);

for (i = 0; i < frames; i++) {
- struct bio *bio = pkt_bio_alloc(1);
+ struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
if (!bio)
goto no_rd_bio;
+
+ bio->bi_end_io = pkt_end_io_read;
+ bio->bi_private = pkt;
pkt->r_bios[i] = bio;
}

@@ -1111,21 +1087,15 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
* Schedule reads for missing parts of the packet.
*/
for (f = 0; f < pkt->frames; f++) {
- struct bio_vec *vec;
-
int p, offset;
+
if (written[f])
continue;
+
bio = pkt->r_bios[f];
- vec = bio->bi_io_vec;
- bio_init(bio);
- bio->bi_max_vecs = 1;
- bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
- bio->bi_bdev = pd->bdev;
- bio->bi_end_io = pkt_end_io_read;
- bio->bi_private = pkt;
- bio->bi_io_vec = vec;
- bio->bi_destructor = pkt_bio_destructor;
+ bio_reset(bio);
+ bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
+ bio->bi_bdev = pd->bdev;

p = (f * CD_FRAMESIZE) / PAGE_SIZE;
offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
@@ -1418,14 +1388,9 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
}

/* Start the write request */
- bio_init(pkt->w_bio);
- pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
+ bio_reset(pkt->w_bio);
pkt->w_bio->bi_sector = pkt->sector;
pkt->w_bio->bi_bdev = pd->bdev;
- pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
- pkt->w_bio->bi_private = pkt;
- pkt->w_bio->bi_io_vec = bvec;
- pkt->w_bio->bi_destructor = pkt_bio_destructor;
for (f = 0; f < pkt->frames; f++)
if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
BUG();
--
1.7.7.3

2012-08-06 22:11:31

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 06/12] block: Add an explicit bio flag for bios that own their bvec

This is for the new bio splitting code. When we split a bio, if the
split occured on a bvec boundry we reuse the bvec for the new bio. But
that means bio_free() can't free it, hence the explicit flag.

Signed-off-by: Kent Overstreet <[email protected]>
---
fs/bio.c | 3 ++-
include/linux/bio.h | 5 -----
include/linux/blk_types.h | 1 +
3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index ebc7309..ff34311 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -249,7 +249,7 @@ void bio_free(struct bio *bio)
return;
}

- if (bio_has_allocated_vec(bio))
+ if (bio_flagged(bio, BIO_OWNS_VEC))
bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));

if (bio_integrity(bio))
@@ -321,6 +321,7 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
goto err_free;

nr_iovecs = bvec_nr_vecs(idx);
+ bio->bi_flags |= 1 << BIO_OWNS_VEC;
}
out_set:
bio->bi_flags |= idx << BIO_POOL_OFFSET;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 393c87e..484b96e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -84,11 +84,6 @@ static inline void *bio_data(struct bio *bio)
return NULL;
}

-static inline int bio_has_allocated_vec(struct bio *bio)
-{
- return bio->bi_io_vec && bio->bi_io_vec != bio->bi_inline_vecs;
-}
-
/*
* will die
*/
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d10c2e49..416aae4 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -113,6 +113,7 @@ struct bio {
#define BIO_MAPPED_INTEGRITY 11/* integrity metadata has been remapped */

#define BIO_RESET_BITS 12 /* Flags starting here get preserved by bio_reset() */
+#define BIO_OWNS_VEC 12 /* bio_free() should free bvec */

#define bio_flagged(bio, flag) ((bio)->bi_flags & (1 << (flag)))

--
1.7.7.3

2012-08-06 22:11:59

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 05/12] block: Kill bi_destructor

Now that we've got generic code for freeing bios allocated from bio
pools, this isn't needed anymore.

This also changes the semantics of bio_free() a bit - it now also frees
bios allocated by bio_kmalloc(). It's also no longer exported, as
without bi_destructor there should be no need for it to be called
anywhere else.

v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz

Signed-off-by: Kent Overstreet <[email protected]>
---
Documentation/block/biodoc.txt | 5 -----
block/blk-core.c | 2 +-
drivers/block/drbd/drbd_main.c | 13 +------------
fs/bio.c | 31 ++++++++++++++++---------------
include/linux/bio.h | 2 +-
include/linux/blk_types.h | 3 ---
6 files changed, 19 insertions(+), 37 deletions(-)

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index e418dc0..8df5e8e 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -465,7 +465,6 @@ struct bio {
bio_end_io_t *bi_end_io; /* bi_end_io (bio) */
atomic_t bi_cnt; /* pin count: free when it hits zero */
void *bi_private;
- bio_destructor_t *bi_destructor; /* bi_destructor (bio) */
};

With this multipage bio design:
@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
given size from these slabs.

-The bi_destructor() routine takes into account the possibility of the bio
-having originated from a different source (see later discussions on
-n/w to block transfers and kvec_cb)
-
The bio_get() routine may be used to hold an extra reference on a bio prior
to i/o submission, if the bio fields are likely to be accessed after the
i/o is issued (since the bio may otherwise get freed in case i/o completion
diff --git a/block/blk-core.c b/block/blk-core.c
index 93eb3e4..e9058c2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2794,7 +2794,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,

free_and_out:
if (bio)
- bio_free(bio, bs);
+ bio_free(bio);
blk_rq_unprep_clone(rq);

return -ENOMEM;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 920ede2..19bf632 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -161,23 +161,12 @@ static const struct block_device_operations drbd_ops = {
.release = drbd_release,
};

-static void bio_destructor_drbd(struct bio *bio)
-{
- bio_free(bio, drbd_md_io_bio_set);
-}
-
struct bio *bio_alloc_drbd(gfp_t gfp_mask)
{
- struct bio *bio;
-
if (!drbd_md_io_bio_set)
return bio_alloc(gfp_mask, 1);

- bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
- if (!bio)
- return NULL;
- bio->bi_destructor = bio_destructor_drbd;
- return bio;
+ return bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
}

#ifdef __CHECKER__
diff --git a/fs/bio.c b/fs/bio.c
index c7f3442..ebc7309 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -56,6 +56,8 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = {
*/
struct bio_set *fs_bio_set;

+#define BIO_KMALLOC_POOL ((void *) ~0)
+
/*
* Our slab pool management
*/
@@ -232,10 +234,21 @@ fallback:
return bvl;
}

-void bio_free(struct bio *bio, struct bio_set *bs)
+void bio_free(struct bio *bio)
{
+ struct bio_set *bs = bio->bi_pool;
void *p;

+ BUG_ON(!bs);
+
+ if (bs == BIO_KMALLOC_POOL) {
+ /* Bio was allocated by bio_kmalloc() */
+ if (bio_integrity(bio))
+ bio_integrity_free(bio, fs_bio_set);
+ kfree(bio);
+ return;
+ }
+
if (bio_has_allocated_vec(bio))
bvec_free_bs(bs, bio->bi_io_vec, BIO_POOL_IDX(bio));

@@ -346,13 +359,6 @@ struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
}
EXPORT_SYMBOL(bio_alloc);

-static void bio_kmalloc_destructor(struct bio *bio)
-{
- if (bio_integrity(bio))
- bio_integrity_free(bio, fs_bio_set);
- kfree(bio);
-}
-
/**
* bio_kmalloc - allocate a bio for I/O using kmalloc()
* @gfp_mask: the GFP_ mask given to the slab allocator
@@ -379,7 +385,7 @@ struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
bio->bi_flags |= BIO_POOL_NONE << BIO_POOL_OFFSET;
bio->bi_max_vecs = nr_iovecs;
bio->bi_io_vec = bio->bi_inline_vecs;
- bio->bi_destructor = bio_kmalloc_destructor;
+ bio->bi_pool = BIO_KMALLOC_POOL;

return bio;
}
@@ -417,12 +423,7 @@ void bio_put(struct bio *bio)
*/
if (atomic_dec_and_test(&bio->bi_cnt)) {
bio_disassociate_task(bio);
- bio->bi_next = NULL;
-
- if (bio->bi_pool)
- bio_free(bio, bio->bi_pool);
- else
- bio->bi_destructor(bio);
+ bio_free(bio);
}
}
EXPORT_SYMBOL(bio_put);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ba60319..393c87e 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -216,7 +216,7 @@ extern struct bio *bio_alloc(gfp_t, unsigned int);
extern struct bio *bio_kmalloc(gfp_t, unsigned int);
extern struct bio *bio_alloc_bioset(gfp_t, int, struct bio_set *);
extern void bio_put(struct bio *);
-extern void bio_free(struct bio *, struct bio_set *);
+extern void bio_free(struct bio *);

extern void bio_endio(struct bio *, int);
struct request_queue;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 401c573..d10c2e49 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -84,11 +84,8 @@ struct bio {
struct bio_integrity_payload *bi_integrity; /* data integrity */
#endif

- /* If bi_pool is non NULL, bi_destructor is not called */
struct bio_set *bi_pool;

- bio_destructor_t *bi_destructor; /* destructor */
-
/*
* We can inline a number of vecs at the end of the bio, to avoid
* double allocations for a small number of bio_vecs. This member
--
1.7.7.3

2012-08-06 22:12:25

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

Previously, dm_rq_clone_bio_info needed to be freed by the bio's
destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
This gets rid of a memory allocation and means we can kill
dm_rq_bio_destructor.

Signed-off-by: Kent Overstreet <[email protected]>
---
drivers/md/dm.c | 31 +++++--------------------------
1 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 40b7735..4014696 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -92,6 +92,7 @@ struct dm_rq_target_io {
struct dm_rq_clone_bio_info {
struct bio *orig;
struct dm_rq_target_io *tio;
+ struct bio clone;
};

union map_info *dm_get_mapinfo(struct bio *bio)
@@ -467,16 +468,6 @@ static void free_rq_tio(struct dm_rq_target_io *tio)
mempool_free(tio, tio->md->tio_pool);
}

-static struct dm_rq_clone_bio_info *alloc_bio_info(struct mapped_device *md)
-{
- return mempool_alloc(md->io_pool, GFP_ATOMIC);
-}
-
-static void free_bio_info(struct dm_rq_clone_bio_info *info)
-{
- mempool_free(info, info->tio->md->io_pool);
-}
-
static int md_in_flight(struct mapped_device *md)
{
return atomic_read(&md->pending[READ]) +
@@ -1438,30 +1429,17 @@ void dm_dispatch_request(struct request *rq)
}
EXPORT_SYMBOL_GPL(dm_dispatch_request);

-static void dm_rq_bio_destructor(struct bio *bio)
-{
- struct dm_rq_clone_bio_info *info = bio->bi_private;
- struct mapped_device *md = info->tio->md;
-
- free_bio_info(info);
- bio_free(bio, md->bs);
-}
-
static int dm_rq_bio_constructor(struct bio *bio, struct bio *bio_orig,
void *data)
{
struct dm_rq_target_io *tio = data;
- struct mapped_device *md = tio->md;
- struct dm_rq_clone_bio_info *info = alloc_bio_info(md);
-
- if (!info)
- return -ENOMEM;
+ struct dm_rq_clone_bio_info *info =
+ container_of(bio, struct dm_rq_clone_bio_info, clone);

info->orig = bio_orig;
info->tio = tio;
bio->bi_end_io = end_clone_bio;
bio->bi_private = info;
- bio->bi_destructor = dm_rq_bio_destructor;

return 0;
}
@@ -2696,7 +2674,8 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity)
if (!pools->tio_pool)
goto free_io_pool_and_out;

- pools->bs = bioset_create(pool_size, 0);
+ pools->bs = bioset_create(pool_size,
+ offsetof(struct dm_rq_clone_bio_info, orig));
if (!pools->bs)
goto free_tio_pool_and_out;

--
1.7.7.3

2012-08-06 22:12:51

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH v5 01/12] block: Generalized bio pool freeing

With the old code, when you allocate a bio from a bio pool you have to
implement your own destructor that knows how to find the bio pool the
bio was originally allocated from.

This adds a new field to struct bio (bi_pool) and changes
bio_alloc_bioset() to use it. This makes various bio destructors
unnecessary, so they're then deleted.

Signed-off-by: Kent Overstreet <[email protected]>
Acked-by: Tejun Heo <[email protected]>
---
drivers/md/dm-crypt.c | 9 ---------
drivers/md/dm-io.c | 11 -----------
drivers/md/dm.c | 20 --------------------
drivers/md/md.c | 28 ++++------------------------
drivers/target/target_core_iblock.c | 9 ---------
fs/bio.c | 26 ++++++++------------------
include/linux/blk_types.h | 3 +++
7 files changed, 15 insertions(+), 91 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3f06df5..40716d8 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -808,14 +808,6 @@ static int crypt_convert(struct crypt_config *cc,
return 0;
}

-static void dm_crypt_bio_destructor(struct bio *bio)
-{
- struct dm_crypt_io *io = bio->bi_private;
- struct crypt_config *cc = io->target->private;
-
- bio_free(bio, cc->bs);
-}
-
/*
* Generate a new unfragmented bio with the given size
* This should never violate the device limitations
@@ -985,7 +977,6 @@ static void clone_init(struct dm_crypt_io *io, struct bio *clone)
clone->bi_end_io = crypt_endio;
clone->bi_bdev = cc->dev->bdev;
clone->bi_rw = io->base_bio->bi_rw;
- clone->bi_destructor = dm_crypt_bio_destructor;
}

static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index ea5dd28..1c46f97 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -249,16 +249,6 @@ static void vm_dp_init(struct dpages *dp, void *data)
dp->context_ptr = data;
}

-static void dm_bio_destructor(struct bio *bio)
-{
- unsigned region;
- struct io *io;
-
- retrieve_io_and_region_from_bio(bio, &io, &region);
-
- bio_free(bio, io->client->bios);
-}
-
/*
* Functions for getting the pages from kernel memory.
*/
@@ -317,7 +307,6 @@ static void do_region(int rw, unsigned region, struct dm_io_region *where,
bio->bi_sector = where->sector + (where->count - remaining);
bio->bi_bdev = where->bdev;
bio->bi_end_io = endio;
- bio->bi_destructor = dm_bio_destructor;
store_io_and_region_in_bio(bio, io, region);

if (rw & REQ_DISCARD) {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index e24143c..40b7735 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -681,11 +681,6 @@ static void clone_endio(struct bio *bio, int error)
}
}

- /*
- * Store md for cleanup instead of tio which is about to get freed.
- */
- bio->bi_private = md->bs;
-
free_tio(md, tio);
bio_put(bio);
dec_pending(io, error);
@@ -1013,11 +1008,6 @@ static void __map_bio(struct dm_target *ti, struct bio *clone,
/* error the io and bail out, or requeue it if needed */
md = tio->io->md;
dec_pending(tio->io, r);
- /*
- * Store bio_set for cleanup.
- */
- clone->bi_end_io = NULL;
- clone->bi_private = md->bs;
bio_put(clone);
free_tio(md, tio);
} else if (r) {
@@ -1036,13 +1026,6 @@ struct clone_info {
unsigned short idx;
};

-static void dm_bio_destructor(struct bio *bio)
-{
- struct bio_set *bs = bio->bi_private;
-
- bio_free(bio, bs);
-}
-
/*
* Creates a little bio that just does part of a bvec.
*/
@@ -1054,7 +1037,6 @@ static struct bio *split_bvec(struct bio *bio, sector_t sector,
struct bio_vec *bv = bio->bi_io_vec + idx;

clone = bio_alloc_bioset(GFP_NOIO, 1, bs);
- clone->bi_destructor = dm_bio_destructor;
*clone->bi_io_vec = *bv;

clone->bi_sector = sector;
@@ -1086,7 +1068,6 @@ static struct bio *clone_bio(struct bio *bio, sector_t sector,

clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
__bio_clone(clone, bio);
- clone->bi_destructor = dm_bio_destructor;
clone->bi_sector = sector;
clone->bi_idx = idx;
clone->bi_vcnt = idx + bv_count;
@@ -1131,7 +1112,6 @@ static void __issue_target_request(struct clone_info *ci, struct dm_target *ti,
*/
clone = bio_alloc_bioset(GFP_NOIO, ci->bio->bi_max_vecs, ci->md->bs);
__bio_clone(clone, ci->bio);
- clone->bi_destructor = dm_bio_destructor;
if (len) {
clone->bi_sector = ci->sector;
clone->bi_size = to_bytes(len);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d5ab449..f9d16dc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -155,32 +155,17 @@ static int start_readonly;
* like bio_clone, but with a local bio set
*/

-static void mddev_bio_destructor(struct bio *bio)
-{
- struct mddev *mddev, **mddevp;
-
- mddevp = (void*)bio;
- mddev = mddevp[-1];
-
- bio_free(bio, mddev->bio_set);
-}
-
struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
struct mddev *mddev)
{
struct bio *b;
- struct mddev **mddevp;

if (!mddev || !mddev->bio_set)
return bio_alloc(gfp_mask, nr_iovecs);

- b = bio_alloc_bioset(gfp_mask, nr_iovecs,
- mddev->bio_set);
+ b = bio_alloc_bioset(gfp_mask, nr_iovecs, mddev->bio_set);
if (!b)
return NULL;
- mddevp = (void*)b;
- mddevp[-1] = mddev;
- b->bi_destructor = mddev_bio_destructor;
return b;
}
EXPORT_SYMBOL_GPL(bio_alloc_mddev);
@@ -189,18 +174,14 @@ struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
struct mddev *mddev)
{
struct bio *b;
- struct mddev **mddevp;

if (!mddev || !mddev->bio_set)
return bio_clone(bio, gfp_mask);

- b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs,
- mddev->bio_set);
+ b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, mddev->bio_set);
if (!b)
return NULL;
- mddevp = (void*)b;
- mddevp[-1] = mddev;
- b->bi_destructor = mddev_bio_destructor;
+
__bio_clone(b, bio);
if (bio_integrity(bio)) {
int ret;
@@ -5056,8 +5037,7 @@ int md_run(struct mddev *mddev)
}

if (mddev->bio_set == NULL)
- mddev->bio_set = bioset_create(BIO_POOL_SIZE,
- sizeof(struct mddev *));
+ mddev->bio_set = bioset_create(BIO_POOL_SIZE, 0);

spin_lock(&pers_lock);
pers = find_pers(mddev->level, mddev->clevel);
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index fd47950..9338d0602 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -447,14 +447,6 @@ static void iblock_complete_cmd(struct se_cmd *cmd)
kfree(ibr);
}

-static void iblock_bio_destructor(struct bio *bio)
-{
- struct se_cmd *cmd = bio->bi_private;
- struct iblock_dev *ib_dev = cmd->se_dev->dev_ptr;
-
- bio_free(bio, ib_dev->ibd_bio_set);
-}
-
static struct bio *
iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)
{
@@ -476,7 +468,6 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 sg_num)

bio->bi_bdev = ib_dev->ibd_bd;
bio->bi_private = cmd;
- bio->bi_destructor = iblock_bio_destructor;
bio->bi_end_io = &iblock_bio_done;
bio->bi_sector = lba;
return bio;
diff --git a/fs/bio.c b/fs/bio.c
index 73922ab..1c6c8b7 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -271,10 +271,6 @@ EXPORT_SYMBOL(bio_init);
* bio_alloc_bioset will try its own mempool to satisfy the allocation.
* If %__GFP_WAIT is set then we will block on the internal pool waiting
* for a &struct bio to become free.
- *
- * Note that the caller must set ->bi_destructor on successful return
- * of a bio, to do the appropriate freeing of the bio once the reference
- * count drops to zero.
**/
struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
{
@@ -289,6 +285,7 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
bio = p + bs->front_pad;

bio_init(bio);
+ bio->bi_pool = bs;

if (unlikely(!nr_iovecs))
goto out_set;
@@ -315,11 +312,6 @@ err_free:
}
EXPORT_SYMBOL(bio_alloc_bioset);

-static void bio_fs_destructor(struct bio *bio)
-{
- bio_free(bio, fs_bio_set);
-}
-
/**
* bio_alloc - allocate a new bio, memory pool backed
* @gfp_mask: allocation mask to use
@@ -341,12 +333,7 @@ static void bio_fs_destructor(struct bio *bio)
*/
struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
{
- struct bio *bio = bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
-
- if (bio)
- bio->bi_destructor = bio_fs_destructor;
-
- return bio;
+ return bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
}
EXPORT_SYMBOL(bio_alloc);

@@ -422,7 +409,11 @@ void bio_put(struct bio *bio)
if (atomic_dec_and_test(&bio->bi_cnt)) {
bio_disassociate_task(bio);
bio->bi_next = NULL;
- bio->bi_destructor(bio);
+
+ if (bio->bi_pool)
+ bio_free(bio, bio->bi_pool);
+ else
+ bio->bi_destructor(bio);
}
}
EXPORT_SYMBOL(bio_put);
@@ -473,12 +464,11 @@ EXPORT_SYMBOL(__bio_clone);
*/
struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
{
- struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, fs_bio_set);
+ struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);

if (!b)
return NULL;

- b->bi_destructor = bio_fs_destructor;
__bio_clone(b, bio);

if (bio_integrity(bio)) {
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0edb65d..293547e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -80,6 +80,9 @@ struct bio {
struct bio_integrity_payload *bi_integrity; /* data integrity */
#endif

+ /* If bi_pool is non NULL, bi_destructor is not called */
+ struct bio_set *bi_pool;
+
bio_destructor_t *bi_destructor; /* destructor */

/*
--
1.7.7.3

2012-08-06 23:16:53

by Mikulas Patocka

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

Hi Kent

When you change the semantics of an exported function, rename that
function. There may be external modules that use __bio_clone and this
change could silently introduce bugs in them.

Otherwise, the patchset looks fine.

Mikulas


On Mon, 6 Aug 2012, Kent Overstreet wrote:

> bcache creates large bios internally, and then splits them according to
> the device requirements before it sends them down. If a lower level
> device tries to clone the bio, and the original bio had more than
> BIO_MAX_PAGES, the clone will fail unecessarily.
>
> We can fix this by only cloning the bio vecs that are actually in use.
>
> Signed-off-by: Kent Overstreet <[email protected]>
> ---
> drivers/block/rbd.c | 2 +-
> drivers/md/dm.c | 5 ++---
> fs/bio.c | 13 +++++++------
> 3 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 692cf05..21edfe5 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -714,7 +714,7 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
> }
>
> while (old_chain && (total < len)) {
> - tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
> + tmp = bio_kmalloc(gfpmask, bio_segments(old_chain));
> if (!tmp)
> goto err_out;
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3f3c26e..193fb19 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1057,11 +1057,10 @@ static struct bio *clone_bio(struct bio *bio, sector_t sector,
> {
> struct bio *clone;
>
> - clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
> + clone = bio_alloc_bioset(GFP_NOIO, bv_count, bs);
> __bio_clone(clone, bio);
> clone->bi_sector = sector;
> - clone->bi_idx = idx;
> - clone->bi_vcnt = idx + bv_count;
> + clone->bi_vcnt = bv_count;
> clone->bi_size = to_bytes(len);
> clone->bi_flags &= ~(1 << BIO_SEG_VALID);
>
> diff --git a/fs/bio.c b/fs/bio.c
> index 38ad026..631c67e 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -449,8 +449,9 @@ EXPORT_SYMBOL(bio_phys_segments);
> */
> void __bio_clone(struct bio *bio, struct bio *bio_src)
> {
> - memcpy(bio->bi_io_vec, bio_src->bi_io_vec,
> - bio_src->bi_max_vecs * sizeof(struct bio_vec));
> + memcpy(bio->bi_io_vec,
> + bio_iovec(bio_src),
> + bio_segments(bio_src) * sizeof(struct bio_vec));
>
> /*
> * most users will be overriding ->bi_bdev with a new target,
> @@ -459,10 +460,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
> bio->bi_sector = bio_src->bi_sector;
> bio->bi_bdev = bio_src->bi_bdev;
> bio->bi_flags |= 1 << BIO_CLONED;
> + bio->bi_flags &= ~(1 << BIO_SEG_VALID);
> bio->bi_rw = bio_src->bi_rw;
> - bio->bi_vcnt = bio_src->bi_vcnt;
> + bio->bi_vcnt = bio_segments(bio_src);
> bio->bi_size = bio_src->bi_size;
> - bio->bi_idx = bio_src->bi_idx;
> }
> EXPORT_SYMBOL(__bio_clone);
>
> @@ -477,7 +478,7 @@ EXPORT_SYMBOL(__bio_clone);
> struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
> struct bio_set *bs)
> {
> - struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
> + struct bio *b = bio_alloc_bioset(gfp_mask, bio_segments(bio), bs);
>
> if (!b)
> return NULL;
> @@ -507,7 +508,7 @@ EXPORT_SYMBOL(bio_clone);
>
> struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
> {
> - struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
> + struct bio *b = bio_kmalloc(gfp_mask, bio_segments(bio));
>
> if (!b)
> return NULL;
> --
> 1.7.7.3
>
> --
> dm-devel mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/dm-devel
>

2012-08-07 03:19:37

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] block: Kill bi_destructor

On Mon, Aug 06 2012 at 6:08pm -0400,
Kent Overstreet <[email protected]> wrote:

> Now that we've got generic code for freeing bios allocated from bio
> pools, this isn't needed anymore.
>
> This also changes the semantics of bio_free() a bit - it now also frees
> bios allocated by bio_kmalloc(). It's also no longer exported, as
> without bi_destructor there should be no need for it to be called
> anywhere else.

Seems you forgot to remove bio_free's EXPORT_SYMBOL

2012-08-08 22:06:20

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

Hello,

On Mon, Aug 06, 2012 at 03:08:31PM -0700, Kent Overstreet wrote:
> Previously, dm_rq_clone_bio_info needed to be freed by the bio's
> destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
> This gets rid of a memory allocation and means we can kill
> dm_rq_bio_destructor.
>
> Signed-off-by: Kent Overstreet <[email protected]>
> ---
> drivers/md/dm.c | 31 +++++--------------------------
> 1 files changed, 5 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 40b7735..4014696 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -92,6 +92,7 @@ struct dm_rq_target_io {
> struct dm_rq_clone_bio_info {
> struct bio *orig;
> struct dm_rq_target_io *tio;
> + struct bio clone;
> };
...
> @@ -2696,7 +2674,8 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity)
> if (!pools->tio_pool)
> goto free_io_pool_and_out;
>
> - pools->bs = bioset_create(pool_size, 0);
> + pools->bs = bioset_create(pool_size,
> + offsetof(struct dm_rq_clone_bio_info, orig));
> if (!pools->bs)
> goto free_tio_pool_and_out;

I do like this approach much better but this isn't something
super-obvious. Can we please explain what's going on? Especially,
the comment above dm_rq_clone_bio_info is outright misleading now.

Can someone more familiar review this one? Alasdir, Mike?

Also, how was this tested?

Thanks.

--
tejun

2012-08-08 22:11:36

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 03/12] block: Add bio_reset()

Hello,

On Mon, Aug 06, 2012 at 03:08:32PM -0700, Kent Overstreet wrote:
> Reusing bios is something that's been highly frowned upon in the past,
> but driver code keeps doing it anyways. If it's going to happen anyways,
> we should provide a generic method.
>
> This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
> was open coding it, by doing a bio_init() and resetting bi_destructor.
>
> v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
> bio->bi_flags are saved.
>
> Signed-off-by: Kent Overstreet <[email protected]>
> Change-Id: I4eb2975bd678d3be811d5423d0620b08020be9ff

Please drop Change-Id. Die gerrit die.

> +void bio_reset(struct bio *bio)
> +{
> + unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);

How many flags are we talking about? If there aren't too many, I'd
prefer explicit BIO_FLAGS_PRESERVED or whatever.

Thanks.

--
tejun

2012-08-08 22:14:09

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 04/12] pktcdvd: Switch to bio_kmalloc()

Hello,

On Mon, Aug 06, 2012 at 03:08:33PM -0700, Kent Overstreet wrote:
> This is prep work for killing bi_destructor - previously, pktcdvd had
> its own pkt_bio_alloc which was basically duplication bio_kmalloc(),
> necessitating its own bi_destructor implementation.
>
> v5: Un-reorder some functions, to make the patch easier to review
>
> Signed-off-by: Kent Overstreet <[email protected]>

Please Cc: the maintainers. Cc'ing Peter Osterlund and keeping the
whole body for him.

Generally looks good to me. How is this tested?

Thanks.

> ---
> drivers/block/pktcdvd.c | 67 +++++++++++-----------------------------------
> 1 files changed, 16 insertions(+), 51 deletions(-)
>
> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> index ba66e44..ae55f08 100644
> --- a/drivers/block/pktcdvd.c
> +++ b/drivers/block/pktcdvd.c
> @@ -101,6 +101,8 @@ static struct dentry *pkt_debugfs_root = NULL; /* /sys/kernel/debug/pktcdvd */
> static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev);
> static int pkt_remove_dev(dev_t pkt_dev);
> static int pkt_seq_show(struct seq_file *m, void *p);
> +static void pkt_end_io_read(struct bio *bio, int err);
> +static void pkt_end_io_packet_write(struct bio *bio, int err);
>
>
>
> @@ -522,38 +524,6 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
> }
> }
>
> -static void pkt_bio_destructor(struct bio *bio)
> -{
> - kfree(bio->bi_io_vec);
> - kfree(bio);
> -}
> -
> -static struct bio *pkt_bio_alloc(int nr_iovecs)
> -{
> - struct bio_vec *bvl = NULL;
> - struct bio *bio;
> -
> - bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
> - if (!bio)
> - goto no_bio;
> - bio_init(bio);
> -
> - bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
> - if (!bvl)
> - goto no_bvl;
> -
> - bio->bi_max_vecs = nr_iovecs;
> - bio->bi_io_vec = bvl;
> - bio->bi_destructor = pkt_bio_destructor;
> -
> - return bio;
> -
> - no_bvl:
> - kfree(bio);
> - no_bio:
> - return NULL;
> -}
> -
> /*
> * Allocate a packet_data struct
> */
> @@ -567,10 +537,13 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
> goto no_pkt;
>
> pkt->frames = frames;
> - pkt->w_bio = pkt_bio_alloc(frames);
> + pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
> if (!pkt->w_bio)
> goto no_bio;
>
> + pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> + pkt->w_bio->bi_private = pkt;
> +
> for (i = 0; i < frames / FRAMES_PER_PAGE; i++) {
> pkt->pages[i] = alloc_page(GFP_KERNEL|__GFP_ZERO);
> if (!pkt->pages[i])
> @@ -581,9 +554,12 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
> bio_list_init(&pkt->orig_bios);
>
> for (i = 0; i < frames; i++) {
> - struct bio *bio = pkt_bio_alloc(1);
> + struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
> if (!bio)
> goto no_rd_bio;
> +
> + bio->bi_end_io = pkt_end_io_read;
> + bio->bi_private = pkt;
> pkt->r_bios[i] = bio;
> }
>
> @@ -1111,21 +1087,15 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
> * Schedule reads for missing parts of the packet.
> */
> for (f = 0; f < pkt->frames; f++) {
> - struct bio_vec *vec;
> -
> int p, offset;
> +
> if (written[f])
> continue;
> +
> bio = pkt->r_bios[f];
> - vec = bio->bi_io_vec;
> - bio_init(bio);
> - bio->bi_max_vecs = 1;
> - bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
> - bio->bi_bdev = pd->bdev;
> - bio->bi_end_io = pkt_end_io_read;
> - bio->bi_private = pkt;
> - bio->bi_io_vec = vec;
> - bio->bi_destructor = pkt_bio_destructor;
> + bio_reset(bio);
> + bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
> + bio->bi_bdev = pd->bdev;
>
> p = (f * CD_FRAMESIZE) / PAGE_SIZE;
> offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
> @@ -1418,14 +1388,9 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
> }
>
> /* Start the write request */
> - bio_init(pkt->w_bio);
> - pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
> + bio_reset(pkt->w_bio);
> pkt->w_bio->bi_sector = pkt->sector;
> pkt->w_bio->bi_bdev = pd->bdev;
> - pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> - pkt->w_bio->bi_private = pkt;
> - pkt->w_bio->bi_io_vec = bvec;
> - pkt->w_bio->bi_destructor = pkt_bio_destructor;
> for (f = 0; f < pkt->frames; f++)
> if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
> BUG();
> --
> 1.7.7.3
>

--
tejun

2012-08-08 22:22:30

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] block: Kill bi_destructor

Hello,

On Mon, Aug 06, 2012 at 03:08:34PM -0700, Kent Overstreet wrote:
> Now that we've got generic code for freeing bios allocated from bio
> pools, this isn't needed anymore.
>
> This also changes the semantics of bio_free() a bit - it now also frees
> bios allocated by bio_kmalloc(). It's also no longer exported, as
> without bi_destructor there should be no need for it to be called
> anywhere else.
>
> v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
>
> Signed-off-by: Kent Overstreet <[email protected]>
> ---
> diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
> index 920ede2..19bf632 100644
> --- a/drivers/block/drbd/drbd_main.c
> +++ b/drivers/block/drbd/drbd_main.c
> @@ -161,23 +161,12 @@ static const struct block_device_operations drbd_ops = {
> .release = drbd_release,
> };
>
> -static void bio_destructor_drbd(struct bio *bio)
> -{
> - bio_free(bio, drbd_md_io_bio_set);
> -}
> -
> struct bio *bio_alloc_drbd(gfp_t gfp_mask)
> {
> - struct bio *bio;
> -
> if (!drbd_md_io_bio_set)
> return bio_alloc(gfp_mask, 1);
>
> - bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
> - if (!bio)
> - return NULL;
> - bio->bi_destructor = bio_destructor_drbd;
> - return bio;
> + return bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
> }

Does this chunk belong to this patch?

> @@ -56,6 +56,8 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = {
> */
> struct bio_set *fs_bio_set;
>
> +#define BIO_KMALLOC_POOL ((void *) ~0)

What's wrong with good ol' NULL?

Thanks.

--
tejun

2012-08-08 22:25:23

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 01/12] block: Generalized bio pool freeing

On Mon, Aug 06, 2012 at 03:08:30PM -0700, Kent Overstreet wrote:
> @@ -422,7 +409,11 @@ void bio_put(struct bio *bio)
> if (atomic_dec_and_test(&bio->bi_cnt)) {
> bio_disassociate_task(bio);
> bio->bi_next = NULL;
> - bio->bi_destructor(bio);
> +
> + if (bio->bi_pool)
> + bio_free(bio, bio->bi_pool);
> + else
> + bio->bi_destructor(bio);

So, this bi_pool overriding caller specified custom bi_destructor is
rather unusual. I know why it's like that - the patch series is
gradually replacing bi_destructor with bi_pool and removes
bi_destructor eventually, but it would be far better if at least patch
description says why this is unusual like this.

Thanks.

--
tejun

2012-08-08 22:28:49

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 06/12] block: Add an explicit bio flag for bios that own their bvec

On Mon, Aug 06, 2012 at 03:08:35PM -0700, Kent Overstreet wrote:
> This is for the new bio splitting code. When we split a bio, if the
> split occured on a bvec boundry we reuse the bvec for the new bio. But
> that means bio_free() can't free it, hence the explicit flag.
>
> Signed-off-by: Kent Overstreet <[email protected]>

Sans how the flag is preserved,

Acked-by: Tejun Heo <[email protected]>

Thanks.

--
tejun

2012-08-08 22:58:47

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

Hello,

On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> /**
> + * bio_split - split a bio
> + * @bio: bio to split
> + * @sectors: number of sectors to split from the front of @bio
> + * @gfp: gfp mask
> + * @bs: bio set to allocate from
> + *
> + * Allocates and returns a new bio which represents @sectors from the start of
> + * @bio, and updates @bio to represent the remaining sectors.
> + *
> + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> + * unchanged.

Umm.... I don't know. This is rather confusing. The function may
return new or old bios? What's the rationale behind it? Return
ERR_PTR(-EINVAL) instead?

> + *
> + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> + * freed before the split.

This is somewhat error-prone. Given how splits are used now, this
might not be a big issue but it isn't difficult to imagine how this
could go subtly wrong. More on this.

> + *
> + * BIG FAT WARNING:
> + *
> + * If you're calling this from under generic_make_request() (i.e.
> + * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
> + * workqueue if the allocation fails. Otherwise, your code will probably
> + * deadlock.

If the condition is detectable, WARN_ON_ONCE() please.

> + * You can't allocate more than once from the same bio pool without submitting
> + * the previous allocations (so they'll eventually complete and deallocate
> + * themselves), but if you're under generic_make_request() those previous
> + * allocations won't submit until you return . And if you have to split bios,
^
extra space
> + * you should expect that some bios will require multiple splits.
> + */
> +struct bio *bio_split(struct bio *bio, int sectors,
> + gfp_t gfp, struct bio_set *bs)
> +{
> + unsigned idx, vcnt = 0, nbytes = sectors << 9;
> + struct bio_vec *bv;
> + struct bio *ret = NULL;
> +
> + BUG_ON(sectors <= 0);
> +
> + if (sectors >= bio_sectors(bio))
> + return bio;
> +
> + trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
> + bio->bi_sector + sectors);
> +
> + bio_for_each_segment(bv, bio, idx) {
> + vcnt = idx - bio->bi_idx;
> +
> + if (!nbytes) {
> + ret = bio_alloc_bioset(gfp, 0, bs);
> + if (!ret)
> + return NULL;
> +
> + ret->bi_io_vec = bio_iovec(bio);
> + ret->bi_flags |= 1 << BIO_CLONED;
> + break;
> + } else if (nbytes < bv->bv_len) {
> + ret = bio_alloc_bioset(gfp, ++vcnt, bs);
> + if (!ret)
> + return NULL;
> +
> + memcpy(ret->bi_io_vec, bio_iovec(bio),
> + sizeof(struct bio_vec) * vcnt);
> +
> + ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
> + bv->bv_offset += nbytes;
> + bv->bv_len -= nbytes;
> + break;
> + }

Ummm... ISTR reviewing this code and getting confused by bio_alloc
inside bio_for_each_segment() loop and commenting something about
that. Yeah, this one.

http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/15790/focus=370

So, I actually have reviewed this but didn't get any response and
majority of the issues I raised aren't addressed and you sent the
patch to me again? What the hell, Kent?

> +
> + nbytes -= bv->bv_len;
> + }
> +
> + ret->bi_bdev = bio->bi_bdev;
> + ret->bi_sector = bio->bi_sector;
> + ret->bi_size = sectors << 9;
> + ret->bi_rw = bio->bi_rw;
> + ret->bi_vcnt = vcnt;
> + ret->bi_max_vecs = vcnt;
> + ret->bi_end_io = bio->bi_end_io;

Is this safe? Why isn't this chaining completion of split bio to the
original one?

Thanks.

--
tejun

2012-08-08 23:05:40

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

One more thing.

On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> + if (bio_integrity(bio)) {
> + bio_integrity_clone(ret, bio, gfp, bs);
> + bio_integrity_trim(ret, 0, bio_sectors(ret));
> + bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));

Is this equivalent to bio_integrity_split() performance-wise?

Thanks.

--
tejun

2012-08-08 23:09:55

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 09/12] block: Rework bio_pair_split()

Hello,

On Mon, Aug 06, 2012 at 03:08:38PM -0700, Kent Overstreet wrote:
> This changes bio_pair_split() to use the new bio_split() underneath,
> which gets rid of the single page bio limitation. The various callers
> are fixed up for the slightly different struct bio_pair, and to remove
> the unnecessary checks.
>
> v5: Move extern declaration to proper patch, per Boaz

I don't get this. Why can't bio_split() chain the split to the
original one thus make bio_pair unnecessary? It's not like completing
the split bio with the same end_io ever makes sense.

Thanks.

--
tejun

2012-08-08 23:16:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 10/12] block: Add bio_clone_kmalloc()

On Mon, Aug 06, 2012 at 03:08:39PM -0700, Kent Overstreet wrote:

How about the following?

There was no API to kmalloc bio and clone and osdblk was using
explicit bio_kmalloc() + __bio_clone(). (my guess here) As this is
inconvenient and there will be more users of it in the future, add
bio_clone_kmalloc() and use it in osdblk.

> Acked-by: Boaz Harrosh <[email protected]>
> Signed-off-by: Kent Overstreet <[email protected]>
> ---
> drivers/block/osdblk.c | 3 +--
> fs/bio.c | 13 +++++++++++++
> fs/exofs/ore.c | 5 ++---
> include/linux/bio.h | 1 +
> 4 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
> index 87311eb..1bbc681 100644
> --- a/drivers/block/osdblk.c
> +++ b/drivers/block/osdblk.c
> @@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
> struct bio *tmp, *new_chain = NULL, *tail = NULL;
>
> while (old_chain) {
> - tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
> + tmp = bio_clone_kmalloc(old_chain, gfpmask);
> if (!tmp)
> goto err_out;
>
> - __bio_clone(tmp, old_chain);
> tmp->bi_bdev = NULL;
> gfpmask &= ~__GFP_WAIT;
> tmp->bi_next = NULL;
> diff --git a/fs/bio.c b/fs/bio.c
> index f0c865b..77b9313 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -497,6 +497,19 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> }
> EXPORT_SYMBOL(bio_clone);
>

/**

PLEASE.

> +struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
> +{
> + struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);

Can't we use %NULL bioset as an indication to allocate from kmalloc
instead of duping interfaces like this?

Thanks.

--
tejun

2012-08-08 23:21:27

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 11/12] block: Add bio_clone_bioset()

On Mon, Aug 06, 2012 at 03:08:40PM -0700, Kent Overstreet wrote:
> This consolidates some code, and will help in a later patch changing how
> bio cloning works.

I think it would be better to introduce bio_clone*() functions in a
separate patch and convert the users in a different one.

> /**
> - * bio_clone - clone a bio
> + * bio_clone_bioset - clone a bio
> * @bio: bio to clone
> * @gfp_mask: allocation priority
> + * @bs: bio_set to allocate from
> *
> * Like __bio_clone, only also allocates the returned bio
> */
> -struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> +struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
> + struct bio_set *bs)
> {
> - struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);
> + struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
>
> if (!b)
> return NULL;
> @@ -485,7 +487,7 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> if (bio_integrity(bio)) {
> int ret;
>
> - ret = bio_integrity_clone(b, bio, gfp_mask, fs_bio_set);
> + ret = bio_integrity_clone(b, bio, gfp_mask, bs);
>
> if (ret < 0) {
> bio_put(b);
> @@ -495,6 +497,12 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
>
> return b;
> }
> +EXPORT_SYMBOL(bio_clone_bioset);
> +
> +struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> +{
> + return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
> +}

So, bio_clone() loses its function comment. Also, does it even make
sense to call bio_clone() from fs_bio_set? Let's say it's so, then
what's the difference from using _kmalloc variant?

Thanks.

--
tejun

2012-08-08 23:28:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

On Mon, Aug 06, 2012 at 07:16:33PM -0400, Mikulas Patocka wrote:
> Hi Kent
>
> When you change the semantics of an exported function, rename that
> function. There may be external modules that use __bio_clone and this
> change could silently introduce bugs in them.
>
> Otherwise, the patchset looks fine.

I don't know. This doesn't change the main functionality and should
be transparent unless the caller is doing something crazy. It *might*
be nice to rename but I don't think that's a must here.

Thanks.

--
tejun

2012-08-08 23:30:14

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 12/12] block: Only clone bio vecs that are in use

Hello,

On Mon, Aug 06, 2012 at 03:08:41PM -0700, Kent Overstreet wrote:
> @@ -459,10 +460,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
> bio->bi_sector = bio_src->bi_sector;
> bio->bi_bdev = bio_src->bi_bdev;
> bio->bi_flags |= 1 << BIO_CLONED;
> + bio->bi_flags &= ~(1 << BIO_SEG_VALID);

This isn't obvious at all. Why no explanation anywhere? Also it
would be nice to update comments of the updated functions so that it's
clear that only partial cloning happens.

Thanks.

--
tejun

2012-08-08 23:47:48

by Muthu Kumar

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

Tejun,

This is changing the semantics of the clone. Sorry, I missed this
thread and replied separately. But anyway, replying it again here:


On Wed, Aug 8, 2012 at 4:28 PM, Tejun Heo <[email protected]> wrote:
> On Mon, Aug 06, 2012 at 07:16:33PM -0400, Mikulas Patocka wrote:
>> Hi Kent
>>
>> When you change the semantics of an exported function, rename that
>> function. There may be external modules that use __bio_clone and this
>> change could silently introduce bugs in them.
>>
>> Otherwise, the patchset looks fine.
>
> I don't know. This doesn't change the main functionality and should
> be transparent unless the caller is doing something crazy. It *might*
> be nice to rename but I don't think that's a must here.
>
> Thanks.

--
You are changing the meaning of __bio_clone() here. In old code, the
number of io_vecs, bi_idx, bi_vcnt are preserved. But in this modified
code, you are mapping bio_src's bi_iovec[bi_idx] to bio_dests
bi_iovec[0] and also restricting the number of allocated io_vecs of
the clone. It may be useful for cases were we would like a identical
copy of the original bio (may not be in current code base, but this
implementation is definitely not what one would expect from the name
"clone").

May be, call this new implementation some thing else (and use it for bcache)?

---

Like Mikulas pointed out, this is an exported function and silently
changing the semantics will break external modules.

Regards,
Muthu


>
> --
> tejun
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2012-08-08 23:58:09

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

On Wed, Aug 08, 2012 at 03:06:12PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:31PM -0700, Kent Overstreet wrote:
> > Previously, dm_rq_clone_bio_info needed to be freed by the bio's
> > destructor to avoid a memory leak in the blk_rq_prep_clone() error path.
> > This gets rid of a memory allocation and means we can kill
> > dm_rq_bio_destructor.
> >
> > Signed-off-by: Kent Overstreet <[email protected]>
> > ---
> > drivers/md/dm.c | 31 +++++--------------------------
> > 1 files changed, 5 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index 40b7735..4014696 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -92,6 +92,7 @@ struct dm_rq_target_io {
> > struct dm_rq_clone_bio_info {
> > struct bio *orig;
> > struct dm_rq_target_io *tio;
> > + struct bio clone;
> > };
> ...
> > @@ -2696,7 +2674,8 @@ struct dm_md_mempools *dm_alloc_md_mempools(unsigned type, unsigned integrity)
> > if (!pools->tio_pool)
> > goto free_io_pool_and_out;
> >
> > - pools->bs = bioset_create(pool_size, 0);
> > + pools->bs = bioset_create(pool_size,
> > + offsetof(struct dm_rq_clone_bio_info, orig));
> > if (!pools->bs)
> > goto free_tio_pool_and_out;
>
> I do like this approach much better but this isn't something
> super-obvious. Can we please explain what's going on? Especially,
> the comment above dm_rq_clone_bio_info is outright misleading now.

This look better to you?

/*
* For request-based dm - the bio clones we allocate are embedded in these
* structs.
*
* We allocate these with bio_alloc_bioset, using the front_pad parameter when
* the bioset is created - this means the bio has to come at the end of the
* struct.
*/
struct dm_rq_clone_bio_info {
struct bio *orig;
struct dm_rq_target_io *tio;
struct bio clone;
};

> Can someone more familiar review this one? Alasdir, Mike?
>
> Also, how was this tested?

Well, AFAICT the only request based dm target is multipath, and from the
documentation I've seen it doesn't appear to work without multipath
hardware, or at least I haven't seen it documented how. So, unless
there's another user I missed it's not been tested.

>
> Thanks.
>
> --
> tejun

2012-08-09 00:07:48

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 03/12] block: Add bio_reset()

On Wed, Aug 08, 2012 at 03:11:29PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:32PM -0700, Kent Overstreet wrote:
> > Reusing bios is something that's been highly frowned upon in the past,
> > but driver code keeps doing it anyways. If it's going to happen anyways,
> > we should provide a generic method.
> >
> > This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
> > was open coding it, by doing a bio_init() and resetting bi_destructor.
> >
> > v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
> > bio->bi_flags are saved.
> >
> > Signed-off-by: Kent Overstreet <[email protected]>
> > Change-Id: I4eb2975bd678d3be811d5423d0620b08020be9ff
>
> Please drop Change-Id. Die gerrit die.

Bah, missed that one.

> > +void bio_reset(struct bio *bio)
> > +{
> > + unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
>
> How many flags are we talking about? If there aren't too many, I'd
> prefer explicit BIO_FLAGS_PRESERVED or whatever.

It mostly isn't actual flags that are preserved - the high bits of the
flags are used for indicating what slab the bvec was allocated from, and
that's the main thing that has to be preserved.

So that's why I went with defining the things that are reset instead of
the things that are preserved.

I would prefer if bitfields were used for at least BIO_POOL_IDX, but the
problem is flags is used as an atomic bit vector for BIO_UPTODATE.

But flags isn't treated as an atomic bit vector elsewhere -
bio_flagged() doesn't use test_bit(), and flags are set/cleared with
atomic bit operations in some places but not in others (probably _most_
of them are technically safe, but... ick).

>
> Thanks.
>
> --
> tejun

2012-08-09 00:09:26

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 04/12] pktcdvd: Switch to bio_kmalloc()

On Wed, Aug 08, 2012 at 03:13:59PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:33PM -0700, Kent Overstreet wrote:
> > This is prep work for killing bi_destructor - previously, pktcdvd had
> > its own pkt_bio_alloc which was basically duplication bio_kmalloc(),
> > necessitating its own bi_destructor implementation.
> >
> > v5: Un-reorder some functions, to make the patch easier to review
> >
> > Signed-off-by: Kent Overstreet <[email protected]>
>
> Please Cc: the maintainers. Cc'ing Peter Osterlund and keeping the
> whole body for him.

Whoops, thanks.

> Generally looks good to me. How is this tested?

Untested - no hardware for it.

>
> Thanks.
>
> > ---
> > drivers/block/pktcdvd.c | 67 +++++++++++-----------------------------------
> > 1 files changed, 16 insertions(+), 51 deletions(-)
> >
> > diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> > index ba66e44..ae55f08 100644
> > --- a/drivers/block/pktcdvd.c
> > +++ b/drivers/block/pktcdvd.c
> > @@ -101,6 +101,8 @@ static struct dentry *pkt_debugfs_root = NULL; /* /sys/kernel/debug/pktcdvd */
> > static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev);
> > static int pkt_remove_dev(dev_t pkt_dev);
> > static int pkt_seq_show(struct seq_file *m, void *p);
> > +static void pkt_end_io_read(struct bio *bio, int err);
> > +static void pkt_end_io_packet_write(struct bio *bio, int err);
> >
> >
> >
> > @@ -522,38 +524,6 @@ static void pkt_bio_finished(struct pktcdvd_device *pd)
> > }
> > }
> >
> > -static void pkt_bio_destructor(struct bio *bio)
> > -{
> > - kfree(bio->bi_io_vec);
> > - kfree(bio);
> > -}
> > -
> > -static struct bio *pkt_bio_alloc(int nr_iovecs)
> > -{
> > - struct bio_vec *bvl = NULL;
> > - struct bio *bio;
> > -
> > - bio = kmalloc(sizeof(struct bio), GFP_KERNEL);
> > - if (!bio)
> > - goto no_bio;
> > - bio_init(bio);
> > -
> > - bvl = kcalloc(nr_iovecs, sizeof(struct bio_vec), GFP_KERNEL);
> > - if (!bvl)
> > - goto no_bvl;
> > -
> > - bio->bi_max_vecs = nr_iovecs;
> > - bio->bi_io_vec = bvl;
> > - bio->bi_destructor = pkt_bio_destructor;
> > -
> > - return bio;
> > -
> > - no_bvl:
> > - kfree(bio);
> > - no_bio:
> > - return NULL;
> > -}
> > -
> > /*
> > * Allocate a packet_data struct
> > */
> > @@ -567,10 +537,13 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
> > goto no_pkt;
> >
> > pkt->frames = frames;
> > - pkt->w_bio = pkt_bio_alloc(frames);
> > + pkt->w_bio = bio_kmalloc(GFP_KERNEL, frames);
> > if (!pkt->w_bio)
> > goto no_bio;
> >
> > + pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> > + pkt->w_bio->bi_private = pkt;
> > +
> > for (i = 0; i < frames / FRAMES_PER_PAGE; i++) {
> > pkt->pages[i] = alloc_page(GFP_KERNEL|__GFP_ZERO);
> > if (!pkt->pages[i])
> > @@ -581,9 +554,12 @@ static struct packet_data *pkt_alloc_packet_data(int frames)
> > bio_list_init(&pkt->orig_bios);
> >
> > for (i = 0; i < frames; i++) {
> > - struct bio *bio = pkt_bio_alloc(1);
> > + struct bio *bio = bio_kmalloc(GFP_KERNEL, 1);
> > if (!bio)
> > goto no_rd_bio;
> > +
> > + bio->bi_end_io = pkt_end_io_read;
> > + bio->bi_private = pkt;
> > pkt->r_bios[i] = bio;
> > }
> >
> > @@ -1111,21 +1087,15 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
> > * Schedule reads for missing parts of the packet.
> > */
> > for (f = 0; f < pkt->frames; f++) {
> > - struct bio_vec *vec;
> > -
> > int p, offset;
> > +
> > if (written[f])
> > continue;
> > +
> > bio = pkt->r_bios[f];
> > - vec = bio->bi_io_vec;
> > - bio_init(bio);
> > - bio->bi_max_vecs = 1;
> > - bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
> > - bio->bi_bdev = pd->bdev;
> > - bio->bi_end_io = pkt_end_io_read;
> > - bio->bi_private = pkt;
> > - bio->bi_io_vec = vec;
> > - bio->bi_destructor = pkt_bio_destructor;
> > + bio_reset(bio);
> > + bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
> > + bio->bi_bdev = pd->bdev;
> >
> > p = (f * CD_FRAMESIZE) / PAGE_SIZE;
> > offset = (f * CD_FRAMESIZE) % PAGE_SIZE;
> > @@ -1418,14 +1388,9 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
> > }
> >
> > /* Start the write request */
> > - bio_init(pkt->w_bio);
> > - pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
> > + bio_reset(pkt->w_bio);
> > pkt->w_bio->bi_sector = pkt->sector;
> > pkt->w_bio->bi_bdev = pd->bdev;
> > - pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
> > - pkt->w_bio->bi_private = pkt;
> > - pkt->w_bio->bi_io_vec = bvec;
> > - pkt->w_bio->bi_destructor = pkt_bio_destructor;
> > for (f = 0; f < pkt->frames; f++)
> > if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
> > BUG();
> > --
> > 1.7.7.3
> >
>
> --
> tejun

2012-08-09 00:15:05

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] block: Kill bi_destructor

On Mon, Aug 06, 2012 at 11:19:21PM -0400, Mike Snitzer wrote:
> On Mon, Aug 06 2012 at 6:08pm -0400,
> Kent Overstreet <[email protected]> wrote:
>
> > Now that we've got generic code for freeing bios allocated from bio
> > pools, this isn't needed anymore.
> >
> > This also changes the semantics of bio_free() a bit - it now also frees
> > bios allocated by bio_kmalloc(). It's also no longer exported, as
> > without bi_destructor there should be no need for it to be called
> > anywhere else.
>
> Seems you forgot to remove bio_free's EXPORT_SYMBOL

Whoops - thanks, fixed.

2012-08-09 00:22:31

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] block: Kill bi_destructor

On Wed, Aug 08, 2012 at 03:22:23PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:34PM -0700, Kent Overstreet wrote:
> > Now that we've got generic code for freeing bios allocated from bio
> > pools, this isn't needed anymore.
> >
> > This also changes the semantics of bio_free() a bit - it now also frees
> > bios allocated by bio_kmalloc(). It's also no longer exported, as
> > without bi_destructor there should be no need for it to be called
> > anywhere else.
> >
> > v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
> >
> > Signed-off-by: Kent Overstreet <[email protected]>
> > ---
> > diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
> > index 920ede2..19bf632 100644
> > --- a/drivers/block/drbd/drbd_main.c
> > +++ b/drivers/block/drbd/drbd_main.c
> > @@ -161,23 +161,12 @@ static const struct block_device_operations drbd_ops = {
> > .release = drbd_release,
> > };
> >
> > -static void bio_destructor_drbd(struct bio *bio)
> > -{
> > - bio_free(bio, drbd_md_io_bio_set);
> > -}
> > -
> > struct bio *bio_alloc_drbd(gfp_t gfp_mask)
> > {
> > - struct bio *bio;
> > -
> > if (!drbd_md_io_bio_set)
> > return bio_alloc(gfp_mask, 1);
> >
> > - bio = bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
> > - if (!bio)
> > - return NULL;
> > - bio->bi_destructor = bio_destructor_drbd;
> > - return bio;
> > + return bio_alloc_bioset(gfp_mask, 1, drbd_md_io_bio_set);
> > }
>
> Does this chunk belong to this patch?

Hrm, that should've been in the first patch. Will move it.

>
> > @@ -56,6 +56,8 @@ static struct biovec_slab bvec_slabs[BIOVEC_NR_POOLS] __read_mostly = {
> > */
> > struct bio_set *fs_bio_set;
> >
> > +#define BIO_KMALLOC_POOL ((void *) ~0)
>
> What's wrong with good ol' NULL?

If it's NULL, we can't distinguish between bios where that field wasn't
set (i.e. bios that were statically allocated somewhere) from bios that
were allocated by bio_kmalloc().

It's just there to make debugging easier - if bi_cnt goes to 0 on a bio
where it shouldn't we'll catch it at the BUG_ON() in bio_free() instead
of kfreeing a bad pointer.

>
> Thanks.
>
> --
> tejun

2012-08-09 00:27:30

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 01/12] block: Generalized bio pool freeing

On Wed, Aug 08, 2012 at 03:25:15PM -0700, Tejun Heo wrote:
> On Mon, Aug 06, 2012 at 03:08:30PM -0700, Kent Overstreet wrote:
> > @@ -422,7 +409,11 @@ void bio_put(struct bio *bio)
> > if (atomic_dec_and_test(&bio->bi_cnt)) {
> > bio_disassociate_task(bio);
> > bio->bi_next = NULL;
> > - bio->bi_destructor(bio);
> > +
> > + if (bio->bi_pool)
> > + bio_free(bio, bio->bi_pool);
> > + else
> > + bio->bi_destructor(bio);
>
> So, this bi_pool overriding caller specified custom bi_destructor is
> rather unusual. I know why it's like that - the patch series is
> gradually replacing bi_destructor with bi_pool and removes
> bi_destructor eventually, but it would be far better if at least patch
> description says why this is unusual like this.

Ok, I'll stick a comment in there:

if (atomic_dec_and_test(&bio->bi_cnt)) {
bio_disassociate_task(bio);
bio->bi_next = NULL;

/*
* This if statement is temporary - bi_pool is replacing
* bi_destructor, but bi_destructor will be taken out in another
* patch.
*/
if (bio->bi_pool)
bio_free(bio, bio->bi_pool);
else
bio->bi_destructor(bio);
}

>
> Thanks.
>
> --
> tejun

2012-08-09 01:27:42

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

On Wed, Aug 08, 2012 at 03:58:39PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > /**
> > + * bio_split - split a bio
> > + * @bio: bio to split
> > + * @sectors: number of sectors to split from the front of @bio
> > + * @gfp: gfp mask
> > + * @bs: bio set to allocate from
> > + *
> > + * Allocates and returns a new bio which represents @sectors from the start of
> > + * @bio, and updates @bio to represent the remaining sectors.
> > + *
> > + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> > + * unchanged.
>
> Umm.... I don't know. This is rather confusing. The function may
> return new or old bios? What's the rationale behind it? Return
> ERR_PTR(-EINVAL) instead?

Returning the old bio would be semantically equivalent to returning an
error, but IME when you're actually using it it does make sense and
leads to slightly cleaner code.

The reason is that when you're splitting, sectors is typically just the
maximum number of sectors you can handle here - you calculate the device
limit, or the number of sectors you can read from this location, or
whatever.

So the code ends up looking like:

while (1) {
split = bio_split(bio, sectors);

/* do some stuff to split and submit it */

/* check if that was the last split and break */
}

If bio_split returned an error, it'd make the code more convoluted -
you'd have to do work on either the split or the original bio, and then
repeat the same check later when it's time to break out of the loop.


>
> > + *
> > + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> > + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> > + * freed before the split.
>
> This is somewhat error-prone. Given how splits are used now, this
> might not be a big issue but it isn't difficult to imagine how this
> could go subtly wrong. More on this.

I can't find anything else in your emails on the subject...

So, I do agree, but there is a rationale:

Due to the way bio completions have to be chained, I'm not convinced
it's much of an issue in practice; if you're processing a bio by
splitting it, you can't complete it until all the splits have
completed, so you have to have a hook there.

In order for this to lead to a bug, you'd have to be cloning the
original bio (i.e. you can't be splitting a bio that someone else owns
and passed you, because that won't be freed until after you complete it)
and then you have to fail to put/free that clone in your hook, where
you're going to have other state to free too.

Cloning a bio and then not explicitly freeing it ought to be fairly
obviously wrong, IMO.

I think there's a more positive reason to do it this way long term, too.
I'm working towards getting rid of arbitrary restrictions in the block
layer, and forcing bio_split() to allocate the bvec introduces one;
allocating a bio with more than BIO_MAX_VECS will fail, and that _is_
the kind of tricky restriction that's likely to trip callers up (it's
certainly happened to me, I think multiple times).

Currently this is still an issue if the split isn't aligned on a bvec
boundary, but that's also fixable - by making the bvec immutable, which
would have a lot of other benefits too.

Making bio vecs immutable would also solve the original problem, because
cloning a bio would no longer clone the bvec as well - so the bvec the
split points to would _always_ be owned by something higher up that
couldn't free it until after the split completes.

>
> > + *
> > + * BIG FAT WARNING:
> > + *
> > + * If you're calling this from under generic_make_request() (i.e.
> > + * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
> > + * workqueue if the allocation fails. Otherwise, your code will probably
> > + * deadlock.
>
> If the condition is detectable, WARN_ON_ONCE() please.

Ok, I like that.

>
> > + * You can't allocate more than once from the same bio pool without submitting
> > + * the previous allocations (so they'll eventually complete and deallocate
> > + * themselves), but if you're under generic_make_request() those previous
> > + * allocations won't submit until you return . And if you have to split bios,
> ^
> extra space
> > + * you should expect that some bios will require multiple splits.
> > + */
> > +struct bio *bio_split(struct bio *bio, int sectors,
> > + gfp_t gfp, struct bio_set *bs)
> > +{
> > + unsigned idx, vcnt = 0, nbytes = sectors << 9;
> > + struct bio_vec *bv;
> > + struct bio *ret = NULL;
> > +
> > + BUG_ON(sectors <= 0);
> > +
> > + if (sectors >= bio_sectors(bio))
> > + return bio;
> > +
> > + trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
> > + bio->bi_sector + sectors);
> > +
> > + bio_for_each_segment(bv, bio, idx) {
> > + vcnt = idx - bio->bi_idx;
> > +
> > + if (!nbytes) {
> > + ret = bio_alloc_bioset(gfp, 0, bs);
> > + if (!ret)
> > + return NULL;
> > +
> > + ret->bi_io_vec = bio_iovec(bio);
> > + ret->bi_flags |= 1 << BIO_CLONED;
> > + break;
> > + } else if (nbytes < bv->bv_len) {
> > + ret = bio_alloc_bioset(gfp, ++vcnt, bs);
> > + if (!ret)
> > + return NULL;
> > +
> > + memcpy(ret->bi_io_vec, bio_iovec(bio),
> > + sizeof(struct bio_vec) * vcnt);
> > +
> > + ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
> > + bv->bv_offset += nbytes;
> > + bv->bv_len -= nbytes;
> > + break;
> > + }
>
> Ummm... ISTR reviewing this code and getting confused by bio_alloc
> inside bio_for_each_segment() loop and commenting something about
> that. Yeah, this one.
>
> http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/15790/focus=370
>
> So, I actually have reviewed this but didn't get any response and
> majority of the issues I raised aren't addressed and you sent the
> patch to me again? What the hell, Kent?

Argh. I apologize, I knew I'd missing something. Cutting and pasting the
stuff I haven't already responded to/fixed:

>> + ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
>> + bv->bv_offset += nbytes;
>> + bv->bv_len -= nbytes;
>
> Please don't indent assignments.

Ok, unindented those.

>
>> + break;
>> + }
>> +
>> + nbytes -= bv->bv_len;
>> + }
>
> I find the code a bit confusing. Wouldn't it be better to structure
> it as
>
> bio_for_each_segment() {
> find splitting point;
> }
>
> Do all of the splitting.

Definitely, except I don't see how to sanely do it that way with the
different cases for splitting on a bvec boundry and not.

I would like to get rid of that conditional eventually, but by making
bvecs immutable.

>
>> + ret->bi_bdev = bio->bi_bdev;
>> + ret->bi_sector = bio->bi_sector;
>> + ret->bi_size = sectors << 9;
>> + ret->bi_rw = bio->bi_rw;
>> + ret->bi_vcnt = vcnt;
>> + ret->bi_max_vecs = vcnt;
>> + ret->bi_end_io = bio->bi_end_io;
>> + ret->bi_private = bio->bi_private;
>>
>> - bio_endio(master, bp->error);
>> - mempool_free(bp, bp->bio2.bi_private);
>> + bio->bi_sector += sectors;
>> + bio->bi_size -= sectors << 9;
>> + bio->bi_idx = idx;
>
> I personally would prefer not having indentations here either.

These I'd prefer to keep - it is a dozen assignments in a row, I
_really_ find the indented version more readable.

> So, before, split wouldn't override orig->bi_private. Now, it does so
> while the bio is in flight. I don't know. If the only benefit of
> temporary override is avoiding have a separate end_io call, I think
> not doing so is better. Also, behavior changes as subtle as this
> *must* be noted in the patch description.

Already said more about this below, but to elaborate a bit - there are
situations where the caller really wouldn't want the completions chained
(i.e, if the splits are going to different devices or otherwise are
going to have different error handling, the caller really needs to
supply its own endio function(s)).

The old behaviour is still available (certainly there are cases where it
_is_ what you want) - it's just been decoupled a bit.

>
> > +
> > + nbytes -= bv->bv_len;
> > + }
> > +
> > + ret->bi_bdev = bio->bi_bdev;
> > + ret->bi_sector = bio->bi_sector;
> > + ret->bi_size = sectors << 9;
> > + ret->bi_rw = bio->bi_rw;
> > + ret->bi_vcnt = vcnt;
> > + ret->bi_max_vecs = vcnt;
> > + ret->bi_end_io = bio->bi_end_io;
>
> Is this safe? Why isn't this chaining completion of split bio to the
> original one?

Outside the scope of this function - if you want the completions
chained, you'd use bio_pair_split().

With this bio_split() it's perfectly reasonable to split a bio an
arbitrary number of times, and if that's what you're doing it's much
cleaner (and more efficient) to just use a refcount instead of chaining
the completions a bunch of times.

So if that's what the caller is doing, this will do exactly what they
want - if the caller wants to chain the completions, the caller can
still do that (like how bio_pair_split() does in the next patch).

>
> Thanks.
>
> --
> tejun

2012-08-09 01:40:00

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

On Wed, Aug 08, 2012 at 04:05:32PM -0700, Tejun Heo wrote:
> One more thing.
>
> On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > + if (bio_integrity(bio)) {
> > + bio_integrity_clone(ret, bio, gfp, bs);
> > + bio_integrity_trim(ret, 0, bio_sectors(ret));
> > + bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
>
> Is this equivalent to bio_integrity_split() performance-wise?

Strictly speaking, no. But it has the advantage of being drastically
simpler - and the only one only worked for single page bios so I
would've had to come up with something new from scratch, and as
confusing as the integrity stuff is I wouldn't trust the result.

I'm skeptical that it's going to matter in practice given how much
iteration is done elsewhere in the course of processing a bio and given
that this stuff isn't used with high end SSDs...

2012-08-09 01:57:46

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 10/12] block: Add bio_clone_kmalloc()

On Wed, Aug 08, 2012 at 04:15:52PM -0700, Tejun Heo wrote:
> On Mon, Aug 06, 2012 at 03:08:39PM -0700, Kent Overstreet wrote:
>
> How about the following?
>
> There was no API to kmalloc bio and clone and osdblk was using
> explicit bio_kmalloc() + __bio_clone(). (my guess here) As this is
> inconvenient and there will be more users of it in the future, add
> bio_clone_kmalloc() and use it in osdblk.

Adding that.

>
> > Acked-by: Boaz Harrosh <[email protected]>
> > Signed-off-by: Kent Overstreet <[email protected]>
> > ---
> > drivers/block/osdblk.c | 3 +--
> > fs/bio.c | 13 +++++++++++++
> > fs/exofs/ore.c | 5 ++---
> > include/linux/bio.h | 1 +
> > 4 files changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
> > index 87311eb..1bbc681 100644
> > --- a/drivers/block/osdblk.c
> > +++ b/drivers/block/osdblk.c
> > @@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
> > struct bio *tmp, *new_chain = NULL, *tail = NULL;
> >
> > while (old_chain) {
> > - tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
> > + tmp = bio_clone_kmalloc(old_chain, gfpmask);
> > if (!tmp)
> > goto err_out;
> >
> > - __bio_clone(tmp, old_chain);
> > tmp->bi_bdev = NULL;
> > gfpmask &= ~__GFP_WAIT;
> > tmp->bi_next = NULL;
> > diff --git a/fs/bio.c b/fs/bio.c
> > index f0c865b..77b9313 100644
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -497,6 +497,19 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> > }
> > EXPORT_SYMBOL(bio_clone);
> >
>
> /**
>
> PLEASE.
>
> > +struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
> > +{
> > + struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
>
> Can't we use %NULL bioset as an indication to allocate from kmalloc
> instead of duping interfaces like this?

The two aren't mutually exclusive - but using BIO_KMALLOC_POOL instead
of separate interfaces is an excellent idea, I'll do that.

That means bio_clone_kmalloc will just become:

static inline struct bio *bio_clone_kmalloc(struct bio *bio,
gfp_t gfp_mask)
{
return bio_clone_bioset(bio, gfp_mask, BIO_KMALLOC_POOL)
}

(or maybe NULL there, I think using NULL for the interface makes sense,
I just don't want to use it for bi_pool).

Do you still want the /** for a one line wrapper like that?


>
> Thanks.
>
> --
> tejun

2012-08-09 02:38:47

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH] Consolidate bio_clone_bioset(), bio_kmalloc()

On Wed, Aug 08, 2012 at 04:15:52PM -0700, Tejun Heo wrote:
> > +struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
> > +{
> > + struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
>
> Can't we use %NULL bioset as an indication to allocate from kmalloc
> instead of duping interfaces like this?

So, if bio_clone_bioset(gfp, nr_iovecs, BIO_KMALLOC_POOL) just does
bio_kmalloc(), the rest just falls out naturally.

We could do this by either just having bio_clone_bioset() call
bio_kmalloc(), or consolidate them both into a single function.

I'm leaning towards the latter, because while looking at it I noticed a
couple subtle behavioural differences.

* bio_kmalloc(GFP_KERNEL, 0) sets bi_io_vec = bi_inline_vecs,
bio_alloc_bioset sets it to NULL. This is a bug waiting to happen, if it
isn't one already - bi_io_vec != NULL is exactly what bio_has_data()
checks.

* bio_alloc_bioset() could return a bio with bi_max_vecs greater than
requested if you asked for a bio with fewer than BIO_INLINE_VECS.
Unlikely to ever be a real problem, but subtle enough that I wouldn't
bet too much against it.

* bio_kmalloc() fails if asked for more than UIO_MAXIOV bvecs (wtf!?),
which is 1024; bio_alloc_bioset fails if asked for more than
BIO_MAX_PAGES (which is 256, and it'd probably take you a bit to see
where/why it fails).

So here's my initial stab at it. Tell me if you think this is too
contorted:


diff --git a/fs/bio.c b/fs/bio.c
index 22596af..c852665 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -295,34 +295,45 @@ EXPORT_SYMBOL(bio_reset);
**/
struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
{
+ unsigned front_pad;
+ unsigned inline_vecs;
unsigned long idx = BIO_POOL_NONE;
struct bio_vec *bvl = NULL;
struct bio *bio;
void *p;

- p = mempool_alloc(bs->bio_pool, gfp_mask);
+ if (nr_iovecs > UIO_MAXIOV)
+ return NULL;
+
+ if (bs == BIO_KMALLOC_POOL) {
+ p = kmalloc(sizeof(struct bio) +
+ nr_iovecs * sizeof(struct bio_vec),
+ gfp_mask);
+ front_pad = 0;
+ inline_vecs = nr_iovecs;
+ } else {
+ p = mempool_alloc(bs->bio_pool, gfp_mask);
+ front_pad = bs->front_pad;
+ inline_vecs = BIO_INLINE_VECS;
+ }
+
if (unlikely(!p))
return NULL;
- bio = p + bs->front_pad;

+ bio = p + front_pad;
bio_init(bio);
- bio->bi_pool = bs;
-
- if (unlikely(!nr_iovecs))
- goto out_set;

- if (nr_iovecs <= BIO_INLINE_VECS) {
- bvl = bio->bi_inline_vecs;
- nr_iovecs = BIO_INLINE_VECS;
- } else {
+ if (nr_iovecs > inline_vecs) {
bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
if (unlikely(!bvl))
goto err_free;

- nr_iovecs = bvec_nr_vecs(idx);
bio->bi_flags |= 1 << BIO_OWNS_VEC;
+ } else if (nr_iovecs) {
+ bvl = bio->bi_inline_vecs;
}
-out_set:
+
+ bio->bi_pool = bs;
bio->bi_flags |= idx << BIO_POOL_OFFSET;
bio->bi_max_vecs = nr_iovecs;
bio->bi_io_vec = bvl;

2012-08-09 02:56:46

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 11/12] block: Add bio_clone_bioset()

On Wed, Aug 08, 2012 at 04:21:20PM -0700, Tejun Heo wrote:
> On Mon, Aug 06, 2012 at 03:08:40PM -0700, Kent Overstreet wrote:
> > This consolidates some code, and will help in a later patch changing how
> > bio cloning works.
>
> I think it would be better to introduce bio_clone*() functions in a
> separate patch and convert the users in a different one.
>
> > /**
> > - * bio_clone - clone a bio
> > + * bio_clone_bioset - clone a bio
> > * @bio: bio to clone
> > * @gfp_mask: allocation priority
> > + * @bs: bio_set to allocate from
> > *
> > * Like __bio_clone, only also allocates the returned bio
> > */
> > -struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> > +struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
> > + struct bio_set *bs)
> > {
> > - struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);
> > + struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
> >
> > if (!b)
> > return NULL;
> > @@ -485,7 +487,7 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> > if (bio_integrity(bio)) {
> > int ret;
> >
> > - ret = bio_integrity_clone(b, bio, gfp_mask, fs_bio_set);
> > + ret = bio_integrity_clone(b, bio, gfp_mask, bs);
> >
> > if (ret < 0) {
> > bio_put(b);
> > @@ -495,6 +497,12 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> >
> > return b;
> > }
> > +EXPORT_SYMBOL(bio_clone_bioset);
> > +
> > +struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
> > +{
> > + return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
> > +}
>
> So, bio_clone() loses its function comment. Also, does it even make
> sense to call bio_clone() from fs_bio_set?

I'll re add the function comment if you want, just for a single line
wrapper I don't know if it's worth the cost - comments get out of date,
and they're more stuff to wade through.

> Let's say it's so, then
> what's the difference from using _kmalloc variant?

bio_kmalloc() fails if nr_iovecs > 1024, bio_alloc_bioset() fails if
nr_iovecs > 256

and bio_alloc_bioset() is mempool backed, bio_kmalloc() is not.

AFAICT that's it.

2012-08-09 03:07:16

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 12/12] block: Only clone bio vecs that are in use

On Wed, Aug 08, 2012 at 04:30:07PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:41PM -0700, Kent Overstreet wrote:
> > @@ -459,10 +460,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
> > bio->bi_sector = bio_src->bi_sector;
> > bio->bi_bdev = bio_src->bi_bdev;
> > bio->bi_flags |= 1 << BIO_CLONED;
> > + bio->bi_flags &= ~(1 << BIO_SEG_VALID);
>
> This isn't obvious at all. Why no explanation anywhere? Also it
> would be nice to update comments of the updated functions so that it's
> clear that only partial cloning happens.

Because it's not obvious to me, either - I had to grep around through a
fair amount of code to figure out the semantics of BIO_SEG_VALID and I
doubt I have it 100%. I'm also pretty sure it's not used consistently in
the existing code...

If it means what I think it means, it should be obvious - nr_segs isn't
valid because the number of pages in the iovec changed (though we didn't
bother to copy it over anyways. So it's not necessary if nr_segs = 0
means nr_segs isn't valid, but bleh).

Anyways. yeah. BIO_SEG_VALID should be documented, and if it was I think
this code would be fine.

I will update the comment for the partial cloning thing:

/**
* __bio_clone - clone a bio
* @bio: destination bio
* @bio_src: bio to clone
*
* Clone a &bio. Caller will own the returned bio, but not
* the actual data it points to. Reference count of returned
* bio will be one.
*
* We don't clone the entire bvec, just the part from bi_idx to b_vcnt
* (i.e. what the bio currently points to, so the new bio is still
* equivalent to the old bio).
*/
void __bio_clone(struct bio *bio, struct bio *bio_src)
{
memcpy(bio->bi_io_vec,
bio_iovec(bio_src),
bio_segments(bio_src) * sizeof(struct bio_vec));

2012-08-09 03:20:14

by Kent Overstreet

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

On Wed, Aug 08, 2012 at 04:47:46PM -0700, Muthu Kumar wrote:
> Tejun,
>
> This is changing the semantics of the clone. Sorry, I missed this
> thread and replied separately. But anyway, replying it again here:
>
>
> On Wed, Aug 8, 2012 at 4:28 PM, Tejun Heo <[email protected]> wrote:
> > On Mon, Aug 06, 2012 at 07:16:33PM -0400, Mikulas Patocka wrote:
> >> Hi Kent
> >>
> >> When you change the semantics of an exported function, rename that
> >> function. There may be external modules that use __bio_clone and this
> >> change could silently introduce bugs in them.
> >>
> >> Otherwise, the patchset looks fine.
> >
> > I don't know. This doesn't change the main functionality and should
> > be transparent unless the caller is doing something crazy. It *might*
> > be nice to rename but I don't think that's a must here.
> >
> > Thanks.
>
> --
> You are changing the meaning of __bio_clone() here. In old code, the
> number of io_vecs, bi_idx, bi_vcnt are preserved. But in this modified
> code, you are mapping bio_src's bi_iovec[bi_idx] to bio_dests
> bi_iovec[0] and also restricting the number of allocated io_vecs of
> the clone. It may be useful for cases were we would like a identical
> copy of the original bio (may not be in current code base, but this
> implementation is definitely not what one would expect from the name
> "clone").

The problem is that bio_clone() is used on bios that were not allocated
or submitted by the cloning module.

If some code somewher submits a bio that points to 500 pages, but by the
time it gets to a driver it only points to 200 pages (say, because it
was split), that clone should succeed; it shouldn't fail simply because
it was trying to clone more than was necessary.

Bios have certain (poorly documented) semantics, and if this breaks
anything it's probably because that code was doing something crazy in
the first place.

In particular, if this change breaks anything then the new bio_split()
_will_ break things.

We need to be clear about our interfaces; in this case bi_idx and
bi_vcnt, in particular. Either this is a safe change, or it's not. If
no one knows... that's a bigger problem, and not just for this patch...

Fortunately this code actually has been tested quite a bit (and the bio
splitting code for even longer), and (somewhat to my surprise) I haven't
run into any bugs caused by it.

2012-08-09 03:25:07

by Kent Overstreet

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

On Wed, Aug 8, 2012 at 8:19 PM, Kent Overstreet <[email protected]> wrote:
> In particular, if this change breaks anything then the new bio_split()
> _will_ break things.
>
> We need to be clear about our interfaces; in this case bi_idx and
> bi_vcnt, in particular. Either this is a safe change, or it's not. If
> no one knows... that's a bigger problem, and not just for this patch...
>
> Fortunately this code actually has been tested quite a bit (and the bio
> splitting code for even longer), and (somewhat to my surprise) I haven't
> run into any bugs caused by it.

Oh, it's worse than that - remember how bio_kmalloc() works for up to
1024 pages, but bio_alloc_bioset() only up to 256?

We can already have situations where bios are allocated that are
impossible to clone (or if we don't, it's only because of
queue_limits. That's not sketchy at all.)

2012-08-09 06:00:27

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 03/12] block: Add bio_reset()

Hello,

On Wed, Aug 08, 2012 at 05:07:11PM -0700, Kent Overstreet wrote:
> > > +void bio_reset(struct bio *bio)
> > > +{
> > > + unsigned long flags = bio->bi_flags & (~0UL << BIO_RESET_BITS);
> >
> > How many flags are we talking about? If there aren't too many, I'd
> > prefer explicit BIO_FLAGS_PRESERVED or whatever.
>
> It mostly isn't actual flags that are preserved - the high bits of the
> flags are used for indicating what slab the bvec was allocated from, and
> that's the main thing that has to be preserved.
>
> So that's why I went with defining the things that are reset instead of
> the things that are preserved.
>
> I would prefer if bitfields were used for at least BIO_POOL_IDX, but the
> problem is flags is used as an atomic bit vector for BIO_UPTODATE.
>
> But flags isn't treated as an atomic bit vector elsewhere -
> bio_flagged() doesn't use test_bit(), and flags are set/cleared with

Not using test_bit() may not be necessarily wrong.

> atomic bit operations in some places but not in others (probably _most_
> of them are technically safe, but... ick).

Mixing atomic and non-atomic modifications is almost always wrong tho.

Anyways, understood. Can you *please* put some comment what bits are
being preserved across reset then? Things like this aren't obvious at
all and need ample explanation.

Thanks.

--
tejun

2012-08-09 06:05:24

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] block: Kill bi_destructor

Hello,

On Wed, Aug 08, 2012 at 05:21:54PM -0700, Kent Overstreet wrote:
> > What's wrong with good ol' NULL?
>
> If it's NULL, we can't distinguish between bios where that field wasn't
> set (i.e. bios that were statically allocated somewhere) from bios that
> were allocated by bio_kmalloc().
>
> It's just there to make debugging easier - if bi_cnt goes to 0 on a bio
> where it shouldn't we'll catch it at the BUG_ON() in bio_free() instead
> of kfreeing a bad pointer.

I fail to see how that improves anything. slab will complain clearly
if it gets passed in a pointer to static area. The benefit is
imaginery. If there's no bioset, it's NULL. Let's please keep things
usual.

Thanks.

--
tejun

2012-08-09 06:06:41

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 03/12] block: Add bio_reset()

On Wed, Aug 08, 2012 at 11:00:19PM -0700, Tejun Heo wrote:
> Anyways, understood. Can you *please* put some comment what bits are
> being preserved across reset then? Things like this aren't obvious at
> all and need ample explanation.

I did, in the header:

#define BIO_RESET_BITS 12 /* Flags starting here get preserved by
bio_reset() */

Where the rest of the flags are defined, and near where BIO_RESET_BYTES
are defined.

2012-08-09 06:12:15

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] block: Kill bi_destructor

On Wed, Aug 08, 2012 at 11:05:17PM -0700, Tejun Heo wrote:
> Hello,
>
> On Wed, Aug 08, 2012 at 05:21:54PM -0700, Kent Overstreet wrote:
> > > What's wrong with good ol' NULL?
> >
> > If it's NULL, we can't distinguish between bios where that field wasn't
> > set (i.e. bios that were statically allocated somewhere) from bios that
> > were allocated by bio_kmalloc().
> >
> > It's just there to make debugging easier - if bi_cnt goes to 0 on a bio
> > where it shouldn't we'll catch it at the BUG_ON() in bio_free() instead
> > of kfreeing a bad pointer.
>
> I fail to see how that improves anything. slab will complain clearly
> if it gets passed in a pointer to static area. The benefit is
> imaginery. If there's no bioset, it's NULL. Let's please keep things
> usual.

But if it's a pointer to heap allocated memory, but the bio was embedded
in another struct? I've seen a fair number of instances of that (md, off
the top of my head).

If you're sure that in a normal config the slab allocator is going to
complain right away and not corrupt itself, fine. But I've been bitten
way too hard by bugs that could've been caught right away by a simple
assert and instead I had to spend hours backtracking, and the block
layer is _rife_ with that kind of thing.

2012-08-09 06:30:09

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 03/12] block: Add bio_reset()

Hello,

On Wed, Aug 8, 2012 at 11:06 PM, Kent Overstreet <[email protected]> wrote:
> On Wed, Aug 08, 2012 at 11:00:19PM -0700, Tejun Heo wrote:
>> Anyways, understood. Can you *please* put some comment what bits are
>> being preserved across reset then? Things like this aren't obvious at
>> all and need ample explanation.
>
> I did, in the header:
>
> #define BIO_RESET_BITS 12 /* Flags starting here get preserved by
> bio_reset() */
>
> Where the rest of the flags are defined, and near where BIO_RESET_BYTES
> are defined.

Yeah, I was hoping for the comment to note that the protected bits
include the pool index.

Thanks.

--
tejun

2012-08-09 06:34:15

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] block: Kill bi_destructor

Hello,

On Wed, Aug 8, 2012 at 11:12 PM, Kent Overstreet <[email protected]> wrote:
> But if it's a pointer to heap allocated memory, but the bio was embedded
> in another struct? I've seen a fair number of instances of that (md, off
> the top of my head).
>
> If you're sure that in a normal config the slab allocator is going to
> complain right away and not corrupt itself, fine. But I've been bitten
> way too hard by bugs that could've been caught right away by a simple
> assert and instead I had to spend hours backtracking, and the block
> layer is _rife_ with that kind of thing.

Let's let slab debug code deal with that. I really don't see much
benefit in doing this. The said kind of bugs aren't particularly
difficult to track down.

Thanks.

--
tejun

2012-08-09 06:44:56

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

Hello, Kent.

On Wed, Aug 08, 2012 at 06:19:28PM -0700, Kent Overstreet wrote:
> If bio_split returned an error, it'd make the code more convoluted -
> you'd have to do work on either the split or the original bio, and then
> repeat the same check later when it's time to break out of the loop.

I really dislike this reasonsing. The interface might streamline
certain use cases a bit but at the cost of confusing semantics. It's
called bio_split() which takes a bio and returns a bio. Any sane
person would expect it to return the split bio iff the original bio is
successfully split and NULL or ERR_PTR() value if splitting didn't
happen for whatever reason. Please don't try to make code look
elegant by twisting obvious things. Saving three lines is never
worthwhile it it costs obviousness.

This reminds me of, unfortunately, kobject_get() which returns the
parameter it got passed in verbatim. The rationale was that it makes
code like the following *look* more elegant.

my_kobj = kobject_get(&blahblah.kobj);

Some people liked how things like the above looked. Unfortunately, it
unnecessarily made people wonder whether the return value could be
different from the parameter (can it ever fail?) and led to cases
where people misunderstood the semantics and assumed that kobj somehow
handles zero refcnt racing and kobject_get() would magically return
NULL if refcnt reaches zero. I hope we don't have those bugs anymore
but code built on top of kobject unfortunately propagated this
stupidity multiple layers upwards and I'm not sure.

So, please don't do things like this.

> > This is somewhat error-prone. Given how splits are used now, this
> > might not be a big issue but it isn't difficult to imagine how this
> > could go subtly wrong. More on this.
>
> I can't find anything else in your emails on the subject...

Oh, it was the chaining thing. If you bump ref on the original bio
and let the split one put it on completion, the caller doesn't have to
worry about this.

> > > + bio_for_each_segment(bv, bio, idx) {
> > > + vcnt = idx - bio->bi_idx;
> > > +
> > > + if (!nbytes) {
> > > + ret = bio_alloc_bioset(gfp, 0, bs);
> > > + if (!ret)
> > > + return NULL;
> > > +
> > > + ret->bi_io_vec = bio_iovec(bio);
> > > + ret->bi_flags |= 1 << BIO_CLONED;
> > > + break;
> > > + } else if (nbytes < bv->bv_len) {
> > > + ret = bio_alloc_bioset(gfp, ++vcnt, bs);
> > > + if (!ret)
> > > + return NULL;
> > > +
> > > + memcpy(ret->bi_io_vec, bio_iovec(bio),
> > > + sizeof(struct bio_vec) * vcnt);
> > > +
> > > + ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
> > > + bv->bv_offset += nbytes;
> > > + bv->bv_len -= nbytes;
> > > + break;
> > > + }
...
> > I find the code a bit confusing. Wouldn't it be better to structure
> > it as
> >
> > bio_for_each_segment() {
> > find splitting point;
> > }
> >
> > Do all of the splitting.
>
> Definitely, except I don't see how to sanely do it that way with the
> different cases for splitting on a bvec boundry and not.

I really don't buy that. Just break out on the common condition and
differentiate the two cases afterwards.

> > So, before, split wouldn't override orig->bi_private. Now, it does so
> > while the bio is in flight. I don't know. If the only benefit of
> > temporary override is avoiding have a separate end_io call, I think
> > not doing so is better. Also, behavior changes as subtle as this
> > *must* be noted in the patch description.
>
> Already said more about this below, but to elaborate a bit - there are
> situations where the caller really wouldn't want the completions chained
> (i.e, if the splits are going to different devices or otherwise are
> going to have different error handling, the caller really needs to
> supply its own endio function(s)).

Then let it take @chain parameter and if @chain is %false, clear endio
(or introduce bio_split_chain()). Copying endio of the original bio
is useless and dangerous.

> Outside the scope of this function - if you want the completions
> chained, you'd use bio_pair_split().
>
> With this bio_split() it's perfectly reasonable to split a bio an
> arbitrary number of times, and if that's what you're doing it's much
> cleaner (and more efficient) to just use a refcount instead of chaining
> the completions a bunch of times.
>
> So if that's what the caller is doing, this will do exactly what they
> want - if the caller wants to chain the completions, the caller can
> still do that (like how bio_pair_split() does in the next patch).

bio_pair doesn't really make much sense after this change. Originally
it made sense as the container holding everything necessary for the
clone, now it's just a proxy to keep the old interface. It doesn't
even succeed at that. The interface changes. Let's just roll it into
the default bio_clone() interface.

Thanks.

--
tejun

2012-08-09 06:52:58

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 11/12] block: Add bio_clone_bioset()

On Wed, Aug 08, 2012 at 07:56:10PM -0700, Kent Overstreet wrote:
> > So, bio_clone() loses its function comment. Also, does it even make
> > sense to call bio_clone() from fs_bio_set?
>
> I'll re add the function comment if you want, just for a single line
> wrapper I don't know if it's worth the cost - comments get out of date,
> and they're more stuff to wade through.

People actually look at docbook generated docs. I don't know why but
they do. It's a utility function at block layer. Please just add the
comment.

> > Let's say it's so, then
> > what's the difference from using _kmalloc variant?
>
> bio_kmalloc() fails if nr_iovecs > 1024, bio_alloc_bioset() fails if
> nr_iovecs > 256
>
> and bio_alloc_bioset() is mempool backed, bio_kmalloc() is not.
>
> AFAICT that's it.

So, the thing is being mempool backed doesn't mean anything if
multiple layers use the pool. I *suspect* fs_bio_set is supposed to
be used by fs layer - ie. where bios originate. The reason why I
wondered about bio_clone() is that bio_clone() is almost always used
from stacking drivers and stacking driver tapping into fs reserve is
buggy. So, I'm wondering whether cloning from fs_bio_set should be
supported at all.

Thanks.

--
tejun

2012-08-09 06:55:14

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 10/12] block: Add bio_clone_kmalloc()

On Wed, Aug 08, 2012 at 06:57:04PM -0700, Kent Overstreet wrote:
> That means bio_clone_kmalloc will just become:
>
> static inline struct bio *bio_clone_kmalloc(struct bio *bio,
> gfp_t gfp_mask)
> {
> return bio_clone_bioset(bio, gfp_mask, BIO_KMALLOC_POOL)
> }
>
> (or maybe NULL there, I think using NULL for the interface makes sense,
> I just don't want to use it for bi_pool).
>
> Do you still want the /** for a one line wrapper like that?

I don't know. But do you think you can do similar thing to alloc
interface too?

Thanks.

--
tejun

2012-08-09 06:56:57

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] Consolidate bio_clone_bioset(), bio_kmalloc()

On Wed, Aug 08, 2012 at 07:38:11PM -0700, Kent Overstreet wrote:
> So here's my initial stab at it. Tell me if you think this is too
> contorted:

At the first glance, looks okay to me.

Thanks.

--
tejun

2012-08-09 06:59:32

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 11/12] block: Add bio_clone_bioset()

On Wed, Aug 08, 2012 at 11:52:51PM -0700, Tejun Heo wrote:
> On Wed, Aug 08, 2012 at 07:56:10PM -0700, Kent Overstreet wrote:
> > > So, bio_clone() loses its function comment. Also, does it even make
> > > sense to call bio_clone() from fs_bio_set?
> >
> > I'll re add the function comment if you want, just for a single line
> > wrapper I don't know if it's worth the cost - comments get out of date,
> > and they're more stuff to wade through.
>
> People actually look at docbook generated docs. I don't know why but
> they do. It's a utility function at block layer. Please just add the
> comment.

Will do then.

> > > Let's say it's so, then
> > > what's the difference from using _kmalloc variant?
> >
> > bio_kmalloc() fails if nr_iovecs > 1024, bio_alloc_bioset() fails if
> > nr_iovecs > 256
> >
> > and bio_alloc_bioset() is mempool backed, bio_kmalloc() is not.
> >
> > AFAICT that's it.
>
> So, the thing is being mempool backed doesn't mean anything if
> multiple layers use the pool.

It's worse than just using kmalloc, because then you've introduced the
possibility of deadlock.

> I *suspect* fs_bio_set is supposed to
> be used by fs layer - ie. where bios originate. The reason why I
> wondered about bio_clone() is that bio_clone() is almost always used
> from stacking drivers and stacking driver tapping into fs reserve is
> buggy. So, I'm wondering whether cloning from fs_bio_set should be
> supported at all.

That's actually a really good point.

I just grepped and there's actually only 3 callers - I thought there'd
be more. That should be easy to fix, at least.

2012-08-09 07:02:02

by Tejun Heo

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

Hello,

On Wed, Aug 08, 2012 at 04:47:46PM -0700, Muthu Kumar wrote:
> You are changing the meaning of __bio_clone() here. In old code, the
> number of io_vecs, bi_idx, bi_vcnt are preserved. But in this modified
> code, you are mapping bio_src's bi_iovec[bi_idx] to bio_dests
> bi_iovec[0] and also restricting the number of allocated io_vecs of
> the clone. It may be useful for cases were we would like a identical
> copy of the original bio (may not be in current code base, but this
> implementation is definitely not what one would expect from the name
> "clone").

Implementation details changed somewhat but the high-level semantics
didn't change at all. Any driver not messing with bio internals - and
they shouldn't - shouldn't notice the change. No in-kernel drivers
seem to be broken by the change. If you ask me, this looks more like
a bug fix to me where the bug is a silly behavior restricting
usefulness of the interface.

> May be, call this new implementation some thing else (and use it for bcache)?

This doesn't only change __bio_clone() but all clone interface stacked
on top of it, so, no way. This ain't windows.

Thanks.

--
tejun

2012-08-09 07:02:48

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 10/12] block: Add bio_clone_kmalloc()

On Wed, Aug 08, 2012 at 11:55:04PM -0700, Tejun Heo wrote:
> On Wed, Aug 08, 2012 at 06:57:04PM -0700, Kent Overstreet wrote:
> > That means bio_clone_kmalloc will just become:
> >
> > static inline struct bio *bio_clone_kmalloc(struct bio *bio,
> > gfp_t gfp_mask)
> > {
> > return bio_clone_bioset(bio, gfp_mask, BIO_KMALLOC_POOL)
> > }
> >
> > (or maybe NULL there, I think using NULL for the interface makes sense,
> > I just don't want to use it for bi_pool).
> >
> > Do you still want the /** for a one line wrapper like that?
>
> I don't know. But do you think you can do similar thing to alloc
> interface too?

Already did:

commit 313e0a46b1681a8e02b2fe9a86cfc3b82599be58
Author: Kent Overstreet <[email protected]>
Date: Wed Aug 8 20:30:16 2012 -0700

block: Add bio_clone_bioset(), bio_clone_kmalloc()

Previously, there was bio_clone() but it only allocated from the fs bio
set; as a result various users were open coding it and using
__bio_clone().

This changes bio_clone() to become bio_clone_bioset(), and then we add
bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
the functionality the last patch adedd.

This will also help in a later patch changing how bio cloning works.

Signed-off-by: Kent Overstreet <[email protected]>

diff --git a/block/blk-core.c b/block/blk-core.c
index e9058c2..10a6e08 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2768,16 +2768,10 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
blk_rq_init(NULL, rq);

__rq_for_each_bio(bio_src, rq_src) {
- bio = bio_alloc_bioset(gfp_mask, bio_src->bi_max_vecs, bs);
+ bio = bio_clone_bioset(bio_src, gfp_mask, bs);
if (!bio)
goto free_and_out;

- __bio_clone(bio, bio_src);
-
- if (bio_integrity(bio_src) &&
- bio_integrity_clone(bio, bio_src, gfp_mask, bs))
- goto free_and_out;
-
if (bio_ctr && bio_ctr(bio, bio_src, data))
goto free_and_out;

diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
index 87311eb..1bbc681 100644
--- a/drivers/block/osdblk.c
+++ b/drivers/block/osdblk.c
@@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
struct bio *tmp, *new_chain = NULL, *tail = NULL;

while (old_chain) {
- tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+ tmp = bio_clone_kmalloc(old_chain, gfpmask);
if (!tmp)
goto err_out;

- __bio_clone(tmp, old_chain);
tmp->bi_bdev = NULL;
gfpmask &= ~__GFP_WAIT;
tmp->bi_next = NULL;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a8f5cdc..d978f7e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1105,8 +1105,8 @@ static void __issue_target_request(struct clone_info *ci, struct dm_target *ti,
* ci->bio->bi_max_vecs is BIO_INLINE_VECS anyway, for both flush
* and discard, so no need for concern about wasted bvec allocations.
*/
- clone = bio_alloc_bioset(GFP_NOIO, ci->bio->bi_max_vecs, ci->md->bs);
- __bio_clone(clone, ci->bio);
+ clone = bio_clone_bioset(ci->bio, GFP_NOIO, ci->md->bs);
+
if (len) {
clone->bi_sector = ci->sector;
clone->bi_size = to_bytes(len);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f9d16dc..069c3bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -173,28 +173,10 @@ EXPORT_SYMBOL_GPL(bio_alloc_mddev);
struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
struct mddev *mddev)
{
- struct bio *b;
-
if (!mddev || !mddev->bio_set)
return bio_clone(bio, gfp_mask);

- b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, mddev->bio_set);
- if (!b)
- return NULL;
-
- __bio_clone(b, bio);
- if (bio_integrity(bio)) {
- int ret;
-
- ret = bio_integrity_clone(b, bio, gfp_mask, mddev->bio_set);
-
- if (ret < 0) {
- bio_put(b);
- return NULL;
- }
- }
-
- return b;
+ return bio_clone_bioset(bio, gfp_mask, mddev->bio_set);
}
EXPORT_SYMBOL_GPL(bio_clone_mddev);

diff --git a/fs/bio.c b/fs/bio.c
index c0b9bf3..71f1ac5 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -419,16 +419,19 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
EXPORT_SYMBOL(__bio_clone);

/**
- * bio_clone - clone a bio
+ * bio_clone_bioset - clone a bio
* @bio: bio to clone
* @gfp_mask: allocation priority
+ * @bs: bio_set to allocate from
*
* Like __bio_clone, only also allocates the returned bio
*/
-struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
+struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
+ struct bio_set *bs)
{
- struct bio *b = bio_alloc(gfp_mask, bio->bi_max_vecs);
+ struct bio *b;

+ b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
if (!b)
return NULL;

@@ -437,7 +440,7 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
if (bio_integrity(bio)) {
int ret;

- ret = bio_integrity_clone(b, bio, gfp_mask, fs_bio_set);
+ ret = bio_integrity_clone(b, bio, gfp_mask, bs);

if (ret < 0) {
bio_put(b);
@@ -447,7 +450,7 @@ struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)

return b;
}
-EXPORT_SYMBOL(bio_clone);
+EXPORT_SYMBOL(bio_clone_bioset);

/**
* bio_get_nr_vecs - return approx number of vecs
diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index 24a49d4..a8d92fc 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -814,8 +814,8 @@ static int _write_mirror(struct ore_io_state *ios, int cur_comp)
struct bio *bio;

if (per_dev != master_dev) {
- bio = bio_kmalloc(GFP_KERNEL,
- master_dev->bio->bi_max_vecs);
+ bio = bio_clone_kmalloc(master_dev->bio,
+ GFP_KERNEL);
if (unlikely(!bio)) {
ORE_DBGMSG(
"Failed to allocate BIO size=%u\n",
@@ -824,7 +824,6 @@ static int _write_mirror(struct ore_io_state *ios, int cur_comp)
goto out;
}

- __bio_clone(bio, master_dev->bio);
bio->bi_bdev = NULL;
bio->bi_next = NULL;
per_dev->offset = master_dev->offset;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3e2686f..9e2c9a5 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -213,6 +213,9 @@ extern struct bio *bio_alloc_bioset(gfp_t, int, struct bio_set *);
extern void bio_put(struct bio *);
extern void bio_free(struct bio *);

+extern void __bio_clone(struct bio *, struct bio *);
+extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);
+
extern struct bio_set *fs_bio_set;

static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
@@ -220,6 +223,11 @@ static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
return bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
}

+static inline struct bio *bio_clone(struct bio *bio, gfp_t gfp_mask)
+{
+ return bio_clone_bioset(bio, gfp_mask, fs_bio_set);
+}
+
#define BIO_KMALLOC_POOL ((void *) ~0)

static inline struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
@@ -227,13 +235,16 @@ static inline struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
return bio_alloc_bioset(gfp_mask, nr_iovecs, BIO_KMALLOC_POOL);
}

+static inline struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
+{
+ return bio_clone_bioset(bio, gfp_mask, BIO_KMALLOC_POOL);
+
+}
+
extern void bio_endio(struct bio *, int);
struct request_queue;
extern int bio_phys_segments(struct request_queue *, struct bio *);

-extern void __bio_clone(struct bio *, struct bio *);
-extern struct bio *bio_clone(struct bio *, gfp_t);
-
extern void bio_init(struct bio *);
extern void bio_reset(struct bio *);

2012-08-09 07:22:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

On Wed, Aug 08, 2012 at 06:39:23PM -0700, Kent Overstreet wrote:
> On Wed, Aug 08, 2012 at 04:05:32PM -0700, Tejun Heo wrote:
> > One more thing.
> >
> > On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > > + if (bio_integrity(bio)) {
> > > + bio_integrity_clone(ret, bio, gfp, bs);
> > > + bio_integrity_trim(ret, 0, bio_sectors(ret));
> > > + bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
> >
> > Is this equivalent to bio_integrity_split() performance-wise?
>
> Strictly speaking, no. But it has the advantage of being drastically
> simpler - and the only one only worked for single page bios so I
> would've had to come up with something new from scratch, and as
> confusing as the integrity stuff is I wouldn't trust the result.

There's already bio_integrity_split() and you're actively dropping it.

> I'm skeptical that it's going to matter in practice given how much
> iteration is done elsewhere in the course of processing a bio and given
> that this stuff isn't used with high end SSDs...

If you think the active dropping is justified, please let the change
and justification clearly stated. You're burying the active change in
two separate patches without even mentioning it or cc'ing people who
care about bio-integrity (Martin K. Petersen). Ummm.... this is
simply unacceptable and makes me a lot more suspicious about the
patchset. Please be explicit about changes you make. Peer-review
breaks down unless such trust can be maintained.

Thanks.

--
tejun

2012-08-09 07:33:34

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

On Thu, Aug 09, 2012 at 12:22:17AM -0700, Tejun Heo wrote:
> On Wed, Aug 08, 2012 at 06:39:23PM -0700, Kent Overstreet wrote:
> > On Wed, Aug 08, 2012 at 04:05:32PM -0700, Tejun Heo wrote:
> > > One more thing.
> > >
> > > On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > > > + if (bio_integrity(bio)) {
> > > > + bio_integrity_clone(ret, bio, gfp, bs);
> > > > + bio_integrity_trim(ret, 0, bio_sectors(ret));
> > > > + bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
> > >
> > > Is this equivalent to bio_integrity_split() performance-wise?
> >
> > Strictly speaking, no. But it has the advantage of being drastically
> > simpler - and the only one only worked for single page bios so I
> > would've had to come up with something new from scratch, and as
> > confusing as the integrity stuff is I wouldn't trust the result.
>
> There's already bio_integrity_split() and you're actively dropping it.

Because it only works for single page bios, AFAICT. I'd have to start
from scratch.

> > I'm skeptical that it's going to matter in practice given how much
> > iteration is done elsewhere in the course of processing a bio and given
> > that this stuff isn't used with high end SSDs...
>
> If you think the active dropping is justified, please let the change
> and justification clearly stated. You're burying the active change in
> two separate patches without even mentioning it or cc'ing people who
> care about bio-integrity (Martin K. Petersen).

Not intentionally, he isn't in MAINTAINERS so get_maintainers.pl missed
it and it slipped by while I was looking for people to CC. Added him.

2012-08-09 17:32:26

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

Hello,

On Thu, Aug 09, 2012 at 03:33:34AM -0400, Kent Overstreet wrote:
> > If you think the active dropping is justified, please let the change
> > and justification clearly stated. You're burying the active change in
> > two separate patches without even mentioning it or cc'ing people who
> > care about bio-integrity (Martin K. Petersen).
>
> Not intentionally, he isn't in MAINTAINERS so get_maintainers.pl missed
> it and it slipped by while I was looking for people to CC. Added him.

git-log is your friend. For one-off patches, doing it this way might
be okay. Higher layer maintainer would be able to redirect it but if
you intend to change block layer APIs significantly as you try to do
in this patch series, you need to be *way* more diligent than you
currently are. At least I feel risky about acking patches in this
series.

* Significant change is buried without explicitly mentioning it or
discussing its implications.

* The patchset makes block layer API changes which impact multiple
stacking and low level drivers which are not particularly known for
simplicity and robustness, but there's no mention of how the patches
are tested and/or why the patches would be safe (e.g. reviewed all
the users and tested certain code paths and am fairly sure all the
changes should be safe because xxx sort of deal). When asked about
testing, not much seems to have been done.

* Responses and iterations across patch postings aren't responsive or
reliable, making it worrisome what will happen when things go south
after this hits mainline.

You're asking reviewers and maintainers to take a lot more risks than
they usually have to, which isn't a good way to make forward progress.

Thanks.

--
tejun

2012-08-09 17:37:09

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 12/12] block: Only clone bio vecs that are in use

Hello,

On Mon, Aug 06, 2012 at 03:08:41PM -0700, Kent Overstreet wrote:
> bcache creates large bios internally, and then splits them according to
> the device requirements before it sends them down. If a lower level
> device tries to clone the bio, and the original bio had more than
> BIO_MAX_PAGES, the clone will fail unecessarily.
>
> We can fix this by only cloning the bio vecs that are actually in use.

How was this tested?

Thanks.

--
tejun

2012-08-10 01:51:02

by Muthu Kumar

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

Kent,

>> --
>> You are changing the meaning of __bio_clone() here. In old code, the
>> number of io_vecs, bi_idx, bi_vcnt are preserved. But in this modified
>> code, you are mapping bio_src's bi_iovec[bi_idx] to bio_dests
>> bi_iovec[0] and also restricting the number of allocated io_vecs of
>> the clone. It may be useful for cases were we would like a identical
>> copy of the original bio (may not be in current code base, but this
>> implementation is definitely not what one would expect from the name
>> "clone").
>
> The problem is that bio_clone() is used on bios that were not allocated
> or submitted by the cloning module.
>
> If some code somewher submits a bio that points to 500 pages, but by the
> time it gets to a driver it only points to 200 pages (say, because it
> was split), that clone should succeed; it shouldn't fail simply because
> it was trying to clone more than was necessary.
>


I would say, the code that submits bio with 500 pages is broken. It
needs to take care of underlying restrictions before submitting bios.
And that's one of the reason for having bio_add_page(), especially for
stackable modules.


> Bios have certain (poorly documented) semantics, and if this breaks
> anything it's probably because that code was doing something crazy in
> the first place.
>

Agree. But doing the above doesn't help in improving the situation,
just makes it even less clear.

Regards,
Muthu.


> In particular, if this change breaks anything then the new bio_split()
> _will_ break things.
>
> We need to be clear about our interfaces; in this case bi_idx and
> bi_vcnt, in particular. Either this is a safe change, or it's not. If
> no one knows... that's a bigger problem, and not just for this patch...
>
> Fortunately this code actually has been tested quite a bit (and the bio
> splitting code for even longer), and (somewhat to my surprise) I haven't
> run into any bugs caused by it.

2012-08-10 02:29:09

by Muthu Kumar

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

Tejun,

On Thu, Aug 9, 2012 at 12:01 AM, Tejun Heo <[email protected]> wrote:
> Hello,
>
> On Wed, Aug 08, 2012 at 04:47:46PM -0700, Muthu Kumar wrote:
>> You are changing the meaning of __bio_clone() here. In old code, the
>> number of io_vecs, bi_idx, bi_vcnt are preserved. But in this modified
>> code, you are mapping bio_src's bi_iovec[bi_idx] to bio_dests
>> bi_iovec[0] and also restricting the number of allocated io_vecs of
>> the clone. It may be useful for cases were we would like a identical
>> copy of the original bio (may not be in current code base, but this
>> implementation is definitely not what one would expect from the name
>> "clone").
>
> Implementation details changed somewhat but the high-level semantics
> didn't change at all. Any driver not messing with bio internals - and
> they shouldn't - shouldn't notice the change.

The reason for doing this change is because the code in question is
messing with bio internals.

No in-kernel drivers
> seem to be broken by the change. If you ask me, this looks more like
> a bug fix to me where the bug is a silly behavior restricting
> usefulness of the interface.
>
>> May be, call this new implementation some thing else (and use it for bcache)?
>
> This doesn't only change __bio_clone() but all clone interface stacked
> on top of it, so, no way.

>This ain't windows.

ah... when you put it this way, it gets a different perspective :)

Anyway, my point is, we shouldn't make it non-obvious ("clone" should
be just "clone"). But, we can always add more comments i guess.

Regards,
Muthu


>
> Thanks.
>
> --
> tejun

2012-08-11 05:24:52

by Joseph Glanville

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

Hi Kent, Tejun

On 9 August 2012 09:57, Kent Overstreet <[email protected]> wrote:
>> Also, how was this tested?
>
> Well, AFAICT the only request based dm target is multipath, and from the
> documentation I've seen it doesn't appear to work without multipath
> hardware, or at least I haven't seen it documented how. So, unless
> there's another user I missed it's not been tested.

Multipath can be tested quite easily with a loopback scsi target, you
don't require specialized hardware.
The easiest way to do this would probably be the built in LIO target +
open_iscsi initiator.

I haven't attempted running this current version of the patch series
but I haven't run into issues with bcache+multipath in the past.

>
>>
>> Thanks.
>>
>> --
>> tejun
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
CTO | Orion Virtualisation Solutions | http://www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846

2012-08-13 21:44:52

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

On Sat, Aug 11, 2012 at 03:24:45PM +1000, Joseph Glanville wrote:
> Hi Kent, Tejun
>
> On 9 August 2012 09:57, Kent Overstreet <[email protected]> wrote:
> >> Also, how was this tested?
> >
> > Well, AFAICT the only request based dm target is multipath, and from the
> > documentation I've seen it doesn't appear to work without multipath
> > hardware, or at least I haven't seen it documented how. So, unless
> > there's another user I missed it's not been tested.
>
> Multipath can be tested quite easily with a loopback scsi target, you
> don't require specialized hardware.
> The easiest way to do this would probably be the built in LIO target +
> open_iscsi initiator.
>
> I haven't attempted running this current version of the patch series
> but I haven't run into issues with bcache+multipath in the past.

Actually, I bet you have been running this code - do you think you could
check the git log for the version you're running? That'd be awesome if
you are.

2012-08-13 21:47:00

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 12/12] block: Only clone bio vecs that are in use

On Thu, Aug 09, 2012 at 10:37:00AM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:41PM -0700, Kent Overstreet wrote:
> > bcache creates large bios internally, and then splits them according to
> > the device requirements before it sends them down. If a lower level
> > device tries to clone the bio, and the original bio had more than
> > BIO_MAX_PAGES, the clone will fail unecessarily.
> >
> > We can fix this by only cloning the bio vecs that are actually in use.
>
> How was this tested?

This code has been in the bcache tree for months, and I added it to fix
a real bug (think I saw it with bcache on top of dm) - and since then
it's been tested in probably just about all the relevant configurations,
certainly both dm and md.

2012-08-13 21:55:52

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

On Wed, Aug 08, 2012 at 03:58:39PM -0700, Tejun Heo wrote:
> On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > + *
> > + * BIG FAT WARNING:
> > + *
> > + * If you're calling this from under generic_make_request() (i.e.
> > + * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
> > + * workqueue if the allocation fails. Otherwise, your code will probably
> > + * deadlock.
>
> If the condition is detectable, WARN_ON_ONCE() please.

I know I said I liked this idea, but I changed my mind.

Sticking a WARN_ON_ONCE() there is saying that passing __GFP_WAIT from
under generic_make_request() is always wrong - it might as well be a
BUG_ON() except warn is better for the user.

If that's true, then an assertion is completely wrong because we can
just do the right thing instead - mask out __GFP_WAIT if
current->bio_list != NULL and document that it can fail in that
situation.

Which is what my original code did.

The alternative is accepting that there are situations where it is
technically possible to pass __GFP_WAIT from under
generic_make_request() without deadlocking and allow it, but my position
is still that that is far too subtle to expect that it'll be gotten
right (especially considering the ways that the code is wrong today
wrt deadlocks).

But honestly this is turning into bikeshedding. The current bio
splitting and merge_bvec_fn stuff is crap, and there are worse potential
deadlocks/bugs in the existing code than what we're arguing over here.

2012-08-13 22:07:43

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

Hello,

On Mon, Aug 13, 2012 at 02:55:11PM -0700, Kent Overstreet wrote:
> > If the condition is detectable, WARN_ON_ONCE() please.
>
> I know I said I liked this idea, but I changed my mind.
>
> Sticking a WARN_ON_ONCE() there is saying that passing __GFP_WAIT from
> under generic_make_request() is always wrong - it might as well be a
> BUG_ON() except warn is better for the user.

WARN_ON_ONCE(), as opposed to BUG_ON(), doesn't mean that something
isn't always wrong. It's just better to always use WARN variant if
the system doesn't *need* to be put down immediately to avoid, say,
massive data corruption - limping machine is simply better than dead
one.

> If that's true, then an assertion is completely wrong because we can
> just do the right thing instead - mask out __GFP_WAIT if
> current->bio_list != NULL and document that it can fail in that
> situation.

That's worse because it confuses the reader. Taking something which
is as well-known as __GFP_WAIT and then silently ignoring it isn't a
good idea. Please just add a WARN_ON.

Thanks.

--
tejun

2012-08-13 22:09:46

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 08/12] block: Introduce new bio_split()

On Thu, Aug 09, 2012 at 10:32:17AM -0700, Tejun Heo wrote:
> Hello,
>
> On Thu, Aug 09, 2012 at 03:33:34AM -0400, Kent Overstreet wrote:
> > > If you think the active dropping is justified, please let the change
> > > and justification clearly stated. You're burying the active change in
> > > two separate patches without even mentioning it or cc'ing people who
> > > care about bio-integrity (Martin K. Petersen).
> >
> > Not intentionally, he isn't in MAINTAINERS so get_maintainers.pl missed
> > it and it slipped by while I was looking for people to CC. Added him.
>
> git-log is your friend. For one-off patches, doing it this way might
> be okay. Higher layer maintainer would be able to redirect it but if
> you intend to change block layer APIs significantly as you try to do
> in this patch series, you need to be *way* more diligent than you
> currently are. At least I feel risky about acking patches in this
> series.

Wasn't an excuse, just an explanation.

> * Significant change is buried without explicitly mentioning it or
> discussing its implications.

You are entitled to your opinion, but it honestly didn't seem that
significant to me considering there was no code for splitting multi page
bio integrity stuff before.

> * The patchset makes block layer API changes which impact multiple
> stacking and low level drivers which are not particularly known for
> simplicity and robustness, but there's no mention of how the patches
> are tested and/or why the patches would be safe (e.g. reviewed all
> the users and tested certain code paths and am fairly sure all the
> changes should be safe because xxx sort of deal). When asked about
> testing, not much seems to have been done.

You picked a particularly obscure codepath. I have personally tested all
of this code with both dm and md, and I spent quite a bit of time going
over the dm code in particular to verify as best I could that the
bio_clone() changes were safe.

I should've said more before how the patch series as a whole was tested,
but you hadn't asked either.

> * Responses and iterations across patch postings aren't responsive or
> reliable, making it worrisome what will happen when things go south
> after this hits mainline.
>
> You're asking reviewers and maintainers to take a lot more risks than
> they usually have to, which isn't a good way to make forward progress.

This patch series does touch a lot, and I'm certainly very new at
getting stuff into upstream. But I'm doing my best not to miss stuff,
and I've been asking you for advice and working on my workflow. And a
fair amount of the stuff in this patch series has been your ideas (it
was originally much more minimal).

2012-08-14 05:37:40

by Junichi Nomura

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

On 08/07/12 07:08, Kent Overstreet wrote:
> struct dm_rq_clone_bio_info {
> struct bio *orig;
> struct dm_rq_target_io *tio;
> + struct bio clone;
> };
...
> - pools->bs = bioset_create(pool_size, 0);
> + pools->bs = bioset_create(pool_size,
> + offsetof(struct dm_rq_clone_bio_info, orig));
> if (!pools->bs)
> goto free_tio_pool_and_out;

Shouldn't this be offsetof(struct dm_rq_clone_bio_info, clone)?

--
Jun'ichi Nomura, NEC Corporation

2012-08-15 20:46:25

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 02/12] dm: Use bioset's front_pad for dm_rq_clone_bio_info

On Tue, Aug 14, 2012 at 02:33:20PM +0900, Jun'ichi Nomura wrote:
> On 08/07/12 07:08, Kent Overstreet wrote:
> > struct dm_rq_clone_bio_info {
> > struct bio *orig;
> > struct dm_rq_target_io *tio;
> > + struct bio clone;
> > };
> ...
> > - pools->bs = bioset_create(pool_size, 0);
> > + pools->bs = bioset_create(pool_size,
> > + offsetof(struct dm_rq_clone_bio_info, orig));
> > if (!pools->bs)
> > goto free_tio_pool_and_out;
>
> Shouldn't this be offsetof(struct dm_rq_clone_bio_info, clone)?

Ouch! Yes, it definitely should be. Good catch, and thanks.

2012-08-15 22:19:42

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH v5 05/12] block: Kill bi_destructor

On Wed, Aug 08, 2012 at 11:34:09PM -0700, Tejun Heo wrote:
> Hello,
>
> On Wed, Aug 8, 2012 at 11:12 PM, Kent Overstreet <[email protected]> wrote:
> > But if it's a pointer to heap allocated memory, but the bio was embedded
> > in another struct? I've seen a fair number of instances of that (md, off
> > the top of my head).
> >
> > If you're sure that in a normal config the slab allocator is going to
> > complain right away and not corrupt itself, fine. But I've been bitten
> > way too hard by bugs that could've been caught right away by a simple
> > assert and instead I had to spend hours backtracking, and the block
> > layer is _rife_ with that kind of thing.
>
> Let's let slab debug code deal with that. I really don't see much
> benefit in doing this. The said kind of bugs aren't particularly
> difficult to track down.

The only difference would be changing
#define BIO_KMALLOC_POOL ((void *) ~0)

to
#define BIO_KMALLOC_POOL NULL

We want a define for this - bio_alloc_bioset(GFP_KERNEL, 1, NULL)
doesn't make any sense. I just don't see the argument for changing an
arbitrary constant... If it's just the ~0 pointer you don't like, I
originally used a real statically allocated struct bio_set as a sentinel
value.