2018-05-09 01:37:55

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 00/10] Misc block layer patches for bcachefs

- Add separately allowed mempools, biosets: bcachefs uses both all over the
place

- Bit of utility code - bio_copy_data_iter(), zero_fill_bio_iter()

- bio_list_copy_data(), the bi_next check - defensiveness because of a bug I
had fun chasing down at one point

- add some exports, because bcachefs does dio its own way
- show whether fua is supported in sysfs, because I don't know of anything that
exports whether the _block layer_ specifically thinks fua is supported.

Kent Overstreet (10):
mempool: Add mempool_init()/mempool_exit()
block: Convert bio_set to mempool_init()
block: Add bioset_init()/bioset_exit()
block: Use bioset_init() for fs_bio_set
block: Add bio_copy_data_iter(), zero_fill_bio_iter()
block: Split out bio_list_copy_data()
block: Add missing flush_dcache_page() call
block: Add warning for bi_next not NULL in bio_endio()
block: Export bio check/set pages_dirty
block: Add sysfs entry for fua support

block/bio-integrity.c | 29 ++--
block/bio.c | 226 ++++++++++++++++++----------
block/blk-core.c | 10 +-
block/blk-sysfs.c | 11 ++
drivers/block/pktcdvd.c | 2 +-
drivers/target/target_core_iblock.c | 2 +-
include/linux/bio.h | 35 +++--
include/linux/mempool.h | 34 +++++
mm/mempool.c | 108 +++++++++----
9 files changed, 320 insertions(+), 137 deletions(-)

--
2.17.0



2018-05-09 01:35:11

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 10/10] block: Add sysfs entry for fua support

Signed-off-by: Kent Overstreet <[email protected]>
---
block/blk-sysfs.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index cbea895a55..d6dd7d8198 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -497,6 +497,11 @@ static ssize_t queue_wc_store(struct request_queue *q, const char *page,
return count;
}

+static ssize_t queue_fua_show(struct request_queue *q, char *page)
+{
+ return sprintf(page, "%u\n", test_bit(QUEUE_FLAG_FUA, &q->queue_flags));
+}
+
static ssize_t queue_dax_show(struct request_queue *q, char *page)
{
return queue_var_show(blk_queue_dax(q), page);
@@ -665,6 +670,11 @@ static struct queue_sysfs_entry queue_wc_entry = {
.store = queue_wc_store,
};

+static struct queue_sysfs_entry queue_fua_entry = {
+ .attr = {.name = "fua", .mode = S_IRUGO },
+ .show = queue_fua_show,
+};
+
static struct queue_sysfs_entry queue_dax_entry = {
.attr = {.name = "dax", .mode = S_IRUGO },
.show = queue_dax_show,
@@ -714,6 +724,7 @@ static struct attribute *default_attrs[] = {
&queue_random_entry.attr,
&queue_poll_entry.attr,
&queue_wc_entry.attr,
+ &queue_fua_entry.attr,
&queue_dax_entry.attr,
&queue_wb_lat_entry.attr,
&queue_poll_delay_entry.attr,
--
2.17.0


2018-05-09 01:36:02

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 09/10] block: Export bio check/set pages_dirty

Signed-off-by: Kent Overstreet <[email protected]>
---
block/bio.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 5c81391100..6689102f5d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1610,6 +1610,7 @@ void bio_set_pages_dirty(struct bio *bio)
set_page_dirty_lock(page);
}
}
+EXPORT_SYMBOL_GPL(bio_set_pages_dirty);

static void bio_release_pages(struct bio *bio)
{
@@ -1693,6 +1694,7 @@ void bio_check_pages_dirty(struct bio *bio)
bio_put(bio);
}
}
+EXPORT_SYMBOL_GPL(bio_check_pages_dirty);

void generic_start_io_acct(struct request_queue *q, int rw,
unsigned long sectors, struct hd_struct *part)
--
2.17.0


2018-05-09 01:36:41

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 06/10] block: Split out bio_list_copy_data()

Found a bug (with ASAN) where we were passing a bio to bio_copy_data()
with bi_next not NULL, when it should have been - a driver had left
bi_next set to something after calling bio_endio().

Since the normal case is only copying single bios, split out
bio_list_copy_data() to avoid more bugs like this in the future.

Signed-off-by: Kent Overstreet <[email protected]>
---
block/bio.c | 83 +++++++++++++++++++++++++----------------
drivers/block/pktcdvd.c | 2 +-
include/linux/bio.h | 5 ++-
3 files changed, 55 insertions(+), 35 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index d7bd765e9e..c58544d4bc 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -971,32 +971,16 @@ void bio_advance(struct bio *bio, unsigned bytes)
}
EXPORT_SYMBOL(bio_advance);

-void bio_copy_data_iter(struct bio *dst, struct bvec_iter dst_iter,
- struct bio *src, struct bvec_iter src_iter)
+void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
+ struct bio *src, struct bvec_iter *src_iter)
{
struct bio_vec src_bv, dst_bv;
void *src_p, *dst_p;
unsigned bytes;

- while (1) {
- if (!src_iter.bi_size) {
- src = src->bi_next;
- if (!src)
- break;
-
- src_iter = src->bi_iter;
- }
-
- if (!dst_iter.bi_size) {
- dst = dst->bi_next;
- if (!dst)
- break;
-
- dst_iter = dst->bi_iter;
- }
-
- src_bv = bio_iter_iovec(src, src_iter);
- dst_bv = bio_iter_iovec(dst, dst_iter);
+ while (src_iter->bi_size && dst_iter->bi_size) {
+ src_bv = bio_iter_iovec(src, *src_iter);
+ dst_bv = bio_iter_iovec(dst, *dst_iter);

bytes = min(src_bv.bv_len, dst_bv.bv_len);

@@ -1010,31 +994,66 @@ void bio_copy_data_iter(struct bio *dst, struct bvec_iter dst_iter,
kunmap_atomic(dst_p);
kunmap_atomic(src_p);

- bio_advance_iter(src, &src_iter, bytes);
- bio_advance_iter(dst, &dst_iter, bytes);
+ bio_advance_iter(src, src_iter, bytes);
+ bio_advance_iter(dst, dst_iter, bytes);
}
}
EXPORT_SYMBOL(bio_copy_data_iter);

/**
- * bio_copy_data - copy contents of data buffers from one chain of bios to
- * another
- * @src: source bio list
- * @dst: destination bio list
- *
- * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
- * @src and @dst as linked lists of bios.
+ * bio_copy_data - copy contents of data buffers from one bio to another
+ * @src: source bio
+ * @dst: destination bio
*
* Stops when it reaches the end of either @src or @dst - that is, copies
* min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
*/
void bio_copy_data(struct bio *dst, struct bio *src)
{
- bio_copy_data_iter(dst, dst->bi_iter,
- src, src->bi_iter);
+ struct bvec_iter src_iter = src->bi_iter;
+ struct bvec_iter dst_iter = dst->bi_iter;
+
+ bio_copy_data_iter(dst, &dst_iter, src, &src_iter);
}
EXPORT_SYMBOL(bio_copy_data);

+/**
+ * bio_list_copy_data - copy contents of data buffers from one chain of bios to
+ * another
+ * @src: source bio list
+ * @dst: destination bio list
+ *
+ * Stops when it reaches the end of either the @src list or @dst list - that is,
+ * copies min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of
+ * bios).
+ */
+void bio_list_copy_data(struct bio *dst, struct bio *src)
+{
+ struct bvec_iter src_iter = src->bi_iter;
+ struct bvec_iter dst_iter = dst->bi_iter;
+
+ while (1) {
+ if (!src_iter.bi_size) {
+ src = src->bi_next;
+ if (!src)
+ break;
+
+ src_iter = src->bi_iter;
+ }
+
+ if (!dst_iter.bi_size) {
+ dst = dst->bi_next;
+ if (!dst)
+ break;
+
+ dst_iter = dst->bi_iter;
+ }
+
+ bio_copy_data_iter(dst, &dst_iter, src, &src_iter);
+ }
+}
+EXPORT_SYMBOL(bio_list_copy_data);
+
struct bio_map_data {
int is_our_pages;
struct iov_iter iter;
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index c61d20c9f3..00ea788b17 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1285,7 +1285,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
* Fill-in bvec with data from orig_bios.
*/
spin_lock(&pkt->lock);
- bio_copy_data(pkt->w_bio, pkt->orig_bios.head);
+ bio_list_copy_data(pkt->w_bio, pkt->orig_bios.head);

pkt_set_state(pkt, PACKET_WRITE_WAIT_STATE);
spin_unlock(&pkt->lock);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 5a6ee955a8..98b175cc00 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -505,9 +505,10 @@ static inline void bio_flush_dcache_pages(struct bio *bi)
}
#endif

-extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter dst_iter,
- struct bio *src, struct bvec_iter src_iter);
+extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
+ struct bio *src, struct bvec_iter *src_iter);
extern void bio_copy_data(struct bio *dst, struct bio *src);
+extern void bio_list_copy_data(struct bio *dst, struct bio *src);
extern void bio_free_pages(struct bio *bio);

extern struct bio *bio_copy_user_iov(struct request_queue *,
--
2.17.0


2018-05-09 01:37:11

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 02/10] block: Convert bio_set to mempool_init()

Minor performance improvement by getting rid of pointer indirections
from allocation/freeing fastpaths.

Signed-off-by: Kent Overstreet <[email protected]>
---
block/bio-integrity.c | 29 ++++++++++++++---------------
block/bio.c | 36 +++++++++++++++++-------------------
include/linux/bio.h | 10 +++++-----
3 files changed, 36 insertions(+), 39 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 9cfdd6c83b..add7c7c853 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -56,12 +56,12 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
struct bio_set *bs = bio->bi_pool;
unsigned inline_vecs;

- if (!bs || !bs->bio_integrity_pool) {
+ if (!bs || !mempool_initialized(&bs->bio_integrity_pool)) {
bip = kmalloc(sizeof(struct bio_integrity_payload) +
sizeof(struct bio_vec) * nr_vecs, gfp_mask);
inline_vecs = nr_vecs;
} else {
- bip = mempool_alloc(bs->bio_integrity_pool, gfp_mask);
+ bip = mempool_alloc(&bs->bio_integrity_pool, gfp_mask);
inline_vecs = BIP_INLINE_VECS;
}

@@ -74,7 +74,7 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
unsigned long idx = 0;

bip->bip_vec = bvec_alloc(gfp_mask, nr_vecs, &idx,
- bs->bvec_integrity_pool);
+ &bs->bvec_integrity_pool);
if (!bip->bip_vec)
goto err;
bip->bip_max_vcnt = bvec_nr_vecs(idx);
@@ -90,7 +90,7 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,

return bip;
err:
- mempool_free(bip, bs->bio_integrity_pool);
+ mempool_free(bip, &bs->bio_integrity_pool);
return ERR_PTR(-ENOMEM);
}
EXPORT_SYMBOL(bio_integrity_alloc);
@@ -111,10 +111,10 @@ static void bio_integrity_free(struct bio *bio)
kfree(page_address(bip->bip_vec->bv_page) +
bip->bip_vec->bv_offset);

- if (bs && bs->bio_integrity_pool) {
- bvec_free(bs->bvec_integrity_pool, bip->bip_vec, bip->bip_slab);
+ if (bs && mempool_initialized(&bs->bio_integrity_pool)) {
+ bvec_free(&bs->bvec_integrity_pool, bip->bip_vec, bip->bip_slab);

- mempool_free(bip, bs->bio_integrity_pool);
+ mempool_free(bip, &bs->bio_integrity_pool);
} else {
kfree(bip);
}
@@ -465,16 +465,15 @@ EXPORT_SYMBOL(bio_integrity_clone);

int bioset_integrity_create(struct bio_set *bs, int pool_size)
{
- if (bs->bio_integrity_pool)
+ if (mempool_initialized(&bs->bio_integrity_pool))
return 0;

- bs->bio_integrity_pool = mempool_create_slab_pool(pool_size, bip_slab);
- if (!bs->bio_integrity_pool)
+ if (mempool_init_slab_pool(&bs->bio_integrity_pool,
+ pool_size, bip_slab))
return -1;

- bs->bvec_integrity_pool = biovec_create_pool(pool_size);
- if (!bs->bvec_integrity_pool) {
- mempool_destroy(bs->bio_integrity_pool);
+ if (biovec_init_pool(&bs->bvec_integrity_pool, pool_size)) {
+ mempool_exit(&bs->bio_integrity_pool);
return -1;
}

@@ -484,8 +483,8 @@ EXPORT_SYMBOL(bioset_integrity_create);

void bioset_integrity_free(struct bio_set *bs)
{
- mempool_destroy(bs->bio_integrity_pool);
- mempool_destroy(bs->bvec_integrity_pool);
+ mempool_exit(&bs->bio_integrity_pool);
+ mempool_exit(&bs->bvec_integrity_pool);
}
EXPORT_SYMBOL(bioset_integrity_free);

diff --git a/block/bio.c b/block/bio.c
index e1708db482..360e9bcea5 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -254,7 +254,7 @@ static void bio_free(struct bio *bio)
bio_uninit(bio);

if (bs) {
- bvec_free(bs->bvec_pool, bio->bi_io_vec, BVEC_POOL_IDX(bio));
+ bvec_free(&bs->bvec_pool, bio->bi_io_vec, BVEC_POOL_IDX(bio));

/*
* If we have front padding, adjust the bio pointer before freeing
@@ -262,7 +262,7 @@ static void bio_free(struct bio *bio)
p = bio;
p -= bs->front_pad;

- mempool_free(p, bs->bio_pool);
+ mempool_free(p, &bs->bio_pool);
} else {
/* Bio was allocated by bio_kmalloc() */
kfree(bio);
@@ -454,7 +454,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
inline_vecs = nr_iovecs;
} else {
/* should not use nobvec bioset for nr_iovecs > 0 */
- if (WARN_ON_ONCE(!bs->bvec_pool && nr_iovecs > 0))
+ if (WARN_ON_ONCE(!mempool_initialized(&bs->bvec_pool) &&
+ nr_iovecs > 0))
return NULL;
/*
* generic_make_request() converts recursion to iteration; this
@@ -483,11 +484,11 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
bs->rescue_workqueue)
gfp_mask &= ~__GFP_DIRECT_RECLAIM;

- p = mempool_alloc(bs->bio_pool, gfp_mask);
+ p = mempool_alloc(&bs->bio_pool, gfp_mask);
if (!p && gfp_mask != saved_gfp) {
punt_bios_to_rescuer(bs);
gfp_mask = saved_gfp;
- p = mempool_alloc(bs->bio_pool, gfp_mask);
+ p = mempool_alloc(&bs->bio_pool, gfp_mask);
}

front_pad = bs->front_pad;
@@ -503,11 +504,11 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
if (nr_iovecs > inline_vecs) {
unsigned long idx = 0;

- bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, bs->bvec_pool);
+ bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, &bs->bvec_pool);
if (!bvl && gfp_mask != saved_gfp) {
punt_bios_to_rescuer(bs);
gfp_mask = saved_gfp;
- bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, bs->bvec_pool);
+ bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx, &bs->bvec_pool);
}

if (unlikely(!bvl))
@@ -524,7 +525,7 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
return bio;

err_free:
- mempool_free(p, bs->bio_pool);
+ mempool_free(p, &bs->bio_pool);
return NULL;
}
EXPORT_SYMBOL(bio_alloc_bioset);
@@ -1848,11 +1849,11 @@ EXPORT_SYMBOL_GPL(bio_trim);
* create memory pools for biovec's in a bio_set.
* use the global biovec slabs created for general use.
*/
-mempool_t *biovec_create_pool(int pool_entries)
+int biovec_init_pool(mempool_t *pool, int pool_entries)
{
struct biovec_slab *bp = bvec_slabs + BVEC_POOL_MAX;

- return mempool_create_slab_pool(pool_entries, bp->slab);
+ return mempool_init_slab_pool(pool, pool_entries, bp->slab);
}

void bioset_free(struct bio_set *bs)
@@ -1860,8 +1861,8 @@ void bioset_free(struct bio_set *bs)
if (bs->rescue_workqueue)
destroy_workqueue(bs->rescue_workqueue);

- mempool_destroy(bs->bio_pool);
- mempool_destroy(bs->bvec_pool);
+ mempool_exit(&bs->bio_pool);
+ mempool_exit(&bs->bvec_pool);

bioset_integrity_free(bs);
bio_put_slab(bs);
@@ -1913,15 +1914,12 @@ struct bio_set *bioset_create(unsigned int pool_size,
return NULL;
}

- bs->bio_pool = mempool_create_slab_pool(pool_size, bs->bio_slab);
- if (!bs->bio_pool)
+ if (mempool_init_slab_pool(&bs->bio_pool, pool_size, bs->bio_slab))
goto bad;

- if (flags & BIOSET_NEED_BVECS) {
- bs->bvec_pool = biovec_create_pool(pool_size);
- if (!bs->bvec_pool)
- goto bad;
- }
+ if ((flags & BIOSET_NEED_BVECS) &&
+ biovec_init_pool(&bs->bvec_pool, pool_size))
+ goto bad;

if (!(flags & BIOSET_NEED_RESCUER))
return bs;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ce547a25e8..720f7261d0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -412,7 +412,7 @@ enum {
BIOSET_NEED_RESCUER = BIT(1),
};
extern void bioset_free(struct bio_set *);
-extern mempool_t *biovec_create_pool(int pool_entries);
+extern int biovec_init_pool(mempool_t *pool, int pool_entries);

extern struct bio *bio_alloc_bioset(gfp_t, unsigned int, struct bio_set *);
extern void bio_put(struct bio *);
@@ -722,11 +722,11 @@ struct bio_set {
struct kmem_cache *bio_slab;
unsigned int front_pad;

- mempool_t *bio_pool;
- mempool_t *bvec_pool;
+ mempool_t bio_pool;
+ mempool_t bvec_pool;
#if defined(CONFIG_BLK_DEV_INTEGRITY)
- mempool_t *bio_integrity_pool;
- mempool_t *bvec_integrity_pool;
+ mempool_t bio_integrity_pool;
+ mempool_t bvec_integrity_pool;
#endif

/*
--
2.17.0


2018-05-09 01:37:11

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 05/10] block: Add bio_copy_data_iter(), zero_fill_bio_iter()

Add versions that take bvec_iter args instead of using bio->bi_iter - to
be used by bcachefs.

Signed-off-by: Kent Overstreet <[email protected]>
---
block/bio.c | 44 ++++++++++++++++++++++++--------------------
include/linux/bio.h | 18 +++++++++++++++---
2 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index b7cdad6fc4..d7bd765e9e 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -530,20 +530,20 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
}
EXPORT_SYMBOL(bio_alloc_bioset);

-void zero_fill_bio(struct bio *bio)
+void zero_fill_bio_iter(struct bio *bio, struct bvec_iter start)
{
unsigned long flags;
struct bio_vec bv;
struct bvec_iter iter;

- bio_for_each_segment(bv, bio, iter) {
+ __bio_for_each_segment(bv, bio, iter, start) {
char *data = bvec_kmap_irq(&bv, &flags);
memset(data, 0, bv.bv_len);
flush_dcache_page(bv.bv_page);
bvec_kunmap_irq(data, &flags);
}
}
-EXPORT_SYMBOL(zero_fill_bio);
+EXPORT_SYMBOL(zero_fill_bio_iter);

/**
* bio_put - release a reference to a bio
@@ -971,28 +971,13 @@ void bio_advance(struct bio *bio, unsigned bytes)
}
EXPORT_SYMBOL(bio_advance);

-/**
- * bio_copy_data - copy contents of data buffers from one chain of bios to
- * another
- * @src: source bio list
- * @dst: destination bio list
- *
- * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
- * @src and @dst as linked lists of bios.
- *
- * Stops when it reaches the end of either @src or @dst - that is, copies
- * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
- */
-void bio_copy_data(struct bio *dst, struct bio *src)
+void bio_copy_data_iter(struct bio *dst, struct bvec_iter dst_iter,
+ struct bio *src, struct bvec_iter src_iter)
{
- struct bvec_iter src_iter, dst_iter;
struct bio_vec src_bv, dst_bv;
void *src_p, *dst_p;
unsigned bytes;

- src_iter = src->bi_iter;
- dst_iter = dst->bi_iter;
-
while (1) {
if (!src_iter.bi_size) {
src = src->bi_next;
@@ -1029,6 +1014,25 @@ void bio_copy_data(struct bio *dst, struct bio *src)
bio_advance_iter(dst, &dst_iter, bytes);
}
}
+EXPORT_SYMBOL(bio_copy_data_iter);
+
+/**
+ * bio_copy_data - copy contents of data buffers from one chain of bios to
+ * another
+ * @src: source bio list
+ * @dst: destination bio list
+ *
+ * If @src and @dst are single bios, bi_next must be NULL - otherwise, treats
+ * @src and @dst as linked lists of bios.
+ *
+ * Stops when it reaches the end of either @src or @dst - that is, copies
+ * min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
+ */
+void bio_copy_data(struct bio *dst, struct bio *src)
+{
+ bio_copy_data_iter(dst, dst->bi_iter,
+ src, src->bi_iter);
+}
EXPORT_SYMBOL(bio_copy_data);

struct bio_map_data {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 91b02520e2..5a6ee955a8 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -67,8 +67,12 @@

#define bio_multiple_segments(bio) \
((bio)->bi_iter.bi_size != bio_iovec(bio).bv_len)
-#define bio_sectors(bio) ((bio)->bi_iter.bi_size >> 9)
-#define bio_end_sector(bio) ((bio)->bi_iter.bi_sector + bio_sectors((bio)))
+
+#define bvec_iter_sectors(iter) ((iter).bi_size >> 9)
+#define bvec_iter_end_sector(iter) ((iter).bi_sector + bvec_iter_sectors((iter)))
+
+#define bio_sectors(bio) bvec_iter_sectors((bio)->bi_iter)
+#define bio_end_sector(bio) bvec_iter_end_sector((bio)->bi_iter)

/*
* Return the data direction, READ or WRITE.
@@ -501,6 +505,8 @@ static inline void bio_flush_dcache_pages(struct bio *bi)
}
#endif

+extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter dst_iter,
+ struct bio *src, struct bvec_iter src_iter);
extern void bio_copy_data(struct bio *dst, struct bio *src);
extern void bio_free_pages(struct bio *bio);

@@ -509,7 +515,13 @@ extern struct bio *bio_copy_user_iov(struct request_queue *,
struct iov_iter *,
gfp_t);
extern int bio_uncopy_user(struct bio *);
-void zero_fill_bio(struct bio *bio);
+void zero_fill_bio_iter(struct bio *bio, struct bvec_iter iter);
+
+static inline void zero_fill_bio(struct bio *bio)
+{
+ zero_fill_bio_iter(bio, bio->bi_iter);
+}
+
extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
extern unsigned int bvec_nr_vecs(unsigned short idx);
--
2.17.0


2018-05-09 01:37:19

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 07/10] block: Add missing flush_dcache_page() call

Since a bio can point to userspace pages (e.g. direct IO), this is
generally necessary.

Signed-off-by: Kent Overstreet <[email protected]>
---
block/bio.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index c58544d4bc..ce8e259f9a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -994,6 +994,8 @@ void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
kunmap_atomic(dst_p);
kunmap_atomic(src_p);

+ flush_dcache_page(dst_bv.bv_page);
+
bio_advance_iter(src, src_iter, bytes);
bio_advance_iter(dst, dst_iter, bytes);
}
--
2.17.0


2018-05-09 01:37:31

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 03/10] block: Add bioset_init()/bioset_exit()

Similarly to mempool_init()/mempool_exit(), take a pointer indirection
out of allocation/freeing by allowing biosets to be embedded in other
structs.

Signed-off-by: Kent Overstreet <[email protected]>
---
block/bio.c | 93 +++++++++++++++++++++++++++++++--------------
include/linux/bio.h | 2 +
2 files changed, 67 insertions(+), 28 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 360e9bcea5..980befd919 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1856,21 +1856,83 @@ int biovec_init_pool(mempool_t *pool, int pool_entries)
return mempool_init_slab_pool(pool, pool_entries, bp->slab);
}

-void bioset_free(struct bio_set *bs)
+/*
+ * bioset_exit - exit a bioset initialized with bioset_init()
+ *
+ * May be called on a zeroed but uninitialized bioset (i.e. allocated with
+ * kzalloc()).
+ */
+void bioset_exit(struct bio_set *bs)
{
if (bs->rescue_workqueue)
destroy_workqueue(bs->rescue_workqueue);
+ bs->rescue_workqueue = NULL;

mempool_exit(&bs->bio_pool);
mempool_exit(&bs->bvec_pool);

bioset_integrity_free(bs);
- bio_put_slab(bs);
+ if (bs->bio_slab)
+ bio_put_slab(bs);
+ bs->bio_slab = NULL;
+}
+EXPORT_SYMBOL(bioset_exit);

+void bioset_free(struct bio_set *bs)
+{
+ bioset_exit(bs);
kfree(bs);
}
EXPORT_SYMBOL(bioset_free);

+/**
+ * bioset_init - Initialize a bio_set
+ * @pool_size: Number of bio and bio_vecs to cache in the mempool
+ * @front_pad: Number of bytes to allocate in front of the returned bio
+ * @flags: Flags to modify behavior, currently %BIOSET_NEED_BVECS
+ * and %BIOSET_NEED_RESCUER
+ *
+ * Similar to bioset_create(), but initializes a passed-in bioset instead of
+ * separately allocating it.
+ */
+int bioset_init(struct bio_set *bs,
+ unsigned int pool_size,
+ unsigned int front_pad,
+ int flags)
+{
+ unsigned int back_pad = BIO_INLINE_VECS * sizeof(struct bio_vec);
+
+ bs->front_pad = front_pad;
+
+ spin_lock_init(&bs->rescue_lock);
+ bio_list_init(&bs->rescue_list);
+ INIT_WORK(&bs->rescue_work, bio_alloc_rescue);
+
+ bs->bio_slab = bio_find_or_create_slab(front_pad + back_pad);
+ if (!bs->bio_slab)
+ return -ENOMEM;
+
+ if (mempool_init_slab_pool(&bs->bio_pool, pool_size, bs->bio_slab))
+ goto bad;
+
+ if ((flags & BIOSET_NEED_BVECS) &&
+ biovec_init_pool(&bs->bvec_pool, pool_size))
+ goto bad;
+
+ if (!(flags & BIOSET_NEED_RESCUER))
+ return 0;
+
+ bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
+ if (!bs->rescue_workqueue)
+ goto bad;
+
+ return 0;
+bad:
+ bioset_exit(bs);
+ return -ENOMEM;
+}
+EXPORT_SYMBOL(bioset_init);
+
/**
* bioset_create - Create a bio_set
* @pool_size: Number of bio and bio_vecs to cache in the mempool
@@ -1895,43 +1957,18 @@ struct bio_set *bioset_create(unsigned int pool_size,
unsigned int front_pad,
int flags)
{
- unsigned int back_pad = BIO_INLINE_VECS * sizeof(struct bio_vec);
struct bio_set *bs;

bs = kzalloc(sizeof(*bs), GFP_KERNEL);
if (!bs)
return NULL;

- bs->front_pad = front_pad;
-
- spin_lock_init(&bs->rescue_lock);
- bio_list_init(&bs->rescue_list);
- INIT_WORK(&bs->rescue_work, bio_alloc_rescue);
-
- bs->bio_slab = bio_find_or_create_slab(front_pad + back_pad);
- if (!bs->bio_slab) {
+ if (bioset_init(bs, pool_size, front_pad, flags)) {
kfree(bs);
return NULL;
}

- if (mempool_init_slab_pool(&bs->bio_pool, pool_size, bs->bio_slab))
- goto bad;
-
- if ((flags & BIOSET_NEED_BVECS) &&
- biovec_init_pool(&bs->bvec_pool, pool_size))
- goto bad;
-
- if (!(flags & BIOSET_NEED_RESCUER))
- return bs;
-
- bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
- if (!bs->rescue_workqueue)
- goto bad;
-
return bs;
-bad:
- bioset_free(bs);
- return NULL;
}
EXPORT_SYMBOL(bioset_create);

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 720f7261d0..fa3cf94a50 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -406,6 +406,8 @@ static inline struct bio *bio_next_split(struct bio *bio, int sectors,
return bio_split(bio, sectors, gfp, bs);
}

+extern int bioset_init(struct bio_set *, unsigned int, unsigned int, int flags);
+extern void bioset_exit(struct bio_set *);
extern struct bio_set *bioset_create(unsigned int, unsigned int, int flags);
enum {
BIOSET_NEED_BVECS = BIT(0),
--
2.17.0


2018-05-09 01:37:37

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 08/10] block: Add warning for bi_next not NULL in bio_endio()

Recently found a bug where a driver left bi_next not NULL and then
called bio_endio(), and then the submitter of the bio used
bio_copy_data() which was treating src and dst as lists of bios.

Fixed that bug by splitting out bio_list_copy_data(), but in case other
things are depending on bi_next in weird ways, add a warning to help
avoid more bugs like that in the future.

Signed-off-by: Kent Overstreet <[email protected]>
---
block/bio.c | 3 +++
block/blk-core.c | 8 +++++++-
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index ce8e259f9a..5c81391100 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1775,6 +1775,9 @@ void bio_endio(struct bio *bio)
if (!bio_integrity_endio(bio))
return;

+ if (WARN_ONCE(bio->bi_next, "driver left bi_next not NULL"))
+ bio->bi_next = NULL;
+
/*
* Need to have a real endio function for chained bios, otherwise
* various corner cases will break (like stacking block devices that
diff --git a/block/blk-core.c b/block/blk-core.c
index 66f24798ef..f3cf79198a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -204,6 +204,10 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
bio_advance(bio, nbytes);

/* don't actually finish bio if it's part of flush sequence */
+ /*
+ * XXX this code looks suspicious - it's not consistent with advancing
+ * req->bio in caller
+ */
if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
bio_endio(bio);
}
@@ -2982,8 +2986,10 @@ bool blk_update_request(struct request *req, blk_status_t error,
struct bio *bio = req->bio;
unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);

- if (bio_bytes == bio->bi_iter.bi_size)
+ if (bio_bytes == bio->bi_iter.bi_size) {
req->bio = bio->bi_next;
+ bio->bi_next = NULL;
+ }

/* Completion has already been traced */
bio_clear_flag(bio, BIO_TRACE_COMPLETION);
--
2.17.0


2018-05-09 01:38:05

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 04/10] block: Use bioset_init() for fs_bio_set

Minor optimization - remove a pointer indirection when using fs_bio_set.

Signed-off-by: Kent Overstreet <[email protected]>
---
block/bio.c | 7 +++----
block/blk-core.c | 2 +-
drivers/target/target_core_iblock.c | 2 +-
include/linux/bio.h | 4 ++--
4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 980befd919..b7cdad6fc4 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -53,7 +53,7 @@ static struct biovec_slab bvec_slabs[BVEC_POOL_NR] __read_mostly = {
* fs_bio_set is the bio_set containing bio and iovec memory pools used by
* IO code that does not need private memory pools.
*/
-struct bio_set *fs_bio_set;
+struct bio_set fs_bio_set;
EXPORT_SYMBOL(fs_bio_set);

/*
@@ -2055,11 +2055,10 @@ static int __init init_bio(void)
bio_integrity_init();
biovec_init_slabs();

- fs_bio_set = bioset_create(BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS);
- if (!fs_bio_set)
+ if (bioset_init(&fs_bio_set, BIO_POOL_SIZE, 0, BIOSET_NEED_BVECS))
panic("bio: can't allocate bios\n");

- if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE))
+ if (bioset_integrity_create(&fs_bio_set, BIO_POOL_SIZE))
panic("bio: can't create integrity pool\n");

return 0;
diff --git a/block/blk-core.c b/block/blk-core.c
index 6d82c4f7fa..66f24798ef 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3409,7 +3409,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
struct bio *bio, *bio_src;

if (!bs)
- bs = fs_bio_set;
+ bs = &fs_bio_set;

__rq_for_each_bio(bio_src, rq_src) {
bio = bio_clone_fast(bio_src, gfp_mask, bs);
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 07c814c426..c969c01c7c 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -164,7 +164,7 @@ static int iblock_configure_device(struct se_device *dev)
goto out_blkdev_put;
}
pr_debug("IBLOCK setup BIP bs->bio_integrity_pool: %p\n",
- bs->bio_integrity_pool);
+ &bs->bio_integrity_pool);
}
dev->dev_attrib.hw_pi_prot_type = dev->dev_attrib.pi_prot_type;
}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index fa3cf94a50..91b02520e2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -423,11 +423,11 @@ extern void __bio_clone_fast(struct bio *, struct bio *);
extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *);
extern struct bio *bio_clone_bioset(struct bio *, gfp_t, struct bio_set *bs);

-extern struct bio_set *fs_bio_set;
+extern struct bio_set fs_bio_set;

static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned int nr_iovecs)
{
- return bio_alloc_bioset(gfp_mask, nr_iovecs, fs_bio_set);
+ return bio_alloc_bioset(gfp_mask, nr_iovecs, &fs_bio_set);
}

static inline struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned int nr_iovecs)
--
2.17.0


2018-05-09 01:38:46

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH 01/10] mempool: Add mempool_init()/mempool_exit()

Allows mempools to be embedded in other structs, getting rid of a
pointer indirection from allocation fastpaths.

mempool_exit() is safe to call on an uninitialized but zeroed mempool.

Signed-off-by: Kent Overstreet <[email protected]>
---
include/linux/mempool.h | 34 +++++++++++++
mm/mempool.c | 108 ++++++++++++++++++++++++++++++----------
2 files changed, 115 insertions(+), 27 deletions(-)

diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index b51f5c430c..0c964ac107 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -25,6 +25,18 @@ typedef struct mempool_s {
wait_queue_head_t wait;
} mempool_t;

+static inline bool mempool_initialized(mempool_t *pool)
+{
+ return pool->elements != NULL;
+}
+
+void mempool_exit(mempool_t *pool);
+int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+ mempool_free_t *free_fn, void *pool_data,
+ gfp_t gfp_mask, int node_id);
+int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+ mempool_free_t *free_fn, void *pool_data);
+
extern mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data);
extern mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
@@ -43,6 +55,14 @@ extern void mempool_free(void *element, mempool_t *pool);
*/
void *mempool_alloc_slab(gfp_t gfp_mask, void *pool_data);
void mempool_free_slab(void *element, void *pool_data);
+
+static inline int
+mempool_init_slab_pool(mempool_t *pool, int min_nr, struct kmem_cache *kc)
+{
+ return mempool_init(pool, min_nr, mempool_alloc_slab,
+ mempool_free_slab, (void *) kc);
+}
+
static inline mempool_t *
mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
{
@@ -56,6 +76,13 @@ mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)
*/
void *mempool_kmalloc(gfp_t gfp_mask, void *pool_data);
void mempool_kfree(void *element, void *pool_data);
+
+static inline int mempool_init_kmalloc_pool(mempool_t *pool, int min_nr, size_t size)
+{
+ return mempool_init(pool, min_nr, mempool_kmalloc,
+ mempool_kfree, (void *) size);
+}
+
static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
{
return mempool_create(min_nr, mempool_kmalloc, mempool_kfree,
@@ -68,6 +95,13 @@ static inline mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)
*/
void *mempool_alloc_pages(gfp_t gfp_mask, void *pool_data);
void mempool_free_pages(void *element, void *pool_data);
+
+static inline int mempool_init_page_pool(mempool_t *pool, int min_nr, int order)
+{
+ return mempool_init(pool, min_nr, mempool_alloc_pages,
+ mempool_free_pages, (void *)(long)order);
+}
+
static inline mempool_t *mempool_create_page_pool(int min_nr, int order)
{
return mempool_create(min_nr, mempool_alloc_pages, mempool_free_pages,
diff --git a/mm/mempool.c b/mm/mempool.c
index 5c9dce3471..df90ace400 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -137,6 +137,28 @@ static void *remove_element(mempool_t *pool, gfp_t flags)
return element;
}

+/**
+ * mempool_destroy - exit a mempool initialized with mempool_init()
+ * @pool: pointer to the memory pool which was initialized with
+ * mempool_init().
+ *
+ * Free all reserved elements in @pool and @pool itself. This function
+ * only sleeps if the free_fn() function sleeps.
+ *
+ * May be called on a zeroed but uninitialized mempool (i.e. allocated with
+ * kzalloc()).
+ */
+void mempool_exit(mempool_t *pool)
+{
+ while (pool->curr_nr) {
+ void *element = remove_element(pool, GFP_KERNEL);
+ pool->free(element, pool->pool_data);
+ }
+ kfree(pool->elements);
+ pool->elements = NULL;
+}
+EXPORT_SYMBOL(mempool_exit);
+
/**
* mempool_destroy - deallocate a memory pool
* @pool: pointer to the memory pool which was allocated via
@@ -150,15 +172,65 @@ void mempool_destroy(mempool_t *pool)
if (unlikely(!pool))
return;

- while (pool->curr_nr) {
- void *element = remove_element(pool, GFP_KERNEL);
- pool->free(element, pool->pool_data);
- }
- kfree(pool->elements);
+ mempool_exit(pool);
kfree(pool);
}
EXPORT_SYMBOL(mempool_destroy);

+int mempool_init_node(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+ mempool_free_t *free_fn, void *pool_data,
+ gfp_t gfp_mask, int node_id)
+{
+ spin_lock_init(&pool->lock);
+ pool->min_nr = min_nr;
+ pool->pool_data = pool_data;
+ pool->alloc = alloc_fn;
+ pool->free = free_fn;
+ init_waitqueue_head(&pool->wait);
+
+ pool->elements = kmalloc_array_node(min_nr, sizeof(void *),
+ gfp_mask, node_id);
+ if (!pool->elements)
+ return -ENOMEM;
+
+ /*
+ * First pre-allocate the guaranteed number of buffers.
+ */
+ while (pool->curr_nr < pool->min_nr) {
+ void *element;
+
+ element = pool->alloc(gfp_mask, pool->pool_data);
+ if (unlikely(!element)) {
+ mempool_exit(pool);
+ return -ENOMEM;
+ }
+ add_element(pool, element);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(mempool_init_node);
+
+/**
+ * mempool_init - initialize a memory pool
+ * @min_nr: the minimum number of elements guaranteed to be
+ * allocated for this pool.
+ * @alloc_fn: user-defined element-allocation function.
+ * @free_fn: user-defined element-freeing function.
+ * @pool_data: optional private data available to the user-defined functions.
+ *
+ * Like mempool_create(), but initializes the pool in (i.e. embedded in another
+ * structure).
+ */
+int mempool_init(mempool_t *pool, int min_nr, mempool_alloc_t *alloc_fn,
+ mempool_free_t *free_fn, void *pool_data)
+{
+ return mempool_init_node(pool, min_nr, alloc_fn, free_fn,
+ pool_data, GFP_KERNEL, NUMA_NO_NODE);
+
+}
+EXPORT_SYMBOL(mempool_init);
+
/**
* mempool_create - create a memory pool
* @min_nr: the minimum number of elements guaranteed to be
@@ -186,35 +258,17 @@ mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
gfp_t gfp_mask, int node_id)
{
mempool_t *pool;
+
pool = kzalloc_node(sizeof(*pool), gfp_mask, node_id);
if (!pool)
return NULL;
- pool->elements = kmalloc_array_node(min_nr, sizeof(void *),
- gfp_mask, node_id);
- if (!pool->elements) {
+
+ if (mempool_init_node(pool, min_nr, alloc_fn, free_fn, pool_data,
+ gfp_mask, node_id)) {
kfree(pool);
return NULL;
}
- spin_lock_init(&pool->lock);
- pool->min_nr = min_nr;
- pool->pool_data = pool_data;
- init_waitqueue_head(&pool->wait);
- pool->alloc = alloc_fn;
- pool->free = free_fn;

- /*
- * First pre-allocate the guaranteed number of buffers.
- */
- while (pool->curr_nr < pool->min_nr) {
- void *element;
-
- element = pool->alloc(gfp_mask, pool->pool_data);
- if (unlikely(!element)) {
- mempool_destroy(pool);
- return NULL;
- }
- add_element(pool, element);
- }
return pool;
}
EXPORT_SYMBOL(mempool_create_node);
--
2.17.0


2018-05-09 07:54:38

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [PATCH 01/10] mempool: Add mempool_init()/mempool_exit()

On Tue, May 08, 2018 at 09:33:49PM -0400, Kent Overstreet wrote:
> +/**
> + * mempool_destroy - exit a mempool initialized with mempool_init()

^ mempool_exit()

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2018-05-11 21:12:29

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 01/10] mempool: Add mempool_init()/mempool_exit()

On 5/8/18 7:33 PM, Kent Overstreet wrote:
> Allows mempools to be embedded in other structs, getting rid of a
> pointer indirection from allocation fastpaths.
>
> mempool_exit() is safe to call on an uninitialized but zeroed mempool.

Looks fine to me. I'd like to carry it through the block branch, as some
of the following cleanups depend on it. Kent, can you post a v2 with
the destroy -> exit typo fixed up?

But would be nice to have someone sign off on it...

--
Jens Axboe


2018-05-11 21:14:13

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On 5/8/18 7:33 PM, Kent Overstreet wrote:
> - Add separately allowed mempools, biosets: bcachefs uses both all over the
> place
>
> - Bit of utility code - bio_copy_data_iter(), zero_fill_bio_iter()
>
> - bio_list_copy_data(), the bi_next check - defensiveness because of a bug I
> had fun chasing down at one point
>
> - add some exports, because bcachefs does dio its own way
> - show whether fua is supported in sysfs, because I don't know of anything that
> exports whether the _block layer_ specifically thinks fua is supported.

Looked over the series, and looks like both good cleanups and optimizations.
If we can get the mempool patch sorted, I can apply this for 4.18.

--
Jens Axboe


2018-05-14 19:20:50

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 01/10] mempool: Add mempool_init()/mempool_exit()

On Fri, May 11, 2018 at 03:11:45PM -0600, Jens Axboe wrote:
> On 5/8/18 7:33 PM, Kent Overstreet wrote:
> > Allows mempools to be embedded in other structs, getting rid of a
> > pointer indirection from allocation fastpaths.
> >
> > mempool_exit() is safe to call on an uninitialized but zeroed mempool.
>
> Looks fine to me. I'd like to carry it through the block branch, as some
> of the following cleanups depend on it. Kent, can you post a v2 with
> the destroy -> exit typo fixed up?
>
> But would be nice to have someone sign off on it...

Done - it's now up in my git repo:
http://evilpiepirate.org/git/bcachefs.git bcachefs-block

2018-05-14 19:24:33

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On 5/8/18 7:33 PM, Kent Overstreet wrote:
> - Add separately allowed mempools, biosets: bcachefs uses both all over the
> place
>
> - Bit of utility code - bio_copy_data_iter(), zero_fill_bio_iter()
>
> - bio_list_copy_data(), the bi_next check - defensiveness because of a bug I
> had fun chasing down at one point
>
> - add some exports, because bcachefs does dio its own way
> - show whether fua is supported in sysfs, because I don't know of anything that
> exports whether the _block layer_ specifically thinks fua is supported.

Thanks Kent, applied for 4.18 with the update patch 1.

--
Jens Axboe


2018-05-14 19:25:45

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

Thanks!

On Mon, May 14, 2018 at 3:24 PM, Jens Axboe <[email protected]> wrote:
> On 5/8/18 7:33 PM, Kent Overstreet wrote:
>> - Add separately allowed mempools, biosets: bcachefs uses both all over the
>> place
>>
>> - Bit of utility code - bio_copy_data_iter(), zero_fill_bio_iter()
>>
>> - bio_list_copy_data(), the bi_next check - defensiveness because of a bug I
>> had fun chasing down at one point
>>
>> - add some exports, because bcachefs does dio its own way
>> - show whether fua is supported in sysfs, because I don't know of anything that
>> exports whether the _block layer_ specifically thinks fua is supported.
>
> Thanks Kent, applied for 4.18 with the update patch 1.
>
> --
> Jens Axboe
>

2018-05-17 20:56:51

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Tue, 2018-05-08 at 21:33 -0400, Kent Overstreet wrote:
> [ ... ]

Hello Kent,

With Jens' latest for-next branch I hit the kernel warning shown below. Can
you have a look?

Thanks,

Bart.


==================================================================
BUG: KASAN: use-after-free in bio_advance+0x110/0x1b0
Read of size 4 at addr ffff880156c5e6d0 by task ksoftirqd/10/72

CPU: 10 PID: 72 Comm: ksoftirqd/10 Tainted: G W 4.17.0-rc4-dbg+ #5
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x9a/0xeb
print_address_description+0x65/0x270
kasan_report+0x232/0x350
bio_advance+0x110/0x1b0
blk_update_request+0x9d/0x5a0
scsi_end_request+0x4c/0x300 [scsi_mod]
scsi_io_completion+0x71e/0xa40 [scsi_mod]
__blk_mq_complete_request+0x143/0x220
srp_recv_done+0x454/0x1100 [ib_srp]
__ib_process_cq+0x9a/0xf0 [ib_core]
ib_poll_handler+0x2d/0x90 [ib_core]
irq_poll_softirq+0xe5/0x1e0
__do_softirq+0x112/0x5f0
run_ksoftirqd+0x29/0x50
smpboot_thread_fn+0x30f/0x410
kthread+0x1b2/0x1d0
ret_from_fork+0x24/0x30

Allocated by task 1356:
kasan_kmalloc+0xa0/0xd0
kmem_cache_alloc+0xed/0x320
mempool_alloc+0xc6/0x210
bio_alloc_bioset+0x128/0x2d0
submit_bh_wbc+0x95/0x2d0
__block_write_full_page+0x2a6/0x5c0
__writepage+0x37/0x80
write_cache_pages+0x305/0x7c0
generic_writepages+0xb9/0x110
do_writepages+0x96/0x180
__filemap_fdatawrite_range+0x162/0x1b0
file_write_and_wait_range+0x4d/0xb0
blkdev_fsync+0x3c/0x70
do_fsync+0x33/0x60
__x64_sys_fsync+0x18/0x20
do_syscall_64+0x6d/0x220
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 72:
__kasan_slab_free+0x130/0x180
kmem_cache_free+0xcd/0x380
blk_update_request+0xc4/0x5a0
blk_update_request+0xc4/0x5a0
scsi_end_request+0x4c/0x300 [scsi_mod]
scsi_io_completion+0x71e/0xa40 [scsi_mod]
__blk_mq_complete_request+0x143/0x220
srp_recv_done+0x454/0x1100 [ib_srp]
__ib_process_cq+0x9a/0xf0 [ib_core]
ib_poll_handler+0x2d/0x90 [ib_core]
irq_poll_softirq+0xe5/0x1e0
__do_softirq+0x112/0x5f0

The buggy address belongs to the object at ffff880156c5e640
which belongs to the cache bio-0 of size 200
The buggy address is located 144 bytes inside of
200-byte region [ffff880156c5e640, ffff880156c5e708)
The buggy address belongs to the page:
page:ffffea00055b1780 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-24: queued zerolength write
flags: 0x8000000000008100(slab|head)
raw: 8000000000008100 0000000000000000 0000000000000000 0000000100190019
raw: ffffea000543a800 0000000200000002 ffff88015a8f3a00 0000000000000000
ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-22: queued zerolength write
page dumped because: kasan: bad access detected
ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-20: queued zerolength write

Memory state around the buggy address:
ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-18: queued zerolength write
ffff880156c5e580: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-24 wc->status 5
ffff880156c5e600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-22 wc->status 5
>ffff880156c5e680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-20 wc->status 5
^
ffff880156c5e700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-18 wc->status 5
ffff880156c5e780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ib_srpt:srpt_release_channel_work: ib_srpt 10.196.159.179-24
==================================================================

(gdb) list *(bio_advance+0x110)
0xffffffff81450090 is in bio_advance (./include/linux/bvec.h:82).
77 iter->bi_size = 0;
78 return false;
79 }
80
81 while (bytes) {
82 unsigned iter_len = bvec_iter_len(bv, *iter);
83 unsigned len = min(bytes, iter_len);
84
85 bytes -= len;
86 iter->bi_size -= len;






2018-05-18 09:08:51

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> On Tue, 2018-05-08 at 21:33 -0400, Kent Overstreet wrote:
> > [ ... ]
>
> Hello Kent,
>
> With Jens' latest for-next branch I hit the kernel warning shown below. Can
> you have a look?

Any hints on how to reproduce it?

> Thanks,
>
> Bart.
>
>
> ==================================================================
> BUG: KASAN: use-after-free in bio_advance+0x110/0x1b0
> Read of size 4 at addr ffff880156c5e6d0 by task ksoftirqd/10/72
>
> CPU: 10 PID: 72 Comm: ksoftirqd/10 Tainted: G W 4.17.0-rc4-dbg+ #5
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> Call Trace:
> dump_stack+0x9a/0xeb
> print_address_description+0x65/0x270
> kasan_report+0x232/0x350
> bio_advance+0x110/0x1b0
> blk_update_request+0x9d/0x5a0
> scsi_end_request+0x4c/0x300 [scsi_mod]
> scsi_io_completion+0x71e/0xa40 [scsi_mod]
> __blk_mq_complete_request+0x143/0x220
> srp_recv_done+0x454/0x1100 [ib_srp]
> __ib_process_cq+0x9a/0xf0 [ib_core]
> ib_poll_handler+0x2d/0x90 [ib_core]
> irq_poll_softirq+0xe5/0x1e0
> __do_softirq+0x112/0x5f0
> run_ksoftirqd+0x29/0x50
> smpboot_thread_fn+0x30f/0x410
> kthread+0x1b2/0x1d0
> ret_from_fork+0x24/0x30
>
> Allocated by task 1356:
> kasan_kmalloc+0xa0/0xd0
> kmem_cache_alloc+0xed/0x320
> mempool_alloc+0xc6/0x210
> bio_alloc_bioset+0x128/0x2d0
> submit_bh_wbc+0x95/0x2d0
> __block_write_full_page+0x2a6/0x5c0
> __writepage+0x37/0x80
> write_cache_pages+0x305/0x7c0
> generic_writepages+0xb9/0x110
> do_writepages+0x96/0x180
> __filemap_fdatawrite_range+0x162/0x1b0
> file_write_and_wait_range+0x4d/0xb0
> blkdev_fsync+0x3c/0x70
> do_fsync+0x33/0x60
> __x64_sys_fsync+0x18/0x20
> do_syscall_64+0x6d/0x220
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Freed by task 72:
> __kasan_slab_free+0x130/0x180
> kmem_cache_free+0xcd/0x380
> blk_update_request+0xc4/0x5a0
> blk_update_request+0xc4/0x5a0
> scsi_end_request+0x4c/0x300 [scsi_mod]
> scsi_io_completion+0x71e/0xa40 [scsi_mod]
> __blk_mq_complete_request+0x143/0x220
> srp_recv_done+0x454/0x1100 [ib_srp]
> __ib_process_cq+0x9a/0xf0 [ib_core]
> ib_poll_handler+0x2d/0x90 [ib_core]
> irq_poll_softirq+0xe5/0x1e0
> __do_softirq+0x112/0x5f0
>
> The buggy address belongs to the object at ffff880156c5e640
> which belongs to the cache bio-0 of size 200
> The buggy address is located 144 bytes inside of
> 200-byte region [ffff880156c5e640, ffff880156c5e708)
> The buggy address belongs to the page:
> page:ffffea00055b1780 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
> ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-24: queued zerolength write
> flags: 0x8000000000008100(slab|head)
> raw: 8000000000008100 0000000000000000 0000000000000000 0000000100190019
> raw: ffffea000543a800 0000000200000002 ffff88015a8f3a00 0000000000000000
> ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-22: queued zerolength write
> page dumped because: kasan: bad access detected
> ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-20: queued zerolength write
>
> Memory state around the buggy address:
> ib_srpt:srpt_zerolength_write: ib_srpt 10.196.159.179-18: queued zerolength write
> ffff880156c5e580: 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc
> ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-24 wc->status 5
> ffff880156c5e600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
> ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-22 wc->status 5
> >ffff880156c5e680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-20 wc->status 5
> ^
> ffff880156c5e700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ib_srpt:srpt_zerolength_write_done: ib_srpt 10.196.159.179-18 wc->status 5
> ffff880156c5e780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ib_srpt:srpt_release_channel_work: ib_srpt 10.196.159.179-24
> ==================================================================
>
> (gdb) list *(bio_advance+0x110)
> 0xffffffff81450090 is in bio_advance (./include/linux/bvec.h:82).
> 77 iter->bi_size = 0;
> 78 return false;
> 79 }
> 80
> 81 while (bytes) {
> 82 unsigned iter_len = bvec_iter_len(bv, *iter);
> 83 unsigned len = min(bytes, iter_len);
> 84
> 85 bytes -= len;
> 86 iter->bi_size -= len;
>
>
>
>
>
>

2018-05-18 15:13:23

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Fri, 2018-05-18 at 05:06 -0400, Kent Overstreet wrote:
> On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> > With Jens' latest for-next branch I hit the kernel warning shown below. Can
> > you have a look?
>
> Any hints on how to reproduce it?

Sure. This is how I triggered it:
* Clone https://github.com/bvanassche/srp-test.
* Follow the instructions in README.md.
* Run srp-test/run_tests -c -r 10

Thanks,

Bart.




2018-05-18 16:21:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 02/10] block: Convert bio_set to mempool_init()

On Tue, May 08, 2018 at 09:33:50PM -0400, Kent Overstreet wrote:
> Minor performance improvement by getting rid of pointer indirections
> from allocation/freeing fastpaths.

Can you please also send a long conversion for the remaining
few bioset_create users? It would be rather silly to keep two
almost the same interfaces around for just about two hand full
of users.

2018-05-18 16:22:09

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 02/10] block: Convert bio_set to mempool_init()

On Fri, May 18, 2018 at 09:20:28AM -0700, Christoph Hellwig wrote:
> On Tue, May 08, 2018 at 09:33:50PM -0400, Kent Overstreet wrote:
> > Minor performance improvement by getting rid of pointer indirections
> > from allocation/freeing fastpaths.
>
> Can you please also send a long conversion for the remaining
> few bioset_create users? It would be rather silly to keep two
> almost the same interfaces around for just about two hand full
> of users.

This comment was ment in reply to the next patch, sorry.

2018-05-18 16:23:48

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Fri, May 11, 2018 at 03:13:38PM -0600, Jens Axboe wrote:
> Looked over the series, and looks like both good cleanups and optimizations.
> If we can get the mempool patch sorted, I can apply this for 4.18.

FYI, I agree on the actual cleanups and optimization, but we really
shouldn't add new functions or even just exports without the code
using them. I think it is enough if we can collect ACKs on them, but
there is no point in using them. Especially as I'd really like to see
the users for some of them first.

2018-05-18 16:34:23

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On 5/18/18 10:23 AM, Christoph Hellwig wrote:
> On Fri, May 11, 2018 at 03:13:38PM -0600, Jens Axboe wrote:
>> Looked over the series, and looks like both good cleanups and optimizations.
>> If we can get the mempool patch sorted, I can apply this for 4.18.
>
> FYI, I agree on the actual cleanups and optimization, but we really
> shouldn't add new functions or even just exports without the code
> using them. I think it is enough if we can collect ACKs on them, but
> there is no point in using them. Especially as I'd really like to see
> the users for some of them first.

I certainly agree on that in general, but at the same time it makes the
expected submission of bcachefs not having to carry a number of
(essentially) unrelated patches. I'm assuming the likelihood of bcachefs
being submitted soonish is high, hence we won't have exports that don't
have in-kernel users in the longer term.

--
Jens Axboe


2018-05-18 17:38:03

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 02/10] block: Convert bio_set to mempool_init()

On Fri, May 18, 2018 at 09:20:28AM -0700, Christoph Hellwig wrote:
> On Tue, May 08, 2018 at 09:33:50PM -0400, Kent Overstreet wrote:
> > Minor performance improvement by getting rid of pointer indirections
> > from allocation/freeing fastpaths.
>
> Can you please also send a long conversion for the remaining
> few bioset_create users? It would be rather silly to keep two
> almost the same interfaces around for just about two hand full
> of users.

Yeah, I can do that

2018-05-20 22:18:11

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Fri, May 18, 2018 at 03:12:27PM +0000, Bart Van Assche wrote:
> On Fri, 2018-05-18 at 05:06 -0400, Kent Overstreet wrote:
> > On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> > > With Jens' latest for-next branch I hit the kernel warning shown below. Can
> > > you have a look?
> >
> > Any hints on how to reproduce it?
>
> Sure. This is how I triggered it:
> * Clone https://github.com/bvanassche/srp-test.
> * Follow the instructions in README.md.
> * Run srp-test/run_tests -c -r 10

Can you bisect it? I don't have infiniband hardware handy...

2018-05-20 22:20:58

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, 2018-05-20 at 18:17 -0400, Kent Overstreet wrote:
> On Fri, May 18, 2018 at 03:12:27PM +0000, Bart Van Assche wrote:
> > On Fri, 2018-05-18 at 05:06 -0400, Kent Overstreet wrote:
> > > On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> > > > With Jens' latest for-next branch I hit the kernel warning shown below. Can
> > > > you have a look?
> > >
> > > Any hints on how to reproduce it?
> >
> > Sure. This is how I triggered it:
> > * Clone https://github.com/bvanassche/srp-test.
> > * Follow the instructions in README.md.
> > * Run srp-test/run_tests -c -r 10
>
> Can you bisect it? I don't have infiniband hardware handy...

Hello Kent,

Have you noticed that the test I described uses the rdma_rxe driver and hence that
no InfiniBand hardware is needed to run that test?

Thanks,

Bart.


2018-05-20 22:32:50

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, May 20, 2018 at 10:19:13PM +0000, Bart Van Assche wrote:
> On Sun, 2018-05-20 at 18:17 -0400, Kent Overstreet wrote:
> > On Fri, May 18, 2018 at 03:12:27PM +0000, Bart Van Assche wrote:
> > > On Fri, 2018-05-18 at 05:06 -0400, Kent Overstreet wrote:
> > > > On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> > > > > With Jens' latest for-next branch I hit the kernel warning shown below. Can
> > > > > you have a look?
> > > >
> > > > Any hints on how to reproduce it?
> > >
> > > Sure. This is how I triggered it:
> > > * Clone https://github.com/bvanassche/srp-test.
> > > * Follow the instructions in README.md.
> > > * Run srp-test/run_tests -c -r 10
> >
> > Can you bisect it? I don't have infiniband hardware handy...
>
> Hello Kent,
>
> Have you noticed that the test I described uses the rdma_rxe driver and hence that
> no InfiniBand hardware is needed to run that test?

No, I'm not terribly familiar with infiniband stuff....

Do you have some sort of self contained test/qemu recipe? I would really rather
not have to figure out how to configure multipath, and infiniband, and I'm not
even sure what else is needed based on that readme...

2018-05-20 22:35:59

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, 2018-05-20 at 18:31 -0400, Kent Overstreet wrote:
> On Sun, May 20, 2018 at 10:19:13PM +0000, Bart Van Assche wrote:
> > On Sun, 2018-05-20 at 18:17 -0400, Kent Overstreet wrote:
> > > On Fri, May 18, 2018 at 03:12:27PM +0000, Bart Van Assche wrote:
> > > > On Fri, 2018-05-18 at 05:06 -0400, Kent Overstreet wrote:
> > > > > On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> > > > > > With Jens' latest for-next branch I hit the kernel warning shown below. Can
> > > > > > you have a look?
> > > > >
> > > > > Any hints on how to reproduce it?
> > > >
> > > > Sure. This is how I triggered it:
> > > > * Clone https://github.com/bvanassche/srp-test.
> > > > * Follow the instructions in README.md.
> > > > * Run srp-test/run_tests -c -r 10
> > >
> > > Can you bisect it? I don't have infiniband hardware handy...
> >
> > Hello Kent,
> >
> > Have you noticed that the test I described uses the rdma_rxe driver and hence that
> > no InfiniBand hardware is needed to run that test?
>
> No, I'm not terribly familiar with infiniband stuff....
>
> Do you have some sort of self contained test/qemu recipe? I would really rather
> not have to figure out how to configure multipath, and infiniband, and I'm not
> even sure what else is needed based on that readme...

Hello Kent,

Please have another look at the srp-test README. The instructions in that document
are easy to follow. No multipath nor any InfiniBand knowledge is required. The test
even can be run in a virtual machine in case you would be worried about potential
impact of the test on the rest of the system.

Bart.


2018-05-20 23:02:38

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, May 20, 2018 at 10:35:29PM +0000, Bart Van Assche wrote:
> On Sun, 2018-05-20 at 18:31 -0400, Kent Overstreet wrote:
> > On Sun, May 20, 2018 at 10:19:13PM +0000, Bart Van Assche wrote:
> > > On Sun, 2018-05-20 at 18:17 -0400, Kent Overstreet wrote:
> > > > On Fri, May 18, 2018 at 03:12:27PM +0000, Bart Van Assche wrote:
> > > > > On Fri, 2018-05-18 at 05:06 -0400, Kent Overstreet wrote:
> > > > > > On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> > > > > > > With Jens' latest for-next branch I hit the kernel warning shown below. Can
> > > > > > > you have a look?
> > > > > >
> > > > > > Any hints on how to reproduce it?
> > > > >
> > > > > Sure. This is how I triggered it:
> > > > > * Clone https://github.com/bvanassche/srp-test.
> > > > > * Follow the instructions in README.md.
> > > > > * Run srp-test/run_tests -c -r 10
> > > >
> > > > Can you bisect it? I don't have infiniband hardware handy...
> > >
> > > Hello Kent,
> > >
> > > Have you noticed that the test I described uses the rdma_rxe driver and hence that
> > > no InfiniBand hardware is needed to run that test?
> >
> > No, I'm not terribly familiar with infiniband stuff....
> >
> > Do you have some sort of self contained test/qemu recipe? I would really rather
> > not have to figure out how to configure multipath, and infiniband, and I'm not
> > even sure what else is needed based on that readme...
>
> Hello Kent,
>
> Please have another look at the srp-test README. The instructions in that document
> are easy to follow. No multipath nor any InfiniBand knowledge is required. The test
> even can be run in a virtual machine in case you would be worried about potential
> impact of the test on the rest of the system.

Your readme refers to kernel config options that don't even exist in the current
kernel

2018-05-20 23:11:31

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, 2018-05-20 at 19:00 -0400, Kent Overstreet wrote:
> On Sun, May 20, 2018 at 10:35:29PM +0000, Bart Van Assche wrote:
> > On Sun, 2018-05-20 at 18:31 -0400, Kent Overstreet wrote:
> > > On Sun, May 20, 2018 at 10:19:13PM +0000, Bart Van Assche wrote:
> > > > On Sun, 2018-05-20 at 18:17 -0400, Kent Overstreet wrote:
> > > > > On Fri, May 18, 2018 at 03:12:27PM +0000, Bart Van Assche wrote:
> > > > > > On Fri, 2018-05-18 at 05:06 -0400, Kent Overstreet wrote:
> > > > > > > On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> > > > > > > > With Jens' latest for-next branch I hit the kernel warning shown below. Can
> > > > > > > > you have a look?
> > > > > > >
> > > > > > > Any hints on how to reproduce it?
> > > > > >
> > > > > > Sure. This is how I triggered it:
> > > > > > * Clone https://github.com/bvanassche/srp-test.
> > > > > > * Follow the instructions in README.md.
> > > > > > * Run srp-test/run_tests -c -r 10
> > > > >
> > > > > Can you bisect it? I don't have infiniband hardware handy...
> > > >
> > > > Hello Kent,
> > > >
> > > > Have you noticed that the test I described uses the rdma_rxe driver and hence that
> > > > no InfiniBand hardware is needed to run that test?
> > >
> > > No, I'm not terribly familiar with infiniband stuff....
> > >
> > > Do you have some sort of self contained test/qemu recipe? I would really rather
> > > not have to figure out how to configure multipath, and infiniband, and I'm not
> > > even sure what else is needed based on that readme...
> >
> > Please have another look at the srp-test README. The instructions in that document
> > are easy to follow. No multipath nor any InfiniBand knowledge is required. The test
> > even can be run in a virtual machine in case you would be worried about potential
> > impact of the test on the rest of the system.
>
> Your readme refers to kernel config options that don't even exist in the current
> kernel

That's probably a misunderstanding of your side. From a quick look it seems like all
config symbols mentioned in the srp-test README.md exist in kernel v4.17-rc6. The
following command does not report any non-existing Kconfig symbols:

$ cd linux-kernel
$ sed -n 's/^\* CONFIG_//p' ~/software/infiniband/srp-test/README.md |
while read f; do { git grep -lw "$f" | grep -q Kconfig; } || echo $f; done

Bart.



2018-05-20 23:22:42

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, May 20, 2018 at 10:35:29PM +0000, Bart Van Assche wrote:
> On Sun, 2018-05-20 at 18:31 -0400, Kent Overstreet wrote:
> > On Sun, May 20, 2018 at 10:19:13PM +0000, Bart Van Assche wrote:
> > > On Sun, 2018-05-20 at 18:17 -0400, Kent Overstreet wrote:
> > > > On Fri, May 18, 2018 at 03:12:27PM +0000, Bart Van Assche wrote:
> > > > > On Fri, 2018-05-18 at 05:06 -0400, Kent Overstreet wrote:
> > > > > > On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> > > > > > > With Jens' latest for-next branch I hit the kernel warning shown below. Can
> > > > > > > you have a look?
> > > > > >
> > > > > > Any hints on how to reproduce it?
> > > > >
> > > > > Sure. This is how I triggered it:
> > > > > * Clone https://github.com/bvanassche/srp-test.
> > > > > * Follow the instructions in README.md.
> > > > > * Run srp-test/run_tests -c -r 10
> > > >
> > > > Can you bisect it? I don't have infiniband hardware handy...
> > >
> > > Hello Kent,
> > >
> > > Have you noticed that the test I described uses the rdma_rxe driver and hence that
> > > no InfiniBand hardware is needed to run that test?
> >
> > No, I'm not terribly familiar with infiniband stuff....
> >
> > Do you have some sort of self contained test/qemu recipe? I would really rather
> > not have to figure out how to configure multipath, and infiniband, and I'm not
> > even sure what else is needed based on that readme...
>
> Hello Kent,
>
> Please have another look at the srp-test README. The instructions in that document
> are easy to follow. No multipath nor any InfiniBand knowledge is required. The test
> even can be run in a virtual machine in case you would be worried about potential
> impact of the test on the rest of the system.


I really have better things to do than debug someone else's tests...

Restarting multipath-tools (via systemctl): multipath-tools.service.
multipathd> reconfigure
ok
multipathd> make -C discontiguous-io discontiguous-io
make[1]: Entering directory '/host/home/kent/ktest/tests/srp-test/discontiguous-io'
make[1]: 'discontiguous-io' is up to date.
make[1]: Leaving directory '/host/home/kent/ktest/tests/srp-test/discontiguous-io'
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
../run_tests: line 65: cd: /lib/modules/4.16.0+/kernel/block: No such file or directory
Zero-initializing /dev/ram0 ... done
Zero-initializing /dev/ram1 ... done
Unable to load target_core_pscsi
Unable to load target_core_user
Configured SRP target driver
Running test ../tests/01 ...
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/01 failed
Running test ../tests/02-mq ...
Test file I/O on top of multipath concurrently with logout and login (10 min; mq)
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/02-mq failed
Running test ../tests/02-sq ...
Test file I/O on top of multipath concurrently with logout and login (10 min; sq)
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/02-sq failed
Running test ../tests/02-sq-on-mq ...
Test file I/O on top of multipath concurrently with logout and login (10 min; sq-on-mq)
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/02-sq-on-mq failed
Running test ../tests/03-4M ...
Test direct I/O with large transfer sizes and cmd_sg_entries=255
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/03-4M failed
Running test ../tests/03-8M ...
Test direct I/O with large transfer sizes and cmd_sg_entries=255
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/03-8M failed
Running test ../tests/04-4M ...
Test direct I/O with large transfer sizes and cmd_sg_entries=1
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/04-4M failed
Running test ../tests/04-8M ...
Test direct I/O with large transfer sizes and cmd_sg_entries=1
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/04-8M failed
Running test ../tests/05-4M ...
Test buffered I/O with large transfer sizes and cmd_sg_entries=255
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/05-4M failed
Running test ../tests/05-8M ...
Test buffered I/O with large transfer sizes and cmd_sg_entries=255
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/05-8M failed
Running test ../tests/06 ...
Test block I/O on top of multipath concurrently with logout and login (10 min)
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/06 failed
0 tests succeeded and 11 tests failed
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module

========= PASSED srp in 18s

2018-05-20 23:45:21

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, 2018-05-20 at 19:21 -0400, Kent Overstreet wrote:
> I really have better things to do than debug someone else's tests...
> [ ... ]
> ../run_tests: line 65: cd: /lib/modules/4.16.0+/kernel/block: No such file or directory

Kernel v4.16 is too old to run these tests. The srp-test script needs the
following commit that went upstream in kernel v4.17-rc1:

63cf1a902c9d ("IB/srpt: Add RDMA/CM support")

Bart.


2018-05-20 23:59:57

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, May 20, 2018 at 11:40:45PM +0000, Bart Van Assche wrote:
> On Sun, 2018-05-20 at 19:21 -0400, Kent Overstreet wrote:
> > I really have better things to do than debug someone else's tests...
> > [ ... ]
> > ../run_tests: line 65: cd: /lib/modules/4.16.0+/kernel/block: No such file or directory
>
> Kernel v4.16 is too old to run these tests. The srp-test script needs the
> following commit that went upstream in kernel v4.17-rc1:
>
> 63cf1a902c9d ("IB/srpt: Add RDMA/CM support")

Same output on Jens' for-next branch.

2018-05-21 15:12:57

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, 2018-05-20 at 19:58 -0400, Kent Overstreet wrote:
> On Sun, May 20, 2018 at 11:40:45PM +0000, Bart Van Assche wrote:
> > On Sun, 2018-05-20 at 19:21 -0400, Kent Overstreet wrote:
> > > I really have better things to do than debug someone else's tests...
> > > [ ... ]
> > > ../run_tests: line 65: cd: /lib/modules/4.16.0+/kernel/block: No such file or directory
> >
> > Kernel v4.16 is too old to run these tests. The srp-test script needs the
> > following commit that went upstream in kernel v4.17-rc1:
> >
> > 63cf1a902c9d ("IB/srpt: Add RDMA/CM support")
>
> Same output on Jens' for-next branch.

Others have been able to run the srp-test software with the instructions
provided earlier in this e-mail thread. Can you share the kernel messages from
around the time the test was run (dmesg, /var/log/messages or /var/log/syslog)?

Thanks,

Bart.





2018-05-21 18:39:35

by Omar Sandoval

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Mon, May 21, 2018 at 03:11:08PM +0000, Bart Van Assche wrote:
> On Sun, 2018-05-20 at 19:58 -0400, Kent Overstreet wrote:
> > On Sun, May 20, 2018 at 11:40:45PM +0000, Bart Van Assche wrote:
> > > On Sun, 2018-05-20 at 19:21 -0400, Kent Overstreet wrote:
> > > > I really have better things to do than debug someone else's tests...
> > > > [ ... ]
> > > > ../run_tests: line 65: cd: /lib/modules/4.16.0+/kernel/block: No such file or directory
> > >
> > > Kernel v4.16 is too old to run these tests. The srp-test script needs the
> > > following commit that went upstream in kernel v4.17-rc1:
> > >
> > > 63cf1a902c9d ("IB/srpt: Add RDMA/CM support")
> >
> > Same output on Jens' for-next branch.
>
> Others have been able to run the srp-test software with the instructions
> provided earlier in this e-mail thread. Can you share the kernel messages from
> around the time the test was run (dmesg, /var/log/messages or /var/log/syslog)?
>
> Thanks,
>
> Bart.

Bart,

Have you made any progress in porting srp-test to blktests so we don't
have to have this conversation again?

2018-05-21 18:47:40

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Mon, 2018-05-21 at 11:37 -0700, Omar Sandoval wrote:
> Have you made any progress in porting srp-test to blktests so we don't
> have to have this conversation again?

Hello Omar,

Porting the srp-test software to the blktests framework is still high on my
to-do list. I will start working on this as soon as the work on adding SMR
support for fio has been completed.

Bart.

2018-05-22 22:02:36

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH 00/10] Misc block layer patches for bcachefs

On Sun, 2018-05-20 at 18:17 -0400, Kent Overstreet wrote:
> On Fri, May 18, 2018 at 03:12:27PM +0000, Bart Van Assche wrote:
> > On Fri, 2018-05-18 at 05:06 -0400, Kent Overstreet wrote:
> > > On Thu, May 17, 2018 at 08:54:57PM +0000, Bart Van Assche wrote:
> > > > With Jens' latest for-next branch I hit the kernel warning shown below. Can
> > > > you have a look?
> > >
> > > Any hints on how to reproduce it?
> >
> > Sure. This is how I triggered it:
> > * Clone https://github.com/bvanassche/srp-test.
> > * Follow the instructions in README.md.
> > * Run srp-test/run_tests -c -r 10
>
> Can you bisect it? I don't have infiniband hardware handy...

Hello Kent,

The bisect I ran reported the following commit as the first faulty commit:

block: Add warning for bi_next not NULL in bio_endio()

Bart.