2013-07-12 15:52:51

by Jan Vesely

Subject: [PATCH v4 0/2] block: Allow merging of tail pages into the last segment

Hi

These patches modify __bio_add_page to accept pages that extend the last bio
segment. Some drivers craft their buffers and rely on this behavior (see the
message in patch 2 for details)


jan

v4: whitespace fixes to make checkpatch happy

v3: Use code from __blk_recalc_rq_segments to decide whether the page is
mergeable.

v2: modify a comment


2013-07-12 15:52:53

by Jan Vesely

Subject: [PATCH v4 1/2] block: factor out vector mergeable decision to a helper function

From: Jan Vesely <[email protected]>

Factor the vector-merge decision out of __blk_recalc_rq_segments into a
helper and export it, so that segment counts can be predicted without
calling the recalc function. This will be used in the next patch.

Signed-off-by: Jan Vesely <[email protected]>

CC: Alexander Viro <[email protected]>
CC: James Bottomley <[email protected]>
CC: Jens Axboe <[email protected]>
CC: Kent Overstreet <[email protected]>
CC: Rob Evers <[email protected]>
CC: Tomas Henzl <[email protected]>
CC: Nikola Pajkovsky <[email protected]>
CC: Kai Makisara <[email protected]>
CC: [email protected]
CC: [email protected]
---
block/blk-merge.c | 52 +++++++++++++++++++++++++++++++---------------------
include/linux/bio.h | 3 +++
2 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 5f24482..f1ef657 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -9,11 +9,39 @@
 
 #include "blk.h"
 
+bool bvec_mergeable(struct request_queue *q, struct bio_vec *lastbv,
+                    struct bio_vec *newbv, unsigned int seg_size)
+{
+        unsigned long limit = queue_bounce_pfn(q);
+
+        if (!blk_queue_cluster(q))
+                return false;
+
+        /*
+         * the trick here is to make sure that a high page is
+         * never considered part of another segment, since that
+         * might change with the bounce page.
+         */
+        if ((page_to_pfn(lastbv->bv_page) > limit)
+            || (page_to_pfn(newbv->bv_page) > limit))
+                return false;
+
+        if (seg_size + newbv->bv_len > queue_max_segment_size(q))
+                return false;
+
+        if (!BIOVEC_PHYS_MERGEABLE(lastbv, newbv))
+                return false;
+        if (!BIOVEC_SEG_BOUNDARY(q, lastbv, newbv))
+                return false;
+        return true;
+}
+
+
 static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
                                              struct bio *bio)
 {
         struct bio_vec *bv, *bvprv = NULL;
-        int cluster, i, high, highprv = 1;
+        int i;
         unsigned int seg_size, nr_phys_segs;
         struct bio *fbio, *bbio;
 
@@ -21,33 +49,16 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
                 return 0;
 
         fbio = bio;
-        cluster = blk_queue_cluster(q);
         seg_size = 0;
         nr_phys_segs = 0;
         for_each_bio(bio) {
                 bio_for_each_segment(bv, bio, i) {
-                        /*
-                         * the trick here is making sure that a high page is
-                         * never considered part of another segment, since that
-                         * might change with the bounce page.
-                         */
-                        high = page_to_pfn(bv->bv_page) > queue_bounce_pfn(q);
-                        if (high || highprv)
-                                goto new_segment;
-                        if (cluster) {
-                                if (seg_size + bv->bv_len
-                                    > queue_max_segment_size(q))
-                                        goto new_segment;
-                                if (!BIOVEC_PHYS_MERGEABLE(bvprv, bv))
-                                        goto new_segment;
-                                if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bv))
-                                        goto new_segment;
-
+                        if (bvprv && bvec_mergeable(q, bvprv, bv, seg_size)) {
                                 seg_size += bv->bv_len;
                                 bvprv = bv;
                                 continue;
                         }
-new_segment:
+                        /* new segment */
                         if (nr_phys_segs == 1 && seg_size >
                             fbio->bi_seg_front_size)
                                 fbio->bi_seg_front_size = seg_size;
@@ -55,7 +66,6 @@ new_segment:
                         nr_phys_segs++;
                         bvprv = bv;
                         seg_size = bv->bv_len;
-                        highprv = high;
                 }
                 bbio = bio;
         }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ef24466..3af0e36 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -307,6 +307,9 @@ extern struct bio_vec *bvec_alloc(gfp_t, int, unsigned long *, mempool_t *);
 extern void bvec_free(mempool_t *, struct bio_vec *, unsigned int);
 extern unsigned int bvec_nr_vecs(unsigned short idx);
 
+extern bool bvec_mergeable(struct request_queue *q, struct bio_vec *lastbv,
+                           struct bio_vec *newbv, unsigned int seg_size);
+
 #ifdef CONFIG_BLK_CGROUP
 int bio_associate_current(struct bio *bio);
 void bio_disassociate_task(struct bio *bio);
--
1.8.3.1

2013-07-12 15:53:03

by Jan Vesely

Subject: [PATCH v4 2/2] block: modify __bio_add_page check to accept pages that don't start a new segment

From: Jan Vesely <[email protected]>

The original behavior was to refuse all pages once the maximum number of
segments had been reached. However, some drivers (like st) craft their buffers
to potentially require exactly max segments and multiple pages in the last
segment. This patch modifies the check to allow pages that can be merged into
the last segment.

Fixes EBUSY failures when using a large tape block size under high memory
fragmentation. This regression was introduced by commit
46081b166415acb66d4b3150ecefcd9460bb48a1
("st: Increase success probability in driver buffer allocation")

Signed-off-by: Jan Vesely <[email protected]>

CC: Alexander Viro <[email protected]>
CC: James Bottomley <[email protected]>
CC: Jens Axboe <[email protected]>
CC: Kent Overstreet <[email protected]>
CC: Rob Evers <[email protected]>
CC: Tomas Henzl <[email protected]>
CC: Nikola Pajkovsky <[email protected]>
CC: Kai Makisara <[email protected]>
CC: [email protected]
CC: [email protected]
---
fs/bio.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 94bbc04..ba64e99 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -603,7 +603,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
*page, unsigned int len, unsigned int offset,
unsigned short max_sectors)
{
- int retried_segments = 0;
struct bio_vec *bvec;

/*
@@ -654,18 +653,12 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
return 0;

/*
- * we might lose a segment or two here, but rather that than
- * make this too complex.
+ * The first part of the segment count check,
+ * reduce segment count if possible
*/
-
- while (bio->bi_phys_segments >= queue_max_segments(q)) {
-
- if (retried_segments)
- return 0;
-
- retried_segments = 1;
+ if (bio->bi_phys_segments >= queue_max_segments(q))
blk_recount_segments(q, bio);
- }
+

/*
* setup the new entry, we might clear it again later if we
@@ -677,6 +670,21 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
bvec->bv_offset = offset;

/*
+ * the other part of the segment count check, allow mergeable pages.
+ * BIO_SEG_VALID flag is cleared below
+ */
+ if ((bio->bi_phys_segments > queue_max_segments(q)) ||
+ ((bio->bi_phys_segments == queue_max_segments(q)) &&
+ !bvec_mergeable(q, __BVEC_END(bio), bvec,
+ bio->bi_seg_back_size))) {
+ bvec->bv_page = NULL;
+ bvec->bv_len = 0;
+ bvec->bv_offset = 0;
+ return 0;
+ }
+
+
+ /*
* if queue has other restrictions (eg varying max sector size
* depending on offset), it can specify a merge_bvec_fn in the
* queue to get further control
--
1.8.3.1

2013-07-26 10:42:21

by Jan Vesely

Subject: Re: [PATCH v4 0/2] block: Fix regression since 46081b166415acb66d4b3150ecefcd9460bb48a1 (was: Allow merging of tail pages into the last segment)

On 12/07/13 17:52, Jan Vesely wrote:
> Hi
>
> These patches modify __bio_add_page to accept pages that extend the last bio
> segment. Some drivers craft their buffers and rely on this behavior (see the
> message in patch 2 for details)
>
>
> jan
>
> v4: whitespace fixes to make checkpatch happy
>
> v3: Use code from __blk_recalc_rq_segments to decide whether the page is
> mergeable,
>
> v2: modify a comment

ping
and a bit more info from patch 2/2:

The original behavior was to refuse all pages once the maximum number of
segments had been reached. However, some drivers (like st) craft their buffers
to potentially require exactly max segments and multiple pages in the last
segment. This patch modifies the check to allow pages that can be merged into
the last segment.

Fixes EBUSY failures when using a large tape block size under high memory
fragmentation. This regression was introduced by commit
46081b166415acb66d4b3150ecefcd9460bb48a1
("st: Increase success probability in driver buffer allocation")

Jan

--
Jan Vesely <[email protected]>