2013-03-25 14:10:30

by Jan Vesely

[permalink] [raw]
Subject: [PATCH RESEND v2] block: modify __bio_add_page check to accept pages that don't start a new segment

v2: changed a comment

The original behavior was to refuse all pages after the maximum number of
segments has been reached. However, some drivers (like st) craft their buffers
to potentially require exactly max segments and multiple pages in the last
segment. This patch modifies the check to allow pages that can be merged into
the last segment.

Fixes EBUSY failures when using large tape block size in high
memory fragmentation condition.
This regression was introduced by commit
46081b166415acb66d4b3150ecefcd9460bb48a1
st: Increase success probability in driver buffer allocation

Signed-off-by: Jan Vesely <[email protected]>

CC: Alexander Viro <[email protected]>
CC: FUJITA Tomonori <[email protected]>
CC: Kai Makisara <[email protected]>
CC: James Bottomley <[email protected]>
CC: Jens Axboe <[email protected]>
CC: [email protected]
---
fs/bio.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index bb5768f..bc6af71 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -500,7 +500,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
*page, unsigned int len, unsigned int offset,
unsigned short max_sectors)
{
- int retried_segments = 0;
struct bio_vec *bvec;

/*
@@ -551,18 +550,13 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
return 0;

/*
- * we might lose a segment or two here, but rather that than
- * make this too complex.
+ * The first part of the segment count check,
+ * reduce segment count if possible
*/

- while (bio->bi_phys_segments >= queue_max_segments(q)) {
-
- if (retried_segments)
- return 0;
-
- retried_segments = 1;
+ if (bio->bi_phys_segments >= queue_max_segments(q))
blk_recount_segments(q, bio);
- }
+

/*
* setup the new entry, we might clear it again later if we
@@ -572,6 +566,19 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
bvec->bv_page = page;
bvec->bv_len = len;
bvec->bv_offset = offset;
+
+ /*
+ * the other part of the segment count check, allow mergeable pages
+ */
+ if ((bio->bi_phys_segments > queue_max_segments(q)) ||
+ ( (bio->bi_phys_segments == queue_max_segments(q)) &&
+ !BIOVEC_PHYS_MERGEABLE(bvec - 1, bvec))) {
+ bvec->bv_page = NULL;
+ bvec->bv_len = 0;
+ bvec->bv_offset = 0;
+ return 0;
+ }
+

/*
* if queue has other restrictions (eg varying max sector size
--
1.8.1.4


2013-03-25 14:25:17

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH RESEND v2] block: modify __bio_add_page check to accept pages that don't start a new segment

On Mon, Mar 25 2013, Jan Vesely wrote:
> v2: changed a comment
>
> The original behavior was to refuse all pages after the maximum number of
> segments has been reached. However, some drivers (like st) craft their buffers
> to potentially require exactly max segments and multiple pages in the last
> segment. This patch modifies the check to allow pages that can be merged into
> the last segment.
>
> Fixes EBUSY failures when using large tape block size in high
> memory fragmentation condition.
> This regression was introduced by commit
> 46081b166415acb66d4b3150ecefcd9460bb48a1
> st: Increase success probability in driver buffer allocation
>
> Signed-off-by: Jan Vesely <[email protected]>
>
> CC: Alexander Viro <[email protected]>
> CC: FUJITA Tomonori <[email protected]>
> CC: Kai Makisara <[email protected]>
> CC: James Bottomley <[email protected]>
> CC: Jens Axboe <[email protected]>
> CC: [email protected]
> ---
> fs/bio.c | 27 +++++++++++++++++----------
> 1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/fs/bio.c b/fs/bio.c
> index bb5768f..bc6af71 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -500,7 +500,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> *page, unsigned int len, unsigned int offset,
> unsigned short max_sectors)
> {
> - int retried_segments = 0;
> struct bio_vec *bvec;
>
> /*
> @@ -551,18 +550,13 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> return 0;
>
> /*
> - * we might lose a segment or two here, but rather that than
> - * make this too complex.
> + * The first part of the segment count check,
> + * reduce segment count if possible
> */
>
> - while (bio->bi_phys_segments >= queue_max_segments(q)) {
> -
> - if (retried_segments)
> - return 0;
> -
> - retried_segments = 1;
> + if (bio->bi_phys_segments >= queue_max_segments(q))
> blk_recount_segments(q, bio);
> - }
> +
>
> /*
> * setup the new entry, we might clear it again later if we
> @@ -572,6 +566,19 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> bvec->bv_page = page;
> bvec->bv_len = len;
> bvec->bv_offset = offset;
> +
> + /*
> + * the other part of the segment count check, allow mergeable pages
> + */
> + if ((bio->bi_phys_segments > queue_max_segments(q)) ||
> + ( (bio->bi_phys_segments == queue_max_segments(q)) &&
> + !BIOVEC_PHYS_MERGEABLE(bvec - 1, bvec))) {
> + bvec->bv_page = NULL;
> + bvec->bv_len = 0;
> + bvec->bv_offset = 0;
> + return 0;
> + }
> +

This is a bit messy, I think. bi_phys_segments should never be allowed
to go beyond queue_ma_segments(), so the > test does not look right.
Maybe it's an artifact of when we fall through with this patch, we bump
bi_phys_segments even if the segments are physicall contig and
mergeable.

What happens when the segment is physically mergeable, but the resulting
merged segment is too large (bigger than q->limits.max_segment_size)?

--
Jens Axboe

2013-03-25 15:35:42

by Jan Vesely

[permalink] [raw]
Subject: Re: [PATCH RESEND v2] block: modify __bio_add_page check to accept pages that don't start a new segment

On Mon 25 Mar 2013 15:24:57 CET, Jens Axboe wrote:
> On Mon, Mar 25 2013, Jan Vesely wrote:
>> v2: changed a comment
>>
>> The original behavior was to refuse all pages after the maximum number of
>> segments has been reached. However, some drivers (like st) craft their buffers
>> to potentially require exactly max segments and multiple pages in the last
>> segment. This patch modifies the check to allow pages that can be merged into
>> the last segment.
>>
>> Fixes EBUSY failures when using large tape block size in high
>> memory fragmentation condition.
>> This regression was introduced by commit
>> 46081b166415acb66d4b3150ecefcd9460bb48a1
>> st: Increase success probability in driver buffer allocation
>>
>> Signed-off-by: Jan Vesely <[email protected]>
>>
>> CC: Alexander Viro <[email protected]>
>> CC: FUJITA Tomonori <[email protected]>
>> CC: Kai Makisara <[email protected]>
>> CC: James Bottomley <[email protected]>
>> CC: Jens Axboe <[email protected]>
>> CC: [email protected]
>> ---
>> fs/bio.c | 27 +++++++++++++++++----------
>> 1 file changed, 17 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/bio.c b/fs/bio.c
>> index bb5768f..bc6af71 100644
>> --- a/fs/bio.c
>> +++ b/fs/bio.c
>> @@ -500,7 +500,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
>> *page, unsigned int len, unsigned int offset,
>> unsigned short max_sectors)
>> {
>> - int retried_segments = 0;
>> struct bio_vec *bvec;
>>
>> /*
>> @@ -551,18 +550,13 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
>> return 0;
>>
>> /*
>> - * we might lose a segment or two here, but rather that than
>> - * make this too complex.
>> + * The first part of the segment count check,
>> + * reduce segment count if possible
>> */
>>
>> - while (bio->bi_phys_segments >= queue_max_segments(q)) {
>> -
>> - if (retried_segments)
>> - return 0;
>> -
>> - retried_segments = 1;
>> + if (bio->bi_phys_segments >= queue_max_segments(q))
>> blk_recount_segments(q, bio);
>> - }
>> +
>>
>> /*
>> * setup the new entry, we might clear it again later if we
>> @@ -572,6 +566,19 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
>> bvec->bv_page = page;
>> bvec->bv_len = len;
>> bvec->bv_offset = offset;
>> +
>> + /*
>> + * the other part of the segment count check, allow mergeable pages
>> + */
>> + if ((bio->bi_phys_segments > queue_max_segments(q)) ||
>> + ( (bio->bi_phys_segments == queue_max_segments(q)) &&
>> + !BIOVEC_PHYS_MERGEABLE(bvec - 1, bvec))) {
>> + bvec->bv_page = NULL;
>> + bvec->bv_len = 0;
>> + bvec->bv_offset = 0;
>> + return 0;
>> + }
>> +
>
> This is a bit messy, I think. bi_phys_segments should never be allowed
> to go beyond queue_ma_segments(), so the > test does not look right.
> Maybe it's an artifact of when we fall through with this patch, we bump
> bi_phys_segments even if the segments are physicall contig and
> mergeable.

yeah. it is messy, I tried to go for the least invasive changes.

I took the '>' test from the original while loop '>='. The original
behavior
guaranteed bio->bi_phys_segments <= max_segments, if the bio satisfied
this condition to begin with.
I did not find any guarantees that the 'bio' parameter of this function
has
to satisfy this condition in general.

My understanding is that if a caller of this function (or one of the
two that call this one)
provides an invalid (segment-count-wise) bio, it will fail (return 0
added length),
and let the caller handle the situation.
I admit, I did not check all the call paths that use these functions.

>
> What happens when the segment is physically mergeable, but the resulting
> merged segment is too large (bigger than q->limits.max_segment_size)?
>

ah, yes. I guess I need a check that follows __blk_recalc_rq_segments
more closely.
We know that at this point all pages are merged into segments, so a
helper function that would be used
by both __blk_recalc_rq_segments and this check is possible.


I still assume that a temporary increase of bi_phys_segments above
max_segments is ok.
If we want to avoid this situation we would need to merge tail pages
right away. That's imo uglier.

thanks
--
Jan Vesely <[email protected]>

2013-03-25 19:40:24

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH RESEND v2] block: modify __bio_add_page check to accept pages that don't start a new segment

On Mon, Mar 25 2013, Jan Vesely wrote:
> 51506edc5741209311913
>
> On Mon 25 Mar 2013 15:24:57 CET, Jens Axboe wrote:
> > On Mon, Mar 25 2013, Jan Vesely wrote:
> >> v2: changed a comment
> >>
> >> The original behavior was to refuse all pages after the maximum number of
> >> segments has been reached. However, some drivers (like st) craft their buffers
> >> to potentially require exactly max segments and multiple pages in the last
> >> segment. This patch modifies the check to allow pages that can be merged into
> >> the last segment.
> >>
> >> Fixes EBUSY failures when using large tape block size in high
> >> memory fragmentation condition.
> >> This regression was introduced by commit
> >> 46081b166415acb66d4b3150ecefcd9460bb48a1
> >> st: Increase success probability in driver buffer allocation
> >>
> >> Signed-off-by: Jan Vesely <[email protected]>
> >>
> >> CC: Alexander Viro <[email protected]>
> >> CC: FUJITA Tomonori <[email protected]>
> >> CC: Kai Makisara <[email protected]>
> >> CC: James Bottomley <[email protected]>
> >> CC: Jens Axboe <[email protected]>
> >> CC: [email protected]
> >> ---
> >> fs/bio.c | 27 +++++++++++++++++----------
> >> 1 file changed, 17 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/fs/bio.c b/fs/bio.c
> >> index bb5768f..bc6af71 100644
> >> --- a/fs/bio.c
> >> +++ b/fs/bio.c
> >> @@ -500,7 +500,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> >> *page, unsigned int len, unsigned int offset,
> >> unsigned short max_sectors)
> >> {
> >> - int retried_segments = 0;
> >> struct bio_vec *bvec;
> >>
> >> /*
> >> @@ -551,18 +550,13 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> >> return 0;
> >>
> >> /*
> >> - * we might lose a segment or two here, but rather that than
> >> - * make this too complex.
> >> + * The first part of the segment count check,
> >> + * reduce segment count if possible
> >> */
> >>
> >> - while (bio->bi_phys_segments >= queue_max_segments(q)) {
> >> -
> >> - if (retried_segments)
> >> - return 0;
> >> -
> >> - retried_segments = 1;
> >> + if (bio->bi_phys_segments >= queue_max_segments(q))
> >> blk_recount_segments(q, bio);
> >> - }
> >> +
> >>
> >> /*
> >> * setup the new entry, we might clear it again later if we
> >> @@ -572,6 +566,19 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> >> bvec->bv_page = page;
> >> bvec->bv_len = len;
> >> bvec->bv_offset = offset;
> >> +
> >> + /*
> >> + * the other part of the segment count check, allow mergeable pages
> >> + */
> >> + if ((bio->bi_phys_segments > queue_max_segments(q)) ||
> >> + ( (bio->bi_phys_segments == queue_max_segments(q)) &&
> >> + !BIOVEC_PHYS_MERGEABLE(bvec - 1, bvec))) {
> >> + bvec->bv_page = NULL;
> >> + bvec->bv_len = 0;
> >> + bvec->bv_offset = 0;
> >> + return 0;
> >> + }
> >> +
> >
> > This is a bit messy, I think. bi_phys_segments should never be allowed
> > to go beyond queue_ma_segments(), so the > test does not look right.
> > Maybe it's an artifact of when we fall through with this patch, we bump
> > bi_phys_segments even if the segments are physicall contig and
> > mergeable.
>
> yeah. it is messy, I tried to go for the least invasive changes.
>
> I took the '>' test from the original while loop '>='. The original
> behavior guaranteed bio->bi_phys_segments <= max_segments, if the bio
> satisfied this condition to begin with. I did not find any guarantees
> that the 'bio' parameter of this function has to satisfy this
> condition in general.
>
> My understanding is that if a caller of this function (or one of the
> two that call this one) provides an invalid (segment-count-wise) bio,
> it will fail (return 0 added length), and let the caller handle the
> situation. I admit, I did not check all the call paths that use these
> functions.

Yes, that is how it works. So that should be fine.

> > What happens when the segment is physically mergeable, but the resulting
> > merged segment is too large (bigger than q->limits.max_segment_size)?
> >
>
> ah, yes. I guess I need a check that follows __blk_recalc_rq_segments
> more closely. We know that at this point all pages are merged into
> segments, so a helper function that would be used by both
> __blk_recalc_rq_segments and this check is possible.
>
>
> I still assume that a temporary increase of bi_phys_segments above
> max_segments is ok. If we want to avoid this situation we would need
> to merge tail pages right away. That's imo uglier.

Yes, it's OK if we just ensure that we clear the valid segment flag. At
least that would be the best sort of solution, to ensure that it's
recalculated properly when someone checks it.

--
Jens Axboe

2013-08-01 09:38:43

by Jan Vesely

[permalink] [raw]
Subject: Re: [PATCH RESEND v2] block: modify __bio_add_page check to accept pages that don't start a new segment

On Mon 25 Mar 2013 20:40:09 CET, Jens Axboe wrote:
> On Mon, Mar 25 2013, Jan Vesely wrote:
>> 51506edc5741209311913
>>
>> On Mon 25 Mar 2013 15:24:57 CET, Jens Axboe wrote:
>>> On Mon, Mar 25 2013, Jan Vesely wrote:
>>>> v2: changed a comment
>>>>
>>>> The original behavior was to refuse all pages after the maximum number of
>>>> segments has been reached. However, some drivers (like st) craft their buffers
>>>> to potentially require exactly max segments and multiple pages in the last
>>>> segment. This patch modifies the check to allow pages that can be merged into
>>>> the last segment.
>>>>
>>>> Fixes EBUSY failures when using large tape block size in high
>>>> memory fragmentation condition.
>>>> This regression was introduced by commit
>>>> 46081b166415acb66d4b3150ecefcd9460bb48a1
>>>> st: Increase success probability in driver buffer allocation
>>>>
>>>> Signed-off-by: Jan Vesely <[email protected]>
>>>>
>>>> CC: Alexander Viro <[email protected]>
>>>> CC: FUJITA Tomonori <[email protected]>
>>>> CC: Kai Makisara <[email protected]>
>>>> CC: James Bottomley <[email protected]>
>>>> CC: Jens Axboe <[email protected]>
>>>> CC: [email protected]
>>>> ---
>>>> fs/bio.c | 27 +++++++++++++++++----------
>>>> 1 file changed, 17 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/fs/bio.c b/fs/bio.c
>>>> index bb5768f..bc6af71 100644
>>>> --- a/fs/bio.c
>>>> +++ b/fs/bio.c
>>>> @@ -500,7 +500,6 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
>>>> *page, unsigned int len, unsigned int offset,
>>>> unsigned short max_sectors)
>>>> {
>>>> - int retried_segments = 0;
>>>> struct bio_vec *bvec;
>>>>
>>>> /*
>>>> @@ -551,18 +550,13 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
>>>> return 0;
>>>>
>>>> /*
>>>> - * we might lose a segment or two here, but rather that than
>>>> - * make this too complex.
>>>> + * The first part of the segment count check,
>>>> + * reduce segment count if possible
>>>> */
>>>>
>>>> - while (bio->bi_phys_segments >= queue_max_segments(q)) {
>>>> -
>>>> - if (retried_segments)
>>>> - return 0;
>>>> -
>>>> - retried_segments = 1;
>>>> + if (bio->bi_phys_segments >= queue_max_segments(q))
>>>> blk_recount_segments(q, bio);
>>>> - }
>>>> +
>>>>
>>>> /*
>>>> * setup the new entry, we might clear it again later if we
>>>> @@ -572,6 +566,19 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
>>>> bvec->bv_page = page;
>>>> bvec->bv_len = len;
>>>> bvec->bv_offset = offset;
>>>> +
>>>> + /*
>>>> + * the other part of the segment count check, allow mergeable pages
>>>> + */
>>>> + if ((bio->bi_phys_segments > queue_max_segments(q)) ||
>>>> + ( (bio->bi_phys_segments == queue_max_segments(q)) &&
>>>> + !BIOVEC_PHYS_MERGEABLE(bvec - 1, bvec))) {
>>>> + bvec->bv_page = NULL;
>>>> + bvec->bv_len = 0;
>>>> + bvec->bv_offset = 0;
>>>> + return 0;
>>>> + }
>>>> +
>>>
>>> This is a bit messy, I think. bi_phys_segments should never be allowed
>>> to go beyond queue_ma_segments(), so the > test does not look right.
>>> Maybe it's an artifact of when we fall through with this patch, we bump
>>> bi_phys_segments even if the segments are physicall contig and
>>> mergeable.
>>
>> yeah. it is messy, I tried to go for the least invasive changes.
>>
>> I took the '>' test from the original while loop '>='. The original
>> behavior guaranteed bio->bi_phys_segments <= max_segments, if the bio
>> satisfied this condition to begin with. I did not find any guarantees
>> that the 'bio' parameter of this function has to satisfy this
>> condition in general.
>>
>> My understanding is that if a caller of this function (or one of the
>> two that call this one) provides an invalid (segment-count-wise) bio,
>> it will fail (return 0 added length), and let the caller handle the
>> situation. I admit, I did not check all the call paths that use these
>> functions.
>
> Yes, that is how it works. So that should be fine.
>
>>> What happens when the segment is physically mergeable, but the resulting
>>> merged segment is too large (bigger than q->limits.max_segment_size)?
>>>
>>
>> ah, yes. I guess I need a check that follows __blk_recalc_rq_segments
>> more closely. We know that at this point all pages are merged into
>> segments, so a helper function that would be used by both
>> __blk_recalc_rq_segments and this check is possible.
>>
>>
>> I still assume that a temporary increase of bi_phys_segments above
>> max_segments is ok. If we want to avoid this situation we would need
>> to merge tail pages right away. That's imo uglier.
>
> Yes, it's OK if we just ensure that we clear the valid segment flag. At
> least that would be the best sort of solution, to ensure that it's
> recalculated properly when someone checks it.
>

Hi Jens,

v3 has been around for few months and I posted v4(whitespace changes)
two weeks ago.
Let me know if there's something more I can do to get these patches
merged.

regards,
--
Jan Vesely <[email protected]>