2016-12-17 10:49:39

by Ming Lei

Subject: [PATCH] block: loose check on sg gap

If the last bvec of the 1st bio and the 1st bvec of the next
bio are physically contiguous, and the latter can be merged
into the last segment of the 1st bio, we should consider that
they don't violate the sg gap (or virt boundary) limit.

Both Vitaly and Dexuan reported that lots of unmergeable small
bios are observed when running mkfs on Hyper-V virtual storage,
and that performance becomes quite low, so this patch is intended
to fix that performance issue.

The same issue should exist on NVMe too, since it also sets a virt
boundary.

Reported-by: Vitaly Kuznetsov <[email protected]>
Reported-by: Dexuan Cui <[email protected]>
Tested-by: Dexuan Cui <[email protected]>
Cc: Keith Busch <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
include/linux/blkdev.h | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 286b2a264383..1ce26e771bcc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1608,6 +1608,25 @@ static inline bool bvec_gap_to_prev(struct request_queue *q,
 	return __bvec_gap_to_prev(q, bprv, offset);
 }
 
+/*
+ * Check if the two bvecs from two bios can be merged to one segment.
+ * If yes, no need to check gap between the two bios since the 1st bio
+ * and the 1st bvec in the 2nd bio can be handled in one segment.
+ */
+static inline bool bios_segs_mergeable(struct request_queue *q,
+		struct bio *prev, struct bio_vec *prev_last_bv,
+		struct bio_vec *next_first_bv)
+{
+	if (!BIOVEC_PHYS_MERGEABLE(prev_last_bv, next_first_bv))
+		return false;
+	if (!BIOVEC_SEG_BOUNDARY(q, prev_last_bv, next_first_bv))
+		return false;
+	if (prev->bi_seg_back_size + next_first_bv->bv_len >
+			queue_max_segment_size(q))
+		return false;
+	return true;
+}
+
 static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
 			 struct bio *next)
 {
@@ -1617,7 +1636,8 @@ static inline bool bio_will_gap(struct request_queue *q, struct bio *prev,
 		bio_get_last_bvec(prev, &pb);
 		bio_get_first_bvec(next, &nb);
 
-		return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
+		if (!bios_segs_mergeable(q, prev, &pb, &nb))
+			return __bvec_gap_to_prev(q, &pb, nb.bv_offset);
 	}
 
 	return false;
--
2.7.4


2016-12-17 16:50:57

by Jens Axboe

Subject: Re: [PATCH] block: loose check on sg gap

On 12/17/2016 03:49 AM, Ming Lei wrote:
> If the last bvec of the 1st bio and the 1st bvec of the next
> bio are contiguous physically, and the latter can be merged
> to last segment of the 1st bio, we should think they don't
> violate sg gap(or virt boundary) limit.
>
> Both Vitaly and Dexuan reported lots of unmergeable small bios
> are observed when running mkfs on Hyper-V virtual storage, and
> performance becomes quite low, so this patch is figured out for
> fixing the performance issue.
>
> The same issue should exist on NVMe too since it sets virt boundary too.

It looks pretty reasonable to me. I'll queue it up for some testing;
changes like this always make me a little nervous.

--
Jens Axboe

2016-12-20 02:08:00

by Ming Lei

Subject: Re: [PATCH] block: loose check on sg gap

On Sun, Dec 18, 2016 at 12:49 AM, Jens Axboe <[email protected]> wrote:
> It looks pretty reasonable to me. I'll queue it up for some testing,
> changes like this always make me a little nervous.

Understood.

But given that we are still early in the 4.10 cycle, it seems fine to
expose it now; we should have enough time to fix any regressions
that turn up.

BTW, it passes my xfstests (ext4) runs over SATA and NVMe.

Thanks,
Ming

2016-12-20 02:32:29

by Jens Axboe

Subject: Re: [PATCH] block: loose check on sg gap

On 12/19/2016 07:07 PM, Ming Lei wrote:
> Understood.
>
> But given it is still in early stage of 4.10 cycle, seems fine to expose
> it now, and we should have enough time to fix it if there might be
> regressions.
>
> BTW, it passes my xfstest(ext4) over sata/NVMe.

It's been fine here in testing, too. I'm not worried about performance
regressions; those we can always fix. Changes to merging make me worried
about corruption, and those regressions are much worse.

Any reason we need to rush this? I'd be more comfortable pushing this to
4.11, unless there are strong reasons this should make 4.10.

--
Jens Axboe

2016-12-20 03:55:52

by Dexuan Cui

Subject: RE: [PATCH] block: loose check on sg gap

> From: Jens Axboe [mailto:[email protected]]
> Sent: Tuesday, December 20, 2016 10:31
> To: Ming Lei <[email protected]>
> Cc: Linux Kernel Mailing List <[email protected]>; linux-block <linux-
> [email protected]>; Christoph Hellwig <[email protected]>; Dexuan Cui
> <[email protected]>; Vitaly Kuznetsov <[email protected]>; Keith Busch
> <[email protected]>; Hannes Reinecke <[email protected]>; Mike Christie
> <[email protected]>; Martin K. Petersen <[email protected]>;
> Toshi Kani <[email protected]>; Dan Williams <[email protected]>;
> Damien Le Moal <[email protected]>
> Subject: Re: [PATCH] block: loose check on sg gap
>
> It's been fine here in testing, too. I'm not worried about performance
> regressions, those we can always fix. Merging makes me worried about
> corruption, and those regressions are much worse.
>
> Any reason we need to rush this? I'd be more comfortable pushing this to
> 4.11, unless there are strong reasons this should make 4.10.

Hi Jens,

As far as I know, the patch is important for popular Linux distros
(at least Ubuntu 14.04.5, 16.x and RHEL 7.3) when they run on
Hyper-V/Azure, because without it they can suffer pretty bad
throughput/latency in some cases. For example, mkfs.ext4 on a 100GB
partition can take 8 minutes without the patch, but only 1 second
with it.

Thanks,
-- Dexuan