2016-04-25 23:52:44

by Shaohua Li

Subject: [PATCH] MD: make bio mergeable

blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
But once the bio is dispatched to the underlying disk, the blk_queue_split
checks no longer apply, so the bio may become mergeable again.

In the reported bug, this causes a severe trim performance drop on raid0:
https://bugzilla.kernel.org/show_bug.cgi?id=117051

Reported-by: Park Ju Hyung <[email protected]>
Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio)
Cc: [email protected] (v4.3+)
Cc: Ming Lei <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Neil Brown <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
---
drivers/md/md.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 194580f..14d3b37 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -284,6 +284,8 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
 	 * go away inside make_request
 	 */
 	sectors = bio_sectors(bio);
+	/* bio could be mergeable after passing to underlayer */
+	bio->bi_rw &= ~REQ_NOMERGE;
 	mddev->pers->make_request(mddev, bio);
 
 	cpu = part_stat_lock();
--
2.8.0.rc2
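
The two added lines clear a single flag bit and leave every other bit in the bio's flags untouched. A minimal userspace sketch of that bit operation follows. The DEMO_* names and bit positions are invented for illustration and are not the kernel's REQ_* definitions.

```c
/*
 * Hypothetical flag bits, for illustration only. The kernel's real
 * REQ_* values and the layout of bio->bi_rw are different.
 */
#define DEMO_REQ_WRITE   (1UL << 0)
#define DEMO_REQ_DISCARD (1UL << 3)
#define DEMO_REQ_NOMERGE (1UL << 5)

/* Clear the no-merge hint while preserving all other flag bits. */
static unsigned long demo_clear_nomerge(unsigned long rw)
{
	return rw & ~DEMO_REQ_NOMERGE;
}
```

Calling demo_clear_nomerge() on a value that has the write, discard and no-merge bits set returns the same value with only the no-merge bit removed. Calling it on a value without that bit is a no-op.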


2016-04-26 00:59:09

by Jens Axboe

Subject: Re: [PATCH] MD: make bio mergeable

On 04/25/2016 05:52 PM, Shaohua Li wrote:
> blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
> But once the bio is dispatched to the underlying disk, the blk_queue_split
> checks no longer apply, so the bio may become mergeable again.
>
> In the reported bug, this causes a severe trim performance drop on raid0:
> https://bugzilla.kernel.org/show_bug.cgi?id=117051

Good catch! Will apply for this series, thanks Shaohua.

--
Jens Axboe

2016-04-26 01:16:06

by Jens Axboe

Subject: Re: [PATCH] MD: make bio mergeable

On 04/25/2016 06:59 PM, Jens Axboe wrote:
> On 04/25/2016 05:52 PM, Shaohua Li wrote:
>> blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
>> But once the bio is dispatched to the underlying disk, the blk_queue_split
>> checks no longer apply, so the bio may become mergeable again.
>>
>> In the reported bug, this causes a severe trim performance drop on raid0:
>> https://bugzilla.kernel.org/show_bug.cgi?id=117051
>
> Good catch! Will apply for this series, thanks Shaohua.

Actually, let's let that go through the md tree instead. But you can add
my Reviewed-by, and it'd be nice to get this into 4.6.

--
Jens Axboe

2016-04-26 09:56:33

by Ming Lei

Subject: Re: [PATCH] MD: make bio mergeable

On Tue, Apr 26, 2016 at 7:52 AM, Shaohua Li <[email protected]> wrote:
> blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
> But once the bio is dispatched to the underlying disk, the blk_queue_split
> checks no longer apply, so the bio may become mergeable again.

If the bio from md is split and marked NOMERGE, it means some queue
limit was reached. So it looks like the raid's queue limit is not set
big enough. Could you find out which limit causes the split and nomerge?

>
> In the reported bug, this causes a severe trim performance drop on raid0:
> https://bugzilla.kernel.org/show_bug.cgi?id=117051
>
> Reported-by: Park Ju Hyung <[email protected]>
> Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio)
> Cc: [email protected] (v4.3+)
> Cc: Ming Lei <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: Neil Brown <[email protected]>
> Signed-off-by: Shaohua Li <[email protected]>
> ---
> drivers/md/md.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 194580f..14d3b37 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -284,6 +284,8 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
> * go away inside make_request
> */
> sectors = bio_sectors(bio);
> + /* bio could be mergeable after passing to underlayer */
> + bio->bi_rw &= ~REQ_NOMERGE;

IMO this isn't a good fix: either we need to set a correct queue limit, or
we simply don't set nomerge for any stackable block device. I slightly
prefer the former.

Thanks,

> mddev->pers->make_request(mddev, bio);
>
> cpu = part_stat_lock();
> --
> 2.8.0.rc2

2016-04-26 14:21:27

by Jens Axboe

Subject: Re: [PATCH] MD: make bio mergeable

On 04/26/2016 03:56 AM, Ming Lei wrote:
> On Tue, Apr 26, 2016 at 7:52 AM, Shaohua Li <[email protected]> wrote:
>> blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
>> But once the bio is dispatched to the underlying disk, the blk_queue_split
>> checks no longer apply, so the bio may become mergeable again.
>
> If the bio from md is split and marked NOMERGE, it means some queue
> limit was reached. So it looks like the raid's queue limit is not set
> big enough. Could you find out which limit causes the split and nomerge?

raid0 sets a limit of the stripe size for IO. Once the IO has passed md,
there's no reason why we can't merge for the lower driver. This is
(potentially) a huge performance issue on trim, since a lot of devices
are trim ops / sec limited rather than throughput limited.

--
Jens Axboe
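
The effect described above can be modelled in a few lines of userspace C: an array-level range is cut at chunk boundaries, each piece lands on one member disk, and consecutive pieces for the same disk turn out to be contiguous on that disk. This is a rough sketch under simplifying assumptions (equal-size members, a single zone, no device-side size limit on a merged request). All demo_* names are hypothetical and none of this is the md or block-layer code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_DISKS 16

/* One piece of I/O as seen by a member disk (simplified model). */
struct demo_piece {
	unsigned int disk;  /* member disk index */
	uint64_t sector;    /* start sector on that disk */
	uint64_t nr;        /* length in sectors */
};

/* Map the chunk-bounded piece that starts at array sector `pos`. */
static struct demo_piece demo_map(uint64_t pos, uint64_t end,
				  uint64_t chunk, unsigned int ndisks)
{
	uint64_t idx = pos / chunk;  /* chunk index within the array */
	uint64_t off = pos % chunk;  /* offset inside that chunk */
	uint64_t nr = chunk - off;   /* never cross a chunk boundary */
	struct demo_piece p;

	if (nr > end - pos)
		nr = end - pos;
	p.disk = (unsigned int)(idx % ndisks);
	p.sector = (idx / ndisks) * chunk + off;
	p.nr = nr;
	return p;
}

/*
 * Count the requests that reach the member disks for the array range
 * [start, start + len). With `merge` non-zero, a piece that begins
 * exactly where the previous piece for the same disk ended is folded
 * into that previous request instead of being counted again.
 */
static uint64_t demo_count_requests(uint64_t start, uint64_t len,
				    uint64_t chunk, unsigned int ndisks,
				    int merge)
{
	uint64_t last_end[DEMO_MAX_DISKS] = {0};
	int seen[DEMO_MAX_DISKS] = {0};
	uint64_t pos = start, end = start + len, count = 0;

	assert(chunk > 0 && ndisks > 0 && ndisks <= DEMO_MAX_DISKS);

	while (pos < end) {
		struct demo_piece p = demo_map(pos, end, chunk, ndisks);

		if (!(merge && seen[p.disk] && last_end[p.disk] == p.sector))
			count++;  /* a new request for this disk */
		seen[p.disk] = 1;
		last_end[p.disk] = p.sector + p.nr;
		pos += p.nr;
	}
	return count;
}
```

In this model, a range of 8192 sectors on a two-disk array with 1024-sector chunks produces 8 member requests when merging is forbidden and 2 when it is allowed. On a device that is limited by trim operations per second, that difference in request count is what dominates the run time.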

2016-04-26 15:18:04

by Ming Lei

Subject: Re: [PATCH] MD: make bio mergeable

On Tue, Apr 26, 2016 at 10:21 PM, Jens Axboe <[email protected]> wrote:
> On 04/26/2016 03:56 AM, Ming Lei wrote:
>>
>> On Tue, Apr 26, 2016 at 7:52 AM, Shaohua Li <[email protected]> wrote:
>>>
>>> blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
>>> But once the bio is dispatched to the underlying disk, the blk_queue_split
>>> checks no longer apply, so the bio may become mergeable again.
>>
>>
>> If the bio from md is split and marked NOMERGE, it means some queue
>> limit was reached. So it looks like the raid's queue limit is not set
>> big enough. Could you find out which limit causes the split and nomerge?
>
>
> raid0 sets a limit of the stripe size for IO. Once the IO has passed md,
> there's no reason why we can't merge for the lower driver. This is
> (potentially) a huge performance issue on trim, since a lot of devices are
> trim ops / sec limited rather than throughput limited.

I just found that raid0 maps the chunk sectors into the queue's max hw
sectors, whereas dm uses blk_stack_limits() to set up its limits.

So this looks like a raid-specific issue and the fix is correct. Sorry for
the noise.

thanks,
Ming
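
The difference noted above can be sketched with a toy limits structure. A stacked device that only inherits limits ends up with the tightest non-zero limit of its members, while a raid0-style array additionally caps its own limit at the chunk size, so a bio split at the array level can be far below what a member disk accepts. This is a simplified model with hypothetical demo_* names. It only mirrors the min-of-non-zero behaviour for one field and is not the kernel's blk_stack_limits() implementation.

```c
/* Toy queue limits: only the field discussed in this thread. */
struct demo_limits {
	unsigned int max_hw_sectors;  /* 0 means "not set yet" */
};

/* Smaller of two values, treating 0 as "no limit set". */
static unsigned int demo_min_not_zero(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Inherit the member's limit if it is tighter than the current one. */
static void demo_stack_limits(struct demo_limits *top,
			      const struct demo_limits *bottom)
{
	top->max_hw_sectors = demo_min_not_zero(top->max_hw_sectors,
						bottom->max_hw_sectors);
}

/* raid0-style cap: the array never accepts more than one chunk. */
static void demo_cap_to_chunk(struct demo_limits *top,
			      unsigned int chunk_sectors)
{
	top->max_hw_sectors = demo_min_not_zero(top->max_hw_sectors,
						chunk_sectors);
}
```

With a member that accepts 2560 sectors and a chunk size of 1024 sectors, the purely stacked device ends up with a limit of 2560, while the chunk-capped one ends up with 1024. In the second case a bio split at the array's limit still leaves room for merging on the member disk.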

2016-04-28 20:06:10

by Holger Kiehl

Subject: Re: [PATCH] MD: make bio mergeable

Hello,

On Mon, 25 Apr 2016, Shaohua Li wrote:

> blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
> But once the bio is dispatched to the underlying disk, the blk_queue_split
> checks no longer apply, so the bio may become mergeable again.
>
> In the reported bug, this causes a severe trim performance drop on raid0:
> https://bugzilla.kernel.org/show_bug.cgi?id=117051
>
This patch makes a huge difference. On a system with two Samsung 850 Pro
in an MD Raid0 setup, the time for fstrim went down from ~30min to 18sec!

However, on another system with two Intel P3700 1.6TB NVMe PCIe SSDs,
also set up as one big MD Raid0, the patch does not make any difference
at all. fstrim takes more than 4 hours!

Any idea what could be wrong?

Regards,
Holger


> Reported-by: Park Ju Hyung <[email protected]>
> Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio)
> Cc: [email protected] (v4.3+)
> Cc: Ming Lei <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: Neil Brown <[email protected]>
> Signed-off-by: Shaohua Li <[email protected]>
> ---
> drivers/md/md.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 194580f..14d3b37 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -284,6 +284,8 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
> * go away inside make_request
> */
> sectors = bio_sectors(bio);
> + /* bio could be mergeable after passing to underlayer */
> + bio->bi_rw &= ~REQ_NOMERGE;
> mddev->pers->make_request(mddev, bio);
>
> cpu = part_stat_lock();
> --
> 2.8.0.rc2

2016-04-28 21:19:31

by Shaohua Li

Subject: Re: [PATCH] MD: make bio mergeable

On Thu, Apr 28, 2016 at 08:00:22PM +0000, Holger Kiehl wrote:
> Hello,
>
> On Mon, 25 Apr 2016, Shaohua Li wrote:
>
> > blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
> > But once the bio is dispatched to the underlying disk, the blk_queue_split
> > checks no longer apply, so the bio may become mergeable again.
> >
> > In the reported bug, this causes a severe trim performance drop on raid0:
> > https://bugzilla.kernel.org/show_bug.cgi?id=117051
> >
> This patch makes a huge difference. On a system with two Samsung 850 Pro
> in an MD Raid0 setup, the time for fstrim went down from ~30min to 18sec!
>
> However, on another system with two Intel P3700 1.6TB NVMe PCIe SSDs,
> also set up as one big MD Raid0, the patch does not make any difference
> at all. fstrim takes more than 4 hours!

Does the raid0 span two partitions or two whole SSDs?

Can you post blktrace data in the bugzilla? I'll track the bug there.

Thanks,
Shaohua

2016-04-29 09:24:00

by Holger Kiehl

Subject: Re: [PATCH] MD: make bio mergeable

On Thu, 28 Apr 2016, Shaohua Li wrote:

> On Thu, Apr 28, 2016 at 08:00:22PM +0000, Holger Kiehl wrote:
> > Hello,
> >
> > On Mon, 25 Apr 2016, Shaohua Li wrote:
> >
> > > blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
> > > But once the bio is dispatched to the underlying disk, the blk_queue_split
> > > checks no longer apply, so the bio may become mergeable again.
> > >
> > > In the reported bug, this causes a severe trim performance drop on raid0:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=117051
> > >
> > This patch makes a huge difference. On a system with two Samsung 850 Pro
> > in an MD Raid0 setup, the time for fstrim went down from ~30min to 18sec!
> >
> > However, on another system with two Intel P3700 1.6TB NVMe PCIe SSDs,
> > also set up as one big MD Raid0, the patch does not make any difference
> > at all. fstrim takes more than 4 hours!
>
> Does the raid0 span two partitions or two whole SSDs?
>
Two SSDs. On the system where it works, with the two Samsung 850 Pro SATA
SSDs, the raid0 is built on partitions.

> Can you post blktrace data in the bugzilla? I'll track the bug there.
>
I ran blktrace on the two md raid0 member devices /dev/nvme[01]n1 for
2 minutes and attached the traces to bug 117051 as a tar.bz2 file:

https://bugzilla.kernel.org/show_bug.cgi?id=117051

Please just ask if I have forgotten anything. And many thanks for looking
at this and all the good work!

Regards,
Holger