2011-05-02 14:17:42

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 1/3] blkdev: Submit discard bio in batches in blkdev_issue_discard()

Currently we are waiting for every submitted REQ_DISCARD bio separately,
but it can have unwanted consequences of repeatedly flushing the queue,
so we rather submit bios in batches and wait for the entire batch, hence
narrowing the window of other ios going in.

Use bio_batch_end_io() and struct bio_batch for that purpose, the same
is used by blkdev_issue_zeroout(). Also change bio_batch_end_io() so we
always set !BIO_UPTODATE in the case of error and remove the check for
bb, since we are the only user of this function and we always set this.

Remove bio_get()/bio_put() from the blkdev_issue_discard() since
bio_alloc() and bio_batch_end_io() is doing the same thing, hence it is
not needed anymore.

I have done simple dd testing with surprising results. The script I have
used is:

for i in $(seq 10); do
echo $i
dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
sleep 5
done
/usr/bin/time -f %e ./blkdiscard /dev/sdc1

Running time of BLKDISCARD on the whole device:
with patch without patch
0.95 15.58

So we can see that in this artificial test the kernel with the patch
applied is approx 16x faster in discarding the device.

Signed-off-by: Lukas Czerner <[email protected]>
CC: Dmitry Monakhov <[email protected]>
CC: Jens Axboe <[email protected]>
CC: Jeff Moyer <[email protected]>
---
block/blk-lib.c | 69 +++++++++++++++++++++++-------------------------------
1 files changed, 29 insertions(+), 40 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 25de73e..9260cb0 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -9,17 +9,23 @@

#include "blk.h"

-static void blkdev_discard_end_io(struct bio *bio, int err)
+struct bio_batch {
+ atomic_t done;
+ unsigned long flags;
+ struct completion *wait;
+};
+
+static void bio_batch_end_io(struct bio *bio, int err)
{
+ struct bio_batch *bb = bio->bi_private;
+
if (err) {
if (err == -EOPNOTSUPP)
- set_bit(BIO_EOPNOTSUPP, &bio->bi_flags);
- clear_bit(BIO_UPTODATE, &bio->bi_flags);
+ set_bit(BIO_EOPNOTSUPP, &bb->flags);
+ clear_bit(BIO_UPTODATE, &bb->flags);
}
-
- if (bio->bi_private)
- complete(bio->bi_private);
-
+ if (atomic_dec_and_test(&bb->done))
+ complete(bb->wait);
bio_put(bio);
}

@@ -41,6 +47,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
unsigned int max_discard_sectors;
+ struct bio_batch bb;
struct bio *bio;
int ret = 0;

@@ -67,7 +74,11 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
type |= REQ_SECURE;
}

- while (nr_sects && !ret) {
+ atomic_set(&bb.done, 1);
+ bb.flags = 1 << BIO_UPTODATE;
+ bb.wait = &wait;
+
+ while (nr_sects) {
bio = bio_alloc(gfp_mask, 1);
if (!bio) {
ret = -ENOMEM;
@@ -75,9 +86,9 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
}

bio->bi_sector = sector;
- bio->bi_end_io = blkdev_discard_end_io;
+ bio->bi_end_io = bio_batch_end_io;
bio->bi_bdev = bdev;
- bio->bi_private = &wait;
+ bio->bi_private = &bb;

if (nr_sects > max_discard_sectors) {
bio->bi_size = max_discard_sectors << 9;
@@ -88,45 +99,23 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
nr_sects = 0;
}

- bio_get(bio);
+ atomic_inc(&bb.done);
submit_bio(type, bio);
+ }

+ /* Wait for bios in-flight */
+ if (!atomic_dec_and_test(&bb.done))
wait_for_completion(&wait);

- if (bio_flagged(bio, BIO_EOPNOTSUPP))
- ret = -EOPNOTSUPP;
- else if (!bio_flagged(bio, BIO_UPTODATE))
- ret = -EIO;
- bio_put(bio);
- }
+ if (test_bit(BIO_EOPNOTSUPP, &bb.flags))
+ ret = -EOPNOTSUPP;
+ else if (!test_bit(BIO_UPTODATE, &bb.flags))
+ ret = -EIO;

return ret;
}
EXPORT_SYMBOL(blkdev_issue_discard);

-struct bio_batch
-{
- atomic_t done;
- unsigned long flags;
- struct completion *wait;
-};
-
-static void bio_batch_end_io(struct bio *bio, int err)
-{
- struct bio_batch *bb = bio->bi_private;
-
- if (err) {
- if (err == -EOPNOTSUPP)
- set_bit(BIO_EOPNOTSUPP, &bb->flags);
- else
- clear_bit(BIO_UPTODATE, &bb->flags);
- }
- if (bb)
- if (atomic_dec_and_test(&bb->done))
- complete(bb->wait);
- bio_put(bio);
-}
-
/**
* blkdev_issue_zeroout - generate number of zero filed write bios
* @bdev: blockdev to issue
--
1.7.4.4


2011-05-02 14:18:30

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 2/3] blkdev: Simple cleanup in blkdev_issue_zeroout()

In blkdev_issue_zeroout() we are submitting regular WRITE bios, so we do
not need to check for -EOPNOTSUPP specifically in case of error. Also
there is no need to have label submit: because there is no way to jump
out from the while cycle without an error and we really want to exit,
rather than try again. And also remove the check for (sz == 0) since at
that point sz can never be zero.

Signed-off-by: Lukas Czerner <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>
CC: Dmitry Monakhov <[email protected]>
CC: Jens Axboe <[email protected]>
---
block/blk-lib.c | 14 --------------
1 files changed, 0 insertions(+), 14 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9260cb0..d7a98d3 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -140,7 +140,6 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
bb.flags = 1 << BIO_UPTODATE;
bb.wait = &wait;

-submit:
ret = 0;
while (nr_sects != 0) {
bio = bio_alloc(gfp_mask,
@@ -157,9 +156,6 @@ submit:

while (nr_sects != 0) {
sz = min((sector_t) PAGE_SIZE >> 9 , nr_sects);
- if (sz == 0)
- /* bio has maximum size possible */
- break;
ret = bio_add_page(bio, ZERO_PAGE(0), sz << 9, 0);
nr_sects -= ret >> 9;
sector += ret >> 9;
@@ -179,16 +175,6 @@ submit:
/* One of bios in the batch was completed with error.*/
ret = -EIO;

- if (ret)
- goto out;
-
- if (test_bit(BIO_EOPNOTSUPP, &bb.flags)) {
- ret = -EOPNOTSUPP;
- goto out;
- }
- if (nr_sects != 0)
- goto submit;
-out:
return ret;
}
EXPORT_SYMBOL(blkdev_issue_zeroout);
--
1.7.4.4

2011-05-02 14:18:45

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH 3/3] blkdev: honor discard_granularity in blkdev_issue_discard()

As Jeff Moyer pointed out we do not honor discard granularity while
submitting REQ_DISCARD bios of size smaller than max_discard_sectors.
That fact might have unwanted consequences of device ignoring the
request, or even worse if device firmware is buggy.

This commit changes the behaviour so all discard bios smaller than
max_discard_sectors are aligned properly wrt discard_granularity.

Signed-off-by: Lukas Czerner <[email protected]>
Reported-by: Jeff Moyer <[email protected]>
CC: Dmitry Monakhov <[email protected]>
CC: Jens Axboe <[email protected]>
---
block/blk-lib.c | 10 ++++------
1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index d7a98d3..4019036 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -46,7 +46,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
- unsigned int max_discard_sectors;
+ unsigned int max_discard_sectors, disc_sects;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
@@ -57,16 +57,14 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;

+ disc_sects = q->limits.discard_granularity >> 9;
/*
* Ensure that max_discard_sectors is of the proper
* granularity
*/
max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
- if (q->limits.discard_granularity) {
- unsigned int disc_sects = q->limits.discard_granularity >> 9;
-
+ if (q->limits.discard_granularity)
max_discard_sectors &= ~(disc_sects - 1);
- }

if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
@@ -95,7 +93,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
nr_sects -= max_discard_sectors;
sector += max_discard_sectors;
} else {
- bio->bi_size = nr_sects << 9;
+ bio->bi_size = ((nr_sects & ~(disc_sects - 1)) << 9);
nr_sects = 0;
}

--
1.7.4.4

2011-05-02 14:38:58

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH 3/3] blkdev: honor discard_granularity in blkdev_issue_discard()

>>>>> "Lukas" == Lukas Czerner <[email protected]> writes:

Lukas> As Jeff Moyer pointed out we do not honor discard granularity
Lukas> while submitting REQ_DISCARD bios of size smaller than
Lukas> max_discard_sectors. That fact might have unwanted consequences
Lukas> of device ignoring the request, or even worse if device firmware
Lukas> is buggy.

We've discussed this before and the consensus was not to do it. The
granularity is a hint, not a hard limit like max_discard_sectors.

We want the reporting to be comprehensive throughout the block layer. If
we start aligning to the granularity at the top we lose information for
stacked devices below with a finer granularity.

So if we were to align to the granularity we'd want to do it at the
bottom of the stack when we issue the command to the device. We've had a
few proposed patches to did that but so far we've only found one device
where it made a difference. And that case didn't justify adding a quirk.

--
Martin K. Petersen Oracle Linux Engineering

2011-05-02 16:11:06

by Jeff Moyer

[permalink] [raw]
Subject: Re: [PATCH 3/3] blkdev: honor discard_granularity in blkdev_issue_discard()

"Martin K. Petersen" <[email protected]> writes:

>>>>>> "Lukas" == Lukas Czerner <[email protected]> writes:
>
> Lukas> As Jeff Moyer pointed out we do not honor discard granularity
> Lukas> while submitting REQ_DISCARD bios of size smaller than
> Lukas> max_discard_sectors. That fact might have unwanted consequences
> Lukas> of device ignoring the request, or even worse if device firmware
> Lukas> is buggy.
>
> We've discussed this before and the consensus was not to do it. The
> granularity is a hint, not a hard limit like max_discard_sectors.
>
> We want the reporting to be comprehensive throughout the block layer. If
> we start aligning to the granularity at the top we lose information for
> stacked devices below with a finer granularity.
>
> So if we were to align to the granularity we'd want to do it at the
> bottom of the stack when we issue the command to the device. We've had a
> few proposed patches to did that but so far we've only found one device
> where it made a difference. And that case didn't justify adding a quirk.

Hm, I wonder where it makes sense to document this. Maybe
Documentation/block/queue-sysfs.txt, but I admit I wouldn't have looked
there while reviewing this code.

Cheers,
Jeff

2011-05-02 17:07:25

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 1/3] blkdev: Submit discard bio in batches in blkdev_issue_discard()

On 2011-05-02 08:17, Lukas Czerner wrote:
> Currently we are waiting for every submitted REQ_DISCARD bio separately,
> but it can have unwanted consequences of repeatedly flushing the queue,
> so we rather submit bios in batches and wait for the entire batch, hence
> narrowing the window of other ios going in.
>
> Use bio_batch_end_io() and struct bio_batch for that purpose, the same
> is used by blkdev_issue_zeroout(). Also change bio_batch_end_io() so we
> always set !BIO_UPTODATE in the case of error and remove the check for
> bb, since we are the only user of this function and we always set this.
>
> Remove bio_get()/bio_put() from the blkdev_issue_discard() since
> bio_alloc() and bio_batch_end_io() is doing the same thing, hence it is
> not needed anymore.
>
> I have done simple dd testing with surprising results. The script I have
> used is:
>
> for i in $(seq 10); do
> echo $i
> dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
> sleep 5
> done
> /usr/bin/time -f %e ./blkdiscard /dev/sdc1
>
> Running time of BLKDISCARD on the whole device:
> with patch without patch
> 0.95 15.58
>
> So we can see that in this artificial test the kernel with the patch
> applied is approx 16x faster in discarding the device.

Good results. It'd be more efficient to add the vectored discard support
and use it for this too though, and it would get rid of the need to wait
on the batches since there'd be just one discard bio for the ranges.

--
Jens Axboe

2011-05-02 17:07:46

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH 3/3] blkdev: honor discard_granularity in blkdev_issue_discard()

>>>>> "Jeff" == Jeff Moyer <[email protected]> writes:

Jeff> Hm, I wonder where it makes sense to document this. Maybe
Jeff> Documentation/block/queue-sysfs.txt, but I admit I wouldn't have
Jeff> looked there while reviewing this code.

Maybe we could simply add a comment in the kerneldoc header for
blkdev_issue_discard()?

I tried to keep Documentation/ABI/ up to date for the topology stuff but
it looks like the discard portions never got merged. Probably fell
through the cracks. I'll dig up the patch.

But as you say -- I'm sure nobody looks. My only motivation for keeping
the sysfs ABI doc in sync is to avoid gregkh's wrath...

--
Martin K. Petersen Oracle Linux Engineering

2011-05-02 23:21:40

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH 3/3] blkdev: honor discard_granularity in blkdev_issue_discard()

On Mon, May 02, 2011 at 10:38:38AM -0400, Martin K. Petersen wrote:
> >>>>> "Lukas" == Lukas Czerner <[email protected]> writes:
>
> Lukas> As Jeff Moyer pointed out we do not honor discard granularity
> Lukas> while submitting REQ_DISCARD bios of size smaller than
> Lukas> max_discard_sectors. That fact might have unwanted consequences
> Lukas> of device ignoring the request, or even worse if device firmware
> Lukas> is buggy.
>
> We've discussed this before and the consensus was not to do it. The
> granularity is a hint, not a hard limit like max_discard_sectors.
>
> We want the reporting to be comprehensive throughout the block layer. If
> we start aligning to the granularity at the top we lose information for
> stacked devices below with a finer granularity.
>
> So if we were to align to the granularity we'd want to do it at the
> bottom of the stack when we issue the command to the device. We've had a
> few proposed patches to did that but so far we've only found one device
> where it made a difference. And that case didn't justify adding a quirk.

Adding this comment to the code to explain why we don't enforce the
granularity would be a good idea, yes?

Cheers,

Dave.
--
Dave Chinner
[email protected]

2011-05-03 09:30:54

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 1/3] blkdev: Submit discard bio in batches in blkdev_issue_discard()

On Mon, 2 May 2011, Jens Axboe wrote:

> On 2011-05-02 08:17, Lukas Czerner wrote:
> > Currently we are waiting for every submitted REQ_DISCARD bio separately,
> > but it can have unwanted consequences of repeatedly flushing the queue,
> > so we rather submit bios in batches and wait for the entire batch, hence
> > narrowing the window of other ios going in.
> >
> > Use bio_batch_end_io() and struct bio_batch for that purpose, the same
> > is used by blkdev_issue_zeroout(). Also change bio_batch_end_io() so we
> > always set !BIO_UPTODATE in the case of error and remove the check for
> > bb, since we are the only user of this function and we always set this.
> >
> > Remove bio_get()/bio_put() from the blkdev_issue_discard() since
> > bio_alloc() and bio_batch_end_io() is doing the same thing, hence it is
> > not needed anymore.
> >
> > I have done simple dd testing with surprising results. The script I have
> > used is:
> >
> > for i in $(seq 10); do
> > echo $i
> > dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
> > sleep 5
> > done
> > /usr/bin/time -f %e ./blkdiscard /dev/sdc1
> >
> > Running time of BLKDISCARD on the whole device:
> > with patch without patch
> > 0.95 15.58
> >
> > So we can see that in this artificial test the kernel with the patch
> > applied is approx 16x faster in discarding the device.
>
> Good results. It'd be more efficient to add the vectored discard support
> and use it for this too though, and it would get rid of the need to wait
> on the batches since there'd be just one discard bio for the ranges.
>

I know that you and Christoph were working on adding support for
vectored discard, but I do not know what is the status of it ?

Thanks!
-Lukas

2011-05-03 09:57:29

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/3] blkdev: Submit discard bio in batches in blkdev_issue_discard()

On Tue, May 03, 2011 at 11:30:43AM +0200, Lukas Czerner wrote:
> I know that you and Christoph were working on adding support for
> vectored discard, but I do not know what is the status of it ?

I've not gotten further than the prototype yet. I plan to eventually
look into it, but I have some more urgent projects first.

2011-05-05 15:13:04

by Lukas Czerner

[permalink] [raw]
Subject: [PATCH] blkdev: Do not return -EOPNOTSUPP if discard is supported

Currently we return -EOPNOTSUPP in blkdev_issue_discard() if any of the
bio fails due to underlying device not supporting discard request.
However, if the device is for example dm device composed of devices
which some of them support discard and some of them does not, it is ok
for some bios to fail with EOPNOTSUPP, but it does not mean that discard
is not supported at all.

This commit removes the check for bios failed with EOPNOTSUPP and change
blkdev_issue_discard() to return operation not supported if and only if
the device does not actually supports it, not just part of the device as
some bios might indicate.

This change also fixes problem with BLKDISCARD ioctl() which now works
correctly on such dm devices.

Signed-off-by: Lukas Czerner <[email protected]>
CC: Jens Axboe <[email protected]>
CC: Jeff Moyer <[email protected]>
---
block/blk-lib.c | 9 ++-------
1 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index d7a98d3..78e627e 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -19,11 +19,8 @@ static void bio_batch_end_io(struct bio *bio, int err)
{
struct bio_batch *bb = bio->bi_private;

- if (err) {
- if (err == -EOPNOTSUPP)
- set_bit(BIO_EOPNOTSUPP, &bb->flags);
+ if (err && (err != -EOPNOTSUPP))
clear_bit(BIO_UPTODATE, &bb->flags);
- }
if (atomic_dec_and_test(&bb->done))
complete(bb->wait);
bio_put(bio);
@@ -107,9 +104,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
if (!atomic_dec_and_test(&bb.done))
wait_for_completion(&wait);

- if (test_bit(BIO_EOPNOTSUPP, &bb.flags))
- ret = -EOPNOTSUPP;
- else if (!test_bit(BIO_UPTODATE, &bb.flags))
+ if (!test_bit(BIO_UPTODATE, &bb.flags))
ret = -EIO;

return ret;
--
1.7.4.4

2011-05-05 15:20:51

by Lukas Czerner

[permalink] [raw]
Subject: Re: [PATCH 1/3] blkdev: Submit discard bio in batches in blkdev_issue_discard()

On Mon, 2 May 2011, Jens Axboe wrote:

> On 2011-05-02 08:17, Lukas Czerner wrote:
> > Currently we are waiting for every submitted REQ_DISCARD bio separately,
> > but it can have unwanted consequences of repeatedly flushing the queue,
> > so we rather submit bios in batches and wait for the entire batch, hence
> > narrowing the window of other ios going in.
> >
> > Use bio_batch_end_io() and struct bio_batch for that purpose, the same
> > is used by blkdev_issue_zeroout(). Also change bio_batch_end_io() so we
> > always set !BIO_UPTODATE in the case of error and remove the check for
> > bb, since we are the only user of this function and we always set this.
> >
> > Remove bio_get()/bio_put() from the blkdev_issue_discard() since
> > bio_alloc() and bio_batch_end_io() is doing the same thing, hence it is
> > not needed anymore.
> >
> > I have done simple dd testing with surprising results. The script I have
> > used is:
> >
> > for i in $(seq 10); do
> > echo $i
> > dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
> > sleep 5
> > done
> > /usr/bin/time -f %e ./blkdiscard /dev/sdc1
> >
> > Running time of BLKDISCARD on the whole device:
> > with patch without patch
> > 0.95 15.58
> >
> > So we can see that in this artificial test the kernel with the patch
> > applied is approx 16x faster in discarding the device.
>
> Good results. It'd be more efficient to add the vectored discard support
> and use it for this too though, and it would get rid of the need to wait
> on the batches since there'd be just one discard bio for the ranges.
>

Since we do not have vectored discard yet, can this change go in as it
is ? Also, Christoph thinks that it would not help anyway since the we do
batch the discard requests only when it exceeds the max range.

Thanks!
-Lukas

2011-05-07 01:23:20

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 1/3] blkdev: Submit discard bio in batches in blkdev_issue_discard()

On 2011-05-03 03:30, Lukas Czerner wrote:
> On Mon, 2 May 2011, Jens Axboe wrote:
>
>> On 2011-05-02 08:17, Lukas Czerner wrote:
>>> Currently we are waiting for every submitted REQ_DISCARD bio separately,
>>> but it can have unwanted consequences of repeatedly flushing the queue,
>>> so we rather submit bios in batches and wait for the entire batch, hence
>>> narrowing the window of other ios going in.
>>>
>>> Use bio_batch_end_io() and struct bio_batch for that purpose, the same
>>> is used by blkdev_issue_zeroout(). Also change bio_batch_end_io() so we
>>> always set !BIO_UPTODATE in the case of error and remove the check for
>>> bb, since we are the only user of this function and we always set this.
>>>
>>> Remove bio_get()/bio_put() from the blkdev_issue_discard() since
>>> bio_alloc() and bio_batch_end_io() is doing the same thing, hence it is
>>> not needed anymore.
>>>
>>> I have done simple dd testing with surprising results. The script I have
>>> used is:
>>>
>>> for i in $(seq 10); do
>>> echo $i
>>> dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
>>> sleep 5
>>> done
>>> /usr/bin/time -f %e ./blkdiscard /dev/sdc1
>>>
>>> Running time of BLKDISCARD on the whole device:
>>> with patch without patch
>>> 0.95 15.58
>>>
>>> So we can see that in this artificial test the kernel with the patch
>>> applied is approx 16x faster in discarding the device.
>>
>> Good results. It'd be more efficient to add the vectored discard support
>> and use it for this too though, and it would get rid of the need to wait
>> on the batches since there'd be just one discard bio for the ranges.
>>
>
> I know that you and Christoph were working on adding support for
> vectored discard, but I do not know what is the status of it ?

No new status. I sent a test patch to Christoph and that's about it on
that end for now. But it could be expedited, if need be.

--
Jens Axboe

2011-05-07 01:24:27

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 1/3] blkdev: Submit discard bio in batches in blkdev_issue_discard()

On 2011-05-05 09:20, Lukas Czerner wrote:
> On Mon, 2 May 2011, Jens Axboe wrote:
>
>> On 2011-05-02 08:17, Lukas Czerner wrote:
>>> Currently we are waiting for every submitted REQ_DISCARD bio separately,
>>> but it can have unwanted consequences of repeatedly flushing the queue,
>>> so we rather submit bios in batches and wait for the entire batch, hence
>>> narrowing the window of other ios going in.
>>>
>>> Use bio_batch_end_io() and struct bio_batch for that purpose, the same
>>> is used by blkdev_issue_zeroout(). Also change bio_batch_end_io() so we
>>> always set !BIO_UPTODATE in the case of error and remove the check for
>>> bb, since we are the only user of this function and we always set this.
>>>
>>> Remove bio_get()/bio_put() from the blkdev_issue_discard() since
>>> bio_alloc() and bio_batch_end_io() is doing the same thing, hence it is
>>> not needed anymore.
>>>
>>> I have done simple dd testing with surprising results. The script I have
>>> used is:
>>>
>>> for i in $(seq 10); do
>>> echo $i
>>> dd if=/dev/sdb1 of=/dev/sdc1 bs=4k &
>>> sleep 5
>>> done
>>> /usr/bin/time -f %e ./blkdiscard /dev/sdc1
>>>
>>> Running time of BLKDISCARD on the whole device:
>>> with patch without patch
>>> 0.95 15.58
>>>
>>> So we can see that in this artificial test the kernel with the patch
>>> applied is approx 16x faster in discarding the device.
>>
>> Good results. It'd be more efficient to add the vectored discard support
>> and use it for this too though, and it would get rid of the need to wait
>> on the batches since there'd be just one discard bio for the ranges.
>>
>
> Since we do not have vectored discard yet, can this change go in as it
> is ? Also, Christoph thinks that it would not help anyway since the we do
> batch the discard requests only when it exceeds the max range.

Sure, it need not be a show stopper. I will apply 1-2 for now to
for-2.6.40/core.

--
Jens Axboe

2011-05-07 01:30:45

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] blkdev: Do not return -EOPNOTSUPP if discard is supported

On 2011-05-05 09:12, Lukas Czerner wrote:
> Currently we return -EOPNOTSUPP in blkdev_issue_discard() if any of the
> bio fails due to underlying device not supporting discard request.
> However, if the device is for example dm device composed of devices
> which some of them support discard and some of them does not, it is ok
> for some bios to fail with EOPNOTSUPP, but it does not mean that discard
> is not supported at all.
>
> This commit removes the check for bios failed with EOPNOTSUPP and change
> blkdev_issue_discard() to return operation not supported if and only if
> the device does not actually supports it, not just part of the device as
> some bios might indicate.
>
> This change also fixes problem with BLKDISCARD ioctl() which now works
> correctly on such dm devices.

Applied, thanks.

--
Jens Axboe

2011-05-09 14:30:31

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 1/3] blkdev: Submit discard bio in batches in blkdev_issue_discard()

On Thu, May 05, 2011 at 05:20:40PM +0200, Lukas Czerner wrote:
> > Good results. It'd be more efficient to add the vectored discard support
> > and use it for this too though, and it would get rid of the need to wait
> > on the batches since there'd be just one discard bio for the ranges.
> >
>
> Since we do not have vectored discard yet, can this change go in as it
> is ? Also, Christoph thinks that it would not help anyway since the we do
> batch the discard requests only when it exceeds the max range.

I think this patch is orthogonal to vectored discard. The batching
is important for large ranges which generally come from BLKDISCARD or
FITRIM, while vectored discard is mostly interesting for lots of small
discards at transaction commit time.