The block layer spends quite a while in blkdev_direct_IO() to copy and
initialise bio's bvec. However, if we've already got a bvec in the input
iterator it might be reused in some cases, i.e. when new
ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
performance boost, and it also reduces memory footprint.
Suggested-by: Matthew Wilcox <[email protected]>
[BIO_WORKINGSET] Suggested-by: Johannes Weiner <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/block_dev.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index d699f3af1a09..aee5d2e4f324 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -349,6 +349,28 @@ static void blkdev_bio_end_io(struct bio *bio)
}
}
+static int bio_iov_fixed_bvec_get_pages(struct bio *bio, struct iov_iter *iter)
+{
+ bio->bi_vcnt = iter->nr_segs;
+ bio->bi_max_vecs = iter->nr_segs;
+ bio->bi_io_vec = (struct bio_vec *)iter->bvec;
+ bio->bi_iter.bi_bvec_done = iter->iov_offset;
+ bio->bi_iter.bi_size = iter->count;
+
+ /*
+ * In practice groups of pages tend to be accessed/reclaimed/refaulted
+ * together. To not go over bvec for those who didn't set BIO_WORKINGSET
+ * approximate it by looking at the first page and inducing it to the
+ * whole bio
+ */
+ if (unlikely(PageWorkingset(iter->bvec->bv_page)))
+ bio_set_flag(bio, BIO_WORKINGSET);
+ bio_set_flag(bio, BIO_NO_PAGE_REF);
+
+ iter->count = 0;
+ return 0;
+}
+
static ssize_t
__blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
{
@@ -368,6 +390,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
(bdev_logical_block_size(bdev) - 1))
return -EINVAL;
+ if (iov_iter_bvec_fixed(iter))
+ nr_vecs = 0;
bio = bio_alloc_bioset(GFP_KERNEL, nr_vecs, &blkdev_dio_pool);
dio = container_of(bio, struct blkdev_dio, bio);
@@ -398,7 +422,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
bio->bi_end_io = blkdev_bio_end_io;
bio->bi_ioprio = iocb->ki_ioprio;
- ret = bio_iov_iter_get_pages(bio, iter);
+ if (iov_iter_is_bvec(iter) && iov_iter_bvec_fixed(iter))
+ ret = bio_iov_fixed_bvec_get_pages(bio, iter);
+ else
+ ret = bio_iov_iter_get_pages(bio, iter);
+
if (unlikely(ret)) {
bio->bi_status = BLK_STS_IOERR;
bio_endio(bio);
--
2.24.0
> + /*
> + * In practice groups of pages tend to be accessed/reclaimed/refaulted
> + * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> + * approximate it by looking at the first page and inducing it to the
> + * whole bio
> + */
> + if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> + bio_set_flag(bio, BIO_WORKINGSET);
IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
at all for direct I/O.
> + bio_set_flag(bio, BIO_NO_PAGE_REF);
> +
> + iter->count = 0;
> + return 0;
> +}
This helper should go into bio.c, next to bio_iov_iter_get_pages.
And please convert the other callers of bio_iov_iter_get_pages to this
scheme as well.
On Wed, Dec 09, 2020 at 12:01:27PM +0000, Pavel Begunkov wrote:
> On 09/12/2020 08:40, Christoph Hellwig wrote:
> >> + /*
> >> + * In practice groups of pages tend to be accessed/reclaimed/refaulted
> >> + * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> >> + * approximate it by looking at the first page and inducing it to the
> >> + * whole bio
> >> + */
> >> + if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> >> + bio_set_flag(bio, BIO_WORKINGSET);
> >
> > IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> > at all for direct I/O.
>
> I was waiting for the conversation to unfold, i.e. for Johannes to
> answer. BTW, would the same (skipping BIO_WORKINGSET) stand true for
> iomap?
iomap direct I/O: yes.
On 09/12/2020 08:40, Christoph Hellwig wrote:
>> + /*
>> + * In practice groups of pages tend to be accessed/reclaimed/refaulted
>> + * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>> + * approximate it by looking at the first page and inducing it to the
>> + * whole bio
>> + */
>> + if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>> + bio_set_flag(bio, BIO_WORKINGSET);
>
> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> at all for direct I/O.
I was waiting for the conversation to unfold, i.e. for Johannes to
answer. BTW, would the same (skipping BIO_WORKINGSET) stand true for
iomap?
>
>> + bio_set_flag(bio, BIO_NO_PAGE_REF);
>> +
>> + iter->count = 0;
>> + return 0;
>> +}
>
> This helper should go into bio.c, next to bio_iov_iter_get_pages.
> And please convert the other callers of bio_iov_iter_get_pages to this
> scheme as well.
Agree. In the end I want to merge that into bio_iov_iter_get_pages().
--
Pavel Begunkov
On 09/12/2020 12:05, Christoph Hellwig wrote:
> On Wed, Dec 09, 2020 at 12:01:27PM +0000, Pavel Begunkov wrote:
>> On 09/12/2020 08:40, Christoph Hellwig wrote:
>>>> + /*
>>>> + * In practice groups of pages tend to be accessed/reclaimed/refaulted
>>>> + * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>>>> + * approximate it by looking at the first page and inducing it to the
>>>> + * whole bio
>>>> + */
>>>> + if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>>>> + bio_set_flag(bio, BIO_WORKINGSET);
>>>
>>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
>>> at all for direct I/O.
>>
>> I was waiting for the conversation to unfold, i.e. for Johannes to
>> answer. BTW, would the same (skipping BIO_WORKINGSET) stand true for
>> iomap?
>
> iomap direct I/O: yes.
That one, got it
--
Pavel Begunkov
From: Pavel Begunkov
> Sent: 09 December 2020 02:20
>
> The block layer spends quite a while in blkdev_direct_IO() to copy and
> initialise bio's bvec. However, if we've already got a bvec in the input
> iterator it might be reused in some cases, i.e. when new
> ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
> performance boost, and it also reduces memory footprint.
...
> @@ -398,7 +422,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
> bio->bi_end_io = blkdev_bio_end_io;
> bio->bi_ioprio = iocb->ki_ioprio;
>
> - ret = bio_iov_iter_get_pages(bio, iter);
> + if (iov_iter_is_bvec(iter) && iov_iter_bvec_fixed(iter))
> + ret = bio_iov_fixed_bvec_get_pages(bio, iter);
> + else
> + ret = bio_iov_iter_get_pages(bio, iter);
> +
Is it necessary to check iov_iter_is_bvec() as well as iov_iter_bvec_fixed() ?
If so it is probably worth using & not && so the compiler stands
a chance of generating a & (B | C) == B instead of 2 conditionals.
(I think I saw the bits in the same field being tested.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On 11/12/2020 14:06, Johannes Weiner wrote:
> On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
>>> + /*
>>> + * In practice groups of pages tend to be accessed/reclaimed/refaulted
>>> + * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>>> + * approximate it by looking at the first page and inducing it to the
>>> + * whole bio
>>> + */
>>> + if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>>> + bio_set_flag(bio, BIO_WORKINGSET);
>>
>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
>> at all for direct I/O.
>
> Yes, this hunk is incorrect. We must not use this flag for direct IO.
> It's only for paging IO, when you bring in the data at page->mapping +
> page->index. Otherwise you tell the pressure accounting code that you
> are paging in a thrashing page, when really you're just reading new
> data into a page frame that happens to be hot.
>
> (As per the other thread, bio_add_page() currently makes that same
> mistake for direct IO. I'm fixing that.)
I have that stuff fixed, it just didn't go into the RFC. That's basically
removing replacing add_page() with its version without BIO_WORKINGSET
in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
some.
--
Pavel Begunkov
On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
> > + /*
> > + * In practice groups of pages tend to be accessed/reclaimed/refaulted
> > + * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> > + * approximate it by looking at the first page and inducing it to the
> > + * whole bio
> > + */
> > + if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> > + bio_set_flag(bio, BIO_WORKINGSET);
>
> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> at all for direct I/O.
Yes, this hunk is incorrect. We must not use this flag for direct IO.
It's only for paging IO, when you bring in the data at page->mapping +
page->index. Otherwise you tell the pressure accounting code that you
are paging in a thrashing page, when really you're just reading new
data into a page frame that happens to be hot.
(As per the other thread, bio_add_page() currently makes that same
mistake for direct IO. I'm fixing that.)
On Fri, Dec 11, 2020 at 02:20:11PM +0000, Pavel Begunkov wrote:
> On 11/12/2020 14:06, Johannes Weiner wrote:
> > On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
> >>> + /*
> >>> + * In practice groups of pages tend to be accessed/reclaimed/refaulted
> >>> + * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> >>> + * approximate it by looking at the first page and inducing it to the
> >>> + * whole bio
> >>> + */
> >>> + if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> >>> + bio_set_flag(bio, BIO_WORKINGSET);
> >>
> >> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> >> at all for direct I/O.
> >
> > Yes, this hunk is incorrect. We must not use this flag for direct IO.
> > It's only for paging IO, when you bring in the data at page->mapping +
> > page->index. Otherwise you tell the pressure accounting code that you
> > are paging in a thrashing page, when really you're just reading new
> > data into a page frame that happens to be hot.
> >
> > (As per the other thread, bio_add_page() currently makes that same
> > mistake for direct IO. I'm fixing that.)
>
> I have that stuff fixed, it just didn't go into the RFC. That's basically
> removing replacing add_page() with its version without BIO_WORKINGSET
> in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
> fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
> some.
Ah, that's fantastic! Thanks for clarifying.
On Fri, Dec 11, 2020 at 03:47:23PM +0000, Pavel Begunkov wrote:
> On 11/12/2020 15:38, Johannes Weiner wrote:
> > On Fri, Dec 11, 2020 at 02:20:11PM +0000, Pavel Begunkov wrote:
> >> On 11/12/2020 14:06, Johannes Weiner wrote:
> >>> On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
> >>>>> + /*
> >>>>> + * In practice groups of pages tend to be accessed/reclaimed/refaulted
> >>>>> + * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> >>>>> + * approximate it by looking at the first page and inducing it to the
> >>>>> + * whole bio
> >>>>> + */
> >>>>> + if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> >>>>> + bio_set_flag(bio, BIO_WORKINGSET);
> >>>>
> >>>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> >>>> at all for direct I/O.
> >>>
> >>> Yes, this hunk is incorrect. We must not use this flag for direct IO.
> >>> It's only for paging IO, when you bring in the data at page->mapping +
> >>> page->index. Otherwise you tell the pressure accounting code that you
> >>> are paging in a thrashing page, when really you're just reading new
> >>> data into a page frame that happens to be hot.
> >>>
> >>> (As per the other thread, bio_add_page() currently makes that same
> >>> mistake for direct IO. I'm fixing that.)
> >>
> >> I have that stuff fixed, it just didn't go into the RFC. That's basically
> >> removing replacing add_page() with its version without BIO_WORKINGSET
>
> I wrote something strange... Should have been "replacing add_page() in
> those functions with a version without BIO_WORKINGSET".
No worries, I understood.
> >> in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
> >> fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
> >> some.
> >
> > Ah, that's fantastic! Thanks for clarifying.
>
> To keep it clear, do we go with what I have stashed (I'm planning to
> reiterate this weekend)? or you're going to write it up yourself?
> Just in case there is some cooler way you have in mind :)
Honestly, I only wrote all my ideas down and asked for feedback
because I wasn't super excited about any of them ;-)
If your changes happen to separate the direct io path from the
buffered io path naturally, I'm okay with it.
I'd say let's go with what you already have and see whether Jens and
Christoph like it. We can always do follow-on cleanups.
On 11/12/2020 15:38, Johannes Weiner wrote:
> On Fri, Dec 11, 2020 at 02:20:11PM +0000, Pavel Begunkov wrote:
>> On 11/12/2020 14:06, Johannes Weiner wrote:
>>> On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
>>>>> + /*
>>>>> + * In practice groups of pages tend to be accessed/reclaimed/refaulted
>>>>> + * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>>>>> + * approximate it by looking at the first page and inducing it to the
>>>>> + * whole bio
>>>>> + */
>>>>> + if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>>>>> + bio_set_flag(bio, BIO_WORKINGSET);
>>>>
>>>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
>>>> at all for direct I/O.
>>>
>>> Yes, this hunk is incorrect. We must not use this flag for direct IO.
>>> It's only for paging IO, when you bring in the data at page->mapping +
>>> page->index. Otherwise you tell the pressure accounting code that you
>>> are paging in a thrashing page, when really you're just reading new
>>> data into a page frame that happens to be hot.
>>>
>>> (As per the other thread, bio_add_page() currently makes that same
>>> mistake for direct IO. I'm fixing that.)
>>
>> I have that stuff fixed, it just didn't go into the RFC. That's basically
>> removing replacing add_page() with its version without BIO_WORKINGSET
I wrote something strange... Should have been "replacing add_page() in
those functions with a version without BIO_WORKINGSET".
>> in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
>> fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
>> some.
>
> Ah, that's fantastic! Thanks for clarifying.
To keep it clear, do we go with what I have stashed (I'm planning to
reiterate this weekend)? or you're going to write it up yourself?
Just in case there is some cooler way you have in mind :)
--
Pavel Begunkov