In several places, btrfs allocates an array of pages one page at a time.
Besides duplicating code, these open-coded loops forgo a helper the mm
subsystem provides for allocating multiple pages at once into an array,
which suits our use case. In the fast path, the batching can result in better allocation
decisions and less locking. This changeset first adjusts the users to
call a common array-of-pages allocation function, then adjusts that
common function to use the batch page allocator.
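To illustrate the shape of the change, a call site that previously looked
roughly like:

	for (i = 0; i < nr_pages; i++) {
		pages[i] = alloc_page(GFP_NOFS);
		if (!pages[i])
			goto cleanup;	/* free the pages allocated so far */
	}

ends up looking roughly like:

	ret = btrfs_alloc_page_array(nr_pages, pages);
	if (ret)
		goto cleanup;	/* free any non-NULL entries in the array */

(Illustrative sketch only; the 'cleanup' labels and the exact error
handling vary from site to site, see the individual patches.)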
v3:
- addressed the remaining style comments from kdave's v1 review
v2:
- moved new helper to extent_io.[ch]; fixed title format
- https://lore.kernel.org/linux-btrfs/[email protected]/
v1:
- https://lore.kernel.org/linux-btrfs/[email protected]/
Sweet Tea Dorminy (2):
btrfs: factor out allocating an array of pages
btrfs: allocate page arrays using bulk page allocator
fs/btrfs/check-integrity.c | 8 ++---
fs/btrfs/compression.c | 39 +++++++++------------
fs/btrfs/extent_io.c | 71 +++++++++++++++++++++++++++++---------
fs/btrfs/extent_io.h | 2 ++
fs/btrfs/inode.c | 10 +++---
fs/btrfs/raid56.c | 30 +++-------------
6 files changed, 85 insertions(+), 75 deletions(-)
--
2.35.1
While calling alloc_page() in a loop is an effective way to populate an
array of pages, the kernel provides a method to allocate pages in bulk.
alloc_pages_bulk_array() populates the NULL slots in a page array, trying to
grab more than one page at a time.
Unfortunately, it doesn't guarantee allocating all slots in the array,
but it's easy to call it in a loop and return an error if no progress
occurs. Similar code can be found in fs/xfs/xfs_buf.c:xfs_buf_alloc_pages().
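For reference, that xfs loop looks roughly like this (paraphrased and
simplified, not verbatim):

	long filled = 0;

	for (;;) {
		long last = filled;

		filled = alloc_pages_bulk_array(gfp_mask, bp->b_page_count,
						bp->b_pages);
		if (filled == bp->b_page_count)
			break;

		/* No progress this iteration: back off before retrying. */
		if (filled == last)
			memalloc_retry_wait(gfp_mask);
	}

(xfs also gives up early for readahead buffers; that detail is omitted
here.) Unlike xfs, this patch keeps btrfs's existing fail-fast behavior:
if an iteration makes no progress, return -ENOMEM instead of waiting and
retrying.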
Signed-off-by: Sweet Tea Dorminy <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
---
Changes in v3:
- Added a newline after variable declaration
Changes in v2:
- Moved from ctree.c to extent_io.c
---
fs/btrfs/extent_io.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ab4c1c4d1b59..b268e47aa2b7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3144,19 +3144,25 @@ static void end_bio_extent_readpage(struct bio *bio)
*/
int btrfs_alloc_page_array(unsigned long nr_pages, struct page **page_array)
{
- int i;
+ long allocated = 0;
+
+ for (;;) {
+ long last = allocated;

- for (i = 0; i < nr_pages; i++) {
- struct page *page;
+ allocated = alloc_pages_bulk_array(GFP_NOFS, nr_pages,
+ page_array);
+ if (allocated == nr_pages)
+ return 0;

- if (page_array[i])
+ if (allocated != last)
continue;
- page = alloc_page(GFP_NOFS);
- if (!page)
- return -ENOMEM;
- page_array[i] = page;
+ /*
+ * During this iteration, no page could be allocated, even
+ * though alloc_pages_bulk_array() falls back to alloc_page()
+ * if it could not bulk-allocate. So we must be out of memory.
+ */
+ return -ENOMEM;
}
- return 0;
}
/*
--
2.35.1
Several functions currently populate an array of page pointers one
allocated page at a time; factor out the common code so as to allow
improvements to all of the sites at once.
Signed-off-by: Sweet Tea Dorminy <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
---
Changes in v3:
- expanded 'int r' to 'int ret', reflected renaming throughout
- made sure there was space after variable declarations
- fixed up kerneldoc to be more btrfs's style
Changes in v2:
- moved the new btrfs_alloc_page_array() function to extent_io.[ch]
---
fs/btrfs/check-integrity.c | 8 ++---
fs/btrfs/compression.c | 39 ++++++++++-------------
fs/btrfs/extent_io.c | 65 ++++++++++++++++++++++++++++----------
fs/btrfs/extent_io.h | 2 ++
fs/btrfs/inode.c | 10 +++---
fs/btrfs/raid56.c | 30 +++---------------
6 files changed, 79 insertions(+), 75 deletions(-)
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 7e9f90fa0388..366d5a80f3c5 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1553,11 +1553,9 @@ static int btrfsic_read_block(struct btrfsic_state *state,
return -ENOMEM;
block_ctx->datav = block_ctx->mem_to_free;
block_ctx->pagev = (struct page **)(block_ctx->datav + num_pages);
- for (i = 0; i < num_pages; i++) {
- block_ctx->pagev[i] = alloc_page(GFP_NOFS);
- if (!block_ctx->pagev[i])
- return -1;
- }
+ ret = btrfs_alloc_page_array(num_pages, block_ctx->pagev);
+ if (ret)
+ return ret;
dev_bytenr = block_ctx->dev_bytenr;
for (i = 0; i < num_pages;) {
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index be476f094300..3772195222e9 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -801,8 +801,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
struct extent_map_tree *em_tree;
struct compressed_bio *cb;
unsigned int compressed_len;
- unsigned int nr_pages;
- unsigned int pg_index;
struct bio *comp_bio = NULL;
const u64 disk_bytenr = bio->bi_iter.bi_sector << SECTOR_SHIFT;
u64 cur_disk_byte = disk_bytenr;
@@ -812,7 +810,8 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
u64 em_start;
struct extent_map *em;
blk_status_t ret;
- int faili = 0;
+ int ret2;
+ int i;
u8 *sums;
em_tree = &BTRFS_I(inode)->extent_tree;
@@ -855,25 +854,20 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
cb->compress_type = extent_compress_type(bio_flags);
cb->orig_bio = bio;
- nr_pages = DIV_ROUND_UP(compressed_len, PAGE_SIZE);
- cb->compressed_pages = kcalloc(nr_pages, sizeof(struct page *),
- GFP_NOFS);
+ cb->nr_pages = DIV_ROUND_UP(compressed_len, PAGE_SIZE);
+ cb->compressed_pages = kcalloc(cb->nr_pages, sizeof(struct page *),
+ GFP_NOFS);
if (!cb->compressed_pages) {
ret = BLK_STS_RESOURCE;
- goto fail1;
+ goto fail;
}
- for (pg_index = 0; pg_index < nr_pages; pg_index++) {
- cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS);
- if (!cb->compressed_pages[pg_index]) {
- faili = pg_index - 1;
- ret = BLK_STS_RESOURCE;
- goto fail2;
- }
+ ret2 = btrfs_alloc_page_array(cb->nr_pages, cb->compressed_pages);
+ if (ret2) {
+ ret = BLK_STS_RESOURCE;
+ goto fail;
}
- faili = nr_pages - 1;
- cb->nr_pages = nr_pages;
-
+
add_ra_bio_pages(inode, em_start + em_len, cb);
/* include any pages we added in add_ra-bio_pages */
@@ -949,14 +943,15 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
}
return BLK_STS_OK;
-fail2:
- while (faili >= 0) {
- __free_page(cb->compressed_pages[faili]);
- faili--;
+fail:
+ if (cb->compressed_pages) {
+ for (i = 0; i < cb->nr_pages; i++) {
+ if (cb->compressed_pages[i])
+ __free_page(cb->compressed_pages[i]);
+ }
}
kfree(cb->compressed_pages);
-fail1:
kfree(cb);
out:
free_extent_map(em);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 53b59944013f..ab4c1c4d1b59 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3132,6 +3132,33 @@ static void end_bio_extent_readpage(struct bio *bio)
bio_put(bio);
}
+/**
+ * Populate every slot in a provided array with pages.
+ *
+ * @nr_pages: the number of pages to request
+ * @page_array: the array to fill with pages; any existing non-null entries in
+ * the array will be skipped
+ *
+ * Return: 0 if all pages were able to be allocated; -ENOMEM otherwise, and the
+ * caller is responsible for freeing all non-null page pointers in the array.
+ */
+int btrfs_alloc_page_array(unsigned long nr_pages, struct page **page_array)
+{
+ int i;
+
+ for (i = 0; i < nr_pages; i++) {
+ struct page *page;
+
+ if (page_array[i])
+ continue;
+ page = alloc_page(GFP_NOFS);
+ if (!page)
+ return -ENOMEM;
+ page_array[i] = page;
+ }
+ return 0;
+}
+
/*
* Initialize the members up to but not including 'bio'. Use after allocating a
* new bio by bio_alloc_bioset as it does not initialize the bytes outside of
@@ -5898,9 +5925,9 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
{
int i;
- struct page *p;
struct extent_buffer *new;
int num_pages = num_extent_pages(src);
+ int ret;
new = __alloc_extent_buffer(src->fs_info, src->start, src->len);
if (new == NULL)
@@ -5913,22 +5940,23 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
*/
set_bit(EXTENT_BUFFER_UNMAPPED, &new->bflags);
+ memset(new->pages, 0, sizeof(*new->pages) * num_pages);
+ ret = btrfs_alloc_page_array(num_pages, new->pages);
+ if (ret) {
+ btrfs_release_extent_buffer(new);
+ return NULL;
+ }
+
for (i = 0; i < num_pages; i++) {
int ret;
+ struct page *p = new->pages[i];

- p = alloc_page(GFP_NOFS);
- if (!p) {
- btrfs_release_extent_buffer(new);
- return NULL;
- }
ret = attach_extent_buffer_page(new, p, NULL);
if (ret < 0) {
- put_page(p);
btrfs_release_extent_buffer(new);
return NULL;
}
WARN_ON(PageDirty(p));
- new->pages[i] = p;
copy_page(page_address(p), page_address(src->pages[i]));
}
set_extent_buffer_uptodate(new);
@@ -5942,31 +5970,36 @@ struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info,
struct extent_buffer *eb;
int num_pages;
int i;
+ int ret;
eb = __alloc_extent_buffer(fs_info, start, len);
if (!eb)
return NULL;
num_pages = num_extent_pages(eb);
+ ret = btrfs_alloc_page_array(num_pages, eb->pages);
+ if (ret)
+ goto err;
+
for (i = 0; i < num_pages; i++) {
- int ret;
+ struct page *p = eb->pages[i];

- eb->pages[i] = alloc_page(GFP_NOFS);
- if (!eb->pages[i])
- goto err;
- ret = attach_extent_buffer_page(eb, eb->pages[i], NULL);
+ ret = attach_extent_buffer_page(eb, p, NULL);
if (ret < 0)
goto err;
}
+
set_extent_buffer_uptodate(eb);
btrfs_set_header_nritems(eb, 0);
set_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
return eb;
err:
- for (; i > 0; i--) {
- detach_extent_buffer_page(eb, eb->pages[i - 1]);
- __free_page(eb->pages[i - 1]);
+ for (i = 0; i < num_pages; i++) {
+ if (eb->pages[i]) {
+ detach_extent_buffer_page(eb, eb->pages[i]);
+ __free_page(eb->pages[i]);
+ }
}
__free_extent_buffer(eb);
return NULL;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 151e9da5da2d..464269e28941 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -277,6 +277,8 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
struct page *locked_page,
u32 bits_to_clear, unsigned long page_ops);
+
+int btrfs_alloc_page_array(unsigned long nr_pages, struct page **page_array);
struct bio *btrfs_bio_alloc(unsigned int nr_iovecs);
struct bio *btrfs_bio_clone(struct bio *bio);
struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c7b15634fe70..121858652a09 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10427,13 +10427,11 @@ static ssize_t btrfs_encoded_read_regular(struct kiocb *iocb,
pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
if (!pages)
return -ENOMEM;
- for (i = 0; i < nr_pages; i++) {
- pages[i] = alloc_page(GFP_NOFS);
- if (!pages[i]) {
- ret = -ENOMEM;
- goto out;
+ ret = btrfs_alloc_page_array(nr_pages, pages);
+ if (ret) {
+ ret = -ENOMEM;
+ goto out;
}
- }
ret = btrfs_encoded_read_regular_fill_pages(inode, start, disk_bytenr,
disk_io_size, pages);
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0e239a4c3b26..ea7a9152b1cc 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1026,37 +1026,15 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
/* allocate pages for all the stripes in the bio, including parity */
static int alloc_rbio_pages(struct btrfs_raid_bio *rbio)
{
- int i;
- struct page *page;
-
- for (i = 0; i < rbio->nr_pages; i++) {
- if (rbio->stripe_pages[i])
- continue;
- page = alloc_page(GFP_NOFS);
- if (!page)
- return -ENOMEM;
- rbio->stripe_pages[i] = page;
- }
- return 0;
+ return btrfs_alloc_page_array(rbio->nr_pages, rbio->stripe_pages);
}
/* only allocate pages for p/q stripes */
static int alloc_rbio_parity_pages(struct btrfs_raid_bio *rbio)
{
- int i;
- struct page *page;
-
- i = rbio_stripe_page_index(rbio, rbio->nr_data, 0);
-
- for (; i < rbio->nr_pages; i++) {
- if (rbio->stripe_pages[i])
- continue;
- page = alloc_page(GFP_NOFS);
- if (!page)
- return -ENOMEM;
- rbio->stripe_pages[i] = page;
- }
- return 0;
+ int data_pages = rbio_stripe_page_index(rbio, rbio->nr_data, 0);
+
+ return btrfs_alloc_page_array(rbio->nr_pages - data_pages,
+ rbio->stripe_pages + data_pages);
}
/*
--
2.35.1
On Wed, Mar 30, 2022 at 04:11:22PM -0400, Sweet Tea Dorminy wrote:
> +/**
> + * Populate every slot in a provided array with pages.
> + *
> + * @nr_pages: the number of pages to request
> + * @page_array: the array to fill with pages; any existing non-null entries in
> + * the array will be skipped
> + *
> + * Return: 0 if all pages were able to be allocated; -ENOMEM otherwise, and the
> + * caller is responsible for freeing all non-null page pointers in the array.
> + */
> +int btrfs_alloc_page_array(unsigned long nr_pages, struct page **page_array)
I've switched nr_pages to 'unsigned int'; that's what all callers use,
and I don't see a reason for long.
> +{
> + int i;
> +
> + for (i = 0; i < nr_pages; i++) {
> + struct page *page;
> +
> + if (page_array[i])
> + continue;
> + page = alloc_page(GFP_NOFS);
> + if (!page)
> + return -ENOMEM;
> + page_array[i] = page;
> + }
> + return 0;
> +}
On 3/31/22 13:35, David Sterba wrote:
> On Wed, Mar 30, 2022 at 04:11:23PM -0400, Sweet Tea Dorminy wrote:
>> While calling alloc_page() in a loop is an effective way to populate an
>> array of pages, the kernel provides a method to allocate pages in bulk.
>> alloc_pages_bulk_array() populates the NULL slots in a page array, trying to
>> grab more than one page at a time.
>>
>> Unfortunately, it doesn't guarantee allocating all slots in the array,
>> but it's easy to call it in a loop and return an error if no progress
>> occurs. Similar code can be found in xfs/xfs_buf.c:xfs_buf_alloc_pages().
>>
>> Signed-off-by: Sweet Tea Dorminy <[email protected]>
>> Reviewed-by: Nikolay Borisov <[email protected]>
>> ---
>> Changes in v3:
>> - Added a newline after variable declaration
>> Changes in v2:
>> - Moved from ctree.c to extent_io.c
>> ---
>> fs/btrfs/extent_io.c | 24 +++++++++++++++---------
>> 1 file changed, 15 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index ab4c1c4d1b59..b268e47aa2b7 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -3144,19 +3144,25 @@ static void end_bio_extent_readpage(struct bio *bio)
>> */
>> int btrfs_alloc_page_array(unsigned long nr_pages, struct page **page_array)
>> {
>> - int i;
>> + long allocated = 0;
>> +
>> + for (;;) {
>> + long last = allocated;
>>
>> - for (i = 0; i < nr_pages; i++) {
>> - struct page *page;
>> + allocated = alloc_pages_bulk_array(GFP_NOFS, nr_pages,
>> + page_array);
>> + if (allocated == nr_pages)
>> + return 0;
>>
>> - if (page_array[i])
>> + if (allocated != last)
>> continue;
>> - page = alloc_page(GFP_NOFS);
>> - if (!page)
>> - return -ENOMEM;
>> - page_array[i] = page;
>> + /*
>> + * During this iteration, no page could be allocated, even
>> + * though alloc_pages_bulk_array() falls back to alloc_page()
>> + * if it could not bulk-allocate. So we must be out of memory.
>> + */
>> + return -ENOMEM;
>> }
>
> I find the way the loop is structured a bit cumbersome, so I'd suggest
> rewriting it as:
>
> int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array)
> {
> unsigned int allocated;
>
> for (allocated = 0; allocated < nr_pages;) {
> unsigned int last = allocated;
>
> allocated = alloc_pages_bulk_array(GFP_NOFS, nr_pages, page_array);
>
> /*
> * During this iteration, no page could be allocated, even
> * though alloc_pages_bulk_array() falls back to alloc_page()
> * if it could not bulk-allocate. So we must be out of memory.
> */
> if (allocated == last)
> return -ENOMEM;
> }
> return 0;
> }
Sounds good, I'll amend it that way.
>
> Also in the xfs code there's memalloc_retry_wait() which is supposed to be
> called when repeated memory allocation is retried. What was the reason
> you removed it?
Trying to keep the behavior as close as possible to the existing code.
The current behavior of each alloc_page loop is to fail if alloc_page()
fails; in the worst case, alloc_pages_bulk_array() calls alloc_page()
after trying to get a batch, so I figured the worst case is still
basically a loop calling alloc_page() and failing if it ever fails.
Reading up on it, though, arguably the memalloc_retry_wait() should
already be in all the callsites, so maybe I should insert a patch in the
middle that just adds the memalloc_retry_wait() into
btrfs_alloc_page_array()? Since it's an orthogonal fixup to either the
refactoring or the conversion to alloc_pages_bulk_array()?
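Something like this untested sketch is what I have in mind (it needs
#include <linux/sched/mm.h> for memalloc_retry_wait(), and whether to
throttle on every short return or only on repeated failure is up for
discussion):

	int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array)
	{
		unsigned int allocated;

		for (allocated = 0; allocated < nr_pages;) {
			unsigned int last = allocated;

			allocated = alloc_pages_bulk_array(GFP_NOFS, nr_pages,
							   page_array);

			/* Keep the existing behavior: no progress means -ENOMEM. */
			if (allocated == last)
				return -ENOMEM;

			/* Give reclaim a chance before retrying a short allocation. */
			if (allocated < nr_pages)
				memalloc_retry_wait(GFP_NOFS);
		}

		return 0;
	}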
Thanks!
Sweet Tea
On Thu, Mar 31, 2022 at 02:19:07PM -0400, Sweet Tea Dorminy wrote:
> > Also in the xfs code there's memalloc_retry_wait() which is supposed to be
> > called when repeated memory allocation is retried. What was the reason
> > you removed it?
>
> Trying to keep the behavior as close as possible to the existing code.
I see, makes sense.
> The current behavior of each alloc_page loop is to fail if alloc_page()
> fails; in the worst case, alloc_pages_bulk_array() calls alloc_page()
> after trying to get a batch, so I figured the worst case is still
> basically a loop calling alloc_page() and failing if it ever fails.
>
> Reading up on it, though, arguably the memalloc_retry_wait() should
> already be in all the callsites, so maybe I should insert a patch in the
> middle that just adds the memalloc_retry_wait() into
> btrfs_alloc_page_array()? Since it's an orthogonal fixup to either the
> refactoring or the conversion to alloc_pages_bulk_array()?
Yeah, a separate patch with the reasoning about the potential effects is
better. The v3 is now in misc-next with the suggested loop refactoring,
so please send the memalloc_retry_wait() update on top of that. Thanks.