From: "Matthew Wilcox (Oracle)" <[email protected]>
This series adds a readahead address_space operation to eventually
replace the readpages operation. The key difference is that
pages are added to the page cache as they are allocated (and
then looked up by the filesystem) instead of passing them on a
list to the readpages operation and having the filesystem add
them to the page cache. This is a net reduction in code for each
implementation, is more efficient than walking a list, and solves the
direct-write vs buffered-read problem reported by yu kuai at
https://lore.kernel.org/linux-fsdevel/[email protected]/
v5 switches to passing a readahead_control struct (mirroring the
writeback_control struct passed to ->writepages; the new interface is
sketched below). This has a number of advantages:
- It fixes a number of bugs in various implementations, e.g. forgetting
  to increment 'start', an off-by-one error in 'nr_pages', or treating
  'start' as a byte offset instead of a page offset.
- It allows us to change the arguments without changing all the
  implementations of ->readahead that just call mpage_readahead() or
  iomap_readahead().
- Figuring out which pages haven't been attempted by the implementation
  is more natural this way.
- There's less code in each implementation.
This version deletes a lot more lines than previous versions of the patch
-- we're net -97 lines instead of -17 with v4. It'll be even more when
we can finish the conversion and remove all the ->readpages support code,
including read_cache_pages().
Also new in v5 is patch 5, which adds page_cache_readahead_limit() and
converts ext4 and f2fs to use it for their Merkle trees.
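For anyone who doesn't want to dig through patch 4 ("mm: Add readahead
address space operation") for the new API, here is a rough sketch of the
interface this series adds.  The accessor bodies and the exact field
layout of struct readahead_control are paraphrased from the usage in
these patches and are not authoritative:

	/*
	 * Sketch only -- see patch 4 for the real definitions.  Field
	 * names follow the usage in the patches below; the precise
	 * layout is an assumption.
	 */
	struct readahead_control {
		struct file *file;
		struct address_space *mapping;
	/* private: use the readahead_*() accessors instead */
		pgoff_t start;			/* index of first page to read */
		unsigned int nr_pages;		/* pages remaining in this request */
		unsigned int batch_count;	/* pages handed out by the last readahead_page() */
	};

	/* In struct address_space_operations -- no page list, no return value: */
	void (*readahead)(struct readahead_control *);

	/* Hands the implementation the next page to fill; the page is
	 * already sitting in the page cache at this point. */
	struct page *readahead_page(struct readahead_control *rac);

	/* Byte offset, byte length and page count of the request: */
	static inline loff_t readahead_offset(struct readahead_control *rac)
	{
		return (loff_t)rac->start * PAGE_SIZE;
	}

	static inline loff_t readahead_length(struct readahead_control *rac)
	{
		return (loff_t)rac->nr_pages * PAGE_SIZE;
	}

	static inline unsigned int readahead_count(struct readahead_control *rac)
	{
		return rac->nr_pages;
	}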
Matthew Wilcox (Oracle) (13):
mm: Fix the return type of __do_page_cache_readahead
mm: Ignore return value of ->readpages
mm: Put readahead pages in cache earlier
mm: Add readahead address space operation
mm: Add page_cache_readahead_limit
fs: Convert mpage_readpages to mpage_readahead
btrfs: Convert from readpages to readahead
erofs: Convert uncompressed files from readpages to readahead
erofs: Convert compressed files from readpages to readahead
ext4: Convert from readpages to readahead
f2fs: Convert from readpages to readahead
fuse: Convert from readpages to readahead
iomap: Convert from readpages to readahead
Documentation/filesystems/locking.rst | 6 +-
Documentation/filesystems/vfs.rst | 13 +++
drivers/staging/exfat/exfat_super.c | 7 +-
fs/block_dev.c | 7 +-
fs/btrfs/extent_io.c | 48 +++------
fs/btrfs/extent_io.h | 3 +-
fs/btrfs/inode.c | 16 ++-
fs/erofs/data.c | 39 +++----
fs/erofs/zdata.c | 29 ++----
fs/ext2/inode.c | 10 +-
fs/ext4/ext4.h | 3 +-
fs/ext4/inode.c | 23 ++---
fs/ext4/readpage.c | 22 ++--
fs/ext4/verity.c | 35 +------
fs/f2fs/data.c | 50 ++++-----
fs/f2fs/f2fs.h | 5 +-
fs/f2fs/verity.c | 35 +------
fs/fat/inode.c | 7 +-
fs/fuse/file.c | 46 ++++-----
fs/gfs2/aops.c | 23 ++---
fs/hpfs/file.c | 7 +-
fs/iomap/buffered-io.c | 103 ++++++-------------
fs/iomap/trace.h | 2 +-
fs/isofs/inode.c | 7 +-
fs/jfs/inode.c | 7 +-
fs/mpage.c | 38 +++----
fs/nilfs2/inode.c | 15 +--
fs/ocfs2/aops.c | 34 +++---
fs/omfs/file.c | 7 +-
fs/qnx6/inode.c | 7 +-
fs/reiserfs/inode.c | 8 +-
fs/udf/inode.c | 7 +-
fs/xfs/xfs_aops.c | 13 +--
fs/zonefs/super.c | 7 +-
include/linux/fs.h | 2 +
include/linux/iomap.h | 3 +-
include/linux/mpage.h | 4 +-
include/linux/pagemap.h | 84 +++++++++++++++
include/trace/events/erofs.h | 6 +-
include/trace/events/f2fs.h | 6 +-
mm/internal.h | 2 +-
mm/migrate.c | 2 +-
mm/readahead.c | 143 ++++++++++++++++----------
43 files changed, 422 insertions(+), 519 deletions(-)
--
2.25.0
From: "Matthew Wilcox (Oracle)" <[email protected]>
ra_submit(), which is a wrapper around __do_page_cache_readahead(), already
returns an unsigned long, and the 'nr_to_read' parameter is an unsigned
long, so fix __do_page_cache_readahead() to return an unsigned long,
even though I'm pretty sure we're not going to readahead more than 2^32
pages ever.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/internal.h | 2 +-
mm/readahead.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/internal.h b/mm/internal.h
index 3cf20ab3ca01..41b93c4b3ab7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -49,7 +49,7 @@ void unmap_page_range(struct mmu_gather *tlb,
unsigned long addr, unsigned long end,
struct zap_details *details);
-extern unsigned int __do_page_cache_readahead(struct address_space *mapping,
+extern unsigned long __do_page_cache_readahead(struct address_space *mapping,
struct file *filp, pgoff_t offset, unsigned long nr_to_read,
unsigned long lookahead_size);
diff --git a/mm/readahead.c b/mm/readahead.c
index 2fe72cd29b47..6bf73ef33b7e 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -152,7 +152,7 @@ static int read_pages(struct address_space *mapping, struct file *filp,
*
* Returns the number of pages requested, or the maximum amount of I/O allowed.
*/
-unsigned int __do_page_cache_readahead(struct address_space *mapping,
+unsigned long __do_page_cache_readahead(struct address_space *mapping,
struct file *filp, pgoff_t offset, unsigned long nr_to_read,
unsigned long lookahead_size)
{
@@ -161,7 +161,7 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
unsigned long end_index; /* The last page we want to read */
LIST_HEAD(page_pool);
int page_idx;
- unsigned int nr_pages = 0;
+ unsigned long nr_pages = 0;
loff_t isize = i_size_read(inode);
gfp_t gfp_mask = readahead_gfp_mask(mapping);
--
2.25.0
From: "Matthew Wilcox (Oracle)" <[email protected]>
At allocation time, add the pages to the page cache unless the
filesystem is still using ->readpages, which expects to receive them on
a list.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/readahead.c | 66 ++++++++++++++++++++++++++++++++------------------
1 file changed, 42 insertions(+), 24 deletions(-)
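For reference, this is roughly what read_pages() looks like once the
patch is applied, reconstructed from the hunks below (context lines that
the diff elides are assumed unchanged):

	static void read_pages(struct address_space *mapping, struct file *filp,
			struct list_head *pages, pgoff_t start,
			unsigned int nr_pages)
	{
		struct blk_plug plug;

		blk_start_plug(&plug);

		if (mapping->a_ops->readpages) {
			/* Legacy path: pages are on a list, not yet in the cache */
			mapping->a_ops->readpages(filp, mapping, pages, nr_pages);
			/* Clean up the remaining pages */
			put_pages_list(pages);
		} else {
			/* New path: pages are already in the i_pages xarray */
			struct page *page;
			unsigned long index;

			xa_for_each_range(&mapping->i_pages, index, page, start,
					start + nr_pages - 1) {
				mapping->a_ops->readpage(filp, page);
				put_page(page);
			}
		}

		blk_finish_plug(&plug);
	}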
diff --git a/mm/readahead.c b/mm/readahead.c
index fc77d13af556..96c6ca68a174 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -114,10 +114,10 @@ int read_cache_pages(struct address_space *mapping, struct list_head *pages,
EXPORT_SYMBOL(read_cache_pages);
static void read_pages(struct address_space *mapping, struct file *filp,
- struct list_head *pages, unsigned int nr_pages, gfp_t gfp)
+ struct list_head *pages, pgoff_t start,
+ unsigned int nr_pages)
{
struct blk_plug plug;
- unsigned page_idx;
blk_start_plug(&plug);
@@ -125,18 +125,17 @@ static void read_pages(struct address_space *mapping, struct file *filp,
mapping->a_ops->readpages(filp, mapping, pages, nr_pages);
/* Clean up the remaining pages */
put_pages_list(pages);
- goto out;
- }
+ } else {
+ struct page *page;
+ unsigned long index;
- for (page_idx = 0; page_idx < nr_pages; page_idx++) {
- struct page *page = lru_to_page(pages);
- list_del(&page->lru);
- if (!add_to_page_cache_lru(page, mapping, page->index, gfp))
+ xa_for_each_range(&mapping->i_pages, index, page, start,
+ start + nr_pages - 1) {
mapping->a_ops->readpage(filp, page);
- put_page(page);
+ put_page(page);
+ }
}
-out:
blk_finish_plug(&plug);
}
@@ -149,17 +148,18 @@ static void read_pages(struct address_space *mapping, struct file *filp,
* Returns the number of pages requested, or the maximum amount of I/O allowed.
*/
unsigned long __do_page_cache_readahead(struct address_space *mapping,
- struct file *filp, pgoff_t offset, unsigned long nr_to_read,
+ struct file *filp, pgoff_t start, unsigned long nr_to_read,
unsigned long lookahead_size)
{
struct inode *inode = mapping->host;
- struct page *page;
unsigned long end_index; /* The last page we want to read */
LIST_HEAD(page_pool);
int page_idx;
+ pgoff_t page_offset = start;
unsigned long nr_pages = 0;
loff_t isize = i_size_read(inode);
gfp_t gfp_mask = readahead_gfp_mask(mapping);
+ bool use_list = mapping->a_ops->readpages;
if (isize == 0)
goto out;
@@ -170,7 +170,7 @@ unsigned long __do_page_cache_readahead(struct address_space *mapping,
* Preallocate as many pages as we will need.
*/
for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
- pgoff_t page_offset = offset + page_idx;
+ struct page *page;
if (page_offset > end_index)
break;
@@ -178,25 +178,43 @@ unsigned long __do_page_cache_readahead(struct address_space *mapping,
page = xa_load(&mapping->i_pages, page_offset);
if (page && !xa_is_value(page)) {
/*
- * Page already present? Kick off the current batch of
- * contiguous pages before continuing with the next
- * batch.
+ * Page already present? Kick off the current batch
+ * of contiguous pages before continuing with the
+ * next batch.
+ * It's possible this page is the page we should
+ * be marking with PageReadahead. However, we
+ * don't have a stable ref to this page so it might
+ * be reallocated to another user before we can set
+ * the bit. There's probably another page in the
+ * cache marked with PageReadahead from the other
+ * process which accessed this file.
*/
- if (nr_pages)
- read_pages(mapping, filp, &page_pool, nr_pages,
- gfp_mask);
- nr_pages = 0;
- continue;
+ goto skip;
}
page = __page_cache_alloc(gfp_mask);
if (!page)
break;
- page->index = page_offset;
- list_add(&page->lru, &page_pool);
+ if (use_list) {
+ page->index = page_offset;
+ list_add(&page->lru, &page_pool);
+ } else if (add_to_page_cache_lru(page, mapping, page_offset,
+ gfp_mask) < 0) {
+ put_page(page);
+ goto skip;
+ }
+
if (page_idx == nr_to_read - lookahead_size)
SetPageReadahead(page);
nr_pages++;
+ page_offset++;
+ continue;
+skip:
+ if (nr_pages)
+ read_pages(mapping, filp, &page_pool, start, nr_pages);
+ nr_pages = 0;
+ page_offset++;
+ start = page_offset;
}
/*
@@ -205,7 +223,7 @@ unsigned long __do_page_cache_readahead(struct address_space *mapping,
* will then handle the error.
*/
if (nr_pages)
- read_pages(mapping, filp, &page_pool, nr_pages, gfp_mask);
+ read_pages(mapping, filp, &page_pool, start, nr_pages);
BUG_ON(!list_empty(&page_pool));
out:
return nr_pages;
--
2.25.0
From: "Matthew Wilcox (Oracle)" <[email protected]>
Use the new readahead operation in iomap. Convert XFS and ZoneFS to
use it.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/iomap/buffered-io.c | 101 ++++++++++++-----------------------------
fs/iomap/trace.h | 2 +-
fs/xfs/xfs_aops.c | 13 ++----
fs/zonefs/super.c | 7 ++-
include/linux/iomap.h | 3 +-
5 files changed, 39 insertions(+), 87 deletions(-)
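As applied, the converted actor ends up looking roughly like this
(reconstructed from the hunks below; unchanged context lines are
assumed):

	static loff_t
	iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
			void *data, struct iomap *iomap, struct iomap *srcmap)
	{
		struct iomap_readpage_ctx *ctx = data;
		loff_t ret, done = 0;

		while (done < length) {
			if (!ctx->cur_page) {
				/* Pages now come from the page cache, not a list */
				ctx->cur_page = readahead_page(ctx->rac);
				ctx->cur_page_in_bio = false;
			}
			ret = iomap_readpage_actor(inode, pos + done, length - done,
					ctx, iomap, srcmap);
			if (WARN_ON(ret == 0))
				break;
			done += ret;
			if (offset_in_page(pos + done) == 0) {
				ctx->rac->nr_pages -= ctx->rac->batch_count;
				if (!ctx->cur_page_in_bio)
					unlock_page(ctx->cur_page);
				put_page(ctx->cur_page);
				ctx->cur_page = NULL;
			}
		}

		return done;
	}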
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index cb3511eb152a..e40eb45230fa 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -214,9 +214,8 @@ iomap_read_end_io(struct bio *bio)
struct iomap_readpage_ctx {
struct page *cur_page;
bool cur_page_in_bio;
- bool is_readahead;
struct bio *bio;
- struct list_head *pages;
+ struct readahead_control *rac;
};
static void
@@ -307,11 +306,11 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
if (ctx->bio)
submit_bio(ctx->bio);
- if (ctx->is_readahead) /* same as readahead_gfp_mask */
+ if (ctx->rac) /* same as readahead_gfp_mask */
gfp |= __GFP_NORETRY | __GFP_NOWARN;
ctx->bio = bio_alloc(gfp, min(BIO_MAX_PAGES, nr_vecs));
ctx->bio->bi_opf = REQ_OP_READ;
- if (ctx->is_readahead)
+ if (ctx->rac)
ctx->bio->bi_opf |= REQ_RAHEAD;
ctx->bio->bi_iter.bi_sector = sector;
bio_set_dev(ctx->bio, iomap->bdev);
@@ -367,104 +366,62 @@ iomap_readpage(struct page *page, const struct iomap_ops *ops)
}
EXPORT_SYMBOL_GPL(iomap_readpage);
-static struct page *
-iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos,
- loff_t length, loff_t *done)
-{
- while (!list_empty(pages)) {
- struct page *page = lru_to_page(pages);
-
- if (page_offset(page) >= (u64)pos + length)
- break;
-
- list_del(&page->lru);
- if (!add_to_page_cache_lru(page, inode->i_mapping, page->index,
- GFP_NOFS))
- return page;
-
- /*
- * If we already have a page in the page cache at index we are
- * done. Upper layers don't care if it is uptodate after the
- * readpages call itself as every page gets checked again once
- * actually needed.
- */
- *done += PAGE_SIZE;
- put_page(page);
- }
-
- return NULL;
-}
-
static loff_t
-iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length,
+iomap_readahead_actor(struct inode *inode, loff_t pos, loff_t length,
void *data, struct iomap *iomap, struct iomap *srcmap)
{
struct iomap_readpage_ctx *ctx = data;
- loff_t done, ret;
+ loff_t ret, done = 0;
- for (done = 0; done < length; done += ret) {
- if (ctx->cur_page && offset_in_page(pos + done) == 0) {
- if (!ctx->cur_page_in_bio)
- unlock_page(ctx->cur_page);
- put_page(ctx->cur_page);
- ctx->cur_page = NULL;
- }
+ while (done < length) {
if (!ctx->cur_page) {
- ctx->cur_page = iomap_next_page(inode, ctx->pages,
- pos, length, &done);
- if (!ctx->cur_page)
- break;
+ ctx->cur_page = readahead_page(ctx->rac);
ctx->cur_page_in_bio = false;
}
ret = iomap_readpage_actor(inode, pos + done, length - done,
ctx, iomap, srcmap);
+ if (WARN_ON(ret == 0))
+ break;
+ done += ret;
+ if (offset_in_page(pos + done) == 0) {
+ ctx->rac->nr_pages -= ctx->rac->batch_count;
+ if (!ctx->cur_page_in_bio)
+ unlock_page(ctx->cur_page);
+ put_page(ctx->cur_page);
+ ctx->cur_page = NULL;
+ }
}
return done;
}
-int
-iomap_readpages(struct address_space *mapping, struct list_head *pages,
- unsigned nr_pages, const struct iomap_ops *ops)
+void iomap_readahead(struct readahead_control *rac, const struct iomap_ops *ops)
{
+ struct inode *inode = rac->mapping->host;
struct iomap_readpage_ctx ctx = {
- .pages = pages,
- .is_readahead = true,
+ .rac = rac,
};
- loff_t pos = page_offset(list_entry(pages->prev, struct page, lru));
- loff_t last = page_offset(list_entry(pages->next, struct page, lru));
- loff_t length = last - pos + PAGE_SIZE, ret = 0;
+ loff_t pos = readahead_offset(rac);
+ loff_t length = readahead_length(rac);
- trace_iomap_readpages(mapping->host, nr_pages);
+ trace_iomap_readahead(inode, readahead_count(rac));
while (length > 0) {
- ret = iomap_apply(mapping->host, pos, length, 0, ops,
- &ctx, iomap_readpages_actor);
+ loff_t ret = iomap_apply(inode, pos, length, 0, ops,
+ &ctx, iomap_readahead_actor);
if (ret <= 0) {
WARN_ON_ONCE(ret == 0);
- goto done;
+ break;
}
pos += ret;
length -= ret;
}
- ret = 0;
-done:
+
if (ctx.bio)
submit_bio(ctx.bio);
- if (ctx.cur_page) {
- if (!ctx.cur_page_in_bio)
- unlock_page(ctx.cur_page);
- put_page(ctx.cur_page);
- }
-
- /*
- * Check that we didn't lose a page due to the arcance calling
- * conventions..
- */
- WARN_ON_ONCE(!ret && !list_empty(ctx.pages));
- return ret;
+ BUG_ON(ctx.cur_page);
}
-EXPORT_SYMBOL_GPL(iomap_readpages);
+EXPORT_SYMBOL_GPL(iomap_readahead);
/*
* iomap_is_partially_uptodate checks whether blocks within a page are
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 6dc227b8c47e..d6ba705f938a 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -39,7 +39,7 @@ DEFINE_EVENT(iomap_readpage_class, name, \
TP_PROTO(struct inode *inode, int nr_pages), \
TP_ARGS(inode, nr_pages))
DEFINE_READPAGE_EVENT(iomap_readpage);
-DEFINE_READPAGE_EVENT(iomap_readpages);
+DEFINE_READPAGE_EVENT(iomap_readahead);
DECLARE_EVENT_CLASS(iomap_page_class,
TP_PROTO(struct inode *inode, struct page *page, unsigned long off,
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 3a688eb5c5ae..0897dd71c622 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -621,14 +621,11 @@ xfs_vm_readpage(
return iomap_readpage(page, &xfs_read_iomap_ops);
}
-STATIC int
-xfs_vm_readpages(
- struct file *unused,
- struct address_space *mapping,
- struct list_head *pages,
- unsigned nr_pages)
+STATIC void
+xfs_vm_readahead(
+ struct readahead_control *rac)
{
- return iomap_readpages(mapping, pages, nr_pages, &xfs_read_iomap_ops);
+ iomap_readahead(rac, &xfs_read_iomap_ops);
}
static int
@@ -644,7 +641,7 @@ xfs_iomap_swapfile_activate(
const struct address_space_operations xfs_address_space_operations = {
.readpage = xfs_vm_readpage,
- .readpages = xfs_vm_readpages,
+ .readahead = xfs_vm_readahead,
.writepage = xfs_vm_writepage,
.writepages = xfs_vm_writepages,
.set_page_dirty = iomap_set_page_dirty,
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 8bc6ef82d693..8327a01d3bac 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -78,10 +78,9 @@ static int zonefs_readpage(struct file *unused, struct page *page)
return iomap_readpage(page, &zonefs_iomap_ops);
}
-static int zonefs_readpages(struct file *unused, struct address_space *mapping,
- struct list_head *pages, unsigned int nr_pages)
+static void zonefs_readahead(struct readahead_control *rac)
{
- return iomap_readpages(mapping, pages, nr_pages, &zonefs_iomap_ops);
+ iomap_readahead(rac, &zonefs_iomap_ops);
}
/*
@@ -128,7 +127,7 @@ static int zonefs_writepages(struct address_space *mapping,
static const struct address_space_operations zonefs_file_aops = {
.readpage = zonefs_readpage,
- .readpages = zonefs_readpages,
+ .readahead = zonefs_readahead,
.writepage = zonefs_writepage,
.writepages = zonefs_writepages,
.set_page_dirty = iomap_set_page_dirty,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8b09463dae0d..bc20bd04c2a2 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -155,8 +155,7 @@ loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length,
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops);
int iomap_readpage(struct page *page, const struct iomap_ops *ops);
-int iomap_readpages(struct address_space *mapping, struct list_head *pages,
- unsigned nr_pages, const struct iomap_ops *ops);
+void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
int iomap_set_page_dirty(struct page *page);
int iomap_is_partially_uptodate(struct page *page, unsigned long from,
unsigned long count);
--
2.25.0
On 11/02/2020 02:05, Matthew Wilcox wrote:
> even though I'm pretty sure we're not going to readahead more than 2^32
> pages ever.
And 640K is more memory than anyone will ever need on a computer *scnr*
On Tue, Feb 11, 2020 at 08:19:14AM +0000, Johannes Thumshirn wrote:
> On 11/02/2020 02:05, Matthew Wilcox wrote:
> > even though I'm pretty sure we're not going to readahead more than 2^32
> > pages ever.
>
> And 640K is more memory than anyone will ever need on a computer *scnr*
Sure, but bandwidth just isn't increasing quickly enough for this to
make sense. 2^32 pages, even on our smallest page size machines, is
16TB. Right now, we cap readahead at just 256kB. If we did try to
readahead 16TB, we'd be occupying a PCIe gen4 x4 drive (roughly 8GB/s)
for over half an hour, just satisfying this one readahead. PCIe has
historically doubled in bandwidth every three years or so, so to get
this down to something reasonable like a hundredth of a second we'd need
around eighteen more doublings -- PCIe gen22, somewhere past the middle
of the century. And I bet we still won't do it (also, I doubt PCIe
will continue doubling bandwidth every three years).
And Linus has forbidden individual IOs over 2GB anyway, so not happening
until he's forced to see the error of his ways ;-)
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
On Thu, Feb 13, 2020 at 07:19:53PM -0800, John Hubbard wrote:
> On 2/10/20 5:03 PM, Matthew Wilcox wrote:
> > @@ -161,7 +161,7 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
> > unsigned long end_index; /* The last page we want to read */
> > LIST_HEAD(page_pool);
> > int page_idx;
>
>
> What about page_idx, too? It should also have the same data type as nr_pages, as long as
> we're trying to be consistent on this point.
>
> Just want to ensure we're ready to handle those 2^33+ page readaheads... :)
Nah, this is just a type used internally to the function. Getting the
API right for the callers is the important part.
On 2/13/20 8:21 PM, Matthew Wilcox wrote:
> On Thu, Feb 13, 2020 at 07:19:53PM -0800, John Hubbard wrote:
>> On 2/10/20 5:03 PM, Matthew Wilcox wrote:
>>> @@ -161,7 +161,7 @@ unsigned int __do_page_cache_readahead(struct address_space *mapping,
>>> unsigned long end_index; /* The last page we want to read */
>>> LIST_HEAD(page_pool);
>>> int page_idx;
>>
>>
>> What about page_idx, too? It should also have the same data type as nr_pages, as long as
>> we're trying to be consistent on this point.
>>
>> Just want to ensure we're ready to handle those 2^33+ page readaheads... :)
>
> Nah, this is just a type used internally to the function. Getting the
> API right for the callers is the important part.
>
Agreed that the real point of this is to match up the API, but why not finish the job
by going all the way through? It's certainly not something we need to lose sleep over,
but it does seem like you don't want to have code like this:
for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
...with the ability, technically, to overflow page_idx due to it being an int,
while nr_to_read is an unsigned long. (And the new sanitizers and checkers are
apt to complain about it, btw.)
(Apologies if there is some kernel coding idiom that I still haven't learned, about this
sort of thing.)
thanks,
--
John Hubbard
NVIDIA
On Mon, Feb 10, 2020 at 05:03:36PM -0800, Matthew Wilcox wrote:
> From: "Matthew Wilcox (Oracle)" <[email protected]>
>
> ra_submit(), which is a wrapper around __do_page_cache_readahead(), already
> returns an unsigned long, and the 'nr_to_read' parameter is an unsigned
> long, so fix __do_page_cache_readahead() to return an unsigned long,
> even though I'm pretty sure we're not going to readahead more than 2^32
> pages ever.
I was going through this and realised it's completely pointless -- the
returned value from ra_submit() and __do_page_cache_readahead() is
eventually ignored through all paths. So I'm replacing this patch with
one that makes everything return void.
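A minimal sketch of what that replacement might look like, assuming the
prototypes simply drop the return value (the actual follow-up patch may
differ):

	/* mm/internal.h -- hypothetical shape of the follow-up, not this patch */
	void __do_page_cache_readahead(struct address_space *mapping,
			struct file *filp, pgoff_t offset, unsigned long nr_to_read,
			unsigned long lookahead_size);

	/* ra_submit(), its wrapper, would likewise return void. */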