2023-03-07 14:38:46

by Christoph Hellwig

Subject: return an ERR_PTR from __filemap_get_folio v3

Hi all,

__filemap_get_folio and its wrappers can return NULL for three different
conditions, which in some cases requires the caller to reverse engineer
the decision making. This is fixed by returning an ERR_PTR instead of
NULL and thus transporting the reason for the failure. But to make
that work we first need to ensure that no xa_value special case is
returned and thus remove the FGP_ENTRY flag. It turns out that flag
is barely used and can usually be dealt with in a better way.
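
A minimal sketch of what the end result looks like for callers (the
snippet is illustrative rather than taken from one of the patches; the
error values are the ones established by the final patch):

	struct folio *folio;

	folio = filemap_get_folio(mapping, index);
	if (IS_ERR(folio))
		return PTR_ERR(folio);	/* -ENOENT: nothing in the cache */
	/* ... use the referenced folio ... */
	folio_put(folio);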

Note that the shmem patches in here are non-trivial and I would appreciate
a careful review from Hugh.

Changes since v2:
- rebased to v6.3-rc which includes a couple of new callsites

Changes since v1:
- drop the patches to check for errors in btrfs and gfs2
- document the new calling conventions for the wrappers around
__filemap_get_folio
- rebased against the iomap changes in latest linux-next

Diffstat
fs/afs/dir.c | 10 +++----
fs/afs/dir_edit.c | 2 -
fs/afs/write.c | 4 +-
fs/ext4/inode.c | 2 -
fs/ext4/move_extent.c | 8 ++---
fs/hugetlbfs/inode.c | 2 -
fs/iomap/buffered-io.c | 11 +-------
fs/netfs/buffered_read.c | 4 +-
fs/nfs/file.c | 4 +-
fs/nilfs2/page.c | 6 ++--
include/linux/pagemap.h | 15 +++++------
include/linux/shmem_fs.h | 1
mm/filemap.c | 27 ++++++++-----------
mm/folio-compat.c | 4 +-
mm/huge_memory.c | 5 +--
mm/hugetlb.c | 6 ++--
mm/memcontrol.c | 2 -
mm/mincore.c | 2 -
mm/shmem.c | 64 +++++++++++++++++++----------------------------
mm/swap_state.c | 17 ++++++------
mm/swapfile.c | 4 +-
mm/truncate.c | 15 +++++------
22 files changed, 99 insertions(+), 116 deletions(-)


2023-03-07 14:38:49

by Christoph Hellwig

Subject: [PATCH 1/7] mm: don't look at xarray value entries in split_huge_pages_in_file

split_huge_pages_in_file never wants to do anything with the special
value entries. Switch to using filemap_get_folio to not even see them.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/huge_memory.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4fc43859e59a31..62843afeb7946d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3090,11 +3090,10 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
mapping = candidate->f_mapping;

for (index = off_start; index < off_end; index += nr_pages) {
- struct folio *folio = __filemap_get_folio(mapping, index,
- FGP_ENTRY, 0);
+ struct folio *folio = filemap_get_folio(mapping, index);

nr_pages = 1;
- if (xa_is_value(folio) || !folio)
+ if (!folio)
continue;

if (!folio_test_large(folio))
--
2.39.1


2023-03-07 14:38:59

by Christoph Hellwig

Subject: [PATCH 2/7] mm: make mapping_get_entry available outside of filemap.c

mapping_get_entry is useful for page cache API users that need to know
about xa_value internals. Rename it and make it available in pagemap.h.
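
To illustrate (a hypothetical caller fragment, not part of this patch):
users of the now-exported helper have to deal with value entries
themselves:

	void *entry = filemap_get_entry(mapping, index);

	if (!entry) {
		/* nothing in the page cache at this index */
	} else if (xa_is_value(entry)) {
		/* shadow, swap or DAX entry */
	} else {
		struct folio *folio = entry;

		/* a real folio, returned with a reference held */
		folio_put(folio);
	}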

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 1 +
mm/filemap.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 0acb8e1fb7afdc..5d9b51d3854220 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -507,6 +507,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
#define FGP_ENTRY 0x00000080
#define FGP_STABLE 0x00000100

+void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
int fgp_flags, gfp_t gfp);
struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
diff --git a/mm/filemap.c b/mm/filemap.c
index 2723104cc06a12..a674108a4d524b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1836,7 +1836,7 @@ EXPORT_SYMBOL(page_cache_prev_miss);
*/

/*
- * mapping_get_entry - Get a page cache entry.
+ * filemap_get_entry - Get a page cache entry.
* @mapping: the address_space to search
* @index: The page cache index.
*
@@ -1847,7 +1847,7 @@ EXPORT_SYMBOL(page_cache_prev_miss);
*
* Return: The folio, swap or shadow entry, %NULL if nothing is found.
*/
-static void *mapping_get_entry(struct address_space *mapping, pgoff_t index)
+void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
{
XA_STATE(xas, &mapping->i_pages, index);
struct folio *folio;
@@ -1917,7 +1917,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
struct folio *folio;

repeat:
- folio = mapping_get_entry(mapping, index);
+ folio = filemap_get_entry(mapping, index);
if (xa_is_value(folio)) {
if (fgp_flags & FGP_ENTRY)
return folio;
--
2.39.1


2023-03-07 14:39:02

by Christoph Hellwig

Subject: [PATCH 3/7] mm: use filemap_get_entry in filemap_get_incore_folio

filemap_get_incore_folio wants to look at the details of xa_is_value
entries, but doesn't need any of the other logic in filemap_get_folio.
Switch it to use the lower-level filemap_get_entry interface.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/swap_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 7a003d8abb37bc..92234f4b51d29a 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -386,7 +386,7 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping,
{
swp_entry_t swp;
struct swap_info_struct *si;
- struct folio *folio = __filemap_get_folio(mapping, index, FGP_ENTRY, 0);
+ struct folio *folio = filemap_get_entry(mapping, index);

if (!xa_is_value(folio))
goto out;
--
2.39.1


2023-03-07 14:39:13

by Christoph Hellwig

Subject: [PATCH 4/7] shmem: remove shmem_get_partial_folio

Add a new SGP_FIND mode for shmem_get_folio that works like
SGP_READ, but does not check i_size. Use that instead of open coding
the page cache lookup in shmem_get_partial_folio. Note that this is
a behavior change in that it reads in swap cache entries for offsets
outside i_size, possibly causing a little bit of extra work.
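
The caller-visible pattern then becomes (a sketch mirroring the hunks
below):

	struct folio *folio = NULL;

	/*
	 * Returns the folio covering this offset locked and with a
	 * reference held, reading it back from swap if necessary, and
	 * without failing for offsets beyond i_size.
	 */
	shmem_get_folio(inode, lstart >> PAGE_SHIFT, &folio, SGP_FIND);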

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/shmem_fs.h | 1 +
mm/shmem.c | 46 ++++++++++++----------------------------
2 files changed, 15 insertions(+), 32 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 103d1000a5a2e2..b87df1b137d3e5 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -105,6 +105,7 @@ enum sgp_type {
SGP_CACHE, /* don't exceed i_size, may allocate page */
SGP_WRITE, /* may exceed i_size, may allocate !Uptodate page */
SGP_FALLOC, /* like SGP_WRITE, but make existing page Uptodate */
+ SGP_FIND, /* like SGP_READ, but also read outside i_size */
};

int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
diff --git a/mm/shmem.c b/mm/shmem.c
index 448f393d8ab2b1..3705437c5757ba 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -877,27 +877,6 @@ void shmem_unlock_mapping(struct address_space *mapping)
}
}

-static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)
-{
- struct folio *folio;
-
- /*
- * At first avoid shmem_get_folio(,,,SGP_READ): that fails
- * beyond i_size, and reports fallocated pages as holes.
- */
- folio = __filemap_get_folio(inode->i_mapping, index,
- FGP_ENTRY | FGP_LOCK, 0);
- if (!xa_is_value(folio))
- return folio;
- /*
- * But read a page back from swap if any of it is within i_size
- * (although in some cases this is just a waste of time).
- */
- folio = NULL;
- shmem_get_folio(inode, index, &folio, SGP_READ);
- return folio;
-}
-
/*
* Remove range of pages and swap entries from page cache, and free them.
* If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -957,7 +936,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
goto whole_folios;

same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
- folio = shmem_get_partial_folio(inode, lstart >> PAGE_SHIFT);
+ folio = NULL;
+ shmem_get_folio(inode, lstart >> PAGE_SHIFT, &folio, SGP_FIND);
if (folio) {
same_folio = lend < folio_pos(folio) + folio_size(folio);
folio_mark_dirty(folio);
@@ -971,14 +951,16 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
folio = NULL;
}

- if (!same_folio)
- folio = shmem_get_partial_folio(inode, lend >> PAGE_SHIFT);
- if (folio) {
- folio_mark_dirty(folio);
- if (!truncate_inode_partial_folio(folio, lstart, lend))
- end = folio->index;
- folio_unlock(folio);
- folio_put(folio);
+ if (!same_folio) {
+ folio = NULL;
+ shmem_get_folio(inode, lend >> PAGE_SHIFT, &folio, SGP_FIND);
+ if (folio) {
+ folio_mark_dirty(folio);
+ if (!truncate_inode_partial_folio(folio, lstart, lend))
+ end = folio->index;
+ folio_unlock(folio);
+ folio_put(folio);
+ }
}

whole_folios:
@@ -1900,7 +1882,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio_test_uptodate(folio))
goto out;
/* fallocated folio */
- if (sgp != SGP_READ)
+ if (sgp != SGP_READ && sgp != SGP_FIND)
goto clear;
folio_unlock(folio);
folio_put(folio);
@@ -1911,7 +1893,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
* SGP_NOALLOC: fail on hole, with NULL folio, letting caller fail.
*/
*foliop = NULL;
- if (sgp == SGP_READ)
+ if (sgp == SGP_READ || sgp == SGP_FIND)
return 0;
if (sgp == SGP_NOALLOC)
return -ENOENT;
--
2.39.1


2023-03-07 14:39:15

by Christoph Hellwig

Subject: [PATCH 7/7] mm: return an ERR_PTR from __filemap_get_folio

Instead of returning NULL for all errors, distinguish between:

- no entry found and not asked to allocate (-ENOENT)
- failed to allocate memory (-ENOMEM)
- would block (-EAGAIN)

so that callers don't have to guess the error based on the passed
in flags.

Also pass the error through the direct callers:
filemap_get_folio, filemap_lock_folio, filemap_grab_folio,
and filemap_get_incore_folio.
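
A sketch of what this lets a caller do (hypothetical code, not one of
the hunks below):

	folio = __filemap_get_folio(mapping, index, fgp_flags, gfp);
	if (IS_ERR(folio)) {
		switch (PTR_ERR(folio)) {
		case -EAGAIN:	/* FGP_NOWAIT set and we would have blocked */
			break;
		case -ENOMEM:	/* FGP_CREAT set, but the allocation failed */
			break;
		default:	/* -ENOENT: no entry found, FGP_CREAT not set */
			break;
		}
		return PTR_ERR(folio);
	}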

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]> [nilfs2]
---
fs/afs/dir.c | 10 +++++-----
fs/afs/dir_edit.c | 2 +-
fs/afs/write.c | 4 ++--
fs/ext4/inode.c | 2 +-
fs/ext4/move_extent.c | 8 ++++----
fs/hugetlbfs/inode.c | 2 +-
fs/iomap/buffered-io.c | 11 ++---------
fs/netfs/buffered_read.c | 4 ++--
fs/nfs/file.c | 4 ++--
fs/nilfs2/page.c | 6 +++---
include/linux/pagemap.h | 11 ++++++-----
mm/filemap.c | 14 ++++++++------
mm/folio-compat.c | 2 +-
mm/huge_memory.c | 2 +-
mm/hugetlb.c | 6 ++++--
mm/memcontrol.c | 2 +-
mm/mincore.c | 2 +-
mm/shmem.c | 4 ++--
mm/swap_state.c | 15 ++++++++-------
mm/swapfile.c | 4 ++--
mm/truncate.c | 15 ++++++++-------
21 files changed, 65 insertions(+), 65 deletions(-)

diff --git a/fs/afs/dir.c b/fs/afs/dir.c
index 82690d1dd49a02..f92b9e62d567b9 100644
--- a/fs/afs/dir.c
+++ b/fs/afs/dir.c
@@ -319,16 +319,16 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key)
struct folio *folio;

folio = filemap_get_folio(mapping, i);
- if (!folio) {
+ if (IS_ERR(folio)) {
if (test_and_clear_bit(AFS_VNODE_DIR_VALID, &dvnode->flags))
afs_stat_v(dvnode, n_inval);
-
- ret = -ENOMEM;
folio = __filemap_get_folio(mapping,
i, FGP_LOCK | FGP_CREAT,
mapping->gfp_mask);
- if (!folio)
+ if (IS_ERR(folio)) {
+ ret = PTR_ERR(folio);
goto error;
+ }
folio_attach_private(folio, (void *)1);
folio_unlock(folio);
}
@@ -524,7 +524,7 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx,
*/
folio = __filemap_get_folio(dir->i_mapping, ctx->pos / PAGE_SIZE,
FGP_ACCESSED, 0);
- if (!folio) {
+ if (IS_ERR(folio)) {
ret = afs_bad(dvnode, afs_file_error_dir_missing_page);
break;
}
diff --git a/fs/afs/dir_edit.c b/fs/afs/dir_edit.c
index 0ab7752d1b758e..f0eddccbdd9541 100644
--- a/fs/afs/dir_edit.c
+++ b/fs/afs/dir_edit.c
@@ -115,7 +115,7 @@ static struct folio *afs_dir_get_folio(struct afs_vnode *vnode, pgoff_t index)
folio = __filemap_get_folio(mapping, index,
FGP_LOCK | FGP_ACCESSED | FGP_CREAT,
mapping->gfp_mask);
- if (!folio)
+ if (IS_ERR(folio))
clear_bit(AFS_VNODE_DIR_VALID, &vnode->flags);
else if (folio && !folio_test_private(folio))
folio_attach_private(folio, (void *)1);
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 571f3b9a417e5f..c822d6006033a7 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -232,7 +232,7 @@ static void afs_kill_pages(struct address_space *mapping,
_debug("kill %lx (to %lx)", index, last);

folio = filemap_get_folio(mapping, index);
- if (!folio) {
+ if (IS_ERR(folio)) {
next = index + 1;
continue;
}
@@ -270,7 +270,7 @@ static void afs_redirty_pages(struct writeback_control *wbc,
_debug("redirty %llx @%llx", len, start);

folio = filemap_get_folio(mapping, index);
- if (!folio) {
+ if (IS_ERR(folio)) {
next = index + 1;
continue;
}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d251d705c2763c..1ad3e369e5a450 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5390,7 +5390,7 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
while (1) {
struct folio *folio = filemap_lock_folio(inode->i_mapping,
inode->i_size >> PAGE_SHIFT);
- if (!folio)
+ if (IS_ERR(folio))
return;
ret = __ext4_journalled_invalidate_folio(folio, offset,
folio_size(folio) - offset);
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 2de9829aed63bf..7bf6d069199cbb 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -141,18 +141,18 @@ mext_folio_double_lock(struct inode *inode1, struct inode *inode2,
flags = memalloc_nofs_save();
folio[0] = __filemap_get_folio(mapping[0], index1, fgp_flags,
mapping_gfp_mask(mapping[0]));
- if (!folio[0]) {
+ if (IS_ERR(folio[0])) {
memalloc_nofs_restore(flags);
- return -ENOMEM;
+ return PTR_ERR(folio[0]);
}

folio[1] = __filemap_get_folio(mapping[1], index2, fgp_flags,
mapping_gfp_mask(mapping[1]));
memalloc_nofs_restore(flags);
- if (!folio[1]) {
+ if (IS_ERR(folio[1])) {
folio_unlock(folio[0]);
folio_put(folio[0]);
- return -ENOMEM;
+ return PTR_ERR(folio[1]);
}
/*
* __filemap_get_folio() may not wait on folio's writeback if
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9062da6da56753..702d79639c0dff 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -697,7 +697,7 @@ static void hugetlbfs_zero_partial_page(struct hstate *h,
struct folio *folio;

folio = filemap_lock_folio(mapping, idx);
- if (!folio)
+ if (IS_ERR(folio))
return;

start = start & ~huge_page_mask(h);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6f4c97a6d7e9dc..96bb56c203f49d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -468,19 +468,12 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
{
unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
- struct folio *folio;

if (iter->flags & IOMAP_NOWAIT)
fgp |= FGP_NOWAIT;

- folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
+ return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
fgp, mapping_gfp_mask(iter->inode->i_mapping));
- if (folio)
- return folio;
-
- if (iter->flags & IOMAP_NOWAIT)
- return ERR_PTR(-EAGAIN);
- return ERR_PTR(-ENOMEM);
}
EXPORT_SYMBOL_GPL(iomap_get_folio);

@@ -911,7 +904,7 @@ static int iomap_write_delalloc_scan(struct inode *inode,
/* grab locked page */
folio = filemap_lock_folio(inode->i_mapping,
start_byte >> PAGE_SHIFT);
- if (!folio) {
+ if (IS_ERR(folio)) {
start_byte = ALIGN_DOWN(start_byte, PAGE_SIZE) +
PAGE_SIZE;
continue;
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 7679a68e819307..209726a9cfdb9c 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -350,8 +350,8 @@ int netfs_write_begin(struct netfs_inode *ctx,
retry:
folio = __filemap_get_folio(mapping, index, fgp_flags,
mapping_gfp_mask(mapping));
- if (!folio)
- return -ENOMEM;
+ if (IS_ERR(folio))
+ return PTR_ERR(folio);

if (ctx->ops->check_write_begin) {
/* Allow the netfs (eg. ceph) to flush conflicts. */
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 893625eacab9fa..1d03406e6c039a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -336,8 +336,8 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,

start:
folio = nfs_folio_grab_cache_write_begin(mapping, pos >> PAGE_SHIFT);
- if (!folio)
- return -ENOMEM;
+ if (IS_ERR(folio))
+ return PTR_ERR(folio);
*pagep = &folio->page;

ret = nfs_flush_incompatible(file, folio);
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 41ccd43cd9797f..5cf30827f244c4 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -259,10 +259,10 @@ int nilfs_copy_dirty_pages(struct address_space *dmap,
NILFS_PAGE_BUG(&folio->page, "inconsistent dirty state");

dfolio = filemap_grab_folio(dmap, folio->index);
- if (unlikely(!dfolio)) {
+ if (unlikely(IS_ERR(dfolio))) {
/* No empty page is added to the page cache */
- err = -ENOMEM;
folio_unlock(folio);
+ err = PTR_ERR(dfolio);
break;
}
if (unlikely(!folio_buffers(folio)))
@@ -311,7 +311,7 @@ void nilfs_copy_back_pages(struct address_space *dmap,

folio_lock(folio);
dfolio = filemap_lock_folio(dmap, index);
- if (dfolio) {
+ if (!IS_ERR(dfolio)) {
/* overwrite existing folio in the destination cache */
WARN_ON(folio_test_dirty(dfolio));
nilfs_copy_page(&dfolio->page, &folio->page, 0);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index bb60e0209b7e3b..0135d60d3b4c13 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -520,7 +520,8 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
* Looks up the page cache entry at @mapping & @index. If a folio is
* present, it is returned with an increased refcount.
*
- * Otherwise, %NULL is returned.
+ * Return: A folio or ERR_PTR(-ENOENT) if there is no folio in the cache for
+ * this index. Will not return a shadow, swap or DAX entry.
*/
static inline struct folio *filemap_get_folio(struct address_space *mapping,
pgoff_t index)
@@ -537,8 +538,8 @@ static inline struct folio *filemap_get_folio(struct address_space *mapping,
* present, it is returned locked with an increased refcount.
*
* Context: May sleep.
- * Return: A folio or %NULL if there is no folio in the cache for this
- * index. Will not return a shadow, swap or DAX entry.
+ * Return: A folio or ERR_PTR(-ENOENT) if there is no folio in the cache for
+ * this index. Will not return a shadow, swap or DAX entry.
*/
static inline struct folio *filemap_lock_folio(struct address_space *mapping,
pgoff_t index)
@@ -555,8 +556,8 @@ static inline struct folio *filemap_lock_folio(struct address_space *mapping,
* a new folio is created. The folio is locked, marked as accessed, and
* returned.
*
- * Return: A found or created folio. NULL if no folio is found and failed to
- * create a folio.
+ * Return: A found or created folio. ERR_PTR(-ENOMEM) if no folio is found
+ * and failed to create a folio.
*/
static inline struct folio *filemap_grab_folio(struct address_space *mapping,
pgoff_t index)
diff --git a/mm/filemap.c b/mm/filemap.c
index ac161b50f5bc17..a34abfe8c65430 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1907,7 +1907,7 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
*
* If there is a page cache page, it is returned with an increased refcount.
*
- * Return: The found folio or %NULL otherwise.
+ * Return: The found folio or an ERR_PTR() otherwise.
*/
struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
int fgp_flags, gfp_t gfp)
@@ -1925,7 +1925,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
if (fgp_flags & FGP_NOWAIT) {
if (!folio_trylock(folio)) {
folio_put(folio);
- return NULL;
+ return ERR_PTR(-EAGAIN);
}
} else {
folio_lock(folio);
@@ -1964,7 +1964,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,

folio = filemap_alloc_folio(gfp, 0);
if (!folio)
- return NULL;
+ return ERR_PTR(-ENOMEM);

if (WARN_ON_ONCE(!(fgp_flags & (FGP_LOCK | FGP_FOR_MMAP))))
fgp_flags |= FGP_LOCK;
@@ -1989,6 +1989,8 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
folio_unlock(folio);
}

+ if (!folio)
+ return ERR_PTR(-ENOENT);
return folio;
}
EXPORT_SYMBOL(__filemap_get_folio);
@@ -3258,7 +3260,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
* Do we have something in the page cache already?
*/
folio = filemap_get_folio(mapping, index);
- if (likely(folio)) {
+ if (likely(!IS_ERR(folio))) {
/*
* We found the page, so try async readahead before waiting for
* the lock.
@@ -3287,7 +3289,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
folio = __filemap_get_folio(mapping, index,
FGP_CREAT|FGP_FOR_MMAP,
vmf->gfp_mask);
- if (!folio) {
+ if (IS_ERR(folio)) {
if (fpin)
goto out_retry;
filemap_invalidate_unlock_shared(mapping);
@@ -3638,7 +3640,7 @@ static struct folio *do_read_cache_folio(struct address_space *mapping,
filler = mapping->a_ops->read_folio;
repeat:
folio = filemap_get_folio(mapping, index);
- if (!folio) {
+ if (IS_ERR(folio)) {
folio = filemap_alloc_folio(gfp, 0);
if (!folio)
return ERR_PTR(-ENOMEM);
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 1754daa85d35c2..2511c055a35ff6 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -97,7 +97,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
struct folio *folio;

folio = __filemap_get_folio(mapping, index, fgp_flags, gfp);
- if (!folio)
+ if (IS_ERR(folio))
return NULL;
return folio_file_page(folio, index);
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 62843afeb7946d..c57303db6993fb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3093,7 +3093,7 @@ static int split_huge_pages_in_file(const char *file_path, pgoff_t off_start,
struct folio *folio = filemap_get_folio(mapping, index);

nr_pages = 1;
- if (!folio)
+ if (IS_ERR(folio))
continue;

if (!folio_test_large(folio))
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07abcb6eb20304..712e32b382950e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5780,7 +5780,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
*/
new_folio = false;
folio = filemap_lock_folio(mapping, idx);
- if (!folio) {
+ if (IS_ERR(folio)) {
size = i_size_read(mapping->host) >> huge_page_shift(h);
if (idx >= size)
goto out;
@@ -6071,6 +6071,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
vma_end_reservation(h, vma, haddr);

pagecache_folio = filemap_lock_folio(mapping, idx);
+ if (IS_ERR(pagecache_folio))
+ pagecache_folio = NULL;
}

ptl = huge_pte_lock(h, mm, ptep);
@@ -6182,7 +6184,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
if (is_continue) {
ret = -EFAULT;
folio = filemap_lock_folio(mapping, idx);
- if (!folio)
+ if (IS_ERR(folio))
goto out;
folio_in_pagecache = true;
} else if (!*pagep) {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5abffe6f8389e2..2e46ddf802300b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5705,7 +5705,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
/* shmem/tmpfs may report page out on swap: account for that too. */
index = linear_page_index(vma, addr);
folio = filemap_get_incore_folio(vma->vm_file->f_mapping, index);
- if (!folio)
+ if (IS_ERR(folio))
return NULL;
return folio_file_page(folio, index);
}
diff --git a/mm/mincore.c b/mm/mincore.c
index cd69b9db008126..5437e584b208bf 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -61,7 +61,7 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t index)
* tmpfs's .fault). So swapped out tmpfs mappings are tested here.
*/
folio = filemap_get_incore_folio(mapping, index);
- if (folio) {
+ if (!IS_ERR(folio)) {
present = folio_test_uptodate(folio);
folio_put(folio);
}
diff --git a/mm/shmem.c b/mm/shmem.c
index 714ff3fb02a938..d064f0bfd2ada8 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -603,7 +603,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,

index = (inode->i_size & HPAGE_PMD_MASK) >> PAGE_SHIFT;
folio = filemap_get_folio(inode->i_mapping, index);
- if (!folio)
+ if (IS_ERR(folio))
goto drop;

/* No huge page at the end of the file: nothing to split */
@@ -3188,7 +3188,7 @@ static const char *shmem_get_link(struct dentry *dentry,

if (!dentry) {
folio = filemap_get_folio(inode->i_mapping, 0);
- if (!folio)
+ if (IS_ERR(folio))
return ERR_PTR(-ECHILD);
if (PageHWPoison(folio_page(folio, 0)) ||
!folio_test_uptodate(folio)) {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 92234f4b51d29a..c7160070b9daa9 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -336,7 +336,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
struct folio *folio;

folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
- if (folio) {
+ if (!IS_ERR(folio)) {
bool vma_ra = swap_use_vma_readahead();
bool readahead;

@@ -366,6 +366,8 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
if (!vma || !vma_ra)
atomic_inc(&swapin_readahead_hits);
}
+ } else {
+ folio = NULL;
}

return folio;
@@ -389,22 +391,21 @@ struct folio *filemap_get_incore_folio(struct address_space *mapping,
struct folio *folio = filemap_get_entry(mapping, index);

if (!xa_is_value(folio))
- goto out;
+ return folio;
if (!shmem_mapping(mapping))
- return NULL;
+ return ERR_PTR(-ENOENT);

swp = radix_to_swp_entry(folio);
/* There might be swapin error entries in shmem mapping. */
if (non_swap_entry(swp))
- return NULL;
+ return ERR_PTR(-ENOENT);
/* Prevent swapoff from happening to us */
si = get_swap_device(swp);
if (!si)
- return NULL;
+ return ERR_PTR(-ENOENT);
index = swp_offset(swp);
folio = filemap_get_folio(swap_address_space(swp), index);
put_swap_device(si);
-out:
return folio;
}

@@ -431,7 +432,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
folio = filemap_get_folio(swap_address_space(entry),
swp_offset(entry));
put_swap_device(si);
- if (folio)
+ if (!IS_ERR(folio))
return folio_file_page(folio, swp_offset(entry));

/*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 62ba2bf577d7e7..f7c74779391219 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -136,7 +136,7 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
int ret = 0;

folio = filemap_get_folio(swap_address_space(entry), offset);
- if (!folio)
+ if (IS_ERR(folio))
return 0;
/*
* When this function is called from scan_swap_map_slots() and it's
@@ -2095,7 +2095,7 @@ static int try_to_unuse(unsigned int type)

entry = swp_entry(type, i);
folio = filemap_get_folio(swap_address_space(entry), i);
- if (!folio)
+ if (IS_ERR(folio))
continue;

/*
diff --git a/mm/truncate.c b/mm/truncate.c
index 7b4ea4c4a46b20..86de31ed4d3238 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -375,7 +375,7 @@ void truncate_inode_pages_range(struct address_space *mapping,

same_folio = (lstart >> PAGE_SHIFT) == (lend >> PAGE_SHIFT);
folio = __filemap_get_folio(mapping, lstart >> PAGE_SHIFT, FGP_LOCK, 0);
- if (folio) {
+ if (!IS_ERR(folio)) {
same_folio = lend < folio_pos(folio) + folio_size(folio);
if (!truncate_inode_partial_folio(folio, lstart, lend)) {
start = folio->index + folio_nr_pages(folio);
@@ -387,14 +387,15 @@ void truncate_inode_pages_range(struct address_space *mapping,
folio = NULL;
}

- if (!same_folio)
+ if (!same_folio) {
folio = __filemap_get_folio(mapping, lend >> PAGE_SHIFT,
FGP_LOCK, 0);
- if (folio) {
- if (!truncate_inode_partial_folio(folio, lstart, lend))
- end = folio->index;
- folio_unlock(folio);
- folio_put(folio);
+ if (!IS_ERR(folio)) {
+ if (!truncate_inode_partial_folio(folio, lstart, lend))
+ end = folio->index;
+ folio_unlock(folio);
+ folio_put(folio);
+ }
}

index = start;
--
2.39.1


2023-03-07 14:40:55

by Christoph Hellwig

Subject: [PATCH 5/7] shmem: open code the page cache lookup in shmem_get_folio_gfp

Use the very low level filemap_get_entry helper to look up the
entry in the xarray, and then:

- don't bother locking the folio if only doing a userfault notification
- open code locking the page and checking for truncation in a related
code block

This will allow us to eventually remove the FGP_ENTRY flag.

Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/shmem.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 3705437c5757ba..714ff3fb02a938 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1856,12 +1856,10 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
sbinfo = SHMEM_SB(inode->i_sb);
charge_mm = vma ? vma->vm_mm : NULL;

- folio = __filemap_get_folio(mapping, index, FGP_ENTRY | FGP_LOCK, 0);
+ folio = filemap_get_entry(mapping, index);
if (folio && vma && userfaultfd_minor(vma)) {
- if (!xa_is_value(folio)) {
- folio_unlock(folio);
+ if (!xa_is_value(folio))
folio_put(folio);
- }
*fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
return 0;
}
@@ -1877,6 +1875,14 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
}

if (folio) {
+ folio_lock(folio);
+
+ /* Has the page been truncated? */
+ if (unlikely(folio->mapping != mapping)) {
+ folio_unlock(folio);
+ folio_put(folio);
+ goto repeat;
+ }
if (sgp == SGP_WRITE)
folio_mark_accessed(folio);
if (folio_test_uptodate(folio))
--
2.39.1


2023-03-07 14:40:59

by Christoph Hellwig

Subject: [PATCH 6/7] mm: remove FGP_ENTRY

FGP_ENTRY is unused now, so remove it.

Signed-off-by: Christoph Hellwig <[email protected]>
---
include/linux/pagemap.h | 3 +--
mm/filemap.c | 7 +------
mm/folio-compat.c | 4 ++--
3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 5d9b51d3854220..bb60e0209b7e3b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -504,8 +504,7 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
#define FGP_NOFS 0x00000010
#define FGP_NOWAIT 0x00000020
#define FGP_FOR_MMAP 0x00000040
-#define FGP_ENTRY 0x00000080
-#define FGP_STABLE 0x00000100
+#define FGP_STABLE 0x00000080

void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
diff --git a/mm/filemap.c b/mm/filemap.c
index a674108a4d524b..ac161b50f5bc17 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1891,8 +1891,6 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
*
* * %FGP_ACCESSED - The folio will be marked accessed.
* * %FGP_LOCK - The folio is returned locked.
- * * %FGP_ENTRY - If there is a shadow / swap / DAX entry, return it
- * instead of allocating a new folio to replace it.
* * %FGP_CREAT - If no page is present then a new page is allocated using
* @gfp and added to the page cache and the VM's LRU list.
* The page is returned locked and with an increased refcount.
@@ -1918,11 +1916,8 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,

repeat:
folio = filemap_get_entry(mapping, index);
- if (xa_is_value(folio)) {
- if (fgp_flags & FGP_ENTRY)
- return folio;
+ if (xa_is_value(folio))
folio = NULL;
- }
if (!folio)
goto no_page;

diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index cabcd1de9ecbb2..1754daa85d35c2 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -97,8 +97,8 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
struct folio *folio;

folio = __filemap_get_folio(mapping, index, fgp_flags, gfp);
- if (!folio || xa_is_value(folio))
- return &folio->page;
+ if (!folio)
+ return NULL;
return folio_file_page(folio, index);
}
EXPORT_SYMBOL(pagecache_get_page);
--
2.39.1


2023-03-07 15:27:55

by Andreas Gruenbacher

Subject: Re: [Cluster-devel] return an ERR_PTR from __filemap_get_folio v3

On Tue, Mar 7, 2023 at 4:07 PM Christoph Hellwig <[email protected]> wrote:
> __filemap_get_folio and its wrappers can return NULL for three different
> conditions, which in some cases requires the caller to reverse engineer
> the decision making. This is fixed by returning an ERR_PTR instead of
> NULL and thus transporting the reason for the failure. But to make
> that work we first need to ensure that no xa_value special case is
> returned and thus remove the FGP_ENTRY flag. It turns out that flag
> is barely used and can usually be dealt with in a better way.

Thanks for working on this cleanup; looking good at first glance.

Andreas


2023-03-20 05:19:42

by Hugh Dickins

Subject: Re: [PATCH 4/7] shmem: remove shmem_get_partial_folio

On Tue, 7 Mar 2023, Christoph Hellwig wrote:

> Add a new SGP_FIND mode for shmem_get_folio that works like
> SGP_READ, but does not check i_size. Use that instead of open coding
> the page cache lookup in shmem_get_partial_folio. Note that this is
> a behavior change in that it reads in swap cache entries for offsets
> outside i_size, possibly causing a little bit of extra work.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> include/linux/shmem_fs.h | 1 +
> mm/shmem.c | 46 ++++++++++++----------------------------
> 2 files changed, 15 insertions(+), 32 deletions(-)

I thought this was fine at first, and of course it's good for all the
usual cases; but not for shmem_get_partial_folio()'s awkward cases.

Two issues with it.

One, as you highlight above, the possibility of reading more swap
unnecessarily. I do not mind if partial truncation entails reading
a little unnecessary swap; but I don't like the common case of
truncation to 0 to entail that; even less eviction; even less
unmounting, when eviction of all risks reading lots of swap.
The old code behaved well at i_size 0, the new code not so much.

Two, truncating a large folio is expected to trim that folio down
to the smaller size (if splitting allows); but SGP_FIND was coded
too much like SGP_READ, in reporting fallocated (!uptodate) folios
as NULL, unlike before. Then the following loop of shmem_undo_range()
removed that whole folio - removing pages "promised" to the file by
the earlier fallocate. Not as seriously bad as deleting data would be,
and not very likely to be noticed, but still not right.

Replacing shmem_get_partial_folio() by SGP_FIND was a good direction
to try, but it hasn't worked out. I tried to get SGPs to work right
for it before, when shmem_get_partial_page() was introduced; but I
did not manage to do so. I think we have to go back to how this was.

Andrew, please replace Christoph's "shmem: remove shmem_get_partial_folio"
in mm-unstable by this patch below, which achieves the same object
(eliminating FGP_ENTRY) while keeping closer to the old mechanism.

[PATCH] shmem: shmem_get_partial_folio use filemap_get_entry

To avoid use of the FGP_ENTRY flag, adapt shmem_get_partial_folio() to
use filemap_get_entry() and folio_lock() instead of __filemap_get_folio().
Update "page" in the comments there to "folio".

Signed-off-by: Hugh Dickins <[email protected]>
---

mm/shmem.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -886,14 +886,21 @@ static struct folio *shmem_get_partial_folio(struct inode *inode, pgoff_t index)

/*
* At first avoid shmem_get_folio(,,,SGP_READ): that fails
- * beyond i_size, and reports fallocated pages as holes.
+ * beyond i_size, and reports fallocated folios as holes.
*/
- folio = __filemap_get_folio(inode->i_mapping, index,
- FGP_ENTRY | FGP_LOCK, 0);
- if (!xa_is_value(folio))
+ folio = filemap_get_entry(inode->i_mapping, index);
+ if (!folio)
return folio;
+ if (!xa_is_value(folio)) {
+ folio_lock(folio);
+ if (folio->mapping == inode->i_mapping)
+ return folio;
+ /* The folio has been swapped out */
+ folio_unlock(folio);
+ folio_put(folio);
+ }
/*
- * But read a page back from swap if any of it is within i_size
+ * But read a folio back from swap if any of it is within i_size
* (although in some cases this is just a waste of time).
*/
folio = NULL;

2023-03-20 05:23:47

by Hugh Dickins

Subject: Re: [PATCH 5/7] shmem: open code the page cache lookup in shmem_get_folio_gfp

On Tue, 7 Mar 2023, Christoph Hellwig wrote:

> Use the very low level filemap_get_entry helper to look up the
> entry in the xarray, and then:
>
> - don't bother locking the folio if only doing a userfault notification
> - open code locking the page and checking for truncation in a related
> code block
>
> This will allow us to eventually remove the FGP_ENTRY flag.
>
> Signed-off-by: Christoph Hellwig <[email protected]>

Acked-by: Hugh Dickins <[email protected]>

but Andrew, please fold in this small improvement to its comment:

[PATCH] shmem: open code the page cache lookup in shmem_get_folio_gfp fix

Adjust the new comment line: shmem folio may have been swapped out.

Signed-off-by: Hugh Dickins <[email protected]>
---

mm/shmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1905,7 +1905,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio) {
folio_lock(folio);

- /* Has the page been truncated? */
+ /* Has the folio been truncated or swapped out? */
if (unlikely(folio->mapping != mapping)) {
folio_unlock(folio);
folio_put(folio);

2023-03-20 13:59:01

by Christoph Hellwig

Subject: Re: [PATCH 4/7] shmem: remove shmem_get_partial_folio

On Sun, Mar 19, 2023 at 10:19:21PM -0700, Hugh Dickins wrote:
> I thought this was fine at first, and of course it's good for all the
> usual cases; but not for shmem_get_partial_folio()'s awkward cases.
>
> Two issues with it.
>
> One, as you highlight above, the possibility of reading more swap
> unnecessarily. I do not mind if partial truncation entails reading
> a little unnecessary swap; but I don't like the common case of
> truncation to 0 to entail that; even less eviction; even less
> unmounting, when eviction of all risks reading lots of swap.
> The old code behaved well at i_size 0, the new code not so much.

True. We could restore that by doing the i_size check for SGP_FIND,
though.

> Replacing shmem_get_partial_folio() by SGP_FIND was a good direction
> to try, but it hasn't worked out. I tried to get SGPs to work right
> for it before, when shmem_get_partial_page() was introduced; but I
> did not manage to do so. I think we have to go back to how this was.

Hmm, it would be sad to lose this entirely. One thing I thought about
but didn't manage to do is to rework how the SGP_* flags work.
Right now they are used as an enum, and we actually do numerical
comparisons on them, which is highly confusing. To me it seems like
having actual flags that can be combined and have useful names
would be much better. But I ran out of patience for finding good
names and figuring out what would be granular enough behavior
for such flags.

e.g. one would be for limiting to i_size, one for allocating new
folios if none was found, etc.