2020-09-17 15:18:33

by Matthew Wilcox

Subject: [PATCH 00/13] Allow readpage to return a locked page

Linus recently made the page lock more fair. That means that the old
pattern where we returned from ->readpage with the page unlocked and
then attempted to re-lock it will send us to the back of the queue for
this page's lock.

Ideally all filesystems would return from ->readpage with the
page Uptodate and Locked, but it's a bit painful to convert all the
asynchronous readpage implementations to synchronous. The filesystems
in this series are already synchronous, so convert them while I work
on iomap.

A further benefit is that a synchronous readpage implementation allows
us to return an error to someone who might actually care about it.
There's no need to SetPageError, but I don't want to learn about how
a dozen filesystems handle I/O errors (hint: they're all different),
so I have not attempted to change that.

Please review your filesystem carefully. I've tried to catch all the
places where a filesystem calls its own internal readpage implementation
without going through ->readpage, but I may have missed some.
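
The return-value contract described above can be modelled in userspace.
This is only a sketch, not kernel code: struct fake_page,
fake_backing_read() and fake_readpage_sync() are hypothetical stand-ins
invented for illustration. Only the AOP_UPDATED_PAGE value (0x80002) is
taken from patch 01.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Value taken from patch 01/13; everything else is a userspace model. */
#define AOP_UPDATED_PAGE 0x80002

/* Hypothetical stand-in for struct page: only the state we care about. */
struct fake_page {
	bool locked;
	bool uptodate;
	char data[16];
};

/* Hypothetical backing-store read: fails with -EIO when src is NULL. */
int fake_backing_read(struct fake_page *page, const char *src)
{
	if (!src)
		return -EIO;
	strncpy(page->data, src, sizeof(page->data) - 1);
	page->data[sizeof(page->data) - 1] = '\0';
	return 0;
}

/*
 * Model of a synchronous ->readpage under the proposed contract:
 *  - success: page stays locked, is marked uptodate,
 *             and AOP_UPDATED_PAGE is returned
 *  - failure: page is unlocked, is not marked uptodate,
 *             and a negative errno is returned
 */
int fake_readpage_sync(struct fake_page *page, const char *src)
{
	int err;

	assert(page->locked);	/* mirrors BUG_ON(!PageLocked(page)) */

	err = fake_backing_read(page, src);
	if (err < 0) {
		page->locked = false;	/* models unlock_page() */
		return err;
	}
	page->uptodate = true;		/* models SetPageUptodate() */
	return AOP_UPDATED_PAGE;
}
```

A caller that sees AOP_UPDATED_PAGE still holds the lock and can use the
page straight away; a caller that sees a negative errno no longer holds
the lock and learns the actual error.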

Matthew Wilcox (Oracle) (13):
  mm: Add AOP_UPDATED_PAGE return value
  9p: Tell the VFS that readpage was synchronous
  afs: Tell the VFS that readpage was synchronous
  ceph: Tell the VFS that readpage was synchronous
  cifs: Tell the VFS that readpage was synchronous
  cramfs: Tell the VFS that readpage was synchronous
  ecryptfs: Tell the VFS that readpage was synchronous
  fuse: Tell the VFS that readpage was synchronous
  hostfs: Tell the VFS that readpage was synchronous
  jffs2: Tell the VFS that readpage was synchronous
  ubifs: Tell the VFS that readpage was synchronous
  udf: Tell the VFS that readpage was synchronous
  vboxsf: Tell the VFS that readpage was synchronous

 Documentation/filesystems/locking.rst |  7 ++++---
 Documentation/filesystems/vfs.rst     | 21 ++++++++++++++-------
 fs/9p/vfs_addr.c                      |  6 +++++-
 fs/afs/file.c                         |  3 ++-
 fs/ceph/addr.c                        |  9 +++++----
 fs/cifs/file.c                        |  8 ++++++--
 fs/cramfs/inode.c                     |  5 ++---
 fs/ecryptfs/mmap.c                    | 11 ++++++-----
 fs/fuse/file.c                        |  2 ++
 fs/hostfs/hostfs_kern.c               |  2 ++
 fs/jffs2/file.c                       |  6 ++++--
 fs/ubifs/file.c                       | 16 ++++++++++------
 fs/udf/file.c                         |  3 +--
 fs/vboxsf/file.c                      |  2 ++
 include/linux/fs.h                    |  5 +++++
 mm/filemap.c                          | 12 ++++++++++--
 16 files changed, 80 insertions(+), 38 deletions(-)

--
2.28.0


2020-09-17 15:22:02

by Matthew Wilcox

Subject: [PATCH 03/13] afs: Tell the VFS that readpage was synchronous

The afs readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/afs/file.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/afs/file.c b/fs/afs/file.c
index 6f6ed1605cfe..8f15305b6574 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -367,7 +367,8 @@ int afs_page_filler(void *data, struct page *page)
BUG_ON(PageFsCache(page));
}
#endif
- unlock_page(page);
+ _leave(" = AOP_UPDATED_PAGE");
+ return AOP_UPDATED_PAGE;
}

_leave(" = 0");
--
2.28.0

2020-09-17 15:22:02

by Matthew Wilcox

Subject: [PATCH 11/13] ubifs: Tell the VFS that readpage was synchronous

The ubifs readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/ubifs/file.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index b77d1637bbbc..82633509c45e 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -772,7 +772,6 @@ static int ubifs_do_bulk_read(struct ubifs_info *c, struct bu_info *bu,
if (err)
goto out_warn;

- unlock_page(page1);
ret = 1;

isize = i_size_read(inode);
@@ -892,11 +891,16 @@ static int ubifs_bulk_read(struct page *page)

static int ubifs_readpage(struct file *file, struct page *page)
{
- if (ubifs_bulk_read(page))
- return 0;
- do_readpage(page);
- unlock_page(page);
- return 0;
+ int err;
+
+ err = ubifs_bulk_read(page);
+ if (err == 0)
+ err = do_readpage(page);
+ if (err < 0) {
+ unlock_page(page);
+ return err;
+ }
+ return AOP_UPDATED_PAGE;
}

static int do_writepage(struct page *page, int len)
--
2.28.0

2020-09-17 15:22:45

by Matthew Wilcox

Subject: [PATCH 09/13] hostfs: Tell the VFS that readpage was synchronous

The hostfs readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/hostfs/hostfs_kern.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index c070c0d8e3e9..c49221c09c4b 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -455,6 +455,8 @@ static int hostfs_readpage(struct file *file, struct page *page)
out:
flush_dcache_page(page);
kunmap(page);
+ if (!ret)
+ return AOP_UPDATED_PAGE;
unlock_page(page);
return ret;
}
--
2.28.0

2020-09-17 15:23:05

by Matthew Wilcox

Subject: [PATCH 02/13] 9p: Tell the VFS that readpage was synchronous

The 9p readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/9p/vfs_addr.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index cce9ace651a2..506ca0ba2ec7 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -65,7 +65,7 @@ static int v9fs_fid_readpage(void *data, struct page *page)
SetPageUptodate(page);

v9fs_readpage_to_fscache(inode, page);
- retval = 0;
+ return AOP_UPDATED_PAGE;

done:
unlock_page(page);
@@ -280,6 +280,10 @@ static int v9fs_write_begin(struct file *filp, struct address_space *mapping,
goto out;

retval = v9fs_fid_readpage(v9inode->writeback_fid, page);
+ if (retval == AOP_UPDATED_PAGE) {
+ retval = 0;
+ goto out;
+ }
put_page(page);
if (!retval)
goto start;
--
2.28.0

2020-09-17 15:25:50

by Matthew Wilcox

Subject: [PATCH 12/13] udf: Tell the VFS that readpage was synchronous

The udf inline data readpage implementation was already synchronous,
so use AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/udf/file.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/udf/file.c b/fs/udf/file.c
index 628941a6b79a..52bbe92d7c43 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -61,9 +61,8 @@ static int udf_adinicb_readpage(struct file *file, struct page *page)
{
BUG_ON(!PageLocked(page));
__udf_adinicb_readpage(page);
- unlock_page(page);

- return 0;
+ return AOP_UPDATED_PAGE;
}

static int udf_adinicb_writepage(struct page *page,
--
2.28.0

2020-09-17 15:43:21

by Matthew Wilcox

Subject: [PATCH 13/13] vboxsf: Tell the VFS that readpage was synchronous

The vboxsf readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/vboxsf/file.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c
index c4ab5996d97a..c2a144e5cb5a 100644
--- a/fs/vboxsf/file.c
+++ b/fs/vboxsf/file.c
@@ -228,6 +228,8 @@ static int vboxsf_readpage(struct file *file, struct page *page)
}

kunmap(page);
+ if (!err)
+ return AOP_UPDATED_PAGE;
unlock_page(page);
return err;
}
--
2.28.0

2020-09-17 15:45:13

by Matthew Wilcox

Subject: [PATCH 08/13] fuse: Tell the VFS that readpage was synchronous

The fuse readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/fuse/file.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 6611ef3269a8..7aa5626bc582 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -850,6 +850,8 @@ static int fuse_readpage(struct file *file, struct page *page)

err = fuse_do_readpage(file, page);
fuse_invalidate_atime(inode);
+ if (!err)
+ return AOP_UPDATED_PAGE;
out:
unlock_page(page);
return err;
--
2.28.0

2020-09-17 16:14:09

by Matthew Wilcox

Subject: [PATCH 04/13] ceph: Tell the VFS that readpage was synchronous

The ceph readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/ceph/addr.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6ea761c84494..b2bf8bf7a312 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -291,10 +291,11 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
static int ceph_readpage(struct file *filp, struct page *page)
{
int r = ceph_do_readpage(filp, page);
- if (r != -EINPROGRESS)
- unlock_page(page);
- else
- r = 0;
+ if (r == -EINPROGRESS)
+ return 0;
+ if (r == 0)
+ return AOP_UPDATED_PAGE;
+ unlock_page(page);
return r;
}

--
2.28.0

2020-09-17 16:52:23

by Jeff Layton

Subject: Re: [PATCH 04/13] ceph: Tell the VFS that readpage was synchronous

On Thu, 2020-09-17 at 16:10 +0100, Matthew Wilcox (Oracle) wrote:
> The ceph readpage implementation was already synchronous, so use
> AOP_UPDATED_PAGE to avoid cycling the page lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> fs/ceph/addr.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 6ea761c84494..b2bf8bf7a312 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -291,10 +291,11 @@ static int ceph_do_readpage(struct file *filp, struct page *page)
> static int ceph_readpage(struct file *filp, struct page *page)
> {
> int r = ceph_do_readpage(filp, page);
> - if (r != -EINPROGRESS)
> - unlock_page(page);
> - else
> - r = 0;
> + if (r == -EINPROGRESS)
> + return 0;
> + if (r == 0)
> + return AOP_UPDATED_PAGE;
> + unlock_page(page);
> return r;
> }
>

Looks good to me. I assume you'll merge all of these as a set since the
early ones are a prerequisite?

Reviewed-by: Jeff Layton <[email protected]>

2020-09-17 19:43:38

by Matthew Wilcox

Subject: [PATCH 10/13] jffs2: Tell the VFS that readpage was synchronous

The jffs2 readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/jffs2/file.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c
index f8fb89b10227..959a74027041 100644
--- a/fs/jffs2/file.c
+++ b/fs/jffs2/file.c
@@ -116,15 +116,17 @@ int jffs2_do_readpage_unlock(void *data, struct page *pg)
return ret;
}

-
static int jffs2_readpage (struct file *filp, struct page *pg)
{
struct jffs2_inode_info *f = JFFS2_INODE_INFO(pg->mapping->host);
int ret;

mutex_lock(&f->sem);
- ret = jffs2_do_readpage_unlock(pg->mapping->host, pg);
+ ret = jffs2_do_readpage_nolock(pg->mapping->host, pg);
mutex_unlock(&f->sem);
+ if (!ret)
+ return AOP_UPDATED_PAGE;
+ unlock_page(pg);
return ret;
}

--
2.28.0

2020-09-17 19:56:13

by Matthew Wilcox

Subject: [PATCH 01/13] mm: Add AOP_UPDATED_PAGE return value

Allow synchronous ->readpage implementations to execute more
efficiently by skipping the re-locking of the page.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
Documentation/filesystems/locking.rst | 7 ++++---
Documentation/filesystems/vfs.rst | 21 ++++++++++++++-------
include/linux/fs.h | 5 +++++
mm/filemap.c | 12 ++++++++++--
4 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 64f94a18d97e..06a7a8bf2362 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -269,7 +269,7 @@ locking rules:
ops PageLocked(page) i_rwsem
====================== ======================== =========
writepage: yes, unlocks (see below)
-readpage: yes, unlocks
+readpage: yes, may unlock
writepages:
set_page_dirty no
readahead: yes, unlocks
@@ -294,8 +294,9 @@ swap_deactivate: no
->write_begin(), ->write_end() and ->readpage() may be called from
the request handler (/dev/loop).

-->readpage() unlocks the page, either synchronously or via I/O
-completion.
+->readpage() may return AOP_UPDATED_PAGE if the page is now Uptodate
+or 0 if the page will be unlocked asynchronously by I/O completion.
+If it returns -errno, it should unlock the page.

->readahead() unlocks the pages that I/O is attempted on like ->readpage().

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index ca52c82e5bb5..16248c299aaa 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -643,7 +643,7 @@ set_page_dirty to write data into the address_space, and writepage and
writepages to writeback data to storage.

Adding and removing pages to/from an address_space is protected by the
-inode's i_mutex.
+inode's i_rwsem held exclusively.

When data is written to a page, the PG_Dirty flag should be set. It
typically remains set until writepage asks for it to be written. This
@@ -757,12 +757,19 @@ cache in your filesystem. The following members are defined:

``readpage``
called by the VM to read a page from backing store. The page
- will be Locked when readpage is called, and should be unlocked
- and marked uptodate once the read completes. If ->readpage
- discovers that it needs to unlock the page for some reason, it
- can do so, and then return AOP_TRUNCATED_PAGE. In this case,
- the page will be relocated, relocked and if that all succeeds,
- ->readpage will be called again.
+ will be Locked and !Uptodate when readpage is called. Ideally,
+ the filesystem will bring the page Uptodate and return
+ AOP_UPDATED_PAGE. If the filesystem encounters an error, it
+ should unlock the page and return a negative errno without marking
+ the page Uptodate. It does not need to mark the page as Error.
+ If the filesystem returns 0, this means the page will be unlocked
+ asynchronously by I/O completion. The VFS will wait for the
+ page to be unlocked, so there is no advantage to executing this
+ operation asynchronously.
+
+ The filesystem can also return AOP_TRUNCATED_PAGE to indicate
+ that it had to unlock the page to avoid a deadlock. The caller
+ will re-check the page cache and call ->readpage again.

``writepages``
called by the VM to write out pages associated with the
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e019ea2f1347..6fc650050d20 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -273,6 +273,10 @@ struct iattr {
* reference, it should drop it before retrying. Returned
* by readpage().
*
+ * @AOP_UPDATED_PAGE: The readpage method has brought the page Uptodate
+ * without releasing the page lock. This is suitable for synchronous
+ * implementations of readpage.
+ *
* address_space_operation functions return these large constants to indicate
* special semantics to the caller. These are much larger than the bytes in a
* page to allow for functions that return the number of bytes operated on in a
@@ -282,6 +286,7 @@ struct iattr {
enum positive_aop_returns {
AOP_WRITEPAGE_ACTIVATE = 0x80000,
AOP_TRUNCATED_PAGE = 0x80001,
+ AOP_UPDATED_PAGE = 0x80002,
};

#define AOP_FLAG_CONT_EXPAND 0x0001 /* called from cont_expand */
diff --git a/mm/filemap.c b/mm/filemap.c
index 1aaea26556cc..131a2aaa1537 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2254,8 +2254,10 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
* PG_error will be set again if readpage fails.
*/
ClearPageError(page);
- /* Start the actual read. The read will unlock the page. */
+ /* Start the actual read. The read may unlock the page. */
error = mapping->a_ops->readpage(filp, page);
+ if (error == AOP_UPDATED_PAGE)
+ goto page_ok;

if (unlikely(error)) {
if (error == AOP_TRUNCATED_PAGE) {
@@ -2619,7 +2621,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
*/
if (unlikely(!PageUptodate(page)))
goto page_not_uptodate;
-
+page_ok:
/*
* We've made it this far and we had to drop our mmap_lock, now is the
* time to return to the upper layer and have it re-find the vma and
@@ -2654,6 +2656,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
ClearPageError(page);
fpin = maybe_unlock_mmap_for_io(vmf, fpin);
error = mapping->a_ops->readpage(file, page);
+ if (error == AOP_UPDATED_PAGE)
+ goto page_ok;
if (!error) {
wait_on_page_locked(page);
if (!PageUptodate(page))
@@ -2867,6 +2871,10 @@ static struct page *do_read_cache_page(struct address_space *mapping,
err = filler(data, page);
else
err = mapping->a_ops->readpage(data, page);
+ if (err == AOP_UPDATED_PAGE) {
+ unlock_page(page);
+ goto out;
+ }

if (err < 0) {
put_page(page);
--
2.28.0

2020-09-17 19:56:53

by Matthew Wilcox

Subject: [PATCH 07/13] ecryptfs: Tell the VFS that readpage was synchronous

The ecryptfs readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/ecryptfs/mmap.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/ecryptfs/mmap.c b/fs/ecryptfs/mmap.c
index 019572c6b39a..dee35181d789 100644
--- a/fs/ecryptfs/mmap.c
+++ b/fs/ecryptfs/mmap.c
@@ -219,12 +219,13 @@ static int ecryptfs_readpage(struct file *file, struct page *page)
}
}
out:
- if (rc)
- ClearPageUptodate(page);
- else
- SetPageUptodate(page);
- ecryptfs_printk(KERN_DEBUG, "Unlocking page with index = [0x%.16lx]\n",
+ ecryptfs_printk(KERN_DEBUG, "Returning page with index = [0x%.16lx]\n",
page->index);
+ if (!rc) {
+ SetPageUptodate(page);
+ return AOP_UPDATED_PAGE;
+ }
+ ClearPageUptodate(page);
unlock_page(page);
return rc;
}
--
2.28.0

2020-09-17 19:57:01

by Matthew Wilcox

Subject: [PATCH 06/13] cramfs: Tell the VFS that readpage was synchronous

The cramfs readpage implementation was already synchronous, so use
AOP_UPDATED_PAGE to avoid cycling the page lock.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/cramfs/inode.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 912308600d39..7a642146c074 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -916,15 +916,14 @@ static int cramfs_readpage(struct file *file, struct page *page)
flush_dcache_page(page);
kunmap(page);
SetPageUptodate(page);
- unlock_page(page);
- return 0;
+ return AOP_UPDATED_PAGE;

err:
kunmap(page);
ClearPageUptodate(page);
SetPageError(page);
unlock_page(page);
- return 0;
+ return -EIO;
}

static const struct address_space_operations cramfs_aops = {
--
2.28.0

2020-09-17 20:54:43

by Richard Weinberger

Subject: Re: [PATCH 11/13] ubifs: Tell the VFS that readpage was synchronous

----- Original Message -----
> From: "Matthew Wilcox" <[email protected]>
> To: "linux-fsdevel" <[email protected]>
> CC: "Matthew Wilcox" <[email protected]>, "linux-mm" <[email protected]>, [email protected],
> "linux-kernel" <[email protected]>, [email protected], "ceph-devel"
> <[email protected]>, [email protected], [email protected], "linux-um"
> <[email protected]>, "linux-mtd" <[email protected]>, "richard" <[email protected]>
> Sent: Thursday, 17 September 2020 17:10:48
> Subject: [PATCH 11/13] ubifs: Tell the VFS that readpage was synchronous

> The ubifs readpage implementation was already synchronous, so use
> AOP_UPDATED_PAGE to avoid cycling the page lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> fs/ubifs/file.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)

For ubifs, jffs2 and hostfs:

Acked-by: Richard Weinberger <[email protected]>

Thanks,
//richard

2020-09-17 22:04:49

by Matthew Wilcox

Subject: Re: [PATCH 01/13] mm: Add AOP_UPDATED_PAGE return value

On Thu, Sep 17, 2020 at 04:10:38PM +0100, Matthew Wilcox (Oracle) wrote:
> +++ b/mm/filemap.c
> @@ -2254,8 +2254,10 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
> * PG_error will be set again if readpage fails.
> */
> ClearPageError(page);
> - /* Start the actual read. The read will unlock the page. */
> + /* Start the actual read. The read may unlock the page. */
> error = mapping->a_ops->readpage(filp, page);
> + if (error == AOP_UPDATED_PAGE)
> + goto page_ok;
>
> if (unlikely(error)) {
> if (error == AOP_TRUNCATED_PAGE) {

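If anybody wants to actually test this, the hunk above is wrong: the
caller has to unlock the page and clear the return value before jumping
to page_ok. It needs this on top: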

+++ b/mm/filemap.c
@@ -2256,8 +2256,11 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
ClearPageError(page);
/* Start the actual read. The read may unlock the page. */
error = mapping->a_ops->readpage(filp, page);
- if (error == AOP_UPDATED_PAGE)
+ if (error == AOP_UPDATED_PAGE) {
+ unlock_page(page);
+ error = 0;
goto page_ok;
+ }

if (unlikely(error)) {
if (error == AOP_TRUNCATED_PAGE) {

> @@ -2619,7 +2621,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> */
> if (unlikely(!PageUptodate(page)))
> goto page_not_uptodate;
> -
> +page_ok:
> /*
> * We've made it this far and we had to drop our mmap_lock, now is the
> * time to return to the upper layer and have it re-find the vma and
> @@ -2654,6 +2656,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> ClearPageError(page);
> fpin = maybe_unlock_mmap_for_io(vmf, fpin);
> error = mapping->a_ops->readpage(file, page);
> + if (error == AOP_UPDATED_PAGE)
> + goto page_ok;
> if (!error) {
> wait_on_page_locked(page);
> if (!PageUptodate(page))
> @@ -2867,6 +2871,10 @@ static struct page *do_read_cache_page(struct address_space *mapping,
> err = filler(data, page);
> else
> err = mapping->a_ops->readpage(data, page);
> + if (err == AOP_UPDATED_PAGE) {
> + unlock_page(page);
> + goto out;
> + }
>
> if (err < 0) {
> put_page(page);
> --
> 2.28.0
>
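
The fix to generic_file_buffered_read() amounts to this: when ->readpage
returns AOP_UPDATED_PAGE, the caller still holds the lock on an Uptodate
page, so it has to unlock the page itself and treat the result as
success. Here is a minimal userspace sketch of that control flow. All
names (struct fake_page, fake_readpage_ok(), fake_readpage_fail(),
fake_read_one_page()) are hypothetical; it does not reproduce
mm/filemap.c, and the asynchronous "return 0" path is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define AOP_UPDATED_PAGE 0x80002	/* value from patch 01/13 */

/* Hypothetical stand-in for struct page. */
struct fake_page {
	bool locked;
	bool uptodate;
};

typedef int (*fake_readpage_t)(struct fake_page *page);

/* Synchronous success: leaves the page locked and uptodate. */
int fake_readpage_ok(struct fake_page *page)
{
	page->uptodate = true;
	return AOP_UPDATED_PAGE;
}

/* Failure: unlocks the page and reports an error. */
int fake_readpage_fail(struct fake_page *page)
{
	page->locked = false;
	return -EIO;
}

/*
 * Caller-side model. Returns 0 once the page is uptodate and unlocked,
 * or a negative errno. A plain 0 from readpage (asynchronous completion)
 * would require waiting for the page lock; that is left out here.
 */
int fake_read_one_page(struct fake_page *page, fake_readpage_t readpage)
{
	int error;

	page->locked = true;		/* page is locked before the call */
	error = readpage(page);
	if (error == AOP_UPDATED_PAGE) {
		page->locked = false;	/* the caller unlocks ... */
		error = 0;		/* ... and must not leak the AOP_* value */
	}
	return error;
}
```

Without the "error = 0" step the large positive AOP_UPDATED_PAGE value
would be carried forward as if it were the result of the read.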

2020-09-18 06:00:55

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 02/13] 9p: Tell the VFS that readpage was synchronous

Matthew Wilcox (Oracle) wrote on Thu, Sep 17, 2020:
> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> index cce9ace651a2..506ca0ba2ec7 100644
> --- a/fs/9p/vfs_addr.c
> +++ b/fs/9p/vfs_addr.c
> @@ -280,6 +280,10 @@ static int v9fs_write_begin(struct file *filp, struct address_space *mapping,
> goto out;
>
> retval = v9fs_fid_readpage(v9inode->writeback_fid, page);
> + if (retval == AOP_UPDATED_PAGE) {
> + retval = 0;
> + goto out;
> + }

FWIW this is a change of behaviour; for some reason the code used to
loop back to grab_cache_page_write_begin() and bail out on
PageUptodate() I suppose; some sort of race check?
The whole pattern is a bit weird to me and 9p has no guarantee on
concurrent writes to a file with cache enabled (except that it will
corrupt something), so this part is fine with me.

What I'm curious about is the page used to be both unlocked and put, but
now isn't either and the return value hasn't changed for the caller to
make a difference on write_begin / I don't see any code change in the
vfs to handle that.
What did I miss?


(FWIW at least cifs in the series has the same pattern change; didn't
check all of them)


Thanks,
--
Dominique

2020-09-18 11:20:53

by Matthew Wilcox

Subject: Re: [V9fs-developer] [PATCH 02/13] 9p: Tell the VFS that readpage was synchronous

On Fri, Sep 18, 2020 at 07:59:19AM +0200, Dominique Martinet wrote:
> Matthew Wilcox (Oracle) wrote on Thu, Sep 17, 2020:
> > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> > index cce9ace651a2..506ca0ba2ec7 100644
> > --- a/fs/9p/vfs_addr.c
> > +++ b/fs/9p/vfs_addr.c
> > @@ -280,6 +280,10 @@ static int v9fs_write_begin(struct file *filp, struct address_space *mapping,
> > goto out;
> >
> > retval = v9fs_fid_readpage(v9inode->writeback_fid, page);
> > + if (retval == AOP_UPDATED_PAGE) {
> > + retval = 0;
> > + goto out;
> > + }
>
> FWIW this is a change of behaviour; for some reason the code used to
> loop back to grab_cache_page_write_begin() and bail out on
> PageUptodate() I suppose; some sort of race check?
> The whole pattern is a bit weird to me and 9p has no guarantee on
> concurrent writes to a file with cache enabled (except that it will
> corrupt something), so this part is fine with me.
>
> What I'm curious about is the page used to be both unlocked and put, but
> now isn't either and the return value hasn't changed for the caller to
> make a difference on write_begin / I don't see any code change in the
> vfs to handle that.
> What did I miss?

The page cache is kind of subtle. The grab_cache_page_write_begin()
will return a Locked page with an increased refcount. If it's Uptodate,
that's exactly what we want, and we return it. If we have to read the
page, readpage used to unlock the page before returning, and rather than
re-lock it, we would drop the reference to the page and look it up again.
It's possible that, after we dropped the lock on that page, the page
was replaced in the page cache and so we'd get a different page.

Anyway, now (unless fscache is involved), v9fs_fid_readpage will return
the page without unlocking it. So we don't need to do the dance of
dropping the lock, putting the refcount and looking the page back up
again. We can just return the page. The VFS doesn't need a special
return code because nothing has changed from the VFS's point of view --
it asked you to get a page and you got the page.
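
The difference between the old and new write_begin flow can be sketched
in userspace. Everything here is a hypothetical model (struct fake_page,
struct fake_cache, fake_grab_page(), the two readpage variants and the
lookups counter are invented for illustration); it tracks only lock
state, refcount and the number of page cache lookups, and is not the
9p code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AOP_UPDATED_PAGE 0x80002	/* value from patch 01/13 */

/* Hypothetical stand-ins for a page and a one-page page cache. */
struct fake_page {
	bool locked;
	bool uptodate;
	int refcount;
};

struct fake_cache {
	struct fake_page page;
	int lookups;		/* how often the page was looked up */
};

/* Models grab_cache_page_write_begin(): locked page, extra reference. */
struct fake_page *fake_grab_page(struct fake_cache *cache)
{
	cache->lookups++;
	cache->page.locked = true;
	cache->page.refcount++;
	return &cache->page;
}

/* Old behaviour: readpage unlocks the page and returns 0. */
int fake_readpage_old(struct fake_page *page)
{
	page->uptodate = true;
	page->locked = false;
	return 0;
}

/* New behaviour: readpage keeps the lock and says so. */
int fake_readpage_new(struct fake_page *page)
{
	page->uptodate = true;
	return AOP_UPDATED_PAGE;
}

/* Models the write_begin loop; returns a locked, uptodate page or NULL. */
struct fake_page *fake_write_begin(struct fake_cache *cache,
				   int (*readpage)(struct fake_page *))
{
	struct fake_page *page;
	int ret;

	for (;;) {
		page = fake_grab_page(cache);
		if (page->uptodate)
			return page;
		ret = readpage(page);
		if (ret == AOP_UPDATED_PAGE)
			return page;	/* still locked, reference still held */
		page->refcount--;	/* models put_page() */
		if (ret)
			return NULL;	/* read error */
		/* ret == 0: the page was unlocked, so look it up again */
	}
}
```

In this model both variants hand back a locked, Uptodate page with one
reference held; the old variant just needs a second lookup to get there,
which is why nothing changes from the caller's point of view.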

2020-09-18 12:32:18

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 02/13] 9p: Tell the VFS that readpage was synchronous

Matthew Wilcox (Oracle) wrote on Thu, Sep 17, 2020:
> The 9p readpage implementation was already synchronous, so use
> AOP_UPDATED_PAGE to avoid cycling the page lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

Acked-by: Dominique Martinet <[email protected]>

(I assume it'll be merged together with the rest)

> > What I'm curious about is the page used to be both unlocked and put, but
> > now isn't either and the return value hasn't changed for the caller to
> > make a difference on write_begin / I don't see any code change in the
> > vfs to handle that.
> > What did I miss?
>
> The page cache is kind of subtle. The grab_cache_page_write_begin()
> will return a Locked page with an increased refcount. If it's Uptodate,
> that's exactly what we want, and we return it. If we have to read the
> page, readpage used to unlock the page before returning, and rather than
> re-lock it, we would drop the reference to the page and look it up again.
> It's possible that, after we dropped the lock on that page, the page
> was replaced in the page cache and so we'd get a different page.

Thanks for the explanation, I didn't realize the page already is
gotten/locked at the PageUptodate goto out.

> Anyway, now (unless fscache is involved), v9fs_fid_readpage will return
> the page without unlocking it. So we don't need to do the dance of
> dropping the lock, putting the refcount and looking the page back up
> again. We can just return the page. The VFS doesn't need a special
> return code because nothing has changed from the VFS's point of view --
> it asked you to get a page and you got the page.

Yes, looks good to me.

Cheers,
--
Dominique

2020-09-24 13:47:34

by Jan Kara

Subject: Re: [PATCH 12/13] udf: Tell the VFS that readpage was synchronous

On Thu 17-09-20 16:10:49, Matthew Wilcox (Oracle) wrote:
> The udf inline data readpage implementation was already synchronous,
> so use AOP_UPDATED_PAGE to avoid cycling the page lock.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

Looks good. You can add:

Reviewed-by: Jan Kara <[email protected]>

Honza
> ---
> fs/udf/file.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/udf/file.c b/fs/udf/file.c
> index 628941a6b79a..52bbe92d7c43 100644
> --- a/fs/udf/file.c
> +++ b/fs/udf/file.c
> @@ -61,9 +61,8 @@ static int udf_adinicb_readpage(struct file *file, struct page *page)
> {
> BUG_ON(!PageLocked(page));
> __udf_adinicb_readpage(page);
> - unlock_page(page);
>
> - return 0;
> + return AOP_UPDATED_PAGE;
> }
>
> static int udf_adinicb_writepage(struct page *page,
> --
> 2.28.0
>
--
Jan Kara <[email protected]>
SUSE Labs, CR