2022-12-16 15:08:15

by Andreas Gruenbacher

Subject: [RFC v3 0/7] Turn iomap_page_ops into iomap_folio_ops

This is an updated proposal for changing the iomap page_ops operations
to make them more flexible so that they better suit the needs of
filesystems. It closes a race in gfs2 and cleans up the recent iomap
changes merged in the following upstream commit:

87be949912ee ("Merge tag 'xfs-6.2-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux")

The first patch introduces a folio_may_straddle_isize() helper as a
replacement for pagecache_isize_extended() when we have a locked folio.
This still needs independent verification, but it looks like a
worthwhile improvement to me. I've left it in this patch queue for now,
but I can move it out of the way if preferred.

Any thoughts?

Thanks,
Andreas

Andreas Gruenbacher (7):
fs: Add folio_may_straddle_isize helper
iomap: Add iomap_folio_done helper
iomap/gfs2: Unlock and put folio in page_done handler
iomap: Add iomap_folio_prepare helper
iomap: Get page in page_prepare handler
iomap/xfs: Eliminate the iomap_valid handler
iomap: Rename page_ops to folio_ops

fs/buffer.c | 5 +--
fs/ext4/inode.c | 13 +++---
fs/gfs2/bmap.c | 43 +++++++++++++------
fs/iomap/buffered-io.c | 95 +++++++++++++++++++++---------------------
fs/xfs/xfs_iomap.c | 42 +++++++++++++------
include/linux/iomap.h | 46 +++++++-------------
include/linux/mm.h | 2 +
mm/truncate.c | 35 ++++++++++++++++
8 files changed, 169 insertions(+), 112 deletions(-)

--
2.38.1


2022-12-16 15:08:15

by Andreas Gruenbacher

Subject: [RFC v3 1/7] fs: Add folio_may_straddle_isize helper

Add a folio_may_straddle_isize() helper as a replacement for
pagecache_isize_extended() when we have a locked folio.

Use the new helper in generic_write_end(), iomap_write_end(),
ext4_write_end(), and ext4_journalled_write_end().
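
For illustration, the intended calling pattern is roughly the following
fragment (a sketch of a ->write_end() style path, assuming inode,
folio, pos, and copied from the surrounding context; not part of the
diff below): the i_size update and the helper call both happen under
the folio lock, and the unlock follows.

	loff_t old_size = inode->i_size;

	if (pos + copied > old_size) {
		i_size_write(inode, pos + copied);
		/*
		 * Still under the folio lock: write-protect the folio if
		 * a block-aligned hole lies between the old size and the
		 * start of the write.
		 */
		folio_may_straddle_isize(inode, folio, old_size, pos);
	}
	folio_unlock(folio);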

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/buffer.c | 5 ++---
fs/ext4/inode.c | 13 ++++++-------
fs/iomap/buffered-io.c | 3 +--
include/linux/mm.h | 2 ++
mm/truncate.c | 35 +++++++++++++++++++++++++++++++++++
5 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index d9c6d1fbb6dd..bbae1437994b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2164,16 +2164,15 @@ int generic_write_end(struct file *file, struct address_space *mapping,
* But it's important to update i_size while still holding page lock:
* page writeout could otherwise come in and zero beyond i_size.
*/
- if (pos + copied > inode->i_size) {
+ if (pos + copied > old_size) {
i_size_write(inode, pos + copied);
i_size_changed = true;
+ folio_may_straddle_isize(inode, page_folio(page), old_size, pos);
}

unlock_page(page);
put_page(page);

- if (old_size < pos)
- pagecache_isize_extended(inode, old_size, pos);
/*
* Don't mark the inode dirty under page lock. First, it unnecessarily
* makes the holding time of page lock longer. Second, it forces lock
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9d9f414f99fe..6fe1c9609d86 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1327,13 +1327,13 @@ static int ext4_write_end(struct file *file,
* If FS_IOC_ENABLE_VERITY is running on this inode, then Merkle tree
* blocks are being written past EOF, so skip the i_size update.
*/
- if (!verity)
+ if (!verity) {
i_size_changed = ext4_update_inode_size(inode, pos + copied);
+ folio_may_straddle_isize(inode, page_folio(page), old_size, pos);
+ }
unlock_page(page);
put_page(page);

- if (old_size < pos && !verity)
- pagecache_isize_extended(inode, old_size, pos);
/*
* Don't mark the inode dirty under page lock. First, it unnecessarily
* makes the holding time of page lock longer. Second, it forces lock
@@ -1439,16 +1439,15 @@ static int ext4_journalled_write_end(struct file *file,
if (!partial)
SetPageUptodate(page);
}
- if (!verity)
+ if (!verity) {
size_changed = ext4_update_inode_size(inode, pos + copied);
+ folio_may_straddle_isize(inode, page_folio(page), old_size, pos);
+ }
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
unlock_page(page);
put_page(page);

- if (old_size < pos && !verity)
- pagecache_isize_extended(inode, old_size, pos);
-
if (size_changed) {
ret2 = ext4_mark_inode_dirty(handle, inode);
if (!ret)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 356193e44cf0..347010c6a652 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -734,11 +734,10 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
if (pos + ret > old_size) {
i_size_write(iter->inode, pos + ret);
iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
+ folio_may_straddle_isize(iter->inode, folio, old_size, pos);
}
folio_unlock(folio);

- if (old_size < pos)
- pagecache_isize_extended(iter->inode, old_size, pos);
if (page_ops && page_ops->page_done)
page_ops->page_done(iter->inode, pos, ret, &folio->page);
folio_put(folio);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8178fe894e2e..a8632747780e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2016,6 +2016,8 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,

extern void truncate_pagecache(struct inode *inode, loff_t new);
extern void truncate_setsize(struct inode *inode, loff_t newsize);
+void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
+ loff_t old_size, loff_t start);
void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
int generic_error_remove_page(struct address_space *mapping, struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 7b4ea4c4a46b..971b08399144 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -769,6 +769,41 @@ void truncate_setsize(struct inode *inode, loff_t newsize)
}
EXPORT_SYMBOL(truncate_setsize);

+/**
+ * folio_may_straddle_isize - update pagecache after extending i_size
+ * @inode: inode for which i_size was extended
+ * @folio: folio to maybe mark read-only
+ * @old_size: original inode size
+ * @start: start of the write
+ *
+ * Handle extending an inode by a write that starts beyond the old inode size.
+ * If a block-aligned hole exists between the old inode size and the start of
+ * the write, we mark the folio read-only so that page_mkwrite() is called on
+ * the next write access to the page. That way, the filesystem can be sure
+ * that page_mkwrite() is called on the page before a user writes to the page
+ * via mmap.
+ *
+ * This function must be called while we still hold i_rwsem - this not only
+ * makes sure i_size is stable but also that userspace cannot observe the new
+ * i_size value before we are prepared to handle mmap writes there.
+ */
+void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
+ loff_t old_size, loff_t start)
+{
+ unsigned int blocksize = i_blocksize(inode);
+
+ if (round_up(old_size, blocksize) >= round_down(start, blocksize))
+ return;
+
+ /*
+ * See clear_page_dirty_for_io() for details why folio_set_dirty()
+ * is needed.
+ */
+ if (folio_mkclean(folio))
+ folio_set_dirty(folio);
+}
+EXPORT_SYMBOL(folio_may_straddle_isize);
+
/**
* pagecache_isize_extended - update pagecache after extension of i_size
* @inode: inode for which i_size was extended
--
2.38.1

2022-12-16 15:08:26

by Andreas Gruenbacher

Subject: [RFC v3 3/7] iomap/gfs2: Unlock and put folio in page_done handler

When an iomap defines a ->page_done() handler in its page_ops, delegate
unlocking the folio and putting the folio reference to that handler.

This fixes a race between journaled data writes and folio writeback in
gfs2: before this change, gfs2_iomap_page_done() was called after
unlocking the folio, so writeback could start writing back the folio's
buffers before they could be marked for writing to the journal. Also,
try_to_free_buffers() could free the buffers before
gfs2_iomap_page_done() was done adding them to the current transaction.
With this change, gfs2_iomap_page_done() can add the buffers to the
current transaction while the folio is still locked. It can then unlock
the folio and complete the current transaction.

(If we just moved the entire ->page_done() handler under the folio lock,
dirtying the inode could deadlock with the locked folio on filesystems
with a block size smaller than the page size.)

The only current user of ->page_done() is gfs2, so other filesystems are
not affected. Still, to catch out any new users, switch from a page to
a folio in ->page_done().
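
Under the new contract, a ->page_done() handler takes roughly the
following shape (a minimal sketch; example_trans_end() and
example_add_databufs() are placeholder names, not real functions):

static void example_page_done(struct inode *inode, loff_t pos,
			      unsigned copied, struct folio *folio)
{
	if (!folio) {
		/* No folio was obtained; only undo the preparation work. */
		example_trans_end(inode);
		return;
	}

	/*
	 * The folio is still locked, so writeback cannot race in while
	 * its buffers are added to the journal.
	 */
	example_add_databufs(inode, folio, offset_in_page(pos), copied);

	folio_unlock(folio);
	folio_put(folio);

	example_trans_end(inode);
}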

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/gfs2/bmap.c | 15 ++++++++++++---
fs/iomap/buffered-io.c | 8 ++++----
include/linux/iomap.h | 7 ++++---
3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 3bdb2c668a71..11115fce68cb 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -971,14 +971,23 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
}

static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
- unsigned copied, struct page *page)
+ unsigned copied, struct folio *folio)
{
struct gfs2_trans *tr = current->journal_info;
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);

- if (page && !gfs2_is_stuffed(ip))
- gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+ if (!folio) {
+ gfs2_trans_end(sdp);
+ return;
+ }
+
+ if (!gfs2_is_stuffed(ip))
+ gfs2_page_add_databufs(ip, &folio->page, offset_in_page(pos),
+ copied);
+
+ folio_unlock(folio);
+ folio_put(folio);

if (tr->tr_num_buf_new)
__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8ce9abb29d46..517ad5380a62 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -580,12 +580,12 @@ static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,
{
const struct iomap_page_ops *page_ops = iter->iomap.page_ops;

- if (folio)
+ if (page_ops && page_ops->page_done) {
+ page_ops->page_done(iter->inode, pos, ret, folio);
+ } else if (folio) {
folio_unlock(folio);
- if (page_ops && page_ops->page_done)
- page_ops->page_done(iter->inode, pos, ret, &folio->page);
- if (folio)
folio_put(folio);
+ }
}

static int iomap_write_begin_inline(const struct iomap_iter *iter,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0983dfc9a203..743e2a909162 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -131,13 +131,14 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
* associated with them.
*
* When page_prepare succeeds, page_done will always be called to do any
- * cleanup work necessary. In that page_done call, @page will be NULL if the
- * associated page could not be obtained.
+ * cleanup work necessary. In that page_done call, @folio will be NULL if the
+ * associated folio could not be obtained. When folio is not NULL, page_done
+ * is responsible for unlocking and putting the folio.
*/
struct iomap_page_ops {
int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len);
void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
- struct page *page);
+ struct folio *folio);

/*
* Check that the cached iomap still maps correctly to the filesystem's
--
2.38.1

2022-12-16 15:08:40

by Andreas Gruenbacher

Subject: [RFC v3 4/7] iomap: Add iomap_folio_prepare helper

Add an iomap_folio_prepare() helper that gets a folio reference based on
an iomap iterator and an offset into the address space.
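
Callers check for failure roughly like this (sketch; in this version
the helper returns NULL on failure, and the caller picks the error
code based on IOMAP_NOWAIT):

	folio = iomap_folio_prepare(iter, pos);
	if (!folio)
		return (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;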

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/iomap/buffered-io.c | 27 +++++++++++++++++++++------
include/linux/iomap.h | 1 +
2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 517ad5380a62..f0f167ca1b2e 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -457,6 +457,26 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
}
EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);

+/**
+ * iomap_folio_prepare - get a folio reference for writing
+ * @iter: iteration structure
+ * @pos: start offset of write
+ *
+ * Returns a locked reference to the folio at @pos, or NULL if the folio could
+ * not be obtained.
+ */
+struct folio *iomap_folio_prepare(struct iomap_iter *iter, loff_t pos)
+{
+ unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
+
+ if (iter->flags & IOMAP_NOWAIT)
+ fgp |= FGP_NOWAIT;
+
+ return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
+ fgp, mapping_gfp_mask(iter->inode->i_mapping));
+}
+EXPORT_SYMBOL(iomap_folio_prepare);
+
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags)
{
trace_iomap_release_folio(folio->mapping->host, folio_pos(folio),
@@ -603,12 +623,8 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct folio *folio;
- unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
int status = 0;

- if (iter->flags & IOMAP_NOWAIT)
- fgp |= FGP_NOWAIT;
-
BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
if (srcmap != &iter->iomap)
BUG_ON(pos + len > srcmap->offset + srcmap->length);
@@ -625,8 +641,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
return status;
}

- folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
- fgp, mapping_gfp_mask(iter->inode->i_mapping));
+ folio = iomap_folio_prepare(iter, pos);
if (!folio) {
status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
iomap_folio_done(iter, pos, 0, NULL);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 743e2a909162..0bf16ef22d81 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -261,6 +261,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
+struct folio *iomap_folio_prepare(struct iomap_iter *iter, loff_t pos);
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
--
2.38.1

2022-12-16 15:08:42

by Andreas Gruenbacher

Subject: [RFC v3 2/7] iomap: Add iomap_folio_done helper

Add an iomap_folio_done() helper to encapsulate unlocking the folio,
calling ->page_done(), and putting the folio. This doesn't change the
functionality.

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/iomap/buffered-io.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 347010c6a652..8ce9abb29d46 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -575,6 +575,19 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
return 0;
}

+static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,
+ struct folio *folio)
+{
+ const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
+
+ if (folio)
+ folio_unlock(folio);
+ if (page_ops && page_ops->page_done)
+ page_ops->page_done(iter->inode, pos, ret, &folio->page);
+ if (folio)
+ folio_put(folio);
+}
+
static int iomap_write_begin_inline(const struct iomap_iter *iter,
struct folio *folio)
{
@@ -616,7 +629,8 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
fgp, mapping_gfp_mask(iter->inode->i_mapping));
if (!folio) {
status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
- goto out_no_page;
+ iomap_folio_done(iter, pos, 0, NULL);
+ return status;
}

/*
@@ -656,13 +670,9 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
return 0;

out_unlock:
- folio_unlock(folio);
- folio_put(folio);
+ iomap_folio_done(iter, pos, 0, folio);
iomap_write_failed(iter->inode, pos, len);

-out_no_page:
- if (page_ops && page_ops->page_done)
- page_ops->page_done(iter->inode, pos, 0, NULL);
return status;
}

@@ -712,7 +722,6 @@ static size_t iomap_write_end_inline(const struct iomap_iter *iter,
static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
- const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
loff_t old_size = iter->inode->i_size;
size_t ret;
@@ -736,11 +745,8 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
folio_may_straddle_isize(iter->inode, folio, old_size, pos);
}
- folio_unlock(folio);

- if (page_ops && page_ops->page_done)
- page_ops->page_done(iter->inode, pos, ret, &folio->page);
- folio_put(folio);
+ iomap_folio_done(iter, pos, ret, folio);

if (ret < len)
iomap_write_failed(iter->inode, pos + ret, len - ret);
--
2.38.1

2022-12-16 15:09:04

by Andreas Gruenbacher

Subject: [RFC v3 5/7] iomap: Get page in page_prepare handler

Change the iomap ->page_prepare() handler to get and return a locked
folio instead of doing that in iomap_write_begin(). This makes it
possible to recover from out-of-memory situations in ->page_prepare(),
which eliminates the corresponding error handling code in
iomap_write_begin(). The ->page_done() handler is no longer called
with a NULL folio.

Filesystems are expected to use the iomap_folio_prepare() helper for
getting locked folios in their ->page_prepare() handlers.
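
Such a ->page_prepare() handler then follows this general shape (a
sketch with placeholder names example_trans_begin() and
example_trans_end(); compare the gfs2 conversion below):

static struct folio *
example_page_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
{
	struct folio *folio;
	int status;

	status = example_trans_begin(iter->inode, pos, len);
	if (status)
		return ERR_PTR(status);

	folio = iomap_folio_prepare(iter, pos);
	if (!folio) {
		/*
		 * Undo the preparation work; iomap_write_begin() turns
		 * the NULL result into -EAGAIN or -ENOMEM.
		 */
		example_trans_end(iter->inode);
	}
	return folio;
}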

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/gfs2/bmap.c | 16 +++++++++++++---
fs/iomap/buffered-io.c | 21 +++++++++------------
include/linux/iomap.h | 9 +++++----
3 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 11115fce68cb..cd5984d3ba50 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -959,15 +959,25 @@ static int __gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
goto out;
}

-static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
- unsigned len)
+static struct folio *
+gfs2_iomap_page_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
{
+ struct inode *inode = iter->inode;
unsigned int blockmask = i_blocksize(inode) - 1;
struct gfs2_sbd *sdp = GFS2_SB(inode);
unsigned int blocks;
+ struct folio *folio;
+ int status;

blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
- return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
+ status = gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
+ if (status)
+ return ERR_PTR(status);
+
+ folio = iomap_folio_prepare(iter, pos);
+ if (!folio)
+ gfs2_trans_end(sdp);
+ return folio;
}

static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f0f167ca1b2e..6b7c1a10b8ec 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -602,7 +602,7 @@ static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,

if (page_ops && page_ops->page_done) {
page_ops->page_done(iter->inode, pos, ret, folio);
- } else if (folio) {
+ } else {
folio_unlock(folio);
folio_put(folio);
}
@@ -635,17 +635,14 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
if (!mapping_large_folio_support(iter->inode->i_mapping))
len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));

- if (page_ops && page_ops->page_prepare) {
- status = page_ops->page_prepare(iter->inode, pos, len);
- if (status)
- return status;
- }
-
- folio = iomap_folio_prepare(iter, pos);
- if (!folio) {
- status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
- iomap_folio_done(iter, pos, 0, NULL);
- return status;
+ if (page_ops && page_ops->page_prepare)
+ folio = page_ops->page_prepare(iter, pos, len);
+ else
+ folio = iomap_folio_prepare(iter, pos);
+ if (IS_ERR_OR_NULL(folio)) {
+ if (!folio)
+ return (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
+ return PTR_ERR(folio);
}

/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0bf16ef22d81..c74ab8c53b47 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -13,6 +13,7 @@
struct address_space;
struct fiemap_extent_info;
struct inode;
+struct iomap_iter;
struct iomap_dio;
struct iomap_writepage_ctx;
struct iov_iter;
@@ -131,12 +132,12 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
* associated with them.
*
* When page_prepare succeeds, page_done will always be called to do any
- * cleanup work necessary. In that page_done call, @folio will be NULL if the
- * associated folio could not be obtained. When folio is not NULL, page_done
- * is responsible for unlocking and putting the folio.
+ * cleanup work necessary. page_done is responsible for unlocking and putting
+ * @folio.
*/
struct iomap_page_ops {
- int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len);
+ struct folio *(*page_prepare)(struct iomap_iter *iter, loff_t pos,
+ unsigned len);
void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
struct folio *folio);

--
2.38.1

2022-12-16 15:09:04

by Andreas Gruenbacher

Subject: [RFC v3 6/7] iomap/xfs: Eliminate the iomap_valid handler

Eliminate the ->iomap_valid() handler by switching to a ->page_prepare()
handler and validating the mapping there.
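
The validation then lives in the filesystem's ->page_prepare() handler,
which follows roughly this pattern (sketch; example_iomap_still_valid()
is a placeholder for the filesystem's own check; compare the xfs
conversion below):

	folio = iomap_folio_prepare(iter, pos);
	if (!folio)
		return NULL;

	/*
	 * With the folio locked, check that the cached iomap is still
	 * valid. If not, drop the folio and return -ESTALE so that
	 * iomap_write_begin() sets IOMAP_F_STALE and retries.
	 */
	if (!example_iomap_still_valid(iter->inode, &iter->iomap)) {
		folio_unlock(folio);
		folio_put(folio);
		return ERR_PTR(-ESTALE);
	}
	return folio;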

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/iomap/buffered-io.c | 24 ++++--------------------
fs/xfs/xfs_iomap.c | 38 +++++++++++++++++++++++++++-----------
include/linux/iomap.h | 17 -----------------
3 files changed, 31 insertions(+), 48 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6b7c1a10b8ec..b73ff317da21 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -623,7 +623,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct folio *folio;
- int status = 0;
+ int status;

BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
if (srcmap != &iter->iomap)
@@ -642,27 +642,11 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
if (IS_ERR_OR_NULL(folio)) {
if (!folio)
return (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
- return PTR_ERR(folio);
- }
-
- /*
- * Now we have a locked folio, before we do anything with it we need to
- * check that the iomap we have cached is not stale. The inode extent
- * mapping can change due to concurrent IO in flight (e.g.
- * IOMAP_UNWRITTEN state can change and memory reclaim could have
- * reclaimed a previously partially written page at this index after IO
- * completion before this write reaches this file offset) and hence we
- * could do the wrong thing here (zero a page range incorrectly or fail
- * to zero) and corrupt data.
- */
- if (page_ops && page_ops->iomap_valid) {
- bool iomap_valid = page_ops->iomap_valid(iter->inode,
- &iter->iomap);
- if (!iomap_valid) {
+ if (folio == ERR_PTR(-ESTALE)) {
iter->iomap.flags |= IOMAP_F_STALE;
- status = 0;
- goto out_unlock;
+ return 0;
}
+ return PTR_ERR(folio);
}

if (pos + len > folio_pos(folio) + folio_size(folio))
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 669c1bc5c3a7..2248ce7be2e3 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -62,29 +62,45 @@ xfs_iomap_inode_sequence(
return cookie | READ_ONCE(ip->i_df.if_seq);
}

-/*
- * Check that the iomap passed to us is still valid for the given offset and
- * length.
- */
-static bool
-xfs_iomap_valid(
- struct inode *inode,
- const struct iomap *iomap)
+static struct folio *
+xfs_page_prepare(
+ struct iomap_iter *iter,
+ loff_t pos,
+ unsigned len)
{
+ struct inode *inode = iter->inode;
+ struct iomap *iomap = &iter->iomap;
struct xfs_inode *ip = XFS_I(inode);
+ struct folio *folio;

+ folio = iomap_folio_prepare(iter, pos);
+ if (!folio)
+ return NULL;
+
+ /*
+ * Now we have a locked folio, before we do anything with it we need to
+ * check that the iomap we have cached is not stale. The inode extent
+ * mapping can change due to concurrent IO in flight (e.g.
+ * IOMAP_UNWRITTEN state can change and memory reclaim could have
+ * reclaimed a previously partially written page at this index after IO
+ * completion before this write reaches this file offset) and hence we
+ * could do the wrong thing here (zero a page range incorrectly or fail
+ * to zero) and corrupt data.
+ */
if (iomap->validity_cookie !=
xfs_iomap_inode_sequence(ip, iomap->flags)) {
trace_xfs_iomap_invalid(ip, iomap);
- return false;
+ folio_unlock(folio);
+ folio_put(folio);
+ return ERR_PTR(-ESTALE);
}

XFS_ERRORTAG_DELAY(ip->i_mount, XFS_ERRTAG_WRITE_DELAY_MS);
- return true;
+ return folio;
}

const struct iomap_page_ops xfs_iomap_page_ops = {
- .iomap_valid = xfs_iomap_valid,
+ .page_prepare = xfs_page_prepare,
};

int
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index c74ab8c53b47..1c8b9a04b0bb 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -140,23 +140,6 @@ struct iomap_page_ops {
unsigned len);
void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
struct folio *folio);
-
- /*
- * Check that the cached iomap still maps correctly to the filesystem's
- * internal extent map. FS internal extent maps can change while iomap
- * is iterating a cached iomap, so this hook allows iomap to detect that
- * the iomap needs to be refreshed during a long running write
- * operation.
- *
- * The filesystem can store internal state (e.g. a sequence number) in
- * iomap->validity_cookie when the iomap is first mapped to be able to
- * detect changes between mapping time and whenever .iomap_valid() is
- * called.
- *
- * This is called with the folio over the specified file position held
- * locked by the iomap code.
- */
- bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
};

/*
--
2.38.1

2022-12-16 15:09:22

by Andreas Gruenbacher

Subject: [RFC v3 7/7] iomap: Rename page_ops to folio_ops

The operations in struct page_ops all operate on folios, so rename
struct page_ops to struct folio_ops, ->page_prepare() to
->folio_prepare(), and ->page_done() to ->folio_done().
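
After the rename, a filesystem wires the operations up as follows
(sketch with placeholder handler names, mirroring the gfs2 and xfs
hunks below):

static const struct iomap_folio_ops example_folio_ops = {
	.folio_prepare	= example_folio_prepare,
	.folio_done	= example_folio_done,
};

	/* in ->iomap_begin(): */
	iomap->folio_ops = &example_folio_ops;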

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/gfs2/bmap.c | 16 ++++++++--------
fs/iomap/buffered-io.c | 12 ++++++------
fs/xfs/xfs_iomap.c | 8 ++++----
include/linux/iomap.h | 22 +++++++++++-----------
4 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index cd5984d3ba50..ba8627ddc2bc 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -960,7 +960,7 @@ static int __gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
}

static struct folio *
-gfs2_iomap_page_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
+gfs2_iomap_folio_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
{
struct inode *inode = iter->inode;
unsigned int blockmask = i_blocksize(inode) - 1;
@@ -980,8 +980,8 @@ gfs2_iomap_page_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
return folio;
}

-static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
- unsigned copied, struct folio *folio)
+static void gfs2_iomap_folio_done(struct inode *inode, loff_t pos,
+ unsigned copied, struct folio *folio)
{
struct gfs2_trans *tr = current->journal_info;
struct gfs2_inode *ip = GFS2_I(inode);
@@ -1005,9 +1005,9 @@ static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
gfs2_trans_end(sdp);
}

-static const struct iomap_page_ops gfs2_iomap_page_ops = {
- .page_prepare = gfs2_iomap_page_prepare,
- .page_done = gfs2_iomap_page_done,
+static const struct iomap_folio_ops gfs2_iomap_folio_ops = {
+ .folio_prepare = gfs2_iomap_folio_prepare,
+ .folio_done = gfs2_iomap_folio_done,
};

static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
@@ -1083,7 +1083,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
}

if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip))
- iomap->page_ops = &gfs2_iomap_page_ops;
+ iomap->folio_ops = &gfs2_iomap_folio_ops;
return 0;

out_trans_end:
@@ -1299,7 +1299,7 @@ int gfs2_alloc_extent(struct inode *inode, u64 lblock, u64 *dblock,
/*
* NOTE: Never call gfs2_block_zero_range with an open transaction because it
* uses iomap write to perform its actions, which begin their own transactions
- * (iomap_begin, page_prepare, etc.)
+ * (iomap_begin, folio_prepare, etc.)
*/
static int gfs2_block_zero_range(struct inode *inode, loff_t from,
unsigned int length)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index b73ff317da21..da4570d9d1ff 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -598,10 +598,10 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,
struct folio *folio)
{
- const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
+ const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;

- if (page_ops && page_ops->page_done) {
- page_ops->page_done(iter->inode, pos, ret, folio);
+ if (folio_ops && folio_ops->folio_done) {
+ folio_ops->folio_done(iter->inode, pos, ret, folio);
} else {
folio_unlock(folio);
folio_put(folio);
@@ -620,7 +620,7 @@ static int iomap_write_begin_inline(const struct iomap_iter *iter,
static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
size_t len, struct folio **foliop)
{
- const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
+ const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct folio *folio;
int status;
@@ -635,8 +635,8 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
if (!mapping_large_folio_support(iter->inode->i_mapping))
len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));

- if (page_ops && page_ops->page_prepare)
- folio = page_ops->page_prepare(iter, pos, len);
+ if (folio_ops && folio_ops->folio_prepare)
+ folio = folio_ops->folio_prepare(iter, pos, len);
else
folio = iomap_folio_prepare(iter, pos);
if (IS_ERR_OR_NULL(folio)) {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 2248ce7be2e3..79b3f2d4c8ab 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -63,7 +63,7 @@ xfs_iomap_inode_sequence(
}

static struct folio *
-xfs_page_prepare(
+xfs_folio_prepare(
struct iomap_iter *iter,
loff_t pos,
unsigned len)
@@ -99,8 +99,8 @@ xfs_page_prepare(
return folio;
}

-const struct iomap_page_ops xfs_iomap_page_ops = {
- .page_prepare = xfs_page_prepare,
+const struct iomap_folio_ops xfs_iomap_folio_ops = {
+ .folio_prepare = xfs_folio_prepare,
};

int
@@ -149,7 +149,7 @@ xfs_bmbt_to_iomap(
iomap->flags |= IOMAP_F_DIRTY;

iomap->validity_cookie = sequence_cookie;
- iomap->page_ops = &xfs_iomap_page_ops;
+ iomap->folio_ops = &xfs_iomap_folio_ops;
return 0;
}

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 1c8b9a04b0bb..85d360881851 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -86,7 +86,7 @@ struct vm_fault;
*/
#define IOMAP_NULL_ADDR -1ULL /* addr is not valid */

-struct iomap_page_ops;
+struct iomap_folio_ops;

struct iomap {
u64 addr; /* disk offset of mapping, bytes */
@@ -98,7 +98,7 @@ struct iomap {
struct dax_device *dax_dev; /* dax_dev for dax operations */
void *inline_data;
void *private; /* filesystem private */
- const struct iomap_page_ops *page_ops;
+ const struct iomap_folio_ops *folio_ops;
u64 validity_cookie; /* used with .iomap_valid() */
};

@@ -126,19 +126,19 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
}

/*
- * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
- * and page_done will be called for each page written to. This only applies to
- * buffered writes as unbuffered writes will not typically have pages
- * associated with them.
+ * When a filesystem sets folio_ops in an iomap mapping it returns,
+ * folio_prepare and folio_done will be called for each page written to. This
+ * only applies to buffered writes as unbuffered writes will not typically have
+ * pages associated with them.
*
- * When page_prepare succeeds, page_done will always be called to do any
- * cleanup work necessary. page_done is responsible for unlocking and putting
+ * When folio_prepare succeeds, folio_done will always be called to do any
+ * cleanup work necessary. folio_done is responsible for unlocking and putting
* @folio.
*/
-struct iomap_page_ops {
- struct folio *(*page_prepare)(struct iomap_iter *iter, loff_t pos,
+struct iomap_folio_ops {
+ struct folio *(*folio_prepare)(struct iomap_iter *iter, loff_t pos,
unsigned len);
- void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
+ void (*folio_done)(struct inode *inode, loff_t pos, unsigned copied,
struct folio *folio);
};

--
2.38.1

2022-12-16 16:32:54

by Matthew Wilcox

Subject: Re: [RFC v3 5/7] iomap: Get page in page_prepare handler

On Fri, Dec 16, 2022 at 04:06:24PM +0100, Andreas Gruenbacher wrote:
> + if (page_ops && page_ops->page_prepare)
> + folio = page_ops->page_prepare(iter, pos, len);
> + else
> + folio = iomap_folio_prepare(iter, pos);
> + if (IS_ERR_OR_NULL(folio)) {
> + if (!folio)
> + return (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
> + return PTR_ERR(folio);

Wouldn't it be cleaner if iomap_folio_prepare() always
returned an ERR_PTR on failure?

2022-12-16 17:18:28

by Andreas Gruenbacher

Subject: Re: [RFC v3 5/7] iomap: Get page in page_prepare handler

On Fri, Dec 16, 2022 at 5:30 PM Matthew Wilcox <[email protected]> wrote:
> On Fri, Dec 16, 2022 at 04:06:24PM +0100, Andreas Gruenbacher wrote:
> > + if (page_ops && page_ops->page_prepare)
> > + folio = page_ops->page_prepare(iter, pos, len);
> > + else
> > + folio = iomap_folio_prepare(iter, pos);
> > + if (IS_ERR_OR_NULL(folio)) {
> > + if (!folio)
> > + return (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
> > + return PTR_ERR(folio);
>
> Wouldn't it be cleaner if iomap_folio_prepare() always
> returned an ERR_PTR on failure?

Yes indeed, thanks.

Andreas

2022-12-18 22:17:45

by Andreas Gruenbacher

Subject: [RFC v4 1/7] fs: Add folio_may_straddle_isize helper

Add a folio_may_straddle_isize() helper as a replacement for
pagecache_isize_extended() when we have a locked folio.

Use the new helper in generic_write_end(), iomap_write_end(),
ext4_write_end(), and ext4_journalled_write_end().

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/buffer.c | 5 ++---
fs/ext4/inode.c | 13 ++++++-------
fs/iomap/buffered-io.c | 3 +--
include/linux/mm.h | 2 ++
mm/truncate.c | 35 +++++++++++++++++++++++++++++++++++
5 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index d9c6d1fbb6dd..bbae1437994b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2164,16 +2164,15 @@ int generic_write_end(struct file *file, struct address_space *mapping,
* But it's important to update i_size while still holding page lock:
* page writeout could otherwise come in and zero beyond i_size.
*/
- if (pos + copied > inode->i_size) {
+ if (pos + copied > old_size) {
i_size_write(inode, pos + copied);
i_size_changed = true;
+ folio_may_straddle_isize(inode, page_folio(page), old_size, pos);
}

unlock_page(page);
put_page(page);

- if (old_size < pos)
- pagecache_isize_extended(inode, old_size, pos);
/*
* Don't mark the inode dirty under page lock. First, it unnecessarily
* makes the holding time of page lock longer. Second, it forces lock
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9d9f414f99fe..6fe1c9609d86 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1327,13 +1327,13 @@ static int ext4_write_end(struct file *file,
* If FS_IOC_ENABLE_VERITY is running on this inode, then Merkle tree
* blocks are being written past EOF, so skip the i_size update.
*/
- if (!verity)
+ if (!verity) {
i_size_changed = ext4_update_inode_size(inode, pos + copied);
+ folio_may_straddle_isize(inode, page_folio(page), old_size, pos);
+ }
unlock_page(page);
put_page(page);

- if (old_size < pos && !verity)
- pagecache_isize_extended(inode, old_size, pos);
/*
* Don't mark the inode dirty under page lock. First, it unnecessarily
* makes the holding time of page lock longer. Second, it forces lock
@@ -1439,16 +1439,15 @@ static int ext4_journalled_write_end(struct file *file,
if (!partial)
SetPageUptodate(page);
}
- if (!verity)
+ if (!verity) {
size_changed = ext4_update_inode_size(inode, pos + copied);
+ folio_may_straddle_isize(inode, page_folio(page), old_size, pos);
+ }
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
unlock_page(page);
put_page(page);

- if (old_size < pos && !verity)
- pagecache_isize_extended(inode, old_size, pos);
-
if (size_changed) {
ret2 = ext4_mark_inode_dirty(handle, inode);
if (!ret)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 356193e44cf0..347010c6a652 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -734,11 +734,10 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
if (pos + ret > old_size) {
i_size_write(iter->inode, pos + ret);
iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
+ folio_may_straddle_isize(iter->inode, folio, old_size, pos);
}
folio_unlock(folio);

- if (old_size < pos)
- pagecache_isize_extended(iter->inode, old_size, pos);
if (page_ops && page_ops->page_done)
page_ops->page_done(iter->inode, pos, ret, &folio->page);
folio_put(folio);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8178fe894e2e..a8632747780e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2016,6 +2016,8 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,

extern void truncate_pagecache(struct inode *inode, loff_t new);
extern void truncate_setsize(struct inode *inode, loff_t newsize);
+void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
+ loff_t old_size, loff_t start);
void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
int generic_error_remove_page(struct address_space *mapping, struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 7b4ea4c4a46b..971b08399144 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -769,6 +769,41 @@ void truncate_setsize(struct inode *inode, loff_t newsize)
}
EXPORT_SYMBOL(truncate_setsize);

+/**
+ * folio_may_straddle_isize - update pagecache after extending i_size
+ * @inode: inode for which i_size was extended
+ * @folio: folio to maybe mark read-only
+ * @old_size: original inode size
+ * @start: start of the write
+ *
+ * Handle extending an inode by a write that starts beyond the old inode size.
+ * If a block-aligned hole exists between the old inode size and the start of
+ * the write, we mark the folio read-only so that page_mkwrite() is called on
+ * the next write access to the page. That way, the filesystem can be sure
+ * that page_mkwrite() is called on the page before a user writes to the page
+ * via mmap.
+ *
+ * This function must be called while we still hold i_rwsem - this not only
+ * makes sure i_size is stable but also that userspace cannot observe the new
+ * i_size value before we are prepared to handle mmap writes there.
+ */
+void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
+ loff_t old_size, loff_t start)
+{
+ unsigned int blocksize = i_blocksize(inode);
+
+ if (round_up(old_size, blocksize) >= round_down(start, blocksize))
+ return;
+
+ /*
+ * See clear_page_dirty_for_io() for details why folio_set_dirty()
+ * is needed.
+ */
+ if (folio_mkclean(folio))
+ folio_set_dirty(folio);
+}
+EXPORT_SYMBOL(folio_may_straddle_isize);
+
/**
* pagecache_isize_extended - update pagecache after extension of i_size
* @inode: inode for which i_size was extended
--
2.38.1

2022-12-18 22:17:46

by Andreas Gruenbacher

Subject: [RFC v4 2/7] iomap: Add iomap_folio_done helper

Add an iomap_folio_done() helper to encapsulate unlocking the folio,
calling ->page_done(), and putting the folio. This doesn't change the
functionality.

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/iomap/buffered-io.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 347010c6a652..8ce9abb29d46 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -575,6 +575,19 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
return 0;
}

+static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,
+ struct folio *folio)
+{
+ const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
+
+ if (folio)
+ folio_unlock(folio);
+ if (page_ops && page_ops->page_done)
+ page_ops->page_done(iter->inode, pos, ret, &folio->page);
+ if (folio)
+ folio_put(folio);
+}
+
static int iomap_write_begin_inline(const struct iomap_iter *iter,
struct folio *folio)
{
@@ -616,7 +629,8 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
fgp, mapping_gfp_mask(iter->inode->i_mapping));
if (!folio) {
status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
- goto out_no_page;
+ iomap_folio_done(iter, pos, 0, NULL);
+ return status;
}

/*
@@ -656,13 +670,9 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
return 0;

out_unlock:
- folio_unlock(folio);
- folio_put(folio);
+ iomap_folio_done(iter, pos, 0, folio);
iomap_write_failed(iter->inode, pos, len);

-out_no_page:
- if (page_ops && page_ops->page_done)
- page_ops->page_done(iter->inode, pos, 0, NULL);
return status;
}

@@ -712,7 +722,6 @@ static size_t iomap_write_end_inline(const struct iomap_iter *iter,
static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
size_t copied, struct folio *folio)
{
- const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
loff_t old_size = iter->inode->i_size;
size_t ret;
@@ -736,11 +745,8 @@ static size_t iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
iter->iomap.flags |= IOMAP_F_SIZE_CHANGED;
folio_may_straddle_isize(iter->inode, folio, old_size, pos);
}
- folio_unlock(folio);

- if (page_ops && page_ops->page_done)
- page_ops->page_done(iter->inode, pos, ret, &folio->page);
- folio_put(folio);
+ iomap_folio_done(iter, pos, ret, folio);

if (ret < len)
iomap_write_failed(iter->inode, pos + ret, len - ret);
--
2.38.1

2022-12-18 22:18:14

by Andreas Gruenbacher

Subject: [RFC v4 4/7] iomap: Add iomap_folio_prepare helper

Add an iomap_folio_prepare() helper that gets a folio reference based on
an iomap iterator and an offset into the address space.

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/iomap/buffered-io.c | 37 ++++++++++++++++++++++++++++---------
include/linux/iomap.h | 1 +
2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 517ad5380a62..93647dae934a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -457,6 +457,31 @@ bool iomap_is_partially_uptodate(struct folio *folio, size_t from, size_t count)
}
EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);

+/**
+ * iomap_folio_prepare - get a folio reference for writing
+ * @iter: iteration structure
+ * @pos: start offset of write
+ *
+ * Returns a locked reference to the folio at @pos, or an error pointer if the
+ * folio could not be obtained.
+ */
+struct folio *iomap_folio_prepare(struct iomap_iter *iter, loff_t pos)
+{
+ unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
+ struct folio *folio;
+
+ if (iter->flags & IOMAP_NOWAIT)
+ fgp |= FGP_NOWAIT;
+
+ folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
+ fgp, mapping_gfp_mask(iter->inode->i_mapping));
+ if (folio)
+ return folio;
+
+ return ERR_PTR((iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM);
+}
+EXPORT_SYMBOL(iomap_folio_prepare);
+
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags)
{
trace_iomap_release_folio(folio->mapping->host, folio_pos(folio),
@@ -603,12 +628,8 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct folio *folio;
- unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
int status = 0;

- if (iter->flags & IOMAP_NOWAIT)
- fgp |= FGP_NOWAIT;
-
BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
if (srcmap != &iter->iomap)
BUG_ON(pos + len > srcmap->offset + srcmap->length);
@@ -625,12 +646,10 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
return status;
}

- folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
- fgp, mapping_gfp_mask(iter->inode->i_mapping));
- if (!folio) {
- status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
+ folio = iomap_folio_prepare(iter, pos);
+ if (IS_ERR(folio)) {
iomap_folio_done(iter, pos, 0, NULL);
- return status;
+ return PTR_ERR(folio);
}

/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 743e2a909162..0bf16ef22d81 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -261,6 +261,7 @@ int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
+struct folio *iomap_folio_prepare(struct iomap_iter *iter, loff_t pos);
bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags);
void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len);
int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
--
2.38.1

2022-12-18 22:18:15

by Andreas Gruenbacher

Subject: [RFC v4 0/7] Turn iomap_page_ops into iomap_folio_ops

Here's an updated version that changes iomap_folio_prepare() to return
an ERR_PTR() instead of NULL when the folio cannot be obtained, as
suggested by Matthew Wilcox.
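
Callers can then check for failure uniformly (sketch, matching the
updated iomap_write_begin() in patch 4):

	folio = iomap_folio_prepare(iter, pos);
	if (IS_ERR(folio))
		return PTR_ERR(folio);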

Thanks,
Andreas

Andreas Gruenbacher (7):
fs: Add folio_may_straddle_isize helper
iomap: Add iomap_folio_done helper
iomap/gfs2: Unlock and put folio in page_done handler
iomap: Add iomap_folio_prepare helper
iomap/gfs2: Get page in page_prepare handler
iomap/xfs: Eliminate the iomap_valid handler
iomap: Rename page_ops to folio_ops

fs/buffer.c | 5 +--
fs/ext4/inode.c | 13 +++---
fs/gfs2/bmap.c | 43 ++++++++++++------
fs/iomap/buffered-io.c | 98 ++++++++++++++++++++++--------------------
fs/xfs/xfs_iomap.c | 42 ++++++++++++------
include/linux/iomap.h | 46 +++++++-------------
include/linux/mm.h | 2 +
mm/truncate.c | 35 +++++++++++++++
8 files changed, 172 insertions(+), 112 deletions(-)

--
2.38.1

2022-12-18 22:18:24

by Andreas Gruenbacher

Subject: [RFC v4 3/7] iomap/gfs2: Unlock and put folio in page_done handler

When an iomap defines a ->page_done() handler in its page_ops, delegate
unlocking the folio and putting the folio reference to that handler.

This fixes a race between journaled data writes and folio writeback in
gfs2: before this change, gfs2_iomap_page_done() was called after
unlocking the folio, so writeback could start writing back the folio's
buffers before they could be marked for writing to the journal. Also,
try_to_free_buffers() could free the buffers before
gfs2_iomap_page_done() was done adding them to the current transaction.
With this change, gfs2_iomap_page_done() can add the buffers to the
current transaction while the folio is still locked. It can then unlock
the folio and complete the current transaction.

(If we just moved the entire ->page_done() handler under the folio lock,
dirtying the inode could deadlock with the locked folio on filesystems
with a block size smaller than the page size.)

The only current user of ->page_done() is gfs2, so other filesystems are
not affected. Still, to catch out any new users, switch from a page to
a folio in ->page_done().

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/gfs2/bmap.c | 15 ++++++++++++---
fs/iomap/buffered-io.c | 8 ++++----
include/linux/iomap.h | 7 ++++---
3 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 3bdb2c668a71..11115fce68cb 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -971,14 +971,23 @@ static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
}

static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
- unsigned copied, struct page *page)
+ unsigned copied, struct folio *folio)
{
struct gfs2_trans *tr = current->journal_info;
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);

- if (page && !gfs2_is_stuffed(ip))
- gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+ if (!folio) {
+ gfs2_trans_end(sdp);
+ return;
+ }
+
+ if (!gfs2_is_stuffed(ip))
+ gfs2_page_add_databufs(ip, &folio->page, offset_in_page(pos),
+ copied);
+
+ folio_unlock(folio);
+ folio_put(folio);

if (tr->tr_num_buf_new)
__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 8ce9abb29d46..517ad5380a62 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -580,12 +580,12 @@ static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,
{
const struct iomap_page_ops *page_ops = iter->iomap.page_ops;

- if (folio)
+ if (page_ops && page_ops->page_done) {
+ page_ops->page_done(iter->inode, pos, ret, folio);
+ } else if (folio) {
folio_unlock(folio);
- if (page_ops && page_ops->page_done)
- page_ops->page_done(iter->inode, pos, ret, &folio->page);
- if (folio)
folio_put(folio);
+ }
}

static int iomap_write_begin_inline(const struct iomap_iter *iter,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0983dfc9a203..743e2a909162 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -131,13 +131,14 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
* associated with them.
*
* When page_prepare succeeds, page_done will always be called to do any
- * cleanup work necessary. In that page_done call, @page will be NULL if the
- * associated page could not be obtained.
+ * cleanup work necessary. In that page_done call, @folio will be NULL if the
+ * associated folio could not be obtained. When folio is not NULL, page_done
+ * is responsible for unlocking and putting the folio.
*/
struct iomap_page_ops {
int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len);
void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
- struct page *page);
+ struct folio *folio);

/*
* Check that the cached iomap still maps correctly to the filesystem's
--
2.38.1

2022-12-18 22:18:26

by Andreas Gruenbacher

Subject: [RFC v4 7/7] iomap: Rename page_ops to folio_ops

The operations in struct page_ops all operate on folios, so rename
struct page_ops to struct folio_ops, ->page_prepare() to
->folio_prepare(), and ->page_done() to ->folio_done().

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/gfs2/bmap.c | 16 ++++++++--------
fs/iomap/buffered-io.c | 12 ++++++------
fs/xfs/xfs_iomap.c | 8 ++++----
include/linux/iomap.h | 22 +++++++++++-----------
4 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index ed8ceed64100..57282e3720be 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -960,7 +960,7 @@ static int __gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
}

static struct folio *
-gfs2_iomap_page_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
+gfs2_iomap_folio_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
{
struct inode *inode = iter->inode;
unsigned int blockmask = i_blocksize(inode) - 1;
@@ -980,8 +980,8 @@ gfs2_iomap_page_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
return folio;
}

-static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
- unsigned copied, struct folio *folio)
+static void gfs2_iomap_folio_done(struct inode *inode, loff_t pos,
+ unsigned copied, struct folio *folio)
{
struct gfs2_trans *tr = current->journal_info;
struct gfs2_inode *ip = GFS2_I(inode);
@@ -1005,9 +1005,9 @@ static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
gfs2_trans_end(sdp);
}

-static const struct iomap_page_ops gfs2_iomap_page_ops = {
- .page_prepare = gfs2_iomap_page_prepare,
- .page_done = gfs2_iomap_page_done,
+static const struct iomap_folio_ops gfs2_iomap_folio_ops = {
+ .folio_prepare = gfs2_iomap_folio_prepare,
+ .folio_done = gfs2_iomap_folio_done,
};

static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
@@ -1083,7 +1083,7 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
}

if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip))
- iomap->page_ops = &gfs2_iomap_page_ops;
+ iomap->folio_ops = &gfs2_iomap_folio_ops;
return 0;

out_trans_end:
@@ -1299,7 +1299,7 @@ int gfs2_alloc_extent(struct inode *inode, u64 lblock, u64 *dblock,
/*
* NOTE: Never call gfs2_block_zero_range with an open transaction because it
* uses iomap write to perform its actions, which begin their own transactions
- * (iomap_begin, page_prepare, etc.)
+ * (iomap_begin, folio_prepare, etc.)
*/
static int gfs2_block_zero_range(struct inode *inode, loff_t from,
unsigned int length)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 32a2a287d32c..1d1ca1fb4a88 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -603,10 +603,10 @@ static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,
struct folio *folio)
{
- const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
+ const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;

- if (page_ops && page_ops->page_done) {
- page_ops->page_done(iter->inode, pos, ret, folio);
+ if (folio_ops && folio_ops->folio_done) {
+ folio_ops->folio_done(iter->inode, pos, ret, folio);
} else {
folio_unlock(folio);
folio_put(folio);
@@ -625,7 +625,7 @@ static int iomap_write_begin_inline(const struct iomap_iter *iter,
static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
size_t len, struct folio **foliop)
{
- const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
+ const struct iomap_folio_ops *folio_ops = iter->iomap.folio_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct folio *folio;
int status;
@@ -640,8 +640,8 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
if (!mapping_large_folio_support(iter->inode->i_mapping))
len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));

- if (page_ops && page_ops->page_prepare)
- folio = page_ops->page_prepare(iter, pos, len);
+ if (folio_ops && folio_ops->folio_prepare)
+ folio = folio_ops->folio_prepare(iter, pos, len);
else
folio = iomap_folio_prepare(iter, pos);
if (IS_ERR(folio)) {
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index ae83cb89279d..04ba368395cc 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -63,7 +63,7 @@ xfs_iomap_inode_sequence(
}

static struct folio *
-xfs_page_prepare(
+xfs_folio_prepare(
struct iomap_iter *iter,
loff_t pos,
unsigned len)
@@ -99,8 +99,8 @@ xfs_page_prepare(
return folio;
}

-const struct iomap_page_ops xfs_iomap_page_ops = {
- .page_prepare = xfs_page_prepare,
+const struct iomap_folio_ops xfs_iomap_folio_ops = {
+ .folio_prepare = xfs_folio_prepare,
};

int
@@ -149,7 +149,7 @@ xfs_bmbt_to_iomap(
iomap->flags |= IOMAP_F_DIRTY;

iomap->validity_cookie = sequence_cookie;
- iomap->page_ops = &xfs_iomap_page_ops;
+ iomap->folio_ops = &xfs_iomap_folio_ops;
return 0;
}

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 1c8b9a04b0bb..85d360881851 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -86,7 +86,7 @@ struct vm_fault;
*/
#define IOMAP_NULL_ADDR -1ULL /* addr is not valid */

-struct iomap_page_ops;
+struct iomap_folio_ops;

struct iomap {
u64 addr; /* disk offset of mapping, bytes */
@@ -98,7 +98,7 @@ struct iomap {
struct dax_device *dax_dev; /* dax_dev for dax operations */
void *inline_data;
void *private; /* filesystem private */
- const struct iomap_page_ops *page_ops;
+ const struct iomap_folio_ops *folio_ops;
u64 validity_cookie; /* used with .iomap_valid() */
};

@@ -126,19 +126,19 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
}

/*
- * When a filesystem sets page_ops in an iomap mapping it returns, page_prepare
- * and page_done will be called for each page written to. This only applies to
- * buffered writes as unbuffered writes will not typically have pages
- * associated with them.
+ * When a filesystem sets folio_ops in an iomap mapping it returns,
+ * folio_prepare and folio_done will be called for each page written to. This
+ * only applies to buffered writes as unbuffered writes will not typically have
+ * pages associated with them.
*
- * When page_prepare succeeds, page_done will always be called to do any
- * cleanup work necessary. page_done is responsible for unlocking and putting
+ * When folio_prepare succeeds, folio_done will always be called to do any
+ * cleanup work necessary. folio_done is responsible for unlocking and putting
* @folio.
*/
-struct iomap_page_ops {
- struct folio *(*page_prepare)(struct iomap_iter *iter, loff_t pos,
+struct iomap_folio_ops {
+ struct folio *(*folio_prepare)(struct iomap_iter *iter, loff_t pos,
unsigned len);
- void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
+ void (*folio_done)(struct inode *inode, loff_t pos, unsigned copied,
struct folio *folio);
};

--
2.38.1

2022-12-18 22:18:47

by Andreas Gruenbacher

[permalink] [raw]
Subject: [RFC v4 5/7] iomap/gfs2: Get page in page_prepare handler

Change the iomap ->page_prepare() handler to get and return a locked
folio instead of doing that in iomap_write_begin(). This makes it
possible to recover from out-of-memory situations in ->page_prepare(),
which eliminates the corresponding error handling code in
iomap_write_begin(). The ->page_done() handler is no longer called
with a NULL folio.

Filesystems are expected to use the iomap_folio_prepare() helper for
getting locked folios in their ->page_prepare() handlers.

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/gfs2/bmap.c | 16 +++++++++++++---
fs/iomap/buffered-io.c | 17 ++++++-----------
include/linux/iomap.h | 9 +++++----
3 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 11115fce68cb..ed8ceed64100 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -959,15 +959,25 @@ static int __gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
goto out;
}

-static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
- unsigned len)
+static struct folio *
+gfs2_iomap_page_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
{
+ struct inode *inode = iter->inode;
unsigned int blockmask = i_blocksize(inode) - 1;
struct gfs2_sbd *sdp = GFS2_SB(inode);
unsigned int blocks;
+ struct folio *folio;
+ int status;

blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
- return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
+ status = gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
+ if (status)
+ return ERR_PTR(status);
+
+ folio = iomap_folio_prepare(iter, pos);
+ if (IS_ERR(folio))
+ gfs2_trans_end(sdp);
+ return folio;
}

static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 93647dae934a..819562633998 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -607,7 +607,7 @@ static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,

if (page_ops && page_ops->page_done) {
page_ops->page_done(iter->inode, pos, ret, folio);
- } else if (folio) {
+ } else {
folio_unlock(folio);
folio_put(folio);
}
@@ -640,17 +640,12 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
if (!mapping_large_folio_support(iter->inode->i_mapping))
len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));

- if (page_ops && page_ops->page_prepare) {
- status = page_ops->page_prepare(iter->inode, pos, len);
- if (status)
- return status;
- }
-
- folio = iomap_folio_prepare(iter, pos);
- if (IS_ERR(folio)) {
- iomap_folio_done(iter, pos, 0, NULL);
+ if (page_ops && page_ops->page_prepare)
+ folio = page_ops->page_prepare(iter, pos, len);
+ else
+ folio = iomap_folio_prepare(iter, pos);
+ if (IS_ERR(folio))
return PTR_ERR(folio);
- }

/*
* Now we have a locked folio, before we do anything with it we need to
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0bf16ef22d81..c74ab8c53b47 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -13,6 +13,7 @@
struct address_space;
struct fiemap_extent_info;
struct inode;
+struct iomap_iter;
struct iomap_dio;
struct iomap_writepage_ctx;
struct iov_iter;
@@ -131,12 +132,12 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap)
* associated with them.
*
* When page_prepare succeeds, page_done will always be called to do any
- * cleanup work necessary. In that page_done call, @folio will be NULL if the
- * associated folio could not be obtained. When folio is not NULL, page_done
- * is responsible for unlocking and putting the folio.
+ * cleanup work necessary. page_done is responsible for unlocking and putting
+ * @folio.
*/
struct iomap_page_ops {
- int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len);
+ struct folio *(*page_prepare)(struct iomap_iter *iter, loff_t pos,
+ unsigned len);
void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
struct folio *folio);

--
2.38.1

2022-12-18 22:20:00

by Andreas Gruenbacher

[permalink] [raw]
Subject: [RFC v4 6/7] iomap/xfs: Eliminate the iomap_valid handler

Eliminate the ->iomap_valid() handler by switching to a ->page_prepare()
handler and validating the mapping there.

Signed-off-by: Andreas Gruenbacher <[email protected]>
---
fs/iomap/buffered-io.c | 25 +++++--------------------
fs/xfs/xfs_iomap.c | 38 +++++++++++++++++++++++++++-----------
include/linux/iomap.h | 17 -----------------
3 files changed, 32 insertions(+), 48 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 819562633998..32a2a287d32c 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -628,7 +628,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
const struct iomap *srcmap = iomap_iter_srcmap(iter);
struct folio *folio;
- int status = 0;
+ int status;

BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
if (srcmap != &iter->iomap)
@@ -644,27 +644,12 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
folio = page_ops->page_prepare(iter, pos, len);
else
folio = iomap_folio_prepare(iter, pos);
- if (IS_ERR(folio))
- return PTR_ERR(folio);
-
- /*
- * Now we have a locked folio, before we do anything with it we need to
- * check that the iomap we have cached is not stale. The inode extent
- * mapping can change due to concurrent IO in flight (e.g.
- * IOMAP_UNWRITTEN state can change and memory reclaim could have
- * reclaimed a previously partially written page at this index after IO
- * completion before this write reaches this file offset) and hence we
- * could do the wrong thing here (zero a page range incorrectly or fail
- * to zero) and corrupt data.
- */
- if (page_ops && page_ops->iomap_valid) {
- bool iomap_valid = page_ops->iomap_valid(iter->inode,
- &iter->iomap);
- if (!iomap_valid) {
+ if (IS_ERR(folio)) {
+ if (folio == ERR_PTR(-ESTALE)) {
iter->iomap.flags |= IOMAP_F_STALE;
- status = 0;
- goto out_unlock;
+ return 0;
}
+ return PTR_ERR(folio);
}

if (pos + len > folio_pos(folio) + folio_size(folio))
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 669c1bc5c3a7..ae83cb89279d 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -62,29 +62,45 @@ xfs_iomap_inode_sequence(
return cookie | READ_ONCE(ip->i_df.if_seq);
}

-/*
- * Check that the iomap passed to us is still valid for the given offset and
- * length.
- */
-static bool
-xfs_iomap_valid(
- struct inode *inode,
- const struct iomap *iomap)
+static struct folio *
+xfs_page_prepare(
+ struct iomap_iter *iter,
+ loff_t pos,
+ unsigned len)
{
+ struct inode *inode = iter->inode;
+ struct iomap *iomap = &iter->iomap;
struct xfs_inode *ip = XFS_I(inode);
+ struct folio *folio;

+ folio = iomap_folio_prepare(iter, pos);
+ if (IS_ERR(folio))
+ return folio;
+
+ /*
+ * Now we have a locked folio, before we do anything with it we need to
+ * check that the iomap we have cached is not stale. The inode extent
+ * mapping can change due to concurrent IO in flight (e.g.
+ * IOMAP_UNWRITTEN state can change and memory reclaim could have
+ * reclaimed a previously partially written page at this index after IO
+ * completion before this write reaches this file offset) and hence we
+ * could do the wrong thing here (zero a page range incorrectly or fail
+ * to zero) and corrupt data.
+ */
if (iomap->validity_cookie !=
xfs_iomap_inode_sequence(ip, iomap->flags)) {
trace_xfs_iomap_invalid(ip, iomap);
- return false;
+ folio_unlock(folio);
+ folio_put(folio);
+ return ERR_PTR(-ESTALE);
}

XFS_ERRORTAG_DELAY(ip->i_mount, XFS_ERRTAG_WRITE_DELAY_MS);
- return true;
+ return folio;
}

const struct iomap_page_ops xfs_iomap_page_ops = {
- .iomap_valid = xfs_iomap_valid,
+ .page_prepare = xfs_page_prepare,
};

int
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index c74ab8c53b47..1c8b9a04b0bb 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -140,23 +140,6 @@ struct iomap_page_ops {
unsigned len);
void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
struct folio *folio);
-
- /*
- * Check that the cached iomap still maps correctly to the filesystem's
- * internal extent map. FS internal extent maps can change while iomap
- * is iterating a cached iomap, so this hook allows iomap to detect that
- * the iomap needs to be refreshed during a long running write
- * operation.
- *
- * The filesystem can store internal state (e.g. a sequence number) in
- * iomap->validity_cookie when the iomap is first mapped to be able to
- * detect changes between mapping time and whenever .iomap_valid() is
- * called.
- *
- * This is called with the folio over the specified file position held
- * locked by the iomap code.
- */
- bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
};

/*
--
2.38.1

2022-12-23 15:06:58

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 1/7] fs: Add folio_may_straddle_isize helper

On Fri, Dec 16, 2022 at 04:06:20PM +0100, Andreas Gruenbacher wrote:
> Add a folio_may_straddle_isize() helper as a replacement for
> pagecache_isize_extended() when we have a locked folio.

I find the naming very confusing. Any good reason not to follow
the naming of pagecache_isize_extended and call it
folio_isize_extended?

> Use the new helper in generic_write_end(), iomap_write_end(),
> ext4_write_end(), and ext4_journalled_write_end().

Please split this into a patch per caller in addition to the one
adding the helper, and write commit logs explaining the rationale
for the helper. The obvious ones I'm trying to guess are that
the new helper avoids a page cache radix tree lookup and a lock
page/folio cycle, but I'd rather hear that from the horse's mouth
in the commit log.

> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -2164,16 +2164,15 @@ int generic_write_end(struct file *file, struct address_space *mapping,
> * But it's important to update i_size while still holding page lock:
> * page writeout could otherwise come in and zero beyond i_size.
> */
> - if (pos + copied > inode->i_size) {
> + if (pos + copied > old_size) {

This is an unrelated and undocumented (but useful) change. Please split
it out as well.

> + * This function must be called while we still hold i_rwsem - this not only
> + * makes sure i_size is stable but also that userspace cannot observe the new
> + * i_size value before we are prepared to handle mmap writes there.

Please add a lockdep_assert_held_write to enforce that.
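
For example, roughly (a sketch only; the function body is quoted below):

	void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
			loff_t old_size, loff_t start)
	{
		/* Enforce the locking requirement described above. */
		lockdep_assert_held_write(&inode->i_rwsem);
		...
	}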

> +void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
> + loff_t old_size, loff_t start)
> +{
> + unsigned int blocksize = i_blocksize(inode);
> +
> + if (round_up(old_size, blocksize) >= round_down(start, blocksize))
> + return;
> +
> + /*
> + * See clear_page_dirty_for_io() for details why folio_set_dirty()
> + * is needed.
> + */
> + if (folio_mkclean(folio))
> + folio_set_dirty(folio);

Should pagecache_isize_extended be rewritten to use this helper,
i.e. turn this into a factoring out of a helper?
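
Something along these lines, maybe (a sketch only; whether the helper's
boundary check is actually sufficient for this case would need to be
verified):

	void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
	{
		struct folio *folio;

		/*
		 * Find and lock the folio straddling the old i_size;
		 * filemap_lock_folio() returns NULL if there is no folio
		 * in the page cache.
		 */
		folio = filemap_lock_folio(inode->i_mapping, from / PAGE_SIZE);
		if (!folio)
			return;
		folio_may_straddle_isize(inode, folio, from, to);
		folio_unlock(folio);
		folio_put(folio);
	}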

> +EXPORT_SYMBOL(folio_may_straddle_isize);

Please make this an EXPORT_SYMBOL_GPL just like folio_mkclean.

2022-12-23 15:12:43

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 2/7] iomap: Add iomap_folio_done helper

On Fri, Dec 16, 2022 at 04:06:21PM +0100, Andreas Gruenbacher wrote:
> +static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,
> + struct folio *folio)
> +{
> + const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> +
> + if (folio)
> + folio_unlock(folio);
> + if (page_ops && page_ops->page_done)
> + page_ops->page_done(iter->inode, pos, ret, &folio->page);
> + if (folio)
> + folio_put(folio);
> +}

How is the folio dereference going to work if folio is NULL?

That being said, I really wonder if the current API is the right way to
go. Can't we just have a ->get_folio method with the same signature as
__filemap_get_folio, and then do the __filemap_get_folio from the file
system and avoid the page/folio == NULL cleanup path entirely? Then on
the done side move the unlock and put into the done method as well.
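
Concretely, something like this, maybe (a sketch of the shape I have in
mind; the names are placeholders):

	struct iomap_page_ops {
		struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
				unsigned len);
		void (*put_folio)(struct inode *inode, loff_t pos,
				unsigned copied, struct folio *folio);
	};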

> if (!folio) {
> status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
> - goto out_no_page;
> + iomap_folio_done(iter, pos, 0, NULL);
> + return status;
> }
>
> /*
> @@ -656,13 +670,9 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> return 0;
>
> out_unlock:
> - folio_unlock(folio);
> - folio_put(folio);
> + iomap_folio_done(iter, pos, 0, folio);
> iomap_write_failed(iter->inode, pos, len);
>
> -out_no_page:
> - if (page_ops && page_ops->page_done)
> - page_ops->page_done(iter->inode, pos, 0, NULL);
> return status;

But for the current version I don't really understand why the error
unwinding changes here.

2022-12-23 15:17:41

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 3/7] iomap/gfs2: Unlock and put folio in page_done handler

Ah yes, this is what I meant in the second half of my reply to the last patch.
The page_done name is a bit misleading now, though.

2022-12-23 15:22:22

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 4/7] iomap: Add iomap_folio_prepare helper

> +struct folio *iomap_folio_prepare(struct iomap_iter *iter, loff_t pos)
> +{
> + unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
> +
> + if (iter->flags & IOMAP_NOWAIT)
> + fgp |= FGP_NOWAIT;
> +
> + return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
> + fgp, mapping_gfp_mask(iter->inode->i_mapping));
> +}
> +EXPORT_SYMBOL(iomap_folio_prepare);

I'd name this __iomap_get_folio to match __filemap_get_folio.
And all iomap exports are EXPORT_SYMBOL_GPL.

2022-12-23 15:24:06

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 5/7] iomap: Get page in page_prepare handler

On Fri, Dec 16, 2022 at 04:06:24PM +0100, Andreas Gruenbacher wrote:
> Change the iomap ->page_prepare() handler to get and return a locked
> folio instead of doing that in iomap_write_begin(). This makes it
> possible to recover from out-of-memory situations in ->page_prepare(),
> which eliminates the corresponding error handling code in
> iomap_write_begin(). The ->page_done() handler is no longer called
> with a NULL folio.

Ah, okay - this is the other half of what I asked for earlier, so
we're aligned. Sorry for the noise. I'd still prefer the naming I
suggested, though.

> + if (page_ops && page_ops->page_prepare)
> + folio = page_ops->page_prepare(iter, pos, len);
> + else
> + folio = iomap_folio_prepare(iter, pos);
> + if (IS_ERR_OR_NULL(folio)) {
> + if (!folio)
> + return (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
> + return PTR_ERR(folio);
> }

Maybe encapsulate this in an iomap_get_folio wrapper just to keep the
symmetry with the done side.
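
Such a wrapper might look roughly like this (a sketch on top of this
series):

	static struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos,
			size_t len)
	{
		const struct iomap_page_ops *page_ops = iter->iomap.page_ops;

		if (page_ops && page_ops->page_prepare)
			return page_ops->page_prepare(iter, pos, len);
		return iomap_folio_prepare(iter, pos);
	}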

2022-12-23 15:24:44

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 6/7] iomap/xfs: Eliminate the iomap_valid handler

On Fri, Dec 16, 2022 at 04:06:25PM +0100, Andreas Gruenbacher wrote:
> Eliminate the ->iomap_valid() handler by switching to a ->page_prepare()
> handler and validating the mapping there.
>
> Signed-off-by: Andreas Gruenbacher <[email protected]>
> ---
> fs/iomap/buffered-io.c | 24 ++++--------------------
> fs/xfs/xfs_iomap.c | 38 +++++++++++++++++++++++++++-----------
> include/linux/iomap.h | 17 -----------------
> 3 files changed, 31 insertions(+), 48 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 6b7c1a10b8ec..b73ff317da21 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -623,7 +623,7 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> const struct iomap *srcmap = iomap_iter_srcmap(iter);
> struct folio *folio;
> - int status = 0;
> + int status;
>
> BUG_ON(pos + len > iter->iomap.offset + iter->iomap.length);
> if (srcmap != &iter->iomap)
> @@ -642,27 +642,11 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> if (IS_ERR_OR_NULL(folio)) {
> if (!folio)
> return (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
> + if (folio == ERR_PTR(-ESTALE)) {
> iter->iomap.flags |= IOMAP_F_STALE;
> + return 0;
> }
> + return PTR_ERR(folio);
> }
>
> if (pos + len > folio_pos(folio) + folio_size(folio))
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 669c1bc5c3a7..2248ce7be2e3 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -62,29 +62,45 @@ xfs_iomap_inode_sequence(
> return cookie | READ_ONCE(ip->i_df.if_seq);
> }
>
> -/*
> - * Check that the iomap passed to us is still valid for the given offset and
> - * length.
> - */
> -static bool
> -xfs_iomap_valid(
> - struct inode *inode,
> - const struct iomap *iomap)
> +static struct folio *
> +xfs_page_prepare(
> + struct iomap_iter *iter,
> + loff_t pos,
> + unsigned len)
> {
> + struct inode *inode = iter->inode;
> + struct iomap *iomap = &iter->iomap;
> struct xfs_inode *ip = XFS_I(inode);
> + struct folio *folio;

Please tab align this like the other variable declarations above.

> - /*
> - * Check that the cached iomap still maps correctly to the filesystem's
> - * internal extent map. FS internal extent maps can change while iomap
> - * is iterating a cached iomap, so this hook allows iomap to detect that
> - * the iomap needs to be refreshed during a long running write
> - * operation.
> - *
> - * The filesystem can store internal state (e.g. a sequence number) in
> - * iomap->validity_cookie when the iomap is first mapped to be able to
> - * detect changes between mapping time and whenever .iomap_valid() is
> - * called.
> - *
> - * This is called with the folio over the specified file position held
> - * locked by the iomap code.
> - */

We'll still need to capture this information somewhere. I'd suggest
moving it to the prepare/get method and rewording it so that this is
mentioned as an additional use case / requirement.

2022-12-23 20:57:32

by Andreas Grünbacher

[permalink] [raw]
Subject: Re: [RFC v3 2/7] iomap: Add iomap_folio_done helper

On Fri, Dec 23, 2022 at 4:12 PM Christoph Hellwig
<[email protected]> wrote:
> On Fri, Dec 16, 2022 at 04:06:21PM +0100, Andreas Gruenbacher wrote:
> > +static void iomap_folio_done(struct iomap_iter *iter, loff_t pos, size_t ret,
> > + struct folio *folio)
> > +{
> > + const struct iomap_page_ops *page_ops = iter->iomap.page_ops;
> > +
> > + if (folio)
> > + folio_unlock(folio);
> > + if (page_ops && page_ops->page_done)
> > + page_ops->page_done(iter->inode, pos, ret, &folio->page);
> > + if (folio)
> > + folio_put(folio);
> > +}
>
> How is the folio dereference going to work if folio is NULL?

'&folio->page' is effectively a type cast, not a dereference. I
realize iomap_folio_done() as introduced here is not pretty, but it's
only an intermediate step, and the ugliness goes away later in this
series.

> That being said, I really wonder if the current API is the right way to
> go. Can't we just have a ->get_folio method with the same signature as
> __filemap_get_folio, and then do the __filemap_get_folio from the file
> system and avoid the page/folio == NULL cleanup path entirely? Then on
> the done side move the unlock and put into the done method as well.

Yes, this is what happens later in this series (as you've seen by now).

> > if (!folio) {
> > status = (iter->flags & IOMAP_NOWAIT) ? -EAGAIN : -ENOMEM;
> > - goto out_no_page;
> > + iomap_folio_done(iter, pos, 0, NULL);
> > + return status;
> > }
> >
> > /*
> > @@ -656,13 +670,9 @@ static int iomap_write_begin(struct iomap_iter *iter, loff_t pos,
> > return 0;
> >
> > out_unlock:
> > - folio_unlock(folio);
> > - folio_put(folio);
> > + iomap_folio_done(iter, pos, 0, folio);
> > iomap_write_failed(iter->inode, pos, len);
> >
> > -out_no_page:
> > - if (page_ops && page_ops->page_done)
> > - page_ops->page_done(iter->inode, pos, 0, NULL);
> > return status;
>
> But for the current version I don't really understand why the error
> unwinding changes here.

Currently, we have this order of operations in iomap_write_begin():

folio_unlock() // folio_put() // iomap_write_failed() // ->page_done()

and this order in iomap_write_end():

folio_unlock() // ->page_done() // folio_put() // iomap_write_failed()

The unwinding in iomap_write_begin() works because this is the trivial
case in which nothing happens to the page. We might just as well use
the same order of operations there as in iomap_write_end() though, and
when you switch to that, this is what you get.

Thank you for the review.

Andreas

2022-12-23 21:12:20

by Andreas Grünbacher

[permalink] [raw]
Subject: Re: [RFC v3 4/7] iomap: Add iomap_folio_prepare helper

On Fri, Dec 23, 2022 at 4:22 PM Christoph Hellwig
<[email protected]> wrote:
> > +struct folio *iomap_folio_prepare(struct iomap_iter *iter, loff_t pos)
> > +{
> > + unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
> > +
> > + if (iter->flags & IOMAP_NOWAIT)
> > + fgp |= FGP_NOWAIT;
> > +
> > + return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
> > + fgp, mapping_gfp_mask(iter->inode->i_mapping));
> > +}
> > +EXPORT_SYMBOL(iomap_folio_prepare);
>
> I'd name this __iomap_get_folio to match __filemap_get_folio.

I was looking at it from the filesystem point of view: this helper is
meant to be used in ->folio_prepare() handlers, so
iomap_folio_prepare() seemed to be a better name than
__iomap_get_folio().

> And all iomap exports are EXPORT_SYMBOL_GPL.

Sure.

Thanks,
Andreas

2022-12-23 22:26:47

by Andreas Grünbacher

[permalink] [raw]
Subject: Re: [RFC v3 1/7] fs: Add folio_may_straddle_isize helper

On Fri, Dec 23, 2022 at 4:06 PM Christoph Hellwig
<[email protected]> wrote:
> On Fri, Dec 16, 2022 at 04:06:20PM +0100, Andreas Gruenbacher wrote:
> > Add a folio_may_straddle_isize() helper as a replacement for
> > pagecache_isize_extended() when we have a locked folio.
>
> I find the naming very confusing. Any good reason not to follow
> the naming of pagecache_isize_extended and call it
> folio_isize_extended?

A good reason for a different name is that
folio_may_straddle_isize() requires a locked folio, while
pagecache_isize_extended() will fail if the folio is still locked. So
this doesn't follow the usual "replace 'page' with 'folio'" pattern.

> > Use the new helper in generic_write_end(), iomap_write_end(),
> > ext4_write_end(), and ext4_journalled_write_end().
>
> Please split this into a patch per caller in addition to the one
> adding the helper, and write commit logs explaining the rationale
> for the helper. The obvious ones I'm trying to guess are that
> the new helper avoids a page cache radix tree lookup and a lock
> page/folio cycle, but I'd rather hear that from the horse's mouth
> in the commit log.

Yes, that's what the horse says.

> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -2164,16 +2164,15 @@ int generic_write_end(struct file *file, struct address_space *mapping,
> > * But it's important to update i_size while still holding page lock:
> > * page writeout could otherwise come in and zero beyond i_size.
> > */
> > - if (pos + copied > inode->i_size) {
> > + if (pos + copied > old_size) {
>
> > This is an unrelated and undocumented (but useful) change. Please split
> it out as well.
>
> > + * This function must be called while we still hold i_rwsem - this not only
> > + * makes sure i_size is stable but also that userspace cannot observe the new
> > + * i_size value before we are prepared to handle mmap writes there.
>
> Please add a lockdep_assert_held_write to enforce that.
>
> > +void folio_may_straddle_isize(struct inode *inode, struct folio *folio,
> > + loff_t old_size, loff_t start)
> > +{
> > + unsigned int blocksize = i_blocksize(inode);
> > +
> > + if (round_up(old_size, blocksize) >= round_down(start, blocksize))
> > + return;
> > +
> > + /*
> > + * See clear_page_dirty_for_io() for details why folio_set_dirty()
> > + * is needed.
> > + */
> > + if (folio_mkclean(folio))
> > + folio_set_dirty(folio);
>
> Should pagecache_isize_extended be rewritten to use this helper,
> i.e. turn this into a factoring out of a helper?

I'm not really sure about that. The boundary conditions in the two
functions are not identical. I think the logic in
folio_may_straddle_isize() is sufficient for the
extending-write-under-folio-lock case, but I'd still need confirmation
for that. Whether the same logic would also be enough in
pagecache_isize_extended() is less clear to me.

> > +EXPORT_SYMBOL(folio_may_straddle_isize);
>
> Please make this an EXPORT_SYMBOL_GPL just like folio_mkclean.

Thanks,
Andreas

2022-12-24 07:24:40

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 1/7] fs: Add folio_may_straddle_isize helper

On Fri, Dec 23, 2022 at 11:04:51PM +0100, Andreas Grünbacher wrote:
> > I find the naming very confusing. Any good reason not to follow
> > the naming of pagecache_isize_extended and call it
> > folio_isize_extended?
>
> A good reason for a different name is that
> folio_may_straddle_isize() requires a locked folio, while
> pagecache_isize_extended() will fail if the folio is still locked. So
> this doesn't follow the usual "replace 'page' with 'folio'" pattern.

pagecache_isize_extended doesn't say page either, it says pagecache. I'd
still prefer to keep the postfix the same. And I think the fact that it needs
a locked folio should also have an assert, which both documents this
and catches errors. I think that's much better than an arbitrarily
different name.
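
E.g. something like this at the top of the helper (sketch):

	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);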

> > Should pagecache_isize_extended be rewritten to use this helper,
> > i.e. turn this into a factoring out of a helper?
>
> I'm not really sure about that. The boundary conditions in the two
> functions are not identical. I think the logic in
> folio_may_straddle_isize() is sufficient for the
> extending-write-under-folio-lock case, but I'd still need confirmation
> for that. If the same logic would also be enough in
> pagecache_isize_extended() is more unclear to me.

That's another thing that really needs to go into the commit log:
why the condition is different and why pagecache_isize_extended can't
just be extended for it (if it really can't).

2022-12-24 07:39:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 2/7] iomap: Add iomap_folio_done helper

On Fri, Dec 23, 2022 at 09:54:34PM +0100, Andreas Grünbacher wrote:
> > But for the current version I don't really understand why the error
> > unwinding changes here.
>
> Currently, we have this order of operations in iomap_write_begin():
>
> folio_unlock() // folio_put() // iomap_write_failed() // ->page_done()
>
> and this order in iomap_write_end():
>
> folio_unlock() // ->page_done() // folio_put() // iomap_write_failed()
>
> The unwinding in iomap_write_begin() works because this is the trivial
> case in which nothing happens to the page. We might just as well use
> the same order of operations there as in iomap_write_end() though, and
> when you switch to that, this is what you get.

Please document this in the commit message.

2022-12-24 07:48:39

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 4/7] iomap: Add iomap_folio_prepare helper

On Fri, Dec 23, 2022 at 10:05:05PM +0100, Andreas Grünbacher wrote:
> > I'd name this __iomap_get_folio to match __filemap_get_folio.
>
> I was looking at it from the filesystem point of view: this helper is
> meant to be used in ->folio_prepare() handlers, so
> iomap_folio_prepare() seemed to be a better name than
> __iomap_get_folio().

Well, I think the right name for the method that gets a folio is
probably ->folio_get anyway.

2022-12-25 10:29:49

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC v3 4/7] iomap: Add iomap_folio_prepare helper

On Fri, Dec 23, 2022 at 11:23:34PM -0800, Christoph Hellwig wrote:
> On Fri, Dec 23, 2022 at 10:05:05PM +0100, Andreas Grünbacher wrote:
> > > I'd name this __iomap_get_folio to match __filemap_get_folio.
> >
> > I was looking at it from the filesystem point of view: this helper is
> > meant to be used in ->folio_prepare() handlers, so
> > iomap_folio_prepare() seemed to be a better name than
> > __iomap_get_folio().
>
> Well, I think the right name for the method that gets a folio is
> probably ->folio_get anyway.

For the a_ops, the convention I've been following is:

folio_mark_dirty()
-> aops->dirty_folio()
-> iomap_dirty_folio()

ie VERB_folio() as the name of the operation, and MODULE_VERB_folio()
as the implementation. Seems to work pretty well.

2022-12-28 15:56:48

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC v3 4/7] iomap: Add iomap_folio_prepare helper

On Sun, Dec 25, 2022 at 09:12:34AM +0000, Matthew Wilcox wrote:
> > > I was looking at it from the filesystem point of view: this helper is
> > > meant to be used in ->folio_prepare() handlers, so
> > > iomap_folio_prepare() seemed to be a better name than
> > > __iomap_get_folio().
> >
> > Well, I think the right name for the method that gets a folio is
> > probably ->folio_get anyway.
>
> For the a_ops, the convention I've been following is:
>
> folio_mark_dirty()
> -> aops->dirty_folio()
> -> iomap_dirty_folio()
>
> ie VERB_folio() as the name of the operation, and MODULE_VERB_folio()
> as the implementation. Seems to work pretty well.

Yeah, ->get_folio sounds fine, if not even better, as it matches
filemap_get_folio.