The patch fixes a race between mmap-ed write and fallocate(PUNCH_HOLE):
1) An user makes a page dirty via mmap-ed write.
2) The user performs fallocate(2) with mode == PUNCH_HOLE|KEEP_SIZE
and <offset, size> covering the page.
3) Before truncate_pagecache_range call from fuse_file_fallocate,
the page goes to write-back. The page is fully processed by fuse_writepage
(including end_page_writeback on the page), but fuse_flush_writepages did
nothing because fi->writectr < 0.
4) truncate_pagecache_range is called and fuse_file_fallocate is finishing
by calling fuse_release_nowrite. The latter triggers processing queued
write-back request which will write stale data to the hole soon.
Changed in v2 (thanks to Brian for suggestion):
- Do not truncate page cache until FUSE_FALLOCATE succeeded. Otherwise,
we can end up in returning -ENOTSUPP while user data is already punched
from page cache. Use filemap_write_and_wait_range() instead.
Changed in v3 (thanks to Miklos for suggestion):
- fuse_wait_on_writeback() is prone to livelocks; use fuse_set_nowrite()
instead. So far as we need a dirty-page barrier only, fuse_sync_writes()
should be enough.
- rebased to for-linus branch of fuse.git
Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/file.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index d409dea..f9f07c4 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2484,8 +2484,15 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
if (lock_inode) {
mutex_lock(&inode->i_mutex);
- if (mode & FALLOC_FL_PUNCH_HOLE)
- fuse_set_nowrite(inode);
+ if (mode & FALLOC_FL_PUNCH_HOLE) {
+ loff_t endbyte = offset + length - 1;
+ err = filemap_write_and_wait_range(inode->i_mapping,
+ offset, endbyte);
+ if (err)
+ goto out;
+
+ fuse_sync_writes(inode);
+ }
}
req = fuse_get_req_nopages(fc);
@@ -2520,11 +2527,8 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
fuse_invalidate_attr(inode);
out:
- if (lock_inode) {
- if (mode & FALLOC_FL_PUNCH_HOLE)
- fuse_release_nowrite(inode);
+ if (lock_inode)
mutex_unlock(&inode->i_mutex);
- }
return err;
}
A former patch introducing FUSE_I_SIZE_UNSTABLE flag provided detailed
description of races between ftruncate and anyone who can extend i_size:
> 1. As in the previous scenario fuse_dentry_revalidate() discovered that i_size
> changed (due to our own fuse_do_setattr()) and is going to call
> truncate_pagecache() for some 'new_size' it believes valid right now. But by
> the time that particular truncate_pagecache() is called ...
> 2. fuse_do_setattr() returns (either having called truncate_pagecache() or
> not -- it doesn't matter).
> 3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
> 4. mmap-ed write makes a page in the extended region dirty.
This patch adds necessary bits to fuse_file_fallocate() to protect from that
race.
Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/file.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index f9f07c4..4598345 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2467,6 +2467,7 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
{
struct fuse_file *ff = file->private_data;
struct inode *inode = file->f_inode;
+ struct fuse_inode *fi = get_fuse_inode(inode);
struct fuse_conn *fc = ff->fc;
struct fuse_req *req;
struct fuse_fallocate_in inarg = {
@@ -2495,6 +2496,9 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
}
}
+ if (!(mode & FALLOC_FL_KEEP_SIZE))
+ set_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
+
req = fuse_get_req_nopages(fc);
if (IS_ERR(req)) {
err = PTR_ERR(req);
@@ -2527,6 +2531,9 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
fuse_invalidate_attr(inode);
out:
+ if (!(mode & FALLOC_FL_KEEP_SIZE))
+ clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
+
if (lock_inode)
mutex_unlock(&inode->i_mutex);