The patch is for
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git writepages.v2
The patch fixes a race between ftruncate(2), mmap-ed write and write(2):
1) A user dirties a page via an mmap-ed write.
2) The user performs a shrinking truncate(2) intended to purge that page.
3) Before fuse_do_setattr calls truncate_pagecache, the page goes to
writeback: fuse_writepages_fill attaches a new temporary page to the
FUSE_WRITE request, then releases the original page with
end_page_writeback and unlocks it.
4) fuse_do_setattr completes and returns successfully. From this point on,
i_mutex is free.
5) An ordinary write(2) extends i_size back to cover the page. Note that
fuse_send_write_pages does wait for fuse writeback, but for a different
page->index.
6) fuse_writepages_fill attaches more pages to the request (if any), then
fuse_writepages_send is eventually called. It is supposed to crop
inarg->size of the request, but it doesn't, because i_size has already
been extended back.
Moving end_page_writeback into fuse_writepages_send, after the request has
been queued, guarantees that __fuse_release_nowrite (called from
fuse_do_setattr) will crop inarg->size of the request before write(2) gets
a chance to extend i_size.
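For illustration, a minimal userspace sketch of steps 1-6 might look like
the following. It is hypothetical and timing-dependent (it will not
reliably hit the race), and the mount point, file name and sizes are made
up:

/*
 * Hypothetical reproducer sketch for steps 1-6 (timing-dependent: it
 * will not reliably hit the race).  The mount point, file name and
 * sizes are made up.
 */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	char *p;
	int fd = open("/mnt/fuse/testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return 1;

	/* make the file one page long */
	memset(buf, 'a', sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != sizeof(buf))
		return 1;

	/* 1) dirty the page via an mmap-ed write */
	p = mmap(NULL, sizeof(buf), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	p[0] = 'b';

	/*
	 * 2) shrinking truncate intended to purge the page; steps 3-4
	 * (writeback of the dirty page racing with fuse_do_setattr)
	 * happen asynchronously in the kernel
	 */
	if (ftruncate(fd, 0) < 0)
		return 1;

	/* 5) ordinary write(2) extends i_size back over the page */
	if (pwrite(fd, buf, sizeof(buf), 0) != sizeof(buf))
		return 1;

	munmap(p, sizeof(buf));
	close(fd);
	return 0;
}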
Signed-off-by: Maxim Patlasov <[email protected]>
---
fs/fuse/file.c | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 568e859..0ebcc79 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1583,6 +1583,7 @@ struct fuse_fill_wb_data {
struct fuse_req *req;
struct fuse_file *ff;
struct inode *inode;
+ struct page **orig_pages;
};
static void fuse_writepages_send(struct fuse_fill_wb_data *data)
@@ -1591,12 +1592,17 @@ static void fuse_writepages_send(struct fuse_fill_wb_data *data)
struct inode *inode = data->inode;
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_inode *fi = get_fuse_inode(inode);
+ int num_pages = req->num_pages;
+ int i;
req->ff = fuse_file_get(data->ff);
spin_lock(&fc->lock);
list_add_tail(&req->list, &fi->queued_writes);
fuse_flush_writepages(inode);
spin_unlock(&fc->lock);
+
+ for (i = 0; i < num_pages; i++)
+ end_page_writeback(data->orig_pages[i]);
}
static int fuse_writepages_fill(struct page *page,
@@ -1677,7 +1683,7 @@ static int fuse_writepages_fill(struct page *page,
inc_bdi_stat(page->mapping->backing_dev_info, BDI_WRITEBACK);
inc_zone_page_state(tmp_page, NR_WRITEBACK_TEMP);
- end_page_writeback(page);
+ data->orig_pages[req->num_pages] = page;
/*
* Protected by fc->lock against concurrent access by
@@ -1709,6 +1715,13 @@ static int fuse_writepages(struct address_space *mapping,
data.req = NULL;
data.ff = NULL;
+ err = -ENOMEM;
+ data.orig_pages = kzalloc(sizeof(struct page *) *
+ FUSE_MAX_PAGES_PER_REQ,
+ GFP_NOFS);
+ if (!data.orig_pages)
+ goto out;
+
err = write_cache_pages(mapping, wbc, fuse_writepages_fill, &data);
if (data.req) {
/* Ignore errors if we can write at least one page */
@@ -1718,6 +1731,8 @@ static int fuse_writepages(struct address_space *mapping,
}
if (data.ff)
fuse_file_put(data.ff, false);
+
+ kfree(data.orig_pages);
out:
return err;
}
On Fri, Aug 16, 2013 at 03:51:41PM +0400, Maxim Patlasov wrote:
Thanks for the report. Your analysis looks correct.
Just one nit, why orig_pages? req->pages is already there, so why duplicate it?
Note: you can do __fuse_get_request()/fuse_put_request() to prevent the req from
going away after it's been sent.
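For reference, the suggested pattern in fuse_writepages_send() might look
roughly like this (a hypothetical sketch, not the applied patch):

	struct fuse_req *req = data->req;

	__fuse_get_request(req);	/* extra reference: req stays valid after it is queued */

	spin_lock(&fc->lock);
	list_add_tail(&req->list, &fi->queued_writes);
	fuse_flush_writepages(inode);
	spin_unlock(&fc->lock);

	/*
	 * req may still be dereferenced here; note, though, that
	 * req->pages[] holds the temporary copies, not the original
	 * pages (see the follow-up below).
	 */
	fuse_put_request(fc, req);	/* drop the extra reference */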
Thanks,
Miklos
Hi,
08/29/2013 03:46 PM, Miklos Szeredi wrote:
> Thanks for the report. Your analysis looks correct.
>
> Just one nit, why orig_pages? req->pages is already there, so why duplicate it?
req->pages is there, but it is already occupied by the new pages
(allocated by fuse_writepages_fill). We can't reuse req->pages for the
original pages, because as soon as we put the request on bg_queue (in
fuse_writepages_send) and release fc->lock, req->pages may be accessed at
any moment. So we have two sets of pointers to "struct page" that must be
stashed somewhere: the original pages and the new ones. req->pages holds
the new pages, orig_pages[] holds the original ones.
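In code terms, a condensed paraphrase of the two hunks above (locking,
accounting and error handling omitted; copy_highpage() is assumed here to
be the copy step, as in fuse_writepage_locked()):

/* fuse_writepages_fill(): the temporary copy goes into req->pages[],
 * the original page pointer is stashed separately */
copy_highpage(tmp_page, page);
req->pages[req->num_pages] = tmp_page;		/* new page, owned by the request */
data->orig_pages[req->num_pages] = page;	/* original page, still under writeback */

/* fuse_writepages_send(): the request is queued first, so
 * __fuse_release_nowrite() can crop it; only then is writeback ended
 * on the original pages */
list_add_tail(&req->list, &fi->queued_writes);
fuse_flush_writepages(inode);

for (i = 0; i < num_pages; i++)
	end_page_writeback(data->orig_pages[i]);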
> Note: you can do __fuse_get_request()/fuse_put_request() to prevent the req from
> going away after it's been sent.
Yes, I experimented with this technique before adding orig_pages[]. I was
very reluctant to duplicate that page array and was looking for any
opportunity to avoid it. Pinning the original pages to the new ones via
page->private looked promising, but unfortunately it didn't work, because
__fuse_get_request() only protects the request itself from disappearing;
it doesn't prevent the pages that req->pages[] points to from being
released. And obviously, as soon as a page is released, it's not correct
to rely on the content of its 'private' field.
Thanks,
Maxim
On Thu, Aug 29, 2013 at 2:38 PM, Maxim Patlasov <[email protected]> wrote:
Yeah. Applied the original patch.
Thanks,
Miklos