From: Trond Myklebust
To: Ryusuke Konishi, Andrew Morton
Cc: nfs@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject: Re: [BUG] problem with nfs_invalidate_page
Date: Wed, 29 Aug 2007 11:49:22 -0400
Message-ID: <1188402562.6580.74.camel@heimdal.trondhjem.org>
In-Reply-To: <200708280832.AA00252@paprika.lab.ntt.co.jp>
References: <1188221412.6701.45.camel@heimdal.trondhjem.org> <200708280832.AA00252@paprika.lab.ntt.co.jp>

On Tue, 2007-08-28 at 17:32 +0900, Ryusuke Konishi wrote:
> On Mon, 27 Aug 2007 09:30:12 -0400, Trond Myklebust wrote:
> >It looks as if ecryptfs is dropping the page lock between the calls to
> >prepare_write() and commit_write(). That would be a bug.
>
> No, ecryptfs is holding the page lock between the calls to
> nfs_prepare_write() and nfs_commit_write().
> This is a regression since kernel 2.6.20; kernel 2.6.19 does not
> yield the BUG.
>
> Please look at truncate_complete_page() and nfs_wb_page_priority()
> which is called from nfs_invalidate_page().
>
> The recent truncate_complete_page() clears the dirty flag from a page
> before calling a_ops->invalidatepage(),
> ^^^^^^
> static void
> truncate_complete_page(struct address_space *mapping, struct page *page)
> {
> ...
>         cancel_dirty_page(page, PAGE_CACHE_SIZE);  <--- Inserted here at kernel 2.6.20
>
>         if (PagePrivate(page))
>                 do_invalidatepage(page, 0);   ---> will call a_ops->invalidatepage()
> ...
> }
>
> and this is disturbing nfs_wb_page_priority() from calling
> nfs_writepage_locked() that is expected to handle the pending
> request (=nfs_page) associated with the page.
>
> int nfs_wb_page_priority(struct inode *inode, struct page *page, int how)
> {
> ...
>         if (clear_page_dirty_for_io(page)) {
>                 ret = nfs_writepage_locked(page, &wbc);
>                 if (ret < 0)
>                         goto out;
>         }
> ...
> }
>
> Since truncate_complete_page() will get rid of the page after
> a_ops->invalidatepage() returns, the request (=nfs_page) associated
> with the page becomes a garbage in nfs_inode->nfs_page_tree.
>
> This causes the collision of nfs_page and yields the BUG.
>
>
> Cheers,
> Ryusuke Konishi

OK. I see your point... Basically, you are saying that the new
->invalidatepage() semantics do not allow us to rely on the dirty status
of the page in order to figure out if we need to clean up.

Andrew, that was a fairly significant change in semantics...

Anyhow, well done debugging it! Does the following patch fix the Oops?

Trond

Content-Disposition: inline; filename=linux-2.6.23-004-fix_nfs_wb_page_priority.dif

From: Trond Myklebust
Date: Tue, 28 Aug 2007 10:29:36 -0400
NFS: Fix a write request leak in nfs_invalidate_page()

Ryusuke Konishi says:

The recent truncate_complete_page() clears the dirty flag from a page
before calling a_ops->invalidatepage(),
^^^^^^
static void
truncate_complete_page(struct address_space *mapping, struct page *page)
{
...
        cancel_dirty_page(page, PAGE_CACHE_SIZE);  <--- Inserted here at kernel 2.6.20

        if (PagePrivate(page))
                do_invalidatepage(page, 0);   ---> will call a_ops->invalidatepage()
...
}

and this is disturbing nfs_wb_page_priority() from calling
nfs_writepage_locked() that is expected to handle the pending
request (=nfs_page) associated with the page.

int nfs_wb_page_priority(struct inode *inode, struct page *page, int how)
{
...
        if (clear_page_dirty_for_io(page)) {
                ret = nfs_writepage_locked(page, &wbc);
                if (ret < 0)
                        goto out;
        }
...
}

Since truncate_complete_page() will get rid of the page after
a_ops->invalidatepage() returns, the request (=nfs_page) associated
with the page becomes a garbage in nfs_inode->nfs_page_tree.

------------------------

Fix this by ensuring that nfs_wb_page_priority() recognises that it may
also need to clear out non-dirty pages that have an nfs_page associated
with them.

Signed-off-by: Trond Myklebust
---

 fs/nfs/file.c          |    2 +-
 fs/nfs/write.c         |   44 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nfs_fs.h |    1 +
 3 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index c87dc71..579cf8a 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -316,7 +316,7 @@ static void nfs_invalidate_page(struct page *page, unsigned long offset)
 	if (offset != 0)
 		return;
 	/* Cancel any unstarted writes on this page */
-	nfs_wb_page_priority(page->mapping->host, page, FLUSH_INVALIDATE);
+	nfs_wb_page_cancel(page->mapping->host, page);
 }
 
 static int nfs_release_page(struct page *page, gfp_t gfp)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ef97e0c..0d7a77c 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1396,6 +1396,50 @@ out:
 	return ret;
 }
 
+int nfs_wb_page_cancel(struct inode *inode, struct page *page)
+{
+	struct nfs_page *req;
+	loff_t range_start = page_offset(page);
+	loff_t range_end = range_start + (loff_t)(PAGE_CACHE_SIZE - 1);
+	struct writeback_control wbc = {
+		.bdi = page->mapping->backing_dev_info,
+		.sync_mode = WB_SYNC_ALL,
+		.nr_to_write = LONG_MAX,
+		.range_start = range_start,
+		.range_end = range_end,
+	};
+	int ret = 0;
+
+	BUG_ON(!PageLocked(page));
+	for (;;) {
+		req = nfs_page_find_request(page);
+		if (req == NULL)
+			goto out;
+		if (test_bit(PG_NEED_COMMIT, &req->wb_flags)) {
+			nfs_release_request(req);
+			break;
+		}
+		if (nfs_lock_request_dontget(req)) {
+			nfs_inode_remove_request(req);
+			/*
+			 * In case nfs_inode_remove_request has marked the
+			 * page as being dirty
+			 */
+			cancel_dirty_page(page, PAGE_CACHE_SIZE);
+			nfs_unlock_request(req);
+			break;
+		}
+		ret = nfs_wait_on_request(req);
+		if (ret < 0)
+			goto out;
+	}
+	if (!PagePrivate(page))
+		return 0;
+	ret = nfs_sync_mapping_wait(page->mapping, &wbc, FLUSH_INVALIDATE);
+out:
+	return ret;
+}
+
 int nfs_wb_page_priority(struct inode *inode, struct page *page, int how)
 {
 	loff_t range_start = page_offset(page);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 157dcb0..7250eea 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -431,6 +431,7 @@ extern int nfs_sync_mapping_range(struct address_space *, loff_t, loff_t, int);
 extern int nfs_wb_all(struct inode *inode);
 extern int nfs_wb_page(struct inode *inode, struct page* page);
 extern int nfs_wb_page_priority(struct inode *inode, struct page* page, int how);
+extern int nfs_wb_page_cancel(struct inode *inode, struct page* page);
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
 extern int nfs_commit_inode(struct inode *, int);
 extern struct nfs_write_data *nfs_commit_alloc(void);
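
Illustration only, not part of the patch: the ordering problem reported
in this thread can be modelled in userspace. The sketch below is a toy
under stated assumptions. Every toy_* name is hypothetical and none of
it is kernel code; a boolean stands in for the page dirty flag and a
counter stands in for nfs_inode->nfs_page_tree. It only shows why
cleanup keyed on the dirty flag is skipped once the truncate path
clears that flag before calling ->invalidatepage(), while cleanup keyed
on the attached request is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; these are NOT the kernel's struct page / struct inode. */
struct toy_page {
	bool dirty;        /* stands in for the page dirty flag */
	bool has_request;  /* stands in for PagePrivate plus an attached nfs_page */
};

struct toy_inode {
	int tracked_requests;  /* stands in for entries in nfs_page_tree */
};

typedef void (*toy_invalidate_fn)(struct toy_inode *, struct toy_page *);

/* Helper: detach the request from the page and drop it from the inode. */
static void toy_remove_request(struct toy_inode *inode, struct toy_page *page)
{
	page->has_request = false;
	inode->tracked_requests--;
}

/* Pre-patch behaviour (simplified): cleanup happens only for dirty pages. */
static void toy_invalidate_dirty_only(struct toy_inode *inode,
				      struct toy_page *page)
{
	if (page->dirty) {
		page->dirty = false;
		if (page->has_request)
			toy_remove_request(inode, page);
	}
}

/* Patched behaviour (simplified): cleanup is keyed on the request itself. */
static void toy_invalidate_by_request(struct toy_inode *inode,
				      struct toy_page *page)
{
	if (page->has_request)
		toy_remove_request(inode, page);
	page->dirty = false;
}

/*
 * Model of the 2.6.20+ truncate ordering: the dirty flag is cancelled
 * first, then the invalidate callback runs. Returns the number of
 * requests still tracked after the page is gone (nonzero means a leak).
 */
static int toy_truncate(toy_invalidate_fn invalidate)
{
	struct toy_inode inode = { .tracked_requests = 1 };
	struct toy_page page = { .dirty = true, .has_request = true };

	assert(invalidate != NULL);
	page.dirty = false;                /* cancel_dirty_page() analogue */
	if (page.has_request)
		invalidate(&inode, &page); /* do_invalidatepage() analogue */
	return inode.tracked_requests;
}
```

In this model toy_truncate(toy_invalidate_dirty_only) returns 1 (the
request outlives its page) and toy_truncate(toy_invalidate_by_request)
returns 0.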