2013-07-08 09:52:09

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH 1/5] mm: remove redundant dirty pages check from __delete_from_page_cache()

This chunk was added by commit 3a6927906f1b2adf5a31b789322d32eb8559ada0
("Do dirty page accounting when removing a page from the page cache") in 2.6.24.

That was fix for side-effects of commit 3e67c0987d7567ad666641164a153dca9a43b11d
("[PATCH] truncate: clear page dirtiness before running try_to_free_buffers()")
which was required for 46d2277c796f9f4937bfa668c40b2e3f43e93dd0 ("Clean up and
make try_to_free_buffers() not race with dirty pages") and that patch in turn
was reverted by commit ecdfc9787fe527491baefc22dce8b2dbd5b2908d ("Resurrect
'try_to_free_buffers()' VM hackery").

And finally, cancel_dirty_page() was placed after do_invalidatepage in 2.6.25 by
commit a2b345642f530054a92b8d2b5108436225a8093e
("Fix dirty page accounting leak with ext3 data=journal")

So, that hunk is redundant. All other callers of delete_from_page_cache() and
__delete_from_page_cache() handle dirty pages themselves.

I have run xfstest on ext3 in mode data=journal several times. Everything works
fine, except testcase '068' which have triggered deadlock on fs-freeze.
This seems unrelated and probably it's never worked for ext3 in this mode.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
---
mm/filemap.c | 12 ------------
mm/truncate.c | 4 ++++
2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 7905fe7..504aab2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -135,18 +135,6 @@ void __delete_from_page_cache(struct page *page)
if (PageSwapBacked(page))
__dec_zone_page_state(page, NR_SHMEM);
BUG_ON(page_mapped(page));
-
- /*
- * Some filesystems seem to re-dirty the page even after
- * the VM has canceled the dirty bit (eg ext3 journaling).
- *
- * Fix it up by doing a final dirty accounting check after
- * having removed the page entirely.
- */
- if (PageDirty(page) && mapping_cap_account_dirty(mapping)) {
- dec_zone_page_state(page, NR_FILE_DIRTY);
- dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
- }
}

/**
diff --git a/mm/truncate.c b/mm/truncate.c
index e2e8a8a..9b4721f 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -100,6 +100,10 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
if (page_has_private(page))
do_invalidatepage(page, 0, PAGE_CACHE_SIZE);

+ /*
+ * This is final dirty accounting check. Some filesystems may re-dirty
+ * pages during invalidation, hence it's placed after that.
+ */
cancel_dirty_page(page, PAGE_CACHE_SIZE);

ClearPageMappedToDisk(page);


2013-07-08 09:52:12

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH 2/5] nfs: remove redundant cancel_dirty_page() from nfs_wb_page_cancel()

This chunk was added by commit 1b3b4a1a2deb7d3e5d66063bd76304d840c966b3
("NFS: Fix a write request leak in nfs_invalidate_page()") in kernel 2.6.23,
as fix for problem introduced in commit 3e67c0987d7567ad666641164a153dca9a43b11d
("[PATCH] truncate: clear page dirtiness before running try_to_free_buffers()")
in v2.6.20, which has placed cancel_dirty_page() in truncate_complete_page()
before calling do_invalidatepage(). But that change in truncate_complete_page()
was reverted by commit a2b345642f530054a92b8d2b5108436225a8093e in v2.6.25
("Fix dirty page accounting leak with ext3 data=journal").

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: [email protected]
---
fs/nfs/write.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a2c7c28..737981f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1735,11 +1735,6 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
if (nfs_lock_request(req)) {
nfs_clear_request_commit(req);
nfs_inode_remove_request(req);
- /*
- * In case nfs_inode_remove_request has marked the
- * page as being dirty
- */
- cancel_dirty_page(page, PAGE_CACHE_SIZE);
nfs_unlock_and_release_request(req);
break;
}

2013-07-08 09:52:19

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH 3/5] hugetlbfs: remove cancel_dirty_page() from truncate_huge_page()

There is no backend and writeback, dirty pages accounting is disabled.
ClearPageDirty() more suits for this place.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: Nadia Yvette Chambers <[email protected]>
---
fs/hugetlbfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index a3f868a..548badf 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -323,7 +323,7 @@ static int hugetlbfs_write_end(struct file *file, struct address_space *mapping,

static void truncate_huge_page(struct page *page)
{
- cancel_dirty_page(page, /* No IO accounting for huge pages? */0);
+ ClearPageDirty(page);
ClearPageUptodate(page);
delete_from_page_cache(page);
}

2013-07-08 09:52:25

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH 5/5] page_writeback: put account_page_redirty() after set_page_dirty()

Function account_page_redirty() fixes dirty pages counter for redieried pages.
This patch prevents temporary underflows of dirty pages counters on zone/bdi
and current->nr_dirtied. This puts decrement after increment.

__set_page_dirty_nobuffers() in redirty_page_for_writeback() always returns true
because this page is locked and all ptes are write-protected by previously
called clear_page_dirty_for_io(). Thus nobody can mark it dirty except current
task. This prevents multy-threaded scenarios of taks->nr_dirtied underflows.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
---
fs/btrfs/extent_io.c | 2 +-
mm/page-writeback.c | 7 ++++++-
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6bca947..d17fb9b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1311,8 +1311,8 @@ int extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end)
while (index <= end_index) {
page = find_get_page(inode->i_mapping, index);
BUG_ON(!page); /* Pages should be in the extent_io_tree */
- account_page_redirty(page);
__set_page_dirty_nobuffers(page);
+ account_page_redirty(page);
page_cache_release(page);
index++;
}
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4514ad7..a599f38 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2061,6 +2061,8 @@ EXPORT_SYMBOL(__set_page_dirty_nobuffers);
* counters (NR_WRITTEN, BDI_WRITTEN) in long term. The mismatches will lead to
* systematic errors in balanced_dirty_ratelimit and the dirty pages position
* control.
+ *
+ * Must be called after marking page dirty.
*/
void account_page_redirty(struct page *page)
{
@@ -2080,9 +2082,12 @@ EXPORT_SYMBOL(account_page_redirty);
*/
int redirty_page_for_writepage(struct writeback_control *wbc, struct page *page)
{
+ int ret;
+
wbc->pages_skipped++;
+ ret = __set_page_dirty_nobuffers(page);
account_page_redirty(page);
- return __set_page_dirty_nobuffers(page);
+ return ret;
}
EXPORT_SYMBOL(redirty_page_for_writepage);

2013-07-08 09:52:44

by Konstantin Khlebnikov

[permalink] [raw]
Subject: [PATCH 4/5] page_writeback: get rid of account_size argument in cancel_dirty_page()

Dirty pages accounting always works with PAGE_CACHE_SIZE granularity.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
---
fs/buffer.c | 2 +-
include/linux/page-flags.h | 2 +-
mm/truncate.c | 7 +++----
3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 4d74335..37eee33 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3243,7 +3243,7 @@ int try_to_free_buffers(struct page *page)
* dirty bit from being lost.
*/
if (ret)
- cancel_dirty_page(page, PAGE_CACHE_SIZE);
+ cancel_dirty_page(page);
spin_unlock(&mapping->private_lock);
out:
if (buffers_to_free) {
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6d53675..8a88f59 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -314,7 +314,7 @@ static inline void SetPageUptodate(struct page *page)

CLEARPAGEFLAG(Uptodate, uptodate)

-extern void cancel_dirty_page(struct page *page, unsigned int account_size);
+extern void cancel_dirty_page(struct page *page);

int test_clear_page_writeback(struct page *page);
int test_set_page_writeback(struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 9b4721f..e212252 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -66,7 +66,7 @@ void do_invalidatepage(struct page *page, unsigned int offset,
* out all the buffers on a page without actually doing it through
* the VM. Can you say "ext3 is horribly ugly"? Tought you could.
*/
-void cancel_dirty_page(struct page *page, unsigned int account_size)
+void cancel_dirty_page(struct page *page)
{
if (TestClearPageDirty(page)) {
struct address_space *mapping = page->mapping;
@@ -74,8 +74,7 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
dec_zone_page_state(page, NR_FILE_DIRTY);
dec_bdi_stat(mapping->backing_dev_info,
BDI_RECLAIMABLE);
- if (account_size)
- task_io_account_cancelled_write(account_size);
+ task_io_account_cancelled_write(PAGE_CACHE_SIZE);
}
}
}
@@ -104,7 +103,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
* This is final dirty accounting check. Some filesystems may re-dirty
* pages during invalidation, hence it's placed after that.
*/
- cancel_dirty_page(page, PAGE_CACHE_SIZE);
+ cancel_dirty_page(page);

ClearPageMappedToDisk(page);
delete_from_page_cache(page);