2006-12-22 14:47:15

by Thomas Meyer

Subject: WARNING: "test_clear_page_dirty" [fs/cifs/cifs.ko] undefined!

Again current git head:

I guess this should be fixed by someone!

WARNING: "test_clear_page_dirty" [fs/cifs/cifs.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2


2006-12-22 21:30:31

by Jean Delvare

Subject: Re: WARNING: "test_clear_page_dirty" [fs/cifs/cifs.ko] undefined!

On Fri, 22 Dec 2006 15:28:45 +0100, Thomas Meyer wrote:
> Again current git head:
>
> I guess this should be fixed by someone!
>
> WARNING: "test_clear_page_dirty" [fs/cifs/cifs.ko] undefined!
> make[1]: *** [__modpost] Error 1
> make: *** [modules] Error 2

This is caused by this commit:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fba2591bf4e418b6c3f9f8794c9dd8fe40ae7bd9

From the log message:
"Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
ReiserFS, XFS all use the old confusing functions, and will be fixed
separately in subsequent commits (with some of them just removing the
offending logic, and others using clear_page_dirty_for_io())."

The approach seems quite broken to me, the users should have been fixed
_before_ removing the function, so as to avoid compilation failures.
These are a pain for testers, and break git bisect too. Grmbl.

Now that it's done... Steve, can you please take a look and provide a
patch so that cifs builds again?

Thanks,
--
Jean Delvare

2006-12-23 00:26:18

by Linus Torvalds

Subject: Re: WARNING: "test_clear_page_dirty" [fs/cifs/cifs.ko] undefined!



On Fri, 22 Dec 2006, Jean Delvare wrote:
>
> The approach seems quite broken to me, the users should have been fixed
> _before_ removing the function, so as to avoid compilation failures.
> These are a pain for testers, and break git bisect too. Grmbl.

This needed to be fixed, and quite frankly, things don't get fixed nearly
as quickly if you don't just break them first. And there really were just
two filesystems that got broken, cifs being one of them.

I just can't test it.

> Now that it's done... Steve, can you please take a look and provide a
> patch so that cifs builds again?

CIFS _should_ be using "clear_page_dirty_for_io()" in that place, and that
will fix the build. However, the reason I didn't just do that myself is
that I can't test the end result, and for the life of me, I can't see
where CIFS does the "end_page_writeback()" that it needs to do at IO
completion time.

And the thing that confuses me about that, is that if CIFS doesn't do
"end_page_writeback()", then it was already broken before - because when
the VM calls "->writepage()" the clear_page_dirty_for_io() will have been
done by the VM, and it needs that "end_page_writeback()" so that the
system can know when the IO is done.

I _suspect_ that those "unlock_page()" calls should be accompanied by a
"end_page_writeback()" call, and that the proper patch MAY look something
like the appended, but I worry about having missed something really
subtle. Maybe there's an end_page_writeback() somewhere else.

And if there isn't, I wonder if shared mappings have _ever_ worked on
CIFS? And if so, how? That writeback bit thing isn't new per se.

So this may or may not fix it. If you can test it (_including_ with some
dirty shared mmap-on-mmap action, please - just call me kinky), I'll
commit it. But I need somebody who actually uses this to test it.

Linus

---
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..4f0472d 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,7 +1245,7 @@ retry:
wait_on_page_writeback(page);

if (PageWriteback(page) ||
- !test_clear_page_dirty(page)) {
+ !clear_page_dirty_for_io(page)) {
unlock_page(page);
break;
}
@@ -1253,6 +1253,7 @@ retry:
if (page_offset(page) >= mapping->host->i_size) {
done = 1;
unlock_page(page);
+ end_page_writeback(page);
break;
}

@@ -1316,6 +1317,7 @@ retry:
SetPageError(page);
kunmap(page);
unlock_page(page);
+ end_page_writeback(page);
page_cache_release(page);
}
if ((wbc->nr_to_write -= n_iov) <= 0)
@@ -1356,7 +1358,8 @@ static int cifs_writepage(struct page* page, struct writeback_control *wbc)
rc = cifs_partialpagewrite(page, 0, PAGE_CACHE_SIZE);
SetPageUptodate(page); /* BB add check for error and Clearuptodate? */
unlock_page(page);
- page_cache_release(page);
+ end_page_writeback(page);
+ page_cache_release(page);
FreeXid(xid);
return rc;
}

2006-12-23 18:31:28

by Linus Torvalds

Subject: Re: WARNING: "test_clear_page_dirty" [fs/cifs/cifs.ko] undefined!


[ Andrew - I'm cc'ing you, because you caused the requirement that people
use "set_page_writeback()" in their writepage() routine that CIFS seems
to have been ignoring all these years. That was introduced more than
two years ago, back on April 11, 2004:

[PATCH] fdatasync integrity fix

fdatasync can fail to wait on some pages due to a race.
...

and as far as I can see, ever since then, any filesystem that didn't do
a "set_page_writeback()" to sync up the TAG_DIRTY bit would have this
CPU usage problem. Please double-check whether I'm right or barking up
the wrong tree.

Afaik, the lack of doing the page writeback bit handling properly would
seem to not cause any actual visible _semantic_ problems, it would just
cause fdatasync to not necessarily be entirely reliable (which I guess
is semantic, but very hard to see) and just wasted CPU cycles when we
look up pages that are marked dirty in the radix tree, but aren't
actually really dirty.

Correct? Who else is implicated in all of this? ]

On Fri, 22 Dec 2006, Linus Torvalds wrote:
>
> CIFS _should_ be using "clear_page_dirty_for_io()" in that place, and that
> will fix the build. However, the reason I didn't just do that myself is
> that I can't test the end result, and for the life of me, I can't see
> where CIFS does the "end_page_writeback()" that it needs to do at IO
> completion time.

Ok, I spent some more time looking at this.

The reason cifs didn't do an "end_page_writeback()" was that it didn't
even do the "set_page_writeback()" to mark the page under writeback in the
first place.

Now, you might think that since it didn't do a set_page_writeback(), it
doesn't need to do the matching end_page_writeback() at all, and instead
just continue to use the old (_really_ old) way of just unlocking the page
when it is done.

However, you'd be wrong. The thing is, a "writepage()" function will be
called with the dirty bit cleared in the "struct page *", but the mapping
radix trees will still have the dirty bit set, exactly because the VM
_requires_ the filesystem to tell it what the h*ll it is doing with the
page. So a low-level filesystem must always do one of two things in its
"writepage()" function. Either:

- "set_page_writeback()" (and then an "end_page_writeback()" when
finished, of course)

OR

- "redirty_page_for_writepage()" to tell the VM to move the page to the
back of the LRU queues because it can't be cleaned (eg, some temporary
problem with write ordering or similar, or something fundamental like
"I'm ramfs, and I don't _have_ any backing store").

and if the low-level filesystem doesn't do either of those, then the
status bits in the radix tree that contains the mapping information will
never be updated, so the page that got cleaned will continue to be marked
"dirty" in the radix tree (which admittedly will generally be invisible,
except for "sync()" and friends spending inordinate amounts of time
looking at pages that aren't even dirty any more - since they look things
up by the radix tree tags).
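
The two options above can be sketched with a small userspace model. This is only an illustration under stated assumptions, not kernel code: every toy_* name is hypothetical, the struct stands in for the real page flags and radix tree tag, and the assertion in toy_end_page_writeback() is this sketch's own way of modelling "ending writeback that was never started is a bug".

```c
/*
 * Toy userspace model of the page-flag / radix-tree-tag bookkeeping.
 * NOT kernel code: every toy_* name is hypothetical and only mirrors
 * the behaviour described in the message above.
 */
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	bool dirty;      /* stands in for the dirty bit in struct page */
	bool writeback;  /* stands in for the writeback bit in struct page */
	bool tag_dirty;  /* stands in for the dirty tag in the radix tree */
};

/* A freshly dirtied page: page flag and radix tree tag agree. */
static struct toy_page toy_dirty_page(void)
{
	struct toy_page p = { .dirty = true, .writeback = false, .tag_dirty = true };
	return p;
}

/* What the VM does before ->writepage(): clears the page flag only. */
static bool toy_clear_page_dirty_for_io(struct toy_page *p)
{
	bool was_dirty = p->dirty;

	p->dirty = false;
	return was_dirty;
}

/* Option 1, start: mark writeback and drop the now-stale dirty tag. */
static void toy_set_page_writeback(struct toy_page *p)
{
	p->writeback = true;
	if (!p->dirty)
		p->tag_dirty = false;
}

/* Option 1, finish: IO is done (assumption: unmatched calls are a bug). */
static void toy_end_page_writeback(struct toy_page *p)
{
	assert(p->writeback);
	p->writeback = false;
}

/* Option 2: the page could not be cleaned, so mark it dirty again. */
static void toy_redirty_page_for_writepage(struct toy_page *p)
{
	p->dirty = true;
	p->tag_dirty = true;
}

/* Old style: do the IO and unlock, but tell the VM nothing. */
static void toy_writepage_unlock_only(struct toy_page *p)
{
	(void)p; /* no flag or tag is touched */
}

/* Fixed style: bracket the IO with the writeback calls. */
static void toy_writepage_with_writeback(struct toy_page *p)
{
	toy_set_page_writeback(p);
	/* ... the actual IO would happen here ... */
	toy_end_page_writeback(p);
}
```

In this model, running toy_clear_page_dirty_for_io() followed by toy_writepage_unlock_only() leaves dirty clear but tag_dirty still set, which is the mismatch described above. Either of the other two paths leaves the flag and the tag in agreement.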

So I think the old code happened to work, but it was definitely incorrect,
and would leave the dirty tags in the radix tree very confused indeed (it
so happened that "cifs_writepages()" - with an "s" at the end - because it
used "test_clear_page_dirty()" - would also clear the dirty tag, but any
page that went through the generic VM routines and the single-page
"cifs_writepage()" - without an "s" at the end - would then be forever
marked dirty in the radix tree even though it was clean).

Somebody should check me, though.

This fairly mindless patch adds the proper "set_page_writeback()" calls
(and the "end_page_writeback()" ones I had already added before I looked
more closely at this).

I added a comment in "cifs_writepage()" (the single-page case) for why
this all is the case.

Linus

PS. To clarify: the old "test_clear_page_dirty()" would actually clear the
dirty bit in the radix tree too, so in that sense it was the RIGHT thing
to do for CIFS, since CIFS was mostly unaware of the need to clear the
radix tree dirty bit (even if cifs_writepages() actually used that bit to
look up pages).

HOWEVER, since CIFS is called from the generic routines (which _are_
radix-tree-aware and need the bit to be cleared explicitly), even the old
code was actually totally broken. It would clear - largely by mistake -
the radix tree dirty bit only for one case, not for _all_ the cases. A
filesystem really does need to know about these things now, although a
lot of filesystems can ignore them, since if they use all the generic
routines, the generic routines will handle it all for them.
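
The difference between the removed helper and its replacement, as described in this PS, can be shown with a minimal userspace sketch. It is an assumption-laden model rather than the kernel implementation: the toy_* names are hypothetical and the two booleans stand in for the page flag and the radix tree tag.

```c
/* Toy model only; toy_* names are hypothetical, not kernel symbols. */
#include <stdbool.h>

struct toy_page {
	bool dirty;      /* dirty bit in the page itself */
	bool tag_dirty;  /* dirty tag in the mapping's radix tree */
};

/* Models the removed helper: clears the page bit AND the radix tree tag. */
static bool toy_test_clear_page_dirty(struct toy_page *p)
{
	bool was_dirty = p->dirty;

	if (was_dirty) {
		p->dirty = false;
		p->tag_dirty = false;
	}
	return was_dirty;
}

/* Models the replacement: clears the page bit only, the tag is left alone. */
static bool toy_clear_page_dirty_for_io(struct toy_page *p)
{
	bool was_dirty = p->dirty;

	p->dirty = false;
	return was_dirty;
}
```

With the second helper, the caller is left responsible for bringing the tag back in sync, which in the patch is what the added set_page_writeback() call does.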

---
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 0f05cab..8a49b2e 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -1245,14 +1245,21 @@ retry:
wait_on_page_writeback(page);

if (PageWriteback(page) ||
- !test_clear_page_dirty(page)) {
+ !clear_page_dirty_for_io(page)) {
unlock_page(page);
break;
}

+ /*
+ * This actually clears the dirty bit in the radix tree.
+ * See cifs_writepage() for more commentary.
+ */
+ set_page_writeback(page);
+
if (page_offset(page) >= mapping->host->i_size) {
done = 1;
unlock_page(page);
+ end_page_writeback(page);
break;
}

@@ -1316,6 +1323,7 @@ retry:
SetPageError(page);
kunmap(page);
unlock_page(page);
+ end_page_writeback(page);
page_cache_release(page);
}
if ((wbc->nr_to_write -= n_iov) <= 0)
@@ -1352,11 +1360,23 @@ static int cifs_writepage(struct page* page, struct writeback_control *wbc)
if (!PageUptodate(page)) {
cFYI(1, ("ppw - page not up to date"));
}
-
+
+ /*
+ * Set the "writeback" flag, and clear "dirty" in the radix tree.
+ *
+ * A writepage() implementation always needs to do either this,
+ * or re-dirty the page with "redirty_page_for_writepage()" in
+ * the case of a failure.
+ *
+ * Just unlocking the page will cause the radix tree tag-bits
+ * to fail to update with the state of the page correctly.
+ */
+ set_page_writeback(page);
rc = cifs_partialpagewrite(page, 0, PAGE_CACHE_SIZE);
SetPageUptodate(page); /* BB add check for error and Clearuptodate? */
unlock_page(page);
- page_cache_release(page);
+ end_page_writeback(page);
+ page_cache_release(page);
FreeXid(xid);
return rc;
}

2006-12-23 19:45:29

by Randy Dunlap

Subject: Re: WARNING: "test_clear_page_dirty" [fs/cifs/cifs.ko] undefined!

On Sat, 23 Dec 2006 10:30:40 -0800 (PST) Linus Torvalds wrote:

>
> [ Andrew - I'm cc'ing you, because you caused the requirement that people
> use "set_page_writeback()" in their writepage() routine that CIFS seems
> to have been ignoring all these years. That was introduced more than
> two years ago, back on April 11, 2004:
>
> [PATCH] fdatasync integrity fix
>
> fdatasync can fail to wait on some pages due to a race.
> ...
>
> and as far as I can see, ever since then, any filesystem that didn't do
> a "set_page_writeback()" to sync up the TAG_DIRTY bit would have this
> CPU usage problem. Please double-check whether I'm right or barking up
> the wrong tree.
>
> Afaik, the lack of doing the page writeback bit handling properly would
> seem to not cause any actual visible _semantic_ problems, it would just
> cause fdatasync to not necessarily be entirely reliable (which I guess
> is semantic, but very hard to see) and just wasted CPU cycles when we
> look up pages that are marked dirty in the radix tree, but aren't
> actually really dirty.
>
> Correct? Who else is implicated in all of this? ]
>
> On Fri, 22 Dec 2006, Linus Torvalds wrote:
> >
> > CIFS _should_ be using "clear_page_dirty_for_io()" in that place, and that
> > will fix the build. However, the reason I didn't just do that myself is
> > that I can't test the end result, and for the life of me, I can't see
> > where CIFS does the "end_page_writeback()" that it needs to do at IO
> > completion time.
>
> Ok, I spent some more time looking at this.
>
> The reason cifs didn't do an "end_page_writeback()" was that it didn't
> even do the "set_page_writeback()" to mark the page under writeback in the
> first place.
>
> Now, you might think that since it didn't do a set_page_writeback(), it
> doesn't need to do the matching end_page_writeback() at all, and instead
> just continue to use the old (_really_ old) way of just unlocking the page
> when it is done.
>
> However, you'd be wrong. The thing is, a "writepage()" function will be
> called with the dirty bit cleared in the "struct page *", but the mapping
> radix trees will still have the dirty bit set, exactly because the VM
> _requires_ the filesystem to tell it what the h*ll it is doing with the
> page. So a low-level filesystem must always do one of two things in its
> "writepage()" function. Either:
>
> - "set_page_writeback()" (and then an "end_page_writeback()" when
> finished, of course)
>
> OR
>
> - "redirty_page_for_writepage()" to tell the VM to move the page to the
> back of the LRU queues because it can't be cleaned (eg, some temporary
> problem with write ordering or similar, or something fundamental like
> "I'm ramfs, and I don't _have_ any backing store").
>
> and if the low-level filesystem doesn't do either of those, then the
> status bits in the radix tree that contains the mapping information will
> never be updated, so the page that got cleaned will continue to be marked
> "dirty" in the radix tree (which admittedly will generally be invisible,
> except for "sync()" and friends spending inordinate amounts of time
> looking at pages that aren't even dirty any more - since they look things
> up by the radix tree tags).
>
> So I think the old code happened to work, but it was definitely incorrect,
> and would leave the dirty tags in the radix tree very confused indeed (it
> so happened that "cifs_writepages()" - with an "s" at the end - because it
> used "test_clear_page_dirty()" - would also clear the dirty tag, but any
> page that went through the generic VM routines and the single-page
> "cifs_writepage()" - without an "s" at the end - would then be forever
> marked dirty in the radix tree even though it was clean).
>
> Somebody should check me, though.
>
> This fairly mindless patch adds the proper "set_page_writeback()" calls
> (and the "end_page_writeback()" ones I had already added before I looked
> more closely at this).
>
> I added a comment in "cifs_writepage()" (the single-page case) for why
> this all is the case.

BTW, reiserfs has similar build problems: it uses clear_page_dirty()
so it won't build.

fs/built-in.o: In function `reiserfs_cut_from_item':
(.text.reiserfs_cut_from_item+0x868): undefined reference to `clear_page_dirty'

---
~Randy

2006-12-23 20:08:37

by Linus Torvalds

Subject: Re: WARNING: "test_clear_page_dirty" [fs/cifs/cifs.ko] undefined!



On Sat, 23 Dec 2006, Randy Dunlap wrote:
>
> BTW, reiserfs has similar build problems: it uses clear_page_dirty()
> so it won't build.

Not any more. I fixed that one (very different issue, btw: it's not
actually doing writeout, it actually wanted to cancel IO on truncated
buffers).

However, it's certainly possible that my fix hasn't mirrored out yet, I
pushed it just a couple of hours ago. So if you want to test it, here are
the two commits in question..

(The "cancel_dirty_page()" cleanup is needed not just to do reiserfs as a
module, it's also to make it more robust against reiserfs possibly feeding
that function with strange pages, and to match the other related functions
in the accounting functions).
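
As a rough illustration of that robustness point, the reworked accounting logic from the mm/truncate.c change quoted further down can be modelled in userspace. This is a sketch under assumptions, not the kernel code: the toy_* types and the counters struct are hypothetical stand-ins for struct page, struct address_space, the NR_FILE_DIRTY zone counter and the per-task cancelled-write accounting.

```c
/* Toy model of the reworked logic; toy_* names are hypothetical. */
#include <stdbool.h>
#include <stddef.h>

struct toy_mapping {
	bool account_dirty;  /* stands in for mapping_cap_account_dirty() */
};

struct toy_page {
	bool dirty;
	struct toy_mapping *mapping;  /* may be NULL for a "strange" page */
};

struct toy_counters {
	long nr_file_dirty;          /* stands in for NR_FILE_DIRTY */
	long cancelled_write_bytes;  /* stands in for the task IO accounting */
};

static void toy_cancel_dirty_page(struct toy_page *page,
				  unsigned int account_size,
				  struct toy_counters *c)
{
	bool was_dirty = page->dirty;

	page->dirty = false;  /* models TestClearPageDirty() */
	if (!was_dirty)
		return;

	struct toy_mapping *mapping = page->mapping;

	/* A NULL mapping is tolerated instead of being dereferenced. */
	if (mapping && mapping->account_dirty) {
		c->nr_file_dirty--;
		/* Only the per-task part depends on account_size. */
		if (account_size)
			c->cancelled_write_bytes += account_size;
	}
}
```

The visible changes relative to the old condition are that a page without a mapping no longer reaches the capability check, and that the dirty-page counter is adjusted even when account_size is zero.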

Len Brown tested the reiserfs changes, and claims that it was all good,
but if somebody wants to run fsx-linux or some other filesystem stress
testing tool that actually tests shared mmap (and truncate), that would be
really appreciated.

Linus

--
commit 8368e328dfe1c534957051333a87b3210a12743b
Author: Linus Torvalds <[email protected]>
Date: Sat Dec 23 09:25:04 2006 -0800

Clean up and export cancel_dirty_page() to modules

Make cancel_dirty_page() act more like all the other dirty and writeback
accounting functions: test for "mapping" being NULL, and do the
NR_FILE_DIRTY accounting purely based on mapping_cap_account_dirty().

Also, add it to the exports, so that modular filesystems can use it.

Acked-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
mm/truncate.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 4a38dd1..ecdfdcc 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -60,12 +60,16 @@ void cancel_dirty_page(struct page *page, unsigned int account_size)
WARN_ON(++warncount < 5);
}

- if (TestClearPageDirty(page) && account_size &&
- mapping_cap_account_dirty(page->mapping)) {
- dec_zone_page_state(page, NR_FILE_DIRTY);
- task_io_account_cancelled_write(account_size);
+ if (TestClearPageDirty(page)) {
+ struct address_space *mapping = page->mapping;
+ if (mapping && mapping_cap_account_dirty(mapping)) {
+ dec_zone_page_state(page, NR_FILE_DIRTY);
+ if (account_size)
+ task_io_account_cancelled_write(account_size);
+ }
}
}
+EXPORT_SYMBOL(cancel_dirty_page);

/*
* If truncate cannot remove the fs-private metadata from the page, the page

commit ffaa82008f1aad52a6d3979f49d2a76c2928b60f
Author: Linus Torvalds <[email protected]>
Date: Sat Dec 23 09:32:45 2006 -0800

Fix reiserfs after "test_clear_page_dirty()" removal

Thanks to Len Brown for testing this fix, since while they have in the
past, none of my machines run reiserfs at the moment.

Cc: Vladimir V. Saveliev <[email protected]>
Acked-by: Len Brown <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
---
fs/reiserfs/stree.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 47e7027..afb21ea 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -1459,7 +1459,7 @@ static void unmap_buffers(struct page *page, loff_t pos)
bh = next;
} while (bh != head);
if (PAGE_SIZE == bh->b_size) {
- clear_page_dirty(page);
+ cancel_dirty_page(page, PAGE_CACHE_SIZE);
}
}
}

2006-12-24 05:35:18

by Randy Dunlap

Subject: Re: WARNING: "test_clear_page_dirty" [fs/cifs/cifs.ko] undefined!

On Sat, 23 Dec 2006 12:06:43 -0800 (PST) Linus Torvalds wrote:

>
>
> On Sat, 23 Dec 2006, Randy Dunlap wrote:
> >
> > BTW, reiserfs has similar build problems: it uses clear_page_dirty()
> > so it won't build.
>
> Not any more. I fixed that one (very different issue, btw: it's not
> actually doing writeout, it actually wanted to cancel IO on truncated
> buffers).
>
> However, it's certainly possible that my fix hasn't mirrored out yet, I
> pushed it just a couple of hours ago. So if you want to test it, here are
> the two commits in question..
>
> (The "cancel_dirty_page()" cleanup is needed not just to do reiserfs as a
> module, it's also to make it more robust against reiserfs possibly feeding
> that function with strange pages, and to match the other related functions
> in the accounting functions).
>
> Len Brown tested the reiserfs changes, and claims that it was all good,
> but if somebody wants to run fsx-linux or some other filesystem stress
> testing tool that actually tests shared mmap (and truncate), that would be
> really appreciated.

I ran fsx-linux on it for one hour... with no problems reported.

---
~Randy