2007-12-28 08:04:06

by Wu Fengguang

[permalink] [raw]
Subject: [PATCH] jfs: clear PAGECACHE_TAG_DIRTY for no-write pages

Andrew,

This patch fixed the 'pdflush stuck in D state' bug
http://bugzilla.kernel.org/show_bug.cgi?id=9291
and should be pushed to mainline ASAP.
---

When JFS decides to drop a dirty metapage, it simply clears the META_dirty bit
and leave alone the PG_dirty and PAGECACHE_TAG_DIRTY bits.

When such no-write page goes to metapage_writepage(), the `relic'
PAGECACHE_TAG_DIRTY tag should be cleared, to prevent pdflush from
repeatedly trying to sync them.

Also, avoid the redirty when a bio submission is planned.

Tested-by: Markus Rehbach <[email protected]>
Signed-off-by: Fengguang Wu <[email protected]>
---

diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index f5cd8d3..0c3ffc4 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -353,7 +353,8 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
{
struct bio *bio = NULL;
unsigned int block_offset; /* block offset of mp within page */
- struct inode *inode = page->mapping->host;
+ struct address_space *mapping = page->mapping;
+ struct inode *inode = mapping->host;
unsigned int blocks_per_mp = JFS_SBI(inode->i_sb)->nbperpage;
unsigned int len;
unsigned int xlen;
@@ -449,9 +450,15 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
goto dump_bio;

submit_bio(WRITE, bio);
- }
- if (redirty)
+ } else if (redirty) {
redirty_page_for_writepage(wbc, page);
+ } else {
+ write_lock_irq(&mapping->tree_lock);
+ radix_tree_tag_clear(&mapping->page_tree,
+ page_index(page),
+ PAGECACHE_TAG_DIRTY);
+ write_unlock_irq(&mapping->tree_lock);
+ }

unlock_page(page);


2007-12-28 11:34:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] jfs: clear PAGECACHE_TAG_DIRTY for no-write pages


On Fri, 2007-12-28 at 16:03 +0800, Fengguang Wu wrote:
> Andrew,
>
> This patch fixed the 'pdflush stuck in D state' bug
> http://bugzilla.kernel.org/show_bug.cgi?id=9291
> and should be pushed to mainline ASAP.
> ---
>
> When JFS decides to drop a dirty metapage, it simply clears the META_dirty bit
> and leave alone the PG_dirty and PAGECACHE_TAG_DIRTY bits.
>
> When such no-write page goes to metapage_writepage(), the `relic'
> PAGECACHE_TAG_DIRTY tag should be cleared, to prevent pdflush from
> repeatedly trying to sync them.
>
> Also, avoid the redirty when a bio submission is planned.
>
> Tested-by: Markus Rehbach <[email protected]>
> Signed-off-by: Fengguang Wu <[email protected]>
> ---

> +++ b/fs/jfs/jfs_metapage.c

> @@ -449,9 +450,15 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
> goto dump_bio;
>
> submit_bio(WRITE, bio);
> - }
> - if (redirty)
> + } else if (redirty) {
> redirty_page_for_writepage(wbc, page);
> + } else {
> + write_lock_irq(&mapping->tree_lock);
> + radix_tree_tag_clear(&mapping->page_tree,
> + page_index(page),
> + PAGECACHE_TAG_DIRTY);
> + write_unlock_irq(&mapping->tree_lock);
> + }

I'm not liking this open-coded tag_clear, although I currently fail to
come up with a nice solution.

2007-12-28 11:42:50

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH] jfs: clear PAGECACHE_TAG_DIRTY for no-write pages

On Fri, Dec 28, 2007 at 12:33:46PM +0100, Peter Zijlstra wrote:
>
> On Fri, 2007-12-28 at 16:03 +0800, Fengguang Wu wrote:
> > Andrew,
> >
> > This patch fixed the 'pdflush stuck in D state' bug
> > http://bugzilla.kernel.org/show_bug.cgi?id=9291
> > and should be pushed to mainline ASAP.
> > ---
> >
> > When JFS decides to drop a dirty metapage, it simply clears the META_dirty bit
> > and leave alone the PG_dirty and PAGECACHE_TAG_DIRTY bits.
> >
> > When such no-write page goes to metapage_writepage(), the `relic'
> > PAGECACHE_TAG_DIRTY tag should be cleared, to prevent pdflush from
> > repeatedly trying to sync them.
> >
> > Also, avoid the redirty when a bio submission is planned.
> >
> > Tested-by: Markus Rehbach <[email protected]>
> > Signed-off-by: Fengguang Wu <[email protected]>
> > ---
>
> > +++ b/fs/jfs/jfs_metapage.c
>
> > @@ -449,9 +450,15 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
> > goto dump_bio;
> >
> > submit_bio(WRITE, bio);
> > - }
> > - if (redirty)
> > + } else if (redirty) {
> > redirty_page_for_writepage(wbc, page);
> > + } else {
> > + write_lock_irq(&mapping->tree_lock);
> > + radix_tree_tag_clear(&mapping->page_tree,
> > + page_index(page),
> > + PAGECACHE_TAG_DIRTY);
> > + write_unlock_irq(&mapping->tree_lock);
> > + }
>
> I'm not liking this open-coded tag_clear, although I currently fail to
> come up with a nice solution.

Another option is to release/drop such pages ASAP. But what if the
metapage_writepage() is called even before we have the chance to
release it?

2007-12-28 16:25:23

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH] jfs: clear PAGECACHE_TAG_DIRTY for no-write pages


On Fri, 2007-12-28 at 12:33 +0100, Peter Zijlstra wrote:
> On Fri, 2007-12-28 at 16:03 +0800, Fengguang Wu wrote:
> > Andrew,
> >
> > This patch fixed the 'pdflush stuck in D state' bug
> > http://bugzilla.kernel.org/show_bug.cgi?id=9291
> > and should be pushed to mainline ASAP.
> > ---
> >
> > When JFS decides to drop a dirty metapage, it simply clears the META_dirty bit
> > and leave alone the PG_dirty and PAGECACHE_TAG_DIRTY bits.
> >
> > When such no-write page goes to metapage_writepage(), the `relic'
> > PAGECACHE_TAG_DIRTY tag should be cleared, to prevent pdflush from
> > repeatedly trying to sync them.
> >
> > Also, avoid the redirty when a bio submission is planned.
> >
> > Tested-by: Markus Rehbach <[email protected]>
> > Signed-off-by: Fengguang Wu <[email protected]>
> > ---
>
> > +++ b/fs/jfs/jfs_metapage.c
>
> > @@ -449,9 +450,15 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
> > goto dump_bio;
> >
> > submit_bio(WRITE, bio);
> > - }
> > - if (redirty)
> > + } else if (redirty) {
> > redirty_page_for_writepage(wbc, page);
> > + } else {
> > + write_lock_irq(&mapping->tree_lock);
> > + radix_tree_tag_clear(&mapping->page_tree,
> > + page_index(page),
> > + PAGECACHE_TAG_DIRTY);
> > + write_unlock_irq(&mapping->tree_lock);
> > + }
>
> I'm not liking this open-coded tag_clear, although I currently fail to
> come up with a nice solution.

I'm looking at __block_write_full_page() for guidance. The situation
here is similar to writing a page where all the bufferheads are clean.
__block_write_full_page() always calls set_page_writeback(page), and
then, if no I/O is initiated, immediately calls
end_page_writeback(page).

I'll put together a patch for testing.

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center

2007-12-28 16:53:30

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH] jfs: clear PAGECACHE_TAG_DIRTY for no-write pages


On Fri, 2007-12-28 at 10:25 -0600, Dave Kleikamp wrote:
> On Fri, 2007-12-28 at 12:33 +0100, Peter Zijlstra wrote:

> > I'm not liking this open-coded tag_clear, although I currently fail to
> > come up with a nice solution.
>
> I'm looking at __block_write_full_page() for guidance. The situation
> here is similar to writing a page where all the bufferheads are clean.
> __block_write_full_page() always calls set_page_writeback(page), and
> then, if no I/O is initiated, immediately calls
> end_page_writeback(page).
>
> I'll put together a patch for testing.

Here it is, compile-test only so far. I borrowed some of the changeset
comments from Fengguang.

JFS: clear PAGECACHE_TAG_DIRTY for no-write pages

When JFS decides to drop a dirty metapage, it simply clears the META_dirty bit
and leave alone the PG_dirty and PAGECACHE_TAG_DIRTY bits.

When such no-write page goes to metapage_writepage(), the `relic'
PAGECACHE_TAG_DIRTY tag should be cleared, to prevent pdflush from
repeatedly trying to sync them. This is done through
set_page_writeback(), so call it should be called in all cases. If
no I/O is initiated, end_page_writeback() should be called immediately.

This is how __block_write_full_page() does things.

Signed-off-by: Dave Kleikamp <[email protected]>
CC: Fengguang Wu <[email protected]>
---

diff -Nurp linux-2.6.24-rc6-git5/fs/jfs/jfs_metapage.c linux/fs/jfs/jfs_metapage.c
--- linux-2.6.24-rc6-git5/fs/jfs/jfs_metapage.c 2007-12-28 10:28:33.000000000 -0600
+++ linux/fs/jfs/jfs_metapage.c 2007-12-28 10:37:30.000000000 -0600
@@ -360,6 +360,7 @@ static int metapage_writepage(struct pag
struct metapage *mp;
int redirty = 0;
sector_t lblock;
+ int nr_underway = 0;
sector_t pblock;
sector_t next_block = 0;
sector_t page_start;
@@ -371,6 +372,7 @@ static int metapage_writepage(struct pag
(PAGE_CACHE_SHIFT - inode->i_blkbits);
BUG_ON(!PageLocked(page));
BUG_ON(PageWriteback(page));
+ set_page_writeback(page);

for (offset = 0; offset < PAGE_CACHE_SIZE; offset += PSIZE) {
mp = page_to_mp(page, offset);
@@ -413,11 +415,10 @@ static int metapage_writepage(struct pag
if (!bio->bi_size)
goto dump_bio;
submit_bio(WRITE, bio);
+ nr_underway++;
bio = NULL;
- } else {
- set_page_writeback(page);
+ } else
inc_io(page);
- }
xlen = (PAGE_CACHE_SIZE - offset) >> inode->i_blkbits;
pblock = metapage_get_blocks(inode, lblock, &xlen);
if (!pblock) {
@@ -449,12 +450,16 @@ static int metapage_writepage(struct pag
goto dump_bio;

submit_bio(WRITE, bio);
+ nr_underway++;
}
if (redirty)
redirty_page_for_writepage(wbc, page);

unlock_page(page);

+ if (nr_underway == 0)
+ end_page_writeback(page);
+
return 0;
add_failed:
/* We should never reach here, since we're only adding one vec */

--
David Kleikamp
IBM Linux Technology Center

2007-12-29 02:21:18

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH] jfs: clear PAGECACHE_TAG_DIRTY for no-write pages

On Fri, Dec 28, 2007 at 10:53:14AM -0600, Dave Kleikamp wrote:


> ---
>
> diff -Nurp linux-2.6.24-rc6-git5/fs/jfs/jfs_metapage.c linux/fs/jfs/jfs_metapage.c
> --- linux-2.6.24-rc6-git5/fs/jfs/jfs_metapage.c 2007-12-28 10:28:33.000000000 -0600
> +++ linux/fs/jfs/jfs_metapage.c 2007-12-28 10:37:30.000000000 -0600
> @@ -360,6 +360,7 @@ static int metapage_writepage(struct pag
> struct metapage *mp;
> int redirty = 0;
> sector_t lblock;
> + int nr_underway = 0;
> sector_t pblock;
> sector_t next_block = 0;
> sector_t page_start;
> @@ -371,6 +372,7 @@ static int metapage_writepage(struct pag
> (PAGE_CACHE_SHIFT - inode->i_blkbits);
> BUG_ON(!PageLocked(page));
> BUG_ON(PageWriteback(page));

This line should be moved below:
> + set_page_writeback(page);

>
> for (offset = 0; offset < PAGE_CACHE_SIZE; offset += PSIZE) {
> mp = page_to_mp(page, offset);
> @@ -413,11 +415,10 @@ static int metapage_writepage(struct pag
> if (!bio->bi_size)
> goto dump_bio;
> submit_bio(WRITE, bio);
> + nr_underway++;
> bio = NULL;
> - } else {
> - set_page_writeback(page);
> + } else
> inc_io(page);
> - }
> xlen = (PAGE_CACHE_SIZE - offset) >> inode->i_blkbits;
> pblock = metapage_get_blocks(inode, lblock, &xlen);
> if (!pblock) {
> @@ -449,12 +450,16 @@ static int metapage_writepage(struct pag
> goto dump_bio;
>
> submit_bio(WRITE, bio);
> + nr_underway++;
> }
> if (redirty)
> redirty_page_for_writepage(wbc, page);
+ else
+ set_page_writeback(page);
>
> unlock_page(page);
>
> + if (nr_underway == 0)
+ if (nr_underway == 0 && redirty == 0)

> + end_page_writeback(page);
> +
> return 0;
> add_failed:
> /* We should never reach here, since we're only adding one vec */

Otherwise looks pretty good.

Reviewed-by: Fengguang Wu <[email protected]>

2007-12-29 04:51:16

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH] jfs: clear PAGECACHE_TAG_DIRTY for no-write pages


On Sat, 2007-12-29 at 10:21 +0800, Fengguang Wu wrote:
> On Fri, Dec 28, 2007 at 10:53:14AM -0600, Dave Kleikamp wrote:
>
>
> > ---
> >
> > diff -Nurp linux-2.6.24-rc6-git5/fs/jfs/jfs_metapage.c linux/fs/jfs/jfs_metapage.c
> > --- linux-2.6.24-rc6-git5/fs/jfs/jfs_metapage.c 2007-12-28 10:28:33.000000000 -0600
> > +++ linux/fs/jfs/jfs_metapage.c 2007-12-28 10:37:30.000000000 -0600
> > @@ -360,6 +360,7 @@ static int metapage_writepage(struct pag
> > struct metapage *mp;
> > int redirty = 0;
> > sector_t lblock;
> > + int nr_underway = 0;
> > sector_t pblock;
> > sector_t next_block = 0;
> > sector_t page_start;
> > @@ -371,6 +372,7 @@ static int metapage_writepage(struct pag
> > (PAGE_CACHE_SHIFT - inode->i_blkbits);
> > BUG_ON(!PageLocked(page));
> > BUG_ON(PageWriteback(page));
>
> This line should be moved below:
> > + set_page_writeback(page);

No. set_page_writeback() needs to be called before submit_bio() is
called.

I don't think there is any harm in calling set_page_writeback(),
redirty_page_for_writeback() and end_page_writeback() in the case where
there is no I/O to submit, and some dirty data cannot be written. It is
consistent with what happens in __block_write_full_page().

It's also possible that some part of the page was written, and another
part cannot be, causing the page to be redirtied.

> >
> > for (offset = 0; offset < PAGE_CACHE_SIZE; offset += PSIZE) {
> > mp = page_to_mp(page, offset);
> > @@ -413,11 +415,10 @@ static int metapage_writepage(struct pag
> > if (!bio->bi_size)
> > goto dump_bio;
> > submit_bio(WRITE, bio);
> > + nr_underway++;
> > bio = NULL;
> > - } else {
> > - set_page_writeback(page);
> > + } else
> > inc_io(page);
> > - }
> > xlen = (PAGE_CACHE_SIZE - offset) >> inode->i_blkbits;
> > pblock = metapage_get_blocks(inode, lblock, &xlen);
> > if (!pblock) {
> > @@ -449,12 +450,16 @@ static int metapage_writepage(struct pag
> > goto dump_bio;
> >
> > submit_bio(WRITE, bio);
> > + nr_underway++;
> > }
> > if (redirty)
> > redirty_page_for_writepage(wbc, page);
> + else
> + set_page_writeback(page);
> >
> > unlock_page(page);
> >
> > + if (nr_underway == 0)
> + if (nr_underway == 0 && redirty == 0)
>
> > + end_page_writeback(page);
> > +
> > return 0;
> > add_failed:
> > /* We should never reach here, since we're only adding one vec */
>
> Otherwise looks pretty good.
>
> Reviewed-by: Fengguang Wu <[email protected]>
>
--
David Kleikamp
IBM Linux Technology Center

2007-12-29 05:51:31

by Wu Fengguang

[permalink] [raw]
Subject: Re: [PATCH] jfs: clear PAGECACHE_TAG_DIRTY for no-write pages

On Fri, Dec 28, 2007 at 10:50:59PM -0600, Dave Kleikamp wrote:
>
> On Sat, 2007-12-29 at 10:21 +0800, Fengguang Wu wrote:
> > On Fri, Dec 28, 2007 at 10:53:14AM -0600, Dave Kleikamp wrote:
> >
> >
> > > ---
> > >
> > > diff -Nurp linux-2.6.24-rc6-git5/fs/jfs/jfs_metapage.c linux/fs/jfs/jfs_metapage.c
> > > --- linux-2.6.24-rc6-git5/fs/jfs/jfs_metapage.c 2007-12-28 10:28:33.000000000 -0600
> > > +++ linux/fs/jfs/jfs_metapage.c 2007-12-28 10:37:30.000000000 -0600
> > > @@ -360,6 +360,7 @@ static int metapage_writepage(struct pag
> > > struct metapage *mp;
> > > int redirty = 0;
> > > sector_t lblock;
> > > + int nr_underway = 0;
> > > sector_t pblock;
> > > sector_t next_block = 0;
> > > sector_t page_start;
> > > @@ -371,6 +372,7 @@ static int metapage_writepage(struct pag
> > > (PAGE_CACHE_SHIFT - inode->i_blkbits);
> > > BUG_ON(!PageLocked(page));
> > > BUG_ON(PageWriteback(page));
> >
> > This line should be moved below:
> > > + set_page_writeback(page);
>
> No. set_page_writeback() needs to be called before submit_bio() is
> called.

Ah yes.

> I don't think there is any harm in calling set_page_writeback(),
> redirty_page_for_writeback() and end_page_writeback() in the case where
> there is no I/O to submit, and some dirty data cannot be written. It is
> consistent with what happens in __block_write_full_page().
>
> It's also possible that some part of the page was written, and another
> part cannot be, causing the page to be redirtied.

You are right. I revisited the code and there's nothing wrong with
your patch :-)

Regards,
Fengguang

2007-12-29 15:21:16

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [PATCH] jfs: clear PAGECACHE_TAG_DIRTY for no-write pages


On Sat, 2007-12-29 at 13:51 +0800, Fengguang Wu wrote:
> On Fri, Dec 28, 2007 at 10:50:59PM -0600, Dave Kleikamp wrote:
> >
> > On Sat, 2007-12-29 at 10:21 +0800, Fengguang Wu wrote:

> > > This line should be moved below:
> > > > + set_page_writeback(page);
> >
> > No. set_page_writeback() needs to be called before submit_bio() is
> > called.
>
> Ah yes.
>
> > I don't think there is any harm in calling set_page_writeback(),
> > redirty_page_for_writeback() and end_page_writeback() in the case where
> > there is no I/O to submit, and some dirty data cannot be written. It is
> > consistent with what happens in __block_write_full_page().
> >
> > It's also possible that some part of the page was written, and another
> > part cannot be, causing the page to be redirtied.
>
> You are right. I revisited the code and there's nothing wrong with
> your patch :-)

Fengguang,

Thank you for all your help with this bug. I've been on vacation and
hadn't had much time to look at it. We'd probably still be scratching
our heads wondering where the problem was without your hard work.

Shaggy
>
> Regards,
> Fengguang
>
--
David Kleikamp
IBM Linux Technology Center