2008-11-20 00:34:50

by Toshiyuki Okajima

[permalink] [raw]
Subject: [RESEND][PATCH 2/3 BUG,RFC] ext3: release block-device-mapping buffer_heads which have the filesystem private data for avoiding oom-killer

From: Toshiyuki Okajima <[email protected]>

Implement releasepage() of a client of a block device to release its private
data (journal_heads) and the page. The client is an ext3 filesystem.

This function is called from releasepage() of the block-device mapping,
blkdev_releasepage().

Signed-off-by: Toshiyuki Okajima <[email protected]>
---
fs/ext3/inode.c | 16 ++++++++++++++++
fs/ext3/super.c | 2 ++
include/linux/ext3_fs.h | 2 ++
3 files changed, 20 insertions(+)

diff -Nurp linux-2.6.28-rc4.orig/fs/ext3/inode.c linux-2.6.28-rc4/fs/ext3/inode.c
--- linux-2.6.28-rc4.orig/fs/ext3/inode.c 2008-11-10 09:36:15.000000000 +0900
+++ linux-2.6.28-rc4/fs/ext3/inode.c 2008-11-11 09:31:13.000000000 +0900
@@ -1680,6 +1680,28 @@ static int ext3_releasepage(struct page
}

/*
+ * ext3_release_metadata: try to release own page.
+ * The mapping of this page is block device's. Therefore try_to_free_buffers()
+ * cannot release it. As a result, a function which releases own page is
+ * required.
+ */
+int ext3_release_metadata(void *client, struct page *page, gfp_t wait)
+{
+ struct super_block *sb = (struct super_block*)client;
+ journal_t *journal;
+
+ WARN_ON(PageChecked(page));
+ if (!page_has_buffers(page))
+ return 0;
+ BUG_ON(EXT3_SB(sb) == NULL);
+ journal = EXT3_SB(sb)->s_journal;
+ if (journal != NULL)
+ return journal_try_to_free_buffers(journal, page, wait);
+ else
+ return try_to_free_buffers(page);
+}
+
+/*
* If the O_DIRECT write will extend the file then add this inode to the
* orphan list. So recovery will truncate it back to the original size
* if the machine crashes during the write.
diff -Nurp linux-2.6.28-rc4.orig/fs/ext3/super.c linux-2.6.28-rc4/fs/ext3/super.c
--- linux-2.6.28-rc4.orig/fs/ext3/super.c 2008-11-10 09:36:15.000000000 +0900
+++ linux-2.6.28-rc4/fs/ext3/super.c 2008-11-11 09:01:53.000000000 +0900
@@ -2972,6 +2972,8 @@ static struct file_system_type ext3_fs_t
.name = "ext3",
.get_sb = ext3_get_sb,
.kill_sb = kill_block_super,
+ .release_metadata
+ = ext3_release_metadata,
.fs_flags = FS_REQUIRES_DEV,
};

diff -Nurp linux-2.6.28-rc4.orig/include/linux/ext3_fs.h linux-2.6.28-rc4/include/linux/ext3_fs.h
--- linux-2.6.28-rc4.orig/include/linux/ext3_fs.h 2008-11-10 09:36:15.000000000 +0900
+++ linux-2.6.28-rc4/include/linux/ext3_fs.h 2008-11-10 18:33:52.000000000 +0900
@@ -841,6 +841,8 @@ extern void ext3_get_inode_flags(struct
extern void ext3_set_aops(struct inode *inode);
extern int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len);
+extern int ext3_release_metadata(void *client, struct page *page,
+ gfp_t wait);

/* ioctl.c */
extern int ext3_ioctl (struct inode *, struct file *, unsigned int,


2008-11-24 05:51:33

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RESEND][PATCH 2/3 BUG,RFC] ext3: release block-device-mapping buffer_heads which have the filesystem private data for avoiding oom-killer

Hi Toshijuki,

My apologizes for not getting back to you sooner. My travel schedule
has been a little crazy lately, including being in Japan last week
speaking at the Japan Linux Symposium.

Your patch series looks good, but I have one comment, for the ext3 and
ext4 patches:

> + if (journal != NULL)
> + return journal_try_to_free_buffers(journal, page, wait);
> + else
> + return try_to_free_buffers(page);

According to the documentation for journal_try_to_free_buffers():

* This function returns non-zero if we wish try_to_free_buffers()
* to be called. We do this if the page is releasable by try_to_free_buffers().
* We also do it if the page has locked or dirty buffers and the caller wants
* us to perform sync or async writeout.

So I think the last conditional in ext3_release_metadata() needs to be
changed to be like this:

+ if ((journal != NULL) &&
+ (journal_try_to_free_buffers(journal, page, wait) == 0))
+ return 0;
+ return try_to_free_buffers(page);

A similar change should be made in the ext4 version of the patch.
Does that sound OK to you?

Regards,

- Ted

2008-11-25 01:01:08

by Toshiyuki Okajima

[permalink] [raw]
Subject: Re: [RESEND][PATCH 2/3 BUG,RFC] ext3: release block-device-mapping buffer_heads which have the filesystem private data for avoiding oom-killer

Hi Ted,
Thanks for your comments.

> Hi Toshijuki,
>
> My apologizes for not getting back to you sooner. My travel schedule
Never mind.

> has been a little crazy lately, including being in Japan last week
> speaking at the Japan Linux Symposium.
>
> Your patch series looks good, but I have one comment, for the ext3 and
> ext4 patches:
>
> > + if (journal != NULL)
> > + return journal_try_to_free_buffers(journal, page, wait);
> > + else
> > + return try_to_free_buffers(page);
>
> According to the documentation for journal_try_to_free_buffers():
>
> * This function returns non-zero if we wish try_to_free_buffers()
> * to be called. We do this if the page is releasable by try_to_free_buffers().
> * We also do it if the page has locked or dirty buffers and the caller wants
> * us to perform sync or async writeout.
I forgot reading it.

>
> So I think the last conditional in ext3_release_metadata() needs to be
> changed to be like this:
>
> + if ((journal != NULL) &&
> + (journal_try_to_free_buffers(journal, page, wait) == 0))
> + return 0;
> + return try_to_free_buffers(page);
>
> A similar change should be made in the ext4 version of the patch.
> Does that sound OK to you?
Yes.

I will make and repost a revised patch which reflects your comments later.

>
> Regards,
>
> - Ted

Thanks again,
Toshiyuki Okajima


2008-11-25 04:09:24

by Toshiyuki Okajima

[permalink] [raw]
Subject: Re: [RESEND][PATCH 2/3 BUG,RFC] ext3: release block-device-mapping buffer_heads which have the filesystem private data for avoiding oom-killer

Hi Ted,
I have reconsidered your comments.

Toshiyuki Okajima wrote:
> Hi Ted,
> Thanks for your comments.
>
> > Hi Toshijuki,
<SNIP>
> > Your patch series looks good, but I have one comment, for the ext3 and
> > ext4 patches:
> >
> > > + if (journal != NULL)
> > > + return journal_try_to_free_buffers(journal, page, wait);
> > > + else
> > > + return try_to_free_buffers(page);
> >
> > According to the documentation for journal_try_to_free_buffers():
> >
> > * This function returns non-zero if we wish try_to_free_buffers()
> > * to be called. We do this if the page is releasable by
> try_to_free_buffers().
> > * We also do it if the page has locked or dirty buffers and the
> caller wants
> > * us to perform sync or async writeout.
> I forgot reading it.
I think this document is obsolete because a current version of
journal_try_to_free_buffers() also calls try_to_free_buffers().
So, this document needs to be changed.
Therefore I don't need to change my patch, OK?

[current version]
1727 int journal_try_to_free_buffers(journal_t *journal,
1728 struct page *page, gfp_t gfp_mask)
1729 {
1730 struct buffer_head *head;
1731 struct buffer_head *bh;
1732 int ret = 0;
1733
1734 J_ASSERT(PageLocked(page));
1735
1736 head = page_buffers(page);
1737 bh = head;
1738 do {
1739 struct journal_head *jh;
1740
1741 /*
1742 * We take our own ref against the journal_head here to avoid
1743 * having to add tons of locking around each instance of
1744 * journal_remove_journal_head() and journal_put_journal_head().
1745 */
1746 jh = journal_grab_journal_head(bh);
1747 if (!jh)
1748 continue;
1749
1750 jbd_lock_bh_state(bh);
1751 __journal_try_to_free_buffer(journal, bh);
1752 journal_put_journal_head(jh);
1753 jbd_unlock_bh_state(bh);
1754 if (buffer_jbd(bh))
1755 goto busy;
1756 } while ((bh = bh->b_this_page) != head);
1757
1758 ret = try_to_free_buffers(page);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1759
1760 /*
1761 * There are a number of places where journal_try_to_free_buffers()
1762 * could race with journal_commit_transaction(), the later still
1763 * holds the reference to the buffers to free while processing them.
1764 * try_to_free_buffers() failed to free those buffers. Some of the
1765 * caller of releasepage() request page buffers to be dropped, otherwise
1766 * treat the fail-to-free as errors (such as generic_file_direct_IO())
1767 *
1768 * So, if the caller of try_to_release_page() wants the synchronous
1769 * behaviour(i.e make sure buffers are dropped upon return),
1770 * let's wait for the current transaction to finish flush of
1771 * dirty data buffers, then try to free those buffers again,
1772 * with the journal locked.
1773 */
1774 if (ret == 0 && (gfp_mask & __GFP_WAIT) && (gfp_mask & __GFP_FS)) {
1775 journal_wait_for_transaction_sync_data(journal);
1776 ret = try_to_free_buffers(page);
1777 }
1778
1779 busy:
1780 return ret;
1781 }

[old version(linux-2.4.36.8)]
1731 int journal_try_to_free_buffers(journal_t *journal,
1732 struct page *page, int gfp_mask)
1733 {
...
1759 struct buffer_head *bh;
1760 struct buffer_head *tmp;
1761 int locked_or_dirty = 0;
1762 int call_ttfb = 1;
1763
1764 J_ASSERT(PageLocked(page));
1765
1766 bh = page->buffers;
1767 tmp = bh;
1768 spin_lock(&journal_datalist_lock);
1769 do {
1770 struct buffer_head *p = tmp;
1771
1772 if (unlikely(!tmp)) {
1773 debug_page(page);
1774 BUG();
1775 }
1776
1777 tmp = tmp->b_this_page;
1778 if (buffer_jbd(p))
1779 if (!__journal_try_to_free_buffer(p, &locked_or_dirty))
1780 call_ttfb = 0;
1781 } while (tmp != bh);
1782 spin_unlock(&journal_datalist_lock);
1783
1784 if (!(gfp_mask & (__GFP_IO|__GFP_WAIT)))
1785 goto out;
1786 if (!locked_or_dirty)
1787 goto out;
1788 /*
1789 * The VM wants us to do writeout, or to block on IO, or both.
1790 * So we allow try_to_free_buffers to be called even if the page
1791 * still has journalled buffers.
1792 */
1793 call_ttfb = 1;
1794 out:
1795 return call_ttfb;
1796 }

Regards,
Toshiyuki Okajima