by Mingming Cao


Subject: Re: [PATCH] ext4: Fix delalloc sync hang with journal lock inversion

On Thu, 2008-05-22 at 23:53 +0530, Aneesh Kumar K.V wrote:
> On Thu, May 22, 2008 at 10:58:35AM -0700, Mingming wrote:
> >
> > On Thu, 2008-05-22 at 15:55 +0530, Aneesh Kumar K.V wrote:
> > > On Wed, May 21, 2008 at 11:14:17PM +0530, Aneesh Kumar K.V wrote:
> > > > Signed-off-by: Aneesh Kumar K.V <[email protected]>
> > > > ---
> > > > fs/ext4/inode.c | 10 +++++++---
> > > > 1 files changed, 7 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > index 46cc610..076d00f 100644
> > > > --- a/fs/ext4/inode.c
> > > > +++ b/fs/ext4/inode.c
> > > > @@ -1571,13 +1571,17 @@ static int ext4_da_writepages(struct address_space *mapping,
> > > >  		 */
> > > >  		if (wbc->nr_to_write > EXT4_MAX_WRITEBACK_PAGES)
> > > >  			wbc->nr_to_write = EXT4_MAX_WRITEBACK_PAGES;
> > > > -		to_write -= wbc->nr_to_write;
> > > >
> > > > +		to_write -= wbc->nr_to_write;
> > > >  		ret = mpage_da_writepages(mapping, wbc, ext4_da_get_block_write);
> > > >  		ext4_journal_stop(handle);
> > > > -		to_write +=wbc->nr_to_write;
> > > > +		if (wbc->nr_to_write) {
> > > > +			/* We failed to write what we requested for */
> > > > +			to_write += wbc->nr_to_write;
> > > > +			break;
> > > > +		}
> > > > +		wbc->nr_to_write = to_write;
> > > >  	}
> > > > -
> > > >  out_writepages:
> > > >  	wbc->nr_to_write = to_write;
> > > >  	wbc->range_cyclic = range_cyclic;
> > >
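[The accounting in the hunk above can be modelled in userspace roughly as
follows. This is only a sketch, not kernel code: wb_control, write_chunk(),
writepages_model() and MAX_WRITEBACK_PAGES are hypothetical stand-ins, and
the value 1024 is an assumption. It shows each pass being clamped to the
per-transaction limit, and the loop stopping as soon as a pass writes less
than it asked for.]

```c
/* Assumed stand-in for EXT4_MAX_WRITEBACK_PAGES; the real value may differ. */
#define MAX_WRITEBACK_PAGES 1024

/* Hypothetical, cut-down model of struct writeback_control. */
struct wb_control {
	long nr_to_write;	/* pages still requested for this pass */
};

/*
 * Hypothetical backend: writes at most wbc->nr_to_write pages, limited by
 * the number of dirty pages left, and decrements both counters.
 */
static int write_chunk(struct wb_control *wbc, long *dirty)
{
	long n = wbc->nr_to_write < *dirty ? wbc->nr_to_write : *dirty;

	*dirty -= n;
	wbc->nr_to_write -= n;
	return 0;
}

/* Returns the number of requested pages that were NOT written. */
static long writepages_model(long requested, long dirty)
{
	struct wb_control wbc = { requested };
	long to_write = wbc.nr_to_write;
	int ret = 0;

	while (!ret && to_write) {
		/* Clamp one pass to what fits in one transaction. */
		if (wbc.nr_to_write > MAX_WRITEBACK_PAGES)
			wbc.nr_to_write = MAX_WRITEBACK_PAGES;
		to_write -= wbc.nr_to_write;

		ret = write_chunk(&wbc, &dirty);

		if (wbc.nr_to_write) {
			/* Short write: give back the unwritten part and stop. */
			to_write += wbc.nr_to_write;
			break;
		}
		wbc.nr_to_write = to_write;
	}
	return to_write;
}
```

[In this model, asking for 3000 pages with only 1500 dirty leaves 1500
unwritten and exits the loop on the short pass instead of retrying.]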
> > > We need a related fix for ext4_da_writepage(). It has to allocate
> > > blocks, and it is called with the page lock held. The handle
> > > will be NULL in the case below, which would result in
> > > ext4_get_block() starting a new transaction when allocating blocks.
> > >
> >
> > Hi Aneesh, the blocks are not allocated at ext4_da_writepage() time.
> >
> > The block allocation has already been done in this path:
> >
> > ext4_da_writepages()->mpage_da_writepages()->write_cache_pages()->
> > __mpage_da_writepage()->mpage_da_map_blocks(), which ensures the blocks
> > are all mapped before mpage_da_submit_io() calls
> > __mpage_writepage()->ext4_da_writepage() to submit the IO.
> >
>
> Does that mean we don't allocate new blocks at all in ext4_da_writepage()?
> Then I will put a BUG() in ext4_da_writepage() for the case where we get
> passed a page that doesn't have all of its buffer heads mapped.
>
Yes, that would be nice to have.
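
Such a check could look roughly like the userspace sketch below. It is not
kernel code: model_buffer, model_page, page_fully_mapped() and
model_writepage() are hypothetical stand-ins for the buffer_head/page
machinery, and assert() plays the role of BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer head's mapping state. */
struct model_buffer {
	bool mapped;	/* a disk block has been assigned */
	bool delay;	/* allocation is still deferred */
};

/* Hypothetical stand-in for a page and its attached buffers. */
struct model_page {
	struct model_buffer *bufs;
	size_t nr_bufs;
};

/* True only if every buffer is mapped and none is still delayed. */
static bool page_fully_mapped(const struct model_page *page)
{
	for (size_t i = 0; i < page->nr_bufs; i++) {
		if (!page->bufs[i].mapped || page->bufs[i].delay)
			return false;
	}
	return true;
}

/*
 * Model of the writepage step: by the time it runs, the writepages path
 * is assumed to have mapped every block, so an unmapped buffer is a bug.
 */
static int model_writepage(const struct model_page *page)
{
	assert(page_fully_mapped(page));	/* stands in for BUG() */
	return 0;				/* "submit the IO" */
}
```

A page whose buffers are all mapped passes through model_writepage(); one
with a buffer still marked delayed would trip the assertion.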

> We still need a diff like the one below:
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 46cc610..8327796 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1498,9 +1498,8 @@ static int __ext4_da_writepage(struct page *page,
>  {
>  	struct inode *inode = page->mapping->host;
>  	handle_t *handle = NULL;
> -	int ret = 0;
> +	int ret = 0, err;
>
> -	handle = ext4_journal_current_handle();
>
>  	if (test_opt(inode->i_sb, NOBH) && ext4_should_writeback_data(inode))
>  		ret = nobh_writepage(page, ext4_get_block, wbc);
> @@ -1508,12 +1507,21 @@ static int __ext4_da_writepage(struct page *page,
>  		ret = block_write_full_page(page, ext4_get_block, wbc);
>
>  	if (!ret && inode->i_size > EXT4_I(inode)->i_disksize) {
> +		handle = ext4_journal_start(inode, 1);
> +		if (IS_ERR(handle)) {
> +			ret = PTR_ERR(handle);
> +			goto out;
> +		}
>  		EXT4_I(inode)->i_disksize = inode->i_size;
> -		ext4_mark_inode_dirty(handle, inode);
> +		ret = ext4_mark_inode_dirty(handle, inode);
> +		err = ext4_journal_stop(handle);
> +		if (!ret)
> +			ret = err;
>  	}
> -
> +out:
>  	return ret;
>  }
> +
>
>

As we discussed on IRC, I think there is a bug in ext4_da_writepage().
Since ext4_da_writepage()/__ext4_da_writepage() is always called with a
journal handle already started (by ext4_da_writepages()), we don't need
to start a new one in __ext4_da_writepage().
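
To illustrate the calling convention, here is a rough userspace sketch. It
is not kernel code: every model_* name is hypothetical, the current handle
is a plain global instead of per-task state, and the "nested start reuses
the running handle" behaviour is an assumption of the model.

```c
#include <stddef.h>

/* Hypothetical stand-in for a journal handle. */
struct model_handle {
	int refs;	/* nesting depth of start/stop pairs */
};

static struct model_handle handle_storage;
static struct model_handle *current_handle;	/* per-task in a real kernel */
static int transactions_started;

/* Start a handle; a nested start reuses the one already running. */
static struct model_handle *model_journal_start(void)
{
	if (current_handle) {
		current_handle->refs++;
		return current_handle;
	}
	handle_storage.refs = 1;
	current_handle = &handle_storage;
	transactions_started++;
	return current_handle;
}

/* Drop one reference; the handle goes away with the last one. */
static void model_journal_stop(struct model_handle *h)
{
	if (--h->refs == 0)
		current_handle = NULL;
}

/* Inner step: picks up the caller's handle instead of starting one. */
static int model_writepage(void)
{
	struct model_handle *h = current_handle;

	return h ? 0 : -1;	/* -1: called without a running handle */
}

/* Outer step: owns the handle for the whole writeback pass. */
static int model_writepages(void)
{
	struct model_handle *h = model_journal_start();
	int ret = model_writepage();

	model_journal_stop(h);
	return ret;
}
```

In this model one call to model_writepages() starts exactly one
transaction, and the inner step only fails when it is called outside of it.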

Mingming
> -aneesh