From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: Re: [PATCH] ext4: Add validation to jbd lock inversion patch and
	split and writepage
Date: Mon, 2 Jun 2008 15:22:22 +0530
Message-ID: <20080602095222.GA9225@skywalker>
References: <1212154769-16486-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1212154769-16486-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1212154769-16486-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1212154769-16486-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1212154769-16486-5-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20080602093106.GB30613@duck.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: cmm@us.ibm.com, linux-ext4@vger.kernel.org
To: Jan Kara <jack@suse.cz>
Content-Disposition: inline
In-Reply-To: <20080602093106.GB30613@duck.suse.cz>
Sender: linux-ext4-owner@vger.kernel.org

Hi Jan,


On Mon, Jun 02, 2008 at 11:31:06AM +0200, Jan Kara wrote:
>   Hi Aneesh,
> 
>   Thanks for the patch but I though we decided to do this a bit differently -
> see below.

Please feel free to merge the changes back to the lock inversion
patches. I sent it as a separate the patches to make it easier for review.

> 
> On Fri 30-05-08 19:09:27, Aneesh Kumar K.V wrote:
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > ---
> >  fs/ext4/inode.c |  181 +++++++++++++++++++++++++++++++++++++++++++++++--------
> >  1 files changed, 156 insertions(+), 25 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index a96c325..b122425 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -1479,6 +1479,11 @@ static int jbd2_journal_dirty_data_fn(handle_t *handle, struct buffer_head *bh)
> >  	return 0;
> >  }
> >  
> > +static int ext4_bh_unmapped_or_delay(handle_t *handle, struct buffer_head *bh)
> > +{
> > +	return (!buffer_mapped(bh) || buffer_delay(bh));
> > +}
> > +
> >  /*
> >   * Note that we don't need to start a transaction unless we're journaling
> >   * data because we should have holes filled from ext4_page_mkwrite(). If
> > @@ -1531,18 +1536,26 @@ static int jbd2_journal_dirty_data_fn(handle_t *handle, struct buffer_head *bh)
> >  static int __ext4_ordered_writepage(struct page *page,
> >  				struct writeback_control *wbc)
> >  {
> > -	struct inode *inode = page->mapping->host;
> > -	struct buffer_head *page_bufs;
> > +	int ret = 0, err;
> > +	unsigned long len;
> >  	handle_t *handle = NULL;
> > -	int ret = 0;
> > -	int err;
> > +	struct buffer_head *page_bufs;
> > +	struct inode *inode = page->mapping->host;
> > +	loff_t size = i_size_read(inode);
> >  
> > -	if (!page_has_buffers(page)) {
> > -		create_empty_buffers(page, inode->i_sb->s_blocksize,
> > -				(1 << BH_Dirty)|(1 << BH_Uptodate));
> > -	}
>   This is OK.
> 
> >  	page_bufs = page_buffers(page);
> > -	walk_page_buffers(handle, page_bufs, 0,
> > +	if (page->index == size >> PAGE_CACHE_SHIFT)
> > +		len = size & ~PAGE_CACHE_MASK;
> > +	else
> > +		len = PAGE_CACHE_SIZE;
> > +
> > +	if (walk_page_buffers(NULL, page_bufs, 0,
> > +				len, NULL, ext4_bh_unmapped_or_delay)) {
> > +		printk(KERN_CRIT "%s called with unmapped or delay buffer\n",
> > +								__func__);
> > +		BUG();
> > +	}
>   But I'd move this check to ext4_ordered_writepage().
> 

Please feel free to do so. The only reason for me to do it as a separate
function is to clarify that with the changes writepage doesn't do any
block allocation at all. And calling writepage via page_mkwrite goes
against that. So i renamed the page_mkwrite call out to *_allocpage
and made sure writepage doesn't do any block allocation.


> > +	walk_page_buffers(NULL, page_bufs, 0,
> >  			PAGE_CACHE_SIZE, NULL, bget_one);
> >  

[.... snip ..... ]

> >  
> > +	if (walk_page_buffers(NULL, page_bufs, 0,
> > +				len, NULL, ext4_bh_unmapped_or_delay)) {
> > +		printk(KERN_CRIT "%s called with unmapped or delay buffer\n",
> > +								__func__);
> > +		BUG();
> > +	}
> > +	/* FIXME!! do we need to call prepare_write for a mapped buffer */
>   This can go to ext4_journalled_writepage(). What is actually this FIXME
> about? I'm not sure I get it...
> 


I was wondering whether we need to call prepare_write in writepage ?. We
are not allocating any new blocks in writepage with these changes.


> >  	ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE, ext4_get_block);
> >  	if (ret != 0)
> >  		goto out_unlock;
> >  
> > -	page_bufs = page_buffers(page);
> >  	walk_page_buffers(handle, page_bufs, 0, PAGE_CACHE_SIZE, NULL,
> >  								bget_one);
> >  	/* As soon as we unlock the page, it can go away, but we have
> > @@ -1671,14 +1713,13 @@ static int __ext4_journalled_writepage(struct page *page,
> >  static int ext4_journalled_writepage(struct page *page,
> >  				struct writeback_control *wbc)
> >  {
> > +	BUG_ON(!page_has_buffers(page));
> > +
> >  	if (ext4_journal_current_handle())
> >  		goto no_write;
> >  
> > -	if (!page_has_buffers(page) || PageChecked(page)) {
> > -		/*
> > -		 * It's mmapped pagecache.  Add buffers and journal it.  There
> > -		 * doesn't seem much point in redirtying the page here.
> > -		 */
> > +	if (PageChecked(page)) {
> > +		/* dirty pages in data=journal mode */
> >  		ClearPageChecked(page);
> >  		return __ext4_journalled_writepage(page, wbc);
> >  	} else {
> > @@ -3520,6 +3561,96 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> >  	return err;
> >  }
> >  
> > +static int __ext4_journalled_allocpage(struct page *page,
> > +				struct writeback_control *wbc)
> > +{
> > +	int ret = 0, err;
> > +	handle_t *handle = NULL;
> > +	struct address_space *mapping = page->mapping;
> > +	struct inode *inode = mapping->host;
> > +	struct buffer_head *page_bufs;
> > +
> > +	/* if alloc we are called after statring a journal */
> > +	handle = ext4_journal_current_handle();
> > +	BUG_ON(!handle);
> > +
> > +	ret = block_prepare_write(page, 0, PAGE_CACHE_SIZE, ext4_get_block);
> > +	if (ret != 0)
> > +		goto out_unlock;
> > +
> > +	/* FIXME!! should we do a bget_one */
> > +	page_bufs = page_buffers(page);
> > +	ret = walk_page_buffers(handle, page_bufs, 0,
> > +			PAGE_CACHE_SIZE, NULL, do_journal_get_write_access);
> > +


I also have a FIXME here. I am not sure whether unlocking the page have
some effect. Can you verify this ?


> > +	err = walk_page_buffers(handle, page_bufs, 0,
> > +				PAGE_CACHE_SIZE, NULL, write_end_fn);
> > +	if (ret == 0)
> > +		ret = err;
> > +	EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
> > +
> > +out_unlock:
> > +	unlock_page(page);
> > +	return ret;
> > +}
> > +


-aneesh