From: "Aneesh Kumar K.V" Subject: Re: [PATCH -V2 2/2] ext4: truncate the file properly if we fail to copy data from userspace Date: Mon, 8 Jun 2009 22:13:57 +0530 Message-ID: <20090608164357.GA23723@skywalker> References: <20090605234458.GG11650@duck.suse.cz> <1244435715-6807-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <1244435715-6807-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com> <20090608162959.GJ23883@mit.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: cmm@us.ibm.com, sandeen@redhat.com, linux-ext4@vger.kernel.org, Jan Kara To: Theodore Tso Return-path: Received: from e23smtp04.au.ibm.com ([202.81.31.146]:50778 "EHLO e23smtp04.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754340AbZFHQoe (ORCPT ); Mon, 8 Jun 2009 12:44:34 -0400 Received: from d23relay02.au.ibm.com (d23relay02.au.ibm.com [202.81.31.244]) by e23smtp04.au.ibm.com (8.13.1/8.13.1) with ESMTP id n58GgDob000973 for ; Tue, 9 Jun 2009 02:42:13 +1000 Received: from d23av03.au.ibm.com (d23av03.au.ibm.com [9.190.234.97]) by d23relay02.au.ibm.com (8.13.8/8.13.8/NCO v9.2) with ESMTP id n58GiYMp774160 for ; Tue, 9 Jun 2009 02:44:34 +1000 Received: from d23av03.au.ibm.com (loopback [127.0.0.1]) by d23av03.au.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n58GiXvc002377 for ; Tue, 9 Jun 2009 02:44:34 +1000 Content-Disposition: inline In-Reply-To: <20090608162959.GJ23883@mit.edu> Sender: linux-ext4-owner@vger.kernel.org List-ID: On Mon, Jun 08, 2009 at 12:29:59PM -0400, Theodore Tso wrote: > I had also already fixed this up in the patch queue. This is what is > currently there. > > - Ted > > From: "Aneesh Kumar K.V" > > ext4: truncate the file properly if we fail to copy data from userspace > > In generic_perform_write if we fail to copy the user data we don't > update the inode->i_size. We should truncate the file in the above > case so that we don't have blocks allocated outside inode->i_size. Add > the inode to orphan list in the same transaction as block allocation > This ensures that if we crash in between the recovery would do the > truncate. > > Signed-off-by: Aneesh Kumar K.V > CC: Jan Kara > Signed-off-by: "Theodore Ts'o" > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 50ef1a2..8225f21 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1551,6 +1551,52 @@ static int write_end_fn(handle_t *handle, struct buffer_head *bh) > return ext4_handle_dirty_metadata(handle, NULL, bh); > } > > +static int ext4_generic_write_end(struct file *file, > + struct address_space *mapping, > + loff_t pos, unsigned len, unsigned copied, > + struct page *page, void *fsdata) > +{ > + int i_size_changed = 0; > + struct inode *inode = mapping->host; > + handle_t *handle = ext4_journal_current_handle(); > + > + copied = block_write_end(file, mapping, pos, len, copied, page, fsdata); > + > + /* > + * No need to use i_size_read() here, the i_size > + * cannot change under us because we hold i_mutex. > + * > + * But it's important to update i_size while still holding page lock: > + * page writeout could otherwise come in and zero beyond i_size. > + */ > + if (pos + copied > inode->i_size) { > + i_size_write(inode, pos + copied); > + i_size_changed = 1; > + } > + > + if (pos + copied > EXT4_I(inode)->i_disksize) { > + /* We need to mark inode dirty even if > + * new_i_size is less that inode->i_size > + * bu greater than i_disksize.(hint delalloc) > + */ > + ext4_update_i_disksize(inode, (pos + copied)); > + i_size_changed = 1; > + } > + unlock_page(page); > + page_cache_release(page); You missed a block_unlock_hole_extend here. So you need to either fix jan's patches or move this patch after jan's patch > + > + /* > + * Don't mark the inode dirty under page lock. First, it unnecessarily > + * makes the holding time of page lock longer. Second, it forces lock > + * ordering of page lock and transaction start for journaling > + * filesystems. > + */ > + if (i_size_changed) > + ext4_mark_inode_dirty(handle, inode); > + > + return copied; > +} > + > /* > * We need to pick up the new inode size which generic_commit_write gave us > * `file' can be NULL - eg, when called from page_symlink(). > @@ -1574,21 +1620,15 @@ static int ext4_ordered_write_end(struct file *file, > ret = ext4_jbd2_file_inode(handle, inode); > > if (ret == 0) { > - loff_t new_i_size; > - > - new_i_size = pos + copied; > - if (new_i_size > EXT4_I(inode)->i_disksize) { > - ext4_update_i_disksize(inode, new_i_size); > - /* We need to mark inode dirty even if > - * new_i_size is less that inode->i_size > - * bu greater than i_disksize.(hint delalloc) > - */ > - ext4_mark_inode_dirty(handle, inode); > - } > - > - ret2 = generic_write_end(file, mapping, pos, len, copied, > + ret2 = ext4_generic_write_end(file, mapping, pos, len, copied, > page, fsdata); > copied = ret2; > + if (pos + len > inode->i_size) > + /* if we have allocated more blocks and copied > + * less. We will have blocks allocated outside > + * inode->i_size. So truncate them > + */ > + ext4_orphan_add(handle, inode); > if (ret2 < 0) > ret = ret2; > } > @@ -1596,6 +1636,18 @@ static int ext4_ordered_write_end(struct file *file, > if (!ret) > ret = ret2; > > + if (pos + len > inode->i_size) { > + vmtruncate(inode, inode->i_size); > + /* > + * If vmtruncate failed early the inode might still be > + * on the orphan list; we need to make sure the inode > + * is removed from the orphan list in that case. > + */ > + if (inode->i_nlink) > + ext4_orphan_del(NULL, inode); > + } > + > + > return ret ? ret : copied; > } > > @@ -1607,25 +1659,21 @@ static int ext4_writeback_write_end(struct file *file, > handle_t *handle = ext4_journal_current_handle(); > struct inode *inode = mapping->host; > int ret = 0, ret2; > - loff_t new_i_size; > > trace_mark(ext4_writeback_write_end, > "dev %s ino %lu pos %llu len %u copied %u", > inode->i_sb->s_id, inode->i_ino, > (unsigned long long) pos, len, copied); > - new_i_size = pos + copied; > - if (new_i_size > EXT4_I(inode)->i_disksize) { > - ext4_update_i_disksize(inode, new_i_size); > - /* We need to mark inode dirty even if > - * new_i_size is less that inode->i_size > - * bu greater than i_disksize.(hint delalloc) > - */ > - ext4_mark_inode_dirty(handle, inode); > - } > - > - ret2 = generic_write_end(file, mapping, pos, len, copied, > + ret2 = ext4_generic_write_end(file, mapping, pos, len, copied, > page, fsdata); > copied = ret2; > + if (pos + len > inode->i_size) > + /* if we have allocated more blocks and copied > + * less. We will have blocks allocated outside > + * inode->i_size. So truncate them > + */ > + ext4_orphan_add(handle, inode); > + > if (ret2 < 0) > ret = ret2; > > @@ -1633,6 +1681,17 @@ static int ext4_writeback_write_end(struct file *file, > if (!ret) > ret = ret2; > > + if (pos + len > inode->i_size) { > + vmtruncate(inode, inode->i_size); > + /* > + * If vmtruncate failed early the inode might still be > + * on the orphan list; we need to make sure the inode > + * is removed from the orphan list in that case. > + */ > + if (inode->i_nlink) > + ext4_orphan_del(NULL, inode); > + } > + > return ret ? ret : copied; > } > > @@ -1677,10 +1736,27 @@ static int ext4_journalled_write_end(struct file *file, > } > > unlock_page(page); > + page_cache_release(page); > + if (pos + len > inode->i_size) > + /* if we have allocated more blocks and copied > + * less. We will have blocks allocated outside > + * inode->i_size. So truncate them > + */ > + ext4_orphan_add(handle, inode); > + > ret2 = ext4_journal_stop(handle); > if (!ret) > ret = ret2; > - page_cache_release(page); > + if (pos + len > inode->i_size) { > + vmtruncate(inode, inode->i_size); > + /* > + * If vmtruncate failed early the inode might still be > + * on the orphan list; we need to make sure the inode > + * is removed from the orphan list in that case. > + */ > + if (inode->i_nlink) > + ext4_orphan_del(NULL, inode); > + } Here also need special care with block_unlock_hole_extend. We need to do a vmtruncate outside block_unlock_hole_extend. > > return ret ? ret : copied; > } I think i both the case Jan's patch allocate-blocks-correctly-with-subpage-blocksize need an update ? Since you have put the patch before Jan's changes. -aneesh