2007-12-14 15:48:46

by Aneesh Kumar K.V

Subject: [RFC] truncate_mutex to read_write semaphore

This series converts truncate_mutex to a read-write semaphore. Some test
results are included below.


For O_DIRECT workloads we won't see contention on truncate_mutex because
get_block is called under inode->i_mutex.


For FIBMAP we won't see contention because get_block is called under the BKL.


threaded read with low memory
---------------------------

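Top contended locks were (/proc/lock_stat output; the columns are con-bounces,
contentions, waittime-min, waittime-max, waittime-total, acq-bounces,
acquisitions, holdtime-min, holdtime-max, holdtime-total):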

&q->__queue_lock: 12549 12572 10.65 3302.16 36818.78 46618 395721 3.47 49636.48 571453.47
&inode->i_data.tree_lock-W: 3970 4026 2.62 33.39 3508.74 25924 95164 5.33 949.59 80180.03
&inode->i_data.tree_lock-R: 1937 2002 2.52 22.05 1528.78 19543 141863 5.57 119.72 137126.60
&ei->truncate_mutex#2: 4553 4769 169.62 1028484.20 39334253.92 19610 47069 31.74 102280.63 680802.57

second run
---------

&q->__queue_lock: 12499 12535 3.76 247.71 19799.94 46341 405427 4.34 216.31 527282.59
&inode->i_data.tree_lock-W: 4009 4071 10.04 31.78 3434.95 25612 93458 7.29 61.87 78365.20
&inode->i_data.tree_lock-R: 1919 1973 4.43 30.93 1523.04 18953 142635 4.95 109.20 137098.84
&ei->truncate_mutex#2: 4346 4499 1546.39 896379.29 31107317.47 19051 48223 37.94 122579.25 628968.65

The above results show that threaded reads with low memory (booted with
mem=512M on a 16-CPU x86-64 machine) contend on truncate_mutex: its total
wait time is far larger than that of any other lock listed.
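
To compare rows it can help to reduce each one to a mean wait per contention
(waittime-total divided by contentions). The sketch below does that for a
single row. The struct and function names are made up for illustration, and
it assumes the ten-column layout shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of /proc/lock_stat, in the column order used above. */
struct lock_stat_row {
	double con_bounces, contentions;
	double wait_min, wait_max, wait_total;
	double acq_bounces, acquisitions;
	double hold_min, hold_max, hold_total;
};

/*
 * Parse the ten numeric columns that follow the "name:" prefix.
 * Returns 0 on success, -1 if fewer than ten numbers were found.
 */
static int parse_lock_stat_row(const char *numbers, struct lock_stat_row *r)
{
	int n;

	assert(numbers != NULL && r != NULL);
	n = sscanf(numbers, "%lf %lf %lf %lf %lf %lf %lf %lf %lf %lf",
		   &r->con_bounces, &r->contentions,
		   &r->wait_min, &r->wait_max, &r->wait_total,
		   &r->acq_bounces, &r->acquisitions,
		   &r->hold_min, &r->hold_max, &r->hold_total);
	return n == 10 ? 0 : -1;
}

/* Mean wait per contention, in the same time unit lock_stat reports. */
static double avg_wait(const struct lock_stat_row *r)
{
	assert(r != NULL);
	return r->contentions > 0 ? r->wait_total / r->contentions : 0.0;
}
```

For the first-run rows this gives roughly 8248 per contention for
truncate_mutex, against roughly 3 for q->__queue_lock.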


threaded read with low memory after converting to i_data_sem
------------------------------------------------------------

&ei->i_data_sem-R: 0 0 0.00 0.00 0.00 18017 48801 38.12 3494783.37 22982474.21
&ei->i_data_sem-R: 0 0 0.00 0.00 0.00 18233 49118 45.09 4953783.87 32699001.46

As the /proc/lock_stat output above shows, the write side of the semaphore is
not taken at all, and the read side sees no contention.
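
The reason readers stop contending can be shown with a toy model of the
admission rules. This is only a single-threaded sketch, not a real lock and
not the kernel implementation; all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model of lock admission rules; not a real lock. */
struct toy_mutex { bool held; };
struct toy_rwsem { int readers; bool writer; };

/* A mutex admits exactly one holder, whether it reads or writes. */
static bool toy_mutex_trylock(struct toy_mutex *m)
{
	assert(m != NULL);
	if (m->held)
		return false;
	m->held = true;
	return true;
}

/* A reader is admitted unless a writer holds the semaphore. */
static bool toy_down_read_trylock(struct toy_rwsem *s)
{
	assert(s != NULL);
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

/* A writer is admitted only when nobody else holds the semaphore. */
static bool toy_down_write_trylock(struct toy_rwsem *s)
{
	assert(s != NULL);
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

/* Of n simultaneous readers, how many get in without waiting? */
static int readers_admitted_mutex(int n)
{
	struct toy_mutex m = { false };
	int i, ok = 0;

	for (i = 0; i < n; i++)
		if (toy_mutex_trylock(&m))
			ok++;
	return ok;
}

static int readers_admitted_rwsem(int n)
{
	struct toy_rwsem s = { 0, false };
	int i, ok = 0;

	for (i = 0; i < n; i++)
		if (toy_down_read_trylock(&s))
			ok++;
	return ok;
}
```

With 16 readers the mutex model admits one and makes the other 15 wait,
while the rwsem model admits all 16. A writer is still excluded while any
reader is inside.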

threaded write
--------------
&ei->i_data_sem-W: 0 0 0.00 0.00 0.00 24 64163 41.04 2620905.32 16331786.48
&ei->i_data_sem-R: 0 0 0.00 0.00 0.00 13352 83969 51.40 1212864.74 2834511.75


Here we see some read semaphore acquisitions. We take the semaphore in read
mode so that we do not contend in the overwrite case. We see no contention
here because the write is done under inode->i_mutex.
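
The overwrite behaviour described above can be sketched in user space as a
two-pass lookup. This is a simplified model, not the ext4 code: the toy_*
helpers only count acquisitions instead of locking, and the block map is a
small array in which 0 means "hole".

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 8

/* Toy inode: blocks[i] == 0 means logical block i is a hole. */
struct toy_inode {
	unsigned long blocks[NBLOCKS];
	unsigned long next_free;	/* next physical block to hand out */
	int read_locks;			/* times the read side was taken */
	int write_locks;		/* times the write side was taken */
};

/* Stand-ins for down_read()/up_read()/down_write()/up_write(). */
static void toy_down_read(struct toy_inode *in)  { in->read_locks++; }
static void toy_up_read(struct toy_inode *in)    { (void)in; }
static void toy_down_write(struct toy_inode *in) { in->write_locks++; }
static void toy_up_write(struct toy_inode *in)   { (void)in; }

/* Returns the physical block for lblk, or 0 if unmapped and !create. */
static unsigned long toy_get_block(struct toy_inode *in, size_t lblk,
				   int create)
{
	unsigned long pblk;

	assert(in != NULL && lblk < NBLOCKS);

	/* Pass 1: pure lookup, shared with readers and overwriters. */
	toy_down_read(in);
	pblk = in->blocks[lblk];
	toy_up_read(in);
	if (!create || pblk != 0)
		return pblk;

	/*
	 * Pass 2: allocation. Look again under the write lock, because
	 * the mapping may have changed after the read lock was dropped.
	 */
	toy_down_write(in);
	if (in->blocks[lblk] == 0)
		in->blocks[lblk] = in->next_free++;
	pblk = in->blocks[lblk];
	toy_up_write(in);
	return pblk;
}
```

In this model a write to an already-mapped block never touches the write
side, which matches the small i_data_sem-W acquisition counts above.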

&sb->s_type->i_mutex_key#1: 313958 313962 3650.35 99510834.17 4881402594.11 314481 616528 37.22 7579553.97 54139119.82
--------------------------
&sb->s_type->i_mutex_key#1 313962 [<ffffffff8027aa80>] generic_file_aio_write+0x4f/0xc2
&sb->s_type->i_mutex_key#1 0 [<ffffffff802a538c>] generic_file_llseek+0x36/0x98


second run
----------

&ei->i_data_sem-W: 0 0 0.00 0.00 0.00 2 61143 41.56 9299754.45 15811211.79
&ei->i_data_sem-R: 0 0 0.00 0.00 0.00 13272 82442 68.40 1632405.22 2877135.32



&sb->s_type->i_mutex_key#1: 441031 441163 10873.77 144350457.93 4988289572.34 441679 742079 163.05 15158665.56 59655118.60
--------------------------
&sb->s_type->i_mutex_key#1 441163 [<ffffffff802a538c>] generic_file_llseek+0x36/0x98
&sb->s_type->i_mutex_key#1 0 [<ffffffff8027aa80>] generic_file_aio_write+0x4f/0xc2



The test program is at http://www.radian.org/~kvaneesh/ext4/truncate_mutex/
The file system was modified to create highly fragmented files via frag.c.


2007-12-14 15:48:51

by Aneesh Kumar K.V

Subject: [PATCH 3/3] ext4: Take read lock during overwrite case.

When we are overwriting a file and not actually allocating new file system
blocks, we only need to take the read lock on i_data_sem.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/inode.c | 32 ++++++++++++++++++++++++--------
1 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 669d560..7dda65d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -895,11 +895,31 @@ int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
int create, int extend_disksize)
{
int retval;
- if (create) {
- down_write((&EXT4_I(inode)->i_data_sem));
+ /*
+ * Try to see if we can get the block without requesting
+ * a new file system block.
+ */
+ down_read((&EXT4_I(inode)->i_data_sem));
+ if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
+ retval = ext4_ext_get_blocks(handle, inode, block, max_blocks,
+ bh, 0, 0);
} else {
- down_read((&EXT4_I(inode)->i_data_sem));
+ retval = ext4_get_blocks_handle(handle, inode, block, max_blocks,
+ bh, 0, 0);
}
+ up_read((&EXT4_I(inode)->i_data_sem));
+ if (!create || (retval > 0))
+ return retval;
+
+ /*
+ * We need to allocate new blocks which will result
+ * in i_data update
+ */
+ down_write((&EXT4_I(inode)->i_data_sem));
+ /*
+ * We need to check for EXT4_EXTENTS_FL again here because migrate
+ * could have changed the inode type in between
+ */
if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
retval = ext4_ext_get_blocks(handle, inode, block, max_blocks,
bh, create, extend_disksize);
@@ -907,11 +927,7 @@ int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
retval = ext4_get_blocks_handle(handle, inode, block, max_blocks,
bh, create, extend_disksize);
}
- if (create) {
- up_write((&EXT4_I(inode)->i_data_sem));
- } else {
- up_read((&EXT4_I(inode)->i_data_sem));
- }
+ up_write((&EXT4_I(inode)->i_data_sem));
return retval;
}
static int ext4_get_block(struct inode *inode, sector_t iblock,
--
1.5.4.rc0-dirty

2007-12-14 15:48:50

by Aneesh Kumar K.V

Subject: [PATCH 2/3] ext4: Convert truncate_mutex to read write semaphore.

We are currently taking the truncate_mutex for every read. This hurts
performance on large CPU configurations. Convert the lock to a read-write
semaphore and take the read lock when we are reading the file.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/balloc.c | 2 +-
fs/ext4/extents.c | 13 +++++++------
fs/ext4/file.c | 4 ++--
fs/ext4/inode.c | 39 ++++++++++++++++++++++++++++++++-------
fs/ext4/ioctl.c | 4 ++--
fs/ext4/super.c | 2 +-
include/linux/ext4_fs.h | 22 +++-------------------
include/linux/ext4_fs_i.h | 6 +++---
8 files changed, 51 insertions(+), 41 deletions(-)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 71ee95e..4eca63d 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -463,7 +463,7 @@ static inline int rsv_is_empty(struct ext4_reserve_window *rsv)
* when setting the reservation window size through ioctl before the file
* is open for write (needs block allocation).
*
- * Needs truncate_mutex protection prior to call this function.
+ * Needs down_write(i_data_sem) protection prior to call this function.
*/
void ext4_init_block_alloc_info(struct inode *inode)
{
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 98ef84d..3513301 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1666,7 +1666,7 @@ int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
* This routine returns max. credits that the extent tree can consume.
* It should be OK for low-performance paths like ->writepage()
* To allow many writing processes to fit into a single transaction,
- * the caller should calculate credits under truncate_mutex and
+ * the caller should calculate credits under i_data_sem and
* pass the actual path.
*/
int ext4_ext_calc_credits_for_insert(struct inode *inode,
@@ -2227,7 +2227,8 @@ out:

/*
* Need to be called with
- * mutex_lock(&EXT4_I(inode)->truncate_mutex);
+ * down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block
+ * (ie, create is zero). Otherwise down_write(&EXT4_I(inode)->i_data_sem)
*/
int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
ext4_fsblk_t iblock,
@@ -2446,7 +2447,7 @@ void ext4_ext_truncate(struct inode * inode, struct page *page)
if (page)
ext4_block_truncate_page(handle, page, mapping, inode->i_size);

- mutex_lock(&EXT4_I(inode)->truncate_mutex);
+ down_write(&EXT4_I(inode)->i_data_sem);
ext4_ext_invalidate_cache(inode);

/*
@@ -2482,7 +2483,7 @@ out_stop:
if (inode->i_nlink)
ext4_orphan_del(handle, inode);

- mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+ up_write(&EXT4_I(inode)->i_data_sem);
ext4_journal_stop(handle);
}

@@ -2545,7 +2546,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
* modify 1 super block, 1 block bitmap and 1 group descriptor.
*/
credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + 3;
- mutex_lock(&EXT4_I(inode)->truncate_mutex);
+ down_write((&EXT4_I(inode)->i_data_sem));
retry:
while (ret >= 0 && ret < max_blocks) {
block = block + ret;
@@ -2602,7 +2603,7 @@ retry:
if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;

- mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+ up_write((&EXT4_I(inode)->i_data_sem));
/*
* Time to update the file size.
* Update only when preallocation was requested beyond the file size.
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 1a81cd6..ba4cf05 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -37,9 +37,9 @@ static int ext4_release_file (struct inode * inode, struct file * filp)
if ((filp->f_mode & FMODE_WRITE) &&
(atomic_read(&inode->i_writecount) == 1))
{
- mutex_lock(&EXT4_I(inode)->truncate_mutex);
+ down_write(&EXT4_I(inode)->i_data_sem);
ext4_discard_reservation(inode);
- mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+ up_write(&EXT4_I(inode)->i_data_sem);
}
if (is_dx(inode) && filp->private_data)
ext4_htree_free_dir_info(filp->private_data);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9cf37b2..669d560 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -341,7 +341,7 @@ static int ext4_block_to_path(struct inode *inode,
* the whole chain, all way to the data (returns %NULL, *err == 0).
*
* Need to be called with
- * mutex_lock(&EXT4_I(inode)->truncate_mutex)
+ * down_read(&EXT4_I(inode)->i_data_sem)
*/
static Indirect *ext4_get_branch(struct inode *inode, int depth, int *offsets,
Indirect chain[4], int *err)
@@ -772,7 +772,8 @@ err_out:
*
*
* Need to be called with
- * mutex_lock(&EXT4_I(inode)->truncate_mutex)
+ * down_read(&EXT4_I(inode)->i_data_sem) if not allocating file system block
+ * (ie, create is zero). Otherwise down_write(&EXT4_I(inode)->i_data_sem)
*/
int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
sector_t iblock, unsigned long maxblocks,
@@ -794,7 +795,7 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,

J_ASSERT(!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL));
J_ASSERT(handle != NULL || create == 0);
- depth = ext4_block_to_path(inode,iblock,offsets,&blocks_to_boundary);
+ depth = ext4_block_to_path(inode, iblock, offsets, &blocks_to_boundary);

if (depth == 0)
goto out;
@@ -859,7 +860,7 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
err = ext4_splice_branch(handle, inode, iblock,
partial, indirect_blks, count);
/*
- * i_disksize growing is protected by truncate_mutex. Don't forget to
+ * i_disksize growing is protected by i_data_sem. Don't forget to
* protect it if you're about to implement concurrent
* ext4_get_block() -bzzz
*/
@@ -889,6 +890,30 @@ out:

#define DIO_CREDITS (EXT4_RESERVE_TRANS_BLOCKS + 32)

+int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
+ unsigned long max_blocks, struct buffer_head *bh,
+ int create, int extend_disksize)
+{
+ int retval;
+ if (create) {
+ down_write((&EXT4_I(inode)->i_data_sem));
+ } else {
+ down_read((&EXT4_I(inode)->i_data_sem));
+ }
+ if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
+ retval = ext4_ext_get_blocks(handle, inode, block, max_blocks,
+ bh, create, extend_disksize);
+ } else {
+ retval = ext4_get_blocks_handle(handle, inode, block, max_blocks,
+ bh, create, extend_disksize);
+ }
+ if (create) {
+ up_write((&EXT4_I(inode)->i_data_sem));
+ } else {
+ up_read((&EXT4_I(inode)->i_data_sem));
+ }
+ return retval;
+}
static int ext4_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
@@ -1393,7 +1418,7 @@ static int jbd2_journal_dirty_data_fn(handle_t *handle, struct buffer_head *bh)
* ext4_file_write() -> generic_file_write() -> __alloc_pages() -> ...
*
* Same applies to ext4_get_block(). We will deadlock on various things like
- * lock_journal and i_truncate_mutex.
+ * lock_journal and i_data_sem
*
* Setting PF_MEMALLOC here doesn't work - too many internal memory
* allocations fail.
@@ -2316,7 +2341,7 @@ void ext4_truncate(struct inode *inode)
* From here we block out all ext4_get_block() callers who want to
* modify the block allocation tree.
*/
- mutex_lock(&ei->truncate_mutex);
+ down_write(&ei->i_data_sem);

if (n == 1) { /* direct blocks */
ext4_free_data(handle, inode, NULL, i_data+offsets[0],
@@ -2380,7 +2405,7 @@ do_indirects:

ext4_discard_reservation(inode);

- mutex_unlock(&ei->truncate_mutex);
+ up_write(&ei->i_data_sem);
inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
ext4_mark_inode_dirty(handle, inode);

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index e7f894b..c0e5b8c 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -199,7 +199,7 @@ flags_err:
* need to allocate reservation structure for this inode
* before set the window size
*/
- mutex_lock(&ei->truncate_mutex);
+ down_write(&ei->i_data_sem);
if (!ei->i_block_alloc_info)
ext4_init_block_alloc_info(inode);

@@ -207,7 +207,7 @@ flags_err:
struct ext4_reserve_window_node *rsv = &ei->i_block_alloc_info->rsv_window_node;
rsv->rsv_goal_size = rsv_window_size;
}
- mutex_unlock(&ei->truncate_mutex);
+ up_write(&ei->i_data_sem);
return 0;
}
case EXT4_IOC_GROUP_EXTEND: {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 8031dc0..a55866f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -533,7 +533,7 @@ static void init_once(struct kmem_cache *cachep, void *foo)
#ifdef CONFIG_EXT4DEV_FS_XATTR
init_rwsem(&ei->xattr_sem);
#endif
- mutex_init(&ei->truncate_mutex);
+ init_rwsem(&ei->i_data_sem);
inode_init_once(&ei->vfs_inode);
}

diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 7f84645..69e9b75 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -1056,25 +1056,9 @@ extern void ext4_ext_init(struct super_block *);
extern void ext4_ext_release(struct super_block *);
extern long ext4_fallocate(struct inode *inode, int mode, loff_t offset,
loff_t len);
-static inline int
-ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
- unsigned long max_blocks, struct buffer_head *bh,
- int create, int extend_disksize)
-{
- int retval;
- mutex_lock(&EXT4_I(inode)->truncate_mutex);
- if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
- retval = ext4_ext_get_blocks(handle, inode, block, max_blocks,
- bh, create, extend_disksize);
- } else {
- retval = ext4_get_blocks_handle(handle, inode, block, max_blocks, bh,
- create, extend_disksize);
- }
- mutex_unlock(&EXT4_I(inode)->truncate_mutex);
- return retval;
-}
-
-
+extern int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode,
+ sector_t block, unsigned long max_blocks,
+ struct buffer_head *bh, int create, int extend_disksize);
#endif /* __KERNEL__ */

#endif /* _LINUX_EXT4_FS_H */
diff --git a/include/linux/ext4_fs_i.h b/include/linux/ext4_fs_i.h
index 86ddfe2..55e185b 100644
--- a/include/linux/ext4_fs_i.h
+++ b/include/linux/ext4_fs_i.h
@@ -134,16 +134,16 @@ struct ext4_inode_info {
__u16 i_extra_isize;

/*
- * truncate_mutex is for serialising ext4_truncate() against
+ * i_data_sem is for serialising ext4_truncate() against
* ext4_getblock(). In the 2.4 ext2 design, great chunks of inode's
* data tree are chopped off during truncate. We can't do that in
* ext4 because whenever we perform intermediate commits during
* truncate, the inode and all the metadata blocks *must* be in a
* consistent state which allows truncation of the orphans to restart
* during recovery. Hence we must fix the get_block-vs-truncate race
- * by other means, so we have truncate_mutex.
+ * by other means, so we have i_data_sem.
*/
- struct mutex truncate_mutex;
+ struct rw_semaphore i_data_sem;
struct inode vfs_inode;

unsigned long i_ext_generation;
--
1.5.4.rc0-dirty

2007-12-14 16:18:51

by Aneesh Kumar K.V

Subject: [PATCH 1/3] ext4: Make ext4_get_blocks_wrap take the truncate_mutex early.

When migrating an inode from ext3 to ext4 format, we need to make sure that
the inode type test and the walk of the inode data happen under the lock. To
make this happen, take truncate_mutex early, before checking i_flags.


This should also let us remove verify_chain().

Signed-off-by: Aneesh Kumar K.V <[email protected]>
---
fs/ext4/extents.c | 9 ++++--
fs/ext4/inode.c | 69 +++++-----------------------------------------
include/linux/ext4_fs.h | 12 ++++++--
3 files changed, 23 insertions(+), 67 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 8528774..98ef84d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2225,6 +2225,10 @@ out:
return err ? err : allocated;
}

+/*
+ * Need to be called with
+ * mutex_lock(&EXT4_I(inode)->truncate_mutex);
+ */
int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
ext4_fsblk_t iblock,
unsigned long max_blocks, struct buffer_head *bh_result,
@@ -2240,7 +2244,6 @@ int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
__clear_bit(BH_New, &bh_result->b_state);
ext_debug("blocks %d/%lu requested for inode %u\n", (int) iblock,
max_blocks, (unsigned) inode->i_ino);
- mutex_lock(&EXT4_I(inode)->truncate_mutex);

/* check in cache */
goal = ext4_ext_in_cache(inode, iblock, &newex);
@@ -2414,8 +2417,6 @@ out2:
ext4_ext_drop_refs(path);
kfree(path);
}
- mutex_unlock(&EXT4_I(inode)->truncate_mutex);
-
return err ? err : allocated;
}

@@ -2544,6 +2545,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
* modify 1 super block, 1 block bitmap and 1 group descriptor.
*/
credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + 3;
+ mutex_lock(&EXT4_I(inode)->truncate_mutex);
retry:
while (ret >= 0 && ret < max_blocks) {
block = block + ret;
@@ -2600,6 +2602,7 @@ retry:
if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;

+ mutex_unlock(&EXT4_I(inode)->truncate_mutex);
/*
* Time to update the file size.
* Update only when preallocation was requested beyond the file size.
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5489703..9cf37b2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -243,13 +243,6 @@ static inline void add_chain(Indirect *p, struct buffer_head *bh, __le32 *v)
p->bh = bh;
}

-static int verify_chain(Indirect *from, Indirect *to)
-{
- while (from <= to && from->key == *from->p)
- from++;
- return (from > to);
-}
-
/**
* ext4_block_to_path - parse the block number into array of offsets
* @inode: inode in question (we are only interested in its superblock)
@@ -344,10 +337,11 @@ static int ext4_block_to_path(struct inode *inode,
* (pointer to last triple returned, *@err == 0)
* or when it gets an IO error reading an indirect block
* (ditto, *@err == -EIO)
- * or when it notices that chain had been changed while it was reading
- * (ditto, *@err == -EAGAIN)
* or when it reads all @depth-1 indirect blocks successfully and finds
* the whole chain, all way to the data (returns %NULL, *err == 0).
+ *
+ * Need to be called with
+ * mutex_lock(&EXT4_I(inode)->truncate_mutex)
*/
static Indirect *ext4_get_branch(struct inode *inode, int depth, int *offsets,
Indirect chain[4], int *err)
@@ -365,9 +359,6 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth, int *offsets,
bh = sb_bread(sb, le32_to_cpu(p->key));
if (!bh)
goto failure;
- /* Reader: pointers */
- if (!verify_chain(chain, p))
- goto changed;
add_chain(++p, bh, (__le32*)bh->b_data + *++offsets);
/* Reader: end */
if (!p->key)
@@ -375,10 +366,6 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth, int *offsets,
}
return NULL;

-changed:
- brelse(bh);
- *err = -EAGAIN;
- goto no_block;
failure:
*err = -EIO;
no_block:
@@ -782,6 +769,10 @@ err_out:
* return > 0, # of blocks mapped or allocated.
* return = 0, if plain lookup failed.
* return < 0, error case.
+ *
+ *
+ * Need to be called with
+ * mutex_lock(&EXT4_I(inode)->truncate_mutex)
*/
int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
sector_t iblock, unsigned long maxblocks,
@@ -819,18 +810,6 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
while (count < maxblocks && count <= blocks_to_boundary) {
ext4_fsblk_t blk;

- if (!verify_chain(chain, partial)) {
- /*
- * Indirect block might be removed by
- * truncate while we were reading it.
- * Handling of that case: forget what we've
- * got now. Flag the err as EAGAIN, so it
- * will reread.
- */
- err = -EAGAIN;
- count = 0;
- break;
- }
blk = le32_to_cpu(*(chain[depth-1].p + count));

if (blk == first_block + count)
@@ -838,44 +817,13 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
else
break;
}
- if (err != -EAGAIN)
- goto got_it;
+ goto got_it;
}

/* Next simple case - plain lookup or failed read of indirect block */
if (!create || err == -EIO)
goto cleanup;

- mutex_lock(&ei->truncate_mutex);
-
- /*
- * If the indirect block is missing while we are reading
- * the chain(ext4_get_branch() returns -EAGAIN err), or
- * if the chain has been changed after we grab the semaphore,
- * (either because another process truncated this branch, or
- * another get_block allocated this branch) re-grab the chain to see if
- * the request block has been allocated or not.
- *
- * Since we already block the truncate/other get_block
- * at this point, we will have the current copy of the chain when we
- * splice the branch into the tree.
- */
- if (err == -EAGAIN || !verify_chain(chain, partial)) {
- while (partial > chain) {
- brelse(partial->bh);
- partial--;
- }
- partial = ext4_get_branch(inode, depth, offsets, chain, &err);
- if (!partial) {
- count++;
- mutex_unlock(&ei->truncate_mutex);
- if (err)
- goto cleanup;
- clear_buffer_new(bh_result);
- goto got_it;
- }
- }
-
/*
* Okay, we need to do block allocation. Lazily initialize the block
* allocation info here if necessary
@@ -917,7 +865,6 @@ int ext4_get_blocks_handle(handle_t *handle, struct inode *inode,
*/
if (!err && extend_disksize && inode->i_size > ei->i_disksize)
ei->i_disksize = inode->i_size;
- mutex_unlock(&ei->truncate_mutex);
if (err)
goto cleanup;

diff --git a/include/linux/ext4_fs.h b/include/linux/ext4_fs.h
index 97dd409..7f84645 100644
--- a/include/linux/ext4_fs.h
+++ b/include/linux/ext4_fs.h
@@ -1061,11 +1061,17 @@ ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
unsigned long max_blocks, struct buffer_head *bh,
int create, int extend_disksize)
{
- if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
- return ext4_ext_get_blocks(handle, inode, block, max_blocks,
+ int retval;
+ mutex_lock(&EXT4_I(inode)->truncate_mutex);
+ if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL) {
+ retval = ext4_ext_get_blocks(handle, inode, block, max_blocks,
bh, create, extend_disksize);
- return ext4_get_blocks_handle(handle, inode, block, max_blocks, bh,
+ } else {
+ retval = ext4_get_blocks_handle(handle, inode, block, max_blocks, bh,
create, extend_disksize);
+ }
+ mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+ return retval;
}


--
1.5.4.rc0-dirty