2011-04-14 05:42:41

by Yongqiang Yang

[permalink] [raw]
Subject: [PATCH REC 0/3] ext4:Factor common code from convert and split unwritten.

These patches factor common code from ext4_ext_convert_to_initialized() and
ext4_split_unwritten_extents(), so that extent-move-on-write in snapshot and
punch-hole can be built on the common code as well.

ext4:Add a function merging extent right and left.
ext4:Add two functions splitting an extent.
ext4:Reimplement convert and split_unwritten.



2011-04-14 05:42:43

by Yongqiang Yang

[permalink] [raw]
Subject: [PATCH RFC 1/3] ext4:Add a function merging extent right and left.

1] Rename ext4_ext_try_to_merge() to ext4_ext_try_to_merge_right().

2] Add a new function ext4_ext_try_to_merge() which tries to merge
an extent both left and right.

3] Use the new function in ext4_ext_convert_unwritten_endio() and
ext4_ext_insert_extent().

Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/extents.c | 65 ++++++++++++++++++++++++++++------------------------
1 files changed, 35 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index dd2cb50..11f30d2 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1563,7 +1563,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
* Returns 0 if the extents (ex and ex+1) were _not_ merged and returns
* 1 if they got merged.
*/
-static int ext4_ext_try_to_merge(struct inode *inode,
+static int ext4_ext_try_to_merge_right(struct inode *inode,
struct ext4_ext_path *path,
struct ext4_extent *ex)
{
@@ -1603,6 +1603,31 @@ static int ext4_ext_try_to_merge(struct inode *inode,
}

/*
+ * This function tries to merge the @ex extent to neighbours in the tree.
+ * return 1 if merge left else 0.
+ */
+static int ext4_ext_try_to_merge(struct inode *inode,
+ struct ext4_ext_path *path,
+ struct ext4_extent *ex) {
+ struct ext4_extent_header *eh;
+ unsigned int depth;
+ int merge_done = 0;
+ int ret = 0;
+
+ depth = ext_depth(inode);
+ BUG_ON(path[depth].p_hdr == NULL);
+ eh = path[depth].p_hdr;
+
+ if (ex > EXT_FIRST_EXTENT(eh))
+ merge_done = ext4_ext_try_to_merge_right(inode, path, ex - 1);
+
+ if (!merge_done)
+ ret = ext4_ext_try_to_merge_right(inode, path, ex);
+
+ return ret;
+}
+
+/*
* check if a portion of the "newext" extent overlaps with an
* existing extent.
*
@@ -3039,6 +3064,7 @@ fix_extent_len:
ext4_ext_dirty(handle, inode, path + depth);
return err;
}
+
static int ext4_convert_unwritten_extents_endio(handle_t *handle,
struct inode *inode,
struct ext4_ext_path *path)
@@ -3047,46 +3073,25 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
struct ext4_extent_header *eh;
int depth;
int err = 0;
- int ret = 0;

depth = ext_depth(inode);
eh = path[depth].p_hdr;
ex = path[depth].p_ext;

+ ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
+ "block %llu, max_blocks %u\n", inode->i_ino,
+ (unsigned long long)le32_to_cpu(ex->ee_block),
+ ext4_ext_get_actual_len(ex));
+
err = ext4_ext_get_access(handle, inode, path + depth);
if (err)
goto out;
/* first mark the extent as initialized */
ext4_ext_mark_initialized(ex);

- /*
- * We have to see if it can be merged with the extent
- * on the left.
- */
- if (ex > EXT_FIRST_EXTENT(eh)) {
- /*
- * To merge left, pass "ex - 1" to try_to_merge(),
- * since it merges towards right _only_.
- */
- ret = ext4_ext_try_to_merge(inode, path, ex - 1);
- if (ret) {
- err = ext4_ext_correct_indexes(handle, inode, path);
- if (err)
- goto out;
- depth = ext_depth(inode);
- ex--;
- }
- }
- /*
- * Try to Merge towards right.
- */
- ret = ext4_ext_try_to_merge(inode, path, ex);
- if (ret) {
- err = ext4_ext_correct_indexes(handle, inode, path);
- if (err)
- goto out;
- depth = ext_depth(inode);
- }
+ /* correct indexes is nt needed becasue borders are not changed */
+ ext4_ext_try_to_merge(inode, path, ex);
+
/* Mark modified extent as dirty */
err = ext4_ext_dirty(handle, inode, path + depth);
out:
--
1.7.4.4


2011-04-14 05:42:47

by Yongqiang Yang

[permalink] [raw]
Subject: [PATCH RFC 2/3] ext4:Add two functions splitting an extent.

1] Add a function named ext4_split_extent_at() which splits an extent
into two extents at given logical block.

2] Add a function called ext4_split_extent() which splits an extent
into three extents.

Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/extents.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 200 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 11f30d2..1756e0f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2554,6 +2554,206 @@ static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
return ret;
}

+/*
+ * used by extent splitting.
+ */
+#define EXT4_EXT_MAY_ZEROOUT 0x1 /* safe to zeroout if split fails \
+ due to ENOSPC */
+#define EXT4_EXT_MARK_UNINIT1 0x2 /* mark first half uninitialized */
+#define EXT4_EXT_MARK_UNINIT2 0x4 /* mark second half uninitialized */
+
+/*
+ * ext4_split_extent_at() splits an extent at given block.
+ *
+ * @handle: the journal handle
+ * @inode: the file inode
+ * @path: the path to the extent
+ * @split: the logical block where the extent is splitted.
+ * @split_flags: indicates if the extent could be zeroout if split fails, and
+ * the states(init or uninit) of new extents.
+ * @flags: flags used to insert new extent to extent tree.
+ *
+ *
+ * Splits extent [a, b] into two extents [a, @split) and [@split, b], states
+ * of which are deterimined by split_flag.
+ *
+ * There are two cases:
+ * a> the extent are splitted into two extent.
+ * b> split is not needed, and just mark the extent.
+ *
+ * return 0 on success.
+ */
+static int ext4_split_extent_at(handle_t *handle,
+ struct inode *inode,
+ struct ext4_ext_path *path,
+ ext4_lblk_t split,
+ int split_flag,
+ int flags)
+{
+ ext4_fsblk_t newblock;
+ ext4_lblk_t ee_block;
+ struct ext4_extent_header *eh;
+ struct ext4_extent *ex, newex, orig_ex;
+ struct ext4_extent *ex2 = NULL;
+ unsigned int ee_len, depth;
+ int err = 0;
+
+ ext_debug("ext4_split_extents_at: inode %lu, logical"
+ "block %llu\n", inode->i_ino, (unsigned long long)split);
+
+ ext4_ext_show_leaf(inode, path);
+
+ depth = ext_depth(inode);
+ eh = path[depth].p_hdr;
+ ex = path[depth].p_ext;
+ ee_block = le32_to_cpu(ex->ee_block);
+ ee_len = ext4_ext_get_actual_len(ex);
+ newblock = split - ee_block + ext4_ext_pblock(ex);
+
+ BUG_ON(split < ee_block || split >= (ee_block + ee_len));
+
+ err = ext4_ext_get_access(handle, inode, path + depth);
+ if (err)
+ goto out;
+
+ if (split == ee_block) {
+ /*
+ * case b: block @split is the block that the extent begins with
+ * then we just change the state of the extent, and splitting
+ * is not needed.
+ */
+ if (split_flag & EXT4_EXT_MARK_UNINIT2)
+ ext4_ext_mark_uninitialized(ex);
+ else
+ ext4_ext_mark_initialized(ex);
+
+ if (!(flags & EXT4_GET_BLOCKS_PRE_IO))
+ ext4_ext_try_to_merge(inode, path, ex);
+
+ err = ext4_ext_dirty(handle, inode, path + depth);
+ goto out;
+ }
+
+ /* case a */
+ orig_ex.ee_block = ex->ee_block;
+ orig_ex.ee_len = ex->ee_len;
+ ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
+
+ ex->ee_len = cpu_to_le16(split - ee_block);
+ if (split_flag & EXT4_EXT_MARK_UNINIT1)
+ ext4_ext_mark_uninitialized(ex);
+
+ /*
+ * path may lead to new leaf, not to original leaf any more
+ * after ext4_ext_insert_extent() returns,
+ */
+ err = ext4_ext_dirty(handle, inode, path + depth);
+ if (err)
+ goto fix_extent_len;
+
+ ex2 = &newex;
+ ex2->ee_block = cpu_to_le32(split);
+ ex2->ee_len = cpu_to_le16(ee_len - (split - ee_block));
+ ext4_ext_store_pblock(ex2, newblock);
+ if (split_flag & EXT4_EXT_MARK_UNINIT2)
+ ext4_ext_mark_uninitialized(ex2);
+
+ err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
+ if (err == -ENOSPC && (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
+
+ err = ext4_ext_zeroout(inode, &orig_ex);
+ if (err)
+ goto fix_extent_len;
+ /* update the extent length and mark as initialized */
+ ex->ee_block = orig_ex.ee_block;
+ ex->ee_len = orig_ex.ee_len;
+ ext4_ext_mark_initialized(ex);
+ ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
+ err = ext4_ext_dirty(handle, inode, path + depth);
+ goto out;
+ } else if (err)
+ goto fix_extent_len;
+
+out:
+ ext4_ext_show_leaf(inode, path);
+ return err;
+
+fix_extent_len:
+ ex->ee_block = orig_ex.ee_block;
+ ex->ee_len = orig_ex.ee_len;
+ ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
+ ext4_ext_dirty(handle, inode, path + depth);
+ return err;
+}
+
+/*
+ * ext4_split_extents() splits an extent and mark extent which is covered
+ * by @map as split_flags indicates
+ *
+ * It may result in splitting the extent into multiple extents (upto three)
+ * There are three possibilities:
+ * a> There is no split required
+ * b> Splits in two extents: Split is happening at either end of the extent
+ * c> Splits in three extents: Somone is splitting in middle of the extent
+ *
+ */
+static int ext4_split_extent(handle_t *handle,
+ struct inode *inode,
+ struct ext4_ext_path *path,
+ struct ext4_map_blocks *map,
+ int split_flag,
+ int flags)
+{
+ ext4_lblk_t ee_block;
+ struct ext4_extent *ex;
+ unsigned int ee_len, depth;
+ int err = 0;
+ int uninitialized;
+ int split_flag1, flags1;
+
+ depth = ext_depth(inode);
+ ex = path[depth].p_ext;
+ ee_block = le32_to_cpu(ex->ee_block);
+ ee_len = ext4_ext_get_actual_len(ex);
+ uninitialized = ext4_ext_is_uninitialized(ex);
+
+ if (map->m_lblk + map->m_len < ee_block + ee_len) {
+ split_flag1 = 0;
+ split_flag1 |= split_flag & EXT4_EXT_MAY_ZEROOUT ?
+ EXT4_EXT_MAY_ZEROOUT : 0;
+ flags1 = flags;
+ flags1 |= EXT4_GET_BLOCKS_PRE_IO;
+ if (uninitialized)
+ split_flag1 |= EXT4_EXT_MARK_UNINIT1 |
+ EXT4_EXT_MARK_UNINIT2;
+ err = ext4_split_extent_at(handle, inode, path,
+ map->m_lblk + map->m_len, split_flag1, flags1);
+ }
+
+ ext4_ext_drop_refs(path);
+ path = ext4_ext_find_extent(inode, map->m_lblk, path);
+ if (IS_ERR(path))
+ return PTR_ERR(path);
+
+ if (map->m_lblk >= ee_block) {
+ split_flag1 = 0;
+ split_flag1 |= split_flag & EXT4_EXT_MAY_ZEROOUT ?
+ EXT4_EXT_MAY_ZEROOUT : 0;
+ if (uninitialized)
+ split_flag1 |= EXT4_EXT_MARK_UNINIT1;
+ if (split_flag & EXT4_EXT_MARK_UNINIT2)
+ split_flag1 |= EXT4_EXT_MARK_UNINIT2;
+ err = ext4_split_extent_at(handle, inode, path,
+ map->m_lblk, split_flag1, flags);
+ if (err)
+ goto out;
+ }
+
+ ext4_ext_show_leaf(inode, path);
+out:
+ return err ? err : map->m_len;
+}
+
#define EXT4_EXT_ZERO_LEN 7
/*
* This function is called by ext4_ext_map_blocks() if someone tries to write
--
1.7.4.4


2011-04-14 05:42:50

by Yongqiang Yang

[permalink] [raw]
Subject: [PATCH RFC 3/3] ext4:Reimplement convert and split_unwritten.

convert and split unwritten are reimplemented based on ext4_split_extent()
added in last patch.

Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/extents.c | 483 ++++++++--------------------------------------------
1 files changed, 75 insertions(+), 408 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 1756e0f..ee2dda3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2770,17 +2770,13 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
struct ext4_map_blocks *map,
struct ext4_ext_path *path)
{
- struct ext4_extent *ex, newex, orig_ex;
- struct ext4_extent *ex1 = NULL;
- struct ext4_extent *ex2 = NULL;
- struct ext4_extent *ex3 = NULL;
- struct ext4_extent_header *eh;
+ struct ext4_map_blocks split_map;
+ struct ext4_extent zero_ex;
+ struct ext4_extent *ex;
ext4_lblk_t ee_block, eof_block;
unsigned int allocated, ee_len, depth;
- ext4_fsblk_t newblock;
int err = 0;
- int ret = 0;
- int may_zeroout;
+ int split_flag = 0;

ext_debug("ext4_ext_convert_to_initialized: inode %lu, logical"
"block %llu, max_blocks %u\n", inode->i_ino,
@@ -2792,280 +2788,90 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
eof_block = map->m_lblk + map->m_len;

depth = ext_depth(inode);
- eh = path[depth].p_hdr;
ex = path[depth].p_ext;
ee_block = le32_to_cpu(ex->ee_block);
ee_len = ext4_ext_get_actual_len(ex);
allocated = ee_len - (map->m_lblk - ee_block);
- newblock = map->m_lblk - ee_block + ext4_ext_pblock(ex);
-
- ex2 = ex;
- orig_ex.ee_block = ex->ee_block;
- orig_ex.ee_len = cpu_to_le16(ee_len);
- ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));

+ WARN_ON(map->m_lblk < ee_block);
/*
* It is safe to convert extent to initialized via explicit
* zeroout only if extent is fully insde i_size or new_size.
*/
- may_zeroout = ee_block + ee_len <= eof_block;
+ split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;

- err = ext4_ext_get_access(handle, inode, path + depth);
- if (err)
- goto out;
/* If extent has less than 2*EXT4_EXT_ZERO_LEN zerout directly */
- if (ee_len <= 2*EXT4_EXT_ZERO_LEN && may_zeroout) {
- err = ext4_ext_zeroout(inode, &orig_ex);
+ if (ee_len <= 2*EXT4_EXT_ZERO_LEN &&
+ (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
+ zero_ex.ee_block = ex->ee_block;
+ zero_ex.ee_len = cpu_to_le16(map->m_lblk - ee_block);
+ ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));
+ err = ext4_ext_zeroout(inode, &zero_ex);
if (err)
- goto fix_extent_len;
- /* update the extent length and mark as initialized */
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = orig_ex.ee_len;
- ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
- ext4_ext_dirty(handle, inode, path + depth);
- /* zeroed the full extent */
- return allocated;
- }
-
- /* ex1: ee_block to map->m_lblk - 1 : uninitialized */
- if (map->m_lblk > ee_block) {
- ex1 = ex;
- ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
- ext4_ext_mark_uninitialized(ex1);
- ex2 = &newex;
- }
- /*
- * for sanity, update the length of the ex2 extent before
- * we insert ex3, if ex1 is NULL. This is to avoid temporary
- * overlap of blocks.
- */
- if (!ex1 && allocated > map->m_len)
- ex2->ee_len = cpu_to_le16(map->m_len);
- /* ex3: to ee_block + ee_len : uninitialised */
- if (allocated > map->m_len) {
- unsigned int newdepth;
- /* If extent has less than EXT4_EXT_ZERO_LEN zerout directly */
- if (allocated <= EXT4_EXT_ZERO_LEN && may_zeroout) {
- /*
- * map->m_lblk == ee_block is handled by the zerouout
- * at the beginning.
- * Mark first half uninitialized.
- * Mark second half initialized and zero out the
- * initialized extent
- */
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = cpu_to_le16(ee_len - allocated);
- ext4_ext_mark_uninitialized(ex);
- ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
- ext4_ext_dirty(handle, inode, path + depth);
-
- ex3 = &newex;
- ex3->ee_block = cpu_to_le32(map->m_lblk);
- ext4_ext_store_pblock(ex3, newblock);
- ex3->ee_len = cpu_to_le16(allocated);
- err = ext4_ext_insert_extent(handle, inode, path,
- ex3, 0);
- if (err == -ENOSPC) {
- err = ext4_ext_zeroout(inode, &orig_ex);
- if (err)
- goto fix_extent_len;
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = orig_ex.ee_len;
- ext4_ext_store_pblock(ex,
- ext4_ext_pblock(&orig_ex));
- ext4_ext_dirty(handle, inode, path + depth);
- /* blocks available from map->m_lblk */
- return allocated;
-
- } else if (err)
- goto fix_extent_len;
-
- /*
- * We need to zero out the second half because
- * an fallocate request can update file size and
- * converting the second half to initialized extent
- * implies that we can leak some junk data to user
- * space.
- */
- err = ext4_ext_zeroout(inode, ex3);
- if (err) {
- /*
- * We should actually mark the
- * second half as uninit and return error
- * Insert would have changed the extent
- */
- depth = ext_depth(inode);
- ext4_ext_drop_refs(path);
- path = ext4_ext_find_extent(inode, map->m_lblk,
- path);
- if (IS_ERR(path)) {
- err = PTR_ERR(path);
- return err;
- }
- /* get the second half extent details */
- ex = path[depth].p_ext;
- err = ext4_ext_get_access(handle, inode,
- path + depth);
- if (err)
- return err;
- ext4_ext_mark_uninitialized(ex);
- ext4_ext_dirty(handle, inode, path + depth);
- return err;
- }
-
- /* zeroed the second half */
- return allocated;
- }
- ex3 = &newex;
- ex3->ee_block = cpu_to_le32(map->m_lblk + map->m_len);
- ext4_ext_store_pblock(ex3, newblock + map->m_len);
- ex3->ee_len = cpu_to_le16(allocated - map->m_len);
- ext4_ext_mark_uninitialized(ex3);
- err = ext4_ext_insert_extent(handle, inode, path, ex3, 0);
- if (err == -ENOSPC && may_zeroout) {
- err = ext4_ext_zeroout(inode, &orig_ex);
- if (err)
- goto fix_extent_len;
- /* update the extent length and mark as initialized */
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = orig_ex.ee_len;
- ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
- ext4_ext_dirty(handle, inode, path + depth);
- /* zeroed the full extent */
- /* blocks available from map->m_lblk */
- return allocated;
-
- } else if (err)
- goto fix_extent_len;
- /*
- * The depth, and hence eh & ex might change
- * as part of the insert above.
- */
- newdepth = ext_depth(inode);
- /*
- * update the extent length after successful insert of the
- * split extent
- */
- ee_len -= ext4_ext_get_actual_len(ex3);
- orig_ex.ee_len = cpu_to_le16(ee_len);
- may_zeroout = ee_block + ee_len <= eof_block;
-
- depth = newdepth;
- ext4_ext_drop_refs(path);
- path = ext4_ext_find_extent(inode, map->m_lblk, path);
- if (IS_ERR(path)) {
- err = PTR_ERR(path);
goto out;
- }
- eh = path[depth].p_hdr;
- ex = path[depth].p_ext;
- if (ex2 != &newex)
- ex2 = ex;

err = ext4_ext_get_access(handle, inode, path + depth);
if (err)
goto out;
-
- allocated = map->m_len;
-
- /* If extent has less than EXT4_EXT_ZERO_LEN and we are trying
- * to insert a extent in the middle zerout directly
- * otherwise give the extent a chance to merge to left
- */
- if (le16_to_cpu(orig_ex.ee_len) <= EXT4_EXT_ZERO_LEN &&
- map->m_lblk != ee_block && may_zeroout) {
- err = ext4_ext_zeroout(inode, &orig_ex);
- if (err)
- goto fix_extent_len;
- /* update the extent length and mark as initialized */
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = orig_ex.ee_len;
- ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
- ext4_ext_dirty(handle, inode, path + depth);
- /* zero out the first half */
- /* blocks available from map->m_lblk */
- return allocated;
- }
- }
- /*
- * If there was a change of depth as part of the
- * insertion of ex3 above, we need to update the length
- * of the ex1 extent again here
- */
- if (ex1 && ex1 != ex) {
- ex1 = ex;
- ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
- ext4_ext_mark_uninitialized(ex1);
- ex2 = &newex;
- }
- /* ex2: map->m_lblk to map->m_lblk + maxblocks-1 : initialised */
- ex2->ee_block = cpu_to_le32(map->m_lblk);
- ext4_ext_store_pblock(ex2, newblock);
- ex2->ee_len = cpu_to_le16(allocated);
- if (ex2 != ex)
- goto insert;
- /*
- * New (initialized) extent starts from the first block
- * in the current extent. i.e., ex2 == ex
- * We have to see if it can be merged with the extent
- * on the left.
- */
- if (ex2 > EXT_FIRST_EXTENT(eh)) {
- /*
- * To merge left, pass "ex2 - 1" to try_to_merge(),
- * since it merges towards right _only_.
- */
- ret = ext4_ext_try_to_merge(inode, path, ex2 - 1);
- if (ret) {
- err = ext4_ext_correct_indexes(handle, inode, path);
- if (err)
- goto out;
- depth = ext_depth(inode);
- ex2--;
- }
+ ext4_ext_mark_initialized(ex);
+ ext4_ext_try_to_merge(inode, path, ex);
+ err = ext4_ext_dirty(handle, inode, path + depth);
+ goto out;
}
+
/*
- * Try to Merge towards right. This might be required
- * only when the whole extent is being written to.
- * i.e. ex2 == ex and ex3 == NULL.
+ * four cases:
+ * 1. split the extent into three extents.
+ * 2. split the extent into two extents, zeroout the first half.
+ * 3. split the extent into two extents, zeroout the second half.
+ * 4. split the extent into two extents with out zeroout.
*/
- if (!ex3) {
- ret = ext4_ext_try_to_merge(inode, path, ex2);
- if (ret) {
- err = ext4_ext_correct_indexes(handle, inode, path);
+ split_map.m_lblk = map->m_lblk;
+ split_map.m_len = map->m_len;
+
+ if (allocated > map->m_len) {
+ if (allocated <= EXT4_EXT_ZERO_LEN &&
+ (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
+ /* case 3 */
+ zero_ex.ee_block =
+ cpu_to_le32(map->m_lblk + map->m_len);
+ zero_ex.ee_len = cpu_to_le16(allocated - map->m_len);
+ ext4_ext_store_pblock(&zero_ex,
+ ext4_ext_pblock(ex) + map->m_lblk - ee_block);
+ err = ext4_ext_zeroout(inode, &zero_ex);
if (err)
goto out;
+ split_map.m_lblk = map->m_lblk;
+ split_map.m_len = allocated;
+ } else if ((map->m_lblk - ee_block + map->m_len <
+ EXT4_EXT_ZERO_LEN) &&
+ (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
+ /* case 2 */
+ if (map->m_lblk != ee_block) {
+ zero_ex.ee_block = ex->ee_block;
+ zero_ex.ee_len = cpu_to_le16(map->m_lblk -
+ ee_block);
+ ext4_ext_store_pblock(&zero_ex,
+ ext4_ext_pblock(ex));
+ err = ext4_ext_zeroout(inode, &zero_ex);
+ if (err)
+ goto out;
+ }
+
+ allocated = map->m_lblk - ee_block + map->m_len;
+
+ split_map.m_lblk = ee_block;
+ split_map.m_len = allocated;
}
}
- /* Mark modified extent as dirty */
- err = ext4_ext_dirty(handle, inode, path + depth);
- goto out;
-insert:
- err = ext4_ext_insert_extent(handle, inode, path, &newex, 0);
- if (err == -ENOSPC && may_zeroout) {
- err = ext4_ext_zeroout(inode, &orig_ex);
- if (err)
- goto fix_extent_len;
- /* update the extent length and mark as initialized */
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = orig_ex.ee_len;
- ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
- ext4_ext_dirty(handle, inode, path + depth);
- /* zero out the first half */
- return allocated;
- } else if (err)
- goto fix_extent_len;
+
+ allocated = ext4_split_extent(handle, inode, path,
+ &split_map, split_flag, 0);
+ if (allocated < 0)
+ err = allocated;
+
out:
- ext4_ext_show_leaf(inode, path);
return err ? err : allocated;
-
-fix_extent_len:
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = orig_ex.ee_len;
- ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
- ext4_ext_mark_uninitialized(ex);
- ext4_ext_dirty(handle, inode, path + depth);
- return err;
}

/*
@@ -3096,15 +2902,11 @@ static int ext4_split_unwritten_extents(handle_t *handle,
struct ext4_ext_path *path,
int flags)
{
- struct ext4_extent *ex, newex, orig_ex;
- struct ext4_extent *ex1 = NULL;
- struct ext4_extent *ex2 = NULL;
- struct ext4_extent *ex3 = NULL;
- ext4_lblk_t ee_block, eof_block;
- unsigned int allocated, ee_len, depth;
- ext4_fsblk_t newblock;
- int err = 0;
- int may_zeroout;
+ ext4_lblk_t eof_block;
+ ext4_lblk_t ee_block;
+ struct ext4_extent *ex;
+ unsigned int ee_len;
+ int split_flag = 0, depth;

ext_debug("ext4_split_unwritten_extents: inode %lu, logical"
"block %llu, max_blocks %u\n", inode->i_ino,
@@ -3114,155 +2916,20 @@ static int ext4_split_unwritten_extents(handle_t *handle,
inode->i_sb->s_blocksize_bits;
if (eof_block < map->m_lblk + map->m_len)
eof_block = map->m_lblk + map->m_len;
-
- depth = ext_depth(inode);
- ex = path[depth].p_ext;
- ee_block = le32_to_cpu(ex->ee_block);
- ee_len = ext4_ext_get_actual_len(ex);
- allocated = ee_len - (map->m_lblk - ee_block);
- newblock = map->m_lblk - ee_block + ext4_ext_pblock(ex);
-
- ex2 = ex;
- orig_ex.ee_block = ex->ee_block;
- orig_ex.ee_len = cpu_to_le16(ee_len);
- ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
-
/*
* It is safe to convert extent to initialized via explicit
* zeroout only if extent is fully insde i_size or new_size.
*/
- may_zeroout = ee_block + ee_len <= eof_block;
-
- /*
- * If the uninitialized extent begins at the same logical
- * block where the write begins, and the write completely
- * covers the extent, then we don't need to split it.
- */
- if ((map->m_lblk == ee_block) && (allocated <= map->m_len))
- return allocated;
-
- err = ext4_ext_get_access(handle, inode, path + depth);
- if (err)
- goto out;
- /* ex1: ee_block to map->m_lblk - 1 : uninitialized */
- if (map->m_lblk > ee_block) {
- ex1 = ex;
- ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
- ext4_ext_mark_uninitialized(ex1);
- ex2 = &newex;
- }
- /*
- * for sanity, update the length of the ex2 extent before
- * we insert ex3, if ex1 is NULL. This is to avoid temporary
- * overlap of blocks.
- */
- if (!ex1 && allocated > map->m_len)
- ex2->ee_len = cpu_to_le16(map->m_len);
- /* ex3: to ee_block + ee_len : uninitialised */
- if (allocated > map->m_len) {
- unsigned int newdepth;
- ex3 = &newex;
- ex3->ee_block = cpu_to_le32(map->m_lblk + map->m_len);
- ext4_ext_store_pblock(ex3, newblock + map->m_len);
- ex3->ee_len = cpu_to_le16(allocated - map->m_len);
- ext4_ext_mark_uninitialized(ex3);
- err = ext4_ext_insert_extent(handle, inode, path, ex3, flags);
- if (err == -ENOSPC && may_zeroout) {
- err = ext4_ext_zeroout(inode, &orig_ex);
- if (err)
- goto fix_extent_len;
- /* update the extent length and mark as initialized */
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = orig_ex.ee_len;
- ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
- ext4_ext_dirty(handle, inode, path + depth);
- /* zeroed the full extent */
- /* blocks available from map->m_lblk */
- return allocated;
-
- } else if (err)
- goto fix_extent_len;
- /*
- * The depth, and hence eh & ex might change
- * as part of the insert above.
- */
- newdepth = ext_depth(inode);
- /*
- * update the extent length after successful insert of the
- * split extent
- */
- ee_len -= ext4_ext_get_actual_len(ex3);
- orig_ex.ee_len = cpu_to_le16(ee_len);
- may_zeroout = ee_block + ee_len <= eof_block;
-
- depth = newdepth;
- ext4_ext_drop_refs(path);
- path = ext4_ext_find_extent(inode, map->m_lblk, path);
- if (IS_ERR(path)) {
- err = PTR_ERR(path);
- goto out;
- }
- ex = path[depth].p_ext;
- if (ex2 != &newex)
- ex2 = ex;
-
- err = ext4_ext_get_access(handle, inode, path + depth);
- if (err)
- goto out;
+ depth = ext_depth(inode);
+ ex = path[depth].p_ext;
+ ee_block = le32_to_cpu(ex->ee_block);
+ ee_len = ext4_ext_get_actual_len(ex);

- allocated = map->m_len;
- }
- /*
- * If there was a change of depth as part of the
- * insertion of ex3 above, we need to update the length
- * of the ex1 extent again here
- */
- if (ex1 && ex1 != ex) {
- ex1 = ex;
- ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
- ext4_ext_mark_uninitialized(ex1);
- ex2 = &newex;
- }
- /*
- * ex2: map->m_lblk to map->m_lblk + map->m_len-1 : to be written
- * using direct I/O, uninitialised still.
- */
- ex2->ee_block = cpu_to_le32(map->m_lblk);
- ext4_ext_store_pblock(ex2, newblock);
- ex2->ee_len = cpu_to_le16(allocated);
- ext4_ext_mark_uninitialized(ex2);
- if (ex2 != ex)
- goto insert;
- /* Mark modified extent as dirty */
- err = ext4_ext_dirty(handle, inode, path + depth);
- ext_debug("out here\n");
- goto out;
-insert:
- err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
- if (err == -ENOSPC && may_zeroout) {
- err = ext4_ext_zeroout(inode, &orig_ex);
- if (err)
- goto fix_extent_len;
- /* update the extent length and mark as initialized */
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = orig_ex.ee_len;
- ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
- ext4_ext_dirty(handle, inode, path + depth);
- /* zero out the first half */
- return allocated;
- } else if (err)
- goto fix_extent_len;
-out:
- ext4_ext_show_leaf(inode, path);
- return err ? err : allocated;
+ split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
+ split_flag |= EXT4_EXT_MARK_UNINIT2;

-fix_extent_len:
- ex->ee_block = orig_ex.ee_block;
- ex->ee_len = orig_ex.ee_len;
- ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
- ext4_ext_mark_uninitialized(ex);
- ext4_ext_dirty(handle, inode, path + depth);
- return err;
+ flags |= EXT4_GET_BLOCKS_PRE_IO;
+ return ext4_split_extent(handle, inode, path, map, split_flag, flags);
}

static int ext4_convert_unwritten_extents_endio(handle_t *handle,
--
1.7.4.4


2011-04-14 14:14:29

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH REC 0/3] ext4:Factor common code from convert and split unwritten.

On Wed, Apr 13, 2011 at 10:40:48PM -0700, Yongqiang Yang wrote:
> These patches factor common code from ext4_ext_convert_to_initialized() and
> ext4_split_unwritten_extents(), so that extent-move-on-write in snapshot and
> punch-hole can be built on the common code as well.
>
> ext4:Add a function merging extent right and left.
> ext4:Add two functions splitting an extent.
> ext4:Reimplement convert and split_unwritten.

Hi Yongqiang,

I'd like to hear back from Mingming and Allison about whether these
patches conflict with their punch patches, since my understanding was
they were planning on doing very similar things (and at least would be
working in the very similar parts of the code).

I agree that the extents code here could definitely use this cleanup,
and I'm really glad that you worked on this; thank you! I just want
to make sure work you are doing is coordinated with the work that
Allison and Mingming are working on.

- Ted

2011-04-14 14:34:39

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH REC 0/3] ext4:Factor common code from convert and split unwritten.

On Thu, Apr 14, 2011 at 10:14 PM, Ted Ts'o <[email protected]> wrote:
> On Wed, Apr 13, 2011 at 10:40:48PM -0700, Yongqiang Yang wrote:
>> These patches factor common code from ext4_ext_convert_to_initialized() and
>> ext4_split_unwritten_extents(), so that extent-move-on-write in snapshot and
>> punch-hole can be built on the common code as well.
>>
>> ext4:Add a function merging extent right and left.
>> ext4:Add two functions splitting an extent.
>> ext4:Reimplement convert and split_unwritten.
>
> Hi Yongqiang,
>
> I'd like to hear back from Mingming and Allison about whether these
> patches conflict with their punch patches, since my understanding was
> they were planning on doing very similar things (and at least would be
> working in the very similar parts of the code).
>
> I agree that the extents code here could definitely use this cleanup,
> and I'm really glad that you worked on this; thank you! ?I just want
> to make sure work you are doing is coordinated with the work that
> Allison and Mingming are working on.

I knew it and Mingming told me that common code should be factor out
in previous emails. so I cced patches to Mingming and Allison.
extent-move-on-write in ext4-snapshots needs a function splitting an
initialized extent.

These patches indeed need advices, reviews and even tests from
Mingming and Allison.

Thank you,
Yongqiang.
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted
>



--
Best Wishes
Yongqiang Yang

2011-04-14 14:36:23

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH REC 0/3] ext4:Factor common code from convert and split unwritten.

On Thu, Apr 14, 2011 at 5:14 PM, Ted Ts'o <[email protected]> wrote:
> On Wed, Apr 13, 2011 at 10:40:48PM -0700, Yongqiang Yang wrote:
>> These patches factor common code from ext4_ext_convert_to_initialized() and
>> ext4_split_unwritten_extents(), so that extent-move-on-write in snapshot and
>> punch-hole can be built on the common code as well.
>>
>> ext4:Add a function merging extent right and left.
>> ext4:Add two functions splitting an extent.
>> ext4:Reimplement convert and split_unwritten.
>
> Hi Yongqiang,
>
> I'd like to hear back from Mingming and Allison about whether these
> patches conflict with their punch patches, since my understanding was
> they were planning on doing very similar things (and at least would be
> working in the very similar parts of the code).
>
> I agree that the extents code here could definitely use this cleanup,
> and I'm really glad that you worked on this; thank you! ?I just want
> to make sure work you are doing is coordinated with the work that
> Allison and Mingming are working on.
>

I did tell Mingming the patches are coming her way on LSF,
so she should be expecting them.
I hope Allison and Mingming like the results too :-)

Amir.

2011-04-14 18:46:05

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH REC 0/3] ext4:Factor common code from convert and split unwritten.

On Thu, 2011-04-14 at 22:34 +0800, Yongqiang Yang wrote:
> On Thu, Apr 14, 2011 at 10:14 PM, Ted Ts'o <[email protected]> wrote:
> > On Wed, Apr 13, 2011 at 10:40:48PM -0700, Yongqiang Yang wrote:
> >> These patches factor common code from ext4_ext_convert_to_initialized() and
> >> ext4_split_unwritten_extents(), so that extent-move-on-write in snapshot and
> >> punch-hole can be built on the common code as well.
> >>
> >> ext4:Add a function merging extent right and left.
> >> ext4:Add two functions splitting an extent.
> >> ext4:Reimplement convert and split_unwritten.
> >
> > Hi Yongqiang,
> >
> > I'd like to hear back from Mingming and Allison about whether these
> > patches conflict with their punch patches, since my understanding was
> > they were planning on doing very similar things (and at least would be
> > working in the very similar parts of the code).
> >
> > I agree that the extents code here could definitely use this cleanup,
> > and I'm really glad that you worked on this; thank you! I just want
> > to make sure work you are doing is coordinated with the work that
> > Allison and Mingming are working on.
>
> I knew it and Mingming told me that common code should be factor out
> in previous emails. so I cced patches to Mingming and Allison.
> extent-move-on-write in ext4-snapshots needs a function splitting an
> initialized extent.
>
> These patches indeed need advices, reviews and even tests from
> Mingming and Allison.

Hi Yongqiang,

Thanks for working on this. I took a try before linux filesystem and
storage summit and didn't get completed. I was slightly concerned about
the amount of complexity to adding flags in order to merging the common
code. It's good to hear that you get this far and I will review it.

Allison is working on re-use the direct IO version split code in her
punch hole work, but leave the buffered IO version code unchanged.

Mingming


2011-04-14 19:03:15

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH RFC 1/3] ext4:Add a function merging extent right and left.

On Wed, 2011-04-13 at 22:40 -0700, Yongqiang Yang wrote:
> 1] Rename ext4_ext_try_to_merge() to ext4_ext_try_to_merge_right().
>
> 2] Add a new function ext4_ext_try_to_merge() which tries to merge
> an extent both left and right.
>
> 3] Use the new function in ext4_ext_convert_unwritten_endio() and
> ext4_ext_insert_extent().
>
> Signed-off-by: Yongqiang Yang <[email protected]>
> ---
> fs/ext4/extents.c | 65 ++++++++++++++++++++++++++++------------------------
> 1 files changed, 35 insertions(+), 30 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index dd2cb50..11f30d2 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -1563,7 +1563,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
> * Returns 0 if the extents (ex and ex+1) were _not_ merged and returns
> * 1 if they got merged.
> */
> -static int ext4_ext_try_to_merge(struct inode *inode,
> +static int ext4_ext_try_to_merge_right(struct inode *inode,
> struct ext4_ext_path *path,
> struct ext4_extent *ex)
> {
> @@ -1603,6 +1603,31 @@ static int ext4_ext_try_to_merge(struct inode *inode,
> }
>
> /*
> + * This function tries to merge the @ex extent to neighbours in the tree.
> + * return 1 if merge left else 0.
> + */
> +static int ext4_ext_try_to_merge(struct inode *inode,
> + struct ext4_ext_path *path,
> + struct ext4_extent *ex) {
> + struct ext4_extent_header *eh;
> + unsigned int depth;
> + int merge_done = 0;
> + int ret = 0;
> +
> + depth = ext_depth(inode);
> + BUG_ON(path[depth].p_hdr == NULL);
> + eh = path[depth].p_hdr;
> +
> + if (ex > EXT_FIRST_EXTENT(eh))
> + merge_done = ext4_ext_try_to_merge_right(inode, path, ex - 1);
> +
> + if (!merge_done)
> + ret = ext4_ext_try_to_merge_right(inode, path, ex);
> +

Is there any reason not to merge right after merge left? The old code
does both, I think.

> + return ret;
> +}
> +
> +/*
> * check if a portion of the "newext" extent overlaps with an
> * existing extent.
> *
> @@ -3039,6 +3064,7 @@ fix_extent_len:
> ext4_ext_dirty(handle, inode, path + depth);
> return err;
> }
> +
> static int ext4_convert_unwritten_extents_endio(handle_t *handle,
> struct inode *inode,
> struct ext4_ext_path *path)
> @@ -3047,46 +3073,25 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
> struct ext4_extent_header *eh;
> int depth;
> int err = 0;
> - int ret = 0;
>
> depth = ext_depth(inode);
> eh = path[depth].p_hdr;
> ex = path[depth].p_ext;
>
> + ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
> + "block %llu, max_blocks %u\n", inode->i_ino,
> + (unsigned long long)le32_to_cpu(ex->ee_block),
> + ext4_ext_get_actual_len(ex));
> +
> err = ext4_ext_get_access(handle, inode, path + depth);
> if (err)
> goto out;
> /* first mark the extent as initialized */
> ext4_ext_mark_initialized(ex);
>
> - /*
> - * We have to see if it can be merged with the extent
> - * on the left.
> - */
> - if (ex > EXT_FIRST_EXTENT(eh)) {
> - /*
> - * To merge left, pass "ex - 1" to try_to_merge(),
> - * since it merges towards right _only_.
> - */
> - ret = ext4_ext_try_to_merge(inode, path, ex - 1);
> - if (ret) {
> - err = ext4_ext_correct_indexes(handle, inode, path);
> - if (err)
> - goto out;
> - depth = ext_depth(inode);
> - ex--;
> - }
> - }
> - /*
> - * Try to Merge towards right.
> - */
> - ret = ext4_ext_try_to_merge(inode, path, ex);
> - if (ret) {
> - err = ext4_ext_correct_indexes(handle, inode, path);
> - if (err)
> - goto out;
> - depth = ext_depth(inode);
> - }
> + /* correct indexes is nt needed becasue borders are not changed */
> + ext4_ext_try_to_merge(inode, path, ex);
> +

Ah, so you discovered an issue -- currently ext4 can't merge across the
index block borders. that's a pity. This might need to fix up,
especially with split is going to be heavily used in punch hole,
snapshots, direct IO handling holes.


> /* Mark modified extent as dirty */
> err = ext4_ext_dirty(handle, inode, path + depth);
> out:



2011-04-15 01:14:16

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH RFC 1/3] ext4:Add a function merging extent right and left.

On Fri, Apr 15, 2011 at 3:03 AM, Mingming Cao <[email protected]> wrote:
> On Wed, 2011-04-13 at 22:40 -0700, Yongqiang Yang wrote:
>> 1] Rename ext4_ext_try_to_merge() to ext4_ext_try_to_merge_right().
>>
>> 2] Add a new function ext4_ext_try_to_merge() which tries to merge
>> ? ?an extent both left and right.
>>
>> 3] Use the new function in ext4_ext_convert_unwritten_endio() and
>> ? ?ext4_ext_insert_extent().
>>
>> Signed-off-by: Yongqiang Yang <[email protected]>
>> ---
>> ?fs/ext4/extents.c | ? 65 ++++++++++++++++++++++++++++------------------------
>> ?1 files changed, 35 insertions(+), 30 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index dd2cb50..11f30d2 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -1563,7 +1563,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
>> ? * Returns 0 if the extents (ex and ex+1) were _not_ merged and returns
>> ? * 1 if they got merged.
>> ? */
>> -static int ext4_ext_try_to_merge(struct inode *inode,
>> +static int ext4_ext_try_to_merge_right(struct inode *inode,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct ext4_ext_path *path,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct ext4_extent *ex)
>> ?{
>> @@ -1603,6 +1603,31 @@ static int ext4_ext_try_to_merge(struct inode *inode,
>> ?}
>>
>> ?/*
>> + * This function tries to merge the @ex extent to neighbours in the tree.
>> + * return 1 if merge left else 0.
>> + */
>> +static int ext4_ext_try_to_merge(struct inode *inode,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct ext4_ext_path *path,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct ext4_extent *ex) {
>> + ? ? struct ext4_extent_header *eh;
>> + ? ? unsigned int depth;
>> + ? ? int merge_done = 0;
>> + ? ? int ret = 0;
>> +
>> + ? ? depth = ext_depth(inode);
>> + ? ? BUG_ON(path[depth].p_hdr == NULL);
>> + ? ? eh = path[depth].p_hdr;
>> +
>> + ? ? if (ex > EXT_FIRST_EXTENT(eh))
>> + ? ? ? ? ? ? merge_done = ext4_ext_try_to_merge_right(inode, path, ex - 1);
>> +
>> + ? ? if (!merge_done)
>> + ? ? ? ? ? ? ret = ext4_ext_try_to_merge_right(inode, path, ex);
>> +
>
> Is there any reason not to merge right after merge left? The old code
> does both, I think.
What does merge left do? Actually it does as merge right, it merge
ex-1 to right until no more merges can be done. So if merge to right
happens then, ex,ex+1 has been tried to merge also.

>
>> + ? ? return ret;
>> +}
>> +
>> +/*
>> ? * check if a portion of the "newext" extent overlaps with an
>> ? * existing extent.
>> ? *
>> @@ -3039,6 +3064,7 @@ fix_extent_len:
>> ? ? ? ext4_ext_dirty(handle, inode, path + depth);
>> ? ? ? return err;
>> ?}
>> +
>> ?static int ext4_convert_unwritten_extents_endio(handle_t *handle,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct inode *inode,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct ext4_ext_path *path)
>> @@ -3047,46 +3073,25 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
>> ? ? ? struct ext4_extent_header *eh;
>> ? ? ? int depth;
>> ? ? ? int err = 0;
>> - ? ? int ret = 0;
>>
>> ? ? ? depth = ext_depth(inode);
>> ? ? ? eh = path[depth].p_hdr;
>> ? ? ? ex = path[depth].p_ext;
>>
>> + ? ? ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
>> + ? ? ? ? ? ? "block %llu, max_blocks %u\n", inode->i_ino,
>> + ? ? ? ? ? ? (unsigned long long)le32_to_cpu(ex->ee_block),
>> + ? ? ? ? ? ? ext4_ext_get_actual_len(ex));
>> +
>> ? ? ? err = ext4_ext_get_access(handle, inode, path + depth);
>> ? ? ? if (err)
>> ? ? ? ? ? ? ? goto out;
>> ? ? ? /* first mark the extent as initialized */
>> ? ? ? ext4_ext_mark_initialized(ex);
>>
>> - ? ? /*
>> - ? ? ?* We have to see if it can be merged with the extent
>> - ? ? ?* on the left.
>> - ? ? ?*/
>> - ? ? if (ex > EXT_FIRST_EXTENT(eh)) {
>> - ? ? ? ? ? ? /*
>> - ? ? ? ? ? ? ?* To merge left, pass "ex - 1" to try_to_merge(),
>> - ? ? ? ? ? ? ?* since it merges towards right _only_.
>> - ? ? ? ? ? ? ?*/
>> - ? ? ? ? ? ? ret = ext4_ext_try_to_merge(inode, path, ex - 1);
>> - ? ? ? ? ? ? if (ret) {
>> - ? ? ? ? ? ? ? ? ? ? err = ext4_ext_correct_indexes(handle, inode, path);
>> - ? ? ? ? ? ? ? ? ? ? if (err)
>> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto out;
>> - ? ? ? ? ? ? ? ? ? ? depth = ext_depth(inode);
>> - ? ? ? ? ? ? ? ? ? ? ex--;
>> - ? ? ? ? ? ? }
>> - ? ? }
>> - ? ? /*
>> - ? ? ?* Try to Merge towards right.
>> - ? ? ?*/
>> - ? ? ret = ext4_ext_try_to_merge(inode, path, ex);
>> - ? ? if (ret) {
>> - ? ? ? ? ? ? err = ext4_ext_correct_indexes(handle, inode, path);
>> - ? ? ? ? ? ? if (err)
>> - ? ? ? ? ? ? ? ? ? ? goto out;
>> - ? ? ? ? ? ? depth = ext_depth(inode);
>> - ? ? }
>> + ? ? /* correct indexes is nt needed becasue borders are not changed */
>> + ? ? ext4_ext_try_to_merge(inode, path, ex);
>> +
>
> Ah, so you discovered an issue -- currently ext4 can't merge across the
> index block borders. that's a pity. This might need to fix up,
Yes, currently borders are changed only in one case that storing an
extent to an empty leaf.

I think the patch which enable merging across index block should be
independent on these patches.


> especially with split is going to be heavily used in punch hole,
> snapshots, direct IO handling holes.
What does direct IO here refer to? It seems I am missing something.

These patches involves ext4_ext_convrt_to_initialized() in buffer
write case, and split_unwritten_extents() and
convert_to_initialized_endio() in direct IO case.

>
>
>> ? ? ? /* Mark modified extent as dirty */
>> ? ? ? err = ext4_ext_dirty(handle, inode, path + depth);
>> ?out:
>
>
>



--
Best Wishes
Yongqiang Yang

2011-04-15 16:39:42

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH RFC 1/3] ext4:Add a function merging extent right and left.

On Fri, 2011-04-15 at 09:12 +0800, Yongqiang Yang wrote:
> On Fri, Apr 15, 2011 at 3:03 AM, Mingming Cao <[email protected]> wrote:
> > On Wed, 2011-04-13 at 22:40 -0700, Yongqiang Yang wrote:
> >> 1] Rename ext4_ext_try_to_merge() to ext4_ext_try_to_merge_right().
> >>
> >> 2] Add a new function ext4_ext_try_to_merge() which tries to merge
> >> an extent both left and right.
> >>
> >> 3] Use the new function in ext4_ext_convert_unwritten_endio() and
> >> ext4_ext_insert_extent().
> >>
> >> Signed-off-by: Yongqiang Yang <[email protected]>
> >> ---
> >> fs/ext4/extents.c | 65 ++++++++++++++++++++++++++++------------------------
> >> 1 files changed, 35 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index dd2cb50..11f30d2 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -1563,7 +1563,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
> >> * Returns 0 if the extents (ex and ex+1) were _not_ merged and returns
> >> * 1 if they got merged.
> >> */
> >> -static int ext4_ext_try_to_merge(struct inode *inode,
> >> +static int ext4_ext_try_to_merge_right(struct inode *inode,
> >> struct ext4_ext_path *path,
> >> struct ext4_extent *ex)
> >> {
> >> @@ -1603,6 +1603,31 @@ static int ext4_ext_try_to_merge(struct inode *inode,
> >> }
> >>
> >> /*
> >> + * This function tries to merge the @ex extent to neighbours in the tree.
> >> + * return 1 if merge left else 0.
> >> + */
> >> +static int ext4_ext_try_to_merge(struct inode *inode,
> >> + struct ext4_ext_path *path,
> >> + struct ext4_extent *ex) {
> >> + struct ext4_extent_header *eh;
> >> + unsigned int depth;
> >> + int merge_done = 0;
> >> + int ret = 0;
> >> +
> >> + depth = ext_depth(inode);
> >> + BUG_ON(path[depth].p_hdr == NULL);
> >> + eh = path[depth].p_hdr;
> >> +
> >> + if (ex > EXT_FIRST_EXTENT(eh))
> >> + merge_done = ext4_ext_try_to_merge_right(inode, path, ex - 1);
> >> +
> >> + if (!merge_done)
> >> + ret = ext4_ext_try_to_merge_right(inode, path, ex);
> >> +
> >
> > Is there any reason not to merge right after merge left? The old code
> > does both, I think.
> What does merge left do? Actually it does as merge right, it merge
> ex-1 to right until no more merges can be done. So if merge to right
> happens then, ex,ex+1 has been tried to merge also.
>

Yep, you are right, the merge is always merge toward right.

> >
> >> + return ret;
> >> +}
> >> +
> >> +/*
> >> * check if a portion of the "newext" extent overlaps with an
> >> * existing extent.
> >> *
> >> @@ -3039,6 +3064,7 @@ fix_extent_len:
> >> ext4_ext_dirty(handle, inode, path + depth);
> >> return err;
> >> }
> >> +
> >> static int ext4_convert_unwritten_extents_endio(handle_t *handle,
> >> struct inode *inode,
> >> struct ext4_ext_path *path)
> >> @@ -3047,46 +3073,25 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
> >> struct ext4_extent_header *eh;
> >> int depth;
> >> int err = 0;
> >> - int ret = 0;
> >>
> >> depth = ext_depth(inode);
> >> eh = path[depth].p_hdr;
> >> ex = path[depth].p_ext;
> >>
> >> + ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
> >> + "block %llu, max_blocks %u\n", inode->i_ino,
> >> + (unsigned long long)le32_to_cpu(ex->ee_block),
> >> + ext4_ext_get_actual_len(ex));
> >> +
> >> err = ext4_ext_get_access(handle, inode, path + depth);
> >> if (err)
> >> goto out;
> >> /* first mark the extent as initialized */
> >> ext4_ext_mark_initialized(ex);
> >>
> >> - /*
> >> - * We have to see if it can be merged with the extent
> >> - * on the left.
> >> - */
> >> - if (ex > EXT_FIRST_EXTENT(eh)) {
> >> - /*
> >> - * To merge left, pass "ex - 1" to try_to_merge(),
> >> - * since it merges towards right _only_.
> >> - */
> >> - ret = ext4_ext_try_to_merge(inode, path, ex - 1);
> >> - if (ret) {
> >> - err = ext4_ext_correct_indexes(handle, inode, path);
> >> - if (err)
> >> - goto out;
> >> - depth = ext_depth(inode);
> >> - ex--;
> >> - }
> >> - }
> >> - /*
> >> - * Try to Merge towards right.
> >> - */
> >> - ret = ext4_ext_try_to_merge(inode, path, ex);
> >> - if (ret) {
> >> - err = ext4_ext_correct_indexes(handle, inode, path);
> >> - if (err)
> >> - goto out;
> >> - depth = ext_depth(inode);
> >> - }
> >> + /* correct indexes is nt needed becasue borders are not changed */
> >> + ext4_ext_try_to_merge(inode, path, ex);
> >> +
> >
> > Ah, so you discovered an issue -- currently ext4 can't merge across the
> > index block borders. that's a pity. This might need to fix up,
> Yes, currently borders are changed only in one case that storing an
> extent to an empty leaf.
>
> I think the patch which enable merging across index block should be
> independent on these patches.
>

Sure.

>
> > especially with split is going to be heavily used in punch hole,
> > snapshots, direct IO handling holes.
> What does direct IO here refer to? It seems I am missing something.
>
> These patches involves ext4_ext_convrt_to_initialized() in buffer
> write case, and split_unwritten_extents() and
> convert_to_initialized_endio() in direct IO case.
>

The plit_unwritten_extents() and convert_to_initialized_endio() were
added to support direct IO on holes/fallocated space. That's what I am
referring to above.

> >
> >
> >> /* Mark modified extent as dirty */
> >> err = ext4_ext_dirty(handle, inode, path + depth);
> >> out:
> >
> >
> >
>
>
>



2011-04-20 17:22:25

by Allison Henderson

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] ext4:Add two functions splitting an extent.

On 4/13/2011 10:40 PM, Yongqiang Yang wrote:
> 1] Add a function named ext4_split_extent_at() which splits an extent
> into two extents at given logical block.
>
> 2] Add a function called ext4_split_extent() which splits an extent
> into three extents.
>
> Signed-off-by: Yongqiang Yang<[email protected]>
> ---
> fs/ext4/extents.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 200 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 11f30d2..1756e0f 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2554,6 +2554,206 @@ static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
> return ret;
> }
>
> +/*
> + * used by extent splitting.
> + */
> +#define EXT4_EXT_MAY_ZEROOUT 0x1 /* safe to zeroout if split fails \
> + due to ENOSPC */
> +#define EXT4_EXT_MARK_UNINIT1 0x2 /* mark first half uninitialized */
> +#define EXT4_EXT_MARK_UNINIT2 0x4 /* mark second half uninitialized */
> +
> +/*
> + * ext4_split_extent_at() splits an extent at given block.
> + *
> + * @handle: the journal handle
> + * @inode: the file inode
> + * @path: the path to the extent
> + * @split: the logical block where the extent is splitted.
> + * @split_flags: indicates if the extent could be zeroout if split fails, and
> + * the states(init or uninit) of new extents.
> + * @flags: flags used to insert new extent to extent tree.
> + *
> + *
> + * Splits extent [a, b] into two extents [a, @split) and [@split, b], states
> + * of which are deterimined by split_flag.
> + *
> + * There are two cases:
> + * a> the extent are splitted into two extent.
> + * b> split is not needed, and just mark the extent.
> + *
> + * return 0 on success.
> + */
> +static int ext4_split_extent_at(handle_t *handle,
> + struct inode *inode,
> + struct ext4_ext_path *path,
> + ext4_lblk_t split,
> + int split_flag,
> + int flags)
> +{
> + ext4_fsblk_t newblock;
> + ext4_lblk_t ee_block;
> + struct ext4_extent_header *eh;
> + struct ext4_extent *ex, newex, orig_ex;
> + struct ext4_extent *ex2 = NULL;
> + unsigned int ee_len, depth;
> + int err = 0;
> +
> + ext_debug("ext4_split_extents_at: inode %lu, logical"
> + "block %llu\n", inode->i_ino, (unsigned long long)split);
> +
> + ext4_ext_show_leaf(inode, path);
> +
> + depth = ext_depth(inode);
> + eh = path[depth].p_hdr;
> + ex = path[depth].p_ext;
> + ee_block = le32_to_cpu(ex->ee_block);
> + ee_len = ext4_ext_get_actual_len(ex);
> + newblock = split - ee_block + ext4_ext_pblock(ex);
> +
> + BUG_ON(split< ee_block || split>= (ee_block + ee_len));
> +
> + err = ext4_ext_get_access(handle, inode, path + depth);
> + if (err)
> + goto out;
> +
> + if (split == ee_block) {
> + /*
> + * case b: block @split is the block that the extent begins with
> + * then we just change the state of the extent, and splitting
> + * is not needed.
> + */
> + if (split_flag& EXT4_EXT_MARK_UNINIT2)
> + ext4_ext_mark_uninitialized(ex);
> + else
> + ext4_ext_mark_initialized(ex);
> +
> + if (!(flags& EXT4_GET_BLOCKS_PRE_IO))
> + ext4_ext_try_to_merge(inode, path, ex);
> +
> + err = ext4_ext_dirty(handle, inode, path + depth);
> + goto out;
> + }
> +
> + /* case a */
> + orig_ex.ee_block = ex->ee_block;
> + orig_ex.ee_len = ex->ee_len;
> + ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
> +
> + ex->ee_len = cpu_to_le16(split - ee_block);
> + if (split_flag& EXT4_EXT_MARK_UNINIT1)
> + ext4_ext_mark_uninitialized(ex);
> +
> + /*
> + * path may lead to new leaf, not to original leaf any more
> + * after ext4_ext_insert_extent() returns,
> + */
> + err = ext4_ext_dirty(handle, inode, path + depth);
> + if (err)
> + goto fix_extent_len;
> +
> + ex2 =&newex;
> + ex2->ee_block = cpu_to_le32(split);
> + ex2->ee_len = cpu_to_le16(ee_len - (split - ee_block));
> + ext4_ext_store_pblock(ex2, newblock);
> + if (split_flag& EXT4_EXT_MARK_UNINIT2)
> + ext4_ext_mark_uninitialized(ex2);
> +
> + err = ext4_ext_insert_extent(handle, inode, path,&newex, flags);
> + if (err == -ENOSPC&& (EXT4_EXT_MAY_ZEROOUT& split_flag)) {
> +
> + err = ext4_ext_zeroout(inode,&orig_ex);
> + if (err)
> + goto fix_extent_len;
> + /* update the extent length and mark as initialized */
> + ex->ee_block = orig_ex.ee_block;
> + ex->ee_len = orig_ex.ee_len;
> + ext4_ext_mark_initialized(ex);
> + ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> + err = ext4_ext_dirty(handle, inode, path + depth);
> + goto out;
> + } else if (err)
> + goto fix_extent_len;
> +
> +out:
> + ext4_ext_show_leaf(inode, path);
> + return err;
> +
> +fix_extent_len:
> + ex->ee_block = orig_ex.ee_block;
> + ex->ee_len = orig_ex.ee_len;
> + ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> + ext4_ext_dirty(handle, inode, path + depth);
> + return err;
> +}
> +
> +/*
> + * ext4_split_extents() splits an extent and mark extent which is covered
> + * by @map as split_flags indicates
> + *
> + * It may result in splitting the extent into multiple extents (upto three)
> + * There are three possibilities:
> + * a> There is no split required
> + * b> Splits in two extents: Split is happening at either end of the extent
> + * c> Splits in three extents: Somone is splitting in middle of the extent
> + *
> + */
> +static int ext4_split_extent(handle_t *handle,
> + struct inode *inode,
> + struct ext4_ext_path *path,
> + struct ext4_map_blocks *map,
> + int split_flag,
> + int flags)
> +{
> + ext4_lblk_t ee_block;
> + struct ext4_extent *ex;
> + unsigned int ee_len, depth;
> + int err = 0;
> + int uninitialized;
> + int split_flag1, flags1;
> +
> + depth = ext_depth(inode);
> + ex = path[depth].p_ext;
> + ee_block = le32_to_cpu(ex->ee_block);
> + ee_len = ext4_ext_get_actual_len(ex);
> + uninitialized = ext4_ext_is_uninitialized(ex);
> +
> + if (map->m_lblk + map->m_len< ee_block + ee_len) {
> + split_flag1 = 0;
> + split_flag1 |= split_flag& EXT4_EXT_MAY_ZEROOUT ?
> + EXT4_EXT_MAY_ZEROOUT : 0;
> + flags1 = flags;
> + flags1 |= EXT4_GET_BLOCKS_PRE_IO;
> + if (uninitialized)
> + split_flag1 |= EXT4_EXT_MARK_UNINIT1 |
> + EXT4_EXT_MARK_UNINIT2;
> + err = ext4_split_extent_at(handle, inode, path,
> + map->m_lblk + map->m_len, split_flag1, flags1);
> + }
> +
> + ext4_ext_drop_refs(path);
> + path = ext4_ext_find_extent(inode, map->m_lblk, path);
> + if (IS_ERR(path))
> + return PTR_ERR(path);
> +
> + if (map->m_lblk>= ee_block) {
> + split_flag1 = 0;
> + split_flag1 |= split_flag& EXT4_EXT_MAY_ZEROOUT ?
> + EXT4_EXT_MAY_ZEROOUT : 0;
> + if (uninitialized)
> + split_flag1 |= EXT4_EXT_MARK_UNINIT1;
> + if (split_flag& EXT4_EXT_MARK_UNINIT2)
> + split_flag1 |= EXT4_EXT_MARK_UNINIT2;
> + err = ext4_split_extent_at(handle, inode, path,
> + map->m_lblk, split_flag1, flags);
> + if (err)
> + goto out;
> + }
> +
> + ext4_ext_show_leaf(inode, path);
> +out:
> + return err ? err : map->m_len;
> +}
> +
> #define EXT4_EXT_ZERO_LEN 7
> /*
> * This function is called by ext4_ext_map_blocks() if someone tries to write


Hi there,

I've been working on trying to get the punch hole patch to work with with these new changes, but it looks like some test cases are not passing at the moment, so I'm trying track down where the issues are. I had to make some adjustments to this patch to fix one of the test cases. Here is what I did:

---
:100644 100644 ee2dda3... c7d763d... M fs/ext4/extents.c
fs/ext4/extents.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ee2dda3..c7d763d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2717,12 +2717,12 @@ static int ext4_split_extent(handle_t *handle,
ee_len = ext4_ext_get_actual_len(ex);
uninitialized = ext4_ext_is_uninitialized(ex);

+ flags1 = flags;
+ flags1 |= EXT4_GET_BLOCKS_PRE_IO;
if (map->m_lblk + map->m_len < ee_block + ee_len) {
split_flag1 = 0;
split_flag1 |= split_flag & EXT4_EXT_MAY_ZEROOUT ?
EXT4_EXT_MAY_ZEROOUT : 0;
- flags1 = flags;
- flags1 |= EXT4_GET_BLOCKS_PRE_IO;
if (uninitialized)
split_flag1 |= EXT4_EXT_MARK_UNINIT1 |
EXT4_EXT_MARK_UNINIT2;
@@ -2744,7 +2744,7 @@ static int ext4_split_extent(handle_t *handle,
if (split_flag & EXT4_EXT_MARK_UNINIT2)
split_flag1 |= EXT4_EXT_MARK_UNINIT2;
err = ext4_split_extent_at(handle, inode, path,
- map->m_lblk, split_flag1, flags);
+ map->m_lblk, split_flag1, flags1);
if (err)
goto out;
}
--
1.7.1


Let me know if this change works for you. It looks like the problem was that the extents were getting merged back together because the second ext4_split_extent_at didnt have the EXT4_GET_BLOCKS_PRE_IO flag, so this should fix it. There are still some other test cases that are showing some odd behavior, so I will keep you posted if I have to make any other changes.

Allison Henderson

2011-04-20 18:13:07

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] ext4:Add two functions splitting an extent.

On 2011-04-20, at 11:21 AM, Allison Henderson wrote:
> I've been working on trying to get the punch hole patch to work with with these new changes, but it looks like some test cases are not passing at the moment, so I'm trying track down where the issues are. I had to make some adjustments to this patch to fix one of the test cases. Here is what I did:
>
> ---
> :100644 100644 ee2dda3... c7d763d... M fs/ext4/extents.c
> fs/ext4/extents.c | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ee2dda3..c7d763d 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2717,12 +2717,12 @@ static int ext4_split_extent(handle_t *handle,
> ee_len = ext4_ext_get_actual_len(ex);
> uninitialized = ext4_ext_is_uninitialized(ex);
>
> + flags1 = flags;
> + flags1 |= EXT4_GET_BLOCKS_PRE_IO;

Can you please use normal C style: "flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;"


Cheers, Andreas






2011-04-20 20:52:45

by Allison Henderson

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] ext4:Add two functions splitting an extent.

On 4/20/2011 11:13 AM, Andreas Dilger wrote:
> On 2011-04-20, at 11:21 AM, Allison Henderson wrote:
>> I've been working on trying to get the punch hole patch to work with with these new changes, but it looks like some test cases are not passing at the moment, so I'm trying track down where the issues are. I had to make some adjustments to this patch to fix one of the test cases. Here is what I did:
>>
>> ---
>> :100644 100644 ee2dda3... c7d763d... M fs/ext4/extents.c
>> fs/ext4/extents.c | 6 +++---
>> 1 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index ee2dda3..c7d763d 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -2717,12 +2717,12 @@ static int ext4_split_extent(handle_t *handle,
>> ee_len = ext4_ext_get_actual_len(ex);
>> uninitialized = ext4_ext_is_uninitialized(ex);
>>
>> + flags1 = flags;
>> + flags1 |= EXT4_GET_BLOCKS_PRE_IO;
>
> Can you please use normal C style: "flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;"
>
>
> Cheers, Andreas
>
>
>
>
>
Hi again,

I realize after I sent the note that maybe the caller is still expected to pass EXT4_GET_BLOCKS_PRE_IO in the flags parameter if it is supposed to be used in both ext4_split_extent_at cases. That also fixes that particular failing test case. So maybe we wont need the extra fix if that is how the code is intended to work. In any case though, yes "flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;" does seem a bit cleaner.

Allison Henderson

2011-04-21 01:18:57

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH RFC 2/3] ext4:Add two functions splitting an extent.

On Thu, Apr 21, 2011 at 1:21 AM, Allison Henderson
<[email protected]> wrote:
> On 4/13/2011 10:40 PM, Yongqiang Yang wrote:
>> 1] Add a function named ext4_split_extent_at() which splits an extent
>> ? ? into two extents at given logical block.
>>
>> 2] Add a function called ext4_split_extent() which splits an extent
>> ? ? into three extents.
>>
>> Signed-off-by: Yongqiang Yang<[email protected]>
>> ---
>> ? fs/ext4/extents.c | ?200 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> ? 1 files changed, 200 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 11f30d2..1756e0f 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -2554,6 +2554,206 @@ static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
>> ? ? ? return ret;
>> ? }
>>
>> +/*
>> + * used by extent splitting.
>> + */
>> +#define EXT4_EXT_MAY_ZEROOUT 0x1 ?/* safe to zeroout if split fails \
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? due to ENOSPC */
>> +#define EXT4_EXT_MARK_UNINIT1 ? ? ? ?0x2 ?/* mark first half uninitialized */
>> +#define EXT4_EXT_MARK_UNINIT2 ? ? ? ?0x4 ?/* mark second half uninitialized */
>> +
>> +/*
>> + * ext4_split_extent_at() splits an extent at given block.
>> + *
>> + * @handle: the journal handle
>> + * @inode: the file inode
>> + * @path: the path to the extent
>> + * @split: the logical block where the extent is splitted.
>> + * @split_flags: indicates if the extent could be zeroout if split fails, and
>> + * ? ? ? ? ? ?the states(init or uninit) of new extents.
>> + * @flags: flags used to insert new extent to extent tree.
>> + *
>> + *
>> + * Splits extent [a, b] into two extents [a, @split) and [@split, b], states
>> + * of which are deterimined by split_flag.
>> + *
>> + * There are two cases:
>> + * ?a> ?the extent are splitted into two extent.
>> + * ?b> ?split is not needed, and just mark the extent.
>> + *
>> + * return 0 on success.
>> + */
>> +static int ext4_split_extent_at(handle_t *handle,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?struct inode *inode,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?struct ext4_ext_path *path,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?ext4_lblk_t split,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?int split_flag,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ?int flags)
>> +{
>> + ? ? ext4_fsblk_t newblock;
>> + ? ? ext4_lblk_t ee_block;
>> + ? ? struct ext4_extent_header *eh;
>> + ? ? struct ext4_extent *ex, newex, orig_ex;
>> + ? ? struct ext4_extent *ex2 = NULL;
>> + ? ? unsigned int ee_len, depth;
>> + ? ? int err = 0;
>> +
>> + ? ? ext_debug("ext4_split_extents_at: inode %lu, logical"
>> + ? ? ? ? ? ? "block %llu\n", inode->i_ino, (unsigned long long)split);
>> +
>> + ? ? ext4_ext_show_leaf(inode, path);
>> +
>> + ? ? depth = ext_depth(inode);
>> + ? ? eh = path[depth].p_hdr;
>> + ? ? ex = path[depth].p_ext;
>> + ? ? ee_block = le32_to_cpu(ex->ee_block);
>> + ? ? ee_len = ext4_ext_get_actual_len(ex);
>> + ? ? newblock = split - ee_block + ext4_ext_pblock(ex);
>> +
>> + ? ? BUG_ON(split< ?ee_block || split>= (ee_block + ee_len));
>> +
>> + ? ? err = ext4_ext_get_access(handle, inode, path + depth);
>> + ? ? if (err)
>> + ? ? ? ? ? ? goto out;
>> +
>> + ? ? if (split == ee_block) {
>> + ? ? ? ? ? ? /*
>> + ? ? ? ? ? ? ?* case b: block @split is the block that the extent begins with
>> + ? ? ? ? ? ? ?* then we just change the state of the extent, and splitting
>> + ? ? ? ? ? ? ?* is not needed.
>> + ? ? ? ? ? ? ?*/
>> + ? ? ? ? ? ? if (split_flag& ?EXT4_EXT_MARK_UNINIT2)
>> + ? ? ? ? ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex);
>> + ? ? ? ? ? ? else
>> + ? ? ? ? ? ? ? ? ? ? ext4_ext_mark_initialized(ex);
>> +
>> + ? ? ? ? ? ? if (!(flags& ?EXT4_GET_BLOCKS_PRE_IO))
>> + ? ? ? ? ? ? ? ? ? ? ext4_ext_try_to_merge(inode, path, ex);
>> +
>> + ? ? ? ? ? ? err = ext4_ext_dirty(handle, inode, path + depth);
>> + ? ? ? ? ? ? goto out;
>> + ? ? }
>> +
>> + ? ? /* case a */
>> + ? ? orig_ex.ee_block = ex->ee_block;
>> + ? ? orig_ex.ee_len ? = ex->ee_len;
>> + ? ? ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
>> +
>> + ? ? ex->ee_len = cpu_to_le16(split - ee_block);
>> + ? ? if (split_flag& ?EXT4_EXT_MARK_UNINIT1)
>> + ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex);
>> +
>> + ? ? /*
>> + ? ? ?* path may lead to new leaf, not to original leaf any more
>> + ? ? ?* after ext4_ext_insert_extent() returns,
>> + ? ? ?*/
>> + ? ? err = ext4_ext_dirty(handle, inode, path + depth);
>> + ? ? if (err)
>> + ? ? ? ? ? ? goto fix_extent_len;
>> +
>> + ? ? ex2 =&newex;
>> + ? ? ex2->ee_block = cpu_to_le32(split);
>> + ? ? ex2->ee_len ? = cpu_to_le16(ee_len - (split - ee_block));
>> + ? ? ext4_ext_store_pblock(ex2, newblock);
>> + ? ? if (split_flag& ?EXT4_EXT_MARK_UNINIT2)
>> + ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex2);
>> +
>> + ? ? err = ext4_ext_insert_extent(handle, inode, path,&newex, flags);
>> + ? ? if (err == -ENOSPC&& ?(EXT4_EXT_MAY_ZEROOUT& ?split_flag)) {
>> +
>> + ? ? ? ? ? ? err = ext4_ext_zeroout(inode,&orig_ex);
>> + ? ? ? ? ? ? if (err)
>> + ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
>> + ? ? ? ? ? ? /* update the extent length and mark as initialized */
>> + ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
>> + ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
>> + ? ? ? ? ? ? ext4_ext_mark_initialized(ex);
>> + ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
>> + ? ? ? ? ? ? err = ext4_ext_dirty(handle, inode, path + depth);
>> + ? ? ? ? ? ? goto out;
>> + ? ? } else if (err)
>> + ? ? ? ? ? ? goto fix_extent_len;
>> +
>> +out:
>> + ? ? ext4_ext_show_leaf(inode, path);
>> + ? ? return err;
>> +
>> +fix_extent_len:
>> + ? ? ex->ee_block = orig_ex.ee_block;
>> + ? ? ex->ee_len ? = orig_ex.ee_len;
>> + ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
>> + ? ? ext4_ext_dirty(handle, inode, path + depth);
>> + ? ? return err;
>> +}
>> +
>> +/*
>> + * ext4_split_extents() splits an extent and mark extent which is covered
>> + * by @map as split_flags indicates
>> + *
>> + * It may result in splitting the extent into multiple extents (upto three)
>> + * There are three possibilities:
>> + * ? a> ?There is no split required
>> + * ? b> ?Splits in two extents: Split is happening at either end of the extent
>> + * ? c> ?Splits in three extents: Somone is splitting in middle of the extent
>> + *
>> + */
>> +static int ext4_split_extent(handle_t *handle,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? struct inode *inode,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? struct ext4_ext_path *path,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? struct ext4_map_blocks *map,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? int split_flag,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? int flags)
>> +{
>> + ? ? ext4_lblk_t ee_block;
>> + ? ? struct ext4_extent *ex;
>> + ? ? unsigned int ee_len, depth;
>> + ? ? int err = 0;
>> + ? ? int uninitialized;
>> + ? ? int split_flag1, flags1;
>> +
>> + ? ? depth = ext_depth(inode);
>> + ? ? ex = path[depth].p_ext;
>> + ? ? ee_block = le32_to_cpu(ex->ee_block);
>> + ? ? ee_len = ext4_ext_get_actual_len(ex);
>> + ? ? uninitialized = ext4_ext_is_uninitialized(ex);
>> +
>> + ? ? if (map->m_lblk + map->m_len< ?ee_block + ee_len) {
>> + ? ? ? ? ? ? split_flag1 = 0;
>> + ? ? ? ? ? ? split_flag1 |= split_flag& ?EXT4_EXT_MAY_ZEROOUT ?
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?EXT4_EXT_MAY_ZEROOUT : 0;
>> + ? ? ? ? ? ? flags1 = flags;
>> + ? ? ? ? ? ? flags1 |= EXT4_GET_BLOCKS_PRE_IO;
>> + ? ? ? ? ? ? if (uninitialized)
>> + ? ? ? ? ? ? ? ? ? ? split_flag1 |= EXT4_EXT_MARK_UNINIT1 |
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?EXT4_EXT_MARK_UNINIT2;
>> + ? ? ? ? ? ? err = ext4_split_extent_at(handle, inode, path,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? map->m_lblk + map->m_len, split_flag1, flags1);
>> + ? ? }
>> +
>> + ? ? ext4_ext_drop_refs(path);
>> + ? ? path = ext4_ext_find_extent(inode, map->m_lblk, path);
>> + ? ? if (IS_ERR(path))
>> + ? ? ? ? ? ? return PTR_ERR(path);
>> +
>> + ? ? if (map->m_lblk>= ee_block) {
>> + ? ? ? ? ? ? split_flag1 = 0;
>> + ? ? ? ? ? ? split_flag1 |= split_flag& ?EXT4_EXT_MAY_ZEROOUT ?
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ?EXT4_EXT_MAY_ZEROOUT : 0;
>> + ? ? ? ? ? ? if (uninitialized)
>> + ? ? ? ? ? ? ? ? ? ? split_flag1 |= EXT4_EXT_MARK_UNINIT1;
>> + ? ? ? ? ? ? if (split_flag& ?EXT4_EXT_MARK_UNINIT2)
>> + ? ? ? ? ? ? ? ? ? ? split_flag1 |= EXT4_EXT_MARK_UNINIT2;
>> + ? ? ? ? ? ? err = ext4_split_extent_at(handle, inode, path,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? map->m_lblk, split_flag1, flags);
>> + ? ? ? ? ? ? if (err)
>> + ? ? ? ? ? ? ? ? ? ? goto out;
>> + ? ? }
>> +
>> + ? ? ext4_ext_show_leaf(inode, path);
>> +out:
>> + ? ? return err ? err : map->m_len;
>> +}
>> +
>> ? #define EXT4_EXT_ZERO_LEN 7
>> ? /*
>> ? ?* This function is called by ext4_ext_map_blocks() if someone tries to write
>
>
> Hi there,
>
> I've been working on trying to get the punch hole patch to work with with these new changes, but it looks like some test cases are not passing at the moment, so I'm trying track down where the issues are. ?I had to make some adjustments to this patch to fix one of the test cases. ?Here is what I did:
>
> ---
> :100644 100644 ee2dda3... c7d763d... M ?fs/ext4/extents.c
> ?fs/ext4/extents.c | ? ?6 +++---
> ?1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ee2dda3..c7d763d 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2717,12 +2717,12 @@ static int ext4_split_extent(handle_t *handle,
> ? ? ? ?ee_len = ext4_ext_get_actual_len(ex);
> ? ? ? ?uninitialized = ext4_ext_is_uninitialized(ex);
>
> + ? ? ? flags1 = flags;
> + ? ? ? flags1 |= EXT4_GET_BLOCKS_PRE_IO;
Hi, the flag should be passed via parameters,
ext4_convert_to_initialized() uses this function also. so merge
should be done in this case.

If the patches has other problems needing me to fix, please let me know.

Thank you,
Yongqiang.

> ? ? ? ?if (map->m_lblk + map->m_len < ee_block + ee_len) {
> ? ? ? ? ? ? ? ?split_flag1 = 0;
> ? ? ? ? ? ? ? ?split_flag1 |= split_flag & EXT4_EXT_MAY_ZEROOUT ?
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? EXT4_EXT_MAY_ZEROOUT : 0;
> - ? ? ? ? ? ? ? flags1 = flags;
> - ? ? ? ? ? ? ? flags1 |= EXT4_GET_BLOCKS_PRE_IO;
> ? ? ? ? ? ? ? ?if (uninitialized)
> ? ? ? ? ? ? ? ? ? ? ? ?split_flag1 |= EXT4_EXT_MARK_UNINIT1 |
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? EXT4_EXT_MARK_UNINIT2;
> @@ -2744,7 +2744,7 @@ static int ext4_split_extent(handle_t *handle,
> ? ? ? ? ? ? ? ?if (split_flag & EXT4_EXT_MARK_UNINIT2)
> ? ? ? ? ? ? ? ? ? ? ? ?split_flag1 |= EXT4_EXT_MARK_UNINIT2;
> ? ? ? ? ? ? ? ?err = ext4_split_extent_at(handle, inode, path,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? map->m_lblk, split_flag1, flags);
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? map->m_lblk, split_flag1, flags1);
> ? ? ? ? ? ? ? ?if (err)
> ? ? ? ? ? ? ? ? ? ? ? ?goto out;
> ? ? ? ?}
> --
> 1.7.1
>
>
> Let me know if this change works for you. ?It looks like the problem was that the extents were getting merged back together because the second ext4_split_extent_at didnt have the EXT4_GET_BLOCKS_PRE_IO flag, so this should fix it. ?There are still some other test cases that are showing some odd behavior, so I will keep you posted if I have to make any other changes.
>
> Allison Henderson
>



--
Best Wishes
Yongqiang Yang

2011-04-21 01:21:44

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH RFC 3/3] ext4:Reimplement convert and split_unwritten.

Hi Allison,

Flags are intended to be passed like these two functions.


On Thu, Apr 14, 2011 at 1:40 PM, Yongqiang Yang <[email protected]> wrote:
> convert and split unwritten are reimplemented based on ext4_split_extent()
> added in last patch.
>
> Signed-off-by: Yongqiang Yang <[email protected]>
> ---
> ?fs/ext4/extents.c | ?483 ++++++++--------------------------------------------
> ?1 files changed, 75 insertions(+), 408 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 1756e0f..ee2dda3 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2770,17 +2770,13 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct ext4_map_blocks *map,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct ext4_ext_path *path)
> ?{
> - ? ? ? struct ext4_extent *ex, newex, orig_ex;
> - ? ? ? struct ext4_extent *ex1 = NULL;
> - ? ? ? struct ext4_extent *ex2 = NULL;
> - ? ? ? struct ext4_extent *ex3 = NULL;
> - ? ? ? struct ext4_extent_header *eh;
> + ? ? ? struct ext4_map_blocks split_map;
> + ? ? ? struct ext4_extent zero_ex;
> + ? ? ? struct ext4_extent *ex;
> ? ? ? ?ext4_lblk_t ee_block, eof_block;
> ? ? ? ?unsigned int allocated, ee_len, depth;
> - ? ? ? ext4_fsblk_t newblock;
> ? ? ? ?int err = 0;
> - ? ? ? int ret = 0;
> - ? ? ? int may_zeroout;
> + ? ? ? int split_flag = 0;
>
> ? ? ? ?ext_debug("ext4_ext_convert_to_initialized: inode %lu, logical"
> ? ? ? ? ? ? ? ?"block %llu, max_blocks %u\n", inode->i_ino,
> @@ -2792,280 +2788,90 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> ? ? ? ? ? ? ? ?eof_block = map->m_lblk + map->m_len;
>
> ? ? ? ?depth = ext_depth(inode);
> - ? ? ? eh = path[depth].p_hdr;
> ? ? ? ?ex = path[depth].p_ext;
> ? ? ? ?ee_block = le32_to_cpu(ex->ee_block);
> ? ? ? ?ee_len = ext4_ext_get_actual_len(ex);
> ? ? ? ?allocated = ee_len - (map->m_lblk - ee_block);
> - ? ? ? newblock = map->m_lblk - ee_block + ext4_ext_pblock(ex);
> -
> - ? ? ? ex2 = ex;
> - ? ? ? orig_ex.ee_block = ex->ee_block;
> - ? ? ? orig_ex.ee_len ? = cpu_to_le16(ee_len);
> - ? ? ? ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
>
> + ? ? ? WARN_ON(map->m_lblk < ee_block);
> ? ? ? ?/*
> ? ? ? ? * It is safe to convert extent to initialized via explicit
> ? ? ? ? * zeroout only if extent is fully insde i_size or new_size.
> ? ? ? ? */
> - ? ? ? may_zeroout = ee_block + ee_len <= eof_block;
> + ? ? ? split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
>
> - ? ? ? err = ext4_ext_get_access(handle, inode, path + depth);
> - ? ? ? if (err)
> - ? ? ? ? ? ? ? goto out;
> ? ? ? ?/* If extent has less than 2*EXT4_EXT_ZERO_LEN zerout directly */
> - ? ? ? if (ee_len <= 2*EXT4_EXT_ZERO_LEN && may_zeroout) {
> - ? ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode, &orig_ex);
> + ? ? ? if (ee_len <= 2*EXT4_EXT_ZERO_LEN &&
> + ? ? ? ? ? (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
> + ? ? ? ? ? ? ? zero_ex.ee_block = ex->ee_block;
> + ? ? ? ? ? ? ? zero_ex.ee_len = cpu_to_le16(map->m_lblk - ee_block);
> + ? ? ? ? ? ? ? ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));
> + ? ? ? ? ? ? ? err = ext4_ext_zeroout(inode, &zero_ex);
> ? ? ? ? ? ? ? ?if (err)
> - ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> - ? ? ? ? ? ? ? /* update the extent length and mark as initialized */
> - ? ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
> - ? ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? ? ? ? ? /* zeroed the full extent */
> - ? ? ? ? ? ? ? return allocated;
> - ? ? ? }
> -
> - ? ? ? /* ex1: ee_block to map->m_lblk - 1 : uninitialized */
> - ? ? ? if (map->m_lblk > ee_block) {
> - ? ? ? ? ? ? ? ex1 = ex;
> - ? ? ? ? ? ? ? ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
> - ? ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex1);
> - ? ? ? ? ? ? ? ex2 = &newex;
> - ? ? ? }
> - ? ? ? /*
> - ? ? ? ?* for sanity, update the length of the ex2 extent before
> - ? ? ? ?* we insert ex3, if ex1 is NULL. This is to avoid temporary
> - ? ? ? ?* overlap of blocks.
> - ? ? ? ?*/
> - ? ? ? if (!ex1 && allocated > map->m_len)
> - ? ? ? ? ? ? ? ex2->ee_len = cpu_to_le16(map->m_len);
> - ? ? ? /* ex3: to ee_block + ee_len : uninitialised */
> - ? ? ? if (allocated > map->m_len) {
> - ? ? ? ? ? ? ? unsigned int newdepth;
> - ? ? ? ? ? ? ? /* If extent has less than EXT4_EXT_ZERO_LEN zerout directly */
> - ? ? ? ? ? ? ? if (allocated <= EXT4_EXT_ZERO_LEN && may_zeroout) {
> - ? ? ? ? ? ? ? ? ? ? ? /*
> - ? ? ? ? ? ? ? ? ? ? ? ?* map->m_lblk == ee_block is handled by the zerouout
> - ? ? ? ? ? ? ? ? ? ? ? ?* at the beginning.
> - ? ? ? ? ? ? ? ? ? ? ? ?* Mark first half uninitialized.
> - ? ? ? ? ? ? ? ? ? ? ? ?* Mark second half initialized and zero out the
> - ? ? ? ? ? ? ? ? ? ? ? ?* initialized extent
> - ? ? ? ? ? ? ? ? ? ? ? ?*/
> - ? ? ? ? ? ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ? ? ? ? ? ? ? ? ex->ee_len ? = cpu_to_le16(ee_len - allocated);
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex);
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> -
> - ? ? ? ? ? ? ? ? ? ? ? ex3 = &newex;
> - ? ? ? ? ? ? ? ? ? ? ? ex3->ee_block = cpu_to_le32(map->m_lblk);
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(ex3, newblock);
> - ? ? ? ? ? ? ? ? ? ? ? ex3->ee_len = cpu_to_le16(allocated);
> - ? ? ? ? ? ? ? ? ? ? ? err = ext4_ext_insert_extent(handle, inode, path,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ex3, 0);
> - ? ? ? ? ? ? ? ? ? ? ? if (err == -ENOSPC) {
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode, &orig_ex);
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? if (err)
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(ex,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_pblock(&orig_ex));
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /* blocks available from map->m_lblk */
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? return allocated;
> -
> - ? ? ? ? ? ? ? ? ? ? ? } else if (err)
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> -
> - ? ? ? ? ? ? ? ? ? ? ? /*
> - ? ? ? ? ? ? ? ? ? ? ? ?* We need to zero out the second half because
> - ? ? ? ? ? ? ? ? ? ? ? ?* an fallocate request can update file size and
> - ? ? ? ? ? ? ? ? ? ? ? ?* converting the second half to initialized extent
> - ? ? ? ? ? ? ? ? ? ? ? ?* implies that we can leak some junk data to user
> - ? ? ? ? ? ? ? ? ? ? ? ?* space.
> - ? ? ? ? ? ? ? ? ? ? ? ?*/
> - ? ? ? ? ? ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode, ex3);
> - ? ? ? ? ? ? ? ? ? ? ? if (err) {
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /*
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?* We should actually mark the
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?* second half as uninit and return error
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?* Insert would have changed the extent
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?*/
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? depth = ext_depth(inode);
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_drop_refs(path);
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? path = ext4_ext_find_extent(inode, map->m_lblk,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? path);
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? if (IS_ERR(path)) {
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? err = PTR_ERR(path);
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? return err;
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? }
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? /* get the second half extent details */
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ex = path[depth].p_ext;
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? err = ext4_ext_get_access(handle, inode,
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? path + depth);
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? if (err)
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? return err;
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex);
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? return err;
> - ? ? ? ? ? ? ? ? ? ? ? }
> -
> - ? ? ? ? ? ? ? ? ? ? ? /* zeroed the second half */
> - ? ? ? ? ? ? ? ? ? ? ? return allocated;
> - ? ? ? ? ? ? ? }
> - ? ? ? ? ? ? ? ex3 = &newex;
> - ? ? ? ? ? ? ? ex3->ee_block = cpu_to_le32(map->m_lblk + map->m_len);
> - ? ? ? ? ? ? ? ext4_ext_store_pblock(ex3, newblock + map->m_len);
> - ? ? ? ? ? ? ? ex3->ee_len = cpu_to_le16(allocated - map->m_len);
> - ? ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex3);
> - ? ? ? ? ? ? ? err = ext4_ext_insert_extent(handle, inode, path, ex3, 0);
> - ? ? ? ? ? ? ? if (err == -ENOSPC && may_zeroout) {
> - ? ? ? ? ? ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode, &orig_ex);
> - ? ? ? ? ? ? ? ? ? ? ? if (err)
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> - ? ? ? ? ? ? ? ? ? ? ? /* update the extent length and mark as initialized */
> - ? ? ? ? ? ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ? ? ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? ? ? ? ? ? ? ? ? /* zeroed the full extent */
> - ? ? ? ? ? ? ? ? ? ? ? /* blocks available from map->m_lblk */
> - ? ? ? ? ? ? ? ? ? ? ? return allocated;
> -
> - ? ? ? ? ? ? ? } else if (err)
> - ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> - ? ? ? ? ? ? ? /*
> - ? ? ? ? ? ? ? ?* The depth, and hence eh & ex might change
> - ? ? ? ? ? ? ? ?* as part of the insert above.
> - ? ? ? ? ? ? ? ?*/
> - ? ? ? ? ? ? ? newdepth = ext_depth(inode);
> - ? ? ? ? ? ? ? /*
> - ? ? ? ? ? ? ? ?* update the extent length after successful insert of the
> - ? ? ? ? ? ? ? ?* split extent
> - ? ? ? ? ? ? ? ?*/
> - ? ? ? ? ? ? ? ee_len -= ext4_ext_get_actual_len(ex3);
> - ? ? ? ? ? ? ? orig_ex.ee_len = cpu_to_le16(ee_len);
> - ? ? ? ? ? ? ? may_zeroout = ee_block + ee_len <= eof_block;
> -
> - ? ? ? ? ? ? ? depth = newdepth;
> - ? ? ? ? ? ? ? ext4_ext_drop_refs(path);
> - ? ? ? ? ? ? ? path = ext4_ext_find_extent(inode, map->m_lblk, path);
> - ? ? ? ? ? ? ? if (IS_ERR(path)) {
> - ? ? ? ? ? ? ? ? ? ? ? err = PTR_ERR(path);
> ? ? ? ? ? ? ? ? ? ? ? ?goto out;
> - ? ? ? ? ? ? ? }
> - ? ? ? ? ? ? ? eh = path[depth].p_hdr;
> - ? ? ? ? ? ? ? ex = path[depth].p_ext;
> - ? ? ? ? ? ? ? if (ex2 != &newex)
> - ? ? ? ? ? ? ? ? ? ? ? ex2 = ex;
>
> ? ? ? ? ? ? ? ?err = ext4_ext_get_access(handle, inode, path + depth);
> ? ? ? ? ? ? ? ?if (err)
> ? ? ? ? ? ? ? ? ? ? ? ?goto out;
> -
> - ? ? ? ? ? ? ? allocated = map->m_len;
> -
> - ? ? ? ? ? ? ? /* If extent has less than EXT4_EXT_ZERO_LEN and we are trying
> - ? ? ? ? ? ? ? ?* to insert a extent in the middle zerout directly
> - ? ? ? ? ? ? ? ?* otherwise give the extent a chance to merge to left
> - ? ? ? ? ? ? ? ?*/
> - ? ? ? ? ? ? ? if (le16_to_cpu(orig_ex.ee_len) <= EXT4_EXT_ZERO_LEN &&
> - ? ? ? ? ? ? ? ? ? ? ? map->m_lblk != ee_block && may_zeroout) {
> - ? ? ? ? ? ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode, &orig_ex);
> - ? ? ? ? ? ? ? ? ? ? ? if (err)
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> - ? ? ? ? ? ? ? ? ? ? ? /* update the extent length and mark as initialized */
> - ? ? ? ? ? ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ? ? ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? ? ? ? ? ? ? ? ? /* zero out the first half */
> - ? ? ? ? ? ? ? ? ? ? ? /* blocks available from map->m_lblk */
> - ? ? ? ? ? ? ? ? ? ? ? return allocated;
> - ? ? ? ? ? ? ? }
> - ? ? ? }
> - ? ? ? /*
> - ? ? ? ?* If there was a change of depth as part of the
> - ? ? ? ?* insertion of ex3 above, we need to update the length
> - ? ? ? ?* of the ex1 extent again here
> - ? ? ? ?*/
> - ? ? ? if (ex1 && ex1 != ex) {
> - ? ? ? ? ? ? ? ex1 = ex;
> - ? ? ? ? ? ? ? ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
> - ? ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex1);
> - ? ? ? ? ? ? ? ex2 = &newex;
> - ? ? ? }
> - ? ? ? /* ex2: map->m_lblk to map->m_lblk + maxblocks-1 : initialised */
> - ? ? ? ex2->ee_block = cpu_to_le32(map->m_lblk);
> - ? ? ? ext4_ext_store_pblock(ex2, newblock);
> - ? ? ? ex2->ee_len = cpu_to_le16(allocated);
> - ? ? ? if (ex2 != ex)
> - ? ? ? ? ? ? ? goto insert;
> - ? ? ? /*
> - ? ? ? ?* New (initialized) extent starts from the first block
> - ? ? ? ?* in the current extent. i.e., ex2 == ex
> - ? ? ? ?* We have to see if it can be merged with the extent
> - ? ? ? ?* on the left.
> - ? ? ? ?*/
> - ? ? ? if (ex2 > EXT_FIRST_EXTENT(eh)) {
> - ? ? ? ? ? ? ? /*
> - ? ? ? ? ? ? ? ?* To merge left, pass "ex2 - 1" to try_to_merge(),
> - ? ? ? ? ? ? ? ?* since it merges towards right _only_.
> - ? ? ? ? ? ? ? ?*/
> - ? ? ? ? ? ? ? ret = ext4_ext_try_to_merge(inode, path, ex2 - 1);
> - ? ? ? ? ? ? ? if (ret) {
> - ? ? ? ? ? ? ? ? ? ? ? err = ext4_ext_correct_indexes(handle, inode, path);
> - ? ? ? ? ? ? ? ? ? ? ? if (err)
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto out;
> - ? ? ? ? ? ? ? ? ? ? ? depth = ext_depth(inode);
> - ? ? ? ? ? ? ? ? ? ? ? ex2--;
> - ? ? ? ? ? ? ? }
> + ? ? ? ? ? ? ? ext4_ext_mark_initialized(ex);
> + ? ? ? ? ? ? ? ext4_ext_try_to_merge(inode, path, ex);
> + ? ? ? ? ? ? ? err = ext4_ext_dirty(handle, inode, path + depth);
> + ? ? ? ? ? ? ? goto out;
> ? ? ? ?}
> +
> ? ? ? ?/*
> - ? ? ? ?* Try to Merge towards right. This might be required
> - ? ? ? ?* only when the whole extent is being written to.
> - ? ? ? ?* i.e. ex2 == ex and ex3 == NULL.
> + ? ? ? ?* four cases:
> + ? ? ? ?* 1. split the extent into three extents.
> + ? ? ? ?* 2. split the extent into two extents, zeroout the first half.
> + ? ? ? ?* 3. split the extent into two extents, zeroout the second half.
> + ? ? ? ?* 4. split the extent into two extents with out zeroout.
> ? ? ? ? */
> - ? ? ? if (!ex3) {
> - ? ? ? ? ? ? ? ret = ext4_ext_try_to_merge(inode, path, ex2);
> - ? ? ? ? ? ? ? if (ret) {
> - ? ? ? ? ? ? ? ? ? ? ? err = ext4_ext_correct_indexes(handle, inode, path);
> + ? ? ? split_map.m_lblk = map->m_lblk;
> + ? ? ? split_map.m_len = map->m_len;
> +
> + ? ? ? if (allocated > map->m_len) {
> + ? ? ? ? ? ? ? if (allocated <= EXT4_EXT_ZERO_LEN &&
> + ? ? ? ? ? ? ? ? ? (EXT4_EXT_MAY_ZEROOUT & split_flag)) {
> + ? ? ? ? ? ? ? ? ? ? ? /* case 3 */
> + ? ? ? ? ? ? ? ? ? ? ? zero_ex.ee_block =
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?cpu_to_le32(map->m_lblk + map->m_len);
> + ? ? ? ? ? ? ? ? ? ? ? zero_ex.ee_len = cpu_to_le16(allocated - map->m_len);
> + ? ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(&zero_ex,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_pblock(ex) + map->m_lblk - ee_block);
> + ? ? ? ? ? ? ? ? ? ? ? err = ext4_ext_zeroout(inode, &zero_ex);
> ? ? ? ? ? ? ? ? ? ? ? ?if (err)
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?goto out;
> + ? ? ? ? ? ? ? ? ? ? ? split_map.m_lblk = map->m_lblk;
> + ? ? ? ? ? ? ? ? ? ? ? split_map.m_len = allocated;
> + ? ? ? ? ? ? ? } else if ((map->m_lblk - ee_block + map->m_len <
> + ? ? ? ? ? ? ? ? ? ? ? ? ?EXT4_EXT_ZERO_LEN) &&
> + ? ? ? ? ? ? ? ? ? ? ? ? ?(EXT4_EXT_MAY_ZEROOUT & split_flag)) {
> + ? ? ? ? ? ? ? ? ? ? ? /* case 2 */
> + ? ? ? ? ? ? ? ? ? ? ? if (map->m_lblk != ee_block) {
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? zero_ex.ee_block = ex->ee_block;
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? zero_ex.ee_len = cpu_to_le16(map->m_lblk -
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ee_block);
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(&zero_ex,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_pblock(ex));
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? err = ext4_ext_zeroout(inode, &zero_ex);
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? if (err)
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto out;
> + ? ? ? ? ? ? ? ? ? ? ? }
> +
> + ? ? ? ? ? ? ? ? ? ? ? allocated = map->m_lblk - ee_block + map->m_len;
> +
> + ? ? ? ? ? ? ? ? ? ? ? split_map.m_lblk = ee_block;
> + ? ? ? ? ? ? ? ? ? ? ? split_map.m_len = allocated;
> ? ? ? ? ? ? ? ?}
> ? ? ? ?}
> - ? ? ? /* Mark modified extent as dirty */
> - ? ? ? err = ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? goto out;
> -insert:
> - ? ? ? err = ext4_ext_insert_extent(handle, inode, path, &newex, 0);
> - ? ? ? if (err == -ENOSPC && may_zeroout) {
> - ? ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode, &orig_ex);
> - ? ? ? ? ? ? ? if (err)
> - ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> - ? ? ? ? ? ? ? /* update the extent length and mark as initialized */
> - ? ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
> - ? ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? ? ? ? ? /* zero out the first half */
> - ? ? ? ? ? ? ? return allocated;
> - ? ? ? } else if (err)
> - ? ? ? ? ? ? ? goto fix_extent_len;
> +
> + ? ? ? allocated = ext4_split_extent(handle, inode, path,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?&split_map, split_flag, 0);
> + ? ? ? if (allocated < 0)
> + ? ? ? ? ? ? ? err = allocated;
> +
> ?out:
> - ? ? ? ext4_ext_show_leaf(inode, path);
> ? ? ? ?return err ? err : allocated;
> -
> -fix_extent_len:
> - ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ex->ee_len ? = orig_ex.ee_len;
> - ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ? ? ? ext4_ext_mark_uninitialized(ex);
> - ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? return err;
> ?}
>
> ?/*
> @@ -3096,15 +2902,11 @@ static int ext4_split_unwritten_extents(handle_t *handle,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct ext4_ext_path *path,
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?int flags)
> ?{
> - ? ? ? struct ext4_extent *ex, newex, orig_ex;
> - ? ? ? struct ext4_extent *ex1 = NULL;
> - ? ? ? struct ext4_extent *ex2 = NULL;
> - ? ? ? struct ext4_extent *ex3 = NULL;
> - ? ? ? ext4_lblk_t ee_block, eof_block;
> - ? ? ? unsigned int allocated, ee_len, depth;
> - ? ? ? ext4_fsblk_t newblock;
> - ? ? ? int err = 0;
> - ? ? ? int may_zeroout;
> + ? ? ? ext4_lblk_t eof_block;
> + ? ? ? ext4_lblk_t ee_block;
> + ? ? ? struct ext4_extent *ex;
> + ? ? ? unsigned int ee_len;
> + ? ? ? int split_flag = 0, depth;
>
> ? ? ? ?ext_debug("ext4_split_unwritten_extents: inode %lu, logical"
> ? ? ? ? ? ? ? ?"block %llu, max_blocks %u\n", inode->i_ino,
> @@ -3114,155 +2916,20 @@ static int ext4_split_unwritten_extents(handle_t *handle,
> ? ? ? ? ? ? ? ?inode->i_sb->s_blocksize_bits;
> ? ? ? ?if (eof_block < map->m_lblk + map->m_len)
> ? ? ? ? ? ? ? ?eof_block = map->m_lblk + map->m_len;
> -
> - ? ? ? depth = ext_depth(inode);
> - ? ? ? ex = path[depth].p_ext;
> - ? ? ? ee_block = le32_to_cpu(ex->ee_block);
> - ? ? ? ee_len = ext4_ext_get_actual_len(ex);
> - ? ? ? allocated = ee_len - (map->m_lblk - ee_block);
> - ? ? ? newblock = map->m_lblk - ee_block + ext4_ext_pblock(ex);
> -
> - ? ? ? ex2 = ex;
> - ? ? ? orig_ex.ee_block = ex->ee_block;
> - ? ? ? orig_ex.ee_len ? = cpu_to_le16(ee_len);
> - ? ? ? ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
> -
> ? ? ? ?/*
> ? ? ? ? * It is safe to convert extent to initialized via explicit
> ? ? ? ? * zeroout only if extent is fully insde i_size or new_size.
> ? ? ? ? */
> - ? ? ? may_zeroout = ee_block + ee_len <= eof_block;
> -
> - ? ? ? /*
> - ? ? ? ?* If the uninitialized extent begins at the same logical
> - ? ? ? ?* block where the write begins, and the write completely
> - ? ? ? ?* covers the extent, then we don't need to split it.
> - ? ? ? ?*/
> - ? ? ? if ((map->m_lblk == ee_block) && (allocated <= map->m_len))
> - ? ? ? ? ? ? ? return allocated;
> -
> - ? ? ? err = ext4_ext_get_access(handle, inode, path + depth);
> - ? ? ? if (err)
> - ? ? ? ? ? ? ? goto out;
> - ? ? ? /* ex1: ee_block to map->m_lblk - 1 : uninitialized */
> - ? ? ? if (map->m_lblk > ee_block) {
> - ? ? ? ? ? ? ? ex1 = ex;
> - ? ? ? ? ? ? ? ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
> - ? ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex1);
> - ? ? ? ? ? ? ? ex2 = &newex;
> - ? ? ? }
> - ? ? ? /*
> - ? ? ? ?* for sanity, update the length of the ex2 extent before
> - ? ? ? ?* we insert ex3, if ex1 is NULL. This is to avoid temporary
> - ? ? ? ?* overlap of blocks.
> - ? ? ? ?*/
> - ? ? ? if (!ex1 && allocated > map->m_len)
> - ? ? ? ? ? ? ? ex2->ee_len = cpu_to_le16(map->m_len);
> - ? ? ? /* ex3: to ee_block + ee_len : uninitialised */
> - ? ? ? if (allocated > map->m_len) {
> - ? ? ? ? ? ? ? unsigned int newdepth;
> - ? ? ? ? ? ? ? ex3 = &newex;
> - ? ? ? ? ? ? ? ex3->ee_block = cpu_to_le32(map->m_lblk + map->m_len);
> - ? ? ? ? ? ? ? ext4_ext_store_pblock(ex3, newblock + map->m_len);
> - ? ? ? ? ? ? ? ex3->ee_len = cpu_to_le16(allocated - map->m_len);
> - ? ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex3);
> - ? ? ? ? ? ? ? err = ext4_ext_insert_extent(handle, inode, path, ex3, flags);
> - ? ? ? ? ? ? ? if (err == -ENOSPC && may_zeroout) {
> - ? ? ? ? ? ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode, &orig_ex);
> - ? ? ? ? ? ? ? ? ? ? ? if (err)
> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> - ? ? ? ? ? ? ? ? ? ? ? /* update the extent length and mark as initialized */
> - ? ? ? ? ? ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ? ? ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ? ? ? ? ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? ? ? ? ? ? ? ? ? /* zeroed the full extent */
> - ? ? ? ? ? ? ? ? ? ? ? /* blocks available from map->m_lblk */
> - ? ? ? ? ? ? ? ? ? ? ? return allocated;
> -
> - ? ? ? ? ? ? ? } else if (err)
> - ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> - ? ? ? ? ? ? ? /*
> - ? ? ? ? ? ? ? ?* The depth, and hence eh & ex might change
> - ? ? ? ? ? ? ? ?* as part of the insert above.
> - ? ? ? ? ? ? ? ?*/
> - ? ? ? ? ? ? ? newdepth = ext_depth(inode);
> - ? ? ? ? ? ? ? /*
> - ? ? ? ? ? ? ? ?* update the extent length after successful insert of the
> - ? ? ? ? ? ? ? ?* split extent
> - ? ? ? ? ? ? ? ?*/
> - ? ? ? ? ? ? ? ee_len -= ext4_ext_get_actual_len(ex3);
> - ? ? ? ? ? ? ? orig_ex.ee_len = cpu_to_le16(ee_len);
> - ? ? ? ? ? ? ? may_zeroout = ee_block + ee_len <= eof_block;
> -
> - ? ? ? ? ? ? ? depth = newdepth;
> - ? ? ? ? ? ? ? ext4_ext_drop_refs(path);
> - ? ? ? ? ? ? ? path = ext4_ext_find_extent(inode, map->m_lblk, path);
> - ? ? ? ? ? ? ? if (IS_ERR(path)) {
> - ? ? ? ? ? ? ? ? ? ? ? err = PTR_ERR(path);
> - ? ? ? ? ? ? ? ? ? ? ? goto out;
> - ? ? ? ? ? ? ? }
> - ? ? ? ? ? ? ? ex = path[depth].p_ext;
> - ? ? ? ? ? ? ? if (ex2 != &newex)
> - ? ? ? ? ? ? ? ? ? ? ? ex2 = ex;
> -
> - ? ? ? ? ? ? ? err = ext4_ext_get_access(handle, inode, path + depth);
> - ? ? ? ? ? ? ? if (err)
> - ? ? ? ? ? ? ? ? ? ? ? goto out;
> + ? ? ? depth = ext_depth(inode);
> + ? ? ? ex = path[depth].p_ext;
> + ? ? ? ee_block = le32_to_cpu(ex->ee_block);
> + ? ? ? ee_len = ext4_ext_get_actual_len(ex);
>
> - ? ? ? ? ? ? ? allocated = map->m_len;
> - ? ? ? }
> - ? ? ? /*
> - ? ? ? ?* If there was a change of depth as part of the
> - ? ? ? ?* insertion of ex3 above, we need to update the length
> - ? ? ? ?* of the ex1 extent again here
> - ? ? ? ?*/
> - ? ? ? if (ex1 && ex1 != ex) {
> - ? ? ? ? ? ? ? ex1 = ex;
> - ? ? ? ? ? ? ? ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
> - ? ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex1);
> - ? ? ? ? ? ? ? ex2 = &newex;
> - ? ? ? }
> - ? ? ? /*
> - ? ? ? ?* ex2: map->m_lblk to map->m_lblk + map->m_len-1 : to be written
> - ? ? ? ?* using direct I/O, uninitialised still.
> - ? ? ? ?*/
> - ? ? ? ex2->ee_block = cpu_to_le32(map->m_lblk);
> - ? ? ? ext4_ext_store_pblock(ex2, newblock);
> - ? ? ? ex2->ee_len = cpu_to_le16(allocated);
> - ? ? ? ext4_ext_mark_uninitialized(ex2);
> - ? ? ? if (ex2 != ex)
> - ? ? ? ? ? ? ? goto insert;
> - ? ? ? /* Mark modified extent as dirty */
> - ? ? ? err = ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? ext_debug("out here\n");
> - ? ? ? goto out;
> -insert:
> - ? ? ? err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
> - ? ? ? if (err == -ENOSPC && may_zeroout) {
> - ? ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode, &orig_ex);
> - ? ? ? ? ? ? ? if (err)
> - ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
> - ? ? ? ? ? ? ? /* update the extent length and mark as initialized */
> - ? ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
> - ? ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? ? ? ? ? /* zero out the first half */
> - ? ? ? ? ? ? ? return allocated;
> - ? ? ? } else if (err)
> - ? ? ? ? ? ? ? goto fix_extent_len;
> -out:
> - ? ? ? ext4_ext_show_leaf(inode, path);
> - ? ? ? return err ? err : allocated;
> + ? ? ? split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
> + ? ? ? split_flag |= EXT4_EXT_MARK_UNINIT2;
>
> -fix_extent_len:
> - ? ? ? ex->ee_block = orig_ex.ee_block;
> - ? ? ? ex->ee_len ? = orig_ex.ee_len;
> - ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ? ? ? ext4_ext_mark_uninitialized(ex);
> - ? ? ? ext4_ext_dirty(handle, inode, path + depth);
> - ? ? ? return err;
> + ? ? ? flags |= EXT4_GET_BLOCKS_PRE_IO;
> + ? ? ? return ext4_split_extent(handle, inode, path, map, split_flag, flags);
> ?}
>
> ?static int ext4_convert_unwritten_extents_endio(handle_t *handle,
> --
> 1.7.4.4
>
>



--
Best Wishes
Yongqiang Yang

2011-04-21 08:29:01

by Allison Henderson

[permalink] [raw]
Subject: Re: [PATCH RFC 3/3] ext4:Reimplement convert and split_unwritten.

On 4/13/2011 10:40 PM, Yongqiang Yang wrote:
> convert and split unwritten are reimplemented based on ext4_split_extent()
> added in last patch.
>
> Signed-off-by: Yongqiang Yang<[email protected]>
> ---
> fs/ext4/extents.c | 483 ++++++++--------------------------------------------
> 1 files changed, 75 insertions(+), 408 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 1756e0f..ee2dda3 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2770,17 +2770,13 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> struct ext4_map_blocks *map,
> struct ext4_ext_path *path)
> {
> - struct ext4_extent *ex, newex, orig_ex;
> - struct ext4_extent *ex1 = NULL;
> - struct ext4_extent *ex2 = NULL;
> - struct ext4_extent *ex3 = NULL;
> - struct ext4_extent_header *eh;
> + struct ext4_map_blocks split_map;
> + struct ext4_extent zero_ex;
> + struct ext4_extent *ex;
> ext4_lblk_t ee_block, eof_block;
> unsigned int allocated, ee_len, depth;
> - ext4_fsblk_t newblock;
> int err = 0;
> - int ret = 0;
> - int may_zeroout;
> + int split_flag = 0;
>
> ext_debug("ext4_ext_convert_to_initialized: inode %lu, logical"
> "block %llu, max_blocks %u\n", inode->i_ino,
> @@ -2792,280 +2788,90 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> eof_block = map->m_lblk + map->m_len;
>
> depth = ext_depth(inode);
> - eh = path[depth].p_hdr;
> ex = path[depth].p_ext;
> ee_block = le32_to_cpu(ex->ee_block);
> ee_len = ext4_ext_get_actual_len(ex);
> allocated = ee_len - (map->m_lblk - ee_block);
> - newblock = map->m_lblk - ee_block + ext4_ext_pblock(ex);
> -
> - ex2 = ex;
> - orig_ex.ee_block = ex->ee_block;
> - orig_ex.ee_len = cpu_to_le16(ee_len);
> - ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
>
> + WARN_ON(map->m_lblk< ee_block);
> /*
> * It is safe to convert extent to initialized via explicit
> * zeroout only if extent is fully insde i_size or new_size.
> */
> - may_zeroout = ee_block + ee_len<= eof_block;
> + split_flag |= ee_block + ee_len<= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
>
> - err = ext4_ext_get_access(handle, inode, path + depth);
> - if (err)
> - goto out;
> /* If extent has less than 2*EXT4_EXT_ZERO_LEN zerout directly */
> - if (ee_len<= 2*EXT4_EXT_ZERO_LEN&& may_zeroout) {
> - err = ext4_ext_zeroout(inode,&orig_ex);
> + if (ee_len<= 2*EXT4_EXT_ZERO_LEN&&
> + (EXT4_EXT_MAY_ZEROOUT& split_flag)) {
> + zero_ex.ee_block = ex->ee_block;
> + zero_ex.ee_len = cpu_to_le16(map->m_lblk - ee_block);
> + ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));
> + err = ext4_ext_zeroout(inode,&zero_ex);
> if (err)
> - goto fix_extent_len;
> - /* update the extent length and mark as initialized */
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = orig_ex.ee_len;
> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ext4_ext_dirty(handle, inode, path + depth);
> - /* zeroed the full extent */
> - return allocated;
> - }
> -
> - /* ex1: ee_block to map->m_lblk - 1 : uninitialized */
> - if (map->m_lblk> ee_block) {
> - ex1 = ex;
> - ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
> - ext4_ext_mark_uninitialized(ex1);
> - ex2 =&newex;
> - }
> - /*
> - * for sanity, update the length of the ex2 extent before
> - * we insert ex3, if ex1 is NULL. This is to avoid temporary
> - * overlap of blocks.
> - */
> - if (!ex1&& allocated> map->m_len)
> - ex2->ee_len = cpu_to_le16(map->m_len);
> - /* ex3: to ee_block + ee_len : uninitialised */
> - if (allocated> map->m_len) {
> - unsigned int newdepth;
> - /* If extent has less than EXT4_EXT_ZERO_LEN zerout directly */
> - if (allocated<= EXT4_EXT_ZERO_LEN&& may_zeroout) {
> - /*
> - * map->m_lblk == ee_block is handled by the zerouout
> - * at the beginning.
> - * Mark first half uninitialized.
> - * Mark second half initialized and zero out the
> - * initialized extent
> - */
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = cpu_to_le16(ee_len - allocated);
> - ext4_ext_mark_uninitialized(ex);
> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ext4_ext_dirty(handle, inode, path + depth);
> -
> - ex3 =&newex;
> - ex3->ee_block = cpu_to_le32(map->m_lblk);
> - ext4_ext_store_pblock(ex3, newblock);
> - ex3->ee_len = cpu_to_le16(allocated);
> - err = ext4_ext_insert_extent(handle, inode, path,
> - ex3, 0);
> - if (err == -ENOSPC) {
> - err = ext4_ext_zeroout(inode,&orig_ex);
> - if (err)
> - goto fix_extent_len;
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = orig_ex.ee_len;
> - ext4_ext_store_pblock(ex,
> - ext4_ext_pblock(&orig_ex));
> - ext4_ext_dirty(handle, inode, path + depth);
> - /* blocks available from map->m_lblk */
> - return allocated;
> -
> - } else if (err)
> - goto fix_extent_len;
> -
> - /*
> - * We need to zero out the second half because
> - * an fallocate request can update file size and
> - * converting the second half to initialized extent
> - * implies that we can leak some junk data to user
> - * space.
> - */
> - err = ext4_ext_zeroout(inode, ex3);
> - if (err) {
> - /*
> - * We should actually mark the
> - * second half as uninit and return error
> - * Insert would have changed the extent
> - */
> - depth = ext_depth(inode);
> - ext4_ext_drop_refs(path);
> - path = ext4_ext_find_extent(inode, map->m_lblk,
> - path);
> - if (IS_ERR(path)) {
> - err = PTR_ERR(path);
> - return err;
> - }
> - /* get the second half extent details */
> - ex = path[depth].p_ext;
> - err = ext4_ext_get_access(handle, inode,
> - path + depth);
> - if (err)
> - return err;
> - ext4_ext_mark_uninitialized(ex);
> - ext4_ext_dirty(handle, inode, path + depth);
> - return err;
> - }
> -
> - /* zeroed the second half */
> - return allocated;
> - }
> - ex3 =&newex;
> - ex3->ee_block = cpu_to_le32(map->m_lblk + map->m_len);
> - ext4_ext_store_pblock(ex3, newblock + map->m_len);
> - ex3->ee_len = cpu_to_le16(allocated - map->m_len);
> - ext4_ext_mark_uninitialized(ex3);
> - err = ext4_ext_insert_extent(handle, inode, path, ex3, 0);
> - if (err == -ENOSPC&& may_zeroout) {
> - err = ext4_ext_zeroout(inode,&orig_ex);
> - if (err)
> - goto fix_extent_len;
> - /* update the extent length and mark as initialized */
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = orig_ex.ee_len;
> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ext4_ext_dirty(handle, inode, path + depth);
> - /* zeroed the full extent */
> - /* blocks available from map->m_lblk */
> - return allocated;
> -
> - } else if (err)
> - goto fix_extent_len;
> - /*
> - * The depth, and hence eh& ex might change
> - * as part of the insert above.
> - */
> - newdepth = ext_depth(inode);
> - /*
> - * update the extent length after successful insert of the
> - * split extent
> - */
> - ee_len -= ext4_ext_get_actual_len(ex3);
> - orig_ex.ee_len = cpu_to_le16(ee_len);
> - may_zeroout = ee_block + ee_len<= eof_block;
> -
> - depth = newdepth;
> - ext4_ext_drop_refs(path);
> - path = ext4_ext_find_extent(inode, map->m_lblk, path);
> - if (IS_ERR(path)) {
> - err = PTR_ERR(path);
> goto out;
> - }
> - eh = path[depth].p_hdr;
> - ex = path[depth].p_ext;
> - if (ex2 !=&newex)
> - ex2 = ex;
>
Hi again,

I am working on trying to get the code to pass the fsx stress test. If I add this snippet of code here, it seems to fix most of the tests, but it is very slow. So this may not be the best solution yet. But I thought if updated you with what I found we might be able to come up with a better solution.

if((map->m_lblk + map->m_len) < ee_block + ee_len){
zero_ex.ee_block = map->m_lblk + map->m_len;
zero_ex.ee_len = cpu_to_le16(ee_len - (map->m_lblk + map->m_len));
ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex)+(map->m_lblk - ee_block)+ map->m_len);
err = ext4_ext_zeroout(inode, &zero_ex);
if (err) {
goto out;
}
}

At the moment, the test fails because a portion of the test file does not get zeroed out when it should. The punch hole code needs to flush the pages before it punches out any blocks. It uses the filemap_write_and_wait_range routine, which eventually calls ext4_ext_convert_to_initialized. The bug only appears to show up when flushing out the middle of an unwritten extent. The two halves on either side are supposed to get zeroed out or split off, but it looks like one of them was not getting zeroed out. This snippet seems to fix that, but I'm not yet sure why it's so slow.

> err = ext4_ext_get_access(handle, inode, path + depth);
> if (err)
> goto out;
> -
> - allocated = map->m_len;
> -
> - /* If extent has less than EXT4_EXT_ZERO_LEN and we are trying
> - * to insert a extent in the middle zerout directly
> - * otherwise give the extent a chance to merge to left
> - */
> - if (le16_to_cpu(orig_ex.ee_len)<= EXT4_EXT_ZERO_LEN&&
> - map->m_lblk != ee_block&& may_zeroout) {
> - err = ext4_ext_zeroout(inode,&orig_ex);
> - if (err)
> - goto fix_extent_len;
> - /* update the extent length and mark as initialized */
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = orig_ex.ee_len;
> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ext4_ext_dirty(handle, inode, path + depth);
> - /* zero out the first half */
> - /* blocks available from map->m_lblk */
> - return allocated;
> - }
> - }
> - /*
> - * If there was a change of depth as part of the
> - * insertion of ex3 above, we need to update the length
> - * of the ex1 extent again here
> - */
> - if (ex1&& ex1 != ex) {
> - ex1 = ex;
> - ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
> - ext4_ext_mark_uninitialized(ex1);
> - ex2 =&newex;
> - }
> - /* ex2: map->m_lblk to map->m_lblk + maxblocks-1 : initialised */
> - ex2->ee_block = cpu_to_le32(map->m_lblk);
> - ext4_ext_store_pblock(ex2, newblock);
> - ex2->ee_len = cpu_to_le16(allocated);
> - if (ex2 != ex)
> - goto insert;
> - /*
> - * New (initialized) extent starts from the first block
> - * in the current extent. i.e., ex2 == ex
> - * We have to see if it can be merged with the extent
> - * on the left.
> - */
> - if (ex2> EXT_FIRST_EXTENT(eh)) {
> - /*
> - * To merge left, pass "ex2 - 1" to try_to_merge(),
> - * since it merges towards right _only_.
> - */
> - ret = ext4_ext_try_to_merge(inode, path, ex2 - 1);
> - if (ret) {
> - err = ext4_ext_correct_indexes(handle, inode, path);
> - if (err)
> - goto out;
> - depth = ext_depth(inode);
> - ex2--;
> - }
> + ext4_ext_mark_initialized(ex);
> + ext4_ext_try_to_merge(inode, path, ex);
> + err = ext4_ext_dirty(handle, inode, path + depth);
> + goto out;
> }
> +
> /*
> - * Try to Merge towards right. This might be required
> - * only when the whole extent is being written to.
> - * i.e. ex2 == ex and ex3 == NULL.
> + * four cases:
> + * 1. split the extent into three extents.
> + * 2. split the extent into two extents, zeroout the first half.
> + * 3. split the extent into two extents, zeroout the second half.
> + * 4. split the extent into two extents with out zeroout.
> */
> - if (!ex3) {
> - ret = ext4_ext_try_to_merge(inode, path, ex2);
> - if (ret) {
> - err = ext4_ext_correct_indexes(handle, inode, path);
> + split_map.m_lblk = map->m_lblk;
> + split_map.m_len = map->m_len;
> +
> + if (allocated> map->m_len) {
> + if (allocated<= EXT4_EXT_ZERO_LEN&&
> + (EXT4_EXT_MAY_ZEROOUT& split_flag)) {
> + /* case 3 */
> + zero_ex.ee_block =
> + cpu_to_le32(map->m_lblk + map->m_len);
> + zero_ex.ee_len = cpu_to_le16(allocated - map->m_len);
> + ext4_ext_store_pblock(&zero_ex,
> + ext4_ext_pblock(ex) + map->m_lblk - ee_block);
> + err = ext4_ext_zeroout(inode,&zero_ex);
> if (err)
> goto out;
> + split_map.m_lblk = map->m_lblk;
> + split_map.m_len = allocated;
> + } else if ((map->m_lblk - ee_block + map->m_len<
> + EXT4_EXT_ZERO_LEN)&&
> + (EXT4_EXT_MAY_ZEROOUT& split_flag)) {
> + /* case 2 */
> + if (map->m_lblk != ee_block) {
> + zero_ex.ee_block = ex->ee_block;
> + zero_ex.ee_len = cpu_to_le16(map->m_lblk -
> + ee_block);
> + ext4_ext_store_pblock(&zero_ex,
> + ext4_ext_pblock(ex));
> + err = ext4_ext_zeroout(inode,&zero_ex);
> + if (err)
> + goto out;
> + }
> +
> + allocated = map->m_lblk - ee_block + map->m_len;
> +
> + split_map.m_lblk = ee_block;
> + split_map.m_len = allocated;
> }
> }
> - /* Mark modified extent as dirty */
> - err = ext4_ext_dirty(handle, inode, path + depth);
> - goto out;
> -insert:
> - err = ext4_ext_insert_extent(handle, inode, path,&newex, 0);
> - if (err == -ENOSPC&& may_zeroout) {
> - err = ext4_ext_zeroout(inode,&orig_ex);
> - if (err)
> - goto fix_extent_len;
> - /* update the extent length and mark as initialized */
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = orig_ex.ee_len;
> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ext4_ext_dirty(handle, inode, path + depth);
> - /* zero out the first half */
> - return allocated;
> - } else if (err)
> - goto fix_extent_len;
> +
> + allocated = ext4_split_extent(handle, inode, path,
> + &split_map, split_flag, 0);
> + if (allocated< 0)
> + err = allocated;
> +
> out:
> - ext4_ext_show_leaf(inode, path);
> return err ? err : allocated;
> -
> -fix_extent_len:
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = orig_ex.ee_len;
> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ext4_ext_mark_uninitialized(ex);
> - ext4_ext_dirty(handle, inode, path + depth);
> - return err;
> }
>
> /*
> @@ -3096,15 +2902,11 @@ static int ext4_split_unwritten_extents(handle_t *handle,
> struct ext4_ext_path *path,
> int flags)
> {
> - struct ext4_extent *ex, newex, orig_ex;
> - struct ext4_extent *ex1 = NULL;
> - struct ext4_extent *ex2 = NULL;
> - struct ext4_extent *ex3 = NULL;
> - ext4_lblk_t ee_block, eof_block;
> - unsigned int allocated, ee_len, depth;
> - ext4_fsblk_t newblock;
> - int err = 0;
> - int may_zeroout;
> + ext4_lblk_t eof_block;
> + ext4_lblk_t ee_block;
> + struct ext4_extent *ex;
> + unsigned int ee_len;
> + int split_flag = 0, depth;
>
> ext_debug("ext4_split_unwritten_extents: inode %lu, logical"
> "block %llu, max_blocks %u\n", inode->i_ino,
> @@ -3114,155 +2916,20 @@ static int ext4_split_unwritten_extents(handle_t *handle,
> inode->i_sb->s_blocksize_bits;
> if (eof_block< map->m_lblk + map->m_len)
> eof_block = map->m_lblk + map->m_len;
> -
> - depth = ext_depth(inode);
> - ex = path[depth].p_ext;
> - ee_block = le32_to_cpu(ex->ee_block);
> - ee_len = ext4_ext_get_actual_len(ex);
> - allocated = ee_len - (map->m_lblk - ee_block);
> - newblock = map->m_lblk - ee_block + ext4_ext_pblock(ex);
> -
> - ex2 = ex;
> - orig_ex.ee_block = ex->ee_block;
> - orig_ex.ee_len = cpu_to_le16(ee_len);
> - ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
> -
> /*
> * It is safe to convert extent to initialized via explicit
> * zeroout only if extent is fully insde i_size or new_size.
> */
> - may_zeroout = ee_block + ee_len<= eof_block;
> -
> - /*
> - * If the uninitialized extent begins at the same logical
> - * block where the write begins, and the write completely
> - * covers the extent, then we don't need to split it.
> - */
> - if ((map->m_lblk == ee_block)&& (allocated<= map->m_len))
> - return allocated;
> -
> - err = ext4_ext_get_access(handle, inode, path + depth);
> - if (err)
> - goto out;
> - /* ex1: ee_block to map->m_lblk - 1 : uninitialized */
> - if (map->m_lblk> ee_block) {
> - ex1 = ex;
> - ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
> - ext4_ext_mark_uninitialized(ex1);
> - ex2 =&newex;
> - }
> - /*
> - * for sanity, update the length of the ex2 extent before
> - * we insert ex3, if ex1 is NULL. This is to avoid temporary
> - * overlap of blocks.
> - */
> - if (!ex1&& allocated> map->m_len)
> - ex2->ee_len = cpu_to_le16(map->m_len);
> - /* ex3: to ee_block + ee_len : uninitialised */
> - if (allocated> map->m_len) {
> - unsigned int newdepth;
> - ex3 =&newex;
> - ex3->ee_block = cpu_to_le32(map->m_lblk + map->m_len);
> - ext4_ext_store_pblock(ex3, newblock + map->m_len);
> - ex3->ee_len = cpu_to_le16(allocated - map->m_len);
> - ext4_ext_mark_uninitialized(ex3);
> - err = ext4_ext_insert_extent(handle, inode, path, ex3, flags);
> - if (err == -ENOSPC&& may_zeroout) {
> - err = ext4_ext_zeroout(inode,&orig_ex);
> - if (err)
> - goto fix_extent_len;
> - /* update the extent length and mark as initialized */
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = orig_ex.ee_len;
> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ext4_ext_dirty(handle, inode, path + depth);
> - /* zeroed the full extent */
> - /* blocks available from map->m_lblk */
> - return allocated;
> -
> - } else if (err)
> - goto fix_extent_len;
> - /*
> - * The depth, and hence eh& ex might change
> - * as part of the insert above.
> - */
> - newdepth = ext_depth(inode);
> - /*
> - * update the extent length after successful insert of the
> - * split extent
> - */
> - ee_len -= ext4_ext_get_actual_len(ex3);
> - orig_ex.ee_len = cpu_to_le16(ee_len);
> - may_zeroout = ee_block + ee_len<= eof_block;
> -
> - depth = newdepth;
> - ext4_ext_drop_refs(path);
> - path = ext4_ext_find_extent(inode, map->m_lblk, path);
> - if (IS_ERR(path)) {
> - err = PTR_ERR(path);
> - goto out;
> - }
> - ex = path[depth].p_ext;
> - if (ex2 !=&newex)
> - ex2 = ex;
> -
> - err = ext4_ext_get_access(handle, inode, path + depth);
> - if (err)
> - goto out;
> + depth = ext_depth(inode);
> + ex = path[depth].p_ext;
> + ee_block = le32_to_cpu(ex->ee_block);
> + ee_len = ext4_ext_get_actual_len(ex);
>
> - allocated = map->m_len;
> - }
> - /*
> - * If there was a change of depth as part of the
> - * insertion of ex3 above, we need to update the length
> - * of the ex1 extent again here
> - */
> - if (ex1&& ex1 != ex) {
> - ex1 = ex;
> - ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
> - ext4_ext_mark_uninitialized(ex1);
> - ex2 =&newex;
> - }
> - /*
> - * ex2: map->m_lblk to map->m_lblk + map->m_len-1 : to be written
> - * using direct I/O, uninitialised still.
> - */
> - ex2->ee_block = cpu_to_le32(map->m_lblk);
> - ext4_ext_store_pblock(ex2, newblock);
> - ex2->ee_len = cpu_to_le16(allocated);
> - ext4_ext_mark_uninitialized(ex2);
> - if (ex2 != ex)
> - goto insert;
> - /* Mark modified extent as dirty */
> - err = ext4_ext_dirty(handle, inode, path + depth);
> - ext_debug("out here\n");
> - goto out;
> -insert:
> - err = ext4_ext_insert_extent(handle, inode, path,&newex, flags);
> - if (err == -ENOSPC&& may_zeroout) {
> - err = ext4_ext_zeroout(inode,&orig_ex);
> - if (err)
> - goto fix_extent_len;
> - /* update the extent length and mark as initialized */
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = orig_ex.ee_len;
> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ext4_ext_dirty(handle, inode, path + depth);
> - /* zero out the first half */
> - return allocated;
> - } else if (err)
> - goto fix_extent_len;
> -out:
> - ext4_ext_show_leaf(inode, path);
> - return err ? err : allocated;
> + split_flag |= ee_block + ee_len<= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
> + split_flag |= EXT4_EXT_MARK_UNINIT2;
>
> -fix_extent_len:
> - ex->ee_block = orig_ex.ee_block;
> - ex->ee_len = orig_ex.ee_len;
> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
> - ext4_ext_mark_uninitialized(ex);
> - ext4_ext_dirty(handle, inode, path + depth);
> - return err;
> + flags |= EXT4_GET_BLOCKS_PRE_IO;
> + return ext4_split_extent(handle, inode, path, map, split_flag, flags);
> }
>
> static int ext4_convert_unwritten_extents_endio(handle_t *handle,


2011-04-21 09:10:12

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH RFC 3/3] ext4:Reimplement convert and split_unwritten.

On Thu, Apr 21, 2011 at 4:28 PM, Allison Henderson
<[email protected]> wrote:
> On 4/13/2011 10:40 PM, Yongqiang Yang wrote:
>> convert and split unwritten are reimplemented based on ext4_split_extent()
>> added in last patch.
>>
>> Signed-off-by: Yongqiang Yang<[email protected]>
>> ---
>> ? fs/ext4/extents.c | ?483 ++++++++--------------------------------------------
>> ? 1 files changed, 75 insertions(+), 408 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 1756e0f..ee2dda3 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -2770,17 +2770,13 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct ext4_map_blocks *map,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?struct ext4_ext_path *path)
>> ? {
>> + ? ? struct ext4_map_blocks split_map;
>> + ? ? struct ext4_extent zero_ex;
>> + ? ? struct ext4_extent *ex;
>> ? ? ? ext4_lblk_t ee_block, eof_block;
>> ? ? ? unsigned int allocated, ee_len, depth;
>> ? ? ? int err = 0;
>> + ? ? int split_flag = 0;
>>
>> ? ? ? ext_debug("ext4_ext_convert_to_initialized: inode %lu, logical"
>> ? ? ? ? ? ? ? "block %llu, max_blocks %u\n", inode->i_ino,
>> @@ -2792,280 +2788,90 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>> ? ? ? ? ? ? ? eof_block = map->m_lblk + map->m_len;
>>
>> ? ? ? depth = ext_depth(inode);
>> ? ? ? ex = path[depth].p_ext;
>> ? ? ? ee_block = le32_to_cpu(ex->ee_block);
>> ? ? ? ee_len = ext4_ext_get_actual_len(ex);
>> ? ? ? allocated = ee_len - (map->m_lblk - ee_block);
>>
>> + ? ? WARN_ON(map->m_lblk< ?ee_block);
>> ? ? ? /*
>> ? ? ? ?* It is safe to convert extent to initialized via explicit
>> ? ? ? ?* zeroout only if extent is fully insde i_size or new_size.
>> ? ? ? ?*/
>> + ? ? split_flag |= ee_block + ee_len<= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
>>
>> ? ? ? /* If extent has less than 2*EXT4_EXT_ZERO_LEN zerout directly */
>> + ? ? if (ee_len<= 2*EXT4_EXT_ZERO_LEN&&
>> + ? ? ? ? (EXT4_EXT_MAY_ZEROOUT& ?split_flag)) {
>> + ? ? ? ? ? ? zero_ex.ee_block = ex->ee_block;
>> + ? ? ? ? ? ? zero_ex.ee_len = cpu_to_le16(map->m_lblk - ee_block);
>> + ? ? ? ? ? ? ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));
>> + ? ? ? ? ? ? err = ext4_ext_zeroout(inode,&zero_ex);
Hi,

Sorry, this is a bug. We should zero out two extents [ee_block,
map->m_lblk) and [map->m_lblk + map->m_len, ee_block + ee_len).
Original ext4 code zero out whole extent ex. Could you try to have a
test which zero out whole extent like below.
err = ext4_ext_zeroout(inode,ex);

ext4_ext_zeroout() wait until all zero IOs are finished. Slowness may
come from two ext4_ext_zeroout()s.

BTW: we can optimize ext4_ext_zeroout() by zeroing out pages in page
cache and marking the pages dirty.? What's your opinion?

Thank you,
Yongqiang.

> Hi again,
>
> I am working on trying to get the code to pass the fsx stress test. ?If I add this snippet of code here, it seems to fix most of the tests, but it is very slow. ?So this may not be the best solution yet. ?But I thought if updated you with what I found we might be able to come up with a better solution.
>
> ? ? ? ? ? ? ? ?if((map->m_lblk + map->m_len) < ee_block + ee_len){
> ? ? ? ? ? ? ? ? ? ? ? ?zero_ex.ee_block = map->m_lblk + map->m_len;
> ? ? ? ? ? ? ? ? ? ? ? ?zero_ex.ee_len = cpu_to_le16(ee_len - (map->m_lblk + map->m_len));
> ? ? ? ? ? ? ? ? ? ? ? ?ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex)+(map->m_lblk - ee_block)+ map->m_len);
> ? ? ? ? ? ? ? ? ? ? ? ?err = ext4_ext_zeroout(inode, &zero_ex);
> ? ? ? ? ? ? ? ? ? ? ? ?if (err) {
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?goto out;
> ? ? ? ? ? ? ? ? ? ? ? ?}
> ? ? ? ? ? ? ? ?}
>
> At the moment, the test fails because a portion of the test file does not get zeroed out when it should. ?The punch hole code needs to flush the pages before it punches out any blocks. ?It uses the filemap_write_and_wait_range routine, which eventually calls ext4_ext_convert_to_initialized. ?The bug only appears to show up when flushing out the middle of an unwritten extent. The two halves on either side are supposed to get zeroed out or split off, but it looks like one of them was not getting zeroed out. ?This snippet seems to fix that, but I'm not yet sure why it's so slow.
>
>> ? ? ? ? ? ? ? err = ext4_ext_get_access(handle, inode, path + depth);
>> ? ? ? ? ? ? ? if (err)
>> ? ? ? ? ? ? ? ? ? ? ? goto out;
>> + ? ? ? ? ? ? ext4_ext_mark_initialized(ex);
>> + ? ? ? ? ? ? ext4_ext_try_to_merge(inode, path, ex);
>> + ? ? ? ? ? ? err = ext4_ext_dirty(handle, inode, path + depth);
>> + ? ? ? ? ? ? goto out;
>> ? ? ? }
>> +
>> ? ? ? /*
>> + ? ? ?* four cases:
>> + ? ? ?* 1. split the extent into three extents.
>> + ? ? ?* 2. split the extent into two extents, zeroout the first half.
>> + ? ? ?* 3. split the extent into two extents, zeroout the second half.
>> + ? ? ?* 4. split the extent into two extents with out zeroout.
>> ? ? ? ?*/
>> + ? ? split_map.m_lblk = map->m_lblk;
>> + ? ? split_map.m_len = map->m_len;
>> +
>> + ? ? if (allocated> ?map->m_len) {
>> + ? ? ? ? ? ? if (allocated<= EXT4_EXT_ZERO_LEN&&
>> + ? ? ? ? ? ? ? ? (EXT4_EXT_MAY_ZEROOUT& ?split_flag)) {
>> + ? ? ? ? ? ? ? ? ? ? /* case 3 */
>> + ? ? ? ? ? ? ? ? ? ? zero_ex.ee_block =
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?cpu_to_le32(map->m_lblk + map->m_len);
>> + ? ? ? ? ? ? ? ? ? ? zero_ex.ee_len = cpu_to_le16(allocated - map->m_len);
>> + ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(&zero_ex,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_pblock(ex) + map->m_lblk - ee_block);
>> + ? ? ? ? ? ? ? ? ? ? err = ext4_ext_zeroout(inode,&zero_ex);
>> ? ? ? ? ? ? ? ? ? ? ? if (err)
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto out;
>> + ? ? ? ? ? ? ? ? ? ? split_map.m_lblk = map->m_lblk;
>> + ? ? ? ? ? ? ? ? ? ? split_map.m_len = allocated;
>> + ? ? ? ? ? ? } else if ((map->m_lblk - ee_block + map->m_len<
>> + ? ? ? ? ? ? ? ? ? ? ? ?EXT4_EXT_ZERO_LEN)&&
>> + ? ? ? ? ? ? ? ? ? ? ? ?(EXT4_EXT_MAY_ZEROOUT& ?split_flag)) {
>> + ? ? ? ? ? ? ? ? ? ? /* case 2 */
>> + ? ? ? ? ? ? ? ? ? ? if (map->m_lblk != ee_block) {
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? zero_ex.ee_block = ex->ee_block;
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? zero_ex.ee_len = cpu_to_le16(map->m_lblk -
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ee_block);
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(&zero_ex,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ext4_ext_pblock(ex));
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? err = ext4_ext_zeroout(inode,&zero_ex);
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? if (err)
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto out;
>> + ? ? ? ? ? ? ? ? ? ? }
>> +
>> + ? ? ? ? ? ? ? ? ? ? allocated = map->m_lblk - ee_block + map->m_len;
>> +
>> + ? ? ? ? ? ? ? ? ? ? split_map.m_lblk = ee_block;
>> + ? ? ? ? ? ? ? ? ? ? split_map.m_len = allocated;
>> ? ? ? ? ? ? ? }
>> ? ? ? }
>> +
>> + ? ? allocated = ext4_split_extent(handle, inode, path,
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? &split_map, split_flag, 0);
>> + ? ? if (allocated< ?0)
>> + ? ? ? ? ? ? err = allocated;
>> +
>> ? out:
>> ? ? ? return err ? err : allocated;
>> ? }
>>
>> ? /*
>> @@ -3096,15 +2902,11 @@ static int ext4_split_unwritten_extents(handle_t *handle,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct ext4_ext_path *path,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? int flags)
>> ? {
>> - ? ? struct ext4_extent *ex, newex, orig_ex;
>> - ? ? struct ext4_extent *ex1 = NULL;
>> - ? ? struct ext4_extent *ex2 = NULL;
>> - ? ? struct ext4_extent *ex3 = NULL;
>> - ? ? ext4_lblk_t ee_block, eof_block;
>> - ? ? unsigned int allocated, ee_len, depth;
>> - ? ? ext4_fsblk_t newblock;
>> - ? ? int err = 0;
>> - ? ? int may_zeroout;
>> + ? ? ext4_lblk_t eof_block;
>> + ? ? ext4_lblk_t ee_block;
>> + ? ? struct ext4_extent *ex;
>> + ? ? unsigned int ee_len;
>> + ? ? int split_flag = 0, depth;
>>
>> ? ? ? ext_debug("ext4_split_unwritten_extents: inode %lu, logical"
>> ? ? ? ? ? ? ? "block %llu, max_blocks %u\n", inode->i_ino,
>> @@ -3114,155 +2916,20 @@ static int ext4_split_unwritten_extents(handle_t *handle,
>> ? ? ? ? ? ? ? inode->i_sb->s_blocksize_bits;
>> ? ? ? if (eof_block< ?map->m_lblk + map->m_len)
>> ? ? ? ? ? ? ? eof_block = map->m_lblk + map->m_len;
>> -
>> - ? ? depth = ext_depth(inode);
>> - ? ? ex = path[depth].p_ext;
>> - ? ? ee_block = le32_to_cpu(ex->ee_block);
>> - ? ? ee_len = ext4_ext_get_actual_len(ex);
>> - ? ? allocated = ee_len - (map->m_lblk - ee_block);
>> - ? ? newblock = map->m_lblk - ee_block + ext4_ext_pblock(ex);
>> -
>> - ? ? ex2 = ex;
>> - ? ? orig_ex.ee_block = ex->ee_block;
>> - ? ? orig_ex.ee_len ? = cpu_to_le16(ee_len);
>> - ? ? ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
>> -
>> ? ? ? /*
>> ? ? ? ?* It is safe to convert extent to initialized via explicit
>> ? ? ? ?* zeroout only if extent is fully insde i_size or new_size.
>> ? ? ? ?*/
>> - ? ? may_zeroout = ee_block + ee_len<= eof_block;
>> -
>> - ? ? /*
>> - ? ? ?* If the uninitialized extent begins at the same logical
>> - ? ? ?* block where the write begins, and the write completely
>> - ? ? ?* covers the extent, then we don't need to split it.
>> - ? ? ?*/
>> - ? ? if ((map->m_lblk == ee_block)&& ?(allocated<= map->m_len))
>> - ? ? ? ? ? ? return allocated;
>> -
>> - ? ? err = ext4_ext_get_access(handle, inode, path + depth);
>> - ? ? if (err)
>> - ? ? ? ? ? ? goto out;
>> - ? ? /* ex1: ee_block to map->m_lblk - 1 : uninitialized */
>> - ? ? if (map->m_lblk> ?ee_block) {
>> - ? ? ? ? ? ? ex1 = ex;
>> - ? ? ? ? ? ? ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
>> - ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex1);
>> - ? ? ? ? ? ? ex2 =&newex;
>> - ? ? }
>> - ? ? /*
>> - ? ? ?* for sanity, update the length of the ex2 extent before
>> - ? ? ?* we insert ex3, if ex1 is NULL. This is to avoid temporary
>> - ? ? ?* overlap of blocks.
>> - ? ? ?*/
>> - ? ? if (!ex1&& ?allocated> ?map->m_len)
>> - ? ? ? ? ? ? ex2->ee_len = cpu_to_le16(map->m_len);
>> - ? ? /* ex3: to ee_block + ee_len : uninitialised */
>> - ? ? if (allocated> ?map->m_len) {
>> - ? ? ? ? ? ? unsigned int newdepth;
>> - ? ? ? ? ? ? ex3 =&newex;
>> - ? ? ? ? ? ? ex3->ee_block = cpu_to_le32(map->m_lblk + map->m_len);
>> - ? ? ? ? ? ? ext4_ext_store_pblock(ex3, newblock + map->m_len);
>> - ? ? ? ? ? ? ex3->ee_len = cpu_to_le16(allocated - map->m_len);
>> - ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex3);
>> - ? ? ? ? ? ? err = ext4_ext_insert_extent(handle, inode, path, ex3, flags);
>> - ? ? ? ? ? ? if (err == -ENOSPC&& ?may_zeroout) {
>> - ? ? ? ? ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode,&orig_ex);
>> - ? ? ? ? ? ? ? ? ? ? if (err)
>> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
>> - ? ? ? ? ? ? ? ? ? ? /* update the extent length and mark as initialized */
>> - ? ? ? ? ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
>> - ? ? ? ? ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
>> - ? ? ? ? ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
>> - ? ? ? ? ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
>> - ? ? ? ? ? ? ? ? ? ? /* zeroed the full extent */
>> - ? ? ? ? ? ? ? ? ? ? /* blocks available from map->m_lblk */
>> - ? ? ? ? ? ? ? ? ? ? return allocated;
>> -
>> - ? ? ? ? ? ? } else if (err)
>> - ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
>> - ? ? ? ? ? ? /*
>> - ? ? ? ? ? ? ?* The depth, and hence eh& ?ex might change
>> - ? ? ? ? ? ? ?* as part of the insert above.
>> - ? ? ? ? ? ? ?*/
>> - ? ? ? ? ? ? newdepth = ext_depth(inode);
>> - ? ? ? ? ? ? /*
>> - ? ? ? ? ? ? ?* update the extent length after successful insert of the
>> - ? ? ? ? ? ? ?* split extent
>> - ? ? ? ? ? ? ?*/
>> - ? ? ? ? ? ? ee_len -= ext4_ext_get_actual_len(ex3);
>> - ? ? ? ? ? ? orig_ex.ee_len = cpu_to_le16(ee_len);
>> - ? ? ? ? ? ? may_zeroout = ee_block + ee_len<= eof_block;
>> -
>> - ? ? ? ? ? ? depth = newdepth;
>> - ? ? ? ? ? ? ext4_ext_drop_refs(path);
>> - ? ? ? ? ? ? path = ext4_ext_find_extent(inode, map->m_lblk, path);
>> - ? ? ? ? ? ? if (IS_ERR(path)) {
>> - ? ? ? ? ? ? ? ? ? ? err = PTR_ERR(path);
>> - ? ? ? ? ? ? ? ? ? ? goto out;
>> - ? ? ? ? ? ? }
>> - ? ? ? ? ? ? ex = path[depth].p_ext;
>> - ? ? ? ? ? ? if (ex2 !=&newex)
>> - ? ? ? ? ? ? ? ? ? ? ex2 = ex;
>> -
>> - ? ? ? ? ? ? err = ext4_ext_get_access(handle, inode, path + depth);
>> - ? ? ? ? ? ? if (err)
>> - ? ? ? ? ? ? ? ? ? ? goto out;
>> + ? ? depth = ext_depth(inode);
>> + ? ? ex = path[depth].p_ext;
>> + ? ? ee_block = le32_to_cpu(ex->ee_block);
>> + ? ? ee_len = ext4_ext_get_actual_len(ex);
>>
>> - ? ? ? ? ? ? allocated = map->m_len;
>> - ? ? }
>> - ? ? /*
>> - ? ? ?* If there was a change of depth as part of the
>> - ? ? ?* insertion of ex3 above, we need to update the length
>> - ? ? ?* of the ex1 extent again here
>> - ? ? ?*/
>> - ? ? if (ex1&& ?ex1 != ex) {
>> - ? ? ? ? ? ? ex1 = ex;
>> - ? ? ? ? ? ? ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
>> - ? ? ? ? ? ? ext4_ext_mark_uninitialized(ex1);
>> - ? ? ? ? ? ? ex2 =&newex;
>> - ? ? }
>> - ? ? /*
>> - ? ? ?* ex2: map->m_lblk to map->m_lblk + map->m_len-1 : to be written
>> - ? ? ?* using direct I/O, uninitialised still.
>> - ? ? ?*/
>> - ? ? ex2->ee_block = cpu_to_le32(map->m_lblk);
>> - ? ? ext4_ext_store_pblock(ex2, newblock);
>> - ? ? ex2->ee_len = cpu_to_le16(allocated);
>> - ? ? ext4_ext_mark_uninitialized(ex2);
>> - ? ? if (ex2 != ex)
>> - ? ? ? ? ? ? goto insert;
>> - ? ? /* Mark modified extent as dirty */
>> - ? ? err = ext4_ext_dirty(handle, inode, path + depth);
>> - ? ? ext_debug("out here\n");
>> - ? ? goto out;
>> -insert:
>> - ? ? err = ext4_ext_insert_extent(handle, inode, path,&newex, flags);
>> - ? ? if (err == -ENOSPC&& ?may_zeroout) {
>> - ? ? ? ? ? ? err = ?ext4_ext_zeroout(inode,&orig_ex);
>> - ? ? ? ? ? ? if (err)
>> - ? ? ? ? ? ? ? ? ? ? goto fix_extent_len;
>> - ? ? ? ? ? ? /* update the extent length and mark as initialized */
>> - ? ? ? ? ? ? ex->ee_block = orig_ex.ee_block;
>> - ? ? ? ? ? ? ex->ee_len ? = orig_ex.ee_len;
>> - ? ? ? ? ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
>> - ? ? ? ? ? ? ext4_ext_dirty(handle, inode, path + depth);
>> - ? ? ? ? ? ? /* zero out the first half */
>> - ? ? ? ? ? ? return allocated;
>> - ? ? } else if (err)
>> - ? ? ? ? ? ? goto fix_extent_len;
>> -out:
>> - ? ? ext4_ext_show_leaf(inode, path);
>> - ? ? return err ? err : allocated;
>> + ? ? split_flag |= ee_block + ee_len<= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
>> + ? ? split_flag |= EXT4_EXT_MARK_UNINIT2;
>>
>> -fix_extent_len:
>> - ? ? ex->ee_block = orig_ex.ee_block;
>> - ? ? ex->ee_len ? = orig_ex.ee_len;
>> - ? ? ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
>> - ? ? ext4_ext_mark_uninitialized(ex);
>> - ? ? ext4_ext_dirty(handle, inode, path + depth);
>> - ? ? return err;
>> + ? ? flags |= EXT4_GET_BLOCKS_PRE_IO;
>> + ? ? return ext4_split_extent(handle, inode, path, map, split_flag, flags);
>> ? }
>>
>> ? static int ext4_convert_unwritten_extents_endio(handle_t *handle,
>
>



--
Best Wishes
Yongqiang Yang

2011-04-21 21:13:46

by Allison Henderson

[permalink] [raw]
Subject: Re: [PATCH RFC 3/3] ext4:Reimplement convert and split_unwritten.

On 4/21/2011 2:10 AM, Yongqiang Yang wrote:
> On Thu, Apr 21, 2011 at 4:28 PM, Allison Henderson
> <[email protected]> wrote:
>> On 4/13/2011 10:40 PM, Yongqiang Yang wrote:
>>> convert and split unwritten are reimplemented based on ext4_split_extent()
>>> added in last patch.
>>>
>>> Signed-off-by: Yongqiang Yang<[email protected]>
>>> ---
>>> fs/ext4/extents.c | 483 ++++++++--------------------------------------------
>>> 1 files changed, 75 insertions(+), 408 deletions(-)
>>>
>>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>>> index 1756e0f..ee2dda3 100644
>>> --- a/fs/ext4/extents.c
>>> +++ b/fs/ext4/extents.c
>>> @@ -2770,17 +2770,13 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>>> struct ext4_map_blocks *map,
>>> struct ext4_ext_path *path)
>>> {
>>> + struct ext4_map_blocks split_map;
>>> + struct ext4_extent zero_ex;
>>> + struct ext4_extent *ex;
>>> ext4_lblk_t ee_block, eof_block;
>>> unsigned int allocated, ee_len, depth;
>>> int err = 0;
>>> + int split_flag = 0;
>>>
>>> ext_debug("ext4_ext_convert_to_initialized: inode %lu, logical"
>>> "block %llu, max_blocks %u\n", inode->i_ino,
>>> @@ -2792,280 +2788,90 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
>>> eof_block = map->m_lblk + map->m_len;
>>>
>>> depth = ext_depth(inode);
>>> ex = path[depth].p_ext;
>>> ee_block = le32_to_cpu(ex->ee_block);
>>> ee_len = ext4_ext_get_actual_len(ex);
>>> allocated = ee_len - (map->m_lblk - ee_block);
>>>
>>> + WARN_ON(map->m_lblk< ee_block);
>>> /*
>>> * It is safe to convert extent to initialized via explicit
>>> * zeroout only if extent is fully insde i_size or new_size.
>>> */
>>> + split_flag |= ee_block + ee_len<= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
>>>
>>> /* If extent has less than 2*EXT4_EXT_ZERO_LEN zerout directly */
>>> + if (ee_len<= 2*EXT4_EXT_ZERO_LEN&&
>>> + (EXT4_EXT_MAY_ZEROOUT& split_flag)) {
>>> + zero_ex.ee_block = ex->ee_block;
>>> + zero_ex.ee_len = cpu_to_le16(map->m_lblk - ee_block);
>>> + ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));
>>> + err = ext4_ext_zeroout(inode,&zero_ex);
> Hi,
>
> Sorry, this is a bug. We should zero out two extents [ee_block,
> map->m_lblk) and [map->m_lblk + map->m_len, ee_block + ee_len).
> Original ext4 code zero out whole extent ex. Could you try to have a
> test which zero out whole extent like below.
> err = ext4_ext_zeroout(inode,ex);
>
> ext4_ext_zeroout() wait until all zero IOs are finished. Slowness may
> come from two ext4_ext_zeroout()s.
>
> BTW: we can optimize ext4_ext_zeroout() by zeroing out pages in page
> cache and marking the pages dirty. What's your opinion?
>
> Thank you,
> Yongqiang.
>
>> Hi again,
>>
>> I am working on trying to get the code to pass the fsx stress test. If I add this snippet of code here, it seems to fix most of the tests, but it is very slow. So this may not be the best solution yet. But I thought if updated you with what I found we might be able to come up with a better solution.
>>
>> if((map->m_lblk + map->m_len)< ee_block + ee_len){
>> zero_ex.ee_block = map->m_lblk + map->m_len;
>> zero_ex.ee_len = cpu_to_le16(ee_len - (map->m_lblk + map->m_len));
>> ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex)+(map->m_lblk - ee_block)+ map->m_len);
>> err = ext4_ext_zeroout(inode,&zero_ex);
>> if (err) {
>> goto out;
>> }
>> }
>>
>> At the moment, the test fails because a portion of the test file does not get zeroed out when it should. The punch hole code needs to flush the pages before it punches out any blocks. It uses the filemap_write_and_wait_range routine, which eventually calls ext4_ext_convert_to_initialized. The bug only appears to show up when flushing out the middle of an unwritten extent. The two halves on either side are supposed to get zeroed out or split off, but it looks like one of them was not getting zeroed out. This snippet seems to fix that, but I'm not yet sure why it's so slow.
>>
>>> err = ext4_ext_get_access(handle, inode, path + depth);
>>> if (err)
>>> goto out;
>>> + ext4_ext_mark_initialized(ex);
>>> + ext4_ext_try_to_merge(inode, path, ex);
>>> + err = ext4_ext_dirty(handle, inode, path + depth);
>>> + goto out;
>>> }
>>> +
>>> /*
>>> + * four cases:
>>> + * 1. split the extent into three extents.
>>> + * 2. split the extent into two extents, zeroout the first half.
>>> + * 3. split the extent into two extents, zeroout the second half.
>>> + * 4. split the extent into two extents with out zeroout.
>>> */
>>> + split_map.m_lblk = map->m_lblk;
>>> + split_map.m_len = map->m_len;
>>> +
>>> + if (allocated> map->m_len) {
>>> + if (allocated<= EXT4_EXT_ZERO_LEN&&
>>> + (EXT4_EXT_MAY_ZEROOUT& split_flag)) {
>>> + /* case 3 */
>>> + zero_ex.ee_block =
>>> + cpu_to_le32(map->m_lblk + map->m_len);
>>> + zero_ex.ee_len = cpu_to_le16(allocated - map->m_len);
>>> + ext4_ext_store_pblock(&zero_ex,
>>> + ext4_ext_pblock(ex) + map->m_lblk - ee_block);
>>> + err = ext4_ext_zeroout(inode,&zero_ex);
>>> if (err)
>>> goto out;
>>> + split_map.m_lblk = map->m_lblk;
>>> + split_map.m_len = allocated;
>>> + } else if ((map->m_lblk - ee_block + map->m_len<
>>> + EXT4_EXT_ZERO_LEN)&&
>>> + (EXT4_EXT_MAY_ZEROOUT& split_flag)) {
>>> + /* case 2 */
>>> + if (map->m_lblk != ee_block) {
>>> + zero_ex.ee_block = ex->ee_block;
>>> + zero_ex.ee_len = cpu_to_le16(map->m_lblk -
>>> + ee_block);
>>> + ext4_ext_store_pblock(&zero_ex,
>>> + ext4_ext_pblock(ex));
>>> + err = ext4_ext_zeroout(inode,&zero_ex);
>>> + if (err)
>>> + goto out;
>>> + }
>>> +
>>> + allocated = map->m_lblk - ee_block + map->m_len;
>>> +
>>> + split_map.m_lblk = ee_block;
>>> + split_map.m_len = allocated;
>>> }
>>> }
>>> +
>>> + allocated = ext4_split_extent(handle, inode, path,
>>> +&split_map, split_flag, 0);
>>> + if (allocated< 0)
>>> + err = allocated;
>>> +
>>> out:
>>> return err ? err : allocated;
>>> }
>>>
>>> /*
>>> @@ -3096,15 +2902,11 @@ static int ext4_split_unwritten_extents(handle_t *handle,
>>> struct ext4_ext_path *path,
>>> int flags)
>>> {
>>> - struct ext4_extent *ex, newex, orig_ex;
>>> - struct ext4_extent *ex1 = NULL;
>>> - struct ext4_extent *ex2 = NULL;
>>> - struct ext4_extent *ex3 = NULL;
>>> - ext4_lblk_t ee_block, eof_block;
>>> - unsigned int allocated, ee_len, depth;
>>> - ext4_fsblk_t newblock;
>>> - int err = 0;
>>> - int may_zeroout;
>>> + ext4_lblk_t eof_block;
>>> + ext4_lblk_t ee_block;
>>> + struct ext4_extent *ex;
>>> + unsigned int ee_len;
>>> + int split_flag = 0, depth;
>>>
>>> ext_debug("ext4_split_unwritten_extents: inode %lu, logical"
>>> "block %llu, max_blocks %u\n", inode->i_ino,
>>> @@ -3114,155 +2916,20 @@ static int ext4_split_unwritten_extents(handle_t *handle,
>>> inode->i_sb->s_blocksize_bits;
>>> if (eof_block< map->m_lblk + map->m_len)
>>> eof_block = map->m_lblk + map->m_len;
>>> -
>>> - depth = ext_depth(inode);
>>> - ex = path[depth].p_ext;
>>> - ee_block = le32_to_cpu(ex->ee_block);
>>> - ee_len = ext4_ext_get_actual_len(ex);
>>> - allocated = ee_len - (map->m_lblk - ee_block);
>>> - newblock = map->m_lblk - ee_block + ext4_ext_pblock(ex);
>>> -
>>> - ex2 = ex;
>>> - orig_ex.ee_block = ex->ee_block;
>>> - orig_ex.ee_len = cpu_to_le16(ee_len);
>>> - ext4_ext_store_pblock(&orig_ex, ext4_ext_pblock(ex));
>>> -
>>> /*
>>> * It is safe to convert extent to initialized via explicit
>>> * zeroout only if extent is fully insde i_size or new_size.
>>> */
>>> - may_zeroout = ee_block + ee_len<= eof_block;
>>> -
>>> - /*
>>> - * If the uninitialized extent begins at the same logical
>>> - * block where the write begins, and the write completely
>>> - * covers the extent, then we don't need to split it.
>>> - */
>>> - if ((map->m_lblk == ee_block)&& (allocated<= map->m_len))
>>> - return allocated;
>>> -
>>> - err = ext4_ext_get_access(handle, inode, path + depth);
>>> - if (err)
>>> - goto out;
>>> - /* ex1: ee_block to map->m_lblk - 1 : uninitialized */
>>> - if (map->m_lblk> ee_block) {
>>> - ex1 = ex;
>>> - ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
>>> - ext4_ext_mark_uninitialized(ex1);
>>> - ex2 =&newex;
>>> - }
>>> - /*
>>> - * for sanity, update the length of the ex2 extent before
>>> - * we insert ex3, if ex1 is NULL. This is to avoid temporary
>>> - * overlap of blocks.
>>> - */
>>> - if (!ex1&& allocated> map->m_len)
>>> - ex2->ee_len = cpu_to_le16(map->m_len);
>>> - /* ex3: to ee_block + ee_len : uninitialised */
>>> - if (allocated> map->m_len) {
>>> - unsigned int newdepth;
>>> - ex3 =&newex;
>>> - ex3->ee_block = cpu_to_le32(map->m_lblk + map->m_len);
>>> - ext4_ext_store_pblock(ex3, newblock + map->m_len);
>>> - ex3->ee_len = cpu_to_le16(allocated - map->m_len);
>>> - ext4_ext_mark_uninitialized(ex3);
>>> - err = ext4_ext_insert_extent(handle, inode, path, ex3, flags);
>>> - if (err == -ENOSPC&& may_zeroout) {
>>> - err = ext4_ext_zeroout(inode,&orig_ex);
>>> - if (err)
>>> - goto fix_extent_len;
>>> - /* update the extent length and mark as initialized */
>>> - ex->ee_block = orig_ex.ee_block;
>>> - ex->ee_len = orig_ex.ee_len;
>>> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
>>> - ext4_ext_dirty(handle, inode, path + depth);
>>> - /* zeroed the full extent */
>>> - /* blocks available from map->m_lblk */
>>> - return allocated;
>>> -
>>> - } else if (err)
>>> - goto fix_extent_len;
>>> - /*
>>> - * The depth, and hence eh& ex might change
>>> - * as part of the insert above.
>>> - */
>>> - newdepth = ext_depth(inode);
>>> - /*
>>> - * update the extent length after successful insert of the
>>> - * split extent
>>> - */
>>> - ee_len -= ext4_ext_get_actual_len(ex3);
>>> - orig_ex.ee_len = cpu_to_le16(ee_len);
>>> - may_zeroout = ee_block + ee_len<= eof_block;
>>> -
>>> - depth = newdepth;
>>> - ext4_ext_drop_refs(path);
>>> - path = ext4_ext_find_extent(inode, map->m_lblk, path);
>>> - if (IS_ERR(path)) {
>>> - err = PTR_ERR(path);
>>> - goto out;
>>> - }
>>> - ex = path[depth].p_ext;
>>> - if (ex2 !=&newex)
>>> - ex2 = ex;
>>> -
>>> - err = ext4_ext_get_access(handle, inode, path + depth);
>>> - if (err)
>>> - goto out;
>>> + depth = ext_depth(inode);
>>> + ex = path[depth].p_ext;
>>> + ee_block = le32_to_cpu(ex->ee_block);
>>> + ee_len = ext4_ext_get_actual_len(ex);
>>>
>>> - allocated = map->m_len;
>>> - }
>>> - /*
>>> - * If there was a change of depth as part of the
>>> - * insertion of ex3 above, we need to update the length
>>> - * of the ex1 extent again here
>>> - */
>>> - if (ex1&& ex1 != ex) {
>>> - ex1 = ex;
>>> - ex1->ee_len = cpu_to_le16(map->m_lblk - ee_block);
>>> - ext4_ext_mark_uninitialized(ex1);
>>> - ex2 =&newex;
>>> - }
>>> - /*
>>> - * ex2: map->m_lblk to map->m_lblk + map->m_len-1 : to be written
>>> - * using direct I/O, uninitialised still.
>>> - */
>>> - ex2->ee_block = cpu_to_le32(map->m_lblk);
>>> - ext4_ext_store_pblock(ex2, newblock);
>>> - ex2->ee_len = cpu_to_le16(allocated);
>>> - ext4_ext_mark_uninitialized(ex2);
>>> - if (ex2 != ex)
>>> - goto insert;
>>> - /* Mark modified extent as dirty */
>>> - err = ext4_ext_dirty(handle, inode, path + depth);
>>> - ext_debug("out here\n");
>>> - goto out;
>>> -insert:
>>> - err = ext4_ext_insert_extent(handle, inode, path,&newex, flags);
>>> - if (err == -ENOSPC&& may_zeroout) {
>>> - err = ext4_ext_zeroout(inode,&orig_ex);
>>> - if (err)
>>> - goto fix_extent_len;
>>> - /* update the extent length and mark as initialized */
>>> - ex->ee_block = orig_ex.ee_block;
>>> - ex->ee_len = orig_ex.ee_len;
>>> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
>>> - ext4_ext_dirty(handle, inode, path + depth);
>>> - /* zero out the first half */
>>> - return allocated;
>>> - } else if (err)
>>> - goto fix_extent_len;
>>> -out:
>>> - ext4_ext_show_leaf(inode, path);
>>> - return err ? err : allocated;
>>> + split_flag |= ee_block + ee_len<= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
>>> + split_flag |= EXT4_EXT_MARK_UNINIT2;
>>>
>>> -fix_extent_len:
>>> - ex->ee_block = orig_ex.ee_block;
>>> - ex->ee_len = orig_ex.ee_len;
>>> - ext4_ext_store_pblock(ex, ext4_ext_pblock(&orig_ex));
>>> - ext4_ext_mark_uninitialized(ex);
>>> - ext4_ext_dirty(handle, inode, path + depth);
>>> - return err;
>>> + flags |= EXT4_GET_BLOCKS_PRE_IO;
>>> + return ext4_split_extent(handle, inode, path, map, split_flag, flags);
>>> }
>>>
>>> static int ext4_convert_unwritten_extents_endio(handle_t *handle,
>>
>>
>
>
>

Hi there,

That seems to have done the trick. Here is a patch that you can apply to your last patch:

Signed-off-by: Allison Henderson <[email protected]>
---
:100644 100644 ee2dda3... 00e9e4f... M fs/ext4/extents.c
fs/ext4/extents.c | 5 +----
1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ee2dda3..00e9e4f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2803,10 +2803,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
/* If extent has less than 2*EXT4_EXT_ZERO_LEN zerout directly */
if (ee_len <= 2*EXT4_EXT_ZERO_LEN &&
(EXT4_EXT_MAY_ZEROOUT & split_flag)) {
- zero_ex.ee_block = ex->ee_block;
- zero_ex.ee_len = cpu_to_le16(map->m_lblk - ee_block);
- ext4_ext_store_pblock(&zero_ex, ext4_ext_pblock(ex));
- err = ext4_ext_zeroout(inode, &zero_ex);
+ err = ext4_ext_zeroout(inode, ex);
if (err)
goto out;

--
1.7.1


As for the zero out code, I am not as familiar with that code, but I did take a look at it, and it does look like zeroing out pages would be a lot faster than zeroing out blocks. I seem to have a good set up here for debugging, so if you have some more optimizations you would like to put in, I would be happy to run them through here. For now, I will hold on to my punch hole code until I get an updated copy of your patch set. Once we have all that done, I will submit another copy of punch hole that applies on top of your patch set. Thx!

Allison Henderson