2012-03-10 04:40:53

by Allison Henderson

[permalink] [raw]
Subject: [PATCH 0/6] Rename delayed extents to status extents

This set is based on Yongqiang's
"[PATCH V3 0/6] ext4: add delayed extent tree for ext4".
This patch set renames the changes introduced by the delayed
extent set to follow a "status extent" naming scheme. This
set does not yet introduce any functional changes other than renaming.
The purpose of the rename it to prepare the set for supporting
allocated extents as well as delayed extents. Then once we have
it tracking allocated extents, we can implement extent locks.
Merging the solutions will eliminate the need to maintain yet
another tree, sense both solutions are an rbtree of extents.

Sense delayed extents have not yet been merged, the plan is
to merge this set with the existing delayed extent set, and
the new "status extent" set will be the foundation for extent
locks.

Yongqiang, if this all looks good to you, I will merge
our sets and include it as part of the extent lock set,
once I get the rest of it finished up. Thx!

Allison Henderson (6):
Rename de patch1 from de to se scheme
Rename de patch2 from de to se scheme
Rename de patch3 from de to se scheme
Rename de patch4 from de to se scheme
Rename de patch5 from de to se scheme
Rename de patch6 from de to se scheme

fs/ext4/Makefile | 2 +-
fs/ext4/delayed_extents.c | 415 --------------------------------------------
fs/ext4/delayed_extents.h | 40 -----
fs/ext4/ext4.h | 4 +-
fs/ext4/extents.c | 24 ++--
fs/ext4/indirect.c | 2 +-
fs/ext4/inode.c | 4 +-
fs/ext4/status_extents.c | 417 +++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/status_extents.h | 40 +++++
fs/ext4/super.c | 8 +-
10 files changed, 479 insertions(+), 477 deletions(-)
delete mode 100644 fs/ext4/delayed_extents.c
delete mode 100644 fs/ext4/delayed_extents.h
create mode 100644 fs/ext4/status_extents.c
create mode 100644 fs/ext4/status_extents.h



2012-03-10 04:40:54

by Allison Henderson

[permalink] [raw]
Subject: [PATCH 3/6] Rename de patch3 from de to se scheme

This patch renames changes introduced by patch
"[PATCH_V3_3_6]_ext4_initialize_delayed_extent_tree "

All code referencing "delayed extents" or "de" are changed to
"status extents" or "se"

Signed-off-by: Allison Henderson <[email protected]>
---
:100644 100644 e76253b... e6e2415... M fs/ext4/super.c
fs/ext4/super.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e76253b..e6e2415 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -899,7 +899,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
INIT_LIST_HEAD(&ei->i_prealloc_list);
spin_lock_init(&ei->i_prealloc_lock);
- ext4_de_init_tree(&ei->i_de_tree);
+ ext4_se_init_tree(&ei->i_se_tree);
ei->i_reserved_data_blocks = 0;
ei->i_reserved_meta_blocks = 0;
ei->i_allocated_meta_blocks = 0;
--
1.7.1


2012-03-10 04:40:55

by Allison Henderson

[permalink] [raw]
Subject: [PATCH 4/6] Rename de patch4 from de to se scheme

This patch renames changes introduced by patch
"[PATCH_V3_4_6]_ext4_let_ext4_maintian_delayed_extent_trees"

All code referencing "delayed extents" or "de" are changed to
"status extents" or "se"

Signed-off-by: Allison Henderson <[email protected]>
---
:100644 100644 de902cd... a8b382e... M fs/ext4/extents.c
:100644 100644 f5f61b28.. 47d78b8... M fs/ext4/indirect.c
:100644 100644 69f9249... af32ea6... M fs/ext4/inode.c
:100644 100644 e6e2415... d1a438b... M fs/ext4/super.c
fs/ext4/extents.c | 2 +-
fs/ext4/indirect.c | 2 +-
fs/ext4/inode.c | 4 ++--
fs/ext4/super.c | 6 +++---
4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index de902cd..a8b382e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4142,7 +4142,7 @@ void ext4_ext_truncate(struct inode *inode)

last_block = (inode->i_size + sb->s_blocksize - 1)
>> EXT4_BLOCK_SIZE_BITS(sb);
- err = ext4_de_remove_space(inode, last_block,
+ err = ext4_se_remove_space(inode, last_block,
EXT_MAX_BLOCKS - last_block);
err = ext4_ext_remove_space(inode, last_block);

diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index f5f61b2..47d78b8 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1400,7 +1400,7 @@ void ext4_ind_truncate(struct inode *inode)
down_write(&ei->i_data_sem);

ext4_discard_preallocations(inode);
- ext4_de_remove_space(inode, last_block,
+ ext4_se_remove_space(inode, last_block,
EXT_MAX_BLOCKS - last_block);

/*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 69f9249..af32ea6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -532,7 +532,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
int ret;
delayed_mapped:
/* delayed allocation blocks has been allocated */
- ret = ext4_de_remove_space(inode, map->m_lblk,
+ ret = ext4_se_remove_space(inode, map->m_lblk,
map->m_len);
if (ret < 0)
retval = ret;
@@ -1683,7 +1683,7 @@ out_unlock:

if (retval ==0) {
down_write((&EXT4_I(inode)->i_data_sem));
- ret = ext4_de_add_space(inode, map->m_lblk, map->m_len);
+ ret = ext4_se_add_space(inode, map->m_lblk, map->m_len);
up_write((&EXT4_I(inode)->i_data_sem));
if (ret)
return ret;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e6e2415..d1a438b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -984,7 +984,7 @@ void ext4_clear_inode(struct inode *inode)
end_writeback(inode);
dquot_drop(inode);
ext4_discard_preallocations(inode);
- ext4_de_remove_space(inode, 0, EXT_MAX_BLOCKS);
+ ext4_se_remove_space(inode, 0, EXT_MAX_BLOCKS);
if (EXT4_I(inode)->jinode) {
jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
EXT4_I(inode)->jinode);
@@ -5068,7 +5068,7 @@ static int __init ext4_init_fs(void)
init_waitqueue_head(&ext4__ioend_wq[i]);
}

- err = ext4_init_de();
+ err = ext4_init_se();
if (err)
return err;

@@ -5128,7 +5128,7 @@ out6:
out7:
ext4_exit_pageio();
out8:
- ext4_exit_de();
+ ext4_exit_se();

return err;
}
--
1.7.1


2012-03-10 04:40:53

by Allison Henderson

[permalink] [raw]
Subject: [PATCH 2/6] Rename de patch2 from de to se scheme

This patch renames changes introduced by patch
"[PATCH_V3_2_6]_ext4_add_operations_on_delayed_extent_tree"

All code referencing "delayed extents" or "de" are changed to
"status extents" or "se"

Signed-off-by: Allison Henderson <[email protected]>
---
:100644 100644 ee16ad3... 55482c1... M fs/ext4/Makefile
:100644 000000 899fc74... 0000000... D fs/ext4/delayed_extents.c
:000000 100644 0000000... 50079af... A fs/ext4/status_extents.c
:100644 100644 cbf96ed... 415befd... M fs/ext4/status_extents.h
fs/ext4/Makefile | 2 +-
fs/ext4/delayed_extents.c | 415 --------------------------------------------
fs/ext4/status_extents.c | 417 +++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/status_extents.h | 18 ++
4 files changed, 436 insertions(+), 416 deletions(-)

diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index ee16ad3..55482c1 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_EXT4_FS) += ext4.o
ext4-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o page-io.o \
ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \
ext4_jbd2.o migrate.o mballoc.o block_validity.o move_extent.o \
- mmp.o indirect.o delayed_extents.o
+ mmp.o indirect.o status_extents.o

ext4-$(CONFIG_EXT4_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o
ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
diff --git a/fs/ext4/delayed_extents.c b/fs/ext4/delayed_extents.c
deleted file mode 100644
index 899fc74..0000000
--- a/fs/ext4/delayed_extents.c
+++ /dev/null
@@ -1,415 +0,0 @@
-/*
- * fs/ext4/delayed_extents.c
- *
- * Written by Yongqiang Yang <[email protected]>
- *
- * Ext4 delayed extents core functions.
- */
-#include <linux/rbtree.h>
-#include "ext4.h"
-#include "delayed_extents.h"
-#include "ext4_extents.h"
-
-/*
- * delayed extent implementation for ext4.
- *
- *
- * ==========================================================================
- * 1. Why delayed extent implementation ?
- *
- * Without delayed extent, ext4 identifies a delayed extent by looking up
- * page cache, this has several deficiencies - complicated, buggy, and
- * inefficient code.
- *
- * FIEMAP, SEEK_HOLE/DATA, bigalloc, punch hole and writeout all need to know if
- * a block or a range of blocks are belonged to a delayed extent.
- *
- * Let us have a look at how they do without delayed extents implementation.
- * -- FIEMAP
- * FIEMAP looks up page cache to identify delayed allocations from holes.
- *
- * -- SEEK_HOLE/DATA
- * SEEK_HOLE/DATA has the same problem as FIEMAP.
- *
- * -- bigalloc
- * bigalloc looks up page cache to figure out if a block is already
- * under delayed allocation or not to determine whether quota reserving
- * is needed for the cluster.
- *
- * -- punch hole
- * punch hole looks up page cache to identify a delayed extent.
- *
- * -- writeout
- * Writeout looks up whole page cache to see if a buffer is mapped, If
- * there are not very many delayed buffers, then it is time comsuming.
- *
- * With delayed extents implementation, FIEMAP, SEEK_HOLE/DATA, bigalloc and
- * writeout can figure out if a block or a range of blocks is under delayed
- * allocation(belonged to a delayed extent) or not by searching the delayed
- * extent tree.
- *
- *
- * ==========================================================================
- * 2. ext4 delayed extents impelmentation
- *
- * -- delayed extent
- * A delayed extent is a range of blocks which are contiguous logically and
- * under delayed allocation. Unlike extent in ext4, delayed extent in ext4
- * is a in-memory struct, there is no corresponding on-disk data. There is
- * no limit on length of delayed extent, so a delayed extent can contain as
- * many blocks as they are contiguous logically.
- *
- * -- delayed extent tree
- * Every inode has a delayed extent tree and all under delayed allocation
- * blocks are added to the tree as dealyed extents. Delayed extents in
- * the tree are ordered by logical block no.
- *
- * -- operations on a delayed extent tree
- * There are three operations on a delayed extent tree: find next delayed
- * extent, adding a space(a range of blocks) and removing a space.
- *
- * -- race on a delayed extent tree
- * Delayed extent tree is protected inode->i_data_sem like extent tree.
- *
- *
- * ==========================================================================
- * 3. performance analysis
- * -- overhead
- * 1. Apart from operations on a delayed extent tree, we need to
- * down_write(inode->i_data_sem) in delayed write path to maintain delayed
- * extent tree, this can have impact on parallel read-write and write-write
- *
- * 2. There is a cache extent for write access, so if writes are not very
- * random, adding space operaions are in O(1) time.
- *
- * -- gain
- * 3. Code is much simpler, more readable, more maintainable and
- * more efficient.
- */
-
-static struct kmem_cache *ext4_de_cachep;
-
-int __init ext4_init_de(void)
-{
- ext4_de_cachep = KMEM_CACHE(delayed_extent, SLAB_RECLAIM_ACCOUNT);
- if (ext4_de_cachep == NULL)
- return -ENOMEM;
- return 0;
-}
-
-void ext4_exit_de(void)
-{
- if (ext4_de_cachep)
- kmem_cache_destroy(ext4_de_cachep);
-}
-
-void ext4_de_init_tree(struct ext4_de_tree *tree)
-{
- tree->root = RB_ROOT;
- tree->cache_de = NULL;
-}
-
-#ifdef DE_DEBUG
-static void ext4_de_print_tree(struct inode *inode)
-{
- struct ext4_de_tree *tree;
- struct rb_node *node;
-
- printk(KERN_DEBUG "delayed extents for inode %lu:", inode->i_ino);
- tree = &EXT4_I(inode)->i_de_tree;
- node = rb_first(&tree->root);
- while(node) {
- struct delayed_extent *de;
- de = rb_entry(node, struct delayed_extent, rb_node);
- printk(KERN_DEBUG " [%u/%u)", de->start, de->len);
- node = rb_next(node);
- }
- printk(KERN_DEBUG "\n");
-}
-#else
-#define ext4_de_print_tree(inode)
-#endif
-
-static inline ext4_lblk_t delayed_extent_end(struct delayed_extent *de)
-{
- if (de->start + de->len < de->start)
- return (ext4_lblk_t)-1;
- return de->start + de->len;
-}
-
-/*
- * search through the tree for an delayed_extent with a given offset. If
- * it can't be found, try to find next extent.
- */
-static struct delayed_extent * __de_tree_search(struct rb_root *root,
- ext4_lblk_t offset)
-{
- struct rb_node *node = root->rb_node;
- struct delayed_extent *de = NULL;
-
- while (node) {
- de = rb_entry(node, struct delayed_extent, rb_node);
- if (offset < de->start)
- node = node->rb_left;
- else if (offset >= delayed_extent_end(de))
- node = node->rb_right;
- else
- return de;
- }
-
- if (de && offset < de->start)
- return de;
-
- if (de && offset >= delayed_extent_end(de)) {
- node = rb_next(&de->rb_node);
- return node ? rb_entry(node, struct delayed_extent, rb_node) :
- NULL;
- }
-
- return NULL;
-}
-
-/*
- * ext4_de_find_extent: find the 1st delayed extent covering @de->start
- * if it exists, otherwise, the next extent after @de->start.
- *
- * @inode: the inode which owns delayed extents
- * @de:
- *
- * Returns next block beyond the found extent.
- * Delayed extent is returned via @de.
- */
-ext4_lblk_t ext4_de_find_extent(struct inode *inode, struct delayed_extent *de)
-{
- struct ext4_de_tree *tree;
- struct delayed_extent *de1;
- struct rb_node *node;
-
- de->len = 0;
- tree = &EXT4_I(inode)->i_de_tree;
- de1 = __de_tree_search(&tree->root, de->start);
- if (de1) {
- tree->cache_de = de1;
- de->start = de1->start;
- de->len = de1->len;
- node = rb_next(&de1->rb_node);
- if (node) {
- de1 = rb_entry(node, struct delayed_extent, rb_node);
- return de1->start;
- }
- }
-
- return EXT_MAX_BLOCKS;
-}
-
-static struct delayed_extent *
-ext4_de_alloc_extent(ext4_lblk_t start, ext4_lblk_t len)
-{
- struct delayed_extent *de;
- de = kmem_cache_alloc(ext4_de_cachep, GFP_NOFS);
- if (de == NULL)
- return NULL;
- de->start = start;
- de->len = len;
- return de;
-}
-
-static void ext4_de_free_extent(struct delayed_extent *de)
-{
- kmem_cache_free(ext4_de_cachep, de);
-}
-
-static void ext4_de_try_to_merge_left(struct ext4_de_tree *tree,
- struct delayed_extent *de)
-{
- struct delayed_extent *de1;
- struct rb_node *node;
-
- node = rb_prev(&de->rb_node);
- if (!node)
- return;
-
- de1 = rb_entry(node, struct delayed_extent, rb_node);
- if (delayed_extent_end(de1) == de->start) {
- de1->len += de->len;
- rb_erase(&de->rb_node, &tree->root);
- if (de == tree->cache_de)
- tree->cache_de = de1;
- ext4_de_free_extent(de);
- }
-}
-
-static void ext4_de_try_to_merge_right(struct ext4_de_tree *tree,
- struct delayed_extent *de)
-{
- struct delayed_extent *de1;
- struct rb_node *node;
-
- node = rb_next(&de->rb_node);
- if (!node)
- return;
-
- de1 = rb_entry(node, struct delayed_extent, rb_node);
- if (de1->start == delayed_extent_end(de)) {
- de->len += de1->len;
- rb_erase(node, &tree->root);
- if (de1 == tree->cache_de)
- tree->cache_de = de;
- ext4_de_free_extent(de1);
- }
-}
-
-/*
- * ext4_de_add_space: adds a space to a delayed extent tree.
- * Caller holds inode->i_data_sem.
- *
- * ext4_de_add_space is callyed by ext4_dealyed_write_begin and
- * ext4_de_remove_space.
- *
- * Return 0 on success, error code on failure.
- */
-int ext4_de_add_space(struct inode *inode, ext4_lblk_t offset, ext4_lblk_t len)
-{
- struct ext4_de_tree *tree = &EXT4_I(inode)->i_de_tree;
- struct rb_node **p = &tree->root.rb_node;
- struct rb_node *parent = NULL;
- struct delayed_extent *de;
- ext4_lblk_t end = offset + len;
-
- BUG_ON(end <= offset);
-
- de = tree->cache_de;
- de_debug("add [%u/%u) to delayed extent tree of inode %lu\n",
- offset, len, inode->i_ino);
-
- if (de && delayed_extent_end(de) == offset) {
- de_debug("cached by [%u/%u)\n", de->start, de->len);
- de->len += len;
- ext4_de_try_to_merge_right(tree, de);
- goto out;
- } else if (de && de->start == end) {
- de_debug("cached by [%u/%u)\n", de->start, de->len);
- de->start = offset;
- de->len += len;
- ext4_de_try_to_merge_left(tree, de);
- goto out;
- } else if (de && de->start <= offset &&
- delayed_extent_end(de) >= end) {
- de_debug("cached by [%u/%u)\n", de->start, de->len);
- goto out;
- }
-
- while (*p) {
- parent = *p;
- de = rb_entry(parent, struct delayed_extent, rb_node);
-
- if (offset < de->start) {
- if (end == de->start) {
- de->len += len;
- de->start = offset;
- goto out;
- }
- p = &(*p)->rb_left;
- } else if (offset >= delayed_extent_end(de)) {
- if (delayed_extent_end(de) == offset) {
- de->len += len;
- goto out;
- }
- p = &(*p)->rb_right;
- } else
- goto out;
- }
-
- de = ext4_de_alloc_extent(offset, len);
- if (!de)
- return -ENOMEM;
- rb_link_node(&de->rb_node, parent, p);
- rb_insert_color(&de->rb_node, &tree->root);
-
-out:
- tree->cache_de = de;
- ext4_de_print_tree(inode);
-
- return 0;
-}
-
-/*
- * ext4_de_remove_space() removes a space from a delayed extent tree.
- * Caller holds inode->i_data_sem.
- *
- * Return 0 on success, error code on failure.
- */
-int ext4_de_remove_space(struct inode *inode, ext4_lblk_t offset,
- ext4_lblk_t len)
-{
- struct rb_node *node;
- struct ext4_de_tree *tree;
- struct delayed_extent *de;
- struct delayed_extent orig_de;
- ext4_lblk_t len1, len2, end;
-
- de_debug("remove [%u/%u) from delayed extent tree of inode %lu\n",
- offset, len, inode->i_ino);
-
- end = offset + len;
- BUG_ON(end <= offset);
- tree = &EXT4_I(inode)->i_de_tree;
- de = __de_tree_search(&tree->root, offset);
- if (!de)
- goto out;
-
- /* Simply invalidate cache_de. */
- tree->cache_de = NULL;
-
- orig_de.start = de->start;
- orig_de.len = de->len;
- len1 = offset > de->start ? offset - de->start : 0;
- len2 = delayed_extent_end(de) > end ?
- delayed_extent_end(de) - end : 0;
- if (len1 > 0)
- de->len = len1;
- if (len2 > 0) {
- if (len1 > 0) {
- int err;
- err = ext4_de_add_space(inode, end, len2);
- if (err) {
- de->start = orig_de.start;
- de->len = orig_de.len;
- return err;
- }
- } else {
- de->start = end;
- de->len = len2;
- }
- goto out;
- }
-
- if (len1 > 0) {
- node = rb_next(&de->rb_node);
- if (!node)
- de = rb_entry(node, struct delayed_extent, rb_node);
- else
- de = NULL;
- }
-
- while (de && delayed_extent_end(de) <= end) {
- node = rb_next(&de->rb_node);
- rb_erase(&de->rb_node, &tree->root);
- ext4_de_free_extent(de);
- if (!node) {
- de = NULL;
- break;
- }
- de = rb_entry(node, struct delayed_extent, rb_node);
- }
-
- if (de && de->start < end) {
- len1 = delayed_extent_end(de) - end;
- de->start = end;
- de->len = len1;
- }
-
-out:
- ext4_de_print_tree(inode);
- return 0;
-}
diff --git a/fs/ext4/status_extents.c b/fs/ext4/status_extents.c
new file mode 100644
index 0000000..50079af
--- /dev/null
+++ b/fs/ext4/status_extents.c
@@ -0,0 +1,417 @@
+/*
+ * fs/ext4/status_extents.c
+ *
+ * Written by Yongqiang Yang <[email protected]>
+ *
+ * Ext4 status extents core functions.
+ */
+#include <linux/rbtree.h>
+#include "ext4.h"
+#include "status_extents.h"
+#include "ext4_extents.h"
+
+/*
+ * status extent implementation for ext4.
+ *
+ *
+ * ==========================================================================
+ * Status extents encompass delayed extents and extent locks
+ *
+ * 1. Why delayed extent implementation ?
+ *
+ * Without delayed extent, ext4 identifies a delayed extent by looking up
+ * page cache, this has several deficiencies - complicated, buggy, and
+ * inefficient code.
+ *
+ * FIEMAP, SEEK_HOLE/DATA, bigalloc, punch hole and writeout all need to know if
+ * a block or a range of blocks are belonged to a delayed extent.
+ *
+ * Let us have a look at how they do without delayed extents implementation.
+ * -- FIEMAP
+ * FIEMAP looks up page cache to identify delayed allocations from holes.
+ *
+ * -- SEEK_HOLE/DATA
+ * SEEK_HOLE/DATA has the same problem as FIEMAP.
+ *
+ * -- bigalloc
+ * bigalloc looks up page cache to figure out if a block is already
+ * under delayed allocation or not to determine whether quota reserving
+ * is needed for the cluster.
+ *
+ * -- punch hole
+ * punch hole looks up page cache to identify a delayed extent.
+ *
+ * -- writeout
+ * Writeout looks up whole page cache to see if a buffer is mapped, If
+ * there are not very many delayed buffers, then it is time comsuming.
+ *
+ * With delayed extents implementation, FIEMAP, SEEK_HOLE/DATA, bigalloc and
+ * writeout can figure out if a block or a range of blocks is under delayed
+ * allocation(belonged to a delayed extent) or not by searching the delayed
+ * extent tree.
+ *
+ *
+ * ==========================================================================
+ * 2. ext4 delayed extents impelmentation
+ *
+ * -- delayed extent
+ * A delayed extent is a range of blocks which are contiguous logically and
+ * under delayed allocation. Unlike extent in ext4, delayed extent in ext4
+ * is a in-memory struct, there is no corresponding on-disk data. There is
+ * no limit on length of delayed extent, so a delayed extent can contain as
+ * many blocks as they are contiguous logically.
+ *
+ * -- delayed extent tree
+ * Every inode has a delayed extent tree and all under delayed allocation
+ * blocks are added to the tree as dealyed extents. Delayed extents in
+ * the tree are ordered by logical block no.
+ *
+ * -- operations on a delayed extent tree
+ * There are three operations on a delayed extent tree: find next delayed
+ * extent, adding a space(a range of blocks) and removing a space.
+ *
+ * -- race on a delayed extent tree
+ * Delayed extent tree is protected inode->i_data_sem like extent tree.
+ *
+ *
+ * ==========================================================================
+ * 3. performance analysis
+ * -- overhead
+ * 1. Apart from operations on a delayed extent tree, we need to
+ * down_write(inode->i_data_sem) in delayed write path to maintain delayed
+ * extent tree, this can have impact on parallel read-write and write-write
+ *
+ * 2. There is a cache extent for write access, so if writes are not very
+ * random, adding space operaions are in O(1) time.
+ *
+ * -- gain
+ * 3. Code is much simpler, more readable, more maintainable and
+ * more efficient.
+ */
+
+static struct kmem_cache *ext4_se_cachep;
+
+int __init ext4_init_se(void)
+{
+ ext4_se_cachep = KMEM_CACHE(status_extent, SLAB_RECLAIM_ACCOUNT);
+ if (ext4_se_cachep == NULL)
+ return -ENOMEM;
+ return 0;
+}
+
+void ext4_exit_se(void)
+{
+ if (ext4_se_cachep)
+ kmem_cache_destroy(ext4_se_cachep);
+}
+
+void ext4_se_init_tree(struct ext4_se_tree *tree)
+{
+ tree->root = RB_ROOT;
+ tree->cache_se = NULL;
+}
+
+#ifdef SE_DEBUG
+static void ext4_se_print_tree(struct inode *inode)
+{
+ struct ext4_se_tree *tree;
+ struct rb_node *node;
+
+ printk(KERN_DEBUG "status extents for inode %lu:", inode->i_ino);
+ tree = &EXT4_I(inode)->i_se_tree;
+ node = rb_first(&tree->root);
+ while(node) {
+ struct status_extent *se;
+ se = rb_entry(node, struct status_extent, rb_node);
+ printk(KERN_DEBUG " [%u/%u)", se->start, se->len);
+ node = rb_next(node);
+ }
+ printk(KERN_DEBUG "\n");
+}
+#else
+#define ext4_se_print_tree(inode)
+#endif
+
+static inline ext4_lblk_t status_extent_end(struct status_extent *se)
+{
+ if (se->start + se->len < se->start)
+ return (ext4_lblk_t)-1;
+ return se->start + se->len;
+}
+
+/*
+ * search through the tree for an delayed_extent with a given offset. If
+ * it can't be found, try to find next extent.
+ */
+static struct status_extent * __se_tree_search(struct rb_root *root,
+ ext4_lblk_t offset)
+{
+ struct rb_node *node = root->rb_node;
+ struct status_extent *se = NULL;
+
+ while (node) {
+ se = rb_entry(node, struct status_extent, rb_node);
+ if (offset < se->start)
+ node = node->rb_left;
+ else if (offset >= status_extent_end(se))
+ node = node->rb_right;
+ else
+ return se;
+ }
+
+ if (se && offset < se->start)
+ return se;
+
+ if (se && offset >= status_extent_end(se)) {
+ node = rb_next(&se->rb_node);
+ return node ? rb_entry(node, struct status_extent, rb_node) :
+ NULL;
+ }
+
+ return NULL;
+}
+
+/*
+ * ext4_se_find_extent: find the 1st delayed extent covering @se->start
+ * if it exists, otherwise, the next extent after @se->start.
+ *
+ * @inode: the inode which owns delayed extents
+ * @se:
+ *
+ * Returns next block beyond the found extent.
+ * Delayed extent is returned via @se.
+ */
+ext4_lblk_t ext4_se_find_extent(struct inode *inode, struct status_extent *se)
+{
+ struct ext4_se_tree *tree;
+ struct status_extent *se1;
+ struct rb_node *node;
+
+ se->len = 0;
+ tree = &EXT4_I(inode)->i_se_tree;
+ se1 = __se_tree_search(&tree->root, se->start);
+ if (se1) {
+ tree->cache_se = se1;
+ se->start = se1->start;
+ se->len = se1->len;
+ node = rb_next(&se1->rb_node);
+ if (node) {
+ se1 = rb_entry(node, struct status_extent, rb_node);
+ return se1->start;
+ }
+ }
+
+ return EXT_MAX_BLOCKS;
+}
+
+static struct status_extent *
+ext4_se_alloc_extent(ext4_lblk_t start, ext4_lblk_t len)
+{
+ struct status_extent *se;
+ se = kmem_cache_alloc(ext4_se_cachep, GFP_NOFS);
+ if (se == NULL)
+ return NULL;
+ se->start = start;
+ se->len = len;
+ return se;
+}
+
+static void ext4_se_free_extent(struct status_extent *se)
+{
+ kmem_cache_free(ext4_se_cachep, se);
+}
+
+static void ext4_se_try_to_merge_left(struct ext4_se_tree *tree,
+ struct status_extent *se)
+{
+ struct status_extent *se1;
+ struct rb_node *node;
+
+ node = rb_prev(&se->rb_node);
+ if (!node)
+ return;
+
+ se1 = rb_entry(node, struct status_extent, rb_node);
+ if (status_extent_end(se1) == se->start) {
+ se1->len += se->len;
+ rb_erase(&se->rb_node, &tree->root);
+ if (se == tree->cache_se)
+ tree->cache_se = se1;
+ ext4_se_free_extent(se);
+ }
+}
+
+static void ext4_se_try_to_merge_right(struct ext4_se_tree *tree,
+ struct status_extent *se)
+{
+ struct status_extent *se1;
+ struct rb_node *node;
+
+ node = rb_next(&se->rb_node);
+ if (!node)
+ return;
+
+ se1 = rb_entry(node, struct status_extent, rb_node);
+ if (se1->start == status_extent_end(se)) {
+ se->len += se1->len;
+ rb_erase(node, &tree->root);
+ if (se1 == tree->cache_se)
+ tree->cache_se = se;
+ ext4_se_free_extent(se1);
+ }
+}
+
+/*
+ * ext4_se_add_space: adds a space to a delayed extent tree.
+ * Caller holds inode->i_data_sem.
+ *
+ * ext4_se_add_space is callyed by ext4_dealyed_write_begin and
+ * ext4_se_remove_space.
+ *
+ * Return 0 on success, error code on failure.
+ */
+int ext4_se_add_space(struct inode *inode, ext4_lblk_t offset, ext4_lblk_t len)
+{
+ struct ext4_se_tree *tree = &EXT4_I(inode)->i_se_tree;
+ struct rb_node **p = &tree->root.rb_node;
+ struct rb_node *parent = NULL;
+ struct status_extent *se;
+ ext4_lblk_t end = offset + len;
+
+ BUG_ON(end <= offset);
+
+ se = tree->cache_se;
+ se_debug("add [%u/%u) to status extent tree of inode %lu\n",
+ offset, len, inode->i_ino);
+
+ if (se && status_extent_end(se) == offset) {
+ se_debug("cached by [%u/%u)\n", se->start, se->len);
+ se->len += len;
+ ext4_se_try_to_merge_right(tree, se);
+ goto out;
+ } else if (se && se->start == end) {
+ se_debug("cached by [%u/%u)\n", se->start, se->len);
+ se->start = offset;
+ se->len += len;
+ ext4_se_try_to_merge_left(tree, se);
+ goto out;
+ } else if (se && se->start <= offset &&
+ status_extent_end(se) >= end) {
+ se_debug("cached by [%u/%u)\n", se->start, se->len);
+ goto out;
+ }
+
+ while (*p) {
+ parent = *p;
+ se = rb_entry(parent, struct status_extent, rb_node);
+
+ if (offset < se->start) {
+ if (end == se->start) {
+ se->len += len;
+ se->start = offset;
+ goto out;
+ }
+ p = &(*p)->rb_left;
+ } else if (offset >= status_extent_end(se)) {
+ if (status_extent_end(se) == offset) {
+ se->len += len;
+ goto out;
+ }
+ p = &(*p)->rb_right;
+ } else
+ goto out;
+ }
+
+ se = ext4_se_alloc_extent(offset, len);
+ if (!se)
+ return -ENOMEM;
+ rb_link_node(&se->rb_node, parent, p);
+ rb_insert_color(&se->rb_node, &tree->root);
+
+out:
+ tree->cache_se = se;
+ ext4_se_print_tree(inode);
+
+ return 0;
+}
+
+/*
+ * ext4_se_remove_space() removes a space from a delayed extent tree.
+ * Caller holds inode->i_data_sem.
+ *
+ * Return 0 on success, error code on failure.
+ */
+int ext4_se_remove_space(struct inode *inode, ext4_lblk_t offset,
+ ext4_lblk_t len)
+{
+ struct rb_node *node;
+ struct ext4_se_tree *tree;
+ struct status_extent *se;
+ struct status_extent orig_se;
+ ext4_lblk_t len1, len2, end;
+
+ se_debug("remove [%u/%u) from status extent tree of inode %lu\n",
+ offset, len, inode->i_ino);
+
+ end = offset + len;
+ BUG_ON(end <= offset);
+ tree = &EXT4_I(inode)->i_se_tree;
+ se = __se_tree_search(&tree->root, offset);
+ if (!se)
+ goto out;
+
+ /* Simply invalidate cache_se. */
+ tree->cache_se = NULL;
+
+ orig_se.start = se->start;
+ orig_se.len = se->len;
+ len1 = offset > se->start ? offset - se->start : 0;
+ len2 = status_extent_end(se) > end ?
+ status_extent_end(se) - end : 0;
+ if (len1 > 0)
+ se->len = len1;
+ if (len2 > 0) {
+ if (len1 > 0) {
+ int err;
+ err = ext4_se_add_space(inode, end, len2);
+ if (err) {
+ se->start = orig_se.start;
+ se->len = orig_se.len;
+ return err;
+ }
+ } else {
+ se->start = end;
+ se->len = len2;
+ }
+ goto out;
+ }
+
+ if (len1 > 0) {
+ node = rb_next(&se->rb_node);
+ if (!node)
+ se = rb_entry(node, struct status_extent, rb_node);
+ else
+ se = NULL;
+ }
+
+ while (se && status_extent_end(se) <= end) {
+ node = rb_next(&se->rb_node);
+ rb_erase(&se->rb_node, &tree->root);
+ ext4_se_free_extent(se);
+ if (!node) {
+ se = NULL;
+ break;
+ }
+ se = rb_entry(node, struct status_extent, rb_node);
+ }
+
+ if (se && se->start < end) {
+ len1 = status_extent_end(se) - end;
+ se->start = end;
+ se->len = len1;
+ }
+
+out:
+ ext4_se_print_tree(inode);
+ return 0;
+}
diff --git a/fs/ext4/status_extents.h b/fs/ext4/status_extents.h
index cbf96ed..415befd 100644
--- a/fs/ext4/status_extents.h
+++ b/fs/ext4/status_extents.h
@@ -8,6 +8,13 @@
#ifndef _EXT4_STATUS_EXTENTS_H
#define _EXT4_STATUS_EXTENTS_H

+#define SE_DEBUG
+#ifdef SE_DEBUG
+#define se_debug(a...) printk(a)
+#else
+#define se_debug(a...)
+#endif
+
struct status_extent {
struct rb_node rb_node;
ext4_lblk_t start; /* first block extent covers */
@@ -19,4 +26,15 @@ struct ext4_se_tree {
struct status_extent *cache_se; /* recently accessed extent */
};

+extern int __init ext4_init_se(void);
+extern void ext4_exit_se(void);
+extern void ext4_se_init_tree(struct ext4_se_tree *tree);
+
+extern int ext4_se_add_space(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t len);
+extern int ext4_se_remove_space(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t len);
+extern ext4_lblk_t ext4_se_find_extent(struct inode *inode,
+ struct status_extent *se);
+
#endif /* _EXT4_STATUS_EXTENTS_H */
--
1.7.1


2012-03-10 04:40:55

by Allison Henderson

[permalink] [raw]
Subject: [PATCH 6/6] Rename de patch6 from de to se scheme

This patch renames changes introduced by patch
"[PATCH_V3_6_6]_ext4_reimplement_ext4_find_delay_alloc_range_on_delayed_extent_tree"

All code referencing "delayed extents" or "de" are changed to
"status extents" or "se"

Signed-off-by: Allison Henderson <[email protected]>
---
:100644 100644 3f76625... bfcbde3... M fs/ext4/extents.c
fs/ext4/extents.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3f76625..bfcbde3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3268,11 +3268,11 @@ static int ext4_find_delalloc_range(struct inode *inode,
ext4_lblk_t lblk_start,
ext4_lblk_t lblk_end)
{
- struct delayed_extent de;
+ struct status_extent se;

- de.start = lblk_start;
- ext4_de_find_extent(inode, &de);
- return (de.start + de.len) >= lblk_end && de.start <= lblk_start;
+ se.start = lblk_start;
+ ext4_se_find_extent(inode, &se);
+ return (se.start + se.len) >= lblk_end && se.start <= lblk_start;
}

int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk)
--
1.7.1


2012-03-10 04:40:55

by Allison Henderson

[permalink] [raw]
Subject: [PATCH 5/6] Rename de patch5 from de to se scheme

This patch renames changes introduced by patch
"[PATCH_V3_5_6\]_ext4_reimplement_fiemap_on_delayed_extent_tree"

All code referencing "delayed extents" or "de" are changed to
"status extents" or "se"

Signed-off-by: Allison Henderson <[email protected]>
---
:100644 100644 a8b382e... 3f76625... M fs/ext4/extents.c
fs/ext4/extents.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a8b382e..3f76625 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4363,7 +4363,7 @@ static int ext4_ext_fiemap_cb(struct inode *inode, ext4_lblk_t next,
struct ext4_ext_cache *newex, struct ext4_extent *ex,
void *data)
{
- struct delayed_extent de;
+ struct status_extent se;
__u64 logical;
__u64 physical;
__u64 length;
@@ -4373,9 +4373,9 @@ static int ext4_ext_fiemap_cb(struct inode *inode, ext4_lblk_t next,
struct fiemap_extent_info *fieinfo = data;
unsigned char blksize_bits;

- de.start = newex->ec_block;
+ se.start = newex->ec_block;
down_read(&EXT4_I(inode)->i_data_sem);
- next_del = ext4_de_find_extent(inode, &de);
+ next_del = ext4_se_find_extent(inode, &se);
up_read(&EXT4_I(inode)->i_data_sem);

next = min(next_del, next);
@@ -4384,19 +4384,19 @@ static int ext4_ext_fiemap_cb(struct inode *inode, ext4_lblk_t next,
* No extent in extent-tree contains block @newex->ec_start,
* then the block may stay in 1)a hole or 2)delayed-extent.
*/
- if (de.len == 0)
+ if (se.len == 0)
/* A hole found. */
return EXT_CONTINUE;

- if (de.start > newex->ec_block) {
+ if (se.start > newex->ec_block) {
/* A hole found. */
- newex->ec_len = min(de.start - newex->ec_block,
+ newex->ec_len = min(se.start - newex->ec_block,
newex->ec_len);
return EXT_CONTINUE;
}

flags |= FIEMAP_EXTENT_DELALLOC;
- newex->ec_len = de.start + de.len - newex->ec_block;
+ newex->ec_len = se.start + se.len - newex->ec_block;
}

if (ex && ext4_ext_is_uninitialized(ex))
--
1.7.1


2012-03-10 04:41:18

by Allison Henderson

[permalink] [raw]
Subject: [PATCH 1/6] Rename de patch1 from de to se scheme

This patch renames changes introduced by patch
"[PATCH_V3_1_6]ext4_add_two_structures_supporting_delayed_extent_tree"

All code referencing "delayed extents" or "de" are changed to
"status extents" or "se"

Signed-off-by: Allison Henderson <[email protected]>
---
:100644 000000 cfd122e... 0000000... D fs/ext4/delayed_extents.h
:100644 100644 f908208... 896f2fb... M fs/ext4/ext4.h
:000000 100644 0000000... cbf96ed... A fs/ext4/status_extents.h
fs/ext4/delayed_extents.h | 40 ----------------------------------------
fs/ext4/ext4.h | 4 ++--
fs/ext4/status_extents.h | 22 ++++++++++++++++++++++
3 files changed, 24 insertions(+), 42 deletions(-)

diff --git a/fs/ext4/delayed_extents.h b/fs/ext4/delayed_extents.h
deleted file mode 100644
index cfd122e..0000000
--- a/fs/ext4/delayed_extents.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/*
- * fs/ext4/delayed_extents.h
- *
- * Written by Yongqiang Yang <[email protected]>
- *
- */
-
-#ifndef _EXT4_DELAYED_EXTENTS_H
-#define _EXT4_DELAYED_EXTENTS_H
-
-#define DE_DEBUG
-#ifdef DE_DEBUG
-#define de_debug(a...) printk(a)
-#else
-#define de_debug(a...)
-#endif
-
-struct delayed_extent {
- struct rb_node rb_node;
- ext4_lblk_t start; /* first block extent covers */
- ext4_lblk_t len; /* length of extent in block */
-};
-
-struct ext4_de_tree {
- struct rb_root root;
- struct delayed_extent *cache_de; /* recently accessed extent */
-};
-
-extern int __init ext4_init_de(void);
-extern void ext4_exit_de(void);
-extern void ext4_de_init_tree(struct ext4_de_tree *tree);
-
-extern int ext4_de_add_space(struct inode *inode, ext4_lblk_t start,
- ext4_lblk_t len);
-extern int ext4_de_remove_space(struct inode *inode, ext4_lblk_t start,
- ext4_lblk_t len);
-extern ext4_lblk_t ext4_de_find_extent(struct inode *inode,
- struct delayed_extent *de);
-
-#endif /* _EXT4_DELAYED_EXTENTS_H */
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f908208..896f2fb 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -782,7 +782,7 @@ struct ext4_ext_cache {
__u32 ec_len; /* must be 32bit to return holes */
};

-#include "delayed_extents.h"
+#include "status_extents.h"

/*
* fourth extended file system inode data in memory
@@ -862,7 +862,7 @@ struct ext4_inode_info {
spinlock_t i_prealloc_lock;

/* delayed extents */
- struct ext4_de_tree i_de_tree;
+ struct ext4_se_tree i_se_tree;

/* ialloc */
ext4_group_t i_last_alloc_group;
diff --git a/fs/ext4/status_extents.h b/fs/ext4/status_extents.h
new file mode 100644
index 0000000..cbf96ed
--- /dev/null
+++ b/fs/ext4/status_extents.h
@@ -0,0 +1,22 @@
+/*
+ * fs/ext4/status_extents.h
+ *
+ * Written by Yongqiang Yang <[email protected]>
+ *
+ */
+
+#ifndef _EXT4_STATUS_EXTENTS_H
+#define _EXT4_STATUS_EXTENTS_H
+
+struct status_extent {
+ struct rb_node rb_node;
+ ext4_lblk_t start; /* first block extent covers */
+ ext4_lblk_t len; /* length of extent in block */
+};
+
+struct ext4_se_tree {
+ struct rb_root root;
+ struct status_extent *cache_se; /* recently accessed extent */
+};
+
+#endif /* _EXT4_STATUS_EXTENTS_H */
--
1.7.1