2012-08-22 06:05:37

by Zheng Liu

[permalink] [raw]
Subject: [RFC][PATCH 0/8 v2] ext4: extent status tree (step 1)

Hi all,

Here is the v2 of extent status tree. In this version, the biggest change is
that the patch set is divided into two parts. One is in this patch set, which
contains the mature code, and another is about punching hole improvement, which
will be sent out later. One reason is that I hope these mature codes can be
reviewed and applied ASAP. Another reason is that the patch set of punching
hole improvement is more relative indenpendent, and Lukas has finished some
works on it. Meanwhile thses patches are still under disscussion.

Extent status tree is a rb-tree per-inode, which records the delay extent in
memory. After adding this tree, there are some problems that can be fixed in
ext4, such as simplifing bigalloc quota reservation and fiemap implementation,
adding SEEK_DATA/SEEK_HOLE support.

In patch 1-5, extent status tree is added in order to identify a dealy extent.

In patch 6, Fiemap is improved. Now we only need to lookup extent status tree
to find a delay extent rather than looking up page cache.

In patch 7, ext4_find_delay_alloc_range is improved using extent status tree to
avoid to lookup page cache.

In patch 8, We introduce lseek SEEK_DATA/SEEK_HOLE support.

v2 <- v1:
- add a BUG_ON to do a sanity check in extent_status_end
- fix off-by-one problem in extent_status_end accroding to Lukas's comment
- add more detailed comments in ext4_es_find_extent
- try to lookup in extent tree cache firstly in ext4_es_find_extent
- rename ext4_es_add_space to ext4_es_insert_extent
- rename ext4_es_remove_space to ext4_es_remove_extent

The first version is in this link:
1. http://www.spinics.net/lists/linux-ext4/msg33101.html

Any feedbacks are appreciated. Thanks!

Regards,
Zheng
---
Zheng Liu (8):
ext4: add two structures supporting extent status tree
ext4: add operations on extent status tree
ext4: initialize extent status tree
ext4: let ext4 maintain extent status tree
ext4: add some tracepoints in extent status tree
ext4: reimplement fiemap on extent status tree
ext4: reimplement ext4_find_delay_alloc_range on extent status tree
ext4: introduce lseek SEEK_DATA/SEEK_HOLE support

fs/ext4/Makefile | 2 +-
fs/ext4/ext4.h | 10 +-
fs/ext4/ext4_extents.h | 3 +-
fs/ext4/extents.c | 302 +++++--------------------------------
fs/ext4/extents_status.c | 442 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/extents_status.h | 45 ++++++
fs/ext4/file.c | 153 ++++++++++++++++++-
fs/ext4/indirect.c | 4 +-
fs/ext4/inode.c | 83 ++++-------
fs/ext4/super.c | 13 +-
include/trace/events/ext4.h | 101 +++++++++++++
11 files changed, 830 insertions(+), 328 deletions(-)


2012-08-22 05:55:52

by Zheng Liu

[permalink] [raw]
Subject: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

From: Zheng Liu <[email protected]>

Let ext4 initialize extent status tree of an inode.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: Allison Henderson <[email protected]>
Signed-off-by: Zheng Liu <[email protected]>
---
fs/ext4/super.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3e0851e..353b1fd 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -944,6 +944,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
INIT_LIST_HEAD(&ei->i_prealloc_list);
spin_lock_init(&ei->i_prealloc_lock);
+ ext4_es_init_tree(&ei->i_es_tree);
ei->i_reserved_data_blocks = 0;
ei->i_reserved_meta_blocks = 0;
ei->i_allocated_meta_blocks = 0;
--
1.7.4.1


2012-08-22 06:05:42

by Zheng Liu

[permalink] [raw]
Subject: [RFC][PATCH 5/8 v2] ext4: add some tracepoints in extent status tree

From: Zheng Liu <[email protected]>

This patch adds some tracepoints in extent status tree.

Signed-off-by: Zheng Liu <[email protected]>
---
fs/ext4/extents_status.c | 8 +++
include/trace/events/ext4.h | 101 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index ccb7e2b..12336b1 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -13,6 +13,8 @@
#include "extents_status.h"
#include "ext4_extents.h"

+#include <trace/events/ext4.h>
+
/*
* extents status tree implementation for ext4.
*
@@ -191,6 +193,8 @@ ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
struct rb_node *node;
ext4_lblk_t ret = EXT_MAX_BLOCKS;

+ trace_ext4_es_find_extent_enter(inode, es->start);
+
tree = &EXT4_I(inode)->i_es_tree;

/* find delay extent in cache */
@@ -218,6 +222,8 @@ out:
}
}

+ trace_ext4_es_find_extent_exit(inode, es, ret);
+
return ret;
}

@@ -297,6 +303,7 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t offset,
ext4_lblk_t end = offset + len - 1;

BUG_ON(end < offset);
+ trace_ext4_es_insert_extent(inode, offset, len);

es = tree->cache_es;
es_debug("add [%u/%u) to extent status tree of inode %lu\n",
@@ -367,6 +374,7 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t offset,
struct extent_status orig_es;
ext4_lblk_t len1, len2, end;

+ trace_ext4_es_remove_extent(inode, offset, len);
es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
offset, len, inode->i_ino);

diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 69d8a69..31730f8 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -15,6 +15,7 @@ struct ext4_inode_info;
struct mpage_da_data;
struct ext4_map_blocks;
struct ext4_extent;
+struct extent_status;

#define EXT4_I(inode) (container_of(inode, struct ext4_inode_info, vfs_inode))

@@ -2055,6 +2056,106 @@ TRACE_EVENT(ext4_ext_remove_space_done,
(unsigned short) __entry->eh_entries)
);

+TRACE_EVENT(ext4_es_insert_extent,
+ TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t len),
+
+ TP_ARGS(inode, start, len),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( loff_t, start )
+ __field( loff_t, len )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->start = start;
+ __entry->len = len;
+ ),
+
+ TP_printk("dev %d,%d ino %lu es [%lld/%lld)",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+ __entry->start, __entry->len)
+);
+
+TRACE_EVENT(ext4_es_remove_extent,
+ TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t len),
+
+ TP_ARGS(inode, start, len),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( loff_t, start )
+ __field( loff_t, len )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->start = start;
+ __entry->len = len;
+ ),
+
+ TP_printk("dev %d,%d ino %lu es [%lld/%lld)",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+ __entry->start, __entry->len)
+);
+
+TRACE_EVENT(ext4_es_find_extent_enter,
+ TP_PROTO(struct inode *inode, ext4_lblk_t start),
+
+ TP_ARGS(inode, start),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( ext4_lblk_t, start )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->start = start;
+ ),
+
+ TP_printk("dev %d,%d ino %lu start %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino, __entry->start)
+);
+
+TRACE_EVENT(ext4_es_find_extent_exit,
+ TP_PROTO(struct inode *inode, struct extent_status *es,
+ ext4_lblk_t ret),
+
+ TP_ARGS(inode, es, ret),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, len )
+ __field( ext4_lblk_t, ret )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->start = es->start;
+ __entry->len = es->len;
+ __entry->ret = ret;
+ ),
+
+ TP_printk("dev %d,%d ino %lu es [%u/%u) ret %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+ __entry->start, __entry->len, __entry->ret)
+);
+
#endif /* _TRACE_EXT4_H */

/* This part must be outside protection */
--
1.7.4.1


2012-08-22 05:55:47

by Zheng Liu

[permalink] [raw]
Subject: [RFC][PATCH 1/8 v2] ext4: add two structures supporting extent status tree

From: Zheng Liu <[email protected]>

This patch adds two structures that supports status extent tree and
status_extent_info to inode_info.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: Allison Henderson <[email protected]>
Signed-off-by: Zheng Liu <[email protected]>
---
fs/ext4/ext4.h | 5 +++++
fs/ext4/extents_status.h | 25 +++++++++++++++++++++++++
2 files changed, 30 insertions(+), 0 deletions(-)
create mode 100644 fs/ext4/extents_status.h

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c3411d4..66c592e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -812,6 +812,8 @@ struct ext4_ext_cache {
__u32 ec_len; /* must be 32bit to return holes */
};

+#include "extents_status.h"
+
/*
* fourth extended file system inode data in memory
*/
@@ -889,6 +891,9 @@ struct ext4_inode_info {
struct list_head i_prealloc_list;
spinlock_t i_prealloc_lock;

+ /* extents status tree */
+ struct ext4_es_tree i_es_tree;
+
/* ialloc */
ext4_group_t i_last_alloc_group;

diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
new file mode 100644
index 0000000..8be2ab9
--- /dev/null
+++ b/fs/ext4/extents_status.h
@@ -0,0 +1,25 @@
+/*
+ * fs/ext4/extents_status.h
+ *
+ * Written by Yongqiang Yang <[email protected]>
+ * Modified by
+ * Allison Henderson <[email protected]>
+ * Zheng Liu <[email protected]>
+ *
+ */
+
+#ifndef _EXT4_EXTENTS_STATUS_H
+#define _EXT4_EXTENTS_STATUS_H
+
+struct extent_status {
+ struct rb_node rb_node;
+ ext4_lblk_t start; /* first block extent covers */
+ ext4_lblk_t len; /* length of extent in block */
+};
+
+struct ext4_es_tree {
+ struct rb_root root;
+ struct extent_status *cache_es; /* recently accessed extent */
+};
+
+#endif /* _EXT4_EXTENTS_STATUS_H */
--
1.7.4.1


2012-08-22 05:55:54

by Zheng Liu

[permalink] [raw]
Subject: [RFC][PATCH 4/8 v2] ext4: let ext4 maintain extent status tree

From: Zheng Liu <[email protected]>

This patch lets ext4 maintain extent status tree.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: Allison Henderson <[email protected]>
Signed-off-by: Zheng Liu <[email protected]>
---
fs/ext4/extents.c | 4 ++++
fs/ext4/indirect.c | 3 +++
fs/ext4/inode.c | 30 ++++++++++++++++++++++++++++--
fs/ext4/super.c | 12 +++++++++++-
4 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index cd0c7ed..bf5ece2 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4281,6 +4281,8 @@ void ext4_ext_truncate(struct inode *inode)

last_block = (inode->i_size + sb->s_blocksize - 1)
>> EXT4_BLOCK_SIZE_BITS(sb);
+ err = ext4_es_remove_extent(inode, last_block,
+ EXT_MAX_BLOCKS - last_block);
err = ext4_ext_remove_space(inode, last_block, EXT_MAX_BLOCKS - 1);

/* In a multi-transaction truncate, we only make the final
@@ -4891,6 +4893,8 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
ext4_ext_invalidate_cache(inode);
ext4_discard_preallocations(inode);

+ err = ext4_es_remove_extent(inode, first_block,
+ stop_block - first_block);
err = ext4_ext_remove_space(inode, first_block, stop_block - 1);

ext4_ext_invalidate_cache(inode);
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 830e1b2..9db67b1 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -21,6 +21,7 @@
*/

#include "ext4_jbd2.h"
+#include "ext4_extents.h"
#include "truncate.h"

#include <trace/events/ext4.h>
@@ -1398,6 +1399,8 @@ void ext4_ind_truncate(struct inode *inode)
down_write(&ei->i_data_sem);

ext4_discard_preallocations(inode);
+ ext4_es_remove_extent(inode, last_block,
+ EXT_MAX_BLOCKS - last_block);

/*
* The orphan list entry will now protect us from any crash which
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index dff171c..aa04aa6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -574,7 +574,16 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
up_read((&EXT4_I(inode)->i_data_sem));

if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
- int ret = check_block_validity(inode, map);
+ int ret;
+ if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
+ /* delayed alloc may be allocated by fallocate and
+ * coverted to initialized by directIO.
+ * we need to handle delayed extent here.
+ */
+ down_write((&EXT4_I(inode)->i_data_sem));
+ goto delayed_mapped;
+ }
+ ret = check_block_validity(inode, map);
if (ret != 0)
return ret;
}
@@ -656,8 +665,16 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
* set the BH_Da_Mapped bit on them. Its important to do this
* under the protection of i_data_sem.
*/
- if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED)
+ if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
+ int ret;
set_buffers_da_mapped(inode, map);
+delayed_mapped:
+ /* delayed allocation blocks has been allocated */
+ ret = ext4_es_remove_extent(inode, map->m_lblk,
+ map->m_len);
+ if (ret < 0)
+ retval = ret;
+ }
}

up_write((&EXT4_I(inode)->i_data_sem));
@@ -1779,6 +1796,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
struct buffer_head *bh)
{
int retval;
+ int ret;
sector_t invalid_block = ~((sector_t) 0xffff);

if (invalid_block < ext4_blocks_count(EXT4_SB(inode->i_sb)->s_es))
@@ -1825,6 +1843,14 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
out_unlock:
up_read((&EXT4_I(inode)->i_data_sem));

+ if (retval == 0) {
+ down_write((&EXT4_I(inode)->i_data_sem));
+ ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len);
+ up_write((&EXT4_I(inode)->i_data_sem));
+ if (ret)
+ return ret;
+ }
+
return retval;
}

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 353b1fd..9cbda38 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -50,6 +50,7 @@
#include "xattr.h"
#include "acl.h"
#include "mballoc.h"
+#include "ext4_extents.h"

#define CREATE_TRACE_POINTS
#include <trace/events/ext4.h>
@@ -1028,6 +1029,7 @@ void ext4_clear_inode(struct inode *inode)
clear_inode(inode);
dquot_drop(inode);
ext4_discard_preallocations(inode);
+ ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
if (EXT4_I(inode)->jinode) {
jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
EXT4_I(inode)->jinode);
@@ -5252,9 +5254,14 @@ static int __init ext4_init_fs(void)
init_waitqueue_head(&ext4__ioend_wq[i]);
}

- err = ext4_init_pageio();
+ err = ext4_init_es();
if (err)
return err;
+
+ err = ext4_init_pageio();
+ if (err)
+ goto out7;
+
err = ext4_init_system_zone();
if (err)
goto out6;
@@ -5302,6 +5309,9 @@ out5:
ext4_exit_system_zone();
out6:
ext4_exit_pageio();
+out7:
+ ext4_exit_es();
+
return err;
}

--
1.7.4.1


2012-08-22 05:55:49

by Zheng Liu

[permalink] [raw]
Subject: [RFC][PATCH 2/8 v2] ext4: add operations on extent status tree

From: Zheng Liu <[email protected]>

This patch adds operations on a extent status tree.

CC: Lukas Czerner <[email protected]>
Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: Allison Henderson <[email protected]>
Signed-off-by: Zheng Liu <[email protected]>
---
v2 <- v1:
- add a BUG_ON to do a sanity check in extent_status_end()
- fix off-by-one problem in extent_status_end() according to Lukas's comment
- add more detailed comments in ext4_es_find_extent()
- try to lookup in extent tree cache firstly in ext4_es_find_extent()
- rename ext4_es_add_space to ext4_es_insert_extent
- rename ext4_es_remove_space to ext4_es_remove_extent

Hi Lukas,

I have fix some problems according to your advices, but there still has a little
comments that haven't be applied. I introduce my point of view in here. I
don't change es_debug functions because I see that ext_debug() is defined in
ext4_extents.h. So I think that maybe we should keep es_debug(), and it is
consistent with extent tree. Meanwhile I don't remove es_debug() in
ext4_es_remove_extent, although later we will add a tracepoint in this function
because IMO they ared used by different purposes. One is used for debugging,
and another is used for observing the status of filesystem. What do you think?

Regards,
Zheng

fs/ext4/Makefile | 2 +-
fs/ext4/extents_status.c | 434 ++++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/extents_status.h | 20 ++
3 files changed, 455 insertions(+), 1 deletions(-)
create mode 100644 fs/ext4/extents_status.c

diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 56fd8f86..41f22be 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -7,7 +7,7 @@ obj-$(CONFIG_EXT4_FS) += ext4.o
ext4-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o page-io.o \
ioctl.o namei.o super.o symlink.o hash.o resize.o extents.o \
ext4_jbd2.o migrate.o mballoc.o block_validity.o move_extent.o \
- mmp.o indirect.o
+ mmp.o indirect.o extents_status.o

ext4-$(CONFIG_EXT4_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o
ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
new file mode 100644
index 0000000..ccb7e2b
--- /dev/null
+++ b/fs/ext4/extents_status.c
@@ -0,0 +1,434 @@
+/*
+ * fs/ext4/extents_status.c
+ *
+ * Written by Yongqiang Yang <[email protected]>
+ * Modified by
+ * Allison Henderson <[email protected]>
+ * Zheng Liu <[email protected]>
+ *
+ * Ext4 extents status tree core functions.
+ */
+#include <linux/rbtree.h>
+#include "ext4.h"
+#include "extents_status.h"
+#include "ext4_extents.h"
+
+/*
+ * extents status tree implementation for ext4.
+ *
+ *
+ * ==========================================================================
+ * Extents status encompass delayed extents and extent locks
+ *
+ * 1. Why delayed extent implementation ?
+ *
+ * Without delayed extent, ext4 identifies a delayed extent by looking up
+ * page cache, this has several deficiencies - complicated, buggy, and
+ * inefficient code.
+ *
+ * FIEMAP, SEEK_HOLE/DATA, bigalloc, punch hole and writeout all need to know if
+ * a block or a range of blocks are belonged to a delayed extent.
+ *
+ * Let us have a look at how they do without delayed extents implementation.
+ * -- FIEMAP
+ * FIEMAP looks up page cache to identify delayed allocations from holes.
+ *
+ * -- SEEK_HOLE/DATA
+ * SEEK_HOLE/DATA has the same problem as FIEMAP.
+ *
+ * -- bigalloc
+ * bigalloc looks up page cache to figure out if a block is already
+ * under delayed allocation or not to determine whether quota reserving
+ * is needed for the cluster.
+ *
+ * -- punch hole
+ * punch hole looks up page cache to identify a delayed extent.
+ *
+ * -- writeout
+ * Writeout looks up whole page cache to see if a buffer is mapped, If
+ * there are not very many delayed buffers, then it is time comsuming.
+ *
+ * With delayed extents implementation, FIEMAP, SEEK_HOLE/DATA, bigalloc and
+ * writeout can figure out if a block or a range of blocks is under delayed
+ * allocation(belonged to a delayed extent) or not by searching the delayed
+ * extent tree.
+ *
+ *
+ * ==========================================================================
+ * 2. ext4 delayed extents impelmentation
+ *
+ * -- delayed extent
+ * A delayed extent is a range of blocks which are contiguous logically and
+ * under delayed allocation. Unlike extent in ext4, delayed extent in ext4
+ * is a in-memory struct, there is no corresponding on-disk data. There is
+ * no limit on length of delayed extent, so a delayed extent can contain as
+ * many blocks as they are contiguous logically.
+ *
+ * -- delayed extent tree
+ * Every inode has a delayed extent tree and all under delayed allocation
+ * blocks are added to the tree as dealyed extents. Delayed extents in
+ * the tree are ordered by logical block no.
+ *
+ * -- operations on a delayed extent tree
+ * There are three operations on a delayed extent tree: find next delayed
+ * extent, adding a space(a range of blocks) and removing a space.
+ *
+ * -- race on a delayed extent tree
+ * Delayed extent tree is protected inode->i_data_sem like extent tree.
+ *
+ *
+ * ==========================================================================
+ * 3. performance analysis
+ * -- overhead
+ * 1. Apart from operations on a delayed extent tree, we need to
+ * down_write(inode->i_data_sem) in delayed write path to maintain delayed
+ * extent tree, this can have impact on parallel read-write and write-write
+ *
+ * 2. There is a cache extent for write access, so if writes are not very
+ * random, adding space operaions are in O(1) time.
+ *
+ * -- gain
+ * 3. Code is much simpler, more readable, more maintainable and
+ * more efficient.
+ */
+
+static struct kmem_cache *ext4_es_cachep;
+
+int __init ext4_init_es(void)
+{
+ ext4_es_cachep = KMEM_CACHE(extent_status, SLAB_RECLAIM_ACCOUNT);
+ if (ext4_es_cachep == NULL)
+ return -ENOMEM;
+ return 0;
+}
+
+void ext4_exit_es(void)
+{
+ if (ext4_es_cachep)
+ kmem_cache_destroy(ext4_es_cachep);
+}
+
+void ext4_es_init_tree(struct ext4_es_tree *tree)
+{
+ tree->root = RB_ROOT;
+ tree->cache_es = NULL;
+}
+
+#ifdef ES_DEBUG__
+static void ext4_es_print_tree(struct inode *inode)
+{
+ struct ext4_es_tree *tree;
+ struct rb_node *node;
+
+ printk(KERN_DEBUG "status extents for inode %lu:", inode->i_ino);
+ tree = &EXT4_I(inode)->i_es_tree;
+ node = rb_first(&tree->root);
+ while (node) {
+ struct extent_status *es;
+ es = rb_entry(node, struct extent_status, rb_node);
+ printk(KERN_DEBUG " [%u/%u)", es->start, es->len);
+ node = rb_next(node);
+ }
+ printk(KERN_DEBUG "\n");
+}
+#else
+#define ext4_es_print_tree(inode)
+#endif
+
+static inline ext4_lblk_t extent_status_end(struct extent_status *es)
+{
+ BUG_ON(es->start + es->len < es->start);
+ return es->start + es->len - 1;
+}
+
+/*
+ * search through the tree for an delayed_extent with a given offset. If
+ * it can't be found, try to find next extent.
+ */
+static struct extent_status *__es_tree_search(struct rb_root *root,
+ ext4_lblk_t offset)
+{
+ struct rb_node *node = root->rb_node;
+ struct extent_status *es = NULL;
+
+ while (node) {
+ es = rb_entry(node, struct extent_status, rb_node);
+ if (offset < es->start)
+ node = node->rb_left;
+ else if (offset > extent_status_end(es))
+ node = node->rb_right;
+ else
+ return es;
+ }
+
+ if (es && offset < es->start)
+ return es;
+
+ if (es && offset > extent_status_end(es)) {
+ node = rb_next(&es->rb_node);
+ return node ? rb_entry(node, struct extent_status, rb_node) :
+ NULL;
+ }
+
+ return NULL;
+}
+
+/*
+ * ext4_es_find_extent: find the 1st delayed extent covering @es->start
+ * if it exists, otherwise, the next extent after @es->start.
+ *
+ * @inode: the inode which owns delayed extents
+ * @es: delayed extent that we found
+ *
+ * Returns the first block of the next extent after es, otherwise
+ * EXT_MAX_BLOCKS if no delay extent is found.
+ * Delayed extent is returned via @es.
+ */
+ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
+{
+ struct ext4_es_tree *tree = NULL;
+ struct extent_status *es1 = NULL;
+ struct rb_node *node;
+ ext4_lblk_t ret = EXT_MAX_BLOCKS;
+
+ tree = &EXT4_I(inode)->i_es_tree;
+
+ /* find delay extent in cache */
+ if (tree->cache_es) {
+ es1 = tree->cache_es;
+ if (in_range(es->start, es1->start, es1->len)) {
+ es_debug("%u cached by [%u/%u)\n",
+ es->start, es1->start, es1->len);
+ goto out;
+ }
+ }
+
+ es->len = 0;
+ es1 = __es_tree_search(&tree->root, es->start);
+
+out:
+ if (es1) {
+ tree->cache_es = es1;
+ es->start = es1->start;
+ es->len = es1->len;
+ node = rb_next(&es1->rb_node);
+ if (node) {
+ es1 = rb_entry(node, struct extent_status, rb_node);
+ ret = es1->start;
+ }
+ }
+
+ return ret;
+}
+
+static struct extent_status *
+ext4_es_alloc_extent(ext4_lblk_t start, ext4_lblk_t len)
+{
+ struct extent_status *es;
+ es = kmem_cache_alloc(ext4_es_cachep, GFP_NOFS);
+ if (es == NULL)
+ return NULL;
+ es->start = start;
+ es->len = len;
+ return es;
+}
+
+static void ext4_es_free_extent(struct extent_status *es)
+{
+ kmem_cache_free(ext4_es_cachep, es);
+}
+
+static void ext4_es_try_to_merge_left(struct ext4_es_tree *tree,
+ struct extent_status *es)
+{
+ struct extent_status *es1;
+ struct rb_node *node;
+
+ node = rb_prev(&es->rb_node);
+ if (!node)
+ return;
+
+ es1 = rb_entry(node, struct extent_status, rb_node);
+ if (es->start == extent_status_end(es1) + 1) {
+ es1->len += es->len;
+ rb_erase(&es->rb_node, &tree->root);
+ if (es == tree->cache_es)
+ tree->cache_es = es1;
+ ext4_es_free_extent(es);
+ }
+}
+
+static void ext4_es_try_to_merge_right(struct ext4_es_tree *tree,
+ struct extent_status *es)
+{
+ struct extent_status *es1;
+ struct rb_node *node;
+
+ node = rb_next(&es->rb_node);
+ if (!node)
+ return;
+
+ es1 = rb_entry(node, struct extent_status, rb_node);
+ if (es1->start == extent_status_end(es) + 1) {
+ es->len += es1->len;
+ rb_erase(node, &tree->root);
+ if (es1 == tree->cache_es)
+ tree->cache_es = es;
+ ext4_es_free_extent(es1);
+ }
+}
+
+/*
+ * ext4_es_insert_extent: adds a space to a delayed extent tree.
+ * Caller holds inode->i_data_sem.
+ *
+ * ext4_es_insert_extent is callyed by ext4_dealyed_write_begin and
+ * ext4_es_remove_extent.
+ *
+ * Return 0 on success, error code on failure.
+ */
+int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t offset,
+ ext4_lblk_t len)
+{
+ struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
+ struct rb_node **p = &tree->root.rb_node;
+ struct rb_node *parent = NULL;
+ struct extent_status *es;
+ ext4_lblk_t end = offset + len - 1;
+
+ BUG_ON(end < offset);
+
+ es = tree->cache_es;
+ es_debug("add [%u/%u) to extent status tree of inode %lu\n",
+ offset, len, inode->i_ino);
+
+ if (es && offset == (extent_status_end(es) + 1)) {
+ es_debug("cached by [%u/%u)\n", es->start, es->len);
+ es->len += len;
+ ext4_es_try_to_merge_right(tree, es);
+ goto out;
+ } else if (es && es->start == end + 1) {
+ es_debug("cached by [%u/%u)\n", es->start, es->len);
+ es->start = offset;
+ es->len += len;
+ ext4_es_try_to_merge_left(tree, es);
+ goto out;
+ } else if (es && in_range(offset, es->start, es->len)) {
+ es_debug("cached by [%u/%u)\n", es->start, es->len);
+ goto out;
+ }
+
+ while (*p) {
+ parent = *p;
+ es = rb_entry(parent, struct extent_status, rb_node);
+
+ if (offset < es->start) {
+ if (es->start == end + 1) {
+ es->len += len;
+ es->start = offset;
+ goto out;
+ }
+ p = &(*p)->rb_left;
+ } else if (offset > extent_status_end(es)) {
+ if (offset == extent_status_end(es) + 1) {
+ es->len += len;
+ goto out;
+ }
+ p = &(*p)->rb_right;
+ } else
+ goto out;
+ }
+
+ es = ext4_es_alloc_extent(offset, len);
+ if (!es)
+ return -ENOMEM;
+ rb_link_node(&es->rb_node, parent, p);
+ rb_insert_color(&es->rb_node, &tree->root);
+
+out:
+ tree->cache_es = es;
+ ext4_es_print_tree(inode);
+
+ return 0;
+}
+
+/*
+ * ext4_es_remove_extent() removes a space from a delayed extent tree.
+ * Caller holds inode->i_data_sem.
+ *
+ * Return 0 on success, error code on failure.
+ */
+int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t offset,
+ ext4_lblk_t len)
+{
+ struct rb_node *node;
+ struct ext4_es_tree *tree;
+ struct extent_status *es;
+ struct extent_status orig_es;
+ ext4_lblk_t len1, len2, end;
+
+ es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
+ offset, len, inode->i_ino);
+
+ end = offset + len - 1;
+ BUG_ON(end < offset);
+ tree = &EXT4_I(inode)->i_es_tree;
+ es = __es_tree_search(&tree->root, offset);
+ if (!es)
+ goto out;
+
+ /* Simply invalidate cache_es. */
+ tree->cache_es = NULL;
+
+ orig_es.start = es->start;
+ orig_es.len = es->len;
+ len1 = offset > es->start ? offset - es->start : 0;
+ len2 = extent_status_end(es) > end ?
+ extent_status_end(es) - end : 0;
+ if (len1 > 0)
+ es->len = len1;
+ if (len2 > 0) {
+ if (len1 > 0) {
+ int err;
+ err = ext4_es_insert_extent(inode, end + 1, len2);
+ if (err) {
+ es->start = orig_es.start;
+ es->len = orig_es.len;
+ return err;
+ }
+ } else {
+ es->start = end + 1;
+ es->len = len2;
+ }
+ goto out;
+ }
+
+ if (len1 > 0) {
+ node = rb_next(&es->rb_node);
+ if (!node)
+ es = rb_entry(node, struct extent_status, rb_node);
+ else
+ es = NULL;
+ }
+
+ while (es && extent_status_end(es) <= end) {
+ node = rb_next(&es->rb_node);
+ rb_erase(&es->rb_node, &tree->root);
+ ext4_es_free_extent(es);
+ if (!node) {
+ es = NULL;
+ break;
+ }
+ es = rb_entry(node, struct extent_status, rb_node);
+ }
+
+ if (es && es->start < end + 1) {
+ len1 = extent_status_end(es) - end;
+ es->start = end + 1;
+ es->len = len1;
+ }
+
+out:
+ ext4_es_print_tree(inode);
+ return 0;
+}
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 8be2ab9..077f82d 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -11,6 +11,15 @@
#ifndef _EXT4_EXTENTS_STATUS_H
#define _EXT4_EXTENTS_STATUS_H

+/*
+ * Turn on ES_DEBUG__ to get lots of info about extent status operations.
+ */
+#ifdef ES_DEBUG__
+#define es_debug(fmt, ...) printk(fmt, ##__VA_ARGS__)
+#else
+#define es_debug(fmt, ...) no_printk(fmt, ##__VA_ARGS__)
+#endif
+
struct extent_status {
struct rb_node rb_node;
ext4_lblk_t start; /* first block extent covers */
@@ -22,4 +31,15 @@ struct ext4_es_tree {
struct extent_status *cache_es; /* recently accessed extent */
};

+extern int __init ext4_init_es(void);
+extern void ext4_exit_es(void);
+extern void ext4_es_init_tree(struct ext4_es_tree *tree);
+
+extern int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t len);
+extern int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t start,
+ ext4_lblk_t len);
+extern ext4_lblk_t ext4_es_find_extent(struct inode *inode,
+ struct extent_status *es);
+
#endif /* _EXT4_EXTENTS_STATUS_H */
--
1.7.4.1


2012-08-22 05:56:01

by Zheng Liu

[permalink] [raw]
Subject: [RFC][PATCH 7/8 v2] ext4: reimplement ext4_find_delay_alloc_range on extent status tree

From: Zheng Liu <[email protected]>

ext4_find_delay_alloc_range is reimplemented on extent status tree.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: Allison Henderson <[email protected]>
Signed-off-by: Zheng Liu <[email protected]>
---
fs/ext4/ext4.h | 4 --
fs/ext4/ext4_extents.h | 3 +-
fs/ext4/extents.c | 112 +++++------------------------------------------
fs/ext4/inode.c | 53 +----------------------
4 files changed, 14 insertions(+), 158 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 66c592e..f1d5087 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2431,14 +2431,10 @@ enum ext4_state_bits {
* never, ever appear in a buffer_head's state
* flag. See EXT4_MAP_FROM_CLUSTER to see where
* this is used. */
- BH_Da_Mapped, /* Delayed allocated block that now has a mapping. This
- * flag is set when ext4_map_blocks is called on a
- * delayed allocated block to get its real mapping. */
};

BUFFER_FNS(Uninit, uninit)
TAS_BUFFER_FNS(Uninit, uninit)
-BUFFER_FNS(Da_Mapped, da_mapped)

/*
* Add new method to test wether block and inode bitmaps are properly
diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index cb1b2c9..603bb11 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -314,7 +314,6 @@ extern struct ext4_ext_path *ext4_ext_find_extent(struct inode *, ext4_lblk_t,
struct ext4_ext_path *);
extern void ext4_ext_drop_refs(struct ext4_ext_path *);
extern int ext4_ext_check_inode(struct inode *inode);
-extern int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk,
- int search_hint_reverse);
+extern int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk);
#endif /* _EXT4_EXTENTS */

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 44cda82..f12d7e8 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3405,115 +3405,27 @@ out:
/**
* ext4_find_delalloc_range: find delayed allocated block in the given range.
*
- * Goes through the buffer heads in the range [lblk_start, lblk_end] and returns
- * whether there are any buffers marked for delayed allocation. It returns '1'
- * on the first delalloc'ed buffer head found. If no buffer head in the given
- * range is marked for delalloc, it returns 0.
- * lblk_start should always be <= lblk_end.
- * search_hint_reverse is to indicate that searching in reverse from lblk_end to
- * lblk_start might be more efficient (i.e., we will likely hit the delalloc'ed
- * block sooner). This is useful when blocks are truncated sequentially from
- * lblk_start towards lblk_end.
+ * Return 1 if there is a delalloc block in the range, otherwise 0.
*/
static int ext4_find_delalloc_range(struct inode *inode,
ext4_lblk_t lblk_start,
- ext4_lblk_t lblk_end,
- int search_hint_reverse)
+ ext4_lblk_t lblk_end)
{
- struct address_space *mapping = inode->i_mapping;
- struct buffer_head *head, *bh = NULL;
- struct page *page;
- ext4_lblk_t i, pg_lblk;
- pgoff_t index;
-
- if (!test_opt(inode->i_sb, DELALLOC))
- return 0;
-
- /* reverse search wont work if fs block size is less than page size */
- if (inode->i_blkbits < PAGE_CACHE_SHIFT)
- search_hint_reverse = 0;
-
- if (search_hint_reverse)
- i = lblk_end;
- else
- i = lblk_start;
-
- index = i >> (PAGE_CACHE_SHIFT - inode->i_blkbits);
-
- while ((i >= lblk_start) && (i <= lblk_end)) {
- page = find_get_page(mapping, index);
- if (!page)
- goto nextpage;
-
- if (!page_has_buffers(page))
- goto nextpage;
-
- head = page_buffers(page);
- if (!head)
- goto nextpage;
-
- bh = head;
- pg_lblk = index << (PAGE_CACHE_SHIFT -
- inode->i_blkbits);
- do {
- if (unlikely(pg_lblk < lblk_start)) {
- /*
- * This is possible when fs block size is less
- * than page size and our cluster starts/ends in
- * middle of the page. So we need to skip the
- * initial few blocks till we reach the 'lblk'
- */
- pg_lblk++;
- continue;
- }
-
- /* Check if the buffer is delayed allocated and that it
- * is not yet mapped. (when da-buffers are mapped during
- * their writeout, their da_mapped bit is set.)
- */
- if (buffer_delay(bh) && !buffer_da_mapped(bh)) {
- page_cache_release(page);
- trace_ext4_find_delalloc_range(inode,
- lblk_start, lblk_end,
- search_hint_reverse,
- 1, i);
- return 1;
- }
- if (search_hint_reverse)
- i--;
- else
- i++;
- } while ((i >= lblk_start) && (i <= lblk_end) &&
- ((bh = bh->b_this_page) != head));
-nextpage:
- if (page)
- page_cache_release(page);
- /*
- * Move to next page. 'i' will be the first lblk in the next
- * page.
- */
- if (search_hint_reverse)
- index--;
- else
- index++;
- i = index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
- }
+ struct extent_status es;

- trace_ext4_find_delalloc_range(inode, lblk_start, lblk_end,
- search_hint_reverse, 0, 0);
- return 0;
+ es.start = lblk_start;
+ ext4_es_find_extent(inode, &es);
+ return (es.start + es.len) >= lblk_end && es.start <= lblk_start;
}

-int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk,
- int search_hint_reverse)
+int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
ext4_lblk_t lblk_start, lblk_end;
lblk_start = lblk & (~(sbi->s_cluster_ratio - 1));
lblk_end = lblk_start + sbi->s_cluster_ratio - 1;

- return ext4_find_delalloc_range(inode, lblk_start, lblk_end,
- search_hint_reverse);
+ return ext4_find_delalloc_range(inode, lblk_start, lblk_end);
}

/**
@@ -3574,7 +3486,7 @@ get_reserved_cluster_alloc(struct inode *inode, ext4_lblk_t lblk_start,
lblk_from = lblk_start & (~(sbi->s_cluster_ratio - 1));
lblk_to = lblk_from + c_offset - 1;

- if (ext4_find_delalloc_range(inode, lblk_from, lblk_to, 0))
+ if (ext4_find_delalloc_range(inode, lblk_from, lblk_to))
allocated_clusters--;
}

@@ -3584,7 +3496,7 @@ get_reserved_cluster_alloc(struct inode *inode, ext4_lblk_t lblk_start,
lblk_from = lblk_start + num_blks;
lblk_to = lblk_from + (sbi->s_cluster_ratio - c_offset) - 1;

- if (ext4_find_delalloc_range(inode, lblk_from, lblk_to, 0))
+ if (ext4_find_delalloc_range(inode, lblk_from, lblk_to))
allocated_clusters--;
}

@@ -3868,7 +3780,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
if (ext4_ext_in_cache(inode, map->m_lblk, &newex)) {
if (!newex.ee_start_lo && !newex.ee_start_hi) {
if ((sbi->s_cluster_ratio > 1) &&
- ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
+ ext4_find_delalloc_cluster(inode, map->m_lblk))
map->m_flags |= EXT4_MAP_FROM_CLUSTER;

if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) {
@@ -3956,7 +3868,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
}

if ((sbi->s_cluster_ratio > 1) &&
- ext4_find_delalloc_cluster(inode, map->m_lblk, 0))
+ ext4_find_delalloc_cluster(inode, map->m_lblk))
map->m_flags |= EXT4_MAP_FROM_CLUSTER;

/*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index aa04aa6..e1416c1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -484,49 +484,6 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
}

/*
- * Sets the BH_Da_Mapped bit on the buffer heads corresponding to the given map.
- */
-static void set_buffers_da_mapped(struct inode *inode,
- struct ext4_map_blocks *map)
-{
- struct address_space *mapping = inode->i_mapping;
- struct pagevec pvec;
- int i, nr_pages;
- pgoff_t index, end;
-
- index = map->m_lblk >> (PAGE_CACHE_SHIFT - inode->i_blkbits);
- end = (map->m_lblk + map->m_len - 1) >>
- (PAGE_CACHE_SHIFT - inode->i_blkbits);
-
- pagevec_init(&pvec, 0);
- while (index <= end) {
- nr_pages = pagevec_lookup(&pvec, mapping, index,
- min(end - index + 1,
- (pgoff_t)PAGEVEC_SIZE));
- if (nr_pages == 0)
- break;
- for (i = 0; i < nr_pages; i++) {
- struct page *page = pvec.pages[i];
- struct buffer_head *bh, *head;
-
- if (unlikely(page->mapping != mapping) ||
- !PageDirty(page))
- break;
-
- if (page_has_buffers(page)) {
- bh = head = page_buffers(page);
- do {
- set_buffer_da_mapped(bh);
- bh = bh->b_this_page;
- } while (bh != head);
- }
- index++;
- }
- pagevec_release(&pvec);
- }
-}
-
-/*
* The ext4_map_blocks() function tries to look up the requested blocks,
* and returns if the blocks are already mapped.
*
@@ -661,13 +618,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
ext4_clear_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED);

- /* If we have successfully mapped the delayed allocated blocks,
- * set the BH_Da_Mapped bit on them. Its important to do this
- * under the protection of i_data_sem.
- */
if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
int ret;
- set_buffers_da_mapped(inode, map);
delayed_mapped:
/* delayed allocation blocks has been allocated */
ret = ext4_es_remove_extent(inode, map->m_lblk,
@@ -1325,7 +1277,6 @@ static void ext4_da_page_release_reservation(struct page *page,
if ((offset <= curr_off) && (buffer_delay(bh))) {
to_release++;
clear_buffer_delay(bh);
- clear_buffer_da_mapped(bh);
}
curr_off = next_off;
} while ((bh = bh->b_this_page) != head);
@@ -1338,7 +1289,7 @@ static void ext4_da_page_release_reservation(struct page *page,
lblk = (page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits)) +
((num_clusters - 1) << sbi->s_cluster_bits);
if (sbi->s_cluster_ratio == 1 ||
- !ext4_find_delalloc_cluster(inode, lblk, 1))
+ !ext4_find_delalloc_cluster(inode, lblk))
ext4_da_release_space(inode, 1);

num_clusters--;
@@ -1444,8 +1395,6 @@ static int mpage_da_submit_io(struct mpage_da_data *mpd,
clear_buffer_delay(bh);
bh->b_blocknr = pblock;
}
- if (buffer_da_mapped(bh))
- clear_buffer_da_mapped(bh);
if (buffer_unwritten(bh) ||
buffer_mapped(bh))
BUG_ON(bh->b_blocknr != pblock);
--
1.7.4.1


2012-08-22 06:05:43

by Zheng Liu

[permalink] [raw]
Subject: [RFC][PATCH 6/8 v2] ext4: reimplement fiemap on extent status tree

From: Zheng Liu <[email protected]>

This patch reimplements fiemap on extent status tree.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: Allison Henderson <[email protected]>
Signed-off-by: Zheng Liu <[email protected]>
---
fs/ext4/extents.c | 186 +++++++----------------------------------------------
1 files changed, 23 insertions(+), 163 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bf5ece2..44cda82 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4514,193 +4514,53 @@ static int ext4_ext_fiemap_cb(struct inode *inode, ext4_lblk_t next,
struct ext4_ext_cache *newex, struct ext4_extent *ex,
void *data)
{
+ struct extent_status es;
__u64 logical;
__u64 physical;
__u64 length;
__u32 flags = 0;
+ ext4_lblk_t next_del;
int ret = 0;
struct fiemap_extent_info *fieinfo = data;
unsigned char blksize_bits;

- blksize_bits = inode->i_sb->s_blocksize_bits;
- logical = (__u64)newex->ec_block << blksize_bits;
+ es.start = newex->ec_block;
+ down_read(&EXT4_I(inode)->i_data_sem);
+ next_del = ext4_es_find_extent(inode, &es);
+ up_read(&EXT4_I(inode)->i_data_sem);

+ next = min(next_del, next);
if (newex->ec_start == 0) {
/*
* No extent in extent-tree contains block @newex->ec_start,
* then the block may stay in 1)a hole or 2)delayed-extent.
- *
- * Holes or delayed-extents are processed as follows.
- * 1. lookup dirty pages with specified range in pagecache.
- * If no page is got, then there is no delayed-extent and
- * return with EXT_CONTINUE.
- * 2. find the 1st mapped buffer,
- * 3. check if the mapped buffer is both in the request range
- * and a delayed buffer. If not, there is no delayed-extent,
- * then return.
- * 4. a delayed-extent is found, the extent will be collected.
*/
- ext4_lblk_t end = 0;
- pgoff_t last_offset;
- pgoff_t offset;
- pgoff_t index;
- pgoff_t start_index = 0;
- struct page **pages = NULL;
- struct buffer_head *bh = NULL;
- struct buffer_head *head = NULL;
- unsigned int nr_pages = PAGE_SIZE / sizeof(struct page *);
-
- pages = kmalloc(PAGE_SIZE, GFP_KERNEL);
- if (pages == NULL)
- return -ENOMEM;
-
- offset = logical >> PAGE_SHIFT;
-repeat:
- last_offset = offset;
- head = NULL;
- ret = find_get_pages_tag(inode->i_mapping, &offset,
- PAGECACHE_TAG_DIRTY, nr_pages, pages);
-
- if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
- /* First time, try to find a mapped buffer. */
- if (ret == 0) {
-out:
- for (index = 0; index < ret; index++)
- page_cache_release(pages[index]);
- /* just a hole. */
- kfree(pages);
- return EXT_CONTINUE;
- }
- index = 0;
-
-next_page:
- /* Try to find the 1st mapped buffer. */
- end = ((__u64)pages[index]->index << PAGE_SHIFT) >>
- blksize_bits;
- if (!page_has_buffers(pages[index]))
- goto out;
- head = page_buffers(pages[index]);
- if (!head)
- goto out;
-
- index++;
- bh = head;
- do {
- if (end >= newex->ec_block +
- newex->ec_len)
- /* The buffer is out of
- * the request range.
- */
- goto out;
-
- if (buffer_mapped(bh) &&
- end >= newex->ec_block) {
- start_index = index - 1;
- /* get the 1st mapped buffer. */
- goto found_mapped_buffer;
- }
-
- bh = bh->b_this_page;
- end++;
- } while (bh != head);
-
- /* No mapped buffer in the range found in this page,
- * We need to look up next page.
- */
- if (index >= ret) {
- /* There is no page left, but we need to limit
- * newex->ec_len.
- */
- newex->ec_len = end - newex->ec_block;
- goto out;
- }
- goto next_page;
- } else {
- /*Find contiguous delayed buffers. */
- if (ret > 0 && pages[0]->index == last_offset)
- head = page_buffers(pages[0]);
- bh = head;
- index = 1;
- start_index = 0;
+ if (es.len == 0)
+ /* A hole found. */
+ return EXT_CONTINUE;
+
+ if (es.start > newex->ec_block) {
+ /* A hole found. */
+ newex->ec_len = min(es.start - newex->ec_block,
+ newex->ec_len);
+ return EXT_CONTINUE;
}

-found_mapped_buffer:
- if (bh != NULL && buffer_delay(bh)) {
- /* 1st or contiguous delayed buffer found. */
- if (!(flags & FIEMAP_EXTENT_DELALLOC)) {
- /*
- * 1st delayed buffer found, record
- * the start of extent.
- */
- flags |= FIEMAP_EXTENT_DELALLOC;
- newex->ec_block = end;
- logical = (__u64)end << blksize_bits;
- }
- /* Find contiguous delayed buffers. */
- do {
- if (!buffer_delay(bh))
- goto found_delayed_extent;
- bh = bh->b_this_page;
- end++;
- } while (bh != head);
-
- for (; index < ret; index++) {
- if (!page_has_buffers(pages[index])) {
- bh = NULL;
- break;
- }
- head = page_buffers(pages[index]);
- if (!head) {
- bh = NULL;
- break;
- }
-
- if (pages[index]->index !=
- pages[start_index]->index + index
- - start_index) {
- /* Blocks are not contiguous. */
- bh = NULL;
- break;
- }
- bh = head;
- do {
- if (!buffer_delay(bh))
- /* Delayed-extent ends. */
- goto found_delayed_extent;
- bh = bh->b_this_page;
- end++;
- } while (bh != head);
- }
- } else if (!(flags & FIEMAP_EXTENT_DELALLOC))
- /* a hole found. */
- goto out;
-
-found_delayed_extent:
- newex->ec_len = min(end - newex->ec_block,
- (ext4_lblk_t)EXT_INIT_MAX_LEN);
- if (ret == nr_pages && bh != NULL &&
- newex->ec_len < EXT_INIT_MAX_LEN &&
- buffer_delay(bh)) {
- /* Have not collected an extent and continue. */
- for (index = 0; index < ret; index++)
- page_cache_release(pages[index]);
- goto repeat;
- }
-
- for (index = 0; index < ret; index++)
- page_cache_release(pages[index]);
- kfree(pages);
+ flags |= FIEMAP_EXTENT_DELALLOC;
+ newex->ec_len = es.start + es.len - newex->ec_block;
}

- physical = (__u64)newex->ec_start << blksize_bits;
- length = (__u64)newex->ec_len << blksize_bits;
-
if (ex && ext4_ext_is_uninitialized(ex))
flags |= FIEMAP_EXTENT_UNWRITTEN;

if (next == EXT_MAX_BLOCKS)
flags |= FIEMAP_EXTENT_LAST;

+ blksize_bits = inode->i_sb->s_blocksize_bits;
+ logical = (__u64)newex->ec_block << blksize_bits;
+ physical = (__u64)newex->ec_start << blksize_bits;
+ length = (__u64)newex->ec_len << blksize_bits;
+
ret = fiemap_fill_next_extent(fieinfo, logical, physical,
length, flags);
if (ret < 0)
--
1.7.4.1


2012-08-22 05:56:04

by Zheng Liu

[permalink] [raw]
Subject: [RFC][PATCH 8/8 v2] ext4: introduce lseek SEEK_DATA/SEEK_HOLE support

From: Zheng Liu <[email protected]>

This patch makes ext4 really support SEEK_DATA/SEEK_HOLE flags. Extent-based
and block-based files are implemented together because ext4_map_blocks hides
this difference.

CC: Hugh Dickins <[email protected]>
CC: Jie Liu <[email protected]>
Signed-off-by: Zheng Liu <[email protected]>
---
[I notice that Jeff has an implementation of SEEK_DATA/SEEK_HOLE for ext4. So
Cc' to Jeff]

[Cc' to Hugh because tmpfs's folks are thinking about whether add this feature
or not]

fs/ext4/ext4.h | 1 +
fs/ext4/file.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/ext4/indirect.c | 1 -
3 files changed, 152 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f1d5087..149851b 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2398,6 +2398,7 @@ extern int ext4_map_blocks(handle_t *handle, struct inode *inode,
struct ext4_map_blocks *map, int flags);
extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len);
+
/* move_extent.c */
extern int ext4_move_extents(struct file *o_filp, struct file *d_filp,
__u64 start_orig, __u64 start_donor,
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 3b0e3bd..cfdf774 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -285,6 +285,145 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
return dquot_file_open(inode, filp);
}

+static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
+{
+ struct inode *inode = file->f_mapping->host;
+ struct ext4_map_blocks map;
+ struct extent_status es;
+ ext4_lblk_t start_block, end_block, len, delblk;
+ loff_t dataoff, isize;
+ int blkbits;
+ int ret = 0;
+
+ mutex_lock(&inode->i_mutex);
+
+ blkbits = inode->i_sb->s_blocksize_bits;
+ isize = i_size_read(inode);
+ start_block = offset >> blkbits;
+ end_block = isize >> blkbits;
+ len = isize - offset + 1;
+
+ do {
+ map.m_lblk = start_block;
+ map.m_len = len >> blkbits;
+
+ ret = ext4_map_blocks(NULL, inode, &map, 0);
+
+ if (ret > 0) {
+ dataoff = start_block << blkbits;
+ break;
+ } else {
+ /* search in extent status */
+ es.start = start_block;
+ down_read(&EXT4_I(inode)->i_data_sem);
+ delblk = ext4_es_find_extent(inode, &es);
+ up_read(&EXT4_I(inode)->i_data_sem);
+
+ if (start_block >= es.start &&
+ start_block < es.start + es.len) {
+ dataoff = start_block << blkbits;
+ break;
+ }
+
+ start_block++;
+
+ /*
+ * currently hole punching doesn't change the size of
+ * file. So after hole punching, maybe there has a
+ * hole at the end of file.
+ */
+ if (start_block > end_block) {
+ dataoff = -ENXIO;
+ break;
+ }
+ }
+ } while (1);
+
+ mutex_unlock(&inode->i_mutex);
+ return dataoff;
+}
+
+static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
+{
+ struct inode *inode = file->f_mapping->host;
+ struct ext4_map_blocks map;
+ struct extent_status es;
+ ext4_lblk_t start_block, end_block, len, delblk;
+ loff_t holeoff, isize;
+ int blkbits;
+ int ret = 0;
+
+ mutex_lock(&inode->i_mutex);
+
+ blkbits = inode->i_sb->s_blocksize_bits;
+ isize = i_size_read(inode);
+ start_block = offset >> blkbits;
+ end_block = isize >> blkbits;
+ len = isize - offset + 1;
+
+ do {
+ map.m_lblk = start_block;
+ map.m_len = len >> blkbits;
+
+ ret = ext4_map_blocks(NULL, inode, &map, 0);
+
+ if (ret > 0) {
+ /* skip this extent */
+ start_block += ret;
+ if (start_block > end_block) {
+ holeoff = isize;
+ break;
+ }
+ } else {
+ /* search in extent status */
+ es.start = start_block;
+ down_read(&EXT4_I(inode)->i_data_sem);
+ delblk = ext4_es_find_extent(inode, &es);
+ up_read(&EXT4_I(inode)->i_data_sem);
+
+ if (start_block >= es.start &&
+ start_block < es.start + es.len) {
+ /* skip this delay extent */
+ start_block = es.start + es.len;
+ continue;
+ }
+
+ holeoff = start_block << blkbits;
+ break;
+ }
+ } while (1);
+
+ mutex_unlock(&inode->i_mutex);
+ return holeoff;
+}
+
+static loff_t ext4_seek_data_hole(struct file *file, loff_t offset,
+ int origin, loff_t maxsize)
+{
+ struct inode *inode = file->f_mapping->host;
+
+ BUG_ON((origin != SEEK_DATA) && (origin != SEEK_HOLE));
+
+ if (offset >= i_size_read(inode))
+ return -ENXIO;
+
+ if (offset < 0 && !(file->f_mode & FMODE_UNSIGNED_OFFSET))
+ return -EINVAL;
+ if (offset > maxsize)
+ return -EINVAL;
+
+ switch (origin) {
+ case SEEK_DATA:
+ return ext4_seek_data(file, offset, maxsize);
+ case SEEK_HOLE:
+ return ext4_seek_hole(file, offset, maxsize);
+ default:
+ ext4_error(inode->i_sb, "Unknown origin");
+ }
+
+ return -EINVAL;
+}
+
/*
* ext4_llseek() handles both block-mapped and extent-mapped maxbytes values
* by calling generic_file_llseek_size() with the appropriate maxbytes
@@ -300,8 +439,18 @@ loff_t ext4_llseek(struct file *file, loff_t offset, int origin)
else
maxbytes = inode->i_sb->s_maxbytes;

- return generic_file_llseek_size(file, offset, origin,
- maxbytes, i_size_read(inode));
+ switch (origin) {
+ case SEEK_END:
+ case SEEK_CUR:
+ case SEEK_SET:
+ return generic_file_llseek_size(file, offset, origin,
+ maxbytes, i_size_read(inode));
+ case SEEK_DATA:
+ case SEEK_HOLE:
+ return ext4_seek_data_hole(file, offset, origin, maxbytes);
+ }
+
+ return -EINVAL;
}

const struct file_operations ext4_file_operations = {
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 9db67b1..9917c42 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -1502,4 +1502,3 @@ out_stop:
ext4_journal_stop(handle);
trace_ext4_truncate_exit(inode);
}

2012-09-19 18:34:52

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/8 v2] ext4: add operations on extent status tree

On Wed, Aug 22, 2012 at 02:05:39PM +0800, Zheng Liu wrote:
> + * 3. performance analysis
> + * -- overhead
> + * 1. Apart from operations on a delayed extent tree, we need to
> + * down_write(inode->i_data_sem) in delayed write path to maintain delayed
> + * extent tree, this can have impact on parallel read-write and write-write

I'm working on going through this patch set now, and I'm not sure this
is worth holding back on this patch series, but I am really concerned
about the performance impact of this.... it would definitely show up
on some of the scalability testing that Eric Whitney had been doing,
for example.

Given that operations on the delayed extent tree should be fast,
instead of using a mutex, any reason why we can't just add a new
spinlock (I'm not even sure we need a rw_spinlock here) to the
ext4_inode_info structure and use that to serialize operations on the
delayed extent tree?

- Ted

2012-09-19 18:41:23

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/8 v2] ext4: add operations on extent status tree

On Wed, Aug 22, 2012 at 02:05:39PM +0800, Zheng Liu wrote:
> + * Extents status encompass delayed extents and extent locks

I've looked over this patch set, and as near as I can tell, we aren't
(yet) using the extents status structure to do extent-level locking.
I could imagine using that in the future so we aren't using i_data_sem
to lock out the entire tree, so block allocations could happen in
parallel, but that's not here yet.

Is there something I'm missing, or was something else meant by "extent
locks" here?

Thanks,

- Ted

2012-09-19 18:53:38

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Wed, Aug 22, 2012 at 02:05:40PM +0800, Zheng Liu wrote:
> From: Zheng Liu <[email protected]>
>
> Let ext4 initialize extent status tree of an inode.
>
> Signed-off-by: Yongqiang Yang <[email protected]>
> Signed-off-by: Allison Henderson <[email protected]>
> Signed-off-by: Zheng Liu <[email protected]>

One general comment --- "Signed-off-by" should only be used when
someone actually handles a patch; that is, you add the Signed-off-by
for yourself if you originally wrote the patch, or if you started with
someone else's patch and modified it.

If you merely reviewed the patch, you can request that the patch
author add a "Reviewed-by: " header.

The other footer that you will sometimes see is "Acked-by: ". This
gets used by subsystem maintainers who approve of a patch, but instead
of their handling the patch, they are going to let the patch flow into
the upstream via some other tree.

So for example, Lukas's invalidate_page_range patch set will require
an "Acked-by: " footer from the XFS, OCFS2, and mm maintainers for
those commits which touch their subsystems. Once we've gotten an OK
from those folks, then when I accept those patches into the ext4 tree,
I will add my "Signed-off-by: " header since the patches flowed
through my tree.

These are minor points, but Linus will sometimes complain if these
footers aren't used correctly, since Signed-off-by: does have a very
specific legal meaning. See: http://elinux.org/Developer_Certificate_Of_Origin

In any case, I've changed the Signed-off-by footers for Yongqiang Yang
and Allison Heanderson to be "Reviewed-by: ", in the commit
descriptions, since I believe that should be the more appropriate
footer to be using in this case. (If you did actually use some code
from Yongqiang or Allison from patches which they had previously sent
with a DCO, then Signed-off-by would be correct. Let me know if
that's the case....)

Regards,

- Ted


2012-09-19 19:05:49

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Wed, Aug 22, 2012 at 02:05:40PM +0800, Zheng Liu wrote:
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 3e0851e..353b1fd 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -944,6 +944,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
> memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
> INIT_LIST_HEAD(&ei->i_prealloc_list);
> spin_lock_init(&ei->i_prealloc_lock);
> + ext4_es_init_tree(&ei->i_es_tree);
> ei->i_reserved_data_blocks = 0;
> ei->i_reserved_meta_blocks = 0;
> ei->i_allocated_meta_blocks = 0;

This patch hunk immediately me ask, "so when does the extent_status
tree get freed?" And I believe the answer is that currently, since it
only tracks delayed extents (and we're not using it for locking
purposes), by the time we have evicted the inode and are ready to call
ext4_clear_inode(), we should have released all of the nodes in the
ext4_es_tree. Is that correct?

If so, we might want to think about adding a sanity check to make sure
that by the time we are done with the inode in ext4_evict_inode()
(after we have forced writeback), the ext4_es_tree is empty. Agreed?

- Ted

2012-09-20 14:41:38

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8 v2] ext4: extent status tree (step 1)

Hi Zheng,

I've applied your patches to the ext4 patch queue, and so they have
shown up in the dev branch of the ext4 git tree.

One thing I've noticed in my regression tests is that there is a new
test failure with this patch series applied. It is xfstests #230,
when bigalloc is enabled:

FSTYP -- ext4
PLATFORM -- Linux/i686 candygram 3.6.0-rc1-00043-ga3229f7
MKFS_OPTIONS -- -q -O bigalloc /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc

230 - output mismatch (see 230.out.bad)
--- 230.out 2012-09-19 01:15:44.000000000 -0400
+++ 230.out.bad 2012-09-20 10:37:05.143728890 -0400
@@ -5,7 +5,9 @@
### create files, setting up ownership (type=u)
### some buffered IO (type=u)
Write 900k...
+pwrite64: Disk quota exceeded
Rewrite 1001k...
+pwrite64: Disk quota exceeded
Write 1000k...
pwrite64: Disk quota exceeded
Write 4096...
@@ -21,7 +23,9 @@
### create files, setting up ownership (type=g)
### some buffered IO (type=g)
Write 900k...
+pwrite64: Disk quota exceeded
Rewrite 1001k...
+pwrite64: Disk quota exceeded
Write 1000k...
pwrite64: Disk quota exceeded
Write 4096...
Ran: 230
Failures: 230
Failed 1 of 1 tests

I haven't had time to take a look at this, and it's only failing when
bigalloc is enabled. If you could take a look I would greatly
appreciate it.

Thanks!!

- Ted

2012-09-21 01:41:11

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8 v2] ext4: extent status tree (step 1)

On Thu, Sep 20, 2012 at 10:41:26AM -0400, Theodore Ts'o wrote:
> Hi Zheng,
>
> I've applied your patches to the ext4 patch queue, and so they have
> shown up in the dev branch of the ext4 git tree.
>
> One thing I've noticed in my regression tests is that there is a new
> test failure with this patch series applied. It is xfstests #230,
> when bigalloc is enabled:
>
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.6.0-rc1-00043-ga3229f7
> MKFS_OPTIONS -- -q -O bigalloc /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc
>
> 230 - output mismatch (see 230.out.bad)
> --- 230.out 2012-09-19 01:15:44.000000000 -0400
> +++ 230.out.bad 2012-09-20 10:37:05.143728890 -0400
> @@ -5,7 +5,9 @@
> ### create files, setting up ownership (type=u)
> ### some buffered IO (type=u)
> Write 900k...
> +pwrite64: Disk quota exceeded
> Rewrite 1001k...
> +pwrite64: Disk quota exceeded
> Write 1000k...
> pwrite64: Disk quota exceeded
> Write 4096...
> @@ -21,7 +23,9 @@
> ### create files, setting up ownership (type=g)
> ### some buffered IO (type=g)
> Write 900k...
> +pwrite64: Disk quota exceeded
> Rewrite 1001k...
> +pwrite64: Disk quota exceeded
> Write 1000k...
> pwrite64: Disk quota exceeded
> Write 4096...
> Ran: 230
> Failures: 230
> Failed 1 of 1 tests
>
> I haven't had time to take a look at this, and it's only failing when
> bigalloc is enabled. If you could take a look I would greatly
> appreciate it.

Hi Ted,

Sorry for delay reply. I will look at this problem after I finish
inline data patch set for e2fsprogs. Thanks for your review.

Regards,
Zheng

2012-09-21 03:20:15

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8 v2] ext4: extent status tree (step 1)

Hi Ted,

I notices you cares Signed-off line, so I just throw a message.

I noticed you changed all Signed-off line of patches, some patches are
written by me initially and is modified by Allison and Zheng. So I
think Signed-off line should be me, Allison and Zheng. Do I
misunderstand the rules?

Yongqiang.

On Thu, Sep 20, 2012 at 10:41 PM, Theodore Ts'o <[email protected]> wrote:
> Hi Zheng,
>
> I've applied your patches to the ext4 patch queue, and so they have
> shown up in the dev branch of the ext4 git tree.
>
> One thing I've noticed in my regression tests is that there is a new
> test failure with this patch series applied. It is xfstests #230,
> when bigalloc is enabled:
>
> FSTYP -- ext4
> PLATFORM -- Linux/i686 candygram 3.6.0-rc1-00043-ga3229f7
> MKFS_OPTIONS -- -q -O bigalloc /dev/vdc
> MOUNT_OPTIONS -- -o acl,user_xattr -o block_validity /dev/vdc /vdc
>
> 230 - output mismatch (see 230.out.bad)
> --- 230.out 2012-09-19 01:15:44.000000000 -0400
> +++ 230.out.bad 2012-09-20 10:37:05.143728890 -0400
> @@ -5,7 +5,9 @@
> ### create files, setting up ownership (type=u)
> ### some buffered IO (type=u)
> Write 900k...
> +pwrite64: Disk quota exceeded
> Rewrite 1001k...
> +pwrite64: Disk quota exceeded
> Write 1000k...
> pwrite64: Disk quota exceeded
> Write 4096...
> @@ -21,7 +23,9 @@
> ### create files, setting up ownership (type=g)
> ### some buffered IO (type=g)
> Write 900k...
> +pwrite64: Disk quota exceeded
> Rewrite 1001k...
> +pwrite64: Disk quota exceeded
> Write 1000k...
> pwrite64: Disk quota exceeded
> Write 4096...
> Ran: 230
> Failures: 230
> Failed 1 of 1 tests
>
> I haven't had time to take a look at this, and it's only failing when
> bigalloc is enabled. If you could take a look I would greatly
> appreciate it.
>
> Thanks!!
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



--
Best Wishes
Yongqiang Yang

2012-09-22 00:02:25

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8 v2] ext4: extent status tree (step 1)

On Fri, Sep 21, 2012 at 11:19:54AM +0800, Yongqiang Yang wrote:
> Hi Ted,
>
> I notices you cares Signed-off line, so I just throw a message.
>
> I noticed you changed all Signed-off line of patches, some patches are
> written by me initially and is modified by Allison and Zheng. So I
> think Signed-off line should be me, Allison and Zheng. Do I
> misunderstand the rules?

Thanks, I didn't realize that you and Allison had written parts of the
patches. Yes, in that case it should be "Signed-off-by". I've made
this change to my tree. (And this is why I don't cast commits into
stone for a few days before I advance the master branch pointer.)

- Ted

2012-09-24 03:05:57

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/8 v2] ext4: extent status tree (step 1)

On Fri, Sep 21, 2012 at 08:02:17PM -0400, Theodore Ts'o wrote:
> On Fri, Sep 21, 2012 at 11:19:54AM +0800, Yongqiang Yang wrote:
> > Hi Ted,
> >
> > I notices you cares Signed-off line, so I just throw a message.
> >
> > I noticed you changed all Signed-off line of patches, some patches are
> > written by me initially and is modified by Allison and Zheng. So I
> > think Signed-off line should be me, Allison and Zheng. Do I
> > misunderstand the rules?
>
> Thanks, I didn't realize that you and Allison had written parts of the
> patches. Yes, in that case it should be "Signed-off-by". I've made
> this change to my tree. (And this is why I don't cast commits into
> stone for a few days before I advance the master branch pointer.)

Thanks Yongqiang to point out it. Indeed Yongqiang and Allison had
written parts of this patch set.

Regards,
Zheng

2012-09-24 03:24:35

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/8 v2] ext4: add operations on extent status tree

On Wed, Sep 19, 2012 at 02:41:14PM -0400, Theodore Ts'o wrote:
> On Wed, Aug 22, 2012 at 02:05:39PM +0800, Zheng Liu wrote:
> > + * Extents status encompass delayed extents and extent locks
>
> I've looked over this patch set, and as near as I can tell, we aren't
> (yet) using the extents status structure to do extent-level locking.
> I could imagine using that in the future so we aren't using i_data_sem
> to lock out the entire tree, so block allocations could happen in
> parallel, but that's not here yet.
>
> Is there something I'm missing, or was something else meant by "extent
> locks" here?

Hi Ted,

In my plan, the first step for extent status tree is only to record
delay extent in memory. It can simplify the implementation of fiemap
and bigalloc, and introduce lseek SEEK_DATA/SEEK_HOLE support. That is
all I want to finish in first step. I am very glad to see that these
patches has been applied into dev branch even though it still has some
problems. I will look at these problem ASAP.

Indeed, in this patch set, extent-level locking is still not used
because that is my next step in my plan [1]. I will try to use it
after these patches are fully tested and applied. Thanks.

1. http://www.spinics.net/lists/linux-fsdevel/msg56291.html

Regards,
Zheng

2012-09-24 04:25:12

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/8 v2] ext4: add operations on extent status tree

On Wed, Sep 19, 2012 at 02:34:52PM -0400, Theodore Ts'o wrote:
> On Wed, Aug 22, 2012 at 02:05:39PM +0800, Zheng Liu wrote:
> > + * 3. performance analysis
> > + * -- overhead
> > + * 1. Apart from operations on a delayed extent tree, we need to
> > + * down_write(inode->i_data_sem) in delayed write path to maintain delayed
> > + * extent tree, this can have impact on parallel read-write and write-write
>
> I'm working on going through this patch set now, and I'm not sure this
> is worth holding back on this patch series, but I am really concerned
> about the performance impact of this.... it would definitely show up
> on some of the scalability testing that Eric Whitney had been doing,
> for example.

Sorry, maybe I miss some mails. Could you please tell me where I can
find Eric's mail. Thanks.

>
> Given that operations on the delayed extent tree should be fast,
> instead of using a mutex, any reason why we can't just add a new
> spinlock (I'm not even sure we need a rw_spinlock here) to the
> ext4_inode_info structure and use that to serialize operations on the
> delayed extent tree?

Thanks for your comment. I will fix it.

Regards,
Zheng

2012-09-24 04:34:30

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Wed, Sep 19, 2012 at 03:05:41PM -0400, Theodore Ts'o wrote:
> On Wed, Aug 22, 2012 at 02:05:40PM +0800, Zheng Liu wrote:
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 3e0851e..353b1fd 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -944,6 +944,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
> > memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
> > INIT_LIST_HEAD(&ei->i_prealloc_list);
> > spin_lock_init(&ei->i_prealloc_lock);
> > + ext4_es_init_tree(&ei->i_es_tree);
> > ei->i_reserved_data_blocks = 0;
> > ei->i_reserved_meta_blocks = 0;
> > ei->i_allocated_meta_blocks = 0;
>
> This patch hunk immediately me ask, "so when does the extent_status
> tree get freed?" And I believe the answer is that currently, since it
> only tracks delayed extents (and we're not using it for locking
> purposes), by the time we have evicted the inode and are ready to call
> ext4_clear_inode(), we should have released all of the nodes in the
> ext4_es_tree. Is that correct?
>
> If so, we might want to think about adding a sanity check to make sure
> that by the time we are done with the inode in ext4_evict_inode()
> (after we have forced writeback), the ext4_es_tree is empty. Agreed?

Yes, I agree with you. I will add a sanity check. :-)

Regards,
Zheng

2012-09-25 12:32:16

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Wed, Sep 19, 2012 at 03:05:41PM -0400, Theodore Ts'o wrote:
> On Wed, Aug 22, 2012 at 02:05:40PM +0800, Zheng Liu wrote:
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index 3e0851e..353b1fd 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -944,6 +944,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
> > memset(&ei->i_cached_extent, 0, sizeof(struct ext4_ext_cache));
> > INIT_LIST_HEAD(&ei->i_prealloc_list);
> > spin_lock_init(&ei->i_prealloc_lock);
> > + ext4_es_init_tree(&ei->i_es_tree);
> > ei->i_reserved_data_blocks = 0;
> > ei->i_reserved_meta_blocks = 0;
> > ei->i_allocated_meta_blocks = 0;
>
> This patch hunk immediately me ask, "so when does the extent_status
> tree get freed?" And I believe the answer is that currently, since it
> only tracks delayed extents (and we're not using it for locking
> purposes), by the time we have evicted the inode and are ready to call
> ext4_clear_inode(), we should have released all of the nodes in the
> ext4_es_tree. Is that correct?
>
> If so, we might want to think about adding a sanity check to make sure
> that by the time we are done with the inode in ext4_evict_inode()
> (after we have forced writeback), the ext4_es_tree is empty. Agreed?

Hi Ted,

Today I revise this patch again, and I find extent_status_tree is freed
in ext4_clear_inode(). So maybe I don't think that we need to check
this tree to be freed in ext4_evict_inode(). This change is in this
patch '[RFC][PATCH 4/8 v2] ext4: let ext4 maintain extent status tree'.
What's your opinion?

Regards,
Zheng

2012-09-25 20:59:21

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Tue, Sep 25, 2012 at 08:42:52PM +0800, Zheng Liu wrote:
> > If so, we might want to think about adding a sanity check to make sure
> > that by the time we are done with the inode in ext4_evict_inode()
> > (after we have forced writeback), the ext4_es_tree is empty. Agreed?
>
> Today I revise this patch again, and I find extent_status_tree is freed
> in ext4_clear_inode(). So maybe I don't think that we need to check
> this tree to be freed in ext4_evict_inode(). This change is in this
> patch '[RFC][PATCH 4/8 v2] ext4: let ext4 maintain extent status tree'.
> What's your opinion?

When you say "revise this patch again", does that mean that you would
like to submit a new set of patch series with changes? Or just that
you are looking at this patch set again?

It's certainly true that ext4_evict_inode() will call
ext4_clear_inode(), so it's not a question of worrying about a memory
leak. I was thinking more about doing this as a cheap sanity check
for the data structure. By the time we call ext4_evict_inode(), the
mm layer all writeback should be complete. Hence, all of the entries
to the tree _should_ have been removed by the time we call
ext4_evict_inode().

I don't know if this is going to change as you start using this data
structure for other purposes (such as locking a range of pages), but
if I understand how things are currently working, it _should_ be the
case that when ext4_evict_inode() calls ext4_clear_inode(), the call
to ext4_es_remove_extent() should be a no-op, since all of the nodes
in the extent status tree should have been released by then. If it
isn't, then either I'm not understanding the code, or there's a bug in
the code.

- Ted

2012-09-26 02:09:55

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Tue, Sep 25, 2012 at 04:59:21PM -0400, Theodore Ts'o wrote:
> On Tue, Sep 25, 2012 at 08:42:52PM +0800, Zheng Liu wrote:
> > > If so, we might want to think about adding a sanity check to make sure
> > > that by the time we are done with the inode in ext4_evict_inode()
> > > (after we have forced writeback), the ext4_es_tree is empty. Agreed?
> >
> > Today I revise this patch again, and I find extent_status_tree is freed
> > in ext4_clear_inode(). So maybe I don't think that we need to check
> > this tree to be freed in ext4_evict_inode(). This change is in this
> > patch '[RFC][PATCH 4/8 v2] ext4: let ext4 maintain extent status tree'.
> > What's your opinion?
>
> When you say "revise this patch again", does that mean that you would
> like to submit a new set of patch series with changes? Or just that
> you are looking at this patch set again?

Yes, I prepare to submit a new patch set.

>
> It's certainly true that ext4_evict_inode() will call
> ext4_clear_inode(), so it's not a question of worrying about a memory
> leak. I was thinking more about doing this as a cheap sanity check
> for the data structure. By the time we call ext4_evict_inode(), the
> mm layer all writeback should be complete. Hence, all of the entries
> to the tree _should_ have been removed by the time we call
> ext4_evict_inode().
>
> I don't know if this is going to change as you start using this data
> structure for other purposes (such as locking a range of pages), but
> if I understand how things are currently working, it _should_ be the
> case that when ext4_evict_inode() calls ext4_clear_inode(), the call
> to ext4_es_remove_extent() should be a no-op, since all of the nodes
> in the extent status tree should have been released by then. If it
> isn't, then either I'm not understanding the code, or there's a bug in
> the code.

Yes, you are right. In currently implementation, extent status tree
only maintains the status of delay extents. So in ext4_evict_inode()
extent status tree should be an empty tree. I will add a sanity check
to ensure it. Thanks for your explanation.

Regards,
Zheng

2012-09-26 02:47:45

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Wed, Sep 26, 2012 at 10:09:55AM +0800, Zheng Liu wrote:
> On Tue, Sep 25, 2012 at 04:59:21PM -0400, Theodore Ts'o wrote:
> > On Tue, Sep 25, 2012 at 08:42:52PM +0800, Zheng Liu wrote:
> > > > If so, we might want to think about adding a sanity check to make sure
> > > > that by the time we are done with the inode in ext4_evict_inode()
> > > > (after we have forced writeback), the ext4_es_tree is empty. Agreed?
> > >
> > > Today I revise this patch again, and I find extent_status_tree is freed
> > > in ext4_clear_inode(). So maybe I don't think that we need to check
> > > this tree to be freed in ext4_evict_inode(). This change is in this
> > > patch '[RFC][PATCH 4/8 v2] ext4: let ext4 maintain extent status tree'.
> > > What's your opinion?
> >
> > When you say "revise this patch again", does that mean that you would
> > like to submit a new set of patch series with changes? Or just that
> > you are looking at this patch set again?
>
> Yes, I prepare to submit a new patch set.

Well, note that the merge window is opening *soon*. I haven't yet
moved the master branch, so I can update the patch set, but I'm going
to need it soon.

Can you let me know what changes you need to make? If it is to add
new features or new sanity checks, does it make sense to simply make
it as new commits to existing patch set? Or are there fundamental
problems with the current set, that would be better to fix in the
current set of commits? (Or is it just minor stylistic/spelling
fixes?)

Thanks!!

- Ted

2012-09-26 03:13:50

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Tue, Sep 25, 2012 at 10:47:45PM -0400, Theodore Ts'o wrote:
> On Wed, Sep 26, 2012 at 10:09:55AM +0800, Zheng Liu wrote:
> > On Tue, Sep 25, 2012 at 04:59:21PM -0400, Theodore Ts'o wrote:
> > > On Tue, Sep 25, 2012 at 08:42:52PM +0800, Zheng Liu wrote:
> > > > > If so, we might want to think about adding a sanity check to make sure
> > > > > that by the time we are done with the inode in ext4_evict_inode()
> > > > > (after we have forced writeback), the ext4_es_tree is empty. Agreed?
> > > >
> > > > Today I revise this patch again, and I find extent_status_tree is freed
> > > > in ext4_clear_inode(). So maybe I don't think that we need to check
> > > > this tree to be freed in ext4_evict_inode(). This change is in this
> > > > patch '[RFC][PATCH 4/8 v2] ext4: let ext4 maintain extent status tree'.
> > > > What's your opinion?
> > >
> > > When you say "revise this patch again", does that mean that you would
> > > like to submit a new set of patch series with changes? Or just that
> > > you are looking at this patch set again?
> >
> > Yes, I prepare to submit a new patch set.
>
> Well, note that the merge window is opening *soon*. I haven't yet
> moved the master branch, so I can update the patch set, but I'm going
> to need it soon.
>
> Can you let me know what changes you need to make? If it is to add
> new features or new sanity checks, does it make sense to simply make
> it as new commits to existing patch set? Or are there fundamental
> problems with the current set, that would be better to fix in the
> current set of commits? (Or is it just minor stylistic/spelling
> fixes?)
>
> Thanks!!

In new patch set, there is three changes as beblow:

1. add a sanity check in ext4_evict_inode()
2. fix a bug in ext4_find_delalloc_range(). This bug is reported by
xfstest #230 when we enable bigalloc feature.
3. Add a new rwlock to protect extent status tree.

So I think that we can only add a sanity check and fix the bigalloc bug,
and then apply this patch set because the changes are minor. For adding
a new lock to protect extent status tree, we can add this feature in a
new patch. If you think it is OK, I can generate a new patch set, do
some tests using xfstest, and submit it as soon as possible. What's
your opinion?

Regards,
Zheng

2012-09-26 03:37:40

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Wed, Sep 26, 2012 at 11:24:26AM +0800, Zheng Liu wrote:
> > Can you let me know what changes you need to make? If it is to add
> > new features or new sanity checks, does it make sense to simply make
> > it as new commits to existing patch set? Or are there fundamental
> > problems with the current set, that would be better to fix in the
> > current set of commits? (Or is it just minor stylistic/spelling
> > fixes?)
> >
> > Thanks!!
>
> In new patch set, there is three changes as beblow:
>
> 1. add a sanity check in ext4_evict_inode()
> 2. fix a bug in ext4_find_delalloc_range(). This bug is reported by
> xfstest #230 when we enable bigalloc feature.
> 3. Add a new rwlock to protect extent status tree.
>
> So I think that we can only add a sanity check and fix the bigalloc bug,
> and then apply this patch set because the changes are minor. For adding
> a new lock to protect extent status tree, we can add this feature in a
> new patch. If you think it is OK, I can generate a new patch set, do
> some tests using xfstest, and submit it as soon as possible. What's
> your opinion?

Do you think you can get me the patches by the end of the week? If
so, that should work.

Thanks!!

- Ted

2012-09-26 03:54:48

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Tue, Sep 25, 2012 at 11:37:40PM -0400, Theodore Ts'o wrote:
> On Wed, Sep 26, 2012 at 11:24:26AM +0800, Zheng Liu wrote:
> > > Can you let me know what changes you need to make? If it is to add
> > > new features or new sanity checks, does it make sense to simply make
> > > it as new commits to existing patch set? Or are there fundamental
> > > problems with the current set, that would be better to fix in the
> > > current set of commits? (Or is it just minor stylistic/spelling
> > > fixes?)
> > >
> > > Thanks!!
> >
> > In new patch set, there is three changes as beblow:
> >
> > 1. add a sanity check in ext4_evict_inode()
> > 2. fix a bug in ext4_find_delalloc_range(). This bug is reported by
> > xfstest #230 when we enable bigalloc feature.
> > 3. Add a new rwlock to protect extent status tree.
> >
> > So I think that we can only add a sanity check and fix the bigalloc bug,
> > and then apply this patch set because the changes are minor. For adding
> > a new lock to protect extent status tree, we can add this feature in a
> > new patch. If you think it is OK, I can generate a new patch set, do
> > some tests using xfstest, and submit it as soon as possible. What's
> > your opinion?
>
> Do you think you can get me the patches by the end of the week? If
> so, that should work.

Before this weekend, I can finish a new patch set with (1) adding a sanity
check and (2) fixing bigalloc bug. For adding a new lock, it is
unstable now. So this lock will not be included in this patch set.

Regards,
Zheng

2012-09-26 03:46:41

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Wed, Sep 26, 2012 at 11:54:48AM +0800, Zheng Liu wrote:
>
> Before this weekend, I can finish a new patch set with (1) adding a sanity
> check and (2) fixing bigalloc bug. For adding a new lock, it is
> unstable now. So this lock will not be included in this patch set.

That sounds good to me. Adding a rw lock is something we can do in a
separate patch (which might not make the merge window for 3.7,
depending on when Linus decides to release 3.6, but that's OK).

- Ted

2012-09-26 03:48:53

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Tue, Sep 25, 2012 at 11:46:37PM -0400, Theodore Ts'o wrote:
> On Wed, Sep 26, 2012 at 11:54:48AM +0800, Zheng Liu wrote:
> >
> > Before this weekend, I can finish a new patch set with (1) adding a sanity
> > check and (2) fixing bigalloc bug. For adding a new lock, it is
> > unstable now. So this lock will not be included in this patch set.
>
> That sounds good to me. Adding a rw lock is something we can do in a
> separate patch (which might not make the merge window for 3.7,
> depending on when Linus decides to release 3.6, but that's OK).

OK, I will submit a new patch set ASAP.

Regards,
Zheng

2012-09-26 08:00:47

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Tue, Sep 25, 2012 at 04:59:21PM -0400, Theodore Ts'o wrote:
> On Tue, Sep 25, 2012 at 08:42:52PM +0800, Zheng Liu wrote:
> > > If so, we might want to think about adding a sanity check to make sure
> > > that by the time we are done with the inode in ext4_evict_inode()
> > > (after we have forced writeback), the ext4_es_tree is empty. Agreed?
> >
> > Today I revise this patch again, and I find extent_status_tree is freed
> > in ext4_clear_inode(). So maybe I don't think that we need to check
> > this tree to be freed in ext4_evict_inode(). This change is in this
> > patch '[RFC][PATCH 4/8 v2] ext4: let ext4 maintain extent status tree'.
> > What's your opinion?
>
> When you say "revise this patch again", does that mean that you would
> like to submit a new set of patch series with changes? Or just that
> you are looking at this patch set again?
>
> It's certainly true that ext4_evict_inode() will call
> ext4_clear_inode(), so it's not a question of worrying about a memory
> leak. I was thinking more about doing this as a cheap sanity check
> for the data structure. By the time we call ext4_evict_inode(), the
> mm layer all writeback should be complete. Hence, all of the entries
> to the tree _should_ have been removed by the time we call
> ext4_evict_inode().
>
> I don't know if this is going to change as you start using this data
> structure for other purposes (such as locking a range of pages), but
> if I understand how things are currently working, it _should_ be the
> case that when ext4_evict_inode() calls ext4_clear_inode(), the call
> to ext4_es_remove_extent() should be a no-op, since all of the nodes
> in the extent status tree should have been released by then. If it
> isn't, then either I'm not understanding the code, or there's a bug in
> the code.

Hi Ted,

When I try to add a BUG_ON in ext4_evict_inode() to ensure that extent
status tree is empty, I will get an error, which reports that the tree
is not empty. I find that there still has some dirty pages when we
call ext4_evict_inode() because vfs doesn't write all dirty pages back.
In iput_final(), we firstly call ext4_drop_inode(). If the return value
is true, we won't call write_inode_now() to do a writeback. It means
that we write some data to a file and remove it immediately, and then
all dirty pages that aren't during writeback progress won't be written.
These pages will be truncated in ext4_evict_inode(). Thus, the extent
status tree is possible not to be empty when we call ext4_evict_inode().
So we cannot add a sanity check in this function. Am I missing something?

Regards,
Zheng

2012-09-28 07:16:37

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Tue, Sep 25, 2012 at 11:37:40PM -0400, Theodore Ts'o wrote:
> On Wed, Sep 26, 2012 at 11:24:26AM +0800, Zheng Liu wrote:
> > > Can you let me know what changes you need to make? If it is to add
> > > new features or new sanity checks, does it make sense to simply make
> > > it as new commits to existing patch set? Or are there fundamental
> > > problems with the current set, that would be better to fix in the
> > > current set of commits? (Or is it just minor stylistic/spelling
> > > fixes?)
> > >
> > > Thanks!!
> >
> > In new patch set, there is three changes as beblow:
> >
> > 1. add a sanity check in ext4_evict_inode()
> > 2. fix a bug in ext4_find_delalloc_range(). This bug is reported by
> > xfstest #230 when we enable bigalloc feature.
> > 3. Add a new rwlock to protect extent status tree.
> >
> > So I think that we can only add a sanity check and fix the bigalloc bug,
> > and then apply this patch set because the changes are minor. For adding
> > a new lock to protect extent status tree, we can add this feature in a
> > new patch. If you think it is OK, I can generate a new patch set, do
> > some tests using xfstest, and submit it as soon as possible. What's
> > your opinion?
>
> Do you think you can get me the patches by the end of the week? If
> so, that should work.

Hi Ted,

Until now, I have fixed the bigalloc bug that is reported by xfstest
#230, and merged Hugh's patch. But I do really think that this patch
set couldn't be applied at this merge window because the change is not
*minor*, and it still needs to do more tests. That would be great if
you can keep this patch set in dev branch at this merge window. Thanks!

Regards,
Zheng

2012-09-28 17:42:33

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Fri, Sep 28, 2012 at 03:27:14PM +0800, Zheng Liu wrote:
>
> Until now, I have fixed the bigalloc bug that is reported by xfstest
> #230, and merged Hugh's patch. But I do really think that this patch
> set couldn't be applied at this merge window because the change is not
> *minor*, and it still needs to do more tests. That would be great if
> you can keep this patch set in dev branch at this merge window. Thanks!

The dev branch is the set of patches that are planned to go to Linus
during the next merge window, so if we drop it from the merge window,
I would drop it from the dev branch and put it in the "unstable"
portion of the patch series.

It would be a shame to drop it since this provides the SEEK_HOLE
capability, though. Can you say more about which change is not minor?
The change to fix the bigalloc bug? Or the whole patch series?

- Ted

2012-09-29 03:08:08

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Sat, Sep 29, 2012 at 08:54:47AM +0800, Zheng Liu wrote:
>
> When I try to fix the bigalloc bug, some code that operates on extent
> status tree and maintains its status are changed when I do some changes in
> ext4_find_delalloc_range(). So that quite has a lot of changes I thought
> in the whole patch series. So I think that we'd better drop this patch set
> from dev branch. Thanks.

OK, dropped.

- Ted

2012-09-29 13:26:38

by Zheng Liu

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Sat, Sep 29, 2012 at 11:07 AM, Theodore Ts'o <[email protected]> wrote:
> On Sat, Sep 29, 2012 at 08:54:47AM +0800, Zheng Liu wrote:
>>
>> When I try to fix the bigalloc bug, some code that operates on extent
>> status tree and maintains its status are changed when I do some changes in
>> ext4_find_delalloc_range(). So that quite has a lot of changes I thought
>> in the whole patch series. So I think that we'd better drop this patch set
>> from dev branch. Thanks.
>
> OK, dropped.

Thank you very much. I will add rwlock to protect this tree, and do
more tests for this patch set.

Regards,
Zheng

2012-09-30 14:01:00

by Dmitry Monakhov

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/8 v2] ext4: initialize extent status tree

On Fri, 28 Sep 2012 23:07:59 -0400, Theodore Ts'o <[email protected]> wrote:
> On Sat, Sep 29, 2012 at 08:54:47AM +0800, Zheng Liu wrote:
> >
> > When I try to fix the bigalloc bug, some code that operates on extent
> > status tree and maintains its status are changed when I do some changes in
> > ext4_find_delalloc_range(). So that quite has a lot of changes I thought
> > in the whole patch series. So I think that we'd better drop this patch set
> > from dev branch. Thanks.
>
> OK, dropped.
After patch-set was dropped, 269'th succeed.
Both warnings: ext4_convert_unwritten_extent which fail with ENOSPC and
WARNING: at fs/ext4/inode.c:362 ext4_da_update_reserve_space
are gone away.
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html