2013-02-17 16:13:24

by Zheng Liu

Subject: [PATCH 0/9 v6] ext4: extent status tree (step2)

Hi all,

Here is the sixth version of the extent status tree (step2). This
version fixes a regression that occurs when bigalloc and delalloc are
both enabled and that is triggered by xfstests #13. The root cause is
that when an extent is delayed-allocated and later allocated by
fallocate, it must stay marked as a delayed extent until it is written
out, because we need it to update the reserved space. That means that
an extent in the extent status tree can have the unwritten and delayed
status simultaneously.
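
To make the status combination concrete, here is a minimal userspace
sketch. It is not code from this patch set: the flag names, the struct
and the three helpers are hypothetical, and the reserved-space
accounting is reduced to one counter that ignores cluster granularity.
It only illustrates why the delayed bit has to survive fallocate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bits; names and values are illustrative only. */
enum {
	ES_SKETCH_WRITTEN   = 1u << 0,
	ES_SKETCH_UNWRITTEN = 1u << 1,
	ES_SKETCH_DELAYED   = 1u << 2,
};

struct es_sketch {
	uint32_t lblk;		/* first logical block */
	uint32_t len;		/* length in blocks */
	unsigned status;	/* OR of ES_SKETCH_* bits */
};

/* Delayed allocation: space is reserved, nothing is allocated yet. */
static void sketch_delalloc(struct es_sketch *es, uint32_t lblk,
			    uint32_t len, uint64_t *reserved)
{
	es->lblk = lblk;
	es->len = len;
	es->status = ES_SKETCH_DELAYED;
	*reserved += len;
}

/*
 * fallocate over a delayed extent: the extent becomes unwritten, but
 * the delayed bit is kept so that writeout can still find the
 * reservation it has to release.
 */
static void sketch_fallocate(struct es_sketch *es)
{
	es->status &= ~ES_SKETCH_WRITTEN;
	es->status |= ES_SKETCH_UNWRITTEN;
}

/* Writeout: release the reservation if there is one, mark as written. */
static void sketch_writeout(struct es_sketch *es, uint64_t *reserved)
{
	if (es->status & ES_SKETCH_DELAYED) {
		assert(*reserved >= es->len);
		*reserved -= es->len;
	}
	es->status = ES_SKETCH_WRITTEN;
}
```

If sketch_fallocate() cleared the delayed bit, sketch_writeout() would
skip the release and the counter would never return to zero, which is
the kind of accounting error described above.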

Following Jan's suggestions, ext4_es_find_delayed_extent is refined. It
now takes an input parameter 'lblk' and an output parameter 'es', and
it no longer returns the first block of the next delayed extent after
'es'. The parameter of ext4_es_lookup_extent is split in the same way.
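
The calling convention described above can be sketched in userspace as
follows. This is only an illustration under stated assumptions: the
real function works on the per-inode rb-tree under i_es_lock, while
this hypothetical model walks a sorted, non-overlapping array, and the
struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent; 'delayed' stands in for the status. */
struct es_entry {
	uint32_t es_lblk;	/* first logical block */
	uint32_t es_len;	/* length in blocks, > 0 for stored entries */
	int delayed;		/* non-zero if under delayed allocation */
};

/*
 * Find the first delayed extent that covers lblk or, failing that,
 * the first delayed extent that starts after lblk.
 *
 * lblk is input only; the result is returned through *es, and
 * es->es_len == 0 means that no delayed extent was found. Nothing is
 * returned about the extent that follows *es.
 */
static void sketch_find_delayed_extent(const struct es_entry *tree,
				       size_t n, uint32_t lblk,
				       struct es_entry *es)
{
	size_t i;

	assert(es != NULL);
	es->es_lblk = 0;
	es->es_len = 0;
	es->delayed = 0;

	for (i = 0; i < n; i++) {
		uint32_t end = tree[i].es_lblk + tree[i].es_len - 1;

		if (end < lblk)		/* entirely before lblk */
			continue;
		if (!tree[i].delayed)	/* not a delayed extent */
			continue;
		*es = tree[i];
		return;
	}
}
```

A caller checks es.es_len first and only then compares lblk with
es.es_lblk to tell "lblk is inside a delayed extent" from "the next
delayed extent starts later".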

In the fifth version I tried to convert unwritten extents in the extent
status tree in the end_io callback function and to remove a bogus wait
in the direct IO code path. But there is a bug, so those patches are
dropped in this version. I will give it another try after this patch
series is applied.

Further, now that the commit (ext4: grab page before starting
transaction handle in write_begin()) has been applied, it seems that we
could try to call end_page_writeback() once unwritten IO has completed,
using the extent status tree. Previously I tried to clear the page's
writeback flag after the unwritten extent was converted using the
extent status tree, but that deadlocked because ext4 always started a
transaction handle before grabbing a page.

This patch set applies directly against 3.8-rc7 and the 'dev' branch of
ext4.

changelog:
v6 <- v5:
- fix a regression that is reported by xfstests #13 with bigalloc
- improve ext4_es_find_delayed_extent according to Jan's suggestions
- improve ext4_es_lookup_extent according to Jan's suggestions
- drop patches that convert unwritten extents in status tree and
remove a bogus wait in ext4_ind_direct_IO()
- add a comment to describe why es_lblk can be changed directly in
__es_insert_extent()

v5: http://lwn.net/Articles/537371/
v4: http://lwn.net/Articles/536037/
v3: http://lwn.net/Articles/533730/
v2: http://lwn.net/Articles/532446/
v1: http://lwn.net/Articles/531065/

Any feedback is always welcome.

Thanks,
- Zheng

Zheng Liu (9):
ext4: refine extent status tree
ext4: add physical block and status member into extent status tree
ext4: ext4_es_find_extent improvement
ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag
ext4: track all extent status in extent status tree
ext4: lookup block mapping in extent status tree
ext4: remove single extent cache
ext4: adjust some functions for reclaiming extents from extent status
tree
ext4: reclaim extents from extent status tree

fs/ext4/ext4.h | 24 +-
fs/ext4/ext4_extents.h | 6 -
fs/ext4/extents.c | 234 ++++++-----------
fs/ext4/extents_status.c | 625 ++++++++++++++++++++++++++++++++------------
fs/ext4/extents_status.h | 83 +++++-
fs/ext4/file.c | 14 +-
fs/ext4/inode.c | 153 ++++++++---
fs/ext4/move_extent.c | 3 -
fs/ext4/super.c | 8 +-
include/trace/events/ext4.h | 189 +++++++++++---
10 files changed, 914 insertions(+), 425 deletions(-)

--
1.7.12.rc2.18.g61b472e



2013-02-17 16:13:30

by Zheng Liu

Subject: [PATCH 1/9 v6] ext4: refine extent status tree

From: Zheng Liu <[email protected]>

This commit refines the extent status tree code.

1) A prefix 'es_' is added to the extent status tree structure
members.

2) es_remove_extent() is refactored so that __es_remove_extent() can be
used by es_insert_extent() to remove the old extent entry(-ies) before
inserting a new one.

3) extent_status_end() is renamed to ext4_es_end().

4) ext4_es_can_be_merged() is defined to check whether two extents can
be merged or not.

5) Comments are updated and clarified.
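
The remove-before-insert arithmetic behind points 2) and 4) can be
sketched in userspace like this. The len1/len2 computation and the
merge condition follow __es_remove_extent() and ext4_es_can_be_merged()
in this patch; the struct and the sketch_* function names are
hypothetical, and the rb-tree, the cache and the locking are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two renamed members. */
struct es_range {
	uint32_t es_lblk;	/* first logical block */
	uint32_t es_len;	/* length in blocks */
};

/* Last block covered by the extent, as ext4_es_end() computes it. */
static uint32_t sketch_es_end(const struct es_range *es)
{
	assert(es->es_len > 0);
	assert(es->es_lblk + es->es_len >= es->es_lblk);
	return es->es_lblk + es->es_len - 1;
}

/*
 * Cut the inclusive range [lblk, end] out of one extent that overlaps
 * it. The pieces that survive are returned in *left and *right; a
 * piece with es_len == 0 does not exist.
 */
static void sketch_trim(const struct es_range *es, uint32_t lblk,
			uint32_t end, struct es_range *left,
			struct es_range *right)
{
	uint32_t len1 = lblk > es->es_lblk ? lblk - es->es_lblk : 0;
	uint32_t len2 = sketch_es_end(es) > end ?
			sketch_es_end(es) - end : 0;

	left->es_lblk = es->es_lblk;
	left->es_len = len1;
	right->es_lblk = len2 ? end + 1 : 0;
	right->es_len = len2;
}

/* Merge condition of this patch: logical contiguity only. */
static int sketch_can_be_merged(const struct es_range *es1,
				const struct es_range *es2)
{
	return es1->es_lblk + es1->es_len == es2->es_lblk;
}
```

Because the old range is always cut out first, the insert path never
sees an overlapping entry, which is why __es_insert_extent() can treat
overlap as a bug.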

Signed-off-by: Zheng Liu <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/extents.c | 21 +--
fs/ext4/extents_status.c | 322 +++++++++++++++++++++++++-------------------
fs/ext4/extents_status.h | 8 +-
fs/ext4/file.c | 12 +-
include/trace/events/ext4.h | 40 +++---
5 files changed, 221 insertions(+), 182 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5ae1674..f7bf616 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3525,13 +3525,14 @@ static int ext4_find_delalloc_range(struct inode *inode,
{
struct extent_status es;

- es.start = lblk_start;
- ext4_es_find_extent(inode, &es);
- if (es.len == 0)
+ es.es_lblk = lblk_start;
+ (void)ext4_es_find_extent(inode, &es);
+ if (es.es_len == 0)
return 0; /* there is no delay extent in this tree */
- else if (es.start <= lblk_start && lblk_start < es.start + es.len)
+ else if (es.es_lblk <= lblk_start &&
+ lblk_start < es.es_lblk + es.es_len)
return 1;
- else if (lblk_start <= es.start && es.start <= lblk_end)
+ else if (lblk_start <= es.es_lblk && es.es_lblk <= lblk_end)
return 1;
else
return 0;
@@ -4567,7 +4568,7 @@ static int ext4_find_delayed_extent(struct inode *inode,
struct extent_status es;
ext4_lblk_t next_del;

- es.start = newex->ec_block;
+ es.es_lblk = newex->ec_block;
next_del = ext4_es_find_extent(inode, &es);

if (newex->ec_start == 0) {
@@ -4575,18 +4576,18 @@ static int ext4_find_delayed_extent(struct inode *inode,
* No extent in extent-tree contains block @newex->ec_start,
* then the block may stay in 1)a hole or 2)delayed-extent.
*/
- if (es.len == 0)
+ if (es.es_len == 0)
/* A hole found. */
return 0;

- if (es.start > newex->ec_block) {
+ if (es.es_lblk > newex->ec_block) {
/* A hole found. */
- newex->ec_len = min(es.start - newex->ec_block,
+ newex->ec_len = min(es.es_lblk - newex->ec_block,
newex->ec_len);
return 0;
}

- newex->ec_len = es.start + es.len - newex->ec_block;
+ newex->ec_len = es.es_lblk + es.es_len - newex->ec_block;
}

return next_del;
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 564d981..a6d2fe1 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -23,40 +23,53 @@
* (e.g. Reservation space warning), and provide extent-level locking.
* Delay extent tree is the first step to achieve this goal. It is
* original built by Yongqiang Yang. At that time it is called delay
- * extent tree, whose goal is only track delay extent in memory to
+ * extent tree, whose goal is only track delayed extents in memory to
* simplify the implementation of fiemap and bigalloc, and introduce
* lseek SEEK_DATA/SEEK_HOLE support. That is why it is still called
- * delay extent tree at the following comment. But for better
- * understand what it does, it has been rename to extent status tree.
+ * delay extent tree at the first commit. But for better understand
+ * what it does, it has been rename to extent status tree.
*
- * Currently the first step has been done. All delay extents are
- * tracked in the tree. It maintains the delay extent when a delay
- * allocation is issued, and the delay extent is written out or
+ * Step1:
+ * Currently the first step has been done. All delayed extents are
+ * tracked in the tree. It maintains the delayed extent when a delayed
+ * allocation is issued, and the delayed extent is written out or
* invalidated. Therefore the implementation of fiemap and bigalloc
* are simplified, and SEEK_DATA/SEEK_HOLE are introduced.
*
* The following comment describes the implemenmtation of extent
* status tree and future works.
+ *
+ * Step2:
+ * In this step all extent status are tracked by extent status tree.
+ * Thus, we can first try to lookup a block mapping in this tree before
+ * finding it in extent tree. Hence, single extent cache can be removed
+ * because extent status tree can do a better job. Extents in status
+ * tree are loaded on-demand. Therefore, the extent status tree may not
+ * contain all of the extents in a file. Meanwhile we define a shrinker
+ * to reclaim memory from extent status tree because fragmented extent
+ * tree will make status tree cost too much memory. written/unwritten/-
+ * hole extents in the tree will be reclaimed by this shrinker when we
+ * are under a high memory pressure. Delayed extent will not be
+ * reclimed because fiemap, bigalloc, and seek_data/hole need it.
*/

/*
- * extents status tree implementation for ext4.
+ * Extent status tree implementation for ext4.
*
*
* ==========================================================================
- * Extents status encompass delayed extents and extent locks
+ * Extent status tree tracks all extent status.
*
- * 1. Why delayed extent implementation ?
+ * 1. Why we need to implement extent status tree?
*
- * Without delayed extent, ext4 identifies a delayed extent by looking
+ * Without extent status tree, ext4 identifies a delayed extent by looking
* up page cache, this has several deficiencies - complicated, buggy,
* and inefficient code.
*
- * FIEMAP, SEEK_HOLE/DATA, bigalloc, punch hole and writeout all need
- * to know if a block or a range of blocks are belonged to a delayed
- * extent.
+ * FIEMAP, SEEK_HOLE/DATA, bigalloc, and writeout all need to know if a
+ * block or a range of blocks are belonged to a delayed extent.
*
- * Let us have a look at how they do without delayed extents implementation.
+ * Let us have a look at how they do without extent status tree.
* -- FIEMAP
* FIEMAP looks up page cache to identify delayed allocations from holes.
*
@@ -68,47 +81,48 @@
* already under delayed allocation or not to determine whether
* quota reserving is needed for the cluster.
*
- * -- punch hole
- * punch hole looks up page cache to identify a delayed extent.
- *
* -- writeout
* Writeout looks up whole page cache to see if a buffer is
* mapped, If there are not very many delayed buffers, then it is
* time comsuming.
*
- * With delayed extents implementation, FIEMAP, SEEK_HOLE/DATA,
+ * With extent status tree implementation, FIEMAP, SEEK_HOLE/DATA,
* bigalloc and writeout can figure out if a block or a range of
* blocks is under delayed allocation(belonged to a delayed extent) or
- * not by searching the delayed extent tree.
+ * not by searching the extent tree.
*
*
* ==========================================================================
- * 2. ext4 delayed extents impelmentation
+ * 2. Ext4 extent status tree impelmentation
+ *
+ * -- extent
+ * A extent is a range of blocks which are contiguous logically and
+ * physically. Unlike extent in extent tree, this extent in ext4 is
+ * a in-memory struct, there is no corresponding on-disk data. There
+ * is no limit on length of extent, so an extent can contain as many
+ * blocks as they are contiguous logically and physically.
*
- * -- delayed extent
- * A delayed extent is a range of blocks which are contiguous
- * logically and under delayed allocation. Unlike extent in
- * ext4, delayed extent in ext4 is a in-memory struct, there is
- * no corresponding on-disk data. There is no limit on length of
- * delayed extent, so a delayed extent can contain as many blocks
- * as they are contiguous logically.
+ * -- extent status tree
+ * Every inode has an extent status tree and all allocation blocks
+ * are added to the tree with different status. The extent in the
+ * tree are ordered by logical block no.
*
- * -- delayed extent tree
- * Every inode has a delayed extent tree and all under delayed
- * allocation blocks are added to the tree as delayed extents.
- * Delayed extents in the tree are ordered by logical block no.
+ * -- operations on a extent status tree
+ * There are three important operations on a delayed extent tree: find
+ * next extent, adding a extent(a range of blocks) and removing a extent.
*
- * -- operations on a delayed extent tree
- * There are three operations on a delayed extent tree: find next
- * delayed extent, adding a space(a range of blocks) and removing
- * a space.
+ * -- race on a extent status tree
+ * Extent status tree is protected inode->i_es_lock.
*
- * -- race on a delayed extent tree
- * Delayed extent tree is protected inode->i_es_lock.
+ * -- memory consumption
+ * Fragmented extent tree will make extent status tree cost too much
+ * memory. Hence, we will reclaim written/unwritten/hole extents from
+ * the tree under a heavy memory pressure.
*
*
* ==========================================================================
- * 3. performance analysis
+ * 3. Performance analysis
+ *
* -- overhead
* 1. There is a cache extent for write access, so if writes are
* not very random, adding space operaions are in O(1) time.
@@ -120,15 +134,19 @@
*
* ==========================================================================
* 4. TODO list
- * -- Track all extent status
*
- * -- Improve get block process
+ * -- Refactor delayed space reservation
*
* -- Extent-level locking
*/

static struct kmem_cache *ext4_es_cachep;

+static int __es_insert_extent(struct ext4_es_tree *tree,
+ struct extent_status *newes);
+static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
+ ext4_lblk_t end);
+
int __init ext4_init_es(void)
{
ext4_es_cachep = KMEM_CACHE(extent_status, SLAB_RECLAIM_ACCOUNT);
@@ -161,7 +179,7 @@ static void ext4_es_print_tree(struct inode *inode)
while (node) {
struct extent_status *es;
es = rb_entry(node, struct extent_status, rb_node);
- printk(KERN_DEBUG " [%u/%u)", es->start, es->len);
+ printk(KERN_DEBUG " [%u/%u)", es->es_lblk, es->es_len);
node = rb_next(node);
}
printk(KERN_DEBUG "\n");
@@ -170,10 +188,10 @@ static void ext4_es_print_tree(struct inode *inode)
#define ext4_es_print_tree(inode)
#endif

-static inline ext4_lblk_t extent_status_end(struct extent_status *es)
+static inline ext4_lblk_t ext4_es_end(struct extent_status *es)
{
- BUG_ON(es->start + es->len < es->start);
- return es->start + es->len - 1;
+ BUG_ON(es->es_lblk + es->es_len < es->es_lblk);
+ return es->es_lblk + es->es_len - 1;
}

/*
@@ -181,25 +199,25 @@ static inline ext4_lblk_t extent_status_end(struct extent_status *es)
* it can't be found, try to find next extent.
*/
static struct extent_status *__es_tree_search(struct rb_root *root,
- ext4_lblk_t offset)
+ ext4_lblk_t lblk)
{
struct rb_node *node = root->rb_node;
struct extent_status *es = NULL;

while (node) {
es = rb_entry(node, struct extent_status, rb_node);
- if (offset < es->start)
+ if (lblk < es->es_lblk)
node = node->rb_left;
- else if (offset > extent_status_end(es))
+ else if (lblk > ext4_es_end(es))
node = node->rb_right;
else
return es;
}

- if (es && offset < es->start)
+ if (es && lblk < es->es_lblk)
return es;

- if (es && offset > extent_status_end(es)) {
+ if (es && lblk > ext4_es_end(es)) {
node = rb_next(&es->rb_node);
return node ? rb_entry(node, struct extent_status, rb_node) :
NULL;
@@ -209,8 +227,8 @@ static struct extent_status *__es_tree_search(struct rb_root *root,
}

/*
- * ext4_es_find_extent: find the 1st delayed extent covering @es->start
- * if it exists, otherwise, the next extent after @es->start.
+ * ext4_es_find_extent: find the 1st delayed extent covering @es->lblk
+ * if it exists, otherwise, the next extent after @es->lblk.
*
* @inode: the inode which owns delayed extents
* @es: delayed extent that we found
@@ -226,7 +244,7 @@ ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
struct rb_node *node;
ext4_lblk_t ret = EXT_MAX_BLOCKS;

- trace_ext4_es_find_extent_enter(inode, es->start);
+ trace_ext4_es_find_extent_enter(inode, es->es_lblk);

read_lock(&EXT4_I(inode)->i_es_lock);
tree = &EXT4_I(inode)->i_es_tree;
@@ -234,25 +252,25 @@ ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
/* find delay extent in cache firstly */
if (tree->cache_es) {
es1 = tree->cache_es;
- if (in_range(es->start, es1->start, es1->len)) {
+ if (in_range(es->es_lblk, es1->es_lblk, es1->es_len)) {
es_debug("%u cached by [%u/%u)\n",
- es->start, es1->start, es1->len);
+ es->es_lblk, es1->es_lblk, es1->es_len);
goto out;
}
}

- es->len = 0;
- es1 = __es_tree_search(&tree->root, es->start);
+ es->es_len = 0;
+ es1 = __es_tree_search(&tree->root, es->es_lblk);

out:
if (es1) {
tree->cache_es = es1;
- es->start = es1->start;
- es->len = es1->len;
+ es->es_lblk = es1->es_lblk;
+ es->es_len = es1->es_len;
node = rb_next(&es1->rb_node);
if (node) {
es1 = rb_entry(node, struct extent_status, rb_node);
- ret = es1->start;
+ ret = es1->es_lblk;
}
}

@@ -263,14 +281,14 @@ out:
}

static struct extent_status *
-ext4_es_alloc_extent(ext4_lblk_t start, ext4_lblk_t len)
+ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len)
{
struct extent_status *es;
es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC);
if (es == NULL)
return NULL;
- es->start = start;
- es->len = len;
+ es->es_lblk = lblk;
+ es->es_len = len;
return es;
}

@@ -279,6 +297,20 @@ static void ext4_es_free_extent(struct extent_status *es)
kmem_cache_free(ext4_es_cachep, es);
}

+/*
+ * Check whether or not two extents can be merged
+ * Condition:
+ * - logical block number is contiguous
+ */
+static int ext4_es_can_be_merged(struct extent_status *es1,
+ struct extent_status *es2)
+{
+ if (es1->es_lblk + es1->es_len != es2->es_lblk)
+ return 0;
+
+ return 1;
+}
+
static struct extent_status *
ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
{
@@ -290,8 +322,8 @@ ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
return es;

es1 = rb_entry(node, struct extent_status, rb_node);
- if (es->start == extent_status_end(es1) + 1) {
- es1->len += es->len;
+ if (ext4_es_can_be_merged(es1, es)) {
+ es1->es_len += es->es_len;
rb_erase(&es->rb_node, &tree->root);
ext4_es_free_extent(es);
es = es1;
@@ -311,8 +343,8 @@ ext4_es_try_to_merge_right(struct ext4_es_tree *tree, struct extent_status *es)
return es;

es1 = rb_entry(node, struct extent_status, rb_node);
- if (es1->start == extent_status_end(es) + 1) {
- es->len += es1->len;
+ if (ext4_es_can_be_merged(es, es1)) {
+ es->es_len += es1->es_len;
rb_erase(node, &tree->root);
ext4_es_free_extent(es1);
}
@@ -320,60 +352,43 @@ ext4_es_try_to_merge_right(struct ext4_es_tree *tree, struct extent_status *es)
return es;
}

-static int __es_insert_extent(struct ext4_es_tree *tree, ext4_lblk_t offset,
- ext4_lblk_t len)
+static int __es_insert_extent(struct ext4_es_tree *tree,
+ struct extent_status *newes)
{
struct rb_node **p = &tree->root.rb_node;
struct rb_node *parent = NULL;
struct extent_status *es;
- ext4_lblk_t end = offset + len - 1;
-
- BUG_ON(end < offset);
- es = tree->cache_es;
- if (es && offset == (extent_status_end(es) + 1)) {
- es_debug("cached by [%u/%u)\n", es->start, es->len);
- es->len += len;
- es = ext4_es_try_to_merge_right(tree, es);
- goto out;
- } else if (es && es->start == end + 1) {
- es_debug("cached by [%u/%u)\n", es->start, es->len);
- es->start = offset;
- es->len += len;
- es = ext4_es_try_to_merge_left(tree, es);
- goto out;
- } else if (es && es->start <= offset &&
- end <= extent_status_end(es)) {
- es_debug("cached by [%u/%u)\n", es->start, es->len);
- goto out;
- }

while (*p) {
parent = *p;
es = rb_entry(parent, struct extent_status, rb_node);

- if (offset < es->start) {
- if (es->start == end + 1) {
- es->start = offset;
- es->len += len;
+ if (newes->es_lblk < es->es_lblk) {
+ if (ext4_es_can_be_merged(newes, es)) {
+ /*
+ * Here we can modify es_lblk directly
+ * because it isn't overlapped.
+ */
+ es->es_lblk = newes->es_lblk;
+ es->es_len += newes->es_len;
es = ext4_es_try_to_merge_left(tree, es);
goto out;
}
p = &(*p)->rb_left;
- } else if (offset > extent_status_end(es)) {
- if (offset == extent_status_end(es) + 1) {
- es->len += len;
+ } else if (newes->es_lblk > ext4_es_end(es)) {
+ if (ext4_es_can_be_merged(es, newes)) {
+ es->es_len += newes->es_len;
es = ext4_es_try_to_merge_right(tree, es);
goto out;
}
p = &(*p)->rb_right;
} else {
- if (extent_status_end(es) <= end)
- es->len = offset - es->start + len;
- goto out;
+ BUG_ON(1);
+ return -EINVAL;
}
}

- es = ext4_es_alloc_extent(offset, len);
+ es = ext4_es_alloc_extent(newes->es_lblk, newes->es_len);
if (!es)
return -ENOMEM;
rb_link_node(&es->rb_node, parent, p);
@@ -385,27 +400,38 @@ out:
}

/*
- * ext4_es_insert_extent() adds a space to a delayed extent tree.
- * Caller holds inode->i_es_lock.
+ * ext4_es_insert_extent() adds a space to a extent status tree.
*
* ext4_es_insert_extent is called by ext4_da_write_begin and
* ext4_es_remove_extent.
*
* Return 0 on success, error code on failure.
*/
-int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t offset,
+int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len)
{
struct ext4_es_tree *tree;
+ struct extent_status newes;
+ ext4_lblk_t end = lblk + len - 1;
int err = 0;

- trace_ext4_es_insert_extent(inode, offset, len);
+ trace_ext4_es_insert_extent(inode, lblk, len);
es_debug("add [%u/%u) to extent status tree of inode %lu\n",
- offset, len, inode->i_ino);
+ lblk, len, inode->i_ino);
+
+ BUG_ON(end < lblk);
+
+ newes.es_lblk = lblk;
+ newes.es_len = len;

write_lock(&EXT4_I(inode)->i_es_lock);
tree = &EXT4_I(inode)->i_es_tree;
- err = __es_insert_extent(tree, offset, len);
+ err = __es_remove_extent(tree, lblk, end);
+ if (err != 0)
+ goto error;
+ err = __es_insert_extent(tree, &newes);
+
+error:
write_unlock(&EXT4_I(inode)->i_es_lock);

ext4_es_print_tree(inode);
@@ -413,57 +439,45 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t offset,
return err;
}

-/*
- * ext4_es_remove_extent() removes a space from a delayed extent tree.
- * Caller holds inode->i_es_lock.
- *
- * Return 0 on success, error code on failure.
- */
-int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t offset,
- ext4_lblk_t len)
+static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
+ ext4_lblk_t end)
{
struct rb_node *node;
- struct ext4_es_tree *tree;
struct extent_status *es;
struct extent_status orig_es;
- ext4_lblk_t len1, len2, end;
+ ext4_lblk_t len1, len2;
int err = 0;

- trace_ext4_es_remove_extent(inode, offset, len);
- es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
- offset, len, inode->i_ino);
-
- end = offset + len - 1;
- BUG_ON(end < offset);
- write_lock(&EXT4_I(inode)->i_es_lock);
- tree = &EXT4_I(inode)->i_es_tree;
- es = __es_tree_search(&tree->root, offset);
+ es = __es_tree_search(&tree->root, lblk);
if (!es)
goto out;
- if (es->start > end)
+ if (es->es_lblk > end)
goto out;

/* Simply invalidate cache_es. */
tree->cache_es = NULL;

- orig_es.start = es->start;
- orig_es.len = es->len;
- len1 = offset > es->start ? offset - es->start : 0;
- len2 = extent_status_end(es) > end ?
- extent_status_end(es) - end : 0;
+ orig_es.es_lblk = es->es_lblk;
+ orig_es.es_len = es->es_len;
+ len1 = lblk > es->es_lblk ? lblk - es->es_lblk : 0;
+ len2 = ext4_es_end(es) > end ? ext4_es_end(es) - end : 0;
if (len1 > 0)
- es->len = len1;
+ es->es_len = len1;
if (len2 > 0) {
if (len1 > 0) {
- err = __es_insert_extent(tree, end + 1, len2);
+ struct extent_status newes;
+
+ newes.es_lblk = end + 1;
+ newes.es_len = len2;
+ err = __es_insert_extent(tree, &newes);
if (err) {
- es->start = orig_es.start;
- es->len = orig_es.len;
+ es->es_lblk = orig_es.es_lblk;
+ es->es_len = orig_es.es_len;
goto out;
}
} else {
- es->start = end + 1;
- es->len = len2;
+ es->es_lblk = end + 1;
+ es->es_len = len2;
}
goto out;
}
@@ -476,7 +490,7 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t offset,
es = NULL;
}

- while (es && extent_status_end(es) <= end) {
+ while (es && ext4_es_end(es) <= end) {
node = rb_next(&es->rb_node);
rb_erase(&es->rb_node, &tree->root);
ext4_es_free_extent(es);
@@ -487,13 +501,39 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t offset,
es = rb_entry(node, struct extent_status, rb_node);
}

- if (es && es->start < end + 1) {
- len1 = extent_status_end(es) - end;
- es->start = end + 1;
- es->len = len1;
+ if (es && es->es_lblk < end + 1) {
+ len1 = ext4_es_end(es) - end;
+ es->es_lblk = end + 1;
+ es->es_len = len1;
}

out:
+ return err;
+}
+
+/*
+ * ext4_es_remove_extent() removes a space from a extent status tree.
+ *
+ * Return 0 on success, error code on failure.
+ */
+int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
+ ext4_lblk_t len)
+{
+ struct ext4_es_tree *tree;
+ ext4_lblk_t end;
+ int err = 0;
+
+ trace_ext4_es_remove_extent(inode, lblk, len);
+ es_debug("remove [%u/%u) from extent status tree of inode %lu\n",
+ lblk, len, inode->i_ino);
+
+ end = lblk + len - 1;
+ BUG_ON(end < lblk);
+
+ tree = &EXT4_I(inode)->i_es_tree;
+
+ write_lock(&EXT4_I(inode)->i_es_lock);
+ err = __es_remove_extent(tree, lblk, end);
write_unlock(&EXT4_I(inode)->i_es_lock);
ext4_es_print_tree(inode);
return err;
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 077f82d..81e9339 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -22,8 +22,8 @@

struct extent_status {
struct rb_node rb_node;
- ext4_lblk_t start; /* first block extent covers */
- ext4_lblk_t len; /* length of extent in block */
+ ext4_lblk_t es_lblk; /* first logical block extent covers */
+ ext4_lblk_t es_len; /* length of extent in block */
};

struct ext4_es_tree {
@@ -35,9 +35,9 @@ extern int __init ext4_init_es(void);
extern void ext4_exit_es(void);
extern void ext4_es_init_tree(struct ext4_es_tree *tree);

-extern int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t start,
+extern int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
-extern int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t start,
+extern int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
extern ext4_lblk_t ext4_es_find_extent(struct inode *inode,
struct extent_status *es);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 405565a..aceaf5f 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -464,10 +464,9 @@ static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
* If there is a delay extent at this offset,
* it will be as a data.
*/
- es.start = last;
+ es.es_lblk = last;
(void)ext4_es_find_extent(inode, &es);
- if (last >= es.start &&
- last < es.start + es.len) {
+ if (es.es_len != 0 && in_range(last, es.es_lblk, es.es_len)) {
if (last != start)
dataoff = last << blkbits;
break;
@@ -549,11 +548,10 @@ static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
* If there is a delay extent at this offset,
* we will skip this extent.
*/
- es.start = last;
+ es.es_lblk = last;
(void)ext4_es_find_extent(inode, &es);
- if (last >= es.start &&
- last < es.start + es.len) {
- last = es.start + es.len;
+ if (es.es_len != 0 && in_range(last, es.es_lblk, es.es_len)) {
+ last = es.es_lblk + es.es_len;
holeoff = last << blkbits;
continue;
}
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 7e8c36b..952628a 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2068,75 +2068,75 @@ TRACE_EVENT(ext4_ext_remove_space_done,
);

TRACE_EVENT(ext4_es_insert_extent,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t len),
+ TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len),

- TP_ARGS(inode, start, len),
+ TP_ARGS(inode, lblk, len),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
- __field( loff_t, start )
+ __field( loff_t, lblk )
__field( loff_t, len )
),

TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
- __entry->start = start;
+ __entry->lblk = lblk;
__entry->len = len;
),

TP_printk("dev %d,%d ino %lu es [%lld/%lld)",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- __entry->start, __entry->len)
+ __entry->lblk, __entry->len)
);

TRACE_EVENT(ext4_es_remove_extent,
- TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t len),
+ TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len),

- TP_ARGS(inode, start, len),
+ TP_ARGS(inode, lblk, len),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
- __field( loff_t, start )
+ __field( loff_t, lblk )
__field( loff_t, len )
),

TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
- __entry->start = start;
+ __entry->lblk = lblk;
__entry->len = len;
),

TP_printk("dev %d,%d ino %lu es [%lld/%lld)",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- __entry->start, __entry->len)
+ __entry->lblk, __entry->len)
);

TRACE_EVENT(ext4_es_find_extent_enter,
- TP_PROTO(struct inode *inode, ext4_lblk_t start),
+ TP_PROTO(struct inode *inode, ext4_lblk_t lblk),

- TP_ARGS(inode, start),
+ TP_ARGS(inode, lblk),

TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
- __field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, lblk )
),

TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
- __entry->start = start;
+ __entry->lblk = lblk;
),

- TP_printk("dev %d,%d ino %lu start %u",
+ TP_printk("dev %d,%d ino %lu lblk %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
- (unsigned long) __entry->ino, __entry->start)
+ (unsigned long) __entry->ino, __entry->lblk)
);

TRACE_EVENT(ext4_es_find_extent_exit,
@@ -2148,7 +2148,7 @@ TRACE_EVENT(ext4_es_find_extent_exit,
TP_STRUCT__entry(
__field( dev_t, dev )
__field( ino_t, ino )
- __field( ext4_lblk_t, start )
+ __field( ext4_lblk_t, lblk )
__field( ext4_lblk_t, len )
__field( ext4_lblk_t, ret )
),
@@ -2156,15 +2156,15 @@ TRACE_EVENT(ext4_es_find_extent_exit,
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
- __entry->start = es->start;
- __entry->len = es->len;
+ __entry->lblk = es->es_lblk;
+ __entry->len = es->es_len;
__entry->ret = ret;
),

TP_printk("dev %d,%d ino %lu es [%u/%u) ret %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- __entry->start, __entry->len, __entry->ret)
+ __entry->lblk, __entry->len, __entry->ret)
);

#endif /* _TRACE_EXT4_H */
--
1.7.12.rc2.18.g61b472e


2013-02-17 16:13:35

by Zheng Liu

Subject: [PATCH 2/9 v6] ext4: add physical block and status member into extent status tree

From: Zheng Liu <[email protected]>

This commit extends the extent_status structure so that it records the
physical block and the extent status. A single member, es_pblk, holds
both of them: a physical block number only needs 48 bits, so the extent
status can be stashed in the remaining bits, which saves some memory.
Four statuses are defined for now: written, unwritten, delayed and
hole.

Because a new member is added to the extent status tree, all interfaces
need to be adjusted.
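
The packing idea can be sketched in userspace as below. The bit layout
(four status bits at the top of the 64-bit word), the macro names and
the sketch_* helpers are assumptions made for this example; the real
definitions are in the extents_status.h hunk of this patch. The merge
check mirrors the three conditions that this patch gives
ext4_es_can_be_merged().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: status in the top four bits, block number below. */
#define SKETCH_WRITTEN		(1ULL << 63)
#define SKETCH_UNWRITTEN	(1ULL << 62)
#define SKETCH_DELAYED		(1ULL << 61)
#define SKETCH_HOLE		(1ULL << 60)
#define SKETCH_STATUS_MASK	(SKETCH_WRITTEN | SKETCH_UNWRITTEN | \
				 SKETCH_DELAYED | SKETCH_HOLE)

struct es_packed {
	uint32_t es_lblk;	/* first logical block */
	uint32_t es_len;	/* length in blocks */
	uint64_t es_pblk;	/* physical block plus status bits */
};

static uint64_t sketch_status(const struct es_packed *es)
{
	return es->es_pblk & SKETCH_STATUS_MASK;
}

static uint64_t sketch_pblock(const struct es_packed *es)
{
	return es->es_pblk & ~SKETCH_STATUS_MASK;
}

static void sketch_store(struct es_packed *es, uint64_t pblk,
			 uint64_t status)
{
	/* The block number must not spill into the status bits. */
	assert((pblk & SKETCH_STATUS_MASK) == 0);
	assert((status & ~SKETCH_STATUS_MASK) == 0);
	es->es_pblk = pblk | status;
}

/* Logically contiguous, same status, and physically contiguous when
 * the extents are backed by allocated blocks. */
static int sketch_can_be_merged(const struct es_packed *es1,
				const struct es_packed *es2)
{
	if (es1->es_lblk + es1->es_len != es2->es_lblk)
		return 0;
	if (sketch_status(es1) != sketch_status(es2))
		return 0;
	if ((sketch_status(es1) & (SKETCH_WRITTEN | SKETCH_UNWRITTEN)) &&
	    sketch_pblock(es1) + es1->es_len != sketch_pblock(es2))
		return 0;
	return 1;
}
```

Delayed and hole extents have no meaningful physical block, so only the
logical contiguity and the status decide whether they merge.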

Signed-off-by: Zheng Liu <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/extents_status.c | 67 +++++++++++++++++++++++++++++++++++++--------
fs/ext4/extents_status.h | 64 ++++++++++++++++++++++++++++++++++++++++++-
fs/ext4/inode.c | 3 +-
include/trace/events/ext4.h | 34 +++++++++++++++--------
4 files changed, 142 insertions(+), 26 deletions(-)

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index a6d2fe1..243611f 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -179,7 +179,9 @@ static void ext4_es_print_tree(struct inode *inode)
while (node) {
struct extent_status *es;
es = rb_entry(node, struct extent_status, rb_node);
- printk(KERN_DEBUG " [%u/%u)", es->es_lblk, es->es_len);
+ printk(KERN_DEBUG " [%u/%u) %llu %llx",
+ es->es_lblk, es->es_len,
+ ext4_es_pblock(es), ext4_es_status(es));
node = rb_next(node);
}
printk(KERN_DEBUG "\n");
@@ -234,7 +236,7 @@ static struct extent_status *__es_tree_search(struct rb_root *root,
* @es: delayed extent that we found
*
* Returns the first block of the next extent after es, otherwise
- * EXT_MAX_BLOCKS if no delay extent is found.
+ * EXT_MAX_BLOCKS if no extent is found.
* Delayed extent is returned via @es.
*/
ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
@@ -249,17 +251,18 @@ ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
read_lock(&EXT4_I(inode)->i_es_lock);
tree = &EXT4_I(inode)->i_es_tree;

- /* find delay extent in cache firstly */
+ /* find extent in cache firstly */
+ es->es_len = es->es_pblk = 0;
if (tree->cache_es) {
es1 = tree->cache_es;
if (in_range(es->es_lblk, es1->es_lblk, es1->es_len)) {
- es_debug("%u cached by [%u/%u)\n",
- es->es_lblk, es1->es_lblk, es1->es_len);
+ es_debug("%u cached by [%u/%u) %llu %llx\n",
+ es->es_lblk, es1->es_lblk, es1->es_len,
+ ext4_es_pblock(es1), ext4_es_status(es1));
goto out;
}
}

- es->es_len = 0;
es1 = __es_tree_search(&tree->root, es->es_lblk);

out:
@@ -267,6 +270,7 @@ out:
tree->cache_es = es1;
es->es_lblk = es1->es_lblk;
es->es_len = es1->es_len;
+ es->es_pblk = es1->es_pblk;
node = rb_next(&es1->rb_node);
if (node) {
es1 = rb_entry(node, struct extent_status, rb_node);
@@ -281,7 +285,7 @@ out:
}

static struct extent_status *
-ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len)
+ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len, ext4_fsblk_t pblk)
{
struct extent_status *es;
es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC);
@@ -289,6 +293,7 @@ ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len)
return NULL;
es->es_lblk = lblk;
es->es_len = len;
+ es->es_pblk = pblk;
return es;
}

@@ -301,6 +306,8 @@ static void ext4_es_free_extent(struct extent_status *es)
* Check whether or not two extents can be merged
* Condition:
* - logical block number is contiguous
+ * - physical block number is contiguous
+ * - status is equal
*/
static int ext4_es_can_be_merged(struct extent_status *es1,
struct extent_status *es2)
@@ -308,6 +315,13 @@ static int ext4_es_can_be_merged(struct extent_status *es1,
if (es1->es_lblk + es1->es_len != es2->es_lblk)
return 0;

+ if (ext4_es_status(es1) != ext4_es_status(es2))
+ return 0;
+
+ if ((ext4_es_is_written(es1) || ext4_es_is_unwritten(es1)) &&
+ (ext4_es_pblock(es1) + es1->es_len != ext4_es_pblock(es2)))
+ return 0;
+
return 1;
}

@@ -371,6 +385,10 @@ static int __es_insert_extent(struct ext4_es_tree *tree,
*/
es->es_lblk = newes->es_lblk;
es->es_len += newes->es_len;
+ if (ext4_es_is_written(es) ||
+ ext4_es_is_unwritten(es))
+ ext4_es_store_pblock(es,
+ newes->es_pblk);
es = ext4_es_try_to_merge_left(tree, es);
goto out;
}
@@ -388,7 +406,8 @@ static int __es_insert_extent(struct ext4_es_tree *tree,
}
}

- es = ext4_es_alloc_extent(newes->es_lblk, newes->es_len);
+ es = ext4_es_alloc_extent(newes->es_lblk, newes->es_len,
+ newes->es_pblk);
if (!es)
return -ENOMEM;
rb_link_node(&es->rb_node, parent, p);
@@ -408,21 +427,24 @@ out:
* Return 0 on success, error code on failure.
*/
int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
- ext4_lblk_t len)
+ ext4_lblk_t len, ext4_fsblk_t pblk,
+ unsigned long long status)
{
struct ext4_es_tree *tree;
struct extent_status newes;
ext4_lblk_t end = lblk + len - 1;
int err = 0;

- trace_ext4_es_insert_extent(inode, lblk, len);
- es_debug("add [%u/%u) to extent status tree of inode %lu\n",
- lblk, len, inode->i_ino);
+ es_debug("add [%u/%u) %llu %llx to extent status tree of inode %lu\n",
+ lblk, len, pblk, status, inode->i_ino);

BUG_ON(end < lblk);

newes.es_lblk = lblk;
newes.es_len = len;
+ ext4_es_store_pblock(&newes, pblk);
+ ext4_es_store_status(&newes, status);
+ trace_ext4_es_insert_extent(inode, &newes);

write_lock(&EXT4_I(inode)->i_es_lock);
tree = &EXT4_I(inode)->i_es_tree;
@@ -446,6 +468,7 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
struct extent_status *es;
struct extent_status orig_es;
ext4_lblk_t len1, len2;
+ ext4_fsblk_t block;
int err = 0;

es = __es_tree_search(&tree->root, lblk);
@@ -459,6 +482,8 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,

orig_es.es_lblk = es->es_lblk;
orig_es.es_len = es->es_len;
+ orig_es.es_pblk = es->es_pblk;
+
len1 = lblk > es->es_lblk ? lblk - es->es_lblk : 0;
len2 = ext4_es_end(es) > end ? ext4_es_end(es) - end : 0;
if (len1 > 0)
@@ -469,6 +494,13 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,

newes.es_lblk = end + 1;
newes.es_len = len2;
+ if (ext4_es_is_written(&orig_es) ||
+ ext4_es_is_unwritten(&orig_es)) {
+ block = ext4_es_pblock(&orig_es) +
+ orig_es.es_len - len2;
+ ext4_es_store_pblock(&newes, block);
+ }
+ ext4_es_store_status(&newes, ext4_es_status(&orig_es));
err = __es_insert_extent(tree, &newes);
if (err) {
es->es_lblk = orig_es.es_lblk;
@@ -478,6 +510,11 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
} else {
es->es_lblk = end + 1;
es->es_len = len2;
+ if (ext4_es_is_written(es) ||
+ ext4_es_is_unwritten(es)) {
+ block = orig_es.es_pblk + orig_es.es_len - len2;
+ ext4_es_store_pblock(es, block);
+ }
}
goto out;
}
@@ -502,9 +539,15 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
}

if (es && es->es_lblk < end + 1) {
+ ext4_lblk_t orig_len = es->es_len;
+
len1 = ext4_es_end(es) - end;
es->es_lblk = end + 1;
es->es_len = len1;
+ if (ext4_es_is_written(es) || ext4_es_is_unwritten(es)) {
+ block = es->es_pblk + orig_len - len1;
+ ext4_es_store_pblock(es, block);
+ }
}

out:
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 81e9339..3cad833 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -20,10 +20,21 @@
#define es_debug(fmt, ...) no_printk(fmt, ##__VA_ARGS__)
#endif

+#define EXTENT_STATUS_WRITTEN 0x80000000 /* written extent */
+#define EXTENT_STATUS_UNWRITTEN 0x40000000 /* unwritten extent */
+#define EXTENT_STATUS_DELAYED 0x20000000 /* delayed extent */
+#define EXTENT_STATUS_HOLE 0x10000000 /* hole */
+
+#define EXTENT_STATUS_FLAGS (EXTENT_STATUS_WRITTEN | \
+ EXTENT_STATUS_UNWRITTEN | \
+ EXTENT_STATUS_DELAYED | \
+ EXTENT_STATUS_HOLE)
+
struct extent_status {
struct rb_node rb_node;
ext4_lblk_t es_lblk; /* first logical block extent covers */
ext4_lblk_t es_len; /* length of extent in block */
+ ext4_fsblk_t es_pblk; /* first physical block */
};

struct ext4_es_tree {
@@ -36,10 +47,61 @@ extern void ext4_exit_es(void);
extern void ext4_es_init_tree(struct ext4_es_tree *tree);

extern int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
- ext4_lblk_t len);
+ ext4_lblk_t len, ext4_fsblk_t pblk,
+ unsigned long long status);
extern int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
extern ext4_lblk_t ext4_es_find_extent(struct inode *inode,
struct extent_status *es);

+static inline int ext4_es_is_written(struct extent_status *es)
+{
+ return (es->es_pblk & EXTENT_STATUS_WRITTEN);
+}
+
+static inline int ext4_es_is_unwritten(struct extent_status *es)
+{
+ return (es->es_pblk & EXTENT_STATUS_UNWRITTEN);
+}
+
+static inline int ext4_es_is_delayed(struct extent_status *es)
+{
+ return (es->es_pblk & EXTENT_STATUS_DELAYED);
+}
+
+static inline int ext4_es_is_hole(struct extent_status *es)
+{
+ return (es->es_pblk & EXTENT_STATUS_HOLE);
+}
+
+static inline ext4_fsblk_t ext4_es_status(struct extent_status *es)
+{
+ return (es->es_pblk & EXTENT_STATUS_FLAGS);
+}
+
+static inline ext4_fsblk_t ext4_es_pblock(struct extent_status *es)
+{
+ return (es->es_pblk & ~EXTENT_STATUS_FLAGS);
+}
+
+static inline void ext4_es_store_pblock(struct extent_status *es,
+ ext4_fsblk_t pb)
+{
+ ext4_fsblk_t block;
+
+ block = (pb & ~EXTENT_STATUS_FLAGS) |
+ (es->es_pblk & EXTENT_STATUS_FLAGS);
+ es->es_pblk = block;
+}
+
+static inline void ext4_es_store_status(struct extent_status *es,
+ unsigned long long status)
+{
+ ext4_fsblk_t block;
+
+ block = (status & EXTENT_STATUS_FLAGS) |
+ (es->es_pblk & ~EXTENT_STATUS_FLAGS);
+ es->es_pblk = block;
+}
+
#endif /* _EXT4_EXTENTS_STATUS_H */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index cbfe13b..7fb00d8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1821,7 +1821,8 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
goto out_unlock;
}

- retval = ext4_es_insert_extent(inode, map->m_lblk, map->m_len);
+ retval = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
+ ~0, EXTENT_STATUS_DELAYED);
if (retval)
goto out_unlock;

diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 952628a..ef2f96e 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2068,28 +2068,33 @@ TRACE_EVENT(ext4_ext_remove_space_done,
);

TRACE_EVENT(ext4_es_insert_extent,
- TP_PROTO(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len),
+ TP_PROTO(struct inode *inode, struct extent_status *es),

- TP_ARGS(inode, lblk, len),
+ TP_ARGS(inode, es),

TP_STRUCT__entry(
- __field( dev_t, dev )
- __field( ino_t, ino )
- __field( loff_t, lblk )
- __field( loff_t, len )
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( ext4_lblk_t, lblk )
+ __field( ext4_lblk_t, len )
+ __field( ext4_fsblk_t, pblk )
+ __field( unsigned long long, status )
),

TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
- __entry->lblk = lblk;
- __entry->len = len;
+ __entry->lblk = es->es_lblk;
+ __entry->len = es->es_len;
+ __entry->pblk = ext4_es_pblock(es);
+ __entry->status = ext4_es_status(es);
),

- TP_printk("dev %d,%d ino %lu es [%lld/%lld)",
+ TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %llx",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- __entry->lblk, __entry->len)
+ __entry->lblk, __entry->len,
+ __entry->pblk, __entry->status)
);

TRACE_EVENT(ext4_es_remove_extent,
@@ -2150,6 +2155,8 @@ TRACE_EVENT(ext4_es_find_extent_exit,
__field( ino_t, ino )
__field( ext4_lblk_t, lblk )
__field( ext4_lblk_t, len )
+ __field( ext4_fsblk_t, pblk )
+ __field( unsigned long long, status )
__field( ext4_lblk_t, ret )
),

@@ -2158,13 +2165,16 @@ TRACE_EVENT(ext4_es_find_extent_exit,
__entry->ino = inode->i_ino;
__entry->lblk = es->es_lblk;
__entry->len = es->es_len;
+ __entry->pblk = ext4_es_pblock(es);
+ __entry->status = ext4_es_status(es);
__entry->ret = ret;
),

- TP_printk("dev %d,%d ino %lu es [%u/%u) ret %u",
+ TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %llx ret %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
- __entry->lblk, __entry->len, __entry->ret)
+ __entry->lblk, __entry->len,
+ __entry->pblk, __entry->status, __entry->ret)
);

#endif /* _TRACE_EXT4_H */
--
1.7.12.rc2.18.g61b472e


2013-02-17 16:13:40

by Zheng Liu

Subject: [PATCH 3/9 v6] ext4: ext4_es_find_extent improvement

From: Zheng Liu <[email protected]>

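This commit renames ext4_es_find_extent to ext4_es_find_delayed_extent
and improves the function. First, the input and output parameters are
split: the starting block is passed in 'lblk' and the result is returned
in 'es'. Second, the function no longer returns the first block of the
next delayed extent after 'es'; a caller that needs it searches again
from the end of the extent it already has.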
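
The new calling convention can be sketched with a simplified model. This
is only an illustration under stated assumptions: a sorted array stands
in for the rb-tree, there is no locking or caching, and every name
(struct ext, find_delayed, next_delayed, MAX_BLOCKS) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 0xffffffffU	/* stand-in for EXT_MAX_BLOCKS */

struct ext {			/* hypothetical simplified extent */
	uint32_t lblk;		/* first logical block */
	uint32_t len;		/* length in blocks, 0 means "none" */
	int delayed;		/* non-zero if the extent is delayed */
};

/*
 * Find the first delayed extent that covers lblk or starts after it.
 * 'tab' must be sorted by lblk and must not overlap.  The result is
 * copied to *out; out->len stays 0 when nothing is found.
 */
static void find_delayed(const struct ext *tab, size_t n, uint32_t lblk,
			 struct ext *out)
{
	size_t i;

	assert(out != NULL);
	out->lblk = out->len = 0;
	out->delayed = 0;
	for (i = 0; i < n; i++) {
		if ((uint64_t)tab[i].lblk + tab[i].len <= lblk)
			continue;	/* extent ends before lblk */
		if (tab[i].delayed) {
			*out = tab[i];
			return;
		}
	}
}

/* Caller side: first block of the delayed extent found from 'from'. */
static uint32_t next_delayed(const struct ext *tab, size_t n, uint32_t from)
{
	struct ext es;

	find_delayed(tab, n, from, &es);
	return es.len == 0 ? MAX_BLOCKS : es.lblk;
}
```

The second helper mirrors what a caller now does itself: it repeats the
search from a later block and maps "nothing found" to the maximum block
value.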

Signed-off-by: Zheng Liu <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/extents.c | 15 ++++++++++-----
fs/ext4/extents_status.c | 40 ++++++++++++++++++++--------------------
fs/ext4/extents_status.h | 4 ++--
fs/ext4/file.c | 6 ++----
include/trace/events/ext4.h | 15 ++++++---------
5 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f7bf616..c230840 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3525,8 +3525,7 @@ static int ext4_find_delalloc_range(struct inode *inode,
{
struct extent_status es;

- es.es_lblk = lblk_start;
- (void)ext4_es_find_extent(inode, &es);
+ ext4_es_find_delayed_extent(inode, lblk_start, &es);
if (es.es_len == 0)
return 0; /* there is no delay extent in this tree */
else if (es.es_lblk <= lblk_start &&
@@ -4566,10 +4565,9 @@ static int ext4_find_delayed_extent(struct inode *inode,
struct ext4_ext_cache *newex)
{
struct extent_status es;
- ext4_lblk_t next_del;
+ ext4_lblk_t block, next_del;

- es.es_lblk = newex->ec_block;
- next_del = ext4_es_find_extent(inode, &es);
+ ext4_es_find_delayed_extent(inode, newex->ec_block, &es);

if (newex->ec_start == 0) {
/*
@@ -4590,6 +4588,13 @@ static int ext4_find_delayed_extent(struct inode *inode,
newex->ec_len = es.es_lblk + es.es_len - newex->ec_block;
}

+ block = newex->ec_block + newex->ec_len;
+ ext4_es_find_delayed_extent(inode, block, &es);
+ if (es.es_len == 0)
+ next_del = EXT_MAX_BLOCKS;
+ else
+ next_del = es.es_lblk;
+
return next_del;
}
/* fiemap flags we can handle specified here */
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 243611f..cf0fd41 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -229,59 +229,59 @@ static struct extent_status *__es_tree_search(struct rb_root *root,
}

/*
- * ext4_es_find_extent: find the 1st delayed extent covering @es->lblk
+ * ext4_es_find_delayed_extent: find the 1st delayed extent covering @es->lblk
* if it exists, otherwise, the next extent after @es->lblk.
*
* @inode: the inode which owns delayed extents
+ * @lblk: the offset where we start to search
* @es: delayed extent that we found
- *
- * Returns the first block of the next extent after es, otherwise
- * EXT_MAX_BLOCKS if no extent is found.
- * Delayed extent is returned via @es.
*/
-ext4_lblk_t ext4_es_find_extent(struct inode *inode, struct extent_status *es)
+void ext4_es_find_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
+ struct extent_status *es)
{
struct ext4_es_tree *tree = NULL;
struct extent_status *es1 = NULL;
struct rb_node *node;
- ext4_lblk_t ret = EXT_MAX_BLOCKS;

- trace_ext4_es_find_extent_enter(inode, es->es_lblk);
+ BUG_ON(es == NULL);
+ trace_ext4_es_find_delayed_extent_enter(inode, lblk);

read_lock(&EXT4_I(inode)->i_es_lock);
tree = &EXT4_I(inode)->i_es_tree;

/* find extent in cache firstly */
- es->es_len = es->es_pblk = 0;
+ es->es_lblk = es->es_len = es->es_pblk = 0;
if (tree->cache_es) {
es1 = tree->cache_es;
- if (in_range(es->es_lblk, es1->es_lblk, es1->es_len)) {
+ if (in_range(lblk, es1->es_lblk, es1->es_len)) {
es_debug("%u cached by [%u/%u) %llu %llx\n",
- es->es_lblk, es1->es_lblk, es1->es_len,
+ lblk, es1->es_lblk, es1->es_len,
ext4_es_pblock(es1), ext4_es_status(es1));
goto out;
}
}

- es1 = __es_tree_search(&tree->root, es->es_lblk);
+ es1 = __es_tree_search(&tree->root, lblk);

out:
- if (es1) {
+ if (es1 && !ext4_es_is_delayed(es1)) {
+ while ((node = rb_next(&es1->rb_node)) != NULL) {
+ es1 = rb_entry(node, struct extent_status, rb_node);
+ if (ext4_es_is_delayed(es1))
+ break;
+ }
+ }
+
+ if (es1 && ext4_es_is_delayed(es1)) {
tree->cache_es = es1;
es->es_lblk = es1->es_lblk;
es->es_len = es1->es_len;
es->es_pblk = es1->es_pblk;
- node = rb_next(&es1->rb_node);
- if (node) {
- es1 = rb_entry(node, struct extent_status, rb_node);
- ret = es1->es_lblk;
- }
}

read_unlock(&EXT4_I(inode)->i_es_lock);

- trace_ext4_es_find_extent_exit(inode, es, ret);
- return ret;
+ trace_ext4_es_find_delayed_extent_exit(inode, es);
}

static struct extent_status *
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 3cad833..3f69d09 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -51,8 +51,8 @@ extern int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
unsigned long long status);
extern int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
-extern ext4_lblk_t ext4_es_find_extent(struct inode *inode,
- struct extent_status *es);
+extern void ext4_es_find_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
+ struct extent_status *es);

static inline int ext4_es_is_written(struct extent_status *es)
{
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index aceaf5f..a7bd479 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -464,8 +464,7 @@ static loff_t ext4_seek_data(struct file *file, loff_t offset, loff_t maxsize)
* If there is a delay extent at this offset,
* it will be as a data.
*/
- es.es_lblk = last;
- (void)ext4_es_find_extent(inode, &es);
+ ext4_es_find_delayed_extent(inode, last, &es);
if (es.es_len != 0 && in_range(last, es.es_lblk, es.es_len)) {
if (last != start)
dataoff = last << blkbits;
@@ -548,8 +547,7 @@ static loff_t ext4_seek_hole(struct file *file, loff_t offset, loff_t maxsize)
* If there is a delay extent at this offset,
* we will skip this extent.
*/
- es.es_lblk = last;
- (void)ext4_es_find_extent(inode, &es);
+ ext4_es_find_delayed_extent(inode, last, &es);
if (es.es_len != 0 && in_range(last, es.es_lblk, es.es_len)) {
last = es.es_lblk + es.es_len;
holeoff = last << blkbits;
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index ef2f96e..7f7d57b 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2122,7 +2122,7 @@ TRACE_EVENT(ext4_es_remove_extent,
__entry->lblk, __entry->len)
);

-TRACE_EVENT(ext4_es_find_extent_enter,
+TRACE_EVENT(ext4_es_find_delayed_extent_enter,
TP_PROTO(struct inode *inode, ext4_lblk_t lblk),

TP_ARGS(inode, lblk),
@@ -2144,11 +2144,10 @@ TRACE_EVENT(ext4_es_find_extent_enter,
(unsigned long) __entry->ino, __entry->lblk)
);

-TRACE_EVENT(ext4_es_find_extent_exit,
- TP_PROTO(struct inode *inode, struct extent_status *es,
- ext4_lblk_t ret),
+TRACE_EVENT(ext4_es_find_delayed_extent_exit,
+ TP_PROTO(struct inode *inode, struct extent_status *es),

- TP_ARGS(inode, es, ret),
+ TP_ARGS(inode, es),

TP_STRUCT__entry(
__field( dev_t, dev )
@@ -2157,7 +2156,6 @@ TRACE_EVENT(ext4_es_find_extent_exit,
__field( ext4_lblk_t, len )
__field( ext4_fsblk_t, pblk )
__field( unsigned long long, status )
- __field( ext4_lblk_t, ret )
),

TP_fast_assign(
@@ -2167,14 +2165,13 @@ TRACE_EVENT(ext4_es_find_extent_exit,
__entry->len = es->es_len;
__entry->pblk = ext4_es_pblock(es);
__entry->status = ext4_es_status(es);
- __entry->ret = ret;
),

- TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %llx ret %u",
+ TP_printk("dev %d,%d ino %lu es [%u/%u) mapped %llu status %llx",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino,
__entry->lblk, __entry->len,
- __entry->pblk, __entry->status, __entry->ret)
+ __entry->pblk, __entry->status)
);

#endif /* _TRACE_EXT4_H */
--
1.7.12.rc2.18.g61b472e


2013-02-17 16:13:45

by Zheng Liu

Subject: [PATCH 4/9 v6] ext4: let ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag

From: Zheng Liu <[email protected]>

This commit makes ext4_ext_map_blocks return the EXT4_MAP_UNWRITTEN
flag, because a later commit needs ext4_map_blocks to use this flag to
determine the extent status.

Signed-off-by: Zheng Liu <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/extents.c | 6 +++++-
fs/ext4/inode.c | 12 +++---------
2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c230840..ad6c20e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3656,6 +3656,7 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
ext4_set_io_unwritten_flag(inode, io);
else
ext4_set_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN);
+ map->m_flags |= EXT4_MAP_UNWRITTEN;
if (ext4_should_dioread_nolock(inode))
map->m_flags |= EXT4_MAP_UNINIT;
goto out;
@@ -3677,8 +3678,10 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
* repeat fallocate creation request
* we already have an unwritten extent
*/
- if (flags & EXT4_GET_BLOCKS_UNINIT_EXT)
+ if (flags & EXT4_GET_BLOCKS_UNINIT_EXT) {
+ map->m_flags |= EXT4_MAP_UNWRITTEN;
goto map_out;
+ }

/* buffered READ or buffered write_begin() lookup */
if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) {
@@ -4108,6 +4111,7 @@ got_allocated_blocks:
/* Mark uninitialized */
if (flags & EXT4_GET_BLOCKS_UNINIT_EXT){
ext4_ext_mark_uninitialized(&newex);
+ map->m_flags |= EXT4_MAP_UNWRITTEN;
/*
* io_end structure was created for every IO write to an
* uninitialized extent. To avoid unnecessary conversion,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7fb00d8..c7e9665 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -560,16 +560,10 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
return retval;

/*
- * When we call get_blocks without the create flag, the
- * BH_Unwritten flag could have gotten set if the blocks
- * requested were part of a uninitialized extent. We need to
- * clear this flag now that we are committed to convert all or
- * part of the uninitialized extent to be an initialized
- * extent. This is because we need to avoid the combination
- * of BH_Unwritten and BH_Mapped flags being simultaneously
- * set on the buffer_head.
+ * Here we clear m_flags because it will be set again after a new
+ * extent is allocated.
*/
- map->m_flags &= ~EXT4_MAP_UNWRITTEN;
+ map->m_flags &= ~EXT4_MAP_FLAGS;

/*
* New blocks allocate and/or writing to uninitialized extent
--
1.7.12.rc2.18.g61b472e


2013-02-17 16:13:50

by Zheng Liu

Subject: [PATCH 5/9 v6] ext4: track all extent status in extent status tree

From: Zheng Liu <[email protected]>

By recording the physical block and status, the extent status tree is
able to track the status of every extent. When we call the _map_blocks
functions to look up an extent or to create a new written, unwritten or
delayed extent, that extent is inserted into the extent status tree.

We don't load all extents from disk in alloc_inode() because that would
cost too much memory, and if a file is opened and closed frequently it
would take too much time to load all the extent information. So for now
an extent is inserted into the extent status tree only when we create
it or look it up. Hence the extent status tree may not contain every
extent found in the file.

One case needs care: an extent can carry the unwritten and delayed
statuses simultaneously, because a delayed-allocated extent can later
be allocated by fallocate. In that case we need to keep the delayed
status because we later use it to update the delayed reservation
space.

Signed-off-by: Zheng Liu <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 3 +++
fs/ext4/extents.c | 20 +++++++++++----
fs/ext4/inode.c | 76 +++++++++++++++++++++++++++++++++++++------------------
3 files changed, 69 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8462eb3..bf9d835 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2520,6 +2520,9 @@ extern struct ext4_ext_path *ext4_ext_find_extent(struct inode *, ext4_lblk_t,
struct ext4_ext_path *);
extern void ext4_ext_drop_refs(struct ext4_ext_path *);
extern int ext4_ext_check_inode(struct inode *inode);
+extern int ext4_find_delalloc_range(struct inode *inode,
+ ext4_lblk_t lblk_start,
+ ext4_lblk_t lblk_end);
extern int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk);
extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ad6c20e..a294db3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2073,8 +2073,18 @@ static int ext4_fill_fiemap_extents(struct inode *inode,
break;
}

- /* This is possible iff next == next_del == EXT_MAX_BLOCKS */
- if (next == next_del) {
+ /*
+ * This is possible iff next == next_del == EXT_MAX_BLOCKS.
+ * We need to check next == EXT_MAX_BLOCKS because an extent
+ * can have both unwritten and delayed status: when an extent
+ * is delayed allocated and is later allocated by fallocate,
+ * the extent status tree tracks both states in a single
+ * extent.
+ *
+ * So we could return an unwritten and delayed extent, and
+ * its block is equal to 'next'.
+ */
+ if (next == next_del && next == EXT_MAX_BLOCKS) {
flags |= FIEMAP_EXTENT_LAST;
if (unlikely(next_del != EXT_MAX_BLOCKS ||
next != EXT_MAX_BLOCKS)) {
@@ -3519,9 +3529,9 @@ out:
*
* Return 1 if there is a delalloc block in the range, otherwise 0.
*/
-static int ext4_find_delalloc_range(struct inode *inode,
- ext4_lblk_t lblk_start,
- ext4_lblk_t lblk_end)
+int ext4_find_delalloc_range(struct inode *inode,
+ ext4_lblk_t lblk_start,
+ ext4_lblk_t lblk_end)
{
struct extent_status es;

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c7e9665..42c38e0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -527,20 +527,26 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
retval = ext4_ind_map_blocks(handle, inode, map, flags &
EXT4_GET_BLOCKS_KEEP_SIZE);
}
+ if (retval > 0) {
+ int ret;
+ unsigned long long status;
+
+ status = map->m_flags & EXT4_MAP_UNWRITTEN ?
+ EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
+ if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
+ ext4_find_delalloc_range(inode, map->m_lblk,
+ map->m_lblk + map->m_len - 1))
+ status |= EXTENT_STATUS_DELAYED;
+ ret = ext4_es_insert_extent(inode, map->m_lblk,
+ map->m_len, map->m_pblk, status);
+ if (ret < 0)
+ retval = ret;
+ }
if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
up_read((&EXT4_I(inode)->i_data_sem));

if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
- int ret;
- if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
- /* delayed alloc may be allocated by fallocate and
- * coverted to initialized by directIO.
- * we need to handle delayed extent here.
- */
- down_write((&EXT4_I(inode)->i_data_sem));
- goto delayed_mapped;
- }
- ret = check_block_validity(inode, map);
+ int ret = check_block_validity(inode, map);
if (ret != 0)
return ret;
}
@@ -609,18 +615,23 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE))
ext4_da_update_reserve_space(inode, retval, 1);
}
- if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
+ if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
ext4_clear_inode_state(inode, EXT4_STATE_DELALLOC_RESERVED);

- if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
- int ret;
-delayed_mapped:
- /* delayed allocation blocks has been allocated */
- ret = ext4_es_remove_extent(inode, map->m_lblk,
- map->m_len);
- if (ret < 0)
- retval = ret;
- }
+ if (retval > 0) {
+ int ret;
+ unsigned long long status;
+
+ status = map->m_flags & EXT4_MAP_UNWRITTEN ?
+ EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
+ if (!(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) &&
+ ext4_find_delalloc_range(inode, map->m_lblk,
+ map->m_lblk + map->m_len - 1))
+ status |= EXTENT_STATUS_DELAYED;
+ ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
+ map->m_pblk, status);
+ if (ret < 0)
+ retval = ret;
}

up_write((&EXT4_I(inode)->i_data_sem));
@@ -1802,6 +1813,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
retval = ext4_ind_map_blocks(NULL, inode, map, 0);

if (retval == 0) {
+ int ret;
/*
* XXX: __block_prepare_write() unmaps passed block,
* is it OK?
@@ -1809,16 +1821,20 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
/* If the block was allocated from previously allocated cluster,
* then we dont need to reserve it again. */
if (!(map->m_flags & EXT4_MAP_FROM_CLUSTER)) {
- retval = ext4_da_reserve_space(inode, iblock);
- if (retval)
+ ret = ext4_da_reserve_space(inode, iblock);
+ if (ret) {
/* not enough space to reserve */
+ retval = ret;
goto out_unlock;
+ }
}

- retval = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
- ~0, EXTENT_STATUS_DELAYED);
- if (retval)
+ ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
+ ~0, EXTENT_STATUS_DELAYED);
+ if (ret) {
+ retval = ret;
goto out_unlock;
+ }

/* Clear EXT4_MAP_FROM_CLUSTER flag since its purpose is served
* and it should not appear on the bh->b_state.
@@ -1828,6 +1844,16 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
map_bh(bh, inode->i_sb, invalid_block);
set_buffer_new(bh);
set_buffer_delay(bh);
+ } else if (retval > 0) {
+ int ret;
+ unsigned long long status;
+
+ status = map->m_flags & EXT4_MAP_UNWRITTEN ?
+ EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
+ ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
+ map->m_pblk, status);
+ if (ret != 0)
+ retval = ret;
}

out_unlock:
--
1.7.12.rc2.18.g61b472e


2013-02-17 16:13:55

by Zheng Liu

Subject: [PATCH 6/9 v6] ext4: lookup block mapping in extent status tree

From: Zheng Liu <[email protected]>

Now that all extent statuses are tracked, we already have an extent
cache in memory. Every time we want to look up a block mapping, we can
first try the extent status tree to avoid a potential disk I/O.

A new function, ext4_es_lookup_extent, is defined to do this work. A
block mapping lookup always goes through ext4_map_blocks and/or
ext4_da_map_blocks, so these functions first try to look the mapping up
in the extent status tree.

A new flag, EXT4_GET_BLOCKS_NO_PUT_HOLE, is used in ext4_da_map_blocks
so that a hole is not put into the extent status tree, because this hole
would immediately be converted to a delayed extent in the tree.
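
When the lookup hits a written or unwritten extent, the request is
answered from the cached extent by offset arithmetic. The sketch below
shows that arithmetic in isolation. The struct and function names are
hypothetical, and it assumes the requested block lies inside the cached
extent.

```c
#include <assert.h>
#include <stdint.h>

struct cached_extent {		/* hypothetical cached extent */
	uint32_t lblk;		/* first logical block */
	uint32_t len;		/* length in blocks */
	uint64_t pblk;		/* first physical block */
};

struct mapping {		/* hypothetical mapping request/result */
	uint32_t lblk;		/* in: first logical block wanted */
	uint32_t len;		/* in: blocks wanted, out: blocks mapped */
	uint64_t pblk;		/* out: physical block for lblk */
};

/* Fill 'map' from 'es' and return the number of blocks mapped. */
static uint32_t map_from_cache(const struct cached_extent *es,
			       struct mapping *map)
{
	uint32_t off, avail;

	assert(map->lblk >= es->lblk && map->lblk - es->lblk < es->len);
	off = map->lblk - es->lblk;	/* offset into the cached extent */
	avail = es->len - off;		/* blocks left from that offset */

	map->pblk = es->pblk + off;
	if (avail < map->len)
		map->len = avail;	/* clamp to what the extent covers */
	return map->len;
}
```

A request that runs past the end of the cached extent is shortened, so
the caller has to issue another lookup for the remaining blocks.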

Signed-off-by: Zheng Liu <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 2 ++
fs/ext4/extents.c | 9 ++++++-
fs/ext4/extents_status.c | 60 +++++++++++++++++++++++++++++++++++++++++
fs/ext4/extents_status.h | 2 ++
fs/ext4/inode.c | 66 +++++++++++++++++++++++++++++++++++++++++++--
include/trace/events/ext4.h | 56 ++++++++++++++++++++++++++++++++++++++
6 files changed, 192 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index bf9d835..8c8cd57 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -582,6 +582,8 @@ enum {
#define EXT4_GET_BLOCKS_KEEP_SIZE 0x0080
/* Do not take i_data_sem locking in ext4_map_blocks */
#define EXT4_GET_BLOCKS_NO_LOCK 0x0100
+ /* Do not put hole in extent cache */
+#define EXT4_GET_BLOCKS_NO_PUT_HOLE 0x0200

/*
* Flags used by ext4_free_blocks
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a294db3..d5161cd 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2164,6 +2164,9 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
block,
le32_to_cpu(ex->ee_block),
ext4_ext_get_actual_len(ex));
+ if (!ext4_find_delalloc_range(inode, lblock, lblock + len - 1))
+ ext4_es_insert_extent(inode, lblock, len, ~0,
+ EXTENT_STATUS_HOLE);
} else if (block >= le32_to_cpu(ex->ee_block)
+ ext4_ext_get_actual_len(ex)) {
ext4_lblk_t next;
@@ -2177,6 +2180,9 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
block);
BUG_ON(next == lblock);
len = next - lblock;
+ if (!ext4_find_delalloc_range(inode, lblock, lblock + len - 1))
+ ext4_es_insert_extent(inode, lblock, len, ~0,
+ EXTENT_STATUS_HOLE);
} else {
lblock = len = 0;
BUG();
@@ -4015,7 +4021,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
* put just found gap into cache to speed up
* subsequent requests
*/
- ext4_ext_put_gap_in_cache(inode, path, map->m_lblk);
+ if ((flags & EXT4_GET_BLOCKS_NO_PUT_HOLE) == 0)
+ ext4_ext_put_gap_in_cache(inode, path, map->m_lblk);
goto out2;
}

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index cf0fd41..473a935 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -461,6 +461,66 @@ error:
return err;
}

+/*
+ * ext4_es_lookup_extent() looks up an extent in extent status tree.
+ *
+ * ext4_es_lookup_extent is called by ext4_map_blocks/ext4_da_map_blocks.
+ *
+ * Return: 1 on found, 0 on not
+ */
+int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
+ struct extent_status *es)
+{
+ struct ext4_es_tree *tree;
+ struct extent_status *es1 = NULL;
+ struct rb_node *node;
+ int found = 0;
+
+ trace_ext4_es_lookup_extent_enter(inode, lblk);
+ es_debug("lookup extent in block %u\n", lblk);
+
+ tree = &EXT4_I(inode)->i_es_tree;
+ read_lock(&EXT4_I(inode)->i_es_lock);
+
+ /* find extent in cache firstly */
+ es->es_lblk = es->es_len = es->es_pblk = 0;
+ if (tree->cache_es) {
+ es1 = tree->cache_es;
+ if (in_range(lblk, es1->es_lblk, es1->es_len)) {
+ es_debug("%u cached by [%u/%u)\n",
+ lblk, es1->es_lblk, es1->es_len);
+ found = 1;
+ goto out;
+ }
+ }
+
+ node = tree->root.rb_node;
+ while (node) {
+ es1 = rb_entry(node, struct extent_status, rb_node);
+ if (lblk < es1->es_lblk)
+ node = node->rb_left;
+ else if (lblk > ext4_es_end(es1))
+ node = node->rb_right;
+ else {
+ found = 1;
+ break;
+ }
+ }
+
+out:
+ if (found) {
+ BUG_ON(!es1);
+ es->es_lblk = es1->es_lblk;
+ es->es_len = es1->es_len;
+ es->es_pblk = es1->es_pblk;
+ }
+
+ read_unlock(&EXT4_I(inode)->i_es_lock);
+
+ trace_ext4_es_lookup_extent_exit(inode, es, found);
+ return found;
+}
+
static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
ext4_lblk_t end)
{
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 3f69d09..8ffc90c 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -53,6 +53,8 @@ extern int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
extern void ext4_es_find_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
struct extent_status *es);
+extern int ext4_es_lookup_extent(struct inode *inode, ext4_lblk_t lblk,
+ struct extent_status *es);

static inline int ext4_es_is_written(struct extent_status *es)
{
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 42c38e0..039e8bd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -508,12 +508,33 @@ static pgoff_t ext4_num_dirty_pages(struct inode *inode, pgoff_t idx,
int ext4_map_blocks(handle_t *handle, struct inode *inode,
struct ext4_map_blocks *map, int flags)
{
+ struct extent_status es;
int retval;

map->m_flags = 0;
ext_debug("ext4_map_blocks(): inode %lu, flag %d, max_blocks %u,"
"logical block %lu\n", inode->i_ino, flags, map->m_len,
(unsigned long) map->m_lblk);
+
+ /* Look up the extent status tree first */
+ if (ext4_es_lookup_extent(inode, map->m_lblk, &es)) {
+ if (ext4_es_is_written(&es) || ext4_es_is_unwritten(&es)) {
+ map->m_pblk = ext4_es_pblock(&es) +
+ map->m_lblk - es.es_lblk;
+ map->m_flags |= ext4_es_is_written(&es) ?
+ EXT4_MAP_MAPPED : EXT4_MAP_UNWRITTEN;
+ retval = es.es_len - (map->m_lblk - es.es_lblk);
+ if (retval > map->m_len)
+ retval = map->m_len;
+ map->m_len = retval;
+ } else if (ext4_es_is_delayed(&es) || ext4_es_is_hole(&es)) {
+ retval = 0;
+ } else {
+ BUG_ON(1);
+ }
+ goto found;
+ }
+
/*
* Try to see if we can get the block without requesting a new
* file system block.
@@ -545,6 +566,7 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
if (!(flags & EXT4_GET_BLOCKS_NO_LOCK))
up_read((&EXT4_I(inode)->i_data_sem));

+found:
if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
int ret = check_block_validity(inode, map);
if (ret != 0)
@@ -1780,6 +1802,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
struct ext4_map_blocks *map,
struct buffer_head *bh)
{
+ struct extent_status es;
int retval;
sector_t invalid_block = ~((sector_t) 0xffff);

@@ -1790,6 +1813,42 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
ext_debug("ext4_da_map_blocks(): inode %lu, max_blocks %u,"
"logical block %lu\n", inode->i_ino, map->m_len,
(unsigned long) map->m_lblk);
+
+ /* Look up the extent status tree first */
+ if (ext4_es_lookup_extent(inode, iblock, &es)) {
+
+ if (ext4_es_is_hole(&es)) {
+ retval = 0;
+ down_read((&EXT4_I(inode)->i_data_sem));
+ goto add_delayed;
+ }
+
+ /*
+ * Delayed extent could be allocated by fallocate.
+ * So we need to check it.
+ */
+ if (ext4_es_is_delayed(&es) && !ext4_es_is_unwritten(&es)) {
+ map_bh(bh, inode->i_sb, invalid_block);
+ set_buffer_new(bh);
+ set_buffer_delay(bh);
+ return 0;
+ }
+
+ map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
+ retval = es.es_len - (iblock - es.es_lblk);
+ if (retval > map->m_len)
+ retval = map->m_len;
+ map->m_len = retval;
+ if (ext4_es_is_written(&es))
+ map->m_flags |= EXT4_MAP_MAPPED;
+ else if (ext4_es_is_unwritten(&es))
+ map->m_flags |= EXT4_MAP_UNWRITTEN;
+ else
+ BUG_ON(1);
+
+ return retval;
+ }
+
/*
* Try to see if we can get the block without requesting a new
* file system block.
@@ -1808,10 +1867,13 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
map->m_flags |= EXT4_MAP_FROM_CLUSTER;
retval = 0;
} else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
- retval = ext4_ext_map_blocks(NULL, inode, map, 0);
+ retval = ext4_ext_map_blocks(NULL, inode, map,
+ EXT4_GET_BLOCKS_NO_PUT_HOLE);
else
- retval = ext4_ind_map_blocks(NULL, inode, map, 0);
+ retval = ext4_ind_map_blocks(NULL, inode, map,
+ EXT4_GET_BLOCKS_NO_PUT_HOLE);

+add_delayed:
if (retval == 0) {
int ret;
/*
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 7f7d57b..1e634c1 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2174,6 +2174,62 @@ TRACE_EVENT(ext4_es_find_delayed_extent_exit,
__entry->pblk, __entry->status)
);

+TRACE_EVENT(ext4_es_lookup_extent_enter,
+ TP_PROTO(struct inode *inode, ext4_lblk_t lblk),
+
+ TP_ARGS(inode, lblk),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( ext4_lblk_t, lblk )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->lblk = lblk;
+ ),
+
+ TP_printk("dev %d,%d ino %lu lblk %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino, __entry->lblk)
+);
+
+TRACE_EVENT(ext4_es_lookup_extent_exit,
+ TP_PROTO(struct inode *inode, struct extent_status *es,
+ int found),
+
+ TP_ARGS(inode, es, found),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( ext4_lblk_t, lblk )
+ __field( ext4_lblk_t, len )
+ __field( ext4_fsblk_t, pblk )
+ __field( unsigned long long, status )
+ __field( int, found )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->lblk = es->es_lblk;
+ __entry->len = es->es_len;
+ __entry->pblk = ext4_es_pblock(es);
+ __entry->status = ext4_es_status(es);
+ __entry->found = found;
+ ),
+
+ TP_printk("dev %d,%d ino %lu found %d [%u/%u) %llu %llx",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino, __entry->found,
+ __entry->lblk, __entry->len,
+ __entry->found ? __entry->pblk : 0,
+ __entry->found ? __entry->status : 0)
+);
+
#endif /* _TRACE_EXT4_H */

/* This part must be outside protection */
--
1.7.12.rc2.18.g61b472e
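
A note on the arithmetic in the new fast path of ext4_map_blocks() above:
once the status tree returns an extent containing the requested block, the
physical block and the mapped length follow from the offset into that
extent. The following is only a userspace sketch of that calculation. The
types es_model and map_model and the function map_from_cached_extent() are
hypothetical stand-ins, and the sketch ignores the status bits that the
kernel packs into es_pblk (which is why the patch goes through
ext4_es_pblock()).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct es_model {
	uint32_t lblk;   /* first logical block covered by the extent */
	uint32_t len;    /* number of blocks covered */
	uint64_t pblk;   /* physical block backing lblk (no status bits) */
};

struct map_model {
	uint32_t m_lblk; /* requested logical block */
	uint32_t m_len;  /* requested length, trimmed on return */
	uint64_t m_pblk; /* resulting physical block */
};

/*
 * Translate a request through a cached extent that is known to contain
 * map->m_lblk. Returns the number of blocks mapped, which is the smaller
 * of the requested length and the blocks left in the extent.
 */
uint32_t map_from_cached_extent(const struct es_model *es,
				struct map_model *map)
{
	uint32_t offset = map->m_lblk - es->lblk;
	uint32_t remaining = es->len - offset;

	/* The caller must have looked the block up successfully. */
	assert(map->m_lblk >= es->lblk && offset < es->len);

	map->m_pblk = es->pblk + offset;
	if (remaining < map->m_len)
		map->m_len = remaining;
	return map->m_len;
}
```

For example, with an extent covering logical blocks [100, 150) that starts
at physical block 8000, a request for 64 blocks at logical block 120 is
trimmed to 30 blocks starting at physical block 8020.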


2013-02-17 16:14:00

by Zheng Liu

Subject: [PATCH 7/9 v6] ext4: remove single extent cache

From: Zheng Liu <[email protected]>

The single extent cache can be removed because the extent status tree now
serves as the extent cache, and it is a better one.

Signed-off-by: Zheng Liu <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 12 ----
fs/ext4/ext4_extents.h | 6 --
fs/ext4/extents.c | 179 +++++++++++--------------------------------------
fs/ext4/move_extent.c | 3 -
fs/ext4/super.c | 1 -
5 files changed, 38 insertions(+), 163 deletions(-)
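
For reviewers, the hole/delayed handling that ext4_find_delayed_extent()
keeps after the conversion to struct extent_status can be modelled in
userspace roughly as below. This is a sketch, not the kernel code: the
names ext_model, range_kind and classify_and_trim() are hypothetical, and
it assumes the delayed-extent lookup returns the first delayed extent that
covers or follows the start block, with len == 0 meaning "none found".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct extent_status. */
struct ext_model {
	uint32_t lblk;  /* first logical block */
	uint32_t len;   /* 0 means "no extent found" */
	uint64_t pblk;  /* 0 means "not mapped on disk" */
};

enum range_kind { RANGE_HOLE, RANGE_DELAYED, RANGE_MAPPED };

/*
 * @newes:   the range about to be reported (for example to fiemap)
 * @delayed: result of the delayed-extent lookup starting at newes->lblk
 *
 * A mapped range is left untouched. An unmapped range is either trimmed
 * so that it ends where the next delayed extent begins (a hole), or
 * resized to end where the covering delayed extent ends.
 */
enum range_kind classify_and_trim(struct ext_model *newes,
				  const struct ext_model *delayed)
{
	if (newes->pblk != 0)
		return RANGE_MAPPED;

	if (delayed->len == 0)
		return RANGE_HOLE;	/* nothing delayed at or after lblk */

	if (delayed->lblk > newes->lblk) {
		/* Hole in front of the delayed extent: stop at its start. */
		uint32_t gap = delayed->lblk - newes->lblk;

		if (gap < newes->len)
			newes->len = gap;
		return RANGE_HOLE;
	}

	/* The start block lies inside the delayed extent. */
	newes->len = delayed->lblk + delayed->len - newes->lblk;
	return RANGE_DELAYED;
}
```

With a delayed extent at [40, 45), an unmapped range starting at block 10
is reported as a 30-block hole, while one starting at block 42 is reported
as 3 delayed blocks.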

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8c8cd57..29bd2ef 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -812,17 +812,6 @@ do { \

#endif /* defined(__KERNEL__) || defined(__linux__) */

-/*
- * storage for cached extent
- * If ec_len == 0, then the cache is invalid.
- * If ec_start == 0, then the cache represents a gap (null mapping)
- */
-struct ext4_ext_cache {
- ext4_fsblk_t ec_start;
- ext4_lblk_t ec_block;
- __u32 ec_len; /* must be 32bit to return holes */
-};
-
#include "extents_status.h"

/*
@@ -889,7 +878,6 @@ struct ext4_inode_info {
struct inode vfs_inode;
struct jbd2_inode *jinode;

- struct ext4_ext_cache i_cached_extent;
/*
* File creation time. Its function is same as that of
* struct timespec i_{a,c,m}time in the generic inode.
diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index 487fda1..8643ff5 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -193,12 +193,6 @@ static inline unsigned short ext_depth(struct inode *inode)
return le16_to_cpu(ext_inode_hdr(inode)->eh_depth);
}

-static inline void
-ext4_ext_invalidate_cache(struct inode *inode)
-{
- EXT4_I(inode)->i_cached_extent.ec_len = 0;
-}
-
static inline void ext4_ext_mark_uninitialized(struct ext4_extent *ext)
{
/* We can not have an uninitialized extent of zero length! */
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index d5161cd..55e73a9 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -112,7 +112,7 @@ static int ext4_split_extent_at(handle_t *handle,
int flags);

static int ext4_find_delayed_extent(struct inode *inode,
- struct ext4_ext_cache *newex);
+ struct extent_status *newes);

static int ext4_ext_truncate_extend_restart(handle_t *handle,
struct inode *inode,
@@ -714,7 +714,6 @@ int ext4_ext_tree_init(handle_t *handle, struct inode *inode)
eh->eh_magic = EXT4_EXT_MAGIC;
eh->eh_max = cpu_to_le16(ext4_ext_space_root(inode, 0));
ext4_mark_inode_dirty(handle, inode);
- ext4_ext_invalidate_cache(inode);
return 0;
}

@@ -1960,7 +1959,6 @@ cleanup:
ext4_ext_drop_refs(npath);
kfree(npath);
}
- ext4_ext_invalidate_cache(inode);
return err;
}

@@ -1969,8 +1967,8 @@ static int ext4_fill_fiemap_extents(struct inode *inode,
struct fiemap_extent_info *fieinfo)
{
struct ext4_ext_path *path = NULL;
- struct ext4_ext_cache newex;
struct ext4_extent *ex;
+ struct extent_status es;
ext4_lblk_t next, next_del, start = 0, end = 0;
ext4_lblk_t last = block + num;
int exists, depth = 0, err = 0;
@@ -2044,31 +2042,31 @@ static int ext4_fill_fiemap_extents(struct inode *inode,
BUG_ON(end <= start);

if (!exists) {
- newex.ec_block = start;
- newex.ec_len = end - start;
- newex.ec_start = 0;
+ es.es_lblk = start;
+ es.es_len = end - start;
+ es.es_pblk = 0;
} else {
- newex.ec_block = le32_to_cpu(ex->ee_block);
- newex.ec_len = ext4_ext_get_actual_len(ex);
- newex.ec_start = ext4_ext_pblock(ex);
+ es.es_lblk = le32_to_cpu(ex->ee_block);
+ es.es_len = ext4_ext_get_actual_len(ex);
+ es.es_pblk = ext4_ext_pblock(ex);
if (ext4_ext_is_uninitialized(ex))
flags |= FIEMAP_EXTENT_UNWRITTEN;
}

/*
- * Find delayed extent and update newex accordingly. We call
- * it even in !exists case to find out whether newex is the
+ * Find delayed extent and update es accordingly. We call
+ * it even in !exists case to find out whether es is the
* last existing extent or not.
*/
- next_del = ext4_find_delayed_extent(inode, &newex);
+ next_del = ext4_find_delayed_extent(inode, &es);
if (!exists && next_del) {
exists = 1;
flags |= FIEMAP_EXTENT_DELALLOC;
}
up_read(&EXT4_I(inode)->i_data_sem);

- if (unlikely(newex.ec_len == 0)) {
- EXT4_ERROR_INODE(inode, "newex.ec_len == 0");
+ if (unlikely(es.es_len == 0)) {
+ EXT4_ERROR_INODE(inode, "es.es_len == 0");
err = -EIO;
break;
}
@@ -2099,9 +2097,9 @@ static int ext4_fill_fiemap_extents(struct inode *inode,

if (exists) {
err = fiemap_fill_next_extent(fieinfo,
- (__u64)newex.ec_block << blksize_bits,
- (__u64)newex.ec_start << blksize_bits,
- (__u64)newex.ec_len << blksize_bits,
+ (__u64)es.es_lblk << blksize_bits,
+ (__u64)es.es_pblk << blksize_bits,
+ (__u64)es.es_len << blksize_bits,
flags);
if (err < 0)
break;
@@ -2111,7 +2109,7 @@ static int ext4_fill_fiemap_extents(struct inode *inode,
}
}

- block = newex.ec_block + newex.ec_len;
+ block = es.es_lblk + es.es_len;
}

if (path) {
@@ -2122,21 +2120,6 @@ static int ext4_fill_fiemap_extents(struct inode *inode,
return err;
}

-static void
-ext4_ext_put_in_cache(struct inode *inode, ext4_lblk_t block,
- __u32 len, ext4_fsblk_t start)
-{
- struct ext4_ext_cache *cex;
- BUG_ON(len == 0);
- spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
- trace_ext4_ext_put_in_cache(inode, block, len, start);
- cex = &EXT4_I(inode)->i_cached_extent;
- cex->ec_block = block;
- cex->ec_len = len;
- cex->ec_start = start;
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
-}
-
/*
* ext4_ext_put_gap_in_cache:
* calculate boundaries of the gap that the requested block fits into
@@ -2153,9 +2136,10 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,

ex = path[depth].p_ext;
if (ex == NULL) {
- /* there is no extent yet, so gap is [0;-] */
- lblock = 0;
- len = EXT_MAX_BLOCKS;
+ /*
+ * there is no extent yet, so gap is [0;-] and we
+ * don't cache it
+ */
ext_debug("cache gap(whole file):");
} else if (block < le32_to_cpu(ex->ee_block)) {
lblock = block;
@@ -2189,52 +2173,6 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
}

ext_debug(" -> %u:%lu\n", lblock, len);
- ext4_ext_put_in_cache(inode, lblock, len, 0);
-}
-
-/*
- * ext4_ext_in_cache()
- * Checks to see if the given block is in the cache.
- * If it is, the cached extent is stored in the given
- * cache extent pointer.
- *
- * @inode: The files inode
- * @block: The block to look for in the cache
- * @ex: Pointer where the cached extent will be stored
- * if it contains block
- *
- * Return 0 if cache is invalid; 1 if the cache is valid
- */
-static int
-ext4_ext_in_cache(struct inode *inode, ext4_lblk_t block,
- struct ext4_extent *ex)
-{
- struct ext4_ext_cache *cex;
- int ret = 0;
-
- /*
- * We borrow i_block_reservation_lock to protect i_cached_extent
- */
- spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
- cex = &EXT4_I(inode)->i_cached_extent;
-
- /* has cache valid data? */
- if (cex->ec_len == 0)
- goto errout;
-
- if (in_range(block, cex->ec_block, cex->ec_len)) {
- ex->ee_block = cpu_to_le32(cex->ec_block);
- ext4_ext_store_pblock(ex, cex->ec_start);
- ex->ee_len = cpu_to_le16(cex->ec_len);
- ext_debug("%u cached by %u:%u:%llu\n",
- block,
- cex->ec_block, cex->ec_len, cex->ec_start);
- ret = 1;
- }
-errout:
- trace_ext4_ext_in_cache(inode, block, ret);
- spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
- return ret;
}

/*
@@ -2674,8 +2612,6 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
return PTR_ERR(handle);

again:
- ext4_ext_invalidate_cache(inode);
-
trace_ext4_ext_remove_space(inode, start, depth);

/*
@@ -3917,35 +3853,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
map->m_lblk, map->m_len, inode->i_ino);
trace_ext4_ext_map_blocks_enter(inode, map->m_lblk, map->m_len, flags);

- /* check in cache */
- if (ext4_ext_in_cache(inode, map->m_lblk, &newex)) {
- if (!newex.ee_start_lo && !newex.ee_start_hi) {
- if ((sbi->s_cluster_ratio > 1) &&
- ext4_find_delalloc_cluster(inode, map->m_lblk))
- map->m_flags |= EXT4_MAP_FROM_CLUSTER;
-
- if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) {
- /*
- * block isn't allocated yet and
- * user doesn't want to allocate it
- */
- goto out2;
- }
- /* we should allocate requested block */
- } else {
- /* block is already allocated */
- if (sbi->s_cluster_ratio > 1)
- map->m_flags |= EXT4_MAP_FROM_CLUSTER;
- newblock = map->m_lblk
- - le32_to_cpu(newex.ee_block)
- + ext4_ext_pblock(&newex);
- /* number of remaining blocks in the extent */
- allocated = ext4_ext_get_actual_len(&newex) -
- (map->m_lblk - le32_to_cpu(newex.ee_block));
- goto out;
- }
- }
-
/* find extent for this block */
path = ext4_ext_find_extent(inode, map->m_lblk, NULL);
if (IS_ERR(path)) {
@@ -3992,15 +3899,9 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
ext_debug("%u fit into %u:%d -> %llu\n", map->m_lblk,
ee_block, ee_len, newblock);

- /*
- * Do not put uninitialized extent
- * in the cache
- */
- if (!ext4_ext_is_uninitialized(ex)) {
- ext4_ext_put_in_cache(inode, ee_block,
- ee_len, ee_start);
+ if (!ext4_ext_is_uninitialized(ex))
goto out;
- }
+
allocated = ext4_ext_handle_uninitialized_extents(
handle, inode, map, path, flags,
allocated, newblock);
@@ -4262,10 +4163,9 @@ got_allocated_blocks:
* Cache the extent and update transaction to commit on fdatasync only
* when it is _not_ an uninitialized extent.
*/
- if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) {
- ext4_ext_put_in_cache(inode, map->m_lblk, allocated, newblock);
+ if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0)
ext4_update_inode_fsync_trans(handle, inode, 1);
- } else
+ else
ext4_update_inode_fsync_trans(handle, inode, 0);
out:
if (allocated > map->m_len)
@@ -4324,7 +4224,6 @@ void ext4_ext_truncate(struct inode *inode)
goto out_stop;

down_write(&EXT4_I(inode)->i_data_sem);
- ext4_ext_invalidate_cache(inode);

ext4_discard_preallocations(inode);

@@ -4574,42 +4473,42 @@ int ext4_convert_unwritten_extents(struct inode *inode, loff_t offset,
}

/*
- * If newex is not existing extent (newex->ec_start equals zero) find
- * delayed extent at start of newex and update newex accordingly and
+ * If newes is not existing extent (newes->es_pblk equals zero) find
+ * delayed extent at start of newes and update newes accordingly and
* return start of the next delayed extent.
*
- * If newex is existing extent (newex->ec_start is not equal zero)
+ * If newes is existing extent (newes->es_pblk is not equal zero)
* return start of next delayed extent or EXT_MAX_BLOCKS if no delayed
- * extent found. Leave newex unmodified.
+ * extent found. Leave newes unmodified.
*/
static int ext4_find_delayed_extent(struct inode *inode,
- struct ext4_ext_cache *newex)
+ struct extent_status *newes)
{
struct extent_status es;
ext4_lblk_t block, next_del;

- ext4_es_find_delayed_extent(inode, newex->ec_block, &es);
+ ext4_es_find_delayed_extent(inode, newes->es_lblk, &es);

- if (newex->ec_start == 0) {
+ if (newes->es_pblk == 0) {
/*
- * No extent in extent-tree contains block @newex->ec_start,
+ * No extent in extent-tree contains block @newes->es_lblk,
* then the block may stay in 1)a hole or 2)delayed-extent.
*/
if (es.es_len == 0)
/* A hole found. */
return 0;

- if (es.es_lblk > newex->ec_block) {
+ if (es.es_lblk > newes->es_lblk) {
/* A hole found. */
- newex->ec_len = min(es.es_lblk - newex->ec_block,
- newex->ec_len);
+ newes->es_len = min(es.es_lblk - newes->es_lblk,
+ newes->es_len);
return 0;
}

- newex->ec_len = es.es_lblk + es.es_len - newex->ec_block;
+ newes->es_len = es.es_lblk + es.es_len - newes->es_lblk;
}

- block = newex->ec_block + newex->ec_len;
+ block = newes->es_lblk + newes->es_len;
ext4_es_find_delayed_extent(inode, block, &es);
if (es.es_len == 0)
next_del = EXT_MAX_BLOCKS;
@@ -4813,14 +4712,12 @@ int ext4_ext_punch_hole(struct file *file, loff_t offset, loff_t length)
goto out;

down_write(&EXT4_I(inode)->i_data_sem);
- ext4_ext_invalidate_cache(inode);
ext4_discard_preallocations(inode);

err = ext4_es_remove_extent(inode, first_block,
stop_block - first_block);
err = ext4_ext_remove_space(inode, first_block, stop_block - 1);

- ext4_ext_invalidate_cache(inode);
ext4_discard_preallocations(inode);

if (IS_SYNC(inode))
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index d9cc5ee..b9222c8 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -761,9 +761,6 @@ out:
kfree(donor_path);
}

- ext4_ext_invalidate_cache(orig_inode);
- ext4_ext_invalidate_cache(donor_inode);

2013-02-17 16:14:05

by Zheng Liu

Subject: [PATCH 8/9] ext4: adjust some functions for reclaiming extents from extent status tree

From: Zheng Liu <[email protected]>

This commit changes some interfaces in the extent status tree because we
need the inode to count the cached objects in an extent status tree.

Signed-off-by: Zheng Liu <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/extents_status.c | 50 +++++++++++++++++++++++-------------------------
1 file changed, 24 insertions(+), 26 deletions(-)
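
To illustrate why the helpers now take the inode rather than the tree: the
tree alone cannot reach per-inode state, while the inode can reach both the
tree and a counter. The userspace sketch below models that idea only. All
names (inode_model, tree_model, entry_model, entry_alloc(), entry_free())
are hypothetical; the real counter, i_es_lru_nr, is introduced by the next
patch in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the tree is embedded in the inode. */
struct tree_model {
	int nodes;            /* entries currently in the tree */
};

struct inode_model {
	struct tree_model es_tree;
	unsigned int es_nr;   /* reclaimable (non-delayed) entries */
};

struct entry_model {
	bool delayed;
};

/* Allocation sees the inode, so it can update the per-inode count. */
struct entry_model *entry_alloc(struct inode_model *inode, bool delayed)
{
	struct entry_model *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->delayed = delayed;
	inode->es_tree.nodes++;
	if (!delayed)
		inode->es_nr++;   /* delayed entries are never reclaimed */
	return e;
}

/* Freeing mirrors allocation and keeps the count in step. */
void entry_free(struct inode_model *inode, struct entry_model *e)
{
	if (!e->delayed) {
		assert(inode->es_nr > 0);
		inode->es_nr--;
	}
	inode->es_tree.nodes--;
	free(e);
}
```

Had the helpers kept taking only the tree pointer, each caller would have
had to adjust the counter itself, which is easy to get wrong on the merge
and split paths.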

diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 473a935..3021858 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -142,9 +142,8 @@

static struct kmem_cache *ext4_es_cachep;

-static int __es_insert_extent(struct ext4_es_tree *tree,
- struct extent_status *newes);
-static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
+static int __es_insert_extent(struct inode *inode, struct extent_status *newes);
+static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t end);

int __init ext4_init_es(void)
@@ -285,7 +284,8 @@ out:
}

static struct extent_status *
-ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len, ext4_fsblk_t pblk)
+ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
+ ext4_fsblk_t pblk)
{
struct extent_status *es;
es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC);
@@ -297,7 +297,7 @@ ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len, ext4_fsblk_t pblk)
return es;
}

-static void ext4_es_free_extent(struct extent_status *es)
+static void ext4_es_free_extent(struct inode *inode, struct extent_status *es)
{
kmem_cache_free(ext4_es_cachep, es);
}
@@ -326,8 +326,9 @@ static int ext4_es_can_be_merged(struct extent_status *es1,
}

static struct extent_status *
-ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
+ext4_es_try_to_merge_left(struct inode *inode, struct extent_status *es)
{
+ struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
struct extent_status *es1;
struct rb_node *node;

@@ -339,7 +340,7 @@ ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
if (ext4_es_can_be_merged(es1, es)) {
es1->es_len += es->es_len;
rb_erase(&es->rb_node, &tree->root);
- ext4_es_free_extent(es);
+ ext4_es_free_extent(inode, es);
es = es1;
}

@@ -347,8 +348,9 @@ ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
}

static struct extent_status *
-ext4_es_try_to_merge_right(struct ext4_es_tree *tree, struct extent_status *es)
+ext4_es_try_to_merge_right(struct inode *inode, struct extent_status *es)
{
+ struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
struct extent_status *es1;
struct rb_node *node;

@@ -360,15 +362,15 @@ ext4_es_try_to_merge_right(struct ext4_es_tree *tree, struct extent_status *es)
if (ext4_es_can_be_merged(es, es1)) {
es->es_len += es1->es_len;
rb_erase(node, &tree->root);
- ext4_es_free_extent(es1);
+ ext4_es_free_extent(inode, es1);
}

return es;
}

-static int __es_insert_extent(struct ext4_es_tree *tree,
- struct extent_status *newes)
+static int __es_insert_extent(struct inode *inode, struct extent_status *newes)
{
+ struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
struct rb_node **p = &tree->root.rb_node;
struct rb_node *parent = NULL;
struct extent_status *es;
@@ -389,14 +391,14 @@ static int __es_insert_extent(struct ext4_es_tree *tree,
ext4_es_is_unwritten(es))
ext4_es_store_pblock(es,
newes->es_pblk);
- es = ext4_es_try_to_merge_left(tree, es);
+ es = ext4_es_try_to_merge_left(inode, es);
goto out;
}
p = &(*p)->rb_left;
} else if (newes->es_lblk > ext4_es_end(es)) {
if (ext4_es_can_be_merged(es, newes)) {
es->es_len += newes->es_len;
- es = ext4_es_try_to_merge_right(tree, es);
+ es = ext4_es_try_to_merge_right(inode, es);
goto out;
}
p = &(*p)->rb_right;
@@ -406,7 +408,7 @@ static int __es_insert_extent(struct ext4_es_tree *tree,
}
}

- es = ext4_es_alloc_extent(newes->es_lblk, newes->es_len,
+ es = ext4_es_alloc_extent(inode, newes->es_lblk, newes->es_len,
newes->es_pblk);
if (!es)
return -ENOMEM;
@@ -430,7 +432,6 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len, ext4_fsblk_t pblk,
unsigned long long status)
{
- struct ext4_es_tree *tree;
struct extent_status newes;
ext4_lblk_t end = lblk + len - 1;
int err = 0;
@@ -447,11 +448,10 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
trace_ext4_es_insert_extent(inode, &newes);

write_lock(&EXT4_I(inode)->i_es_lock);
- tree = &EXT4_I(inode)->i_es_tree;
- err = __es_remove_extent(tree, lblk, end);
+ err = __es_remove_extent(inode, lblk, end);
if (err != 0)
goto error;
- err = __es_insert_extent(tree, &newes);
+ err = __es_insert_extent(inode, &newes);

error:
write_unlock(&EXT4_I(inode)->i_es_lock);
@@ -521,9 +521,10 @@ out:
return found;
}

-static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
- ext4_lblk_t end)
+static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
+ ext4_lblk_t end)
{
+ struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
struct rb_node *node;
struct extent_status *es;
struct extent_status orig_es;
@@ -561,7 +562,7 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
ext4_es_store_pblock(&newes, block);
}
ext4_es_store_status(&newes, ext4_es_status(&orig_es));
- err = __es_insert_extent(tree, &newes);
+ err = __es_insert_extent(inode, &newes);
if (err) {
es->es_lblk = orig_es.es_lblk;
es->es_len = orig_es.es_len;
@@ -590,7 +591,7 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
while (es && ext4_es_end(es) <= end) {
node = rb_next(&es->rb_node);
rb_erase(&es->rb_node, &tree->root);
- ext4_es_free_extent(es);
+ ext4_es_free_extent(inode, es);
if (!node) {
es = NULL;
break;
@@ -622,7 +623,6 @@ out:
int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len)
{
- struct ext4_es_tree *tree;
ext4_lblk_t end;
int err = 0;

@@ -633,10 +633,8 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
end = lblk + len - 1;
BUG_ON(end < lblk);

- tree = &EXT4_I(inode)->i_es_tree;

2013-02-17 16:14:11

by Zheng Liu

Subject: [PATCH 9/9 v6] ext4: reclaim extents from extent status tree

From: Zheng Liu <[email protected]>

Although extent status is loaded on demand, we also need to reclaim
extents from the tree under heavy memory pressure, because in some
cases a fragmented extent tree makes the status tree cost too much
memory.

Here we maintain a per-superblock LRU list (in ext4_sb_info). When the
extent status of an inode is accessed or changed, the inode is moved to
the tail of the list. The inode is dropped from the list when it is
cleared. In the inode, a counter is added to count the number of
cached objects in the extent status tree. Only written/unwritten/hole
extents are counted, because delayed extents are never reclaimed:
fiemap, bigalloc and seek_data/hole need them. The counter is
incremented when a new extent is allocated and decremented when an
extent is freed.

In this commit we use the normal shrinker framework to reclaim memory from
the status tree. ext4_es_reclaim_extents_count() traverses the LRU list
to count the number of reclaimable extents. ext4_es_shrink() tries to
reclaim written/unwritten/hole extents from the extent status tree. An
inode that has been visited by the shrinker is moved to the tail of the
LRU list.

Signed-off-by: Zheng Liu <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Jan Kara <[email protected]>
---
fs/ext4/ext4.h | 7 ++
fs/ext4/extents_status.c | 156 ++++++++++++++++++++++++++++++++++++++++++++
fs/ext4/extents_status.h | 5 ++
fs/ext4/super.c | 7 ++
include/trace/events/ext4.h | 60 +++++++++++++++++
5 files changed, 235 insertions(+)
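
The LRU and shrink behaviour described in the changelog can be modelled in
userspace as follows. This is a sketch under simplifying assumptions, not
the kernel implementation: locking is omitted, each inode carries only a
count of reclaimable entries instead of an rb-tree, and every name here
(lru_node, inode_model, lru_touch(), lru_count(), lru_shrink()) is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, in the spirit of list_head. */
struct lru_node {
	struct lru_node *prev, *next;
};

struct inode_model {
	struct lru_node lru;       /* must stay the first member */
	unsigned int reclaimable;  /* non-delayed entries cached */
};

void lru_init(struct lru_node *head)
{
	head->prev = head->next = head;
}

void inode_init(struct inode_model *ino, unsigned int reclaimable)
{
	lru_init(&ino->lru);
	ino->reclaimable = reclaimable;
}

static void lru_unlink(struct lru_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Add the inode at the tail, or move it there if already queued. */
void lru_touch(struct lru_node *head, struct inode_model *ino)
{
	lru_unlink(&ino->lru);
	ino->lru.prev = head->prev;
	ino->lru.next = head;
	head->prev->next = &ino->lru;
	head->prev = &ino->lru;
}

/* Sum of reclaimable entries over all queued inodes. */
unsigned int lru_count(const struct lru_node *head)
{
	unsigned int total = 0;

	for (const struct lru_node *n = head->next; n != head; n = n->next)
		total += ((const struct inode_model *)n)->reclaimable;
	return total;
}

/*
 * Reclaim up to nr_to_scan entries, starting from the least recently
 * used inode. Visited inodes are parked on a private list and then put
 * back at the tail, so the next pass starts with inodes not yet visited.
 * Returns the number of entries that remain reclaimable.
 */
unsigned int lru_shrink(struct lru_node *head, unsigned int nr_to_scan)
{
	struct lru_node scanned;

	lru_init(&scanned);
	while (nr_to_scan > 0 && head->next != head) {
		struct inode_model *ino = (struct inode_model *)head->next;
		unsigned int take = ino->reclaimable < nr_to_scan ?
				    ino->reclaimable : nr_to_scan;

		lru_touch(&scanned, ino);
		ino->reclaimable -= take;
		nr_to_scan -= take;
	}
	while (scanned.next != &scanned)
		lru_touch(head, (struct inode_model *)scanned.next);

	return lru_count(head);
}
```

As in the patch, a call with nr_to_scan == 0 only reports the current
count, and an inode with nothing to reclaim is still rotated to the tail
once it has been visited.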

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 29bd2ef..01b2b37 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -891,6 +891,8 @@ struct ext4_inode_info {
/* extents status tree */
struct ext4_es_tree i_es_tree;
rwlock_t i_es_lock;
+ struct list_head i_es_lru;
+ unsigned int i_es_lru_nr; /* protected by i_es_lock */

/* ialloc */
ext4_group_t i_last_alloc_group;
@@ -1306,6 +1308,11 @@ struct ext4_sb_info {

/* Precomputed FS UUID checksum for seeding other checksums */
__u32 s_csum_seed;
+
+ /* Reclaim extents from extent status tree */
+ struct shrinker s_es_shrinker;
+ struct list_head s_es_lru;
+ spinlock_t s_es_lru_lock ____cacheline_aligned_in_smp;
};

static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 3021858..8d8d061 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -145,6 +145,9 @@ static struct kmem_cache *ext4_es_cachep;
static int __es_insert_extent(struct inode *inode, struct extent_status *newes);
static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t end);
+static int __es_try_to_reclaim_extents(struct ext4_inode_info *ei,
+ int nr_to_scan);
+static int ext4_es_reclaim_extents_count(struct super_block *sb);

int __init ext4_init_es(void)
{
@@ -280,6 +283,7 @@ out:

read_unlock(&EXT4_I(inode)->i_es_lock);

+ ext4_es_lru_add(inode);
trace_ext4_es_find_delayed_extent_exit(inode, es);
}

@@ -294,11 +298,24 @@ ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
es->es_lblk = lblk;
es->es_len = len;
es->es_pblk = pblk;
+
+ /*
+ * We don't count delayed extents because we never try to reclaim them
+ */
+ if (!ext4_es_is_delayed(es))
+ EXT4_I(inode)->i_es_lru_nr++;
+
return es;
}

static void ext4_es_free_extent(struct inode *inode, struct extent_status *es)
{
+ /* Decrease the lru counter when this es is not delayed */
+ if (!ext4_es_is_delayed(es)) {
+ BUG_ON(EXT4_I(inode)->i_es_lru_nr == 0);
+ EXT4_I(inode)->i_es_lru_nr--;
+ }
+
kmem_cache_free(ext4_es_cachep, es);
}

@@ -456,6 +473,7 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
error:
write_unlock(&EXT4_I(inode)->i_es_lock);

+ ext4_es_lru_add(inode);
ext4_es_print_tree(inode);

return err;
@@ -517,6 +535,7 @@ out:

read_unlock(&EXT4_I(inode)->i_es_lock);

+ ext4_es_lru_add(inode);
trace_ext4_es_lookup_extent_exit(inode, es, found);
return found;
}
@@ -639,3 +658,140 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_es_print_tree(inode);
return err;
}
+
+static int ext4_es_shrink(struct shrinker *shrink, struct shrink_control *sc)
+{
+ struct ext4_sb_info *sbi = container_of(shrink,
+ struct ext4_sb_info, s_es_shrinker);
+ struct ext4_inode_info *ei;
+ struct list_head *cur, *tmp, scanned;
+ int nr_to_scan = sc->nr_to_scan;
+ int ret, nr_shrunk = 0;
+
+ trace_ext4_es_shrink_enter(sbi->s_sb, nr_to_scan);
+
+ if (!nr_to_scan)
+ return ext4_es_reclaim_extents_count(sbi->s_sb);
+
+ INIT_LIST_HEAD(&scanned);
+
+ spin_lock(&sbi->s_es_lru_lock);
+ list_for_each_safe(cur, tmp, &sbi->s_es_lru) {
+ list_move_tail(cur, &scanned);
+
+ ei = list_entry(cur, struct ext4_inode_info, i_es_lru);
+
+ read_lock(&ei->i_es_lock);
+ if (ei->i_es_lru_nr == 0) {
+ read_unlock(&ei->i_es_lock);
+ continue;
+ }
+ read_unlock(&ei->i_es_lock);
+
+ write_lock(&ei->i_es_lock);
+ ret = __es_try_to_reclaim_extents(ei, nr_to_scan);
+ write_unlock(&ei->i_es_lock);
+
+ nr_shrunk += ret;
+ nr_to_scan -= ret;
+ if (nr_to_scan == 0)
+ break;
+ }
+ list_splice_tail(&scanned, &sbi->s_es_lru);
+ spin_unlock(&sbi->s_es_lru_lock);
+ trace_ext4_es_shrink_exit(sbi->s_sb, nr_shrunk);
+
+ return ext4_es_reclaim_extents_count(sbi->s_sb);
+}
+
+void ext4_es_register_shrinker(struct super_block *sb)
+{
+ struct ext4_sb_info *sbi;
+
+ sbi = EXT4_SB(sb);
+ INIT_LIST_HEAD(&sbi->s_es_lru);
+ spin_lock_init(&sbi->s_es_lru_lock);
+ sbi->s_es_shrinker.shrink = ext4_es_shrink;
+ sbi->s_es_shrinker.seeks = DEFAULT_SEEKS;
+ register_shrinker(&sbi->s_es_shrinker);
+}
+
+void ext4_es_unregister_shrinker(struct super_block *sb)
+{
+ unregister_shrinker(&EXT4_SB(sb)->s_es_shrinker);
+}
+
+void ext4_es_lru_add(struct inode *inode)
+{
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+
+ spin_lock(&sbi->s_es_lru_lock);
+ if (list_empty(&ei->i_es_lru))
+ list_add_tail(&ei->i_es_lru, &sbi->s_es_lru);
+ else
+ list_move_tail(&ei->i_es_lru, &sbi->s_es_lru);
+ spin_unlock(&sbi->s_es_lru_lock);
+}
+
+void ext4_es_lru_del(struct inode *inode)
+{
+ struct ext4_inode_info *ei = EXT4_I(inode);
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+
+ spin_lock(&sbi->s_es_lru_lock);
+ if (!list_empty(&ei->i_es_lru))
+ list_del_init(&ei->i_es_lru);
+ spin_unlock(&sbi->s_es_lru_lock);
+}
+
+static int ext4_es_reclaim_extents_count(struct super_block *sb)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_inode_info *ei;
+ struct list_head *cur;
+ int nr_cached = 0;
+
+ spin_lock(&sbi->s_es_lru_lock);
+ list_for_each(cur, &sbi->s_es_lru) {
+ ei = list_entry(cur, struct ext4_inode_info, i_es_lru);
+ read_lock(&ei->i_es_lock);
+ nr_cached += ei->i_es_lru_nr;
+ read_unlock(&ei->i_es_lock);
+ }
+ spin_unlock(&sbi->s_es_lru_lock);
+ trace_ext4_es_reclaim_extents_count(sb, nr_cached);
+ return nr_cached;
+}
+
+static int __es_try_to_reclaim_extents(struct ext4_inode_info *ei,
+ int nr_to_scan)
+{
+ struct inode *inode = &ei->vfs_inode;
+ struct ext4_es_tree *tree = &ei->i_es_tree;
+ struct rb_node *node;
+ struct extent_status *es;
+ int nr_shrunk = 0;
+
+ if (ei->i_es_lru_nr == 0)
+ return 0;
+
+ node = rb_first(&tree->root);
+ while (node != NULL) {
+ es = rb_entry(node, struct extent_status, rb_node);
+ node = rb_next(&es->rb_node);
+ /*
+ * We can't reclaim delayed extents from the status tree because
+ * fiemap, bigalloc, and seek_data/hole need to use them.
+ */
+ if (!ext4_es_is_delayed(es)) {
+ rb_erase(&es->rb_node, &tree->root);
+ ext4_es_free_extent(inode, es);
+ nr_shrunk++;
+ if (--nr_to_scan == 0)
+ break;
+ }
+ }
+ tree->cache_es = NULL;
+ return nr_shrunk;
+}
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 8ffc90c..cf83e77 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -106,4 +106,9 @@ static inline void ext4_es_store_status(struct extent_status *es,
es->es_pblk = block;
}

+extern void ext4_es_register_shrinker(struct super_block *sb);
+extern void ext4_es_unregister_shrinker(struct super_block *sb);
+extern void ext4_es_lru_add(struct inode *inode);
+extern void ext4_es_lru_del(struct inode *inode);
+
#endif /* _EXT4_EXTENTS_STATUS_H */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a35c6c1..64d78b1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -858,6 +858,7 @@ static void ext4_put_super(struct super_block *sb)
ext4_abort(sb, "Couldn't clean up the journal");
}

+ ext4_es_unregister_shrinker(sb);
del_timer(&sbi->s_err_report);
ext4_release_system_zone(sb);
ext4_mb_release(sb);
@@ -943,6 +944,8 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
spin_lock_init(&ei->i_prealloc_lock);
ext4_es_init_tree(&ei->i_es_tree);
rwlock_init(&ei->i_es_lock);
+ INIT_LIST_HEAD(&ei->i_es_lru);
+ ei->i_es_lru_nr = 0;
ei->i_reserved_data_blocks = 0;
ei->i_reserved_meta_blocks = 0;
ei->i_allocated_meta_blocks = 0;
@@ -1030,6 +1033,7 @@ void ext4_clear_inode(struct inode *inode)
dquot_drop(inode);
ext4_discard_preallocations(inode);
ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
+ ext4_es_lru_del(inode);
if (EXT4_I(inode)->jinode) {
jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
EXT4_I(inode)->jinode);
@@ -3771,6 +3775,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_max_writeback_mb_bump = 128;
sbi->s_extent_max_zeroout_kb = 32;

+ /* Register extent status tree shrinker */
+ ext4_es_register_shrinker(sb);
+
/*
* set up enough so that it can read an inode
*/
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 1e634c1..4f8e763 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2230,6 +2230,66 @@ TRACE_EVENT(ext4_es_lookup_extent_exit,
__entry->found ? __entry->status : 0)
);

+TRACE_EVENT(ext4_es_reclaim_extents_count,
+ TP_PROTO(struct super_block *sb, int nr_cached),
+
+ TP_ARGS(sb, nr_cached),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( int, nr_cached )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = sb->s_dev;
+ __entry->nr_cached = nr_cached;
+ ),
+
+ TP_printk("dev %d,%d cached objects nr %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->nr_cached)
+);
+
+TRACE_EVENT(ext4_es_shrink_enter,
+ TP_PROTO(struct super_block *sb, int nr_to_scan),
+
+ TP_ARGS(sb, nr_to_scan),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( int, nr_to_scan )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = sb->s_dev;
+ __entry->nr_to_scan = nr_to_scan;
+ ),
+
+ TP_printk("dev %d,%d nr to scan %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->nr_to_scan)
+);
+
+TRACE_EVENT(ext4_es_shrink_exit,
+ TP_PROTO(struct super_block *sb, int shrunk_nr),
+
+ TP_ARGS(sb, shrunk_nr),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( int, shrunk_nr )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = sb->s_dev;
+ __entry->shrunk_nr = shrunk_nr;
+ ),
+
+ TP_printk("dev %d,%d shrunk_nr %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->shrunk_nr)
+);
+
#endif /* _TRACE_EXT4_H */

/* This part must be outside protection */
--
1.7.12.rc2.18.g61b472e
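
For readers following the LRU handling in ext4_es_lru_add() and ext4_es_lru_del() above, here is a rough userspace sketch of the same idea: an inode that is not yet on the list is appended at the tail, an inode that is already listed is moved to the tail, and deletion is safe to repeat. All names (lnode, sketch_sb, sketch_inode, sketch_lru_*) are made up for illustration, the list helpers only imitate the kernel's list_head, and the s_es_lru_lock spinlock is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct lnode {
	struct lnode *prev, *next;
};

static void lnode_init(struct lnode *n)
{
	n->prev = n->next = n;
}

/* A node that points at itself is not on any list. */
static int lnode_unlinked(const struct lnode *n)
{
	return n->next == n;
}

static void lnode_unlink(struct lnode *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	lnode_init(n);
}

static void lnode_push_tail(struct lnode *n, struct lnode *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical stand-ins for ext4_sb_info and ext4_inode_info. */
struct sketch_sb {
	struct lnode lru;	/* list head: coldest inode first */
};

struct sketch_inode {
	int ino;
	struct lnode lru;	/* link into sketch_sb.lru */
};

/* Mark an inode as most recently used. */
static void sketch_lru_add(struct sketch_sb *sb, struct sketch_inode *inode)
{
	if (!lnode_unlinked(&inode->lru))
		lnode_unlink(&inode->lru);	/* already listed: move it */
	lnode_push_tail(&inode->lru, &sb->lru);
}

/* Take an inode off the LRU; calling it twice is harmless. */
static void sketch_lru_del(struct sketch_inode *inode)
{
	if (!lnode_unlinked(&inode->lru))
		lnode_unlink(&inode->lru);
}

/* Inode number at the cold end of the list, or -1 if the list is empty. */
static int sketch_lru_coldest(const struct sketch_sb *sb)
{
	const struct sketch_inode *inode;

	if (lnode_unlinked(&sb->lru))
		return -1;
	inode = (const struct sketch_inode *)
		((const char *)sb->lru.next - offsetof(struct sketch_inode, lru));
	return inode->ino;
}
```

Because every touch moves the inode to the tail, a shrinker that walks from the head sees the least recently used inodes first.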


2013-02-17 16:18:42

by Zheng Liu

Subject: Re: [PATCH 8/9] ext4: adjust some functions for reclaiming extents from extent status tree

On Mon, Feb 18, 2013 at 12:27:53AM +0800, Zheng Liu wrote:
> From: Zheng Liu <[email protected]>
>
> This commit changes some interfaces in the extent status tree because we
> need the inode to count the cached objects in an extent status tree.
>
> Signed-off-by: Zheng Liu <[email protected]>
> Cc: "Theodore Ts'o" <[email protected]>
> Cc: Jan Kara <[email protected]>

Oops, I forgot to add 'v6' to the subject. The patch itself is correct.

Regards,
- Zheng

> ---
> fs/ext4/extents_status.c | 50 +++++++++++++++++++++++-------------------------
> 1 file changed, 24 insertions(+), 26 deletions(-)
>
> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> index 473a935..3021858 100644
> --- a/fs/ext4/extents_status.c
> +++ b/fs/ext4/extents_status.c
> @@ -142,9 +142,8 @@
>
> static struct kmem_cache *ext4_es_cachep;
>
> -static int __es_insert_extent(struct ext4_es_tree *tree,
> - struct extent_status *newes);
> -static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
> +static int __es_insert_extent(struct inode *inode, struct extent_status *newes);
> +static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
> ext4_lblk_t end);
>
> int __init ext4_init_es(void)
> @@ -285,7 +284,8 @@ out:
> }
>
> static struct extent_status *
> -ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len, ext4_fsblk_t pblk)
> +ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
> + ext4_fsblk_t pblk)
> {
> struct extent_status *es;
> es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC);
> @@ -297,7 +297,7 @@ ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len, ext4_fsblk_t pblk)
> return es;
> }
>
> -static void ext4_es_free_extent(struct extent_status *es)
> +static void ext4_es_free_extent(struct inode *inode, struct extent_status *es)
> {
> kmem_cache_free(ext4_es_cachep, es);
> }
> @@ -326,8 +326,9 @@ static int ext4_es_can_be_merged(struct extent_status *es1,
> }
>
> static struct extent_status *
> -ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
> +ext4_es_try_to_merge_left(struct inode *inode, struct extent_status *es)
> {
> + struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
> struct extent_status *es1;
> struct rb_node *node;
>
> @@ -339,7 +340,7 @@ ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
> if (ext4_es_can_be_merged(es1, es)) {
> es1->es_len += es->es_len;
> rb_erase(&es->rb_node, &tree->root);
> - ext4_es_free_extent(es);
> + ext4_es_free_extent(inode, es);
> es = es1;
> }
>
> @@ -347,8 +348,9 @@ ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
> }
>
> static struct extent_status *
> -ext4_es_try_to_merge_right(struct ext4_es_tree *tree, struct extent_status *es)
> +ext4_es_try_to_merge_right(struct inode *inode, struct extent_status *es)
> {
> + struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
> struct extent_status *es1;
> struct rb_node *node;
>
> @@ -360,15 +362,15 @@ ext4_es_try_to_merge_right(struct ext4_es_tree *tree, struct extent_status *es)
> if (ext4_es_can_be_merged(es, es1)) {
> es->es_len += es1->es_len;
> rb_erase(node, &tree->root);
> - ext4_es_free_extent(es1);
> + ext4_es_free_extent(inode, es1);
> }
>
> return es;
> }
>
> -static int __es_insert_extent(struct ext4_es_tree *tree,
> - struct extent_status *newes)
> +static int __es_insert_extent(struct inode *inode, struct extent_status *newes)
> {
> + struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
> struct rb_node **p = &tree->root.rb_node;
> struct rb_node *parent = NULL;
> struct extent_status *es;
> @@ -389,14 +391,14 @@ static int __es_insert_extent(struct ext4_es_tree *tree,
> ext4_es_is_unwritten(es))
> ext4_es_store_pblock(es,
> newes->es_pblk);
> - es = ext4_es_try_to_merge_left(tree, es);
> + es = ext4_es_try_to_merge_left(inode, es);
> goto out;
> }
> p = &(*p)->rb_left;
> } else if (newes->es_lblk > ext4_es_end(es)) {
> if (ext4_es_can_be_merged(es, newes)) {
> es->es_len += newes->es_len;
> - es = ext4_es_try_to_merge_right(tree, es);
> + es = ext4_es_try_to_merge_right(inode, es);
> goto out;
> }
> p = &(*p)->rb_right;
> @@ -406,7 +408,7 @@ static int __es_insert_extent(struct ext4_es_tree *tree,
> }
> }
>
> - es = ext4_es_alloc_extent(newes->es_lblk, newes->es_len,
> + es = ext4_es_alloc_extent(inode, newes->es_lblk, newes->es_len,
> newes->es_pblk);
> if (!es)
> return -ENOMEM;
> @@ -430,7 +432,6 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
> ext4_lblk_t len, ext4_fsblk_t pblk,
> unsigned long long status)
> {
> - struct ext4_es_tree *tree;
> struct extent_status newes;
> ext4_lblk_t end = lblk + len - 1;
> int err = 0;
> @@ -447,11 +448,10 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
> trace_ext4_es_insert_extent(inode, &newes);
>
> write_lock(&EXT4_I(inode)->i_es_lock);
> - tree = &EXT4_I(inode)->i_es_tree;
> - err = __es_remove_extent(tree, lblk, end);
> + err = __es_remove_extent(inode, lblk, end);
> if (err != 0)
> goto error;
> - err = __es_insert_extent(tree, &newes);
> + err = __es_insert_extent(inode, &newes);
>
> error:
> write_unlock(&EXT4_I(inode)->i_es_lock);
> @@ -521,9 +521,10 @@ out:
> return found;
> }
>
> -static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
> - ext4_lblk_t end)
> +static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
> + ext4_lblk_t end)
> {
> + struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
> struct rb_node *node;
> struct extent_status *es;
> struct extent_status orig_es;
> @@ -561,7 +562,7 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
> ext4_es_store_pblock(&newes, block);
> }
> ext4_es_store_status(&newes, ext4_es_status(&orig_es));
> - err = __es_insert_extent(tree, &newes);
> + err = __es_insert_extent(inode, &newes);
> if (err) {
> es->es_lblk = orig_es.es_lblk;
> es->es_len = orig_es.es_len;
> @@ -590,7 +591,7 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
> while (es && ext4_es_end(es) <= end) {
> node = rb_next(&es->rb_node);
> rb_erase(&es->rb_node, &tree->root);
> - ext4_es_free_extent(es);
> + ext4_es_free_extent(inode, es);
> if (!node) {
> es = NULL;
> break;
> @@ -622,7 +623,6 @@ out:
> int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
> ext4_lblk_t len)
> {
> - struct ext4_es_tree *tree;
> ext4_lblk_t end;
> int err = 0;
>
> @@ -633,10 +633,8 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
> end = lblk + len - 1;
> BUG_ON(end < lblk);
>
> - tree = &EXT4_I(inode)->i_es_tree;
> -
> write_lock(&EXT4_I(inode)->i_es_lock);
> - err = __es_remove_extent(tree, lblk, end);
> + err = __es_remove_extent(inode, lblk, end);
> write_unlock(&EXT4_I(inode)->i_es_lock);
> ext4_es_print_tree(inode);
> return err;
> --
> 1.7.12.rc2.18.g61b472e
>
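
To illustrate why the helpers in this patch take the inode rather than just the tree, here is a small userspace sketch. It assumes the per-inode count is kept in step inside the allocate and free helpers and that only non-delayed extents are counted; neither detail is shown in the hunks above, so treat both as assumptions. All names (sketch_es, sketch_inode, sketch_insert, sketch_free, sketch_reclaim) are invented, and a plain linked list stands in for the rb-tree.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_es {
	unsigned int lblk;		/* first logical block */
	int delayed;			/* nonzero: must stay cached */
	struct sketch_es *next;
};

/* Hypothetical stand-in for ext4_inode_info. */
struct sketch_inode {
	struct sketch_es *head;		/* a list instead of the rb-tree */
	int nr_reclaimable;		/* plays the role of i_es_lru_nr */
};

/* Allocate an extent and link it at the head; 0 on success, -1 on failure. */
static int sketch_insert(struct sketch_inode *inode, unsigned int lblk,
			 int delayed)
{
	struct sketch_es *es = malloc(sizeof(*es));

	if (!es)
		return -1;
	es->lblk = lblk;
	es->delayed = delayed;
	es->next = inode->head;
	inode->head = es;
	if (!delayed)
		inode->nr_reclaimable++;	/* this is what needs the inode */
	return 0;
}

/* Free one extent and keep the per-inode count in step. */
static void sketch_free(struct sketch_inode *inode, struct sketch_es *es)
{
	if (!es->delayed) {
		assert(inode->nr_reclaimable > 0);
		inode->nr_reclaimable--;
	}
	free(es);
}

/* Drop up to nr_to_scan non-delayed extents; return how many were freed. */
static int sketch_reclaim(struct sketch_inode *inode, int nr_to_scan)
{
	struct sketch_es **link = &inode->head;
	int nr_shrunk = 0;

	if (inode->nr_reclaimable == 0)
		return 0;

	while (*link != NULL && nr_to_scan > 0) {
		struct sketch_es *es = *link;

		if (es->delayed) {
			link = &es->next;	/* keep it and move on */
			continue;
		}
		*link = es->next;		/* unlink before freeing */
		sketch_free(inode, es);
		nr_shrunk++;
		nr_to_scan--;
	}
	return nr_shrunk;
}
```

Note that the sketch checks the scan budget before each removal, which is a choice made here for simplicity and not necessarily how the series handles it.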

2013-02-18 02:10:37

by Tao Ma

Subject: Re: [PATCH 8/9] ext4: adjust some functions for reclaiming extents from extent status tree

Hi Zheng,
On 02/18/2013 12:33 AM, Zheng Liu wrote:
> On Mon, Feb 18, 2013 at 12:27:53AM +0800, Zheng Liu wrote:
>> From: Zheng Liu <[email protected]>
>>
>> This commit changes some interfaces in the extent status tree because we
>> need the inode to count the cached objects in an extent status tree.
>>
>> Signed-off-by: Zheng Liu <[email protected]>
>> Cc: "Theodore Ts'o" <[email protected]>
>> Cc: Jan Kara <[email protected]>
>
> Oops, I forgot to add 'v6' to the subject. The patch itself is correct.
You can use git format-patch --subject-prefix="PATCH V6" to put the version
into the subject of every patch in the series automatically.

Thanks,
Tao

>
> Regards,
> - Zheng
>
>> ---
>> fs/ext4/extents_status.c | 50 +++++++++++++++++++++++-------------------------
>> 1 file changed, 24 insertions(+), 26 deletions(-)
>>
>> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
>> index 473a935..3021858 100644
>> --- a/fs/ext4/extents_status.c
>> +++ b/fs/ext4/extents_status.c
>> @@ -142,9 +142,8 @@
>>
>> static struct kmem_cache *ext4_es_cachep;
>>
>> -static int __es_insert_extent(struct ext4_es_tree *tree,
>> - struct extent_status *newes);
>> -static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
>> +static int __es_insert_extent(struct inode *inode, struct extent_status *newes);
>> +static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
>> ext4_lblk_t end);
>>
>> int __init ext4_init_es(void)
>> @@ -285,7 +284,8 @@ out:
>> }
>>
>> static struct extent_status *
>> -ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len, ext4_fsblk_t pblk)
>> +ext4_es_alloc_extent(struct inode *inode, ext4_lblk_t lblk, ext4_lblk_t len,
>> + ext4_fsblk_t pblk)
>> {
>> struct extent_status *es;
>> es = kmem_cache_alloc(ext4_es_cachep, GFP_ATOMIC);
>> @@ -297,7 +297,7 @@ ext4_es_alloc_extent(ext4_lblk_t lblk, ext4_lblk_t len, ext4_fsblk_t pblk)
>> return es;
>> }
>>
>> -static void ext4_es_free_extent(struct extent_status *es)
>> +static void ext4_es_free_extent(struct inode *inode, struct extent_status *es)
>> {
>> kmem_cache_free(ext4_es_cachep, es);
>> }
>> @@ -326,8 +326,9 @@ static int ext4_es_can_be_merged(struct extent_status *es1,
>> }
>>
>> static struct extent_status *
>> -ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
>> +ext4_es_try_to_merge_left(struct inode *inode, struct extent_status *es)
>> {
>> + struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
>> struct extent_status *es1;
>> struct rb_node *node;
>>
>> @@ -339,7 +340,7 @@ ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
>> if (ext4_es_can_be_merged(es1, es)) {
>> es1->es_len += es->es_len;
>> rb_erase(&es->rb_node, &tree->root);
>> - ext4_es_free_extent(es);
>> + ext4_es_free_extent(inode, es);
>> es = es1;
>> }
>>
>> @@ -347,8 +348,9 @@ ext4_es_try_to_merge_left(struct ext4_es_tree *tree, struct extent_status *es)
>> }
>>
>> static struct extent_status *
>> -ext4_es_try_to_merge_right(struct ext4_es_tree *tree, struct extent_status *es)
>> +ext4_es_try_to_merge_right(struct inode *inode, struct extent_status *es)
>> {
>> + struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
>> struct extent_status *es1;
>> struct rb_node *node;
>>
>> @@ -360,15 +362,15 @@ ext4_es_try_to_merge_right(struct ext4_es_tree *tree, struct extent_status *es)
>> if (ext4_es_can_be_merged(es, es1)) {
>> es->es_len += es1->es_len;
>> rb_erase(node, &tree->root);
>> - ext4_es_free_extent(es1);
>> + ext4_es_free_extent(inode, es1);
>> }
>>
>> return es;
>> }
>>
>> -static int __es_insert_extent(struct ext4_es_tree *tree,
>> - struct extent_status *newes)
>> +static int __es_insert_extent(struct inode *inode, struct extent_status *newes)
>> {
>> + struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
>> struct rb_node **p = &tree->root.rb_node;
>> struct rb_node *parent = NULL;
>> struct extent_status *es;
>> @@ -389,14 +391,14 @@ static int __es_insert_extent(struct ext4_es_tree *tree,
>> ext4_es_is_unwritten(es))
>> ext4_es_store_pblock(es,
>> newes->es_pblk);
>> - es = ext4_es_try_to_merge_left(tree, es);
>> + es = ext4_es_try_to_merge_left(inode, es);
>> goto out;
>> }
>> p = &(*p)->rb_left;
>> } else if (newes->es_lblk > ext4_es_end(es)) {
>> if (ext4_es_can_be_merged(es, newes)) {
>> es->es_len += newes->es_len;
>> - es = ext4_es_try_to_merge_right(tree, es);
>> + es = ext4_es_try_to_merge_right(inode, es);
>> goto out;
>> }
>> p = &(*p)->rb_right;
>> @@ -406,7 +408,7 @@ static int __es_insert_extent(struct ext4_es_tree *tree,
>> }
>> }
>>
>> - es = ext4_es_alloc_extent(newes->es_lblk, newes->es_len,
>> + es = ext4_es_alloc_extent(inode, newes->es_lblk, newes->es_len,
>> newes->es_pblk);
>> if (!es)
>> return -ENOMEM;
>> @@ -430,7 +432,6 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>> ext4_lblk_t len, ext4_fsblk_t pblk,
>> unsigned long long status)
>> {
>> - struct ext4_es_tree *tree;
>> struct extent_status newes;
>> ext4_lblk_t end = lblk + len - 1;
>> int err = 0;
>> @@ -447,11 +448,10 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>> trace_ext4_es_insert_extent(inode, &newes);
>>
>> write_lock(&EXT4_I(inode)->i_es_lock);
>> - tree = &EXT4_I(inode)->i_es_tree;
>> - err = __es_remove_extent(tree, lblk, end);
>> + err = __es_remove_extent(inode, lblk, end);
>> if (err != 0)
>> goto error;
>> - err = __es_insert_extent(tree, &newes);
>> + err = __es_insert_extent(inode, &newes);
>>
>> error:
>> write_unlock(&EXT4_I(inode)->i_es_lock);
>> @@ -521,9 +521,10 @@ out:
>> return found;
>> }
>>
>> -static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
>> - ext4_lblk_t end)
>> +static int __es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
>> + ext4_lblk_t end)
>> {
>> + struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
>> struct rb_node *node;
>> struct extent_status *es;
>> struct extent_status orig_es;
>> @@ -561,7 +562,7 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
>> ext4_es_store_pblock(&newes, block);
>> }
>> ext4_es_store_status(&newes, ext4_es_status(&orig_es));
>> - err = __es_insert_extent(tree, &newes);
>> + err = __es_insert_extent(inode, &newes);
>> if (err) {
>> es->es_lblk = orig_es.es_lblk;
>> es->es_len = orig_es.es_len;
>> @@ -590,7 +591,7 @@ static int __es_remove_extent(struct ext4_es_tree *tree, ext4_lblk_t lblk,
>> while (es && ext4_es_end(es) <= end) {
>> node = rb_next(&es->rb_node);
>> rb_erase(&es->rb_node, &tree->root);
>> - ext4_es_free_extent(es);
>> + ext4_es_free_extent(inode, es);
>> if (!node) {
>> es = NULL;
>> break;
>> @@ -622,7 +623,6 @@ out:
>> int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
>> ext4_lblk_t len)
>> {
>> - struct ext4_es_tree *tree;
>> ext4_lblk_t end;
>> int err = 0;
>>
>> @@ -633,10 +633,8 @@ int ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
>> end = lblk + len - 1;
>> BUG_ON(end < lblk);
>>
>> - tree = &EXT4_I(inode)->i_es_tree;
>> -
>> write_lock(&EXT4_I(inode)->i_es_lock);
>> - err = __es_remove_extent(tree, lblk, end);
>> + err = __es_remove_extent(inode, lblk, end);
>> write_unlock(&EXT4_I(inode)->i_es_lock);
>> ext4_es_print_tree(inode);
>> return err;
>> --
>> 1.7.12.rc2.18.g61b472e
>>


2013-02-18 03:40:12

by Zheng Liu

Subject: Re: [PATCH 8/9] ext4: adjust some functions for reclaiming extents from extent status tree

On Mon, Feb 18, 2013 at 10:10:19AM +0800, Tao Ma wrote:
> Hi Zheng,
> On 02/18/2013 12:33 AM, Zheng Liu wrote:
> > On Mon, Feb 18, 2013 at 12:27:53AM +0800, Zheng Liu wrote:
> >> From: Zheng Liu <[email protected]>
> >>
> >> This commit changes some interfaces in the extent status tree because we
> >> need the inode to count the cached objects in an extent status tree.
> >>
> >> Signed-off-by: Zheng Liu <[email protected]>
> >> Cc: "Theodore Ts'o" <[email protected]>
> >> Cc: Jan Kara <[email protected]>
> >
> > Oops, I forgot to add 'v6' to the subject. The patch itself is correct.
> You can use git format-patch --subject-prefix="PATCH V6" to put the version
> into the subject of every patch in the series automatically.

Thanks for teaching me.

Regards,
- Zheng

2013-02-18 05:38:56

by Theodore Ts'o

Subject: Re: [PATCH 0/9 v6] ext4: extent status tree (step2)

Thanks, I've grabbed these patches and have kicked off an xfstests
"auto" run....

- Ted