2011-11-01 10:53:46

by Robin Dong

Subject: [PATCH 0/8 bigalloc] ext4: change unit of extent's ee_len from block to cluster

From: Robin Dong <[email protected]>

Hi,

This patch series changes the unit of ee_len (in struct ext4_extent) from
"block" to "cluster". Since ee_len is only 16 bits wide, counting in clusters
lets each extent describe a larger range, which should reduce the space
occupied by metadata.
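
For readers following the series, the conversions it relies on can be sketched
as below. This is a minimal illustration, not the kernel code: the function
names are hypothetical stand-ins for EXT4_C2B, EXT4_B2C and EXT4_NUM_B2C, and
cluster_bits is assumed to be log2 of the number of blocks per cluster.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_C2B: clusters -> blocks. */
static inline uint32_t clusters_to_blocks(uint32_t clusters, unsigned cluster_bits)
{
	assert(cluster_bits < 32);
	return clusters << cluster_bits;
}

/* Hypothetical stand-in for EXT4_B2C: blocks -> clusters, rounding down. */
static inline uint32_t blocks_to_clusters(uint32_t blocks, unsigned cluster_bits)
{
	assert(cluster_bits < 32);
	return blocks >> cluster_bits;
}

/* Hypothetical stand-in for EXT4_NUM_B2C: a block count -> the number of
 * clusters needed to hold it, rounding up. */
static inline uint32_t num_blocks_to_clusters(uint32_t blocks, unsigned cluster_bits)
{
	assert(cluster_bits < 32);
	uint32_t ratio = 1u << cluster_bits;
	return (blocks + ratio - 1) >> cluster_bits;
}
```

With 16 blocks per cluster (cluster_bits = 4), 17 blocks round down to 1
cluster but need 2 clusters of storage, which is why the series uses the
rounding-up form when sizing a new extent.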


2011-11-01 10:53:51

by Robin Dong

Subject: [PATCH 3/8 bigalloc] ext4: remove unused functions and tags

From: Robin Dong <[email protected]>

Remove the function, local variable, and goto label that are no longer used
after patch 2/8.

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/extents.c | 108 -----------------------------------------------------
1 files changed, 0 insertions(+), 108 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c96f64f..d3866d1 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3619,111 +3619,6 @@ out2:
}

/*
- * get_implied_cluster_alloc - check to see if the requested
- * allocation (in the map structure) overlaps with a cluster already
- * allocated in an extent.
- * @sb The filesystem superblock structure
- * @map The requested lblk->pblk mapping
- * @ex The extent structure which might contain an implied
- * cluster allocation
- *
- * This function is called by ext4_ext_map_blocks() after we failed to
- * find blocks that were already in the inode's extent tree. Hence,
- * we know that the beginning of the requested region cannot overlap
- * the extent from the inode's extent tree. There are three cases we
- * want to catch. The first is this case:
- *
- * |--- cluster # N--|
- * |--- extent ---| |---- requested region ---|
- * |==========|
- *
- * The second case that we need to test for is this one:
- *
- * |--------- cluster # N ----------------|
- * |--- requested region --| |------- extent ----|
- * |=======================|
- *
- * The third case is when the requested region lies between two extents
- * within the same cluster:
- * |------------- cluster # N-------------|
- * |----- ex -----| |---- ex_right ----|
- * |------ requested region ------|
- * |================|
- *
- * In each of the above cases, we need to set the map->m_pblk and
- * map->m_len so it corresponds to the return the extent labelled as
- * "|====|" from cluster #N, since it is already in use for data in
- * cluster EXT4_B2C(sbi, map->m_lblk). We will then return 1 to
- * signal to ext4_ext_map_blocks() that map->m_pblk should be treated
- * as a new "allocated" block region. Otherwise, we will return 0 and
- * ext4_ext_map_blocks() will then allocate one or more new clusters
- * by calling ext4_mb_new_blocks().
- */
-static int get_implied_cluster_alloc(struct super_block *sb,
- struct ext4_map_blocks *map,
- struct ext4_extent *ex,
- struct ext4_ext_path *path)
-{
- struct ext4_sb_info *sbi = EXT4_SB(sb);
- ext4_lblk_t c_offset = map->m_lblk & (sbi->s_cluster_ratio-1);
- ext4_lblk_t ex_cluster_start, ex_cluster_end;
- ext4_lblk_t rr_cluster_start, rr_cluster_end;
- ext4_lblk_t ee_block = le32_to_cpu(ex->ee_block);
- ext4_fsblk_t ee_start = ext4_ext_pblock(ex);
- unsigned int ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
-
- /* The extent passed in that we are trying to match */
- ex_cluster_start = EXT4_B2C(sbi, ee_block);
- ex_cluster_end = EXT4_B2C(sbi, ee_block + ee_len - 1);
-
- /* The requested region passed into ext4_map_blocks() */
- rr_cluster_start = EXT4_B2C(sbi, map->m_lblk);
- rr_cluster_end = EXT4_B2C(sbi, map->m_lblk + map->m_len - 1);
-
- if ((rr_cluster_start == ex_cluster_end) ||
- (rr_cluster_start == ex_cluster_start)) {
- if (rr_cluster_start == ex_cluster_end)
- ee_start += ee_len - 1;
- map->m_pblk = (ee_start & ~(sbi->s_cluster_ratio - 1)) +
- c_offset;
- map->m_len = min(map->m_len,
- (unsigned) sbi->s_cluster_ratio - c_offset);
- /*
- * Check for and handle this case:
- *
- * |--------- cluster # N-------------|
- * |------- extent ----|
- * |--- requested region ---|
- * |===========|
- */
-
- if (map->m_lblk < ee_block)
- map->m_len = min(map->m_len, ee_block - map->m_lblk);
-
- /*
- * Check for the case where there is already another allocated
- * block to the right of 'ex' but before the end of the cluster.
- *
- * |------------- cluster # N-------------|
- * |----- ex -----| |---- ex_right ----|
- * |------ requested region ------|
- * |================|
- */
- if (map->m_lblk > ee_block) {
- ext4_lblk_t next = ext4_ext_next_allocated_block(path);
- map->m_len = min(map->m_len, next - map->m_lblk);
- }
-
- trace_ext4_get_implied_cluster_alloc_exit(sb, map, 1);
- return 1;
- }
-
- trace_ext4_get_implied_cluster_alloc_exit(sb, map, 0);
- return 0;
-}
-
-
-/*
* Block allocation/map/preallocation routine for extents based files
*
*
@@ -3755,7 +3650,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
unsigned int result = 0;
struct ext4_allocation_request ar;
ext4_io_end_t *io = EXT4_I(inode)->cur_aio_dio;
- ext4_lblk_t cluster_offset;
struct ext4_map_blocks punch_map;

ext_debug("blocks %u/%u requested for inode %lu\n",
@@ -3963,7 +3857,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
*/
map->m_flags &= ~EXT4_MAP_FROM_CLUSTER;
newex.ee_block = cpu_to_le32(map->m_lblk & ~(sbi->s_cluster_ratio-1));
- cluster_offset = map->m_lblk & (sbi->s_cluster_ratio-1);

if (ex)
BUG_ON((le32_to_cpu(ex->ee_block) +
@@ -4033,7 +3926,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
free_on_err = 1;
allocated_clusters = ar.len;

-got_allocated_blocks:
/* try to insert new extent into found leaf and return */
ext4_ext_store_pblock(&newex, newblock);
newex.ee_len = cpu_to_le16(allocated_clusters);
--
1.7.3.2


2011-11-01 10:53:50

by Robin Dong

Subject: [PATCH 2/8 bigalloc] ext4: change ext4_ext_map_blocks to allocate clusters instead of blocks

From: Robin Dong <[email protected]>

An allocation must be aligned to a cluster boundary even when the caller asks
for just one block, so ext4_ext_map_blocks() now allocates whole clusters.

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/extents.c | 47 +++++++++++++----------------------------------
1 files changed, 13 insertions(+), 34 deletions(-)
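
A note on the alignment arithmetic, as a sketch only. The struct and function
names here are hypothetical; ratio is assumed to be a power of two, as
s_cluster_ratio is. The start and offset masks match the ones used in the diff
below. The cluster count in this sketch also includes the offset of the request
inside its first cluster, which is an assumption about what full coverage
requires; the patch itself computes EXT4_NUM_B2C(sbi, map->m_len).

```c
#include <assert.h>
#include <stdint.h>

struct cluster_request {
	uint32_t start;    /* first logical block, rounded down to a cluster boundary */
	uint32_t offset;   /* position of the requested block inside its cluster */
	uint32_t clusters; /* clusters needed to cover offset + len blocks */
};

/* Align a request of len blocks starting at lblk to whole clusters. */
static struct cluster_request align_to_cluster(uint32_t lblk, uint32_t len,
					       uint32_t ratio)
{
	struct cluster_request r;

	assert(ratio != 0 && (ratio & (ratio - 1)) == 0); /* power of two */
	assert(len != 0);

	r.start = lblk & ~(ratio - 1);
	r.offset = lblk & (ratio - 1);
	r.clusters = (r.offset + len + ratio - 1) / ratio;
	return r;
}
```

For example, with 16 blocks per cluster a one-block request at logical block 21
becomes a one-cluster extent starting at block 16, and the caller's block sits
at offset 5 within it. That offset is what the last hunk adds back when it sets
map->m_pblk = newblock + offset.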

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 50f208e..c96f64f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3962,20 +3962,13 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
* Okay, we need to do block allocation.
*/
map->m_flags &= ~EXT4_MAP_FROM_CLUSTER;
- newex.ee_block = cpu_to_le32(map->m_lblk);
+ newex.ee_block = cpu_to_le32(map->m_lblk & ~(sbi->s_cluster_ratio-1));
cluster_offset = map->m_lblk & (sbi->s_cluster_ratio-1);

- /*
- * If we are doing bigalloc, check to see if the extent returned
- * by ext4_ext_find_extent() implies a cluster we can use.
- */
- if (cluster_offset && ex &&
- get_implied_cluster_alloc(inode->i_sb, map, ex, path)) {
- ar.len = allocated = map->m_len;
- newblock = map->m_pblk;
- map->m_flags |= EXT4_MAP_FROM_CLUSTER;
- goto got_allocated_blocks;
- }
+ if (ex)
+ BUG_ON((le32_to_cpu(ex->ee_block) +
+ EXT4_C2B(sbi, ex->ee_len)) >
+ (map->m_lblk & ~(sbi->s_cluster_ratio-1)));

/* find neighbour allocated blocks */
ar.lleft = map->m_lblk;
@@ -3988,16 +3981,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
if (err)
goto out2;

- /* Check if the extent after searching to the right implies a
- * cluster we can use. */
- if ((sbi->s_cluster_ratio > 1) && ex2 &&
- get_implied_cluster_alloc(inode->i_sb, map, ex2, path)) {
- ar.len = allocated = map->m_len;
- newblock = map->m_pblk;
- map->m_flags |= EXT4_MAP_FROM_CLUSTER;
- goto got_allocated_blocks;
- }
-
/*
* See if request is beyond maximum number of blocks we can have in
* a single extent. For an initialized extent this limit is
@@ -4012,10 +3995,10 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
map->m_len = EXT_UNINIT_MAX_LEN;

/* Check if we can really insert (m_lblk)::(m_lblk + m_len) extent */
- newex.ee_len = cpu_to_le16(map->m_len);
+ newex.ee_len = cpu_to_le16(EXT4_NUM_B2C(sbi, map->m_len));
err = ext4_ext_check_overlap(sbi, inode, &newex, path);
if (err)
- allocated = EXT4_C2B(sbi, ext4_ext_get_actual_len(&newex));
+ allocated = ext4_ext_get_actual_len(&newex);
else
allocated = map->m_len;

@@ -4049,14 +4032,11 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
ar.goal, newblock, allocated);
free_on_err = 1;
allocated_clusters = ar.len;
- ar.len = EXT4_C2B(sbi, ar.len) - offset;
- if (ar.len > allocated)
- ar.len = allocated;

got_allocated_blocks:
/* try to insert new extent into found leaf and return */
- ext4_ext_store_pblock(&newex, newblock + offset);
- newex.ee_len = cpu_to_le16(ar.len);
+ ext4_ext_store_pblock(&newex, newblock);
+ newex.ee_len = cpu_to_le16(allocated_clusters);
/* Mark uninitialized */
if (flags & EXT4_GET_BLOCKS_UNINIT_EXT){
ext4_ext_mark_uninitialized(&newex);
@@ -4079,7 +4059,8 @@ got_allocated_blocks:
map->m_flags |= EXT4_MAP_UNINIT;
}

- err = check_eofblocks_fl(handle, inode, map->m_lblk, path, ar.len);
+ err = check_eofblocks_fl(handle, inode, map->m_lblk, path,
+ EXT4_C2B(sbi, allocated_clusters));
if (!err)
err = ext4_ext_insert_extent(handle, inode, path,
&newex, flags);
@@ -4099,8 +4080,6 @@ got_allocated_blocks:
/* previous routine could use block we allocated */
newblock = ext4_ext_pblock(&newex);
allocated = EXT4_C2B(sbi, ext4_ext_get_actual_len(&newex));
- if (allocated > map->m_len)
- allocated = map->m_len;
map->m_flags |= EXT4_MAP_NEW;

/*
@@ -4187,7 +4166,7 @@ got_allocated_blocks:
* when it is _not_ an uninitialized extent.
*/
if ((flags & EXT4_GET_BLOCKS_UNINIT_EXT) == 0) {
- ext4_ext_put_in_cache(inode, map->m_lblk, allocated, newblock);
+ ext4_ext_put_in_cache(inode, ar.logical, allocated, newblock);
ext4_update_inode_fsync_trans(handle, inode, 1);
} else
ext4_update_inode_fsync_trans(handle, inode, 0);
@@ -4196,7 +4175,7 @@ out:
allocated = map->m_len;
ext4_ext_show_leaf(inode, path);
map->m_flags |= EXT4_MAP_MAPPED;
- map->m_pblk = newblock;
+ map->m_pblk = newblock + offset;
map->m_len = allocated;
out2:
if (path) {
--
1.7.3.2


2011-11-01 10:53:48

by Robin Dong

Subject: [PATCH 1/8 bigalloc] ext4: get blocks from ext4_ext_get_actual_len

From: Robin Dong <[email protected]>

Since the unit of ee_len changes to clusters, callers of
ext4_ext_get_actual_len() need to convert its result from clusters to blocks.

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/ext4_extents.h | 2 +-
fs/ext4/extents.c | 164 ++++++++++++++++++++++++++++--------------------
2 files changed, 97 insertions(+), 69 deletions(-)
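
To make the pattern repeated throughout this diff concrete, here is a small
standalone sketch. The names are hypothetical; INIT_MAX_LEN mirrors
EXT_INIT_MAX_LEN (1 << 15), and raw_len is assumed to be ee_len already
converted to CPU byte order. A raw length above INIT_MAX_LEN marks an
uninitialized extent, so the marker is subtracted before the cluster count is
scaled to blocks.

```c
#include <assert.h>
#include <stdint.h>

#define INIT_MAX_LEN (1u << 15) /* mirrors EXT_INIT_MAX_LEN */

/* Length of the extent in clusters, with the uninitialized marker removed. */
static uint32_t actual_len_clusters(uint16_t raw_len)
{
	return raw_len <= INIT_MAX_LEN ? raw_len : raw_len - INIT_MAX_LEN;
}

/* First logical block past the extent: ee_block + C2B(actual length). */
static uint32_t extent_end_block(uint32_t ee_block, uint16_t raw_len,
				 unsigned cluster_bits)
{
	assert(cluster_bits < 16);
	return ee_block + (actual_len_clusters(raw_len) << cluster_bits);
}
```

With 16 blocks per cluster, an initialized extent of the maximum length
(32768 clusters) spans 524288 blocks, compared with 32768 blocks when ee_len
counts blocks.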

diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index a52db3a..eb590fb 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -71,7 +71,7 @@
*/
struct ext4_extent {
__le32 ee_block; /* first logical block extent covers */
- __le16 ee_len; /* number of blocks covered by extent */
+ __le16 ee_len; /* number of clusters covered by extent */
__le16 ee_start_hi; /* high 16 bits of physical block */
__le32 ee_start_lo; /* low 32 bits of physical block */
};
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4c38262..50f208e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -303,8 +303,9 @@ ext4_ext_max_entries(struct inode *inode, int depth)

static int ext4_valid_extent(struct inode *inode, struct ext4_extent *ext)
{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
ext4_fsblk_t block = ext4_ext_pblock(ext);
- int len = ext4_ext_get_actual_len(ext);
+ int len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ext));

return ext4_data_block_valid(EXT4_SB(inode->i_sb), block, len);
}
@@ -406,6 +407,7 @@ int ext4_ext_check_inode(struct inode *inode)
#ifdef EXT_DEBUG
static void ext4_ext_show_path(struct inode *inode, struct ext4_ext_path *path)
{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int k, l = path->p_depth;

ext_debug("path:");
@@ -415,10 +417,11 @@ static void ext4_ext_show_path(struct inode *inode, struct ext4_ext_path *path)
ext4_idx_pblock(path->p_idx));
} else if (path->p_ext) {
ext_debug(" %d:[%d]%d:%llu ",
- le32_to_cpu(path->p_ext->ee_block),
- ext4_ext_is_uninitialized(path->p_ext),
- ext4_ext_get_actual_len(path->p_ext),
- ext4_ext_pblock(path->p_ext));
+ le32_to_cpu(path->p_ext->ee_block),
+ ext4_ext_is_uninitialized(path->p_ext),
+ EXT4_C2B(sbi,
+ ext4_ext_get_actual_len(path->p_ext)),
+ ext4_ext_pblock(path->p_ext));
} else
ext_debug(" []");
}
@@ -430,6 +433,7 @@ static void ext4_ext_show_leaf(struct inode *inode, struct ext4_ext_path *path)
int depth = ext_depth(inode);
struct ext4_extent_header *eh;
struct ext4_extent *ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int i;

if (!path)
@@ -443,7 +447,8 @@ static void ext4_ext_show_leaf(struct inode *inode, struct ext4_ext_path *path)
for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ex++) {
ext_debug("%d:[%d]%d:%llu ", le32_to_cpu(ex->ee_block),
ext4_ext_is_uninitialized(ex),
- ext4_ext_get_actual_len(ex), ext4_ext_pblock(ex));
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(ex)),
+ ext4_ext_pblock(ex));
}
ext_debug("\n");
}
@@ -453,6 +458,7 @@ static void ext4_ext_show_move(struct inode *inode, struct ext4_ext_path *path,
{
int depth = ext_depth(inode);
struct ext4_extent *ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);

if (depth != level) {
struct ext4_extent_idx *idx;
@@ -474,7 +480,7 @@ static void ext4_ext_show_move(struct inode *inode, struct ext4_ext_path *path,
le32_to_cpu(ex->ee_block),
ext4_ext_pblock(ex),
ext4_ext_is_uninitialized(ex),
- ext4_ext_get_actual_len(ex),
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(ex)),
newblock);
ex++;
}
@@ -599,7 +605,8 @@ ext4_ext_binsearch(struct inode *inode,
le32_to_cpu(path->p_ext->ee_block),
ext4_ext_pblock(path->p_ext),
ext4_ext_is_uninitialized(path->p_ext),
- ext4_ext_get_actual_len(path->p_ext));
+ EXT4_C2B(EXT4_SB(inode->i_sb),
+ ext4_ext_get_actual_len(path->p_ext)));

#ifdef CHECK_BINSEARCH
{
@@ -1205,6 +1212,7 @@ static int ext4_ext_search_left(struct inode *inode,
{
struct ext4_extent_idx *ix;
struct ext4_extent *ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int depth, ee_len;

if (unlikely(path == NULL)) {
@@ -1222,7 +1230,7 @@ static int ext4_ext_search_left(struct inode *inode,
* first one in the file */

ex = path[depth].p_ext;
- ee_len = ext4_ext_get_actual_len(ex);
+ ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
if (*logical < le32_to_cpu(ex->ee_block)) {
if (unlikely(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex)) {
EXT4_ERROR_INODE(inode,
@@ -1273,6 +1281,7 @@ static int ext4_ext_search_right(struct inode *inode,
struct ext4_extent_header *eh;
struct ext4_extent_idx *ix;
struct ext4_extent *ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
ext4_fsblk_t block;
int depth; /* Note, NOT eh_depth; depth from top of tree */
int ee_len;
@@ -1292,7 +1301,7 @@ static int ext4_ext_search_right(struct inode *inode,
* first one in the file */

ex = path[depth].p_ext;
- ee_len = ext4_ext_get_actual_len(ex);
+ ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
if (*logical < le32_to_cpu(ex->ee_block)) {
if (unlikely(EXT_FIRST_EXTENT(path[depth].p_hdr) != ex)) {
EXT4_ERROR_INODE(inode,
@@ -1506,7 +1515,9 @@ int
ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
struct ext4_extent *ex2)
{
- unsigned short ext1_ee_len, ext2_ee_len, max_len;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ /* unit: cluster */
+ unsigned int ext1_ee_len, ext2_ee_len, max_len;

/*
* Make sure that either both extents are uninitialized, or
@@ -1523,7 +1534,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
ext1_ee_len = ext4_ext_get_actual_len(ex1);
ext2_ee_len = ext4_ext_get_actual_len(ex2);

- if (le32_to_cpu(ex1->ee_block) + ext1_ee_len !=
+ if (le32_to_cpu(ex1->ee_block) + EXT4_C2B(sbi, ext1_ee_len) !=
le32_to_cpu(ex2->ee_block))
return 0;

@@ -1539,7 +1550,8 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
return 0;
#endif

- if (ext4_ext_pblock(ex1) + ext1_ee_len == ext4_ext_pblock(ex2))
+ if (ext4_ext_pblock(ex1) + EXT4_C2B(sbi, ext1_ee_len) ==
+ ext4_ext_pblock(ex2))
return 1;
return 0;
}
@@ -1633,7 +1645,7 @@ static unsigned int ext4_ext_check_overlap(struct ext4_sb_info *sbi,
unsigned int ret = 0;

b1 = le32_to_cpu(newext->ee_block);
- len1 = ext4_ext_get_actual_len(newext);
+ len1 = EXT4_C2B(sbi, ext4_ext_get_actual_len(newext));
depth = ext_depth(inode);
if (!path[depth].p_ext)
goto out;
@@ -1654,13 +1666,13 @@ static unsigned int ext4_ext_check_overlap(struct ext4_sb_info *sbi,
/* check for wrap through zero on extent logical start block*/
if (b1 + len1 < b1) {
len1 = EXT_MAX_BLOCKS - b1;
- newext->ee_len = cpu_to_le16(len1);
+ newext->ee_len = cpu_to_le16(EXT4_B2C(sbi, len1));
ret = 1;
}

/* check for overlap */
if (b1 + len1 > b2) {
- newext->ee_len = cpu_to_le16(b2 - b1);
+ newext->ee_len = cpu_to_le16(EXT4_B2C(sbi, b2 - b1));
ret = 1;
}
out:
@@ -1701,12 +1713,14 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
if (ex && !(flag & EXT4_GET_BLOCKS_PRE_IO)
&& ext4_can_extents_be_merged(inode, ex, newext)) {
ext_debug("append [%d]%d block to %d:[%d]%d (from %llu)\n",
- ext4_ext_is_uninitialized(newext),
- ext4_ext_get_actual_len(newext),
- le32_to_cpu(ex->ee_block),
- ext4_ext_is_uninitialized(ex),
- ext4_ext_get_actual_len(ex),
- ext4_ext_pblock(ex));
+ ext4_ext_is_uninitialized(newext),
+ EXT4_C2B(EXT4_SB(inode->i_sb),
+ ext4_ext_get_actual_len(newext)),
+ le32_to_cpu(ex->ee_block),
+ ext4_ext_is_uninitialized(ex),
+ EXT4_C2B(EXT4_SB(inode->i_sb),
+ ext4_ext_get_actual_len(ex)),
+ ext4_ext_pblock(ex));
err = ext4_ext_get_access(handle, inode, path + depth);
if (err)
return err;
@@ -1780,7 +1794,7 @@ has_space:
le32_to_cpu(newext->ee_block),
ext4_ext_pblock(newext),
ext4_ext_is_uninitialized(newext),
- ext4_ext_get_actual_len(newext));
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(newext)));
path[depth].p_ext = EXT_FIRST_EXTENT(eh);
} else if (le32_to_cpu(newext->ee_block)
> le32_to_cpu(nearex->ee_block)) {
@@ -1791,11 +1805,11 @@ has_space:
len = len < 0 ? 0 : len;
ext_debug("insert %d:%llu:[%d]%d after: nearest 0x%p, "
"move %d from 0x%p to 0x%p\n",
- le32_to_cpu(newext->ee_block),
- ext4_ext_pblock(newext),
- ext4_ext_is_uninitialized(newext),
- ext4_ext_get_actual_len(newext),
- nearex, len, nearex + 1, nearex + 2);
+ le32_to_cpu(newext->ee_block),
+ ext4_ext_pblock(newext),
+ ext4_ext_is_uninitialized(newext),
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(newext)),
+ nearex, len, nearex + 1, nearex + 2);
memmove(nearex + 2, nearex + 1, len);
}
path[depth].p_ext = nearex + 1;
@@ -1808,7 +1822,7 @@ has_space:
le32_to_cpu(newext->ee_block),
ext4_ext_pblock(newext),
ext4_ext_is_uninitialized(newext),
- ext4_ext_get_actual_len(newext),
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(newext)),
nearex, len, nearex, nearex + 1);
memmove(nearex + 1, nearex, len);
path[depth].p_ext = nearex;
@@ -1850,6 +1864,7 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
struct ext4_ext_path *path = NULL;
struct ext4_ext_cache cbex;
struct ext4_extent *ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
ext4_lblk_t next, start = 0, end = 0;
ext4_lblk_t last = block + num;
int depth, exists, err = 0;
@@ -1891,7 +1906,7 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
if (block + num < end)
end = block + num;
} else if (block >= le32_to_cpu(ex->ee_block)
- + ext4_ext_get_actual_len(ex)) {
+ + EXT4_C2B(sbi, ext4_ext_get_actual_len(ex))) {
/* need to allocate space after found extent */
start = block;
end = block + num;
@@ -1904,7 +1919,7 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
*/
start = block;
end = le32_to_cpu(ex->ee_block)
- + ext4_ext_get_actual_len(ex);
+ + EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
if (block + num < end)
end = block + num;
exists = 1;
@@ -1915,7 +1930,7 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,

if (!exists) {
cbex.ec_block = start;
- cbex.ec_len = end - start;
+ cbex.ec_len = EXT4_B2C(sbi, end - start);
cbex.ec_start = 0;
} else {
cbex.ec_block = le32_to_cpu(ex->ee_block);
@@ -1947,7 +1962,7 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
path = NULL;
}

- block = cbex.ec_block + cbex.ec_len;
+ block = cbex.ec_block + EXT4_C2B(sbi, cbex.ec_len);
}

if (path) {
@@ -1963,12 +1978,13 @@ ext4_ext_put_in_cache(struct inode *inode, ext4_lblk_t block,
__u32 len, ext4_fsblk_t start)
{
struct ext4_ext_cache *cex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
BUG_ON(len == 0);
spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
trace_ext4_ext_put_in_cache(inode, block, len, start);
cex = &EXT4_I(inode)->i_cached_extent;
cex->ec_block = block;
- cex->ec_len = len;
+ cex->ec_len = EXT4_B2C(sbi, len);
cex->ec_start = start;
spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
}
@@ -1986,6 +2002,7 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
unsigned long len;
ext4_lblk_t lblock;
struct ext4_extent *ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);

ex = path[depth].p_ext;
if (ex == NULL) {
@@ -1999,17 +2016,17 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
ext_debug("cache gap(before): %u [%u:%u]",
block,
le32_to_cpu(ex->ee_block),
- ext4_ext_get_actual_len(ex));
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(ex)));
} else if (block >= le32_to_cpu(ex->ee_block)
- + ext4_ext_get_actual_len(ex)) {
+ + EXT4_C2B(sbi, ext4_ext_get_actual_len(ex))) {
ext4_lblk_t next;
lblock = le32_to_cpu(ex->ee_block)
- + ext4_ext_get_actual_len(ex);
+ + EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));

next = ext4_ext_next_allocated_block(path);
ext_debug("cache gap(after): [%u:%u] %u",
le32_to_cpu(ex->ee_block),
- ext4_ext_get_actual_len(ex),
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(ex)),
block);
BUG_ON(next == lblock);
len = next - lblock;
@@ -2055,11 +2072,12 @@ static int ext4_ext_check_cache(struct inode *inode, ext4_lblk_t block,
if (cex->ec_len == 0)
goto errout;

- if (in_range(block, cex->ec_block, cex->ec_len)) {
+ if (in_range(block, cex->ec_block, EXT4_C2B(sbi, cex->ec_len))) {
memcpy(ex, cex, sizeof(struct ext4_ext_cache));
ext_debug("%u cached by %u:%u:%llu\n",
block,
- cex->ec_block, cex->ec_len, cex->ec_start);
+ cex->ec_block, EXT4_C2B(sbi, cex->ec_len),
+ cex->ec_start);
ret = 1;
}
errout:
@@ -2207,7 +2225,7 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
ext4_lblk_t from, ext4_lblk_t to)
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
- unsigned short ee_len = ext4_ext_get_actual_len(ex);
+ unsigned int ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
ext4_fsblk_t pblk;
int flags = EXT4_FREE_BLOCKS_FORGET;

@@ -2319,7 +2337,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
ext4_lblk_t a, b, block;
unsigned num;
ext4_lblk_t ex_ee_block;
- unsigned short ex_ee_len;
+ unsigned int ex_ee_len;
unsigned uninitialized = 0;
struct ext4_extent *ex;
struct ext4_map_blocks map;
@@ -2337,7 +2355,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
ex = EXT_LAST_EXTENT(eh);

ex_ee_block = le32_to_cpu(ex->ee_block);
- ex_ee_len = ext4_ext_get_actual_len(ex);
+ ex_ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));

trace_ext4_ext_rm_leaf(inode, start, ex_ee_block, ext4_ext_pblock(ex),
ex_ee_len, *partial_cluster);
@@ -2364,7 +2382,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
if (end <= ex_ee_block) {
ex--;
ex_ee_block = le32_to_cpu(ex->ee_block);
- ex_ee_len = ext4_ext_get_actual_len(ex);
+ ex_ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
continue;
} else if (a != ex_ee_block &&
b != ex_ee_block + ex_ee_len - 1) {
@@ -2399,7 +2417,8 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
if (err < 0)
goto out;

- ex_ee_len = ext4_ext_get_actual_len(ex);
+ ex_ee_len = EXT4_C2B(sbi,
+ ext4_ext_get_actual_len(ex));

b = ex_ee_block+ex_ee_len - 1 < end ?
ex_ee_block+ex_ee_len - 1 : end;
@@ -2485,7 +2504,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
}

ex->ee_block = cpu_to_le32(block);
- ex->ee_len = cpu_to_le16(num);
+ ex->ee_len = cpu_to_le16(EXT4_B2C(sbi, num));
/*
* Do not mark uninitialized if all the blocks in the
* extent have been removed.
@@ -2523,7 +2542,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
ext4_ext_pblock(ex));
ex--;
ex_ee_block = le32_to_cpu(ex->ee_block);
- ex_ee_len = ext4_ext_get_actual_len(ex);
+ ex_ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
}

if (correct_index && eh->eh_entries)
@@ -2789,11 +2808,12 @@ void ext4_ext_release(struct super_block *sb)
/* FIXME!! we need to try to merge to left or right after zero-out */
static int ext4_ext_zeroout(struct inode *inode, struct ext4_extent *ex)
{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
ext4_fsblk_t ee_pblock;
unsigned int ee_len;
int ret;

- ee_len = ext4_ext_get_actual_len(ex);
+ ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
ee_pblock = ext4_ext_pblock(ex);

ret = sb_issue_zeroout(inode->i_sb, ee_pblock, ee_len, GFP_NOFS);
@@ -2843,6 +2863,7 @@ static int ext4_split_extent_at(handle_t *handle,
ext4_lblk_t ee_block;
struct ext4_extent *ex, newex, orig_ex;
struct ext4_extent *ex2 = NULL;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
unsigned int ee_len, depth;
int err = 0;

@@ -2854,7 +2875,7 @@ static int ext4_split_extent_at(handle_t *handle,
depth = ext_depth(inode);
ex = path[depth].p_ext;
ee_block = le32_to_cpu(ex->ee_block);
- ee_len = ext4_ext_get_actual_len(ex);
+ ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
newblock = split - ee_block + ext4_ext_pblock(ex);

BUG_ON(split < ee_block || split >= (ee_block + ee_len));
@@ -2883,7 +2904,7 @@ static int ext4_split_extent_at(handle_t *handle,

/* case a */
memcpy(&orig_ex, ex, sizeof(orig_ex));
- ex->ee_len = cpu_to_le16(split - ee_block);
+ ex->ee_len = cpu_to_le16(EXT4_B2C(sbi, split - ee_block));
if (split_flag & EXT4_EXT_MARK_UNINIT1)
ext4_ext_mark_uninitialized(ex);

@@ -2897,7 +2918,7 @@ static int ext4_split_extent_at(handle_t *handle,

ex2 = &newex;
ex2->ee_block = cpu_to_le32(split);
- ex2->ee_len = cpu_to_le16(ee_len - (split - ee_block));
+ ex2->ee_len = cpu_to_le16(EXT4_B2C(sbi, ee_len - (split - ee_block)));
ext4_ext_store_pblock(ex2, newblock);
if (split_flag & EXT4_EXT_MARK_UNINIT2)
ext4_ext_mark_uninitialized(ex2);
@@ -2908,7 +2929,7 @@ static int ext4_split_extent_at(handle_t *handle,
if (err)
goto fix_extent_len;
/* update the extent length and mark as initialized */
- ex->ee_len = cpu_to_le32(ee_len);
+ ex->ee_len = cpu_to_le16(EXT4_B2C(sbi, ee_len));
ext4_ext_try_to_merge(inode, path, ex);
err = ext4_ext_dirty(handle, inode, path + depth);
goto out;
@@ -2945,6 +2966,7 @@ static int ext4_split_extent(handle_t *handle,
{
ext4_lblk_t ee_block;
struct ext4_extent *ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
unsigned int ee_len, depth;
int err = 0;
int uninitialized;
@@ -2953,7 +2975,7 @@ static int ext4_split_extent(handle_t *handle,
depth = ext_depth(inode);
ex = path[depth].p_ext;
ee_block = le32_to_cpu(ex->ee_block);
- ee_len = ext4_ext_get_actual_len(ex);
+ ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
uninitialized = ext4_ext_is_uninitialized(ex);

if (map->m_lblk + map->m_len < ee_block + ee_len) {
@@ -3011,6 +3033,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
struct ext4_map_blocks split_map;
struct ext4_extent zero_ex;
struct ext4_extent *ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
ext4_lblk_t ee_block, eof_block;
unsigned int allocated, ee_len, depth;
int err = 0;
@@ -3028,7 +3051,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
depth = ext_depth(inode);
ex = path[depth].p_ext;
ee_block = le32_to_cpu(ex->ee_block);
- ee_len = ext4_ext_get_actual_len(ex);
+ ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));
allocated = ee_len - (map->m_lblk - ee_block);

WARN_ON(map->m_lblk < ee_block);
@@ -3070,7 +3093,7 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
/* case 3 */
zero_ex.ee_block =
cpu_to_le32(map->m_lblk);
- zero_ex.ee_len = cpu_to_le16(allocated);
+ zero_ex.ee_len = cpu_to_le16(EXT4_B2C(sbi, allocated));
ext4_ext_store_pblock(&zero_ex,
ext4_ext_pblock(ex) + map->m_lblk - ee_block);
err = ext4_ext_zeroout(inode, &zero_ex);
@@ -3084,8 +3107,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
/* case 2 */
if (map->m_lblk != ee_block) {
zero_ex.ee_block = ex->ee_block;
- zero_ex.ee_len = cpu_to_le16(map->m_lblk -
- ee_block);
+ zero_ex.ee_len = cpu_to_le16(EXT4_B2C(sbi,
+ map->m_lblk - ee_block));
ext4_ext_store_pblock(&zero_ex,
ext4_ext_pblock(ex));
err = ext4_ext_zeroout(inode, &zero_ex);
@@ -3139,6 +3162,7 @@ static int ext4_split_unwritten_extents(handle_t *handle,
ext4_lblk_t eof_block;
ext4_lblk_t ee_block;
struct ext4_extent *ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
unsigned int ee_len;
int split_flag = 0, depth;

@@ -3157,7 +3181,7 @@ static int ext4_split_unwritten_extents(handle_t *handle,
depth = ext_depth(inode);
ex = path[depth].p_ext;
ee_block = le32_to_cpu(ex->ee_block);
- ee_len = ext4_ext_get_actual_len(ex);
+ ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));

split_flag |= ee_block + ee_len <= eof_block ? EXT4_EXT_MAY_ZEROOUT : 0;
split_flag |= EXT4_EXT_MARK_UNINIT2;
@@ -3180,7 +3204,7 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
"block %llu, max_blocks %u\n", inode->i_ino,
(unsigned long long)le32_to_cpu(ex->ee_block),
- ext4_ext_get_actual_len(ex));
+ EXT4_C2B(EXT4_SB(inode->i_sb), ext4_ext_get_actual_len(ex)));

err = ext4_ext_get_access(handle, inode, path + depth);
if (err)
@@ -3219,6 +3243,7 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode,
int i, depth;
struct ext4_extent_header *eh;
struct ext4_extent *last_ex;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);

if (!ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))
return 0;
@@ -3242,7 +3267,7 @@ static int check_eofblocks_fl(handle_t *handle, struct inode *inode,
* function immediately.
*/
if (lblk + len < le32_to_cpu(last_ex->ee_block) +
- ext4_ext_get_actual_len(last_ex))
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(last_ex)))
return 0;
/*
* If the caller does appear to be planning to write at or
@@ -3645,7 +3670,7 @@ static int get_implied_cluster_alloc(struct super_block *sb,
ext4_lblk_t rr_cluster_start, rr_cluster_end;
ext4_lblk_t ee_block = le32_to_cpu(ex->ee_block);
ext4_fsblk_t ee_start = ext4_ext_pblock(ex);
- unsigned short ee_len = ext4_ext_get_actual_len(ex);
+ unsigned int ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));

/* The extent passed in that we are trying to match */
ex_cluster_start = EXT4_B2C(sbi, ee_block);
@@ -3761,7 +3786,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
- le32_to_cpu(newex.ee_block)
+ ext4_ext_pblock(&newex);
/* number of remaining blocks in the extent */
- allocated = ext4_ext_get_actual_len(&newex) -
+ allocated =
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(&newex)) -
(map->m_lblk - le32_to_cpu(newex.ee_block));
goto out;
}
@@ -3796,13 +3822,13 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
ext4_lblk_t ee_block = le32_to_cpu(ex->ee_block);
ext4_fsblk_t ee_start = ext4_ext_pblock(ex);
ext4_fsblk_t partial_cluster = 0;
- unsigned short ee_len;
+ unsigned int ee_len;

/*
* Uninitialized extents are treated as holes, except that
* we split out initialized portions during a write.
*/
- ee_len = ext4_ext_get_actual_len(ex);
+ ee_len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ex));

trace_ext4_ext_show_extent(inode, ee_block, ee_start, ee_len);

@@ -3880,7 +3906,8 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,

depth = ext_depth(inode);
ex = path[depth].p_ext;
- ee_len = ext4_ext_get_actual_len(ex);
+ ee_len = EXT4_C2B(sbi,
+ ext4_ext_get_actual_len(ex));
ee_block = le32_to_cpu(ex->ee_block);
ee_start = ext4_ext_pblock(ex);

@@ -3988,7 +4015,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
newex.ee_len = cpu_to_le16(map->m_len);
err = ext4_ext_check_overlap(sbi, inode, &newex, path);
if (err)
- allocated = ext4_ext_get_actual_len(&newex);
+ allocated = EXT4_C2B(sbi, ext4_ext_get_actual_len(&newex));
else
allocated = map->m_len;

@@ -4064,13 +4091,14 @@ got_allocated_blocks:
* but otherwise we'd need to call it every free() */
ext4_discard_preallocations(inode);
ext4_free_blocks(handle, inode, NULL, ext4_ext_pblock(&newex),
- ext4_ext_get_actual_len(&newex), fb_flags);
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(&newex)),
+ fb_flags);
goto out2;
}

/* previous routine could use block we allocated */
newblock = ext4_ext_pblock(&newex);
- allocated = ext4_ext_get_actual_len(&newex);
+ allocated = EXT4_C2B(sbi, ext4_ext_get_actual_len(&newex));
if (allocated > map->m_len)
allocated = map->m_len;
map->m_flags |= EXT4_MAP_NEW;
--
1.7.3.2


2011-11-01 10:53:53

by Robin Dong

[permalink] [raw]
Subject: [PATCH 4/8 bigalloc] ext4: zeroout extra pages when users write one page

From: Robin Dong <[email protected]>

When a user writes one page that lies in the middle of a cluster, we need to
zero the other pages around it.

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/ext4.h | 18 +++++
fs/ext4/extents.c | 2 +-
fs/ext4/inode.c | 190 +++++++++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 197 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index fba951b..499da1c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -670,6 +670,15 @@ struct move_extent {
#define EXT4_EPOCH_MASK ((1 << EXT4_EPOCH_BITS) - 1)
#define EXT4_NSEC_MASK (~0UL << EXT4_EPOCH_BITS)

+#define EXT4_MAX_CLUSTERSIZE 1048576
+#define EXT4_MAX_CTXT_PAGES (EXT4_MAX_CLUSTERSIZE / PAGE_CACHE_SIZE)
+
+/* tracking cluster write pages */
+struct ext4_write_cluster_ctxt {
+ unsigned long w_num_pages;
+ struct page *w_pages[EXT4_MAX_CTXT_PAGES];
+};
+
/*
* Extended fields will fit into an inode if the filesystem was formatted
* with large inodes (-I 256 or larger) and there are not currently any EAs
@@ -1844,6 +1853,15 @@ extern int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
extern int ext4_trim_fs(struct super_block *, struct fstrim_range *);

/* inode.c */
+int walk_page_buffers(handle_t *handle, struct buffer_head *head,
+ unsigned from, unsigned to, int *partial,
+ int (*fn)(handle_t *handle, struct buffer_head *bh));
+int do_journal_get_write_access(handle_t *handle, struct buffer_head *bh);
+struct ext4_write_cluster_ctxt *ext4_alloc_write_cluster_ctxt(void);
+void ext4_free_write_cluster_ctxt(struct ext4_write_cluster_ctxt *ewcc);
+int ext4_zero_cluster_page(struct inode *inode, int index,
+ struct ext4_write_cluster_ctxt *ewcc, unsigned flags);
+
struct buffer_head *ext4_getblk(handle_t *, struct inode *,
ext4_lblk_t, int, int *);
struct buffer_head *ext4_bread(handle_t *, struct inode *,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index d3866d1..970d6dc 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3860,7 +3860,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,

if (ex)
BUG_ON((le32_to_cpu(ex->ee_block) +
- EXT4_C2B(sbi, ex->ee_len)) >
+ EXT4_C2B(sbi, ext4_ext_get_actual_len(ex))) >
(map->m_lblk & ~(sbi->s_cluster_ratio-1)));

/* find neighbour allocated blocks */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9b83c3c..beec081 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -38,6 +38,7 @@
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/ratelimit.h>
+#include <linux/swap.h>

#include "ext4_jbd2.h"
#include "xattr.h"
@@ -49,6 +50,31 @@

#define MPAGE_DA_EXTENT_TAIL 0x01

+static void ext4_write_cluster_add_page(struct ext4_write_cluster_ctxt *ewcc,
+ struct page *page)
+{
+ ewcc->w_pages[ewcc->w_num_pages] = page;
+ ewcc->w_num_pages++;
+}
+
+struct ext4_write_cluster_ctxt *ext4_alloc_write_cluster_ctxt(void)
+{
+ return kzalloc(sizeof(struct ext4_write_cluster_ctxt), GFP_NOFS);
+}
+
+void ext4_free_write_cluster_ctxt(struct ext4_write_cluster_ctxt *ewcc)
+{
+ int i;
+ for (i = 0; i < ewcc->w_num_pages; i++) {
+ if (ewcc->w_pages[i]) {
+ unlock_page(ewcc->w_pages[i]);
+ mark_page_accessed(ewcc->w_pages[i]);
+ page_cache_release(ewcc->w_pages[i]);
+ }
+ }
+ kfree(ewcc);
+}
+
static inline int ext4_begin_ordered_truncate(struct inode *inode,
loff_t new_size)
{
@@ -656,7 +682,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
return NULL;
}

-static int walk_page_buffers(handle_t *handle,
+int walk_page_buffers(handle_t *handle,
struct buffer_head *head,
unsigned from,
unsigned to,
@@ -712,7 +738,7 @@ static int walk_page_buffers(handle_t *handle,
* is elevated. We'll still have enough credits for the tiny quotafile
* write.
*/
-static int do_journal_get_write_access(handle_t *handle,
+int do_journal_get_write_access(handle_t *handle,
struct buffer_head *bh)
{
int dirty = buffer_dirty(bh);
@@ -738,15 +764,95 @@ static int do_journal_get_write_access(handle_t *handle,

static int ext4_get_block_write(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create);
+
+int ext4_zero_cluster_page(struct inode *inode, int index,
+ struct ext4_write_cluster_ctxt *ewcc, unsigned flags)
+{
+ int ret = 0;
+ struct page *page;
+
+ page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
+ if (!page) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ext4_write_cluster_add_page(ewcc, page);
+
+ /* if page is already uptodate and has buffers, don't get_block again
+ */
+ if (PageUptodate(page) && PagePrivate(page))
+ goto out;
+
+ zero_user_segment(page, 0, PAGE_CACHE_SIZE);
+ SetPageUptodate(page);
+ if (ext4_should_dioread_nolock(inode))
+ ret = __block_write_begin(page, index << PAGE_CACHE_SHIFT,
+ PAGE_CACHE_SIZE, ext4_get_block_write);
+ else
+ ret = __block_write_begin(page, index << PAGE_CACHE_SHIFT,
+ PAGE_CACHE_SIZE, ext4_get_block);
+
+out:
+ return ret;
+}
+
+int ext4_prepare_cluster_left_pages(struct inode *inode, int index,
+ struct ext4_write_cluster_ctxt *ewcc, unsigned flags)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ int ret = 0;
+ int block;
+ sector_t left_offset = index & (sbi->s_cluster_ratio - 1);
+ sector_t begin;
+
+ if (left_offset) {
+ begin = index - left_offset;
+ for (block = begin; block < index; block++) {
+ ret = ext4_zero_cluster_page(inode, block, ewcc, flags);
+ if (ret)
+ goto out;
+ }
+ }
+
+out:
+ return ret;
+}
+
+int ext4_prepare_cluster_right_pages(struct inode *inode, int index,
+ struct ext4_write_cluster_ctxt *ewcc, unsigned flags)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ int ret = 0;
+ int block;
+ sector_t left_offset = index & (sbi->s_cluster_ratio - 1);
+ sector_t right_offset = sbi->s_cluster_ratio - left_offset - 1;
+ sector_t begin;
+
+ if (right_offset) {
+ begin = index + 1;
+ for (block = begin; block < index + right_offset + 1; block++) {
+ ret = ext4_zero_cluster_page(inode, block, ewcc, flags);
+ if (ret)
+ goto out;
+ }
+ }
+
+out:
+ return ret;
+}
+
static int ext4_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
{
struct inode *inode = mapping->host;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
int ret, needed_blocks;
handle_t *handle;
- int retries = 0;
- struct page *page;
+ int retries = 0, uninit = 0;
+ struct page *page = NULL;
+ struct ext4_write_cluster_ctxt *ewcc;
pgoff_t index;
unsigned from, to;

@@ -761,6 +867,12 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
to = from + len;

retry:
+ ewcc = ext4_alloc_write_cluster_ctxt();
+ if (!ewcc) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
handle = ext4_journal_start(inode, needed_blocks);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
@@ -771,27 +883,62 @@ retry:
* started */
flags |= AOP_FLAG_NOFS;

+ if (sbi->s_cluster_ratio > 1) {
+ /* We need to know whether the block is allocated already
+ */
+ struct ext4_map_blocks map;
+ map.m_lblk = index;
+ map.m_len = 1;
+ ret = ext4_map_blocks(handle, inode, &map, 0);
+ uninit = map.m_flags & EXT4_MAP_UNWRITTEN;
+ if (ret <= 0 || uninit) {
+ ret = ext4_prepare_cluster_left_pages(inode, index,
+ ewcc, flags);
+ if (ret)
+ goto err_out;
+ }
+ }
+
page = grab_cache_page_write_begin(mapping, index, flags);
if (!page) {
- ext4_journal_stop(handle);
ret = -ENOMEM;
- goto out;
+ goto err_out;
}
+
*pagep = page;

+ ext4_write_cluster_add_page(ewcc, page);
+
if (ext4_should_dioread_nolock(inode))
ret = __block_write_begin(page, pos, len, ext4_get_block_write);
else
ret = __block_write_begin(page, pos, len, ext4_get_block);

+ if (sbi->s_cluster_ratio > 1 && uninit) {
+ ret = ext4_prepare_cluster_right_pages(inode, index,
+ ewcc, flags);
+ if (ret)
+ goto err_out;
+ }
+
if (!ret && ext4_should_journal_data(inode)) {
- ret = walk_page_buffers(handle, page_buffers(page),
+ int i;
+ unsigned long from, to;
+ for (i = 0; i < ewcc->w_num_pages; i++) {
+ page = ewcc->w_pages[i];
+ if (!page || !page_buffers(page))
+ continue;
+ from = page->index << PAGE_CACHE_SHIFT;
+ to = from + PAGE_CACHE_SIZE;
+ ret = walk_page_buffers(handle, page_buffers(page),
from, to, NULL, do_journal_get_write_access);
+ if (ret)
+ break;
+ }
}

if (ret) {
- unlock_page(page);
- page_cache_release(page);
+ ext4_free_write_cluster_ctxt(ewcc);
/*
* __block_write_begin may have instantiated a few blocks
* outside i_size. Trim these off again. Don't need
@@ -819,8 +966,15 @@ retry:

if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry;
+
+ *fsdata = ewcc;
out:
return ret;
+
+err_out:
+ ext4_free_write_cluster_ctxt(ewcc);
+ ext4_journal_stop(handle);
+ return ret;
}

/* For write_end() in data=journal mode */
@@ -837,11 +991,24 @@ static int ext4_generic_write_end(struct file *file,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
{
- int i_size_changed = 0;
+ int i_size_changed = 0, i;
struct inode *inode = mapping->host;
+ struct ext4_write_cluster_ctxt *ewcc = fsdata;
handle_t *handle = ext4_journal_current_handle();

copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
+ for (i = 0; i < ewcc->w_num_pages; i++) {
+ unsigned long pos;
+ struct page *cluster_page;
+ cluster_page = ewcc->w_pages[i];
+ if (!cluster_page)
+ break;
+ if (cluster_page == page)
+ continue;
+ pos = cluster_page->index << PAGE_CACHE_SHIFT;
+ block_write_end(file, mapping, pos, PAGE_CACHE_SIZE,
+ PAGE_CACHE_SIZE, cluster_page, fsdata);
+ }

/*
* No need to use i_size_read() here, the i_size
@@ -863,8 +1030,7 @@ static int ext4_generic_write_end(struct file *file,
ext4_update_i_disksize(inode, (pos + copied));
i_size_changed = 1;
}
- unlock_page(page);
- page_cache_release(page);
+ ext4_free_write_cluster_ctxt(ewcc);

/*
* Don't mark the inode dirty under page lock. First, it unnecessarily
--
1.7.3.2
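
For readers following the index arithmetic in
ext4_prepare_cluster_left_pages() and ext4_prepare_cluster_right_pages(),
here is a minimal userspace sketch of the same computation. It is not part
of the patch. The struct page_range type and both function names are
hypothetical, and, like the patch, the sketch treats a page index as a block
index (that is, it assumes block size equals page size) and assumes the
cluster ratio is a power of two.

```c
#include <assert.h>

/* Hypothetical helper type: a half-open range [begin, end) of page indices. */
struct page_range {
	unsigned long begin;
	unsigned long end;
};

/* Pages in the same cluster that precede the page being written. */
static struct page_range cluster_left_pages(unsigned long index,
					    unsigned long ratio)
{
	struct page_range r;
	unsigned long left_offset;

	/* The mask trick below only works for a power-of-two ratio. */
	assert(ratio != 0 && (ratio & (ratio - 1)) == 0);

	left_offset = index & (ratio - 1);	/* position inside the cluster */
	r.begin = index - left_offset;		/* first page of the cluster */
	r.end = index;				/* stop before the written page */
	return r;
}

/* Pages in the same cluster that follow the page being written. */
static struct page_range cluster_right_pages(unsigned long index,
					     unsigned long ratio)
{
	struct page_range r;
	unsigned long left_offset, right_offset;

	assert(ratio != 0 && (ratio & (ratio - 1)) == 0);

	left_offset = index & (ratio - 1);
	right_offset = ratio - left_offset - 1;	/* pages left after 'index' */
	r.begin = index + 1;
	r.end = index + right_offset + 1;	/* one past the cluster's last page */
	return r;
}
```

With a ratio of 16, writing page 21 would give the left range [16, 21) and
the right range [22, 32); a write to the first or last page of a cluster
yields an empty range on that side.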


2011-11-01 10:54:00

by Robin Dong

[permalink] [raw]
Subject: [PATCH 5/8 bigalloc] ext4: zero out extra pages when truncate file

From: Robin Dong <[email protected]>

When a file is truncated to a larger size, we need to zero out the pages that
lie beyond the old i_size.

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/ext4.h | 4 +-
fs/ext4/extents.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/ext4/inode.c | 13 ++++----
fs/ext4/ioctl.c | 2 +-
fs/ext4/super.c | 2 +-
fs/ext4/truncate.h | 2 +-
6 files changed, 89 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 499da1c..0f58245 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1881,7 +1881,7 @@ extern void ext4_dirty_inode(struct inode *, int);
extern int ext4_change_inode_journal_flag(struct inode *, int);
extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
extern int ext4_can_truncate(struct inode *inode);
-extern void ext4_truncate(struct inode *);
+extern void ext4_truncate(struct inode *, loff_t oldsize);
extern int ext4_punch_hole(struct file *file, loff_t offset, loff_t length);
extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int nblocks);
extern void ext4_set_inode_flags(struct inode *);
@@ -2262,7 +2262,7 @@ extern int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks,
int chunk);
extern int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
struct ext4_map_blocks *map, int flags);
-extern void ext4_ext_truncate(struct inode *);
+extern void ext4_ext_truncate(struct inode *, loff_t oldsize);
extern int ext4_ext_punch_hole(struct file *file, loff_t offset,
loff_t length);
extern void ext4_ext_init(struct super_block *);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 970d6dc..5c091cf 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4083,10 +4083,76 @@ out2:
return err ? err : result;
}

-void ext4_ext_truncate(struct inode *inode)
+int ext4_ext_truncate_zero_pages(handle_t *handle, struct inode *inode,
+ loff_t old_size)
+{
+ struct super_block *sb = inode->i_sb;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ struct ext4_write_cluster_ctxt *ewcc = NULL;
+ struct page *page;
+ ext4_lblk_t last_block = ((old_size + sb->s_blocksize - 1)
+ >> EXT4_BLOCK_SIZE_BITS(sb)) - 1;
+ ext4_lblk_t left_offset = last_block & (sbi->s_cluster_ratio - 1);
+ ext4_lblk_t right_offset = sbi->s_cluster_ratio - left_offset - 1;
+ ext4_lblk_t begin, index;
+ unsigned long i;
+ int ret = 0;
+ unsigned from, to;
+
+ if (sbi->s_cluster_ratio <= 1)
+ goto out;
+
+ if (right_offset) {
+ struct ext4_map_blocks map;
+ map.m_lblk = last_block;
+ map.m_len = 1;
+ if (ext4_map_blocks(handle, inode, &map, 0) <= 0
+ || map.m_flags & EXT4_MAP_UNWRITTEN)
+ goto out;
+
+ ewcc = ext4_alloc_write_cluster_ctxt();
+ if (!ewcc) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ begin = last_block + 1;
+ for (index = begin; index < last_block + right_offset + 1;
+ index++) {
+ ret = ext4_zero_cluster_page(inode, index, ewcc,
+ mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS);
+ if (ret)
+ goto out;
+ }
+
+ if (ext4_should_journal_data(inode)) {
+ for (i = 0; i < ewcc->w_num_pages; i++) {
+ page = ewcc->w_pages[i];
+ if (!page || !page_buffers(page))
+ continue;
+ from = page->index << PAGE_CACHE_SHIFT;
+ to = from + PAGE_CACHE_SIZE;
+ ret = walk_page_buffers(handle,
+ page_buffers(page), from, to, NULL,
+ do_journal_get_write_access);
+ if (ret)
+ goto out;
+ }
+ }
+ }
+
+out:
+ if (ewcc)
+ ext4_free_write_cluster_ctxt(ewcc);
+
+ return ret;
+}
+
+void ext4_ext_truncate(struct inode *inode, loff_t old_size)
{
struct address_space *mapping = inode->i_mapping;
struct super_block *sb = inode->i_sb;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
ext4_lblk_t last_block;
handle_t *handle;
int err = 0;
@@ -4108,6 +4174,9 @@ void ext4_ext_truncate(struct inode *inode)
if (inode->i_size & (sb->s_blocksize - 1))
ext4_block_truncate_page(handle, mapping, inode->i_size);

+ if (ext4_ext_truncate_zero_pages(handle, inode, old_size))
+ goto out_stop;
+
if (ext4_orphan_add(handle, inode))
goto out_stop;

@@ -4128,6 +4197,13 @@ void ext4_ext_truncate(struct inode *inode)

last_block = (inode->i_size + sb->s_blocksize - 1)
>> EXT4_BLOCK_SIZE_BITS(sb);
+
+ if (sbi->s_cluster_ratio > 1 &&
+ (last_block & (sbi->s_cluster_ratio - 1))) {
+ last_block = (last_block & ~(sbi->s_cluster_ratio - 1)) +
+ sbi->s_cluster_ratio;
+ }
+
err = ext4_ext_remove_space(inode, last_block);

/* In a multi-transaction truncate, we only make the final
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index beec081..2b70377 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -213,7 +213,7 @@ void ext4_evict_inode(struct inode *inode)
goto stop_handle;
}
if (inode->i_blocks)
- ext4_truncate(inode);
+ ext4_truncate(inode, 0);

/*
* ext4_ext_truncate() doesn't reserve any slop when it
@@ -3343,7 +3343,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
* that's fine - as long as they are linked from the inode, the post-crash
* ext4_truncate() run will find them and release them.
*/
-void ext4_truncate(struct inode *inode)
+void ext4_truncate(struct inode *inode, loff_t old_size)
{
trace_ext4_truncate_enter(inode);

@@ -3356,7 +3356,7 @@ void ext4_truncate(struct inode *inode)
ext4_set_inode_state(inode, EXT4_STATE_DA_ALLOC_CLOSE);

if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
- ext4_ext_truncate(inode);
+ ext4_ext_truncate(inode, old_size);
else
ext4_ind_truncate(inode);

@@ -4123,11 +4123,12 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
}

if (attr->ia_valid & ATTR_SIZE) {
- if (attr->ia_size != i_size_read(inode)) {
+ loff_t old_size = i_size_read(inode);
+ if (attr->ia_size != old_size) {
truncate_setsize(inode, attr->ia_size);
- ext4_truncate(inode);
+ ext4_truncate(inode, old_size);
} else if (ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))
- ext4_truncate(inode);
+ ext4_truncate(inode, 0);
}

if (!rc) {
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 4a5081a..6eb2f4f 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -100,7 +100,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
goto flags_out;
}
} else if (oldflags & EXT4_EOFBLOCKS_FL)
- ext4_truncate(inode);
+ ext4_truncate(inode, 0);

handle = ext4_journal_start(inode, 1);
if (IS_ERR(handle)) {
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2cf4ae0..beea7a1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2229,7 +2229,7 @@ static void ext4_orphan_cleanup(struct super_block *sb,
__func__, inode->i_ino, inode->i_size);
jbd_debug(2, "truncating inode %lu to %lld bytes\n",
inode->i_ino, inode->i_size);
- ext4_truncate(inode);
+ ext4_truncate(inode, 0);
nr_truncates++;
} else {
ext4_msg(sb, KERN_DEBUG,
diff --git a/fs/ext4/truncate.h b/fs/ext4/truncate.h
index 011ba66..2be0783 100644
--- a/fs/ext4/truncate.h
+++ b/fs/ext4/truncate.h
@@ -11,7 +11,7 @@
static inline void ext4_truncate_failed_write(struct inode *inode)
{
truncate_inode_pages(inode->i_mapping, inode->i_size);
- ext4_truncate(inode);
+ ext4_truncate(inode, 0);
}

/*
--
1.7.3.2
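
The rounding that this patch adds to ext4_ext_truncate() can be sketched in
userspace as follows. This is an illustration only, not kernel code: the
function name is hypothetical, and the cluster ratio is assumed to be a
power of two. The idea, as far as the diff shows, is that space is removed
starting at the next cluster boundary, so a cluster that is still partly in
use at end of file is kept whole.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the last_block computation: return the first
 * logical block that truncate would hand to the remove-space step.
 */
static uint32_t first_block_to_remove(uint64_t i_size, unsigned blkbits,
				      uint32_t ratio)
{
	uint64_t blocksize = (uint64_t)1 << blkbits;
	uint32_t last_block;

	/* The masks below rely on a power-of-two cluster ratio. */
	assert(ratio != 0 && (ratio & (ratio - 1)) == 0);

	/* Round the file size up to a whole number of blocks. */
	last_block = (uint32_t)((i_size + blocksize - 1) >> blkbits);

	/* If that block is not cluster aligned, round up to the next cluster. */
	if (ratio > 1 && (last_block & (ratio - 1)))
		last_block = (last_block & ~(ratio - 1)) + ratio;

	return last_block;
}
```

For example, with 4 KiB blocks and 16 blocks per cluster, a file of 20481
bytes occupies blocks 0 to 5, and removal would start at block 16 rather
than block 6.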


2011-11-01 10:54:05

by Robin Dong

[permalink] [raw]
Subject: [PATCH 6/8 bigalloc] ext4: directories allocate a cluster when it need spaces

From: Robin Dong <[email protected]>

Allocate a cluster instead of a block when a directory needs new space.

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/namei.c | 54 +++++++++++++++++++++++++++++++++++++++---------------
1 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 1c924fa..e8618ea 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -49,6 +49,16 @@
#define NAMEI_RA_SIZE (NAMEI_RA_CHUNKS * NAMEI_RA_BLOCKS)
#define NAMEI_RA_INDEX(c,b) (((c) * NAMEI_RA_BLOCKS) + (b))

+static void ext4_zero_append(handle_t *handle, struct inode *inode,
+ struct buffer_head *bh, ext4_lblk_t block)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+
+ if ((sbi->s_cluster_ratio > 1) &&
+ (block % sbi->s_cluster_ratio))
+ memset(bh->b_data, 0, inode->i_sb->s_blocksize);
+}
+
static struct buffer_head *ext4_append(handle_t *handle,
struct inode *inode,
ext4_lblk_t *block, int *err)
@@ -59,6 +69,7 @@ static struct buffer_head *ext4_append(handle_t *handle,

bh = ext4_bread(handle, inode, *block, 1, err);
if (bh) {
+ ext4_zero_append(handle, inode, bh, *block);
inode->i_size += inode->i_sb->s_blocksize;
EXT4_I(inode)->i_disksize = inode->i_size;
*err = ext4_journal_get_write_access(handle, bh);
@@ -1811,10 +1822,12 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
handle_t *handle;
struct inode *inode;
- struct buffer_head *dir_block = NULL;
+ struct buffer_head *first_block = NULL;
+ struct buffer_head *dir_block[EXT4_MAX_CTXT_PAGES];
struct ext4_dir_entry_2 *de;
+ struct ext4_sb_info *sbi = EXT4_SB(dir->i_sb);
unsigned int blocksize = dir->i_sb->s_blocksize;
- int err, retries = 0;
+ int i, err, retries = 0;

if (EXT4_DIR_LINK_MAX(dir))
return -EMLINK;
@@ -1824,6 +1837,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, int mode)
retry:
handle = ext4_journal_start(dir, EXT4_DATA_TRANS_BLOCKS(dir->i_sb) +
EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3 +
+ sbi->s_cluster_ratio +
EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb));
if (IS_ERR(handle))
return PTR_ERR(handle);
@@ -1840,14 +1854,20 @@ retry:
inode->i_op = &ext4_dir_inode_operations;
inode->i_fop = &ext4_dir_operations;
inode->i_size = EXT4_I(inode)->i_disksize = inode->i_sb->s_blocksize;
- dir_block = ext4_bread(handle, inode, 0, 1, &err);
- if (!dir_block)
- goto out_clear_inode;
- BUFFER_TRACE(dir_block, "get_write_access");
- err = ext4_journal_get_write_access(handle, dir_block);
- if (err)
- goto out_clear_inode;
- de = (struct ext4_dir_entry_2 *) dir_block->b_data;
+
+ for (i = 0; i < sbi->s_cluster_ratio; i++) {
+ dir_block[i] = ext4_bread(handle, inode, i, 1, &err);
+ if (!dir_block[i])
+ goto out_clear_inode;
+ BUFFER_TRACE(dir_block[i], "get_write_access");
+ memset(dir_block[i]->b_data, 0, inode->i_sb->s_blocksize);
+ set_buffer_uptodate(dir_block[i]);
+ err = ext4_journal_get_write_access(handle, dir_block[i]);
+ if (err)
+ goto out_clear_inode;
+ }
+ first_block = dir_block[0];
+ de = (struct ext4_dir_entry_2 *) first_block->b_data;
de->inode = cpu_to_le32(inode->i_ino);
de->name_len = 1;
de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len),
@@ -1862,10 +1882,13 @@ retry:
strcpy(de->name, "..");
ext4_set_de_type(dir->i_sb, de, S_IFDIR);
inode->i_nlink = 2;
- BUFFER_TRACE(dir_block, "call ext4_handle_dirty_metadata");
- err = ext4_handle_dirty_metadata(handle, dir, dir_block);
- if (err)
- goto out_clear_inode;
+ BUFFER_TRACE(first_block, "call ext4_handle_dirty_metadata");
+
+ for (i = 0; i < sbi->s_cluster_ratio; i++) {
+ err = ext4_handle_dirty_metadata(handle, dir, dir_block[i]);
+ if (err)
+ goto out_clear_inode;
+ }
err = ext4_mark_inode_dirty(handle, inode);
if (!err)
err = ext4_add_entry(handle, dentry, inode);
@@ -1885,7 +1908,8 @@ out_clear_inode:
d_instantiate(dentry, inode);
unlock_new_inode(inode);
out_stop:
- brelse(dir_block);
+ for (i = 0; i < sbi->s_cluster_ratio; i++)
+ brelse(dir_block[i]);
ext4_journal_stop(handle);
if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
goto retry;
--
1.7.3.2


2011-11-01 10:54:14

by Robin Dong

[permalink] [raw]
Subject: [PATCH 8/8 bigalloc] ext4: make cluster works for mmap

From: Robin Dong <[email protected]>

When a user writes a page in an mmap region, we need to zero out the other
pages around it.

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/inode.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 68 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2b70377..56a5401 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4545,13 +4545,17 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
struct page *page = vmf->page;
loff_t size;
unsigned long len;
- int ret;
+ int ret, i, uninit = 0;
struct file *file = vma->vm_file;
struct inode *inode = file->f_path.dentry->d_inode;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
struct address_space *mapping = inode->i_mapping;
+ struct ext4_write_cluster_ctxt *ewcc = NULL;
handle_t *handle;
get_block_t *get_block;
int retries = 0;
+ unsigned int flags = AOP_FLAG_NOFS;
+ unsigned long from, to;

/*
* This check is racy but catches the common case. We rely on
@@ -4608,7 +4612,47 @@ retry_alloc:
ret = VM_FAULT_SIGBUS;
goto out;
}
+
+ ewcc = ext4_alloc_write_cluster_ctxt();
+ if (!ewcc) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ if (sbi->s_cluster_ratio > 1) {
+ /* We need to know whether the block is allocated already
+ */
+ struct ext4_map_blocks map;
+ map.m_lblk = page->index;
+ map.m_len = 1;
+ ret = ext4_map_blocks(handle, inode, &map, 0);
+ uninit = map.m_flags & EXT4_MAP_UNWRITTEN;
+ if (ret <= 0 || uninit) {
+ ret = ext4_prepare_cluster_left_pages(inode,
+ page->index, ewcc, flags);
+ if (ret)
+ goto err_out;
+ }
+ }
+
ret = __block_page_mkwrite(vma, vmf, get_block);
+ if (ret)
+ goto err_out;
+
+ if (sbi->s_cluster_ratio > 1 && uninit) {
+ ret = ext4_prepare_cluster_right_pages(inode, page->index,
+ ewcc, flags);
+ if (ret)
+ goto err_out;
+ for (i = 0; i < ewcc->w_num_pages; i++) {
+ if (!ewcc->w_pages[i] ||
+ !page_buffers(ewcc->w_pages[i]))
+ break;
+ block_commit_write(ewcc->w_pages[i],
+ 0, PAGE_CACHE_SIZE);
+ }
+ }
+
if (!ret && ext4_should_journal_data(inode)) {
if (walk_page_buffers(handle, page_buffers(page), 0,
PAGE_CACHE_SIZE, NULL, do_journal_get_write_access)) {
@@ -4616,13 +4660,36 @@ retry_alloc:
ret = VM_FAULT_SIGBUS;
goto out;
}
+
+ for (i = 0; i < ewcc->w_num_pages; i++) {
+ page = ewcc->w_pages[i];
+ if (!page || !page_buffers(page))
+ continue;
+ from = page->index << PAGE_CACHE_SHIFT;
+ to = from + PAGE_CACHE_SIZE;
+ ret = walk_page_buffers(handle, page_buffers(page),
+ from, to, NULL, do_journal_get_write_access);
+ if (ret) {
+ ret = VM_FAULT_SIGBUS;
+ goto out;
+ }
+ }
ext4_set_inode_state(inode, EXT4_STATE_JDATA);
}
+
+err_out:
+ if (ewcc) {
+ ext4_free_write_cluster_ctxt(ewcc);
+ ewcc = NULL;
+ }
ext4_journal_stop(handle);
if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
goto retry_alloc;
out_ret:
ret = block_page_mkwrite_return(ret);
+
out:
+ if (ewcc)
+ ext4_free_write_cluster_ctxt(ewcc);
return ret;
}
--
1.7.3.2


2011-11-01 10:54:12

by Robin Dong

[permalink] [raw]
Subject: [PATCH 7/8 bigalloc] ext4: align fallocate size to a whole cluster

From: Robin Dong <[email protected]>

align fallocate size to a whole cluster

Signed-off-by: Robin Dong <[email protected]>
---
fs/ext4/extents.c | 34 ++++++++++++++++++++++++++++++++--
1 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5c091cf..8455bef 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3490,8 +3490,11 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
struct ext4_ext_path *path, int flags,
unsigned int allocated, ext4_fsblk_t newblock)
{
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+ struct ext4_map_blocks convert_map;
int ret = 0;
int err = 0;
+ int offset;
ext4_io_end_t *io = EXT4_I(inode)->cur_aio_dio;

ext_debug("ext4_ext_handle_uninitialized_extents: inode %lu, logical"
@@ -3555,8 +3558,14 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
}

/* buffered write, writepage time, convert*/
- ret = ext4_ext_convert_to_initialized(handle, inode, map, path);
+ offset = map->m_lblk & (sbi->s_cluster_ratio - 1);
+ convert_map.m_len =
+ EXT4_C2B(sbi, EXT4_NUM_B2C(sbi, offset + map->m_len));
+ convert_map.m_lblk = map->m_lblk - offset;
+ ret = ext4_ext_convert_to_initialized(handle, inode,
+ &convert_map, path);
if (ret >= 0) {
+ ret = map->m_len;
ext4_update_inode_fsync_trans(handle, inode, 1);
err = check_eofblocks_fl(handle, inode, map->m_lblk, path,
map->m_len);
@@ -4270,8 +4279,9 @@ static void ext4_falloc_update_inode(struct inode *inode,
long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
{
struct inode *inode = file->f_path.dentry->d_inode;
+ struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
handle_t *handle;
- loff_t new_size;
+ loff_t new_size, old_size;
unsigned int max_blocks;
int ret = 0;
int ret2 = 0;
@@ -4301,6 +4311,8 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
*/
max_blocks = (EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits)
- map.m_lblk;
+
+ old_size = i_size_read(inode);
/*
* credits to insert 1 extent into extent tree
*/
@@ -4355,6 +4367,24 @@ retry:
goto retry;
}
mutex_unlock(&inode->i_mutex);
+
+ /* if the fallocate expand the file size, we need to zeroout
+ * extra pages in cluster */
+ if (len + offset > old_size) {
+ credits = ext4_chunk_trans_blocks(inode, sbi->s_cluster_ratio);
+ handle = ext4_journal_start(inode, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ goto out;
+ }
+ ext4_ext_truncate_zero_pages(handle, inode, old_size);
+ if (IS_SYNC(inode))
+ ext4_handle_sync(handle);
+ ext4_mark_inode_dirty(handle, inode);
+ ext4_journal_stop(handle);
+ }
+
+out:
trace_ext4_fallocate_exit(inode, offset, max_blocks,
ret > 0 ? ret2 : ret);
return ret > 0 ? ret2 : ret;
--
1.7.3.2
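
The convert_map setup in ext4_ext_handle_uninitialized_extents() widens the
requested range to whole clusters before converting it. A minimal userspace
sketch of that arithmetic follows. It is not kernel code: struct block_range
and widen_to_clusters() are hypothetical names, and the two shifts stand in
for EXT4_NUM_B2C() and EXT4_C2B() on the assumption that those macros round
up to clusters and shift back to blocks by s_cluster_bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the m_lblk/m_len pair of struct ext4_map_blocks. */
struct block_range {
	uint32_t lblk;	/* first logical block */
	uint32_t len;	/* length in blocks */
};

/* Widen a block range so that it starts and ends on cluster boundaries. */
static struct block_range widen_to_clusters(struct block_range req,
					    unsigned cluster_bits)
{
	struct block_range out;
	uint32_t ratio, offset, clusters;

	assert(cluster_bits < 31);
	ratio = (uint32_t)1 << cluster_bits;

	/* Distance from the start of the cluster to the requested block. */
	offset = req.lblk & (ratio - 1);

	/* Clusters needed to cover offset + len blocks (rounding up). */
	clusters = (offset + req.len + ratio - 1) >> cluster_bits;

	out.lblk = req.lblk - offset;		/* pull start back to the boundary */
	out.len = clusters << cluster_bits;	/* length as whole clusters, in blocks */
	return out;
}
```

With 16 blocks per cluster, a request for blocks 21 to 23 would be widened
to blocks 16 to 31, and a request for blocks 30 to 34 to blocks 16 to 47.
Note that the patch still reports map->m_len, not the widened length, back
to the caller.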


2011-11-02 18:28:47

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH 1/8 bigalloc] ext4: get blocks from ext4_ext_get_actual_len

On 2011-11-01, at 4:53 AM, Robin Dong wrote:
> From: Robin Dong <[email protected]>
>
> Since ee_len's unit change to cluster, it need to transform from clusters
> to blocks when use ext4_ext_get_actual_len.

Robin,
thanks for working on and submitting these patches so quickly.

> struct ext4_extent {
> __le32 ee_block; /* first logical block extent covers */
> - __le16 ee_len; /* number of blocks covered by extent */
> + __le16 ee_len; /* number of clusters covered by extent */

It would make sense that ee_block should also be changed to be measured
in units of clusters instead of blocks, since there is no value to
using extents with cluster size if they are not also cluster aligned.

I think this would also simplify some of the code.

> static int ext4_valid_extent(struct inode *inode, struct ext4_extent *ext)
> {
> + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);

Why allocate "*sbi" on the stack in all of these functions for a
single use? This provides no benefit, but can increase the stack
usage considerably due to repeated allocations.

> ext4_fsblk_t block = ext4_ext_pblock(ext);
> + int len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ext));

It probably makes more sense to pass "sb" or "sbi" as a parameter to
ext4_ext_get_actual_len() and then have it return the proper length
in blocks (i.e. call EXT4_C2B() internally), which will simplify all
of the callers and avoid potential bugs if some code does not use it.
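
A rough userspace sketch of what such a helper might look like is below. It
is only an illustration of the suggestion, not proposed kernel code: the
model_* names are hypothetical, the struct fields are kept in host byte
order instead of little-endian, and the value of EXT_INIT_MAX_LEN and the
shift by s_cluster_bits are assumptions about how the existing macros
behave.

```c
#include <assert.h>
#include <stdint.h>

/* Lengths above this value are assumed to mark an uninitialized extent. */
#define EXT_INIT_MAX_LEN (1U << 15)

/* Hypothetical, cut-down stand-ins for ext4_sb_info and ext4_extent. */
struct model_sbi {
	unsigned s_cluster_bits;	/* log2(blocks per cluster) */
};

struct model_extent {
	uint32_t ee_block;	/* first logical block (host order here) */
	uint16_t ee_len;	/* length in clusters, plus uninit marker */
};

/* Length in clusters with the uninitialized marker stripped. */
static unsigned model_actual_len_clusters(const struct model_extent *ex)
{
	return ex->ee_len <= EXT_INIT_MAX_LEN ?
		ex->ee_len : ex->ee_len - EXT_INIT_MAX_LEN;
}

/* The suggested shape: take sbi and return the length already in blocks. */
static unsigned model_actual_len_blocks(const struct model_sbi *sbi,
					const struct model_extent *ex)
{
	/* Keep the shifted result within 32 bits. */
	assert(sbi->s_cluster_bits <= 16);
	return model_actual_len_clusters(ex) << sbi->s_cluster_bits;
}
```

Callers would then never see a cluster count, so a forgotten EXT4_C2B() at
one call site could not silently mix units.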

> @@ -1523,7 +1534,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
> ext1_ee_len = ext4_ext_get_actual_len(ex1);
> ext2_ee_len = ext4_ext_get_actual_len(ex2);
>
> - if (le32_to_cpu(ex1->ee_block) + ext1_ee_len !=
> + if (le32_to_cpu(ex1->ee_block) + EXT4_C2B(sbi, ext1_ee_len) !=
> le32_to_cpu(ex2->ee_block))

If both ee_len and ee_block are in the same units (blocks or clusters),
then there is no need to convert units for this function at all.
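
To make the point concrete, here is a small userspace sketch of just the
logical-adjacency test, written both ways. The function names are
hypothetical, and the real ext4_can_extents_be_merged() checks more than
this (for example physical adjacency and length limits), so this models only
the line quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* As in the patch: ee_block in blocks, ee_len in clusters. */
static bool adjacent_mixed_units(uint32_t ex1_block, uint32_t ex1_len_clusters,
				 uint32_t ex2_block, unsigned cluster_bits)
{
	assert(cluster_bits < 32);
	/* The length must be converted to blocks before it can be added. */
	return ex1_block + (ex1_len_clusters << cluster_bits) == ex2_block;
}

/* Start and length in the same unit (both blocks or both clusters). */
static bool adjacent_same_units(uint32_t ex1_start, uint32_t ex1_len,
				uint32_t ex2_start)
{
	/* No conversion and no superblock lookup are needed. */
	return ex1_start + ex1_len == ex2_start;
}
```

With 16 blocks per cluster, an extent starting at block 32 with a length of
2 clusters is adjacent to one starting at block 64; expressed purely in
clusters, that is start 2, length 2, next start 4.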


Cheers, Andreas






2011-11-03 08:50:04

by Yongqiang Yang

[permalink] [raw]
Subject: Re: [PATCH 1/8 bigalloc] ext4: get blocks from ext4_ext_get_actual_len

On Thu, Nov 3, 2011 at 2:29 AM, Andreas Dilger <[email protected]> wrote:
> On 2011-11-01, at 4:53 AM, Robin Dong wrote:
>> From: Robin Dong <[email protected]>
>>
>> Since ee_len's unit change to cluster, it need to transform from clusters
>> to blocks when use ext4_ext_get_actual_len.
>
> Robin,
> thanks for working on and submitting these patches so quickly.
>
>> struct ext4_extent {
>>       __le32  ee_block;       /* first logical block extent covers */
>> -     __le16  ee_len;         /* number of blocks covered by extent */
>> +     __le16  ee_len;         /* number of clusters covered by extent */
>
> It would make sense that ee_block should also be changed to be measured
> in units of clusters instead of blocks, since there is no value to
> using extents with cluster size if they are not also cluster aligned.
>
> I think this would also simplify some of the code.
Actually, after these patches are applied, both logical and physical
blocks are cluster sized. So I suggest that we simply tell users that
ext4 can use a large block size rather than clusters.

Yongqiang.

>
>> static int ext4_valid_extent(struct inode *inode, struct ext4_extent *ext)
>> {
>> +     struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>
> Why allocate "*sbi" on the stack in all of these functions for a
> single use? This provides no benefit, but can increase the stack
> usage considerably due to repeated allocations.
>
>>       ext4_fsblk_t block = ext4_ext_pblock(ext);
>> +     int len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ext));
>
> It probably makes more sense to pass "sb" or "sbi" as a parameter to
> ext4_ext_get_actual_len() and then have it return the proper length
> in blocks (i.e. call EXT4_C2B() internally), which will simplify all
> of the callers and avoid potential bugs if some code does not use it.
>
>> @@ -1523,7 +1534,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
>>       ext1_ee_len = ext4_ext_get_actual_len(ex1);
>>       ext2_ee_len = ext4_ext_get_actual_len(ex2);
>>
>> -     if (le32_to_cpu(ex1->ee_block) + ext1_ee_len !=
>> +     if (le32_to_cpu(ex1->ee_block) + EXT4_C2B(sbi, ext1_ee_len) !=
>>                       le32_to_cpu(ex2->ee_block))
>
> If both ee_len and ee_block are in the same units (blocks or clusters),
> then there is no need to convert units for this function at all.
>
>
> Cheers, Andreas



--
Best Wishes
Yongqiang Yang

2011-11-03 17:56:44

by Andreas Dilger

Subject: Re: [PATCH 1/8 bigalloc] ext4: get blocks from ext4_ext_get_actual_len

On 2011-11-03, at 2:50 AM, Yongqiang Yang wrote:
> On Thu, Nov 3, 2011 at 2:29 AM, Andreas Dilger <[email protected]> wrote:
>> On 2011-11-01, at 4:53 AM, Robin Dong wrote:
>>> From: Robin Dong <[email protected]>
>>>
>>> Since ee_len's unit change to cluster, it need to transform from clusters
>>> to blocks when use ext4_ext_get_actual_len.
>>
>> Robin,
>> thanks for working on and submitting these patches so quickly.
>>
>>> struct ext4_extent {
>>> __le32 ee_block; /* first logical block extent covers */
>>> - __le16 ee_len; /* number of blocks covered by extent */
>>> + __le16 ee_len; /* number of clusters covered by extent */
>>
>> It would make sense that ee_block should also be changed to be measured
>> in units of clusters instead of blocks, since there is no value to
>> using extents with cluster size if they are not also cluster aligned.
>>
>> I think this would also simplify some of the code.
>
> Actually, once these patches are applied, both logical and physical
> blocks are cluster-sized. So I suggest that we simply tell users that
> ext4 can use a large block size, rather than clusters.

I hadn't actually looked at the later patches in the series yet. In
that case, I'm happy to let bigalloc continue with its current approach
(cluster size > blocksize, with extents measured in blocks), and to use
the support you've added for blocksize > PAGE_SIZE by scaling the
in-memory "block" addresses to match PAGE_SIZE (along with the other
fixes here to handle zeroing of neighbouring pages in the block).

Essentially, this would be very similar to internally setting the cluster
size to blocksize >> PAGE_SHIFT even though this isn't set in the superblock
at format time.
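
As a rough illustration of that arithmetic, assuming 4 KiB pages (a
PAGE_SHIFT of 12) and an example 64 KiB block size, neither of which is
stated in this thread, and using a hypothetical helper name:

```c
#include <assert.h>

/* Assumption for this sketch only: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical helper, not a kernel function: number of page-sized
 * units in one filesystem block, i.e. blocksize >> PAGE_SHIFT.
 */
static inline unsigned int sketch_pages_per_block(unsigned int blocksize)
{
	/* Only meaningful when blocksize >= PAGE_SIZE. */
	assert(blocksize >= (1U << SKETCH_PAGE_SHIFT));
	return blocksize >> SKETCH_PAGE_SHIFT;
}
```

With these assumed values, a 64 KiB block spans 16 page-sized units,
which is the factor the in-memory "block" addresses would be scaled by.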


The other comments below should still be addressed.

>>> static int ext4_valid_extent(struct inode *inode, struct ext4_extent *ext)
>>> {
>>> + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>>
>> Why allocate "*sbi" on the stack in all of these functions for a
>> single use? This provides no benefit, but can increase the stack
>> usage considerably due to repeated allocations.
>>
>>> ext4_fsblk_t block = ext4_ext_pblock(ext);
>>> + int len = EXT4_C2B(sbi, ext4_ext_get_actual_len(ext));
>>
>> It probably makes more sense to pass "sb" or "sbi" as a parameter to
>> ext4_ext_get_actual_len() and then have it return the proper length
>> in blocks (i.e. call EXT4_C2B() internally), which will simplify all
>> of the callers and avoid potential bugs if some code does not use it.
>>
>>> @@ -1523,7 +1534,7 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
>>> ext1_ee_len = ext4_ext_get_actual_len(ex1);
>>> ext2_ee_len = ext4_ext_get_actual_len(ex2);
>>>
>>> - if (le32_to_cpu(ex1->ee_block) + ext1_ee_len !=
>>> + if (le32_to_cpu(ex1->ee_block) + EXT4_C2B(sbi, ext1_ee_len) !=
>>> le32_to_cpu(ex2->ee_block))
>>
>> If both ee_len and ee_block are in the same units (blocks or clusters),
>> then there is no need to convert units for this function at all.
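
The two suggestions above (let the length helper convert to blocks
itself, and keep callers such as the merge check free of explicit
conversions) can be sketched in user-space C. This is only an
illustration under stated assumptions: the sketch_* names are
hypothetical stand-ins rather than the kernel's definitions, ee_len is
treated as already being in host byte order, and ee_block is assumed to
stay in units of blocks as in the posted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Lengths above this value mark an uninitialized extent (as in ext4). */
#define SKETCH_INIT_MAX_LEN (1U << 15)

/* Simplified stand-in for the superblock info; not struct ext4_sb_info. */
struct sketch_sb_info {
	unsigned int cluster_bits;	/* log2(blocks per cluster) */
};

/* Simplified stand-in for an extent; fields are in host byte order here. */
struct sketch_extent {
	uint32_t ee_block;	/* first logical block, in blocks */
	uint16_t ee_len;	/* length, in clusters */
};

/* Clusters to blocks, in the spirit of EXT4_C2B(). */
static inline uint32_t sketch_c2b(const struct sketch_sb_info *sbi,
				  uint32_t clusters)
{
	return clusters << sbi->cluster_bits;
}

/*
 * Hypothetical variant of the length helper that takes sbi and returns
 * the extent length in blocks, so no caller can forget the conversion.
 */
static inline uint32_t sketch_ext_len_blocks(const struct sketch_sb_info *sbi,
					     const struct sketch_extent *ex)
{
	uint32_t clusters = ex->ee_len <= SKETCH_INIT_MAX_LEN ?
			    ex->ee_len : ex->ee_len - SKETCH_INIT_MAX_LEN;

	return sketch_c2b(sbi, clusters);
}

/* Logical adjacency test of the kind used when merging two extents. */
static inline int sketch_logically_adjacent(const struct sketch_sb_info *sbi,
					    const struct sketch_extent *ex1,
					    const struct sketch_extent *ex2)
{
	return ex1->ee_block + sketch_ext_len_blocks(sbi, ex1) == ex2->ee_block;
}
```

If ee_block were also stored in clusters, the adjacency test could
compare the raw fields directly and would need no conversion at all.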


Cheers, Andreas