2007-03-19 15:48:33

by Dave Kleikamp

[permalink] [raw]
Subject: cleaned up ext4 patch series

Ted,
I have rebased the ext4 patchset on 2.6.21-rc4 and cleaned up some bad
whitespace and sparse warnings. The patches are here:
http://www.kernel.org/pub/linux/kernel/people/shaggy/ext4/ext4-2007-03-19.tar.bz2
Untarred here:
http://www.kernel.org/pub/linux/kernel/people/shaggy/ext4/ext4-2007-03-19/

I have commented the series file with the changes I've made. Some of
the patches are missing signed-of-by:

# Rebased the patches to 2.6.21-rc4

# New patch to fix whitespace before applying new patches
whitespace.patch

# Replaced truncated beginning comments
extent-overlap-bugfix

persistent_allocation_1_ioctl_and_unitialized_extents

# Fixed an endian error
persistent_allocation_2_support_for_writing_to_unitialized_extent

# Note: still lots of outstanding comments from linux-ext4 list, 12/2006
# Missing signed-off-by:
booked-page-flag.patch

# Missing signed-off-by:
ext4-block-reservation.patch

# fixed a bunch of endianness errors reported by sparse
# Needs a signed-off-by from Alex, then can add shaggy's
ext4-delayed-allocation.patch

ext4-delalloc-extents-48bit.patch

# updated to latest version
nanosecond_timestamps.patch

--
David Kleikamp
IBM Linux Technology Center


2007-03-19 17:15:25

by Mingming Cao

[permalink] [raw]
Subject: Re: cleaned up ext4 patch series

On Mon, 2007-03-19 at 10:48 -0500, Dave Kleikamp wrote:
> Ted,
> I have rebased the ext4 patchset on 2.6.21-rc4 and cleaned up some bad
> whitespace and sparse warnings. The patches are here:
> http://www.kernel.org/pub/linux/kernel/people/shaggy/ext4/ext4-2007-03-19.tar.bz2
> Untarred here:
> http://www.kernel.org/pub/linux/kernel/people/shaggy/ext4/ext4-2007-03-19/
>
> I have commented the series file with the changes I've made. Some of
> the patches are missing signed-of-by:
>
> # Rebased the patches to 2.6.21-rc4
>
> # New patch to fix whitespace before applying new patches
> whitespace.patch
>
> # Replaced truncated beginning comments
> extent-overlap-bugfix
>

> persistent_allocation_1_ioctl_and_unitialized_extents
>

We could mention here that this patch is going to be replaced by a new
patch to use the fallocate() operations.

> # Fixed an endian error
> persistent_allocation_2_support_for_writing_to_unitialized_extent
>

I think Amit has an updated version of this patch in his place.

> # Note: still lots of outstanding comments from linux-ext4 list, 12/2006
> # Missing signed-off-by:
> booked-page-flag.patch
>
> # Missing signed-off-by:
> ext4-block-reservation.patch
>
> # fixed a bunch of endianness errors reported by sparse
> # Needs a signed-off-by from Alex, then can add shaggy's
> ext4-delayed-allocation.patch
>
> ext4-delalloc-extents-48bit.patch
>
> # updated to latest version
> nanosecond_timestamps.patch
>
This nanosecond patch could be move to upstream earlier than the delayed
allocation patch (Alex is working to rewrite it at VFS level), so shall
we move it before the booked-page-flag.patch?

I wonder if we should create two branches: one branch for patches that
are well discussed and tested, which Andrew could trust and pull to mm
tree; and create another branch to store patches that are still under
discussion and likely to be rewriten based on the review feedback.


Thanks,
Mingming

2007-03-19 17:31:24

by Andreas Dilger

[permalink] [raw]
Subject: Re: cleaned up ext4 patch series

On Mar 19, 2007 09:15 -0800, Mingming Cao wrote:
> I wonder if we should create two branches: one branch for patches that
> are well discussed and tested, which Andrew could trust and pull to mm
> tree; and create another branch to store patches that are still under
> discussion and likely to be rewriten based on the review feedback.

Yes, that makes a lot of sense. It would avoid issues like Andrew finding
problems with the nanosecond patches because they hadn't been widely tested
yet.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-03-19 19:42:23

by Theodore Ts'o

[permalink] [raw]
Subject: Re: cleaned up ext4 patch series

On Mon, Mar 19, 2007 at 10:48:28AM -0500, Dave Kleikamp wrote:
> I have rebased the ext4 patchset on 2.6.21-rc4 and cleaned up some bad
> whitespace and sparse warnings. The patches are here:
> http://www.kernel.org/pub/linux/kernel/people/shaggy/ext4/ext4-2007-03-19.tar.bz2
> Untarred here:
> http://www.kernel.org/pub/linux/kernel/people/shaggy/ext4/ext4-2007-03-19/

Hey Shaggy, as we discussed on last week's' call, I've set up a git
repository for the ext4 patch queue here:

http://repo.or.cz/w/ext4-patch-queue.git

You have access already, I'll add mingming's key in just a bit....

- Ted

2007-03-21 12:19:49

by Amit K. Arora

[permalink] [raw]
Subject: [Patch 0/2] Persistent preallocation in ext4 (using fallocate inode op)

On Mon, Mar 19, 2007 at 09:15:22AM -0800, Mingming Cao wrote:
> On Mon, 2007-03-19 at 10:48 -0500, Dave Kleikamp wrote:
> > persistent_allocation_1_ioctl_and_unitialized_extents
>
> We could mention here that this patch is going to be replaced by a new
> patch to use the fallocate() operations.
>
> > # Fixed an endian error
> > persistent_allocation_2_support_for_writing_to_unitialized_extent
>
> I think Amit has an updated version of this patch in his place.

Hi Mingming,

Here are the new patches which use new fallocate inode interface. I have
made following changes to the previous patchset:

1. Removed ioctl portion of the code from the preallocation patch.
2. Added ext4_fallocate() to support the new fallocate inode operation,
which will be called from the sys_fallocate() system call code.
3. Fixed the endian error which you observed in the "write support for
uninitialized extents" patch.

--
Regards,
Amit Arora

2007-03-21 12:32:50

by Amit K. Arora

[permalink] [raw]
Subject: Re: [Patch 1/2] preallocation patch (using fallocate inode op)

This is the new preallocation patch, which implements ext4_fallocate() to do
the preallocation.

Signed-off-by: Amit K Arora <[email protected]>
---
fs/ext4/extents.c | 201 +++++++++++++++++++++++++++++++---------
fs/ext4/file.c | 1
include/linux/ext4_fs.h | 7 +
include/linux/ext4_fs_extents.h | 13 ++
4 files changed, 179 insertions(+), 43 deletions(-)

Index: linux-2.6.20.1/fs/ext4/extents.c
===================================================================
--- linux-2.6.20.1.orig/fs/ext4/extents.c
+++ linux-2.6.20.1/fs/ext4/extents.c
@@ -283,7 +283,7 @@ static void ext4_ext_show_path(struct in
} else if (path->p_ext) {
ext_debug(" %d:%d:%llu ",
le32_to_cpu(path->p_ext->ee_block),
- le16_to_cpu(path->p_ext->ee_len),
+ ext4_ext_get_actual_len(path->p_ext),
ext_pblock(path->p_ext));
} else
ext_debug(" []");
@@ -306,7 +306,7 @@ static void ext4_ext_show_leaf(struct in

for (i = 0; i < le16_to_cpu(eh->eh_entries); i++, ex++) {
ext_debug("%d:%d:%llu ", le32_to_cpu(ex->ee_block),
- le16_to_cpu(ex->ee_len), ext_pblock(ex));
+ ext4_ext_get_actual_len(ex), ext_pblock(ex));
}
ext_debug("\n");
}
@@ -426,7 +426,7 @@ ext4_ext_binsearch(struct inode *inode,
ext_debug(" -> %d:%llu:%d ",
le32_to_cpu(path->p_ext->ee_block),
ext_pblock(path->p_ext),
- le16_to_cpu(path->p_ext->ee_len));
+ ext4_ext_get_actual_len(path->p_ext));

#ifdef CHECK_BINSEARCH
{
@@ -687,7 +687,7 @@ static int ext4_ext_split(handle_t *hand
ext_debug("move %d:%llu:%d in new leaf %llu\n",
le32_to_cpu(path[depth].p_ext->ee_block),
ext_pblock(path[depth].p_ext),
- le16_to_cpu(path[depth].p_ext->ee_len),
+ ext4_ext_get_actual_len(path[depth].p_ext),
newblock);
/*memmove(ex++, path[depth].p_ext++,
sizeof(struct ext4_extent));
@@ -1107,7 +1107,19 @@ static int
ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
struct ext4_extent *ex2)
{
- if (le32_to_cpu(ex1->ee_block) + le16_to_cpu(ex1->ee_len) !=
+ unsigned short ext1_ee_len, ext2_ee_len;
+
+ /*
+ * Make sure that either both extents are uninitialized, or
+ * both are _not_.
+ */
+ if (ext4_ext_is_uninitialized(ex1) ^ ext4_ext_is_uninitialized(ex2))
+ return 0;
+
+ ext1_ee_len = ext4_ext_get_actual_len(ex1);
+ ext2_ee_len = ext4_ext_get_actual_len(ex2);
+
+ if (le32_to_cpu(ex1->ee_block) + ext1_ee_len !=
le32_to_cpu(ex2->ee_block))
return 0;

@@ -1116,14 +1128,14 @@ ext4_can_extents_be_merged(struct inode
* as an RO_COMPAT feature, refuse to merge to extents if
* this can result in the top bit of ee_len being set.
*/
- if (le16_to_cpu(ex1->ee_len) + le16_to_cpu(ex2->ee_len) > EXT_MAX_LEN)
+ if (ext1_ee_len + ext2_ee_len > EXT_MAX_LEN)
return 0;
#ifdef AGRESSIVE_TEST
if (le16_to_cpu(ex1->ee_len) >= 4)
return 0;
#endif

- if (ext_pblock(ex1) + le16_to_cpu(ex1->ee_len) == ext_pblock(ex2))
+ if (ext_pblock(ex1) + ext1_ee_len == ext_pblock(ex2))
return 1;
return 0;
}
@@ -1145,7 +1157,7 @@ unsigned int ext4_ext_check_overlap(stru
unsigned int depth, len1;

b1 = le32_to_cpu(newext->ee_block);
- len1 = le16_to_cpu(newext->ee_len);
+ len1 = ext4_ext_get_actual_len(newext);
depth = ext_depth(inode);
if (!path[depth].p_ext)
goto out;
@@ -1181,9 +1193,9 @@ int ext4_ext_insert_extent(handle_t *han
struct ext4_extent *ex, *fex;
struct ext4_extent *nearex; /* nearest extent */
struct ext4_ext_path *npath = NULL;
- int depth, len, err, next;
+ int depth, len, err, next, uninitialized = 0;

- BUG_ON(newext->ee_len == 0);
+ BUG_ON(ext4_ext_get_actual_len(newext) == 0);
depth = ext_depth(inode);
ex = path[depth].p_ext;
BUG_ON(path[depth].p_hdr == NULL);
@@ -1191,14 +1203,23 @@ int ext4_ext_insert_extent(handle_t *han
/* try to insert block into found extent and return */
if (ex && ext4_can_extents_be_merged(inode, ex, newext)) {
ext_debug("append %d block to %d:%d (from %llu)\n",
- le16_to_cpu(newext->ee_len),
+ ext4_ext_get_actual_len(newext),
le32_to_cpu(ex->ee_block),
- le16_to_cpu(ex->ee_len), ext_pblock(ex));
+ ext4_ext_get_actual_len(ex), ext_pblock(ex));
err = ext4_ext_get_access(handle, inode, path + depth);
if (err)
return err;
- ex->ee_len = cpu_to_le16(le16_to_cpu(ex->ee_len)
- + le16_to_cpu(newext->ee_len));
+
+ /* ext4_can_extents_be_merged should have checked that either
+ * both extents are uninitialized, or both aren't. Thus we
+ * need to check only one of them here.
+ */
+ if (ext4_ext_is_uninitialized(ex))
+ uninitialized = 1;
+ ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
+ + ext4_ext_get_actual_len(newext));
+ if (uninitialized)
+ ext4_ext_mark_uninitialized(ex);
eh = path[depth].p_hdr;
nearex = ex;
goto merge;
@@ -1254,7 +1275,7 @@ has_space:
ext_debug("first extent in the leaf: %d:%llu:%d\n",
le32_to_cpu(newext->ee_block),
ext_pblock(newext),
- le16_to_cpu(newext->ee_len));
+ ext4_ext_get_actual_len(newext));
path[depth].p_ext = EXT_FIRST_EXTENT(eh);
} else if (le32_to_cpu(newext->ee_block)
> le32_to_cpu(nearex->ee_block)) {
@@ -1267,7 +1288,7 @@ has_space:
"move %d from 0x%p to 0x%p\n",
le32_to_cpu(newext->ee_block),
ext_pblock(newext),
- le16_to_cpu(newext->ee_len),
+ ext4_ext_get_actual_len(newext),
nearex, len, nearex + 1, nearex + 2);
memmove(nearex + 2, nearex + 1, len);
}
@@ -1280,7 +1301,7 @@ has_space:
"move %d from 0x%p to 0x%p\n",
le32_to_cpu(newext->ee_block),
ext_pblock(newext),
- le16_to_cpu(newext->ee_len),
+ ext4_ext_get_actual_len(newext),
nearex, len, nearex + 1, nearex + 2);
memmove(nearex + 1, nearex, len);
path[depth].p_ext = nearex;
@@ -1299,8 +1320,13 @@ merge:
if (!ext4_can_extents_be_merged(inode, nearex, nearex + 1))
break;
/* merge with next extent! */
- nearex->ee_len = cpu_to_le16(le16_to_cpu(nearex->ee_len)
- + le16_to_cpu(nearex[1].ee_len));
+ if (ext4_ext_is_uninitialized(nearex))
+ uninitialized = 1;
+ nearex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(nearex)
+ + ext4_ext_get_actual_len(nearex + 1));
+ if (uninitialized)
+ ext4_ext_mark_uninitialized(nearex);
+
if (nearex + 1 < EXT_LAST_EXTENT(eh)) {
len = (EXT_LAST_EXTENT(eh) - nearex - 1)
* sizeof(struct ext4_extent);
@@ -1370,8 +1396,8 @@ int ext4_ext_walk_space(struct inode *in
end = le32_to_cpu(ex->ee_block);
if (block + num < end)
end = block + num;
- } else if (block >=
- le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len)) {
+ } else if (block >= le32_to_cpu(ex->ee_block)
+ + ext4_ext_get_actual_len(ex)) {
/* need to allocate space after found extent */
start = block;
end = block + num;
@@ -1383,7 +1409,8 @@ int ext4_ext_walk_space(struct inode *in
* by found extent
*/
start = block;
- end = le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len);
+ end = le32_to_cpu(ex->ee_block)
+ + ext4_ext_get_actual_len(ex);
if (block + num < end)
end = block + num;
exists = 1;
@@ -1399,7 +1426,7 @@ int ext4_ext_walk_space(struct inode *in
cbex.ec_type = EXT4_EXT_CACHE_GAP;
} else {
cbex.ec_block = le32_to_cpu(ex->ee_block);
- cbex.ec_len = le16_to_cpu(ex->ee_len);
+ cbex.ec_len = ext4_ext_get_actual_len(ex);
cbex.ec_start = ext_pblock(ex);
cbex.ec_type = EXT4_EXT_CACHE_EXTENT;
}
@@ -1472,15 +1499,15 @@ ext4_ext_put_gap_in_cache(struct inode *
ext_debug("cache gap(before): %lu [%lu:%lu]",
(unsigned long) block,
(unsigned long) le32_to_cpu(ex->ee_block),
- (unsigned long) le16_to_cpu(ex->ee_len));
+ (unsigned long) ext4_ext_get_actual_len(ex));
} else if (block >= le32_to_cpu(ex->ee_block)
- + le16_to_cpu(ex->ee_len)) {
+ + ext4_ext_get_actual_len(ex)) {
lblock = le32_to_cpu(ex->ee_block)
- + le16_to_cpu(ex->ee_len);
+ + ext4_ext_get_actual_len(ex);
len = ext4_ext_next_allocated_block(path);
ext_debug("cache gap(after): [%lu:%lu] %lu",
(unsigned long) le32_to_cpu(ex->ee_block),
- (unsigned long) le16_to_cpu(ex->ee_len),
+ (unsigned long) ext4_ext_get_actual_len(ex),
(unsigned long) block);
BUG_ON(len == lblock);
len = len - lblock;
@@ -1610,12 +1637,12 @@ static int ext4_remove_blocks(handle_t *
unsigned long from, unsigned long to)
{
struct buffer_head *bh;
+ unsigned short ee_len = ext4_ext_get_actual_len(ex);
int i;

#ifdef EXTENTS_STATS
{
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
- unsigned short ee_len = le16_to_cpu(ex->ee_len);
spin_lock(&sbi->s_ext_stats_lock);
sbi->s_ext_blocks += ee_len;
sbi->s_ext_extents++;
@@ -1629,12 +1656,12 @@ static int ext4_remove_blocks(handle_t *
}
#endif
if (from >= le32_to_cpu(ex->ee_block)
- && to == le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len) - 1) {
+ && to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
/* tail removal */
unsigned long num;
ext4_fsblk_t start;
- num = le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len) - from;
- start = ext_pblock(ex) + le16_to_cpu(ex->ee_len) - num;
+ num = le32_to_cpu(ex->ee_block) + ee_len - from;
+ start = ext_pblock(ex) + ee_len - num;
ext_debug("free last %lu blocks starting %llu\n", num, start);
for (i = 0; i < num; i++) {
bh = sb_find_get_block(inode->i_sb, start + i);
@@ -1642,12 +1669,12 @@ static int ext4_remove_blocks(handle_t *
}
ext4_free_blocks(handle, inode, start, num);
} else if (from == le32_to_cpu(ex->ee_block)
- && to <= le32_to_cpu(ex->ee_block) + le16_to_cpu(ex->ee_len) - 1) {
+ && to <= le32_to_cpu(ex->ee_block) + ee_len - 1) {
printk("strange request: removal %lu-%lu from %u:%u\n",
- from, to, le32_to_cpu(ex->ee_block), le16_to_cpu(ex->ee_len));
+ from, to, le32_to_cpu(ex->ee_block), ee_len);
} else {
printk("strange request: removal(2) %lu-%lu from %u:%u\n",
- from, to, le32_to_cpu(ex->ee_block), le16_to_cpu(ex->ee_len));
+ from, to, le32_to_cpu(ex->ee_block), ee_len);
}
return 0;
}
@@ -1661,7 +1688,7 @@ ext4_ext_rm_leaf(handle_t *handle, struc
struct ext4_extent_header *eh;
unsigned a, b, block, num;
unsigned long ex_ee_block;
- unsigned short ex_ee_len;
+ unsigned short ex_ee_len, uninitialized = 0;
struct ext4_extent *ex;

ext_debug("truncate since %lu in leaf\n", start);
@@ -1676,7 +1703,9 @@ ext4_ext_rm_leaf(handle_t *handle, struc
ex = EXT_LAST_EXTENT(eh);

ex_ee_block = le32_to_cpu(ex->ee_block);
- ex_ee_len = le16_to_cpu(ex->ee_len);
+ if (ext4_ext_is_uninitialized(ex))
+ uninitialized = 1;
+ ex_ee_len = ext4_ext_get_actual_len(ex);

while (ex >= EXT_FIRST_EXTENT(eh) &&
ex_ee_block + ex_ee_len > start) {
@@ -1744,6 +1773,8 @@ ext4_ext_rm_leaf(handle_t *handle, struc

ex->ee_block = cpu_to_le32(block);
ex->ee_len = cpu_to_le16(num);
+ if (uninitialized)
+ ext4_ext_mark_uninitialized(ex);

err = ext4_ext_dirty(handle, inode, path + depth);
if (err)
@@ -1753,7 +1784,7 @@ ext4_ext_rm_leaf(handle_t *handle, struc
ext_pblock(ex));
ex--;
ex_ee_block = le32_to_cpu(ex->ee_block);
- ex_ee_len = le16_to_cpu(ex->ee_len);
+ ex_ee_len = ext4_ext_get_actual_len(ex);
}

if (correct_index && eh->eh_entries)
@@ -2029,7 +2060,7 @@ int ext4_ext_get_blocks(handle_t *handle
if (ex) {
unsigned long ee_block = le32_to_cpu(ex->ee_block);
ext4_fsblk_t ee_start = ext_pblock(ex);
- unsigned short ee_len = le16_to_cpu(ex->ee_len);
+ unsigned short ee_len;

/*
* Allow future support for preallocated extents to be added
@@ -2037,8 +2068,9 @@ int ext4_ext_get_blocks(handle_t *handle
* Uninitialized extents are treated as holes, except that
* we avoid (fail) allocating new blocks during a write.
*/
- if (ee_len > EXT_MAX_LEN)
+ if (le16_to_cpu(ex->ee_len) > EXT_MAX_LEN)
goto out2;
+ ee_len = ext4_ext_get_actual_len(ex);
/* if found extent covers block, simply return it */
if (iblock >= ee_block && iblock < ee_block + ee_len) {
newblock = iblock - ee_block + ee_start;
@@ -2046,8 +2078,11 @@ int ext4_ext_get_blocks(handle_t *handle
allocated = ee_len - (iblock - ee_block);
ext_debug("%d fit into %lu:%d -> %llu\n", (int) iblock,
ee_block, ee_len, newblock);
- ext4_ext_put_in_cache(inode, ee_block, ee_len,
- ee_start, EXT4_EXT_CACHE_EXTENT);
+ /* Do not put uninitialized extent in the cache */
+ if (!ext4_ext_is_uninitialized(ex))
+ ext4_ext_put_in_cache(inode, ee_block,
+ ee_len, ee_start,
+ EXT4_EXT_CACHE_EXTENT);
goto out;
}
}
@@ -2089,6 +2124,8 @@ int ext4_ext_get_blocks(handle_t *handle
/* try to insert new extent into found leaf and return */
ext4_ext_store_pblock(&newex, newblock);
newex.ee_len = cpu_to_le16(allocated);
+ if (create == EXT4_CREATE_UNINITIALIZED_EXT) /* Mark uninitialized */
+ ext4_ext_mark_uninitialized(&newex);
err = ext4_ext_insert_extent(handle, inode, path, &newex);
if (err)
goto out2;
@@ -2100,8 +2137,10 @@ int ext4_ext_get_blocks(handle_t *handle
newblock = ext_pblock(&newex);
__set_bit(BH_New, &bh_result->b_state);

- ext4_ext_put_in_cache(inode, iblock, allocated, newblock,
- EXT4_EXT_CACHE_EXTENT);
+ /* Cache only when it is _not_ an uninitialized extent */
+ if (create!=EXT4_CREATE_UNINITIALIZED_EXT)
+ ext4_ext_put_in_cache(inode, iblock, allocated, newblock,
+ EXT4_EXT_CACHE_EXTENT);
out:
if (allocated > max_blocks)
allocated = max_blocks;
@@ -2205,10 +2244,86 @@ int ext4_ext_writepage_trans_blocks(stru
return needed;
}

+/*
+ * ext4_fallocate:
+ * preallocate space for a file
+ * mode is for future use, e.g. for unallocating preallocated blocks etc.
+ */
+int ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
+{
+ handle_t *handle;
+ ext4_fsblk_t block, max_blocks;
+ int ret, ret2, nblocks = 0, retries = 0;
+ struct buffer_head map_bh;
+ unsigned int credits, blkbits = inode->i_blkbits;
+
+ /* Currently supporting (pre)allocate mode _only_ */
+ if (mode != FA_ALLOCATE)
+ return -EOPNOTSUPP;
+
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ return -ENOTTY;
+
+ block = offset >> blkbits;
+ max_blocks = (EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits)
+ - block;
+ mutex_lock(&EXT4_I(inode)->truncate_mutex);
+ credits = ext4_ext_calc_credits_for_insert(inode, NULL);
+ mutex_unlock(&EXT4_I(inode)->truncate_mutex);
+ handle=ext4_journal_start(inode, credits +
+ EXT4_DATA_TRANS_BLOCKS(inode->i_sb)+1);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+retry:
+ ret = 0;
+ while (ret >= 0 && ret < max_blocks) {
+ block = block + ret;
+ max_blocks = max_blocks - ret;
+ ret = ext4_ext_get_blocks(handle, inode, block,
+ max_blocks, &map_bh,
+ EXT4_CREATE_UNINITIALIZED_EXT, 0);
+ BUG_ON(!ret);
+ if (ret > 0 && test_bit(BH_New, &map_bh.b_state)
+ && ((block + ret) > (i_size_read(inode) << blkbits)))
+ nblocks = nblocks + ret;
+ }
+
+ if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
+ goto retry;
+
+ /* Time to update the file size.
+ * Update only when preallocation was requested beyond the file size.
+ */
+ if ((offset + len) > i_size_read(inode)) {
+ if (ret > 0) {
+ /* if no error, we assume preallocation succeeded completely */
+ mutex_lock(&inode->i_mutex);
+ i_size_write(inode, offset + len);
+ EXT4_I(inode)->i_disksize = i_size_read(inode);
+ mutex_unlock(&inode->i_mutex);
+ } else if (ret < 0 && nblocks) {
+ /* Handle partial allocation scenario */
+ loff_t newsize;
+ mutex_lock(&inode->i_mutex);
+ newsize = (nblocks << blkbits) + i_size_read(inode);
+ i_size_write(inode, EXT4_BLOCK_ALIGN(newsize, blkbits));
+ EXT4_I(inode)->i_disksize = i_size_read(inode);
+ mutex_unlock(&inode->i_mutex);
+ }
+ }
+ ext4_mark_inode_dirty(handle, inode);
+ ret2 = ext4_journal_stop(handle);
+ if (ret > 0)
+ ret = ret2;
+
+ return ret > 0 ? 0 : ret;
+}
+
EXPORT_SYMBOL(ext4_mark_inode_dirty);
EXPORT_SYMBOL(ext4_ext_invalidate_cache);
EXPORT_SYMBOL(ext4_ext_insert_extent);
EXPORT_SYMBOL(ext4_ext_walk_space);
EXPORT_SYMBOL(ext4_ext_find_goal);
EXPORT_SYMBOL(ext4_ext_calc_credits_for_insert);
+EXPORT_SYMBOL(ext4_fallocate);

Index: linux-2.6.20.1/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.20.1.orig/include/linux/ext4_fs.h
+++ linux-2.6.20.1/include/linux/ext4_fs.h
@@ -102,6 +102,8 @@
EXT4_GOOD_OLD_FIRST_INO : \
(s)->s_first_ino)
#endif
+#define EXT4_BLOCK_ALIGN(size, blkbits) (((size)+(1 << blkbits)-1) & \
+ (~((1 << blkbits)-1)))

/*
* Macro-instructions used to manage fragments
@@ -225,6 +227,10 @@ struct ext4_new_group_data {
__u32 free_blocks_count;
};

+/* Following is used by preallocation logic to tell get_blocks() that we
+ * want uninitialzed extents.
+ */
+#define EXT4_CREATE_UNINITIALIZED_EXT 2

/*
* ioctl commands
@@ -976,6 +982,7 @@ extern int ext4_ext_get_blocks(handle_t
extern void ext4_ext_truncate(struct inode *, struct page *);
extern void ext4_ext_init(struct super_block *);
extern void ext4_ext_release(struct super_block *);
+extern int ext4_fallocate(struct inode *, int, loff_t, loff_t);
static inline int
ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
unsigned long max_blocks, struct buffer_head *bh,
Index: linux-2.6.20.1/include/linux/ext4_fs_extents.h
===================================================================
--- linux-2.6.20.1.orig/include/linux/ext4_fs_extents.h
+++ linux-2.6.20.1/include/linux/ext4_fs_extents.h
@@ -125,6 +125,19 @@ struct ext4_ext_path {
#define EXT4_EXT_CACHE_EXTENT 2

/*
+ * Macro-instructions to handle (mark/unmark/check/create) unitialized
+ * extents. Applications can issue an IOCTL for preallocation, which results
+ * in assigning unitialized extents to the file.
+ */
+#define ext4_ext_mark_uninitialized(ext) ((ext)->ee_len |= \
+ cpu_to_le16(0x8000))
+#define ext4_ext_is_uninitialized(ext) ((le16_to_cpu((ext)->ee_len))& \
+ 0x8000)
+#define ext4_ext_get_actual_len(ext) ((le16_to_cpu((ext)->ee_len))& \
+ 0x7FFF)
+
+
+/*
* to be called by ext4_ext_walk_space()
* negative retcode - error
* positive retcode - signal for ext4_ext_walk_space(), see below
Index: linux-2.6.20.1/fs/ext4/file.c
===================================================================
--- linux-2.6.20.1.orig/fs/ext4/file.c
+++ linux-2.6.20.1/fs/ext4/file.c
@@ -135,5 +135,6 @@ struct inode_operations ext4_file_inode_
.removexattr = generic_removexattr,
#endif
.permission = ext4_permission,
+ .fallocate = ext4_fallocate,
};


2007-03-21 12:36:43

by Amit K. Arora

[permalink] [raw]
Subject: [Patch 2/2] write support for uninitialized extents

Here is the patch which supports writing to uninitialized extents. There
are no major changes to this patch. But is being resubitted to make sure
that it applies cleanly on top of the new "preallocation" patch, which has
been modified to implement fallocate inode operation so that
preallocation can be done using sys_fallocate() system call.

Signed-off-by: Amit K Arora <[email protected]>

---
fs/ext4/extents.c | 228 +++++++++++++++++++++++++++++++++++-----
include/linux/ext4_fs_extents.h | 1
2 files changed, 202 insertions(+), 27 deletions(-)

Index: linux-2.6.20.1/fs/ext4/extents.c
===================================================================
--- linux-2.6.20.1.orig/fs/ext4/extents.c
+++ linux-2.6.20.1/fs/ext4/extents.c
@@ -1141,6 +1141,51 @@ ext4_can_extents_be_merged(struct inode
}

/*
+ * ext4_ext_try_to_merge:
+ * tries to merge the "ex" extent to the next extent in the tree.
+ * It always tries to merge towards right. If you want to merge towards
+ * left, pass "ex - 1" as argument instead of "ex".
+ * Returns 0 if the extents (ex and ex+1) were _not_ merged and returns
+ * 1 if they got merged.
+ */
+int ext4_ext_try_to_merge(struct inode *inode,
+ struct ext4_ext_path *path,
+ struct ext4_extent *ex)
+{
+ struct ext4_extent_header *eh;
+ unsigned int depth, len;
+ int merge_done=0, uninitialized = 0;
+
+ depth = ext_depth(inode);
+ BUG_ON(path[depth].p_hdr == NULL);
+ eh = path[depth].p_hdr;
+
+ while (ex < EXT_LAST_EXTENT(eh)) {
+ if (!ext4_can_extents_be_merged(inode, ex, ex + 1))
+ break;
+ /* merge with next extent! */
+ if (ext4_ext_is_uninitialized(ex))
+ uninitialized = 1;
+ ex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(ex)
+ + ext4_ext_get_actual_len(ex + 1));
+ if (uninitialized)
+ ext4_ext_mark_uninitialized(ex);
+
+ if (ex + 1 < EXT_LAST_EXTENT(eh)) {
+ len = (EXT_LAST_EXTENT(eh) - ex - 1)
+ * sizeof(struct ext4_extent);
+ memmove(ex + 1, ex + 2, len);
+ }
+ eh->eh_entries = cpu_to_le16(le16_to_cpu(eh->eh_entries)-1);
+ merge_done = 1;
+ BUG_ON(eh->eh_entries == 0);
+ }
+
+ return merge_done;
+}
+
+
+/*
* ext4_ext_check_overlap:
* check if a portion of the "newext" extent overlaps with an
* existing extent.
@@ -1316,25 +1361,7 @@ has_space:

merge:
/* try to merge extents to the right */
- while (nearex < EXT_LAST_EXTENT(eh)) {
- if (!ext4_can_extents_be_merged(inode, nearex, nearex + 1))
- break;
- /* merge with next extent! */
- if (ext4_ext_is_uninitialized(nearex))
- uninitialized = 1;
- nearex->ee_len = cpu_to_le16(ext4_ext_get_actual_len(nearex)
- + ext4_ext_get_actual_len(nearex + 1));
- if (uninitialized)
- ext4_ext_mark_uninitialized(nearex);
-
- if (nearex + 1 < EXT_LAST_EXTENT(eh)) {
- len = (EXT_LAST_EXTENT(eh) - nearex - 1)
- * sizeof(struct ext4_extent);
- memmove(nearex + 1, nearex + 2, len);
- }
- eh->eh_entries = cpu_to_le16(le16_to_cpu(eh->eh_entries)-1);
- BUG_ON(eh->eh_entries == 0);
- }
+ ext4_ext_try_to_merge(inode, path, nearex);

/* try to merge extents to the left */

@@ -1999,15 +2026,149 @@ void ext4_ext_release(struct super_block
#endif
}

+/*
+ * ext4_ext_convert_to_initialized:
+ * this function is called by ext4_ext_get_blocks() if someone tries to write
+ * to an uninitialized extent. It may result in splitting the uninitialized
+ * extent into multiple extents (upto three). Atleast one initialized extent
+ * and atmost two uninitialized extents can result.
+ * There are three possibilities:
+ * a> No split required: Entire extent should be initialized.
+ * b> Split into two extents: Only one end of the extent is being written to.
+ * c> Split into three extents: Somone is writing in middle of the extent.
+ */
+int ext4_ext_convert_to_initialized(handle_t *handle, struct inode *inode,
+ struct ext4_ext_path *path,
+ ext4_fsblk_t iblock,
+ unsigned long max_blocks)
+{
+ struct ext4_extent *ex, *ex1 = NULL, *ex2 = NULL, *ex3 = NULL, newex;
+ struct ext4_extent_header *eh;
+ unsigned int allocated, ee_block, ee_len, depth;
+ ext4_fsblk_t newblock;
+ int err = 0, ret = 0;
+
+ depth = ext_depth(inode);
+ eh = path[depth].p_hdr;
+ ex = path[depth].p_ext;
+ ee_block = le32_to_cpu(ex->ee_block);
+ ee_len = ext4_ext_get_actual_len(ex);
+ allocated = ee_len - (iblock - ee_block);
+ newblock = iblock - ee_block + ext_pblock(ex);
+ ex2 = ex;
+
+ /* ex1: ee_block to iblock - 1 : uninitialized */
+ if (iblock > ee_block) {
+ ex1 = ex;
+ ex1->ee_len = cpu_to_le16(iblock - ee_block);
+ ext4_ext_mark_uninitialized(ex1);
+ ex2 = &newex;
+ }
+ /* for sanity, update the length of the ex2 extent before
+ * we insert ex3, if ex1 is NULL. This is to avoid temporary
+ * overlap of blocks.
+ */
+ if (!ex1 && allocated > max_blocks)
+ ex2->ee_len = cpu_to_le16(max_blocks);
+ /* ex3: to ee_block + ee_len : uninitialised */
+ if (allocated > max_blocks) {
+ unsigned int newdepth;
+ ex3 = &newex;
+ ex3->ee_block = cpu_to_le32(iblock + max_blocks);
+ ext4_ext_store_pblock(ex3, newblock + max_blocks);
+ ex3->ee_len = cpu_to_le16(allocated - max_blocks);
+ ext4_ext_mark_uninitialized(ex3);
+ err = ext4_ext_insert_extent(handle, inode, path, ex3);
+ if (err)
+ goto out;
+ /* The depth, and hence eh & ex might change
+ * as part of the insert above.
+ */
+ newdepth = ext_depth(inode);
+ if (newdepth != depth)
+ {
+ depth=newdepth;
+ path = ext4_ext_find_extent(inode, iblock, NULL);
+ if (IS_ERR(path)) {
+ err = PTR_ERR(path);
+ path = NULL;
+ goto out;
+ }
+ eh = path[depth].p_hdr;
+ ex = path[depth].p_ext;
+ if (ex2 != &newex)
+ ex2 = ex;
+ }
+ allocated = max_blocks;
+ }
+ /* If there was a change of depth as part of the
+ * insertion of ex3 above, we need to update the length
+ * of the ex1 extent again here
+ */
+ if (ex1 && ex1 != ex) {
+ ex1 = ex;
+ ex1->ee_len = cpu_to_le16(iblock - ee_block);
+ ext4_ext_mark_uninitialized(ex1);
+ ex2 = &newex;
+ }
+ /* ex2: iblock to iblock + maxblocks-1 : initialised */
+ ex2->ee_block = cpu_to_le32(iblock);
+ ex2->ee_start = cpu_to_le32(newblock);
+ ext4_ext_store_pblock(ex2, newblock);
+ ex2->ee_len = cpu_to_le16(allocated);
+ if (ex2 != ex)
+ goto insert;
+ if ((err = ext4_ext_get_access(handle, inode, path + depth)))
+ goto out;
+ /* New (initialized) extent starts from the first block
+ * in the current extent. i.e., ex2 == ex
+ * We have to see if it can be merged with the extent
+ * on the left.
+ */
+ if (ex2 > EXT_FIRST_EXTENT(eh)) {
+ /* To merge left, pass "ex2 - 1" to try_to_merge(),
+ * since it merges towards right _only_.
+ */
+ ret = ext4_ext_try_to_merge(inode, path, ex2 - 1);
+ if (ret) {
+ err = ext4_ext_correct_indexes(handle, inode, path);
+ if (err)
+ goto out;
+ depth = ext_depth(inode);
+ ex2--;
+ }
+ }
+ /* Try to Merge towards right. This might be required
+ * only when the whole extent is being written to.
+ * i.e. ex2==ex and ex3==NULL.
+ */
+ if (!ex3) {
+ ret = ext4_ext_try_to_merge(inode, path, ex2);
+ if (ret) {
+ err = ext4_ext_correct_indexes(handle, inode, path);
+ if (err)
+ goto out;
+ }
+ }
+ /* Mark modified extent as dirty */
+ err = ext4_ext_dirty(handle, inode, path + depth);
+ goto out;
+insert:
+ err = ext4_ext_insert_extent(handle, inode, path, &newex);
+out:
+ return err ? err : allocated;
+}
+
int ext4_ext_get_blocks(handle_t *handle, struct inode *inode,
ext4_fsblk_t iblock,
unsigned long max_blocks, struct buffer_head *bh_result,
int create, int extend_disksize)
{
struct ext4_ext_path *path = NULL;
+ struct ext4_extent_header *eh;
struct ext4_extent newex, *ex;
ext4_fsblk_t goal, newblock;
- int err = 0, depth;
+ int err = 0, depth, ret;
unsigned long allocated = 0;

__clear_bit(BH_New, &bh_result->b_state);
@@ -2055,6 +2216,7 @@ int ext4_ext_get_blocks(handle_t *handle
* this is why assert can't be put in ext4_ext_find_extent()
*/
BUG_ON(path[depth].p_ext == NULL && depth != 0);
+ eh = path[depth].p_hdr;

ex = path[depth].p_ext;
if (ex) {
@@ -2063,13 +2225,9 @@ int ext4_ext_get_blocks(handle_t *handle
unsigned short ee_len;

/*
- * Allow future support for preallocated extents to be added
- * as an RO_COMPAT feature:
* Uninitialized extents are treated as holes, except that
- * we avoid (fail) allocating new blocks during a write.
+ * we split out initialized portions during a write.
*/
- if (le16_to_cpu(ex->ee_len) > EXT_MAX_LEN)
- goto out2;
ee_len = ext4_ext_get_actual_len(ex);
/* if found extent covers block, simply return it */
if (iblock >= ee_block && iblock < ee_block + ee_len) {
@@ -2078,12 +2236,27 @@ int ext4_ext_get_blocks(handle_t *handle
allocated = ee_len - (iblock - ee_block);
ext_debug("%d fit into %lu:%d -> %llu\n", (int) iblock,
ee_block, ee_len, newblock);
+
/* Do not put uninitialized extent in the cache */
- if (!ext4_ext_is_uninitialized(ex))
+ if (!ext4_ext_is_uninitialized(ex)) {
ext4_ext_put_in_cache(inode, ee_block,
ee_len, ee_start,
EXT4_EXT_CACHE_EXTENT);
- goto out;
+ goto out;
+ }
+ if (create == EXT4_CREATE_UNINITIALIZED_EXT)
+ goto out;
+ if (!create)
+ goto out2;
+
+ ret = ext4_ext_convert_to_initialized(handle, inode,
+ path, iblock,
+ max_blocks);
+ if (ret <= 0)
+ goto out2;
+ else
+ allocated = ret;
+ goto outnew;
}
}

@@ -2135,6 +2308,7 @@ int ext4_ext_get_blocks(handle_t *handle

/* previous routine could use block we allocated */
newblock = ext_pblock(&newex);
+outnew:
__set_bit(BH_New, &bh_result->b_state);

/* Cache only when it is _not_ an uninitialized extent */
Index: linux-2.6.20.1/include/linux/ext4_fs_extents.h
===================================================================
--- linux-2.6.20.1.orig/include/linux/ext4_fs_extents.h
+++ linux-2.6.20.1/include/linux/ext4_fs_extents.h
@@ -203,6 +203,7 @@ ext4_ext_invalidate_cache(struct inode *

extern int ext4_extent_tree_init(handle_t *, struct inode *);
extern int ext4_ext_calc_credits_for_insert(struct inode *, struct ext4_ext_path *);
+extern int ext4_ext_try_to_merge(struct inode *, struct ext4_ext_path *, struct ext4_extent *);
extern unsigned int ext4_ext_check_overlap(struct inode *, struct ext4_extent *, struct ext4_ext_path *);
extern int ext4_ext_insert_extent(handle_t *, struct inode *, struct ext4_ext_path *, struct ext4_extent *);
extern int ext4_ext_walk_space(struct inode *, unsigned long, unsigned long, ext_prepare_callback, void *);