2014-02-25 19:14:55

Subject: [PATCH 0/6 v2] Introduce FALLOC_FL_ZERO_RANGE flag for fallocate

2014-02-25 19:14:34

Subject: [PATCH 1/6 v2] ext4: Update inode i_size after the preallocation

Currently in ext4_fallocate we update the inode size and c_time and sync
the file with every partial allocation, which is entirely unnecessary. It
is true that if a crash happens in the middle of truncate we might end
up with an unchanged i_size or c_time, which I do not think is really a
problem - it does not mean file system corruption in any way. Note that
xfs does things the same way, i.e. it updates all of the mentioned fields
after the allocation is done.

This commit moves all the updates after the allocation is done. In
addition we also need to change m_time, as not only the inode has been
changed but the data regions might have changed as well (unwritten
extents). However, m_time will only be updated when i_size has changed.

Also we do not need to be paranoid about changing the c_time only if the
actual allocation has happened; we can change it even if we try to
allocate only to find out that there are already blocks allocated. It's
not really a big deal and it will save us some additional complexity.

Also use ext4_debug instead of ext4_warning in the #ifdef EXT4FS_DEBUG
section.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 89 ++++++++++++++++++++++---------------------------------
1 file changed, 35 insertions(+), 54 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 10cff47..67c7917 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4513,36 +4513,6 @@ retry:
ext4_std_error(inode->i_sb, err);
}

-static void ext4_falloc_update_inode(struct inode *inode,
- int mode, loff_t new_size, int update_ctime)
-{
- struct timespec now;
-
- if (update_ctime) {
- now = current_fs_time(inode->i_sb);
- if (!timespec_equal(&inode->i_ctime, &now))
- inode->i_ctime = now;
- }
- /*
- * Update only when preallocation was requested beyond
- * the file size.
- */
- if (!(mode & FALLOC_FL_KEEP_SIZE)) {
- if (new_size > i_size_read(inode))
- i_size_write(inode, new_size);
- if (new_size > EXT4_I(inode)->i_disksize)
- ext4_update_i_disksize(inode, new_size);
- } else {
- /*
- * Mark that we allocate beyond EOF so the subsequent truncate
- * can proceed even if the new size is the same as i_size.
- */
- if (new_size > i_size_read(inode))
- ext4_set_inode_flag(inode, EXT4_INODE_EOFBLOCKS);
- }
-
-}
-
/*
* preallocate space for a file. This implements ext4's fallocate file
* operation, which gets called from sys_fallocate system call.
@@ -4554,13 +4524,14 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
{
struct inode *inode = file_inode(file);
handle_t *handle;
- loff_t new_size;
+ loff_t new_size = 0;
unsigned int max_blocks;
int ret = 0;
int ret2 = 0;
int retries = 0;
int flags;
struct ext4_map_blocks map;
+ struct timespec tv;
unsigned int credits, blkbits = inode->i_blkbits;

/* Return error if mode is not supported */
@@ -4594,12 +4565,15 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
*/
credits = ext4_chunk_trans_blocks(inode, max_blocks);
mutex_lock(&inode->i_mutex);
- ret = inode_newsize_ok(inode, (len + offset));
- if (ret) {
- mutex_unlock(&inode->i_mutex);
- trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
- return ret;
+
+ if (!(mode & FALLOC_FL_KEEP_SIZE) &&
+ offset + len > i_size_read(inode)) {
+ new_size = offset + len;
+ ret = inode_newsize_ok(inode, new_size);
+ if (ret)
+ goto out;
}
+
flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT;
if (mode & FALLOC_FL_KEEP_SIZE)
flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
@@ -4623,28 +4597,14 @@ retry:
}
ret = ext4_map_blocks(handle, inode, &map, flags);
if (ret <= 0) {
-#ifdef EXT4FS_DEBUG
- ext4_warning(inode->i_sb,
- "inode #%lu: block %u: len %u: "
- "ext4_ext_map_blocks returned %d",
- inode->i_ino, map.m_lblk,
- map.m_len, ret);
-#endif
+ ext4_debug("inode #%lu: block %u: len %u: "
+ "ext4_ext_map_blocks returned %d",
+ inode->i_ino, map.m_lblk,
+ map.m_len, ret);
ext4_mark_inode_dirty(handle, inode);
ret2 = ext4_journal_stop(handle);
break;
}
- if ((map.m_lblk + ret) >= (EXT4_BLOCK_ALIGN(offset + len,
- blkbits) >> blkbits))
- new_size = offset + len;
- else
- new_size = ((loff_t) map.m_lblk + ret) << blkbits;
-
- ext4_falloc_update_inode(inode, mode, new_size,
- (map.m_flags & EXT4_MAP_NEW));
- ext4_mark_inode_dirty(handle, inode);
- if ((file->f_flags & O_SYNC) && ret >= max_blocks)
- ext4_handle_sync(handle);
ret2 = ext4_journal_stop(handle);
if (ret2)
break;
@@ -4654,6 +4614,27 @@ retry:
ret = 0;
goto retry;
}
+
+ handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
+ if (IS_ERR(handle))
+ goto out;
+
+ tv = inode->i_ctime = ext4_current_time(inode);
+
+ if (ret > 0 && new_size) {
+ if (new_size > i_size_read(inode)) {
+ i_size_write(inode, new_size);
+ inode->i_mtime = tv;
+ }
+ if (new_size > EXT4_I(inode)->i_disksize)
+ ext4_update_i_disksize(inode, new_size);
+ }
+ ext4_mark_inode_dirty(handle, inode);
+ if (file->f_flags & O_SYNC)
+ ext4_handle_sync(handle);
+
+ ext4_journal_stop(handle);
+out:
mutex_unlock(&inode->i_mutex);
trace_ext4_fallocate_exit(inode, offset, max_blocks,
ret > 0 ? ret2 : ret);
--
1.8.3.1

_______________________________________________
xfs mailing list
[email protected]
http://oss.sgi.com/mailman/listinfo/xfs
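
The size rule described in the commit message of patch 1/6 can be sketched in userspace as follows. This is only a minimal model of the decision, not kernel code: size_after_fallocate() and mtime_changes() are hypothetical names, and the model assumes the allocation itself succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: the file size seen after a successful fallocate.
 * The size grows only when KEEP_SIZE is not set and the requested range
 * ends beyond the current size; it is decided once, not per partial
 * allocation.
 */
static int64_t size_after_fallocate(int64_t i_size, int64_t offset,
                                    int64_t len, bool keep_size)
{
    assert(offset >= 0 && len > 0);

    int64_t end = offset + len;

    if (!keep_size && end > i_size)
        return end;
    return i_size;
}

/* In this model m_time changes only when the size changes. */
static bool mtime_changes(int64_t i_size, int64_t offset,
                          int64_t len, bool keep_size)
{
    return size_after_fallocate(i_size, offset, len, keep_size) != i_size;
}
```

For a 4096-byte file, a request for offset 0 and len 8192 grows the size to 8192 and touches m_time, while the same request with KEEP_SIZE leaves both alone.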

2014-02-25 19:14:35

by Lukas Czerner

Subject: [PATCH 2/6 v2] ext4: refactor ext4_fallocate code

Move block allocation out of ext4_fallocate into a separate function
called ext4_alloc_file_blocks(). This will allow us to use the same
allocation code for other allocation operations such as zero range, which
is added in the next patch.

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/extents.c | 127 +++++++++++++++++++++++++++++++-----------------------
1 file changed, 73 insertions(+), 54 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 67c7917..0e675bc 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4513,6 +4513,64 @@ retry:
ext4_std_error(inode->i_sb, err);
}

+static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
+ ext4_lblk_t len, int flags, int mode)
+{
+ struct inode *inode = file_inode(file);
+ handle_t *handle;
+ int ret = 0;
+ int ret2 = 0;
+ int retries = 0;
+ struct ext4_map_blocks map;
+ unsigned int credits;
+
+ map.m_lblk = offset;
+ /*
+ * Don't normalize the request if it can fit in one extent so
+ * that it doesn't get unnecessarily split into multiple
+ * extents.
+ */
+ if (len <= EXT_UNINIT_MAX_LEN)
+ flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
+
+ /*
+ * credits to insert 1 extent into extent tree
+ */
+ credits = ext4_chunk_trans_blocks(inode, len);
+
+retry:
+ while (ret >= 0 && ret < len) {
+ map.m_lblk = map.m_lblk + ret;
+ map.m_len = len = len - ret;
+ handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
+ credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ break;
+ }
+ ret = ext4_map_blocks(handle, inode, &map, flags);
+ if (ret <= 0) {
+ ext4_debug("inode #%lu: block %u: len %u: "
+ "ext4_ext_map_blocks returned %d",
+ inode->i_ino, map.m_lblk,
+ map.m_len, ret);
+ ext4_mark_inode_dirty(handle, inode);
+ ret2 = ext4_journal_stop(handle);
+ break;
+ }
+ ret2 = ext4_journal_stop(handle);
+ if (ret2)
+ break;
+ }
+ if (ret == -ENOSPC &&
+ ext4_should_retry_alloc(inode->i_sb, &retries)) {
+ ret = 0;
+ goto retry;
+ }
+
+ return ret > 0 ? ret2 : ret;
+}
+
/*
* preallocate space for a file. This implements ext4's fallocate file
* operation, which gets called from sys_fallocate system call.
@@ -4527,12 +4585,10 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
loff_t new_size = 0;
unsigned int max_blocks;
int ret = 0;
- int ret2 = 0;
- int retries = 0;
int flags;
- struct ext4_map_blocks map;
+ ext4_lblk_t lblk;
struct timespec tv;
- unsigned int credits, blkbits = inode->i_blkbits;
+ unsigned int blkbits = inode->i_blkbits;

/* Return error if mode is not supported */
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
@@ -4553,17 +4609,18 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
return -EOPNOTSUPP;

trace_ext4_fallocate_enter(inode, offset, len, mode);
- map.m_lblk = offset >> blkbits;
+ lblk = offset >> blkbits;
/*
* We can't just convert len to max_blocks because
* If blocksize = 4096 offset = 3072 and len = 2048
*/
max_blocks = (EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits)
- - map.m_lblk;
- /*
- * credits to insert 1 extent into extent tree
- */
- credits = ext4_chunk_trans_blocks(inode, max_blocks);
+ - lblk;
+
+ flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT;
+ if (mode & FALLOC_FL_KEEP_SIZE)
+ flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
+
mutex_lock(&inode->i_mutex);

if (!(mode & FALLOC_FL_KEEP_SIZE) &&
@@ -4574,46 +4631,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
goto out;
}

- flags = EXT4_GET_BLOCKS_CREATE_UNINIT_EXT;
- if (mode & FALLOC_FL_KEEP_SIZE)
- flags |= EXT4_GET_BLOCKS_KEEP_SIZE;
- /*
- * Don't normalize the request if it can fit in one extent so
- * that it doesn't get unnecessarily split into multiple
- * extents.
- */
- if (len <= EXT_UNINIT_MAX_LEN << blkbits)
- flags |= EXT4_GET_BLOCKS_NO_NORMALIZE;
-
-retry:
- while (ret >= 0 && ret < max_blocks) {
- map.m_lblk = map.m_lblk + ret;
- map.m_len = max_blocks = max_blocks - ret;
- handle = ext4_journal_start(inode, EXT4_HT_MAP_BLOCKS,
- credits);
- if (IS_ERR(handle)) {
- ret = PTR_ERR(handle);
- break;
- }
- ret = ext4_map_blocks(handle, inode, &map, flags);
- if (ret <= 0) {
- ext4_debug("inode #%lu: block %u: len %u: "
- "ext4_ext_map_blocks returned %d",
- inode->i_ino, map.m_lblk,
- map.m_len, ret);
- ext4_mark_inode_dirty(handle, inode);
- ret2 = ext4_journal_stop(handle);
- break;
- }
- ret2 = ext4_journal_stop(handle);
- if (ret2)
- break;
- }
- if (ret == -ENOSPC &&
- ext4_should_retry_alloc(inode->i_sb, &retries)) {
- ret = 0;
- goto retry;
- }
+ ret = ext4_alloc_file_blocks(file, lblk, max_blocks, flags, mode);
+ if (ret)
+ goto out;

handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
if (IS_ERR(handle))
@@ -4621,7 +4641,7 @@ retry:

tv = inode->i_ctime = ext4_current_time(inode);

- if (ret > 0 && new_size) {
+ if (!ret && new_size) {
if (new_size > i_size_read(inode)) {
i_size_write(inode, new_size);
inode->i_mtime = tv;
@@ -4636,9 +4656,8 @@ retry:
ext4_journal_stop(handle);
out:
mutex_unlock(&inode->i_mutex);
- trace_ext4_fallocate_exit(inode, offset, max_blocks,
- ret > 0 ? ret2 : ret);
- return ret > 0 ? ret2 : ret;
+ trace_ext4_fallocate_exit(inode, offset, max_blocks, ret);
+ return ret;
}

/*
--
1.8.3.1
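
The comment kept in ext4_fallocate ("blocksize = 4096 offset = 3072 and len = 2048") is easier to follow with the arithmetic written out. The sketch below is a userspace model only; block_align() is assumed to behave like EXT4_BLOCK_ALIGN (round up to a block boundary), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of the block size (1 << blkbits). */
static uint64_t block_align(uint64_t size, unsigned int blkbits)
{
    uint64_t mask = ((uint64_t)1 << blkbits) - 1;

    return (size + mask) & ~mask;
}

/* Number of blocks touched by the byte range [offset, offset + len). */
static uint64_t blocks_spanned(uint64_t offset, uint64_t len,
                               unsigned int blkbits)
{
    uint64_t first = offset >> blkbits;

    return (block_align(offset + len, blkbits) >> blkbits) - first;
}
```

With blkbits = 12, offset = 3072 and len = 2048 the range ends at byte 5120, so it touches blocks 0 and 1 and blocks_spanned() yields 2, whereas len >> blkbits alone would give 0.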

2014-02-25 19:14:36

by Lukas Czerner

Subject: [PATCH 3/6 v2] ext4: translate fallocate mode bits to strings

Signed-off-by: Lukas Czerner <[email protected]>
---
fs/ext4/ext4.h | 1 +
fs/ext4/extents.c | 1 -
include/trace/events/ext4.h | 9 +++++++--
3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index ece5556..3b9601c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -31,6 +31,7 @@
#include <linux/percpu_counter.h>
#include <linux/ratelimit.h>
#include <crypto/hash.h>
+#include <linux/falloc.h>
#ifdef __KERNEL__
#include <linux/compat.h>
#endif
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0e675bc..e5485eb 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -37,7 +37,6 @@
#include <linux/quotaops.h>
#include <linux/string.h>
#include <linux/slab.h>
-#include <linux/falloc.h>
#include <asm/uaccess.h>
#include <linux/fiemap.h>
#include "ext4_jbd2.h"
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 197d312..451e020 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -68,6 +68,11 @@ struct extent_status;
{ EXTENT_STATUS_DELAYED, "D" }, \
{ EXTENT_STATUS_HOLE, "H" })

+#define show_falloc_mode(mode) __print_flags(mode, "|", \
+ { FALLOC_FL_KEEP_SIZE, "KEEP_SIZE"}, \
+ { FALLOC_FL_PUNCH_HOLE, "PUNCH_HOLE"}, \
+ { FALLOC_FL_NO_HIDE_STALE, "NO_HIDE_STALE"})
+

TRACE_EVENT(ext4_free_inode,
TP_PROTO(struct inode *inode),
@@ -1349,10 +1354,10 @@ TRACE_EVENT(ext4_fallocate_enter,
__entry->mode = mode;
),

- TP_printk("dev %d,%d ino %lu pos %lld len %lld mode %d",
+ TP_printk("dev %d,%d ino %lu pos %lld len %lld mode %s",
MAJOR(__entry->dev), MINOR(__entry->dev),
(unsigned long) __entry->ino, __entry->pos,
- __entry->len, __entry->mode)
+ __entry->len, show_falloc_mode(__entry->mode))
);

TRACE_EVENT(ext4_fallocate_exit,
--
1.8.3.1
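
To illustrate what the mode string in the tracepoint looks like, here is a small userspace sketch that loosely mimics what show_falloc_mode() produces. It is not the kernel's __print_flags implementation; format_falloc_mode() and the DEMO_ constants are hypothetical names, with values copied from linux/falloc.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values mirror linux/falloc.h; redefined to keep the sketch standalone. */
#define DEMO_FL_KEEP_SIZE     0x01u
#define DEMO_FL_PUNCH_HOLE    0x02u
#define DEMO_FL_NO_HIDE_STALE 0x04u

struct flag_name {
    unsigned int mask;
    const char *name;
};

/* Write the set flags into buf as "A|B"; unknown bits are appended as hex. */
static const char *format_falloc_mode(unsigned int mode, char *buf, size_t size)
{
    static const struct flag_name names[] = {
        { DEMO_FL_KEEP_SIZE,     "KEEP_SIZE" },
        { DEMO_FL_PUNCH_HOLE,    "PUNCH_HOLE" },
        { DEMO_FL_NO_HIDE_STALE, "NO_HIDE_STALE" },
    };
    size_t used = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        const char *sep = used ? "|" : "";

        if (!(mode & names[i].mask))
            continue;
        /* Stop early rather than emit a truncated flag name. */
        if (used + strlen(sep) + strlen(names[i].name) >= size)
            return buf;
        used += (size_t)snprintf(buf + used, size - used, "%s%s",
                                 sep, names[i].name);
        mode &= ~names[i].mask;
    }
    /* Leftover bits have no name in the table; show them numerically. */
    if (mode && used + 12 < size)
        snprintf(buf + used, size - used, "%s0x%x", used ? "|" : "", mode);
    return buf;
}
```

In this sketch a mode of KEEP_SIZE plus PUNCH_HOLE is rendered as "KEEP_SIZE|PUNCH_HOLE", and a mode of 0 as the empty string.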

2014-02-25 19:15:01

by Lukas Czerner

On Tue, Feb 25, 2014 at 08:14:38PM +0100, Lukas Czerner wrote:
> Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same
> functionality as xfs ioctl XFS_IOC_ZERO_RANGE.
>
> It can be used to convert a range of a file to zeros, preferably without
> issuing data IO. Blocks should be preallocated for the regions that span
> holes in the file, and the entire range is preferably converted to
> unwritten extents.
>
> This can also be used to preallocate blocks past EOF in the same way as
> with fallocate. The FALLOC_FL_KEEP_SIZE flag should cause the inode
> size to remain the same.
>
> Also add appropriate tracepoints.
>
> Signed-off-by: Lukas Czerner <[email protected]>

Thanks, applied.

- Ted
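
For readers who want to try the flag from userspace, a minimal sketch follows. It assumes a Linux system with glibc; zero_range() and zero_range_demo() are hypothetical helpers, the 0x10 fallback definition is the value found in mainline linux/falloc.h, and the pwrite() fallback is an addition for filesystems that reject the flag (it does issue data IO, unlike the fast path).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc: exposes fallocate() in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10   /* for older userspace headers */
#endif

/* Zero the byte range [offset, offset + len) of fd. Returns 0 or -1. */
static int zero_range(int fd, off_t offset, off_t len)
{
    char zeros[4096];

    assert(offset >= 0 && len > 0);

    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0)
        return 0;
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;

    /* Fallback: the filesystem or kernel lacks the flag, write zeros. */
    memset(zeros, 0, sizeof(zeros));
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(zeros) ? (size_t)len
                                                  : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, offset);

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        offset += n;
        len -= n;
    }
    return 0;
}

/* Self-check on a temporary file: returns 0 if only the range is zeroed. */
static int zero_range_demo(void)
{
    char tmpl[] = "/tmp/zero-range-XXXXXX";
    char data[8192], back[8192];
    int ok = 0;
    int fd = mkstemp(tmpl);

    if (fd < 0)
        return -1;
    unlink(tmpl);
    memset(data, 'A', sizeof(data));
    if (pwrite(fd, data, sizeof(data), 0) == (ssize_t)sizeof(data) &&
        zero_range(fd, 1024, 4096) == 0 &&
        pread(fd, back, sizeof(back), 0) == (ssize_t)sizeof(back))
        ok = back[1023] == 'A' && back[1024] == 0 &&
             back[5119] == 0 && back[5120] == 'A';
    close(fd);
    return ok ? 0 : -1;
}
```

Either path leaves the same data visible to a reader; what the flag changes is how the zeros get there, ideally without writing them out.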

2014-03-16 19:08:26

On Tue, 18 Mar 2014, [email protected] wrote:

> Date: Tue, 18 Mar 2014 08:39:19 -0400
> From: [email protected]
> To: Lukáš Czerner <[email protected]>
> Cc: [email protected]
> Subject: Re: [PATCH 0/6 v2] Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
>
> On Tue, Mar 18, 2014 at 12:37:47PM +0100, Lukáš Czerner wrote:
> > Ok, finally I got it. The problem is that we now have commit
> >
> > 97d39798f77aef626130db8590cc79195300227b ext4: delete path dealloc
> > code in ext4_ext_handle_uninitialized_extents
> >
> > which I was not aware of before. And when merging you have used the
> > same out2 label out of the function. However, when creating my new
> > function ext4_ext_convert_initialized_extent() I did the same
> > thing as with ext4_ext_handle_uninitialized_extents() and freed the
> > path. And since we do not set path to NULL in ext4_ext_map_blocks
> > after calling ext4_ext_convert_initialized_extent(), when we hit the
> > condition at out2:
> >
> > if (path) {
> > ext4_ext_drop_refs(path);
> > kfree(path);
> > }
> >
> > we will double-free, possibly destroying data from someone else. That
> > is why we've seen what looked like random memory corruption.
>
> My bad! I remember noticing that particular semantic conflict, and I
> *thought* I had fixed it up. The fixup must have gotten lost when I
> was doing some patch wrangling (I was moving some patch hunks
> around to be the most logical with respect to the COLLAPSE RANGE, and
> I must have dropped the fixup somewhere along the way).
>
> Thanks for finding it!
>
> - Ted
>

No problem. I am running some tests right now to make sure that
everything is in order and will send out patches 1, 2, and 5 once
again in a separate patch set based on ext4/dev branch.

Thanks!
-Lukas
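
The double free discussed above comes down to two places releasing the same pointer. The sketch below is a userspace illustration with hypothetical names (ext_path, drop_path, convert_extent, map_blocks), not the ext4 code and not necessarily the fix that was applied. It shows one convention that keeps a shared cleanup label safe: whoever frees the path also clears the caller's pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct ext_path {
    int depth;
};

static int release_count;       /* how many times a path was really freed */

/* Free *pathp and clear it, so a later cleanup sees NULL and skips it. */
static void drop_path(struct ext_path **pathp)
{
    if (*pathp) {
        free(*pathp);
        *pathp = NULL;
        release_count++;
    }
}

/*
 * Helper that releases the path itself. Taking a pointer to the caller's
 * pointer is what makes this safe; with a plain struct ext_path * argument
 * the caller would still hold a stale pointer after the free.
 */
static int convert_extent(struct ext_path **pathp)
{
    int ret = (*pathp)->depth;

    drop_path(pathp);
    return ret;
}

static int map_blocks(void)
{
    struct ext_path *path = calloc(1, sizeof(*path));
    int ret;

    if (!path)
        return -1;
    path->depth = 2;
    ret = convert_extent(&path);

    /* Shared cleanup, in the spirit of the out2 label: frees only if set. */
    drop_path(&path);
    return ret;
}
```

The other convention, which is what the quoted commit 97d39798f77a moved ext4_ext_handle_uninitialized_extents() to, is for the helper not to free at all and to leave the release to the caller's single cleanup path.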