2015-02-16 15:47:47

by Namjae Jeon

Subject: [PATCH RESEND 0/12] fs: Introduce FALLOC_FL_INSERT_RANGE for fallocate

From: Namjae Jeon <[email protected]>

In continuation of the work to make non-linear editing of media files
faster, this series introduces a new fallocate flag,
FALLOC_FL_INSERT_RANGE.

This flag is the opposite of FALLOC_FL_COLLAPSE_RANGE.
Specifying FALLOC_FL_INSERT_RANGE creates new space inside a file
by inserting a hole within the range specified by offset and len.
The user can then write new data, e.g. ads, into this space.
As with collapse range, offset and len currently must be block size
aligned for both XFS and Ext4.

The semantics of the flag are:
1) It creates space within the file by inserting a hole of len bytes starting
at byte offset, without overwriting any existing data. All data blocks
from offset to EOF are shifted to the right to make room for the hole.
2) It must be used exclusively; no other fallocate flag may be combined with it.
3) The offset and length supplied to fallocate must be fs block size aligned
for xfs and ext4.
4) Insert range does not work when offset is at or beyond i_size. Users who
want to add space at the end of a file are advised to use either
ftruncate(2) or fallocate(2) with mode 0.
5) It increases the size of the file by len bytes.
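
From user space, a call that respects rules 3) and 4) above might look like
the sketch below. The helper names insert_range_precheck() and insert_hole()
are hypothetical and not part of this series, and st_blksize is only assumed
to match the filesystem block size (it is the preferred I/O size, which is
usually but not necessarily the same). The fallback #define uses the value
proposed in patch 1/12.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

#ifndef FALLOC_FL_INSERT_RANGE
#define FALLOC_FL_INSERT_RANGE 0x20	/* value proposed in patch 1/12 */
#endif

/*
 * Hypothetical helper: check rules 3) and 4) before calling into the kernel.
 * Returns 0 if the request looks valid, -EINVAL otherwise.
 */
int insert_range_precheck(off_t offset, off_t len, off_t blksize,
			  off_t file_size)
{
	if (blksize <= 0 || offset < 0 || len <= 0)
		return -EINVAL;
	/* rule 3: offset and len must be block size aligned */
	if (offset % blksize || len % blksize)
		return -EINVAL;
	/* rule 4: offset must lie strictly inside the file */
	if (offset >= file_size)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical wrapper: insert a hole of len bytes at offset.
 * Returns 0 on success or a negative errno value.
 */
int insert_hole(int fd, off_t offset, off_t len)
{
	struct stat st;
	int ret;

	if (fstat(fd, &st) < 0)
		return -errno;

	/* Assumption: st_blksize equals the filesystem block size. */
	ret = insert_range_precheck(offset, len, st.st_blksize, st.st_size);
	if (ret)
		return ret;

	/* rule 2: the flag is passed alone, with no other mode bits */
	if (fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len) < 0)
		return -errno;
	return 0;
}

/* Small demonstration of the pre-check with a 4096-byte block size. */
void insert_range_precheck_demo(void)
{
	assert(insert_range_precheck(4096, 8192, 4096, 16384) == 0);
	assert(insert_range_precheck(100, 4096, 4096, 16384) == -EINVAL);
	assert(insert_range_precheck(16384, 4096, 4096, 16384) == -EINVAL);
}
```

On a kernel or filesystem without support for the flag, fallocate() is
expected to fail and insert_hole() passes that error back to the caller.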


Namjae Jeon (12):
fs: Add support FALLOC_FL_INSERT_RANGE for fallocate
xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate
ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate
xfsprog: xfsio: update xfs_io manpage for FALLOC_FL_INSERT_RANGE
xfstests: generic/042: Standard insert range tests
xfstests: generic/043: Delayed allocation insert range
xfstests: generic/044: Multi insert range tests
xfstests: generic/045: Delayed allocation multi insert
xfstests: generic/046: Test multiple fallocate insert/collapse range calls
xfstests: fsstress: Add fallocate insert range operation
xfstests: fsx: Add fallocate insert range operation
manpage: update FALLOC_FL_INSERT_RANGE flag in fallocate
--
1.7.9.5


2015-02-16 15:47:48

by Namjae Jeon

Subject: [PATCH RESEND 1/12] fs: Add support FALLOC_FL_INSERT_RANGE for fallocate

From: Namjae Jeon <[email protected]>

FALLOC_FL_INSERT_RANGE is the opposite of FALLOC_FL_COLLAPSE_RANGE. It is
intended for advertisers or anyone else who wants to add data in the middle
of a file. FALLOC_FL_INSERT_RANGE creates space for writing new data within
a file by shifting extents to the right by the given length. It has the same
limitations as FALLOC_FL_COLLAPSE_RANGE: offset and length must be aligned to
block boundaries, and a range at or beyond EOF is rejected; use ftruncate(2)
for that case.
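
The mode checks this patch adds to vfs_fallocate() can be modelled in user
space roughly as follows. This is only a sketch of the two checks that
involve the new flag; the other checks in vfs_fallocate() are omitted. The
TOY_ constants and toy_check_mode() are hypothetical names that mirror the
flag values in include/uapi/linux/falloc.h.

```c
#include <assert.h>
#include <errno.h>

/* Mirror of the uapi flag values, renamed to avoid clashing with headers. */
#define TOY_FL_KEEP_SIZE	0x01
#define TOY_FL_PUNCH_HOLE	0x02
#define TOY_FL_COLLAPSE_RANGE	0x08
#define TOY_FL_ZERO_RANGE	0x10
#define TOY_FL_INSERT_RANGE	0x20

/*
 * Hypothetical model of the checks touched by this patch:
 *  - unknown mode bits are rejected with -EOPNOTSUPP
 *  - insert range combined with any other bit is rejected with -EINVAL
 */
int toy_check_mode(int mode)
{
	const int supported = TOY_FL_KEEP_SIZE | TOY_FL_PUNCH_HOLE |
			      TOY_FL_COLLAPSE_RANGE | TOY_FL_ZERO_RANGE |
			      TOY_FL_INSERT_RANGE;

	if (mode & ~supported)
		return -EOPNOTSUPP;

	/* Insert range should only be used exclusively. */
	if ((mode & TOY_FL_INSERT_RANGE) && (mode & ~TOY_FL_INSERT_RANGE))
		return -EINVAL;

	return 0;
}

/* Small demonstration of the accepted and rejected combinations. */
void toy_check_mode_demo(void)
{
	assert(toy_check_mode(TOY_FL_INSERT_RANGE) == 0);
	assert(toy_check_mode(TOY_FL_INSERT_RANGE | TOY_FL_KEEP_SIZE) == -EINVAL);
	assert(toy_check_mode(0x40) == -EOPNOTSUPP);
}
```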

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
Cc: Brian Foster <[email protected]>
---
fs/open.c | 8 +++++++-
include/uapi/linux/falloc.h | 17 +++++++++++++++++
2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/fs/open.c b/fs/open.c
index 813be03..762fb45 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -232,7 +232,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)

/* Return error if mode is not supported */
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
- FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
+ FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
+ FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;

/* Punch hole and zero range are mutually exclusive */
@@ -250,6 +251,11 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
(mode & ~FALLOC_FL_COLLAPSE_RANGE))
return -EINVAL;

+ /* Insert range should only be used exclusively. */
+ if ((mode & FALLOC_FL_INSERT_RANGE) &&
+ (mode & ~FALLOC_FL_INSERT_RANGE))
+ return -EINVAL;
+
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;

diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
index d1197ae..3e445a7 100644
--- a/include/uapi/linux/falloc.h
+++ b/include/uapi/linux/falloc.h
@@ -41,4 +41,21 @@
*/
#define FALLOC_FL_ZERO_RANGE 0x10

+/*
+ * FALLOC_FL_INSERT_RANGE is used to insert space within the file size without
+ * overwriting any existing data. The contents of the file beyond offset are
+ * shifted towards right by len bytes to create a hole. As such, this
+ * operation will increase the size of the file by len bytes.
+ *
+ * Different filesystems may implement different limitations on the granularity
+ * of the operation. Most will limit operations to filesystem block size
+ * boundaries, but this boundary may be larger or smaller depending on
+ * the filesystem and/or the configuration of the filesystem or file.
+ *
+ * Attempting to insert space using this flag at OR beyond the end of
+ * the file is considered an illegal operation - just use ftruncate(2) or
+ * fallocate(2) with mode 0 for such type of operations.
+ */
+#define FALLOC_FL_INSERT_RANGE 0x20
+
#endif /* _UAPI_FALLOC_H_ */
--
1.7.9.5

_______________________________________________
xfs mailing list
[email protected]
http://oss.sgi.com/mailman/listinfo/xfs

2015-02-16 15:47:49

by Namjae Jeon

Subject: [PATCH RESEND 2/12] xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate

From: Namjae Jeon <[email protected]>

This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS.

1) Make sure that both offset and len are block size aligned.
2) Increase the i_size of the inode by len bytes.
3) Compute the file's logical block number for offset. If the computed
block number is not the starting block of an extent, split the extent
so that the block number becomes the starting block of an extent.
4) Shift all the extents lying between [offset, last allocated extent]
to the right by len bytes. This step makes a hole of len bytes
at offset.
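
Steps 3) and 4) can be illustrated with a small user-space model of an
extent list. This is a simplified sketch, not the XFS code: struct toy_extent
and the toy_ functions are hypothetical, extents are kept in a plain sorted
array, and no btree, transaction, or locking is modelled.

```c
#include <assert.h>

/* Toy extent: file offset, disk block and length, all in fs blocks. */
struct toy_extent {
	unsigned long startoff;
	unsigned long startblock;
	unsigned long blockcount;
};

/*
 * Step 3: split the extent that contains split_fsb so that split_fsb
 * becomes the first block of an extent. Nothing is done if split_fsb lies
 * in a hole or is already an extent start. Returns the new extent count,
 * or -1 if the array (capacity cap) has no room for the extra record.
 */
int toy_split_at(struct toy_extent *exts, int n, int cap,
		 unsigned long split_fsb)
{
	for (int i = 0; i < n; i++) {
		unsigned long end = exts[i].startoff + exts[i].blockcount;
		unsigned long left;

		if (split_fsb <= exts[i].startoff || split_fsb >= end)
			continue;
		if (n == cap)
			return -1;

		left = split_fsb - exts[i].startoff;
		/* make room for the new record right after extent i */
		for (int j = n; j > i + 1; j--)
			exts[j] = exts[j - 1];
		exts[i + 1].startoff = split_fsb;
		exts[i + 1].startblock = exts[i].startblock + left;
		exts[i + 1].blockcount = exts[i].blockcount - left;
		exts[i].blockcount = left;
		return n + 1;
	}
	return n;
}

/*
 * Step 4: walk from the last extent back towards stop_fsb and move each
 * extent to the right by shift blocks. Going backwards means an extent is
 * never moved on top of a neighbour that has not been shifted yet.
 */
void toy_shift_right(struct toy_extent *exts, int n,
		     unsigned long stop_fsb, unsigned long shift)
{
	for (int i = n - 1; i >= 0 && exts[i].startoff >= stop_fsb; i--)
		exts[i].startoff += shift;
}

/* Split, then shift. Returns the new extent count or -1 on no room. */
int toy_insert_range(struct toy_extent *exts, int n, int cap,
		     unsigned long offset_fsb, unsigned long len_fsb)
{
	n = toy_split_at(exts, n, cap, offset_fsb);
	if (n < 0)
		return n;
	toy_shift_right(exts, n, offset_fsb, len_fsb);
	return n;
}

/* Insert a 2-block hole at block 4 of a file mapped by two extents. */
void toy_insert_demo(void)
{
	struct toy_extent exts[4] = { { 0, 100, 8 }, { 8, 200, 4 } };
	int n = toy_insert_range(exts, 2, 4, 4, 2);

	assert(n == 3);
	assert(exts[0].startoff == 0 && exts[0].blockcount == 4);
	assert(exts[1].startoff == 6 && exts[1].startblock == 104);
	assert(exts[2].startoff == 10 && exts[2].startblock == 200);
}
```

In the demo, blocks 4 and 5 of the file end up unmapped, which is the hole
of len blocks at offset that step 4) describes.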

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
---
fs/xfs/libxfs/xfs_bmap.c | 358 ++++++++++++++++++++++++++++++++++++++++------
fs/xfs/libxfs/xfs_bmap.h | 13 +-
fs/xfs/xfs_bmap_util.c | 126 +++++++++++-----
fs/xfs/xfs_bmap_util.h | 2 +
fs/xfs/xfs_file.c | 38 ++++-
fs/xfs/xfs_trace.h | 1 +
6 files changed, 455 insertions(+), 83 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 61ec015..6699e53 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5518,50 +5518,86 @@ xfs_bmse_shift_one(
int *current_ext,
struct xfs_bmbt_rec_host *gotp,
struct xfs_btree_cur *cur,
- int *logflags)
+ int *logflags,
+ enum SHIFT_DIRECTION SHIFT)
{
struct xfs_ifork *ifp;
xfs_fileoff_t startoff;
- struct xfs_bmbt_rec_host *leftp;
+ struct xfs_bmbt_rec_host *contp;
struct xfs_bmbt_irec got;
- struct xfs_bmbt_irec left;
+ struct xfs_bmbt_irec cont;
int error;
int i;
+ int total_extents;

ifp = XFS_IFORK_PTR(ip, whichfork);
+ total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);

xfs_bmbt_get_all(gotp, &got);
- startoff = got.br_startoff - offset_shift_fsb;

/* delalloc extents should be prevented by caller */
XFS_WANT_CORRUPTED_RETURN(!isnullstartblock(got.br_startblock));

- /*
- * Check for merge if we've got an extent to the left, otherwise make
- * sure there's enough room at the start of the file for the shift.
- */
- if (*current_ext) {
- /* grab the left extent and check for a large enough hole */
- leftp = xfs_iext_get_ext(ifp, *current_ext - 1);
- xfs_bmbt_get_all(leftp, &left);
+ if (SHIFT == SHIFT_LEFT) {
+ startoff = got.br_startoff - offset_shift_fsb;

- if (startoff < left.br_startoff + left.br_blockcount)
+ /*
+ * Check for merge if we've got an extent to the left,
+ * otherwise make sure there's enough room at the start
+ * of the file for the shift.
+ */
+ if (*current_ext) {
+ /*
+ * grab the left extent and check for a large
+ * enough hole.
+ */
+ contp = xfs_iext_get_ext(ifp, *current_ext - 1);
+ xfs_bmbt_get_all(contp, &cont);
+
+ if (startoff < cont.br_startoff + cont.br_blockcount)
+ return -EINVAL;
+
+ /* check whether to merge the extent or shift it down */
+ if (xfs_bmse_can_merge(&cont, &got, offset_shift_fsb)) {
+ return xfs_bmse_merge(ip, whichfork,
+ offset_shift_fsb,
+ *current_ext, gotp, contp,
+ cur, logflags);
+ }
+ } else if (got.br_startoff < offset_shift_fsb)
return -EINVAL;
+ } else {
+ startoff = got.br_startoff + offset_shift_fsb;
+ /*
+ * If this is not the last extent in the file, make sure there's
+ * enough room between current extent and next extent for
+ * accommodating the shift.
+ */
+ if (*current_ext < (total_extents - 1)) {
+ contp = xfs_iext_get_ext(ifp, *current_ext + 1);
+ xfs_bmbt_get_all(contp, &cont);
+ if (startoff + got.br_blockcount > cont.br_startoff)
+ return -EINVAL;

- /* check whether to merge the extent or shift it down */
- if (xfs_bmse_can_merge(&left, &got, offset_shift_fsb)) {
- return xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
- *current_ext, gotp, leftp, cur,
- logflags);
+ /*
+ * Unlike a left shift (which involves a hole punch),
+ * a right shift does not modify extent neighbors
+ * in any way. We should never find mergeable extents
+ * in this scenario. Check anyways and warn if we
+ * encounter two extents that could be one.
+ */
+ if (xfs_bmse_can_merge(&got, &cont, offset_shift_fsb))
+ WARN_ON_ONCE(1);
}
- } else if (got.br_startoff < offset_shift_fsb)
- return -EINVAL;
-
+ }
/*
* Increment the extent index for the next iteration, update the start
* offset of the in-core extent and update the btree if applicable.
*/
- (*current_ext)++;
+ if (SHIFT == SHIFT_LEFT)
+ (*current_ext)++;
+ else
+ (*current_ext)--;
xfs_bmbt_set_startoff(gotp, startoff);
*logflags |= XFS_ILOG_CORE;
if (!cur) {
@@ -5581,10 +5617,10 @@ xfs_bmse_shift_one(
}

/*
- * Shift extent records to the left to cover a hole.
+ * Shift extent records to the left/right to cover/create a hole.
*
* The maximum number of extents to be shifted in a single operation is
- * @num_exts. @start_fsb specifies the file offset to start the shift and the
+ * @num_exts. @stop_fsb specifies the file offset at which to stop shift and the
* file offset where we've left off is returned in @next_fsb. @offset_shift_fsb
* is the length by which each extent is shifted. If there is no hole to shift
* the extents into, this will be considered invalid operation and we abort
@@ -5594,12 +5630,13 @@ int
xfs_bmap_shift_extents(
struct xfs_trans *tp,
struct xfs_inode *ip,
- xfs_fileoff_t start_fsb,
+ xfs_fileoff_t *next_fsb,
xfs_fileoff_t offset_shift_fsb,
int *done,
- xfs_fileoff_t *next_fsb,
+ xfs_fileoff_t stop_fsb,
xfs_fsblock_t *firstblock,
struct xfs_bmap_free *flist,
+ enum SHIFT_DIRECTION SHIFT,
int num_exts)
{
struct xfs_btree_cur *cur = NULL;
@@ -5609,10 +5646,11 @@ xfs_bmap_shift_extents(
struct xfs_ifork *ifp;
xfs_extnum_t nexts = 0;
xfs_extnum_t current_ext;
+ xfs_extnum_t total_extents;
+ xfs_extnum_t stop_extent;
int error = 0;
int whichfork = XFS_DATA_FORK;
int logflags = 0;
- int total_extents;

if (unlikely(XFS_TEST_ERROR(
(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
@@ -5628,6 +5666,7 @@ xfs_bmap_shift_extents(

ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+ ASSERT(SHIFT == SHIFT_LEFT || SHIFT == SHIFT_RIGHT);

ifp = XFS_IFORK_PTR(ip, whichfork);
if (!(ifp->if_flags & XFS_IFEXTENTS)) {
@@ -5645,43 +5684,85 @@ xfs_bmap_shift_extents(
}

/*
+ * There may be delalloc extents in the data fork before the range we
+ * are collapsing out, so we cannot use the count of real extents here.
+ * Instead we have to calculate it from the incore fork.
+ */
+ total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
+ if (total_extents == 0) {
+ *done = 1;
+ goto del_cursor;
+ }
+
+ /*
+ * In case of first right shift, we need to initialize next_fsb
+ */
+ if (*next_fsb == NULLFSBLOCK) {
+ ASSERT(SHIFT == SHIFT_RIGHT);
+ gotp = xfs_iext_get_ext(ifp, total_extents - 1);
+ xfs_bmbt_get_all(gotp, &got);
+ *next_fsb = got.br_startoff;
+ if (stop_fsb > *next_fsb) {
+ *done = 1;
+ goto del_cursor;
+ }
+ }
+
+ /* Lookup the extent index at which we have to stop */
+ if (SHIFT == SHIFT_RIGHT) {
+ gotp = xfs_iext_bno_to_ext(ifp, stop_fsb, &stop_extent);
+ /* Make stop_extent exclusive of shift range */
+ stop_extent--;
+ } else
+ stop_extent = total_extents;
+
+ /*
* Look up the extent index for the fsb where we start shifting. We can
* henceforth iterate with current_ext as extent list changes are locked
* out via ilock.
*
* gotp can be null in 2 cases: 1) if there are no extents or 2)
- * start_fsb lies in a hole beyond which there are no extents. Either
+ * *next_fsb lies in a hole beyond which there are no extents. Either
* way, we are done.
*/
- gotp = xfs_iext_bno_to_ext(ifp, start_fsb, &current_ext);
+ gotp = xfs_iext_bno_to_ext(ifp, *next_fsb, &current_ext);
if (!gotp) {
*done = 1;
goto del_cursor;
}

- /*
- * There may be delalloc extents in the data fork before the range we
- * are collapsing out, so we cannot use the count of real extents here.
- * Instead we have to calculate it from the incore fork.
- */
- total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
- while (nexts++ < num_exts && current_ext < total_extents) {
+ /* some sanity checking before we finally start shifting extents */
+ if ((SHIFT == SHIFT_LEFT && current_ext >= stop_extent) ||
+ (SHIFT == SHIFT_RIGHT && current_ext <= stop_extent)) {
+ error = -EIO;
+ goto del_cursor;
+ }
+
+ while (nexts++ < num_exts) {
error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb,
- &current_ext, gotp, cur, &logflags);
+ &current_ext, gotp, cur, &logflags,
+ SHIFT);
if (error)
goto del_cursor;
+ /*
+ * In case there was an extent merge after shifting extent,
+ * extent numbers would change.
+ * Update total extent count and grab the next record.
+ */
+ if (SHIFT == SHIFT_LEFT) {
+ total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
+ stop_extent = total_extents;
+ }

- /* update total extent count and grab the next record */
- total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
- if (current_ext >= total_extents)
+ if (current_ext == stop_extent) {
+ *done = 1;
+ *next_fsb = NULLFSBLOCK;
break;
+ }
gotp = xfs_iext_get_ext(ifp, current_ext);
}

- /* Check if we are done */
- if (current_ext == total_extents) {
- *done = 1;
- } else if (next_fsb) {
+ if (!*done) {
xfs_bmbt_get_all(gotp, &got);
*next_fsb = got.br_startoff;
}
@@ -5696,3 +5777,192 @@ del_cursor:

return error;
}
+
+/*
+ * Splits an extent into two extents at split_fsb, so that split_fsb becomes
+ * the first block of the second extent. @current_ext is the target extent
+ * to be split. @split_fsb is the block at which the extent is split.
+ * If split_fsb lies in a hole or is the first block of an extent, just return 0.
+ */
+STATIC int
+xfs_bmap_split_extent_at(
+ struct xfs_trans *tp,
+ struct xfs_inode *ip,
+ xfs_fileoff_t split_fsb,
+ xfs_fsblock_t *firstfsb,
+ struct xfs_bmap_free *free_list)
+{
+ int whichfork = XFS_DATA_FORK;
+ struct xfs_btree_cur *cur = NULL;
+ struct xfs_bmbt_rec_host *gotp;
+ struct xfs_bmbt_irec got;
+ struct xfs_bmbt_irec new; /* split extent */
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_ifork *ifp;
+ xfs_fsblock_t gotblkcnt; /* new block count for got */
+ xfs_extnum_t current_ext;
+ int error = 0;
+ int logflags = 0;
+ int i = 0;
+
+ if (unlikely(XFS_TEST_ERROR(
+ (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
+ XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
+ mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
+ XFS_ERROR_REPORT("xfs_bmap_split_extent_at",
+ XFS_ERRLEVEL_LOW, mp);
+ return -EFSCORRUPTED;
+ }
+
+ if (XFS_FORCED_SHUTDOWN(mp))
+ return -EIO;
+
+ ifp = XFS_IFORK_PTR(ip, whichfork);
+ if (!(ifp->if_flags & XFS_IFEXTENTS)) {
+ /* Read in all the extents */
+ error = xfs_iread_extents(tp, ip, whichfork);
+ if (error)
+ return error;
+ }
+
+ gotp = xfs_iext_bno_to_ext(ifp, split_fsb, &current_ext);
+ /*
+ * gotp can be null in 2 cases: 1) if there are no extents
+ * or 2) split_fsb lies in a hole beyond which there are
+ * no extents. Either way, we are done.
+ */
+ if (!gotp)
+ return 0;
+
+ xfs_bmbt_get_all(gotp, &got);
+
+ /*
+ * Check split_fsb lies in a hole or the start boundary offset
+ * of the extent.
+ */
+ if (got.br_startoff >= split_fsb)
+ return 0;
+
+ gotblkcnt = split_fsb - got.br_startoff;
+ new.br_startoff = split_fsb;
+ new.br_startblock = got.br_startblock + gotblkcnt;
+ new.br_blockcount = got.br_blockcount - gotblkcnt;
+ new.br_state = got.br_state;
+
+ if (ifp->if_flags & XFS_IFBROOT) {
+ cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
+ cur->bc_private.b.firstblock = *firstfsb;
+ cur->bc_private.b.flist = free_list;
+ cur->bc_private.b.flags = 0;
+ }
+
+ if (cur) {
+ error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
+ got.br_startblock,
+ got.br_blockcount,
+ &i);
+ if (error)
+ goto del_cursor;
+ XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
+ }
+
+ xfs_bmbt_set_blockcount(gotp, gotblkcnt);
+ got.br_blockcount = gotblkcnt;
+
+ logflags = XFS_ILOG_CORE;
+ if (cur) {
+ error = xfs_bmbt_update(cur, got.br_startoff,
+ got.br_startblock,
+ got.br_blockcount,
+ got.br_state);
+ if (error)
+ goto del_cursor;
+ } else
+ logflags |= XFS_ILOG_DEXT;
+
+ /* Add new extent */
+ current_ext++;
+ xfs_iext_insert(ip, current_ext, 1, &new, 0);
+ XFS_IFORK_NEXT_SET(ip, whichfork,
+ XFS_IFORK_NEXTENTS(ip, whichfork) + 1);
+
+ if (cur) {
+ error = xfs_bmbt_lookup_eq(cur, new.br_startoff,
+ new.br_startblock, new.br_blockcount,
+ &i);
+ if (error)
+ goto del_cursor;
+ XFS_WANT_CORRUPTED_GOTO(i == 0, del_cursor);
+ cur->bc_rec.b.br_state = new.br_state;
+
+ error = xfs_btree_insert(cur, &i);
+ if (error)
+ goto del_cursor;
+ XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
+ }
+
+ /*
+ * Convert to a btree if necessary.
+ */
+ if (xfs_bmap_needs_btree(ip, whichfork)) {
+ int tmp_logflags; /* partial log flag return val */
+
+ ASSERT(cur == NULL);
+ error = xfs_bmap_extents_to_btree(tp, ip, firstfsb, free_list,
+ &cur, 0, &tmp_logflags, whichfork);
+ logflags |= tmp_logflags;
+ }
+
+del_cursor:
+ if (cur) {
+ cur->bc_private.b.allocated = 0;
+ xfs_btree_del_cursor(cur,
+ error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
+ }
+
+ if (logflags)
+ xfs_trans_log_inode(tp, ip, logflags);
+ return error;
+}
+
+int
+xfs_bmap_split_extent(
+ struct xfs_inode *ip,
+ xfs_fileoff_t split_fsb)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+ struct xfs_bmap_free free_list;
+ xfs_fsblock_t firstfsb;
+ int committed;
+ int error;
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
+ error = xfs_trans_reserve(tp, &M_RES(mp)->tr_write,
+ XFS_DIOSTRAT_SPACE_RES(mp, 0), 0);
+ if (error) {
+ xfs_trans_cancel(tp, 0);
+ return error;
+ }
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+
+ xfs_bmap_init(&free_list, &firstfsb);
+
+ error = xfs_bmap_split_extent_at(tp, ip, split_fsb,
+ &firstfsb, &free_list);
+ if (error)
+ goto out;
+
+ error = xfs_bmap_finish(&tp, &free_list, &committed);
+ if (error)
+ goto out;
+
+ return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+
+
+out:
+ xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
+ return error;
+}
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index b9d8a49..6ed6cd1 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -166,6 +166,11 @@ static inline void xfs_bmap_init(xfs_bmap_free_t *flp, xfs_fsblock_t *fbp)
*/
#define XFS_BMAP_MAX_SHIFT_EXTENTS 1

+enum SHIFT_DIRECTION {
+ SHIFT_LEFT = 0,
+ SHIFT_RIGHT,
+};
+
#ifdef DEBUG
void xfs_bmap_trace_exlist(struct xfs_inode *ip, xfs_extnum_t cnt,
int whichfork, unsigned long caller_ip);
@@ -211,8 +216,10 @@ int xfs_check_nostate_extents(struct xfs_ifork *ifp, xfs_extnum_t idx,
xfs_extnum_t num);
uint xfs_default_attroffset(struct xfs_inode *ip);
int xfs_bmap_shift_extents(struct xfs_trans *tp, struct xfs_inode *ip,
- xfs_fileoff_t start_fsb, xfs_fileoff_t offset_shift_fsb,
- int *done, xfs_fileoff_t *next_fsb, xfs_fsblock_t *firstblock,
- struct xfs_bmap_free *flist, int num_exts);
+ xfs_fileoff_t *next_fsb, xfs_fileoff_t offset_shift_fsb,
+ int *done, xfs_fileoff_t stop_fsb, xfs_fsblock_t *firstblock,
+ struct xfs_bmap_free *flist, enum SHIFT_DIRECTION SHIFT,
+ int num_exts);
+int xfs_bmap_split_extent(struct xfs_inode *ip, xfs_fileoff_t split_offset);

#endif /* __XFS_BMAP_H__ */
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 22a5dcb..841744c 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1376,22 +1376,19 @@ out:
}

/*
- * xfs_collapse_file_space()
- * This routine frees disk space and shift extent for the given file.
- * The first thing we do is to free data blocks in the specified range
- * by calling xfs_free_file_space(). It would also sync dirty data
- * and invalidate page cache over the region on which collapse range
- * is working. And Shift extent records to the left to cover a hole.
- * RETURNS:
- * 0 on success
- * errno on error
- *
+ * @next_fsb will keep track of the extent currently undergoing shift.
+ * @stop_fsb will keep track of the extent at which we have to stop.
+ * If we are shifting left, we will start with block (offset + len) and
+ * shift each extent till last extent.
+ * If we are shifting right, we will start with last extent inside file space
+ * and continue until we reach the block corresponding to offset.
*/
int
-xfs_collapse_file_space(
- struct xfs_inode *ip,
- xfs_off_t offset,
- xfs_off_t len)
+xfs_shift_file_space(
+ struct xfs_inode *ip,
+ xfs_off_t offset,
+ xfs_off_t len,
+ enum SHIFT_DIRECTION SHIFT)
{
int done = 0;
struct xfs_mount *mp = ip->i_mount;
@@ -1400,21 +1397,26 @@ xfs_collapse_file_space(
struct xfs_bmap_free free_list;
xfs_fsblock_t first_block;
int committed;
- xfs_fileoff_t start_fsb;
+ xfs_fileoff_t stop_fsb;
xfs_fileoff_t next_fsb;
xfs_fileoff_t shift_fsb;

- ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+ ASSERT(SHIFT == SHIFT_LEFT || SHIFT == SHIFT_RIGHT);

- trace_xfs_collapse_file_space(ip);
+ if (SHIFT == SHIFT_LEFT) {
+ next_fsb = XFS_B_TO_FSB(mp, offset + len);
+ stop_fsb = XFS_B_TO_FSB(mp, VFS_I(ip)->i_size);
+ } else {
+ /*
+ * If right shift, delegate the initialization of next_fsb
+ * to xfs_bmap_shift_extents() as it has the ilock held.
+ */
+ next_fsb = NULLFSBLOCK;
+ stop_fsb = XFS_B_TO_FSB(mp, offset);
+ }

- next_fsb = XFS_B_TO_FSB(mp, offset + len);
shift_fsb = XFS_B_TO_FSB(mp, len);

- error = xfs_free_file_space(ip, offset, len);
- if (error)
- return error;
-
/*
* Trim eofblocks to avoid shifting uninitialized post-eof preallocation
* into the accessible region of the file.
@@ -1427,20 +1429,23 @@ xfs_collapse_file_space(

/*
* Writeback and invalidate cache for the remainder of the file as we're
- * about to shift down every extent from the collapse range to EOF. The
- * free of the collapse range above might have already done some of
- * this, but we shouldn't rely on it to do anything outside of the range
- * that was freed.
+ * about to shift down every extent from offset to EOF.
*/
error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
- offset + len, -1);
+ offset, -1);
if (error)
return error;
error = invalidate_inode_pages2_range(VFS_I(ip)->i_mapping,
- (offset + len) >> PAGE_CACHE_SHIFT, -1);
+ offset >> PAGE_CACHE_SHIFT, -1);
if (error)
return error;

+ if (SHIFT == SHIFT_RIGHT) {
+ error = xfs_bmap_split_extent(ip, stop_fsb);
+ if (error)
+ return error;
+ }
+
while (!error && !done) {
tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
/*
@@ -1464,7 +1469,7 @@ xfs_collapse_file_space(
if (error)
goto out;

- xfs_trans_ijoin(tp, ip, 0);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);

xfs_bmap_init(&free_list, &first_block);

@@ -1472,10 +1477,9 @@ xfs_collapse_file_space(
* We are using the write transaction in which max 2 bmbt
* updates are allowed
*/
- start_fsb = next_fsb;
- error = xfs_bmap_shift_extents(tp, ip, start_fsb, shift_fsb,
- &done, &next_fsb, &first_block, &free_list,
- XFS_BMAP_MAX_SHIFT_EXTENTS);
+ error = xfs_bmap_shift_extents(tp, ip, &next_fsb, shift_fsb,
+ &done, stop_fsb, &first_block, &free_list,
+ SHIFT, XFS_BMAP_MAX_SHIFT_EXTENTS);
if (error)
goto out;

@@ -1484,18 +1488,70 @@ xfs_collapse_file_space(
goto out;

error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
}

return error;

out:
xfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT);
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
return error;
}

/*
+ * xfs_collapse_file_space()
+ * This routine frees disk space and shift extent for the given file.
+ * The first thing we do is to free data blocks in the specified range
+ * by calling xfs_free_file_space(). It would also sync dirty data
+ * and invalidate page cache over the region on which collapse range
+ * is working. And Shift extent records to the left to cover a hole.
+ * RETURNS:
+ * 0 on success
+ * errno on error
+ *
+ */
+int
+xfs_collapse_file_space(
+ struct xfs_inode *ip,
+ xfs_off_t offset,
+ xfs_off_t len)
+{
+ int error;
+
+ ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+ trace_xfs_collapse_file_space(ip);
+
+ error = xfs_free_file_space(ip, offset, len);
+ if (error)
+ return error;
+
+ return xfs_shift_file_space(ip, offset, len, SHIFT_LEFT);
+}
+
+/*
+ * xfs_insert_file_space()
+ * This routine creates hole space by shifting extents for the given file.
+ * The first thing we do is sync dirty data and invalidate the page cache
+ * over the region on which insert range is working. Then we split the extent
+ * at the given offset into two extents by calling xfs_bmap_split_extent(),
+ * and shift all extent records lying between [offset, last allocated
+ * extent] to the right to reserve the hole range.
+ * RETURNS:
+ * 0 on success
+ * errno on error
+ */
+int
+xfs_insert_file_space(
+ struct xfs_inode *ip,
+ loff_t offset,
+ loff_t len)
+{
+ ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+ trace_xfs_insert_file_space(ip);
+
+ return xfs_shift_file_space(ip, offset, len, SHIFT_RIGHT);
+}
+
+/*
* We need to check that the format of the data fork in the temporary inode is
* valid for the target inode before doing the swap. This is not a problem with
* attr1 because of the fixed fork offset, but attr2 has a dynamically sized
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 736429a..af97d9a 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -63,6 +63,8 @@ int xfs_zero_file_space(struct xfs_inode *ip, xfs_off_t offset,
xfs_off_t len);
int xfs_collapse_file_space(struct xfs_inode *, xfs_off_t offset,
xfs_off_t len);
+int xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
+ xfs_off_t len);

/* EOF block manipulation functions */
bool xfs_can_free_eofblocks(struct xfs_inode *ip, bool force);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 1cdba95..222a91a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -823,11 +823,13 @@ xfs_file_fallocate(
long error;
enum xfs_prealloc_flags flags = 0;
loff_t new_size = 0;
+ int do_file_insert = 0;

if (!S_ISREG(inode->i_mode))
return -EINVAL;
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
- FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
+ FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
+ FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;

xfs_ilock(ip, XFS_IOLOCK_EXCL);
@@ -857,6 +859,28 @@ xfs_file_fallocate(
error = xfs_collapse_file_space(ip, offset, len);
if (error)
goto out_unlock;
+ } else if (mode & FALLOC_FL_INSERT_RANGE) {
+ unsigned blksize_mask = (1 << inode->i_blkbits) - 1;
+
+ if (offset & blksize_mask || len & blksize_mask) {
+ error = -EINVAL;
+ goto out_unlock;
+ }
+
+ /* Check for wrap through zero */
+ if (inode->i_size + len > inode->i_sb->s_maxbytes) {
+ error = -EFBIG;
+ goto out_unlock;
+ }
+
+ /* Offset should be less than i_size */
+ if (offset >= i_size_read(inode)) {
+ error = -EINVAL;
+ goto out_unlock;
+ }
+
+ new_size = i_size_read(inode) + len;
+ do_file_insert = 1;
} else {
flags |= XFS_PREALLOC_SET;

@@ -891,8 +915,20 @@ xfs_file_fallocate(
iattr.ia_valid = ATTR_SIZE;
iattr.ia_size = new_size;
error = xfs_setattr_size(ip, &iattr);
+ if (error)
+ goto out_unlock;
}

+ /*
+ * Some operations are performed after the inode size is updated. For
+ * example, insert range expands the address space of the file, shifts
+ * all subsequent extents to create a hole inside the file. Updating
+ * the size first ensures that shifted extents aren't left hanging
+ * past EOF in the event of a crash or failure.
+ */
+ if (do_file_insert)
+ error = xfs_insert_file_space(ip, offset, len);
+
out_unlock:
xfs_iunlock(ip, XFS_IOLOCK_EXCL);
return error;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 51372e3..7e45fa1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -664,6 +664,7 @@ DEFINE_INODE_EVENT(xfs_alloc_file_space);
DEFINE_INODE_EVENT(xfs_free_file_space);
DEFINE_INODE_EVENT(xfs_zero_file_space);
DEFINE_INODE_EVENT(xfs_collapse_file_space);
+DEFINE_INODE_EVENT(xfs_insert_file_space);
DEFINE_INODE_EVENT(xfs_readdir);
#ifdef CONFIG_XFS_POSIX_ACL
DEFINE_INODE_EVENT(xfs_get_acl);
--
1.7.9.5


2015-02-16 15:47:50

by Namjae Jeon

Subject: [PATCH RESEND 3/12] ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate

From: Namjae Jeon <[email protected]>

This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4.

1) Make sure that both offset and len are block size aligned.
2) Increase the i_size of the inode by len bytes.
3) Compute the file's logical block number for offset. If the computed
block number is not the starting block of an extent, split the extent
so that the block number becomes the starting block of an extent.
4) Shift all the extents lying between [offset, last allocated extent]
to the right by len bytes. This step makes a hole of len bytes
at offset.
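
The arithmetic behind steps 1) to 3) can be sketched as below. This is an
illustration only, not the ext4 implementation: struct insert_plan and
plan_insert_range() are hypothetical names, blkbits is assumed to be the
log2 of the block size and smaller than 64, and overflow of the new size is
not checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical result of translating a byte range into block units. */
struct insert_plan {
	uint64_t start_lblk;	/* logical block where the hole begins */
	uint64_t shift_lblks;	/* number of blocks to shift right by */
	uint64_t new_size;	/* i_size after the insert */
};

/*
 * Validate the byte range and convert it to logical block units.
 * Returns 0 on success or -EINVAL for an unaligned or out-of-file range.
 */
int plan_insert_range(uint64_t offset, uint64_t len, unsigned int blkbits,
		      uint64_t i_size, struct insert_plan *plan)
{
	uint64_t mask = ((uint64_t)1 << blkbits) - 1;

	/* step 1: offset and len must be block size aligned */
	if ((offset & mask) || (len & mask))
		return -EINVAL;
	/* insert range is only valid strictly inside the file */
	if (offset >= i_size)
		return -EINVAL;

	/* step 3: logical block number that corresponds to offset */
	plan->start_lblk = offset >> blkbits;
	plan->shift_lblks = len >> blkbits;
	/* step 2: the file grows by len bytes */
	plan->new_size = i_size + len;
	return 0;
}

/* 4096-byte blocks: insert one block at byte 8192 of a 20000-byte file. */
void plan_insert_range_demo(void)
{
	struct insert_plan plan;

	assert(plan_insert_range(8192, 4096, 12, 20000, &plan) == 0);
	assert(plan.start_lblk == 2);
	assert(plan.shift_lblks == 1);
	assert(plan.new_size == 24096);
}
```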

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
---
fs/ext4/ext4.h | 6 +
fs/ext4/extents.c | 302 +++++++++++++++++++++++++++++++++++--------
include/trace/events/ext4.h | 25 ++++
3 files changed, 282 insertions(+), 51 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 98ee89c..6db57e6 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -90,6 +90,11 @@ typedef __u32 ext4_lblk_t;
/* data type for block group number */
typedef unsigned int ext4_group_t;

+enum SHIFT_DIRECTION {
+ SHIFT_LEFT = 0,
+ SHIFT_RIGHT,
+};
+
/*
* Flags used in mballoc's allocation_context flags field.
*
@@ -2766,6 +2771,7 @@ extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
__u64 start, __u64 len);
extern int ext4_ext_precache(struct inode *inode);
extern int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len);
+extern int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len);
extern int ext4_swap_extents(handle_t *handle, struct inode *inode1,
struct inode *inode2, ext4_lblk_t lblk1,
ext4_lblk_t lblk2, ext4_lblk_t count,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index bed4308..a07b109 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4924,7 +4924,8 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)

/* Return error if mode is not supported */
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
- FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
+ FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
+ FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;

if (mode & FALLOC_FL_PUNCH_HOLE)
@@ -4944,6 +4945,9 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
if (mode & FALLOC_FL_COLLAPSE_RANGE)
return ext4_collapse_range(inode, offset, len);

+ if (mode & FALLOC_FL_INSERT_RANGE)
+ return ext4_insert_range(inode, offset, len);
+
if (mode & FALLOC_FL_ZERO_RANGE)
return ext4_zero_range(file, offset, len, mode);

@@ -5230,13 +5234,13 @@ ext4_access_path(handle_t *handle, struct inode *inode,
/*
* ext4_ext_shift_path_extents:
* Shift the extents of a path structure lying between path[depth].p_ext
- * and EXT_LAST_EXTENT(path[depth].p_hdr) downwards, by subtracting shift
- * from starting block for each extent.
+ * and EXT_LAST_EXTENT(path[depth].p_hdr), by @shift blocks. @SHIFT tells
+ * if it is right shift or left shift operation.
*/
static int
ext4_ext_shift_path_extents(struct ext4_ext_path *path, ext4_lblk_t shift,
struct inode *inode, handle_t *handle,
- ext4_lblk_t *start)
+ enum SHIFT_DIRECTION SHIFT)
{
int depth, err = 0;
struct ext4_extent *ex_start, *ex_last;
@@ -5258,19 +5262,25 @@ ext4_ext_shift_path_extents(struct ext4_ext_path *path, ext4_lblk_t shift,
if (ex_start == EXT_FIRST_EXTENT(path[depth].p_hdr))
update = 1;

- *start = le32_to_cpu(ex_last->ee_block) +
- ext4_ext_get_actual_len(ex_last);
-
while (ex_start <= ex_last) {
- le32_add_cpu(&ex_start->ee_block, -shift);
- /* Try to merge to the left. */
- if ((ex_start >
- EXT_FIRST_EXTENT(path[depth].p_hdr)) &&
- ext4_ext_try_to_merge_right(inode,
- path, ex_start - 1))
+ if (SHIFT == SHIFT_LEFT) {
+ le32_add_cpu(&ex_start->ee_block,
+ -shift);
+ /* Try to merge to the left. */
+ if ((ex_start >
+ EXT_FIRST_EXTENT(path[depth].p_hdr))
+ &&
+ ext4_ext_try_to_merge_right(inode,
+ path, ex_start - 1))
+ ex_last--;
+ else
+ ex_start++;
+ } else {
+ le32_add_cpu(&ex_last->ee_block, shift);
+ ext4_ext_try_to_merge_right(inode, path,
+ ex_last);
ex_last--;
- else
- ex_start++;
+ }
}
err = ext4_ext_dirty(handle, inode, path + depth);
if (err)
@@ -5285,7 +5295,10 @@ ext4_ext_shift_path_extents(struct ext4_ext_path *path, ext4_lblk_t shift,
if (err)
goto out;

- le32_add_cpu(&path[depth].p_idx->ei_block, -shift);
+ if (SHIFT == SHIFT_LEFT)
+ le32_add_cpu(&path[depth].p_idx->ei_block, -shift);
+ else
+ le32_add_cpu(&path[depth].p_idx->ei_block, shift);
err = ext4_ext_dirty(handle, inode, path + depth);
if (err)
goto out;
@@ -5303,19 +5316,20 @@ out:

/*
* ext4_ext_shift_extents:
- * All the extents which lies in the range from start to the last allocated
- * block for the file are shifted downwards by shift blocks.
+ * All the extents which lie in the range from @start to the last allocated
+ * block for the @inode are shifted either towards left or right (depending
+ * upon @SHIFT) by @shift blocks.
* On success, 0 is returned, error otherwise.
*/
static int
ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
- ext4_lblk_t start, ext4_lblk_t shift)
+ ext4_lblk_t start, ext4_lblk_t shift,
+ enum SHIFT_DIRECTION SHIFT)
{
struct ext4_ext_path *path;
int ret = 0, depth;
struct ext4_extent *extent;
- ext4_lblk_t stop_block;
- ext4_lblk_t ex_start, ex_end;
+ ext4_lblk_t stop, *iterator, ex_start, ex_end;

/* Let path point to the last extent */
path = ext4_find_extent(inode, EXT_MAX_BLOCKS - 1, NULL, 0);
@@ -5327,58 +5341,84 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
if (!extent)
goto out;

- stop_block = le32_to_cpu(extent->ee_block) +
+ stop = le32_to_cpu(extent->ee_block) +
ext4_ext_get_actual_len(extent);

- /* Nothing to shift, if hole is at the end of file */
- if (start >= stop_block)
- goto out;
+ /*
+ * In case of left shift, don't start shifting extents until we make
+ * sure the hole is big enough to accommodate the shift.
+ */
+ if (SHIFT == SHIFT_LEFT) {
+ path = ext4_find_extent(inode, start - 1, &path, 0);
+ if (IS_ERR(path))
+ return PTR_ERR(path);
+ depth = path->p_depth;
+ extent = path[depth].p_ext;
+ if (extent) {
+ ex_start = le32_to_cpu(extent->ee_block);
+ ex_end = le32_to_cpu(extent->ee_block) +
+ ext4_ext_get_actual_len(extent);
+ } else {
+ ex_start = 0;
+ ex_end = 0;
+ }

- /*
- * Don't start shifting extents until we make sure the hole is big
- * enough to accomodate the shift.
- */
- path = ext4_find_extent(inode, start - 1, &path, 0);
- if (IS_ERR(path))
- return PTR_ERR(path);
- depth = path->p_depth;
- extent = path[depth].p_ext;
- if (extent) {
- ex_start = le32_to_cpu(extent->ee_block);
- ex_end = le32_to_cpu(extent->ee_block) +
- ext4_ext_get_actual_len(extent);
- } else {
- ex_start = 0;
- ex_end = 0;
+ if ((start == ex_start && shift > ex_start) ||
+ (shift > start - ex_end)) {
+ ext4_ext_drop_refs(path);
+ kfree(path);
+ return -EINVAL;
+ }
}

- if ((start == ex_start && shift > ex_start) ||
- (shift > start - ex_end))
- return -EINVAL;
+ /*
+ * In case of left shift, iterator points to start and it is increased
+ * till we reach stop. In case of right shift, iterator points to stop
+ * and it is decreased till we reach start.
+ */
+ if (SHIFT == SHIFT_LEFT)
+ iterator = &start;
+ else
+ iterator = &stop;

/* Its safe to start updating extents */
- while (start < stop_block) {
- path = ext4_find_extent(inode, start, &path, 0);
+ while (start < stop) {
+ path = ext4_find_extent(inode, *iterator, &path, 0);
if (IS_ERR(path))
return PTR_ERR(path);
depth = path->p_depth;
extent = path[depth].p_ext;
if (!extent) {
EXT4_ERROR_INODE(inode, "unexpected hole at %lu",
- (unsigned long) start);
+ (unsigned long) *iterator);
return -EIO;
}
- if (start > le32_to_cpu(extent->ee_block)) {
+ if (SHIFT == SHIFT_LEFT && *iterator >
+ le32_to_cpu(extent->ee_block)) {
/* Hole, move to the next extent */
if (extent < EXT_LAST_EXTENT(path[depth].p_hdr)) {
path[depth].p_ext++;
} else {
- start = ext4_ext_next_allocated_block(path);
+ *iterator = ext4_ext_next_allocated_block(path);
continue;
}
}
+
+ if (SHIFT == SHIFT_LEFT) {
+ extent = EXT_LAST_EXTENT(path[depth].p_hdr);
+ *iterator = le32_to_cpu(extent->ee_block) +
+ ext4_ext_get_actual_len(extent);
+ } else {
+ extent = EXT_FIRST_EXTENT(path[depth].p_hdr);
+ *iterator = le32_to_cpu(extent->ee_block) > 0 ?
+ le32_to_cpu(extent->ee_block) - 1 : 0;
+ /* Update path extent in case we need to stop */
+ while (le32_to_cpu(extent->ee_block) < start)
+ extent++;
+ path[depth].p_ext = extent;
+ }
ret = ext4_ext_shift_path_extents(path, shift, inode,
- handle, &start);
+ handle, SHIFT);
if (ret)
break;
}
@@ -5483,7 +5523,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
ext4_discard_preallocations(inode);

ret = ext4_ext_shift_extents(inode, handle, punch_stop,
- punch_stop - punch_start);
+ punch_stop - punch_start, SHIFT_LEFT);
if (ret) {
up_write(&EXT4_I(inode)->i_data_sem);
goto out_stop;
@@ -5508,6 +5548,166 @@ out_mutex:
return ret;
}

+/*
+ * ext4_insert_range:
+ * This function implements the FALLOC_FL_INSERT_RANGE flag of fallocate.
+ * The data blocks starting from @offset to the EOF are shifted by @len
+ * towards right to create a hole in the @inode. Inode size is increased
+ * by len bytes.
+ * Returns 0 on success, error otherwise.
+ */
+int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
+{
+ struct super_block *sb = inode->i_sb;
+ handle_t *handle;
+ struct ext4_ext_path *path;
+ struct ext4_extent *extent;
+ ext4_lblk_t offset_lblk, len_lblk, ee_start_lblk = 0;
+ unsigned int credits, ee_len;
+ int ret = 0, depth, split_flag = 0;
+ loff_t ioffset;
+
+ /* Insert range works only on fs block size aligned offsets. */
+ if (offset & (EXT4_CLUSTER_SIZE(sb) - 1) ||
+ len & (EXT4_CLUSTER_SIZE(sb) - 1))
+ return -EINVAL;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EOPNOTSUPP;
+
+ trace_ext4_insert_range(inode, offset, len);
+
+ offset_lblk = offset >> EXT4_BLOCK_SIZE_BITS(sb);
+ len_lblk = len >> EXT4_BLOCK_SIZE_BITS(sb);
+
+ /* Call ext4_force_commit to flush all data in case of data=journal */
+ if (ext4_should_journal_data(inode)) {
+ ret = ext4_force_commit(inode->i_sb);
+ if (ret)
+ return ret;
+ }
+
+ /*
+ * Need to round down to align start offset to page size boundary
+ * for page size > block size.
+ */
+ ioffset = round_down(offset, PAGE_SIZE);
+
+ /* Write out all dirty pages */
+ ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
+ LLONG_MAX);
+ if (ret)
+ return ret;
+
+ /* Take mutex lock */
+ mutex_lock(&inode->i_mutex);
+
+ /* Currently just for extent based files */
+ if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) {
+ ret = -EOPNOTSUPP;
+ goto out_mutex;
+ }
+
+ /* Check for wrap through zero */
+ if (inode->i_size + len > inode->i_sb->s_maxbytes) {
+ ret = -EFBIG;
+ goto out_mutex;
+ }
+
+ /* Offset should be less than i_size */
+ if (offset >= i_size_read(inode)) {
+ ret = -EINVAL;
+ goto out_mutex;
+ }
+
+ truncate_pagecache(inode, ioffset);
+
+ /* Wait for existing dio to complete */
+ ext4_inode_block_unlocked_dio(inode);
+ inode_dio_wait(inode);
+
+ credits = ext4_writepage_trans_blocks(inode);
+ handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ goto out_dio;
+ }
+
+ /* Expand file to avoid data loss if there is error while shifting */
+ inode->i_size += len;
+ EXT4_I(inode)->i_disksize += len;
+ inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
+ ret = ext4_mark_inode_dirty(handle, inode);
+ if (ret)
+ goto out_stop;
+
+ down_write(&EXT4_I(inode)->i_data_sem);
+ ext4_discard_preallocations(inode);
+
+ path = ext4_find_extent(inode, offset_lblk, NULL, 0);
+ if (IS_ERR(path)) {
+ up_write(&EXT4_I(inode)->i_data_sem);
+ goto out_stop;
+ }
+
+ depth = ext_depth(inode);
+ extent = path[depth].p_ext;
+ if (extent) {
+ ee_start_lblk = le32_to_cpu(extent->ee_block);
+ ee_len = ext4_ext_get_actual_len(extent);
+
+ /*
+ * If offset_lblk is not the starting block of extent, split
+ * the extent at @offset_lblk
+ */
+ if ((offset_lblk > ee_start_lblk) &&
+ (offset_lblk < (ee_start_lblk + ee_len))) {
+ if (ext4_ext_is_unwritten(extent))
+ split_flag = EXT4_EXT_MARK_UNWRIT1 |
+ EXT4_EXT_MARK_UNWRIT2;
+ ret = ext4_split_extent_at(handle, inode, &path,
+ offset_lblk, split_flag,
+ EXT4_EX_NOCACHE |
+ EXT4_GET_BLOCKS_PRE_IO |
+ EXT4_GET_BLOCKS_METADATA_NOFAIL);
+ }
+
+ ext4_ext_drop_refs(path);
+ kfree(path);
+ if (ret < 0) {
+ up_write(&EXT4_I(inode)->i_data_sem);
+ goto out_stop;
+ }
+ }
+
+ ret = ext4_es_remove_extent(inode, offset_lblk,
+ EXT_MAX_BLOCKS - offset_lblk);
+ if (ret) {
+ up_write(&EXT4_I(inode)->i_data_sem);
+ goto out_stop;
+ }
+
+ /*
+ * if offset_lblk lies in a hole which is at start of file, use
+ * ee_start_lblk to shift extents
+ */
+ ret = ext4_ext_shift_extents(inode, handle,
+ ee_start_lblk > offset_lblk ? ee_start_lblk : offset_lblk,
+ len_lblk, SHIFT_RIGHT);
+
+ up_write(&EXT4_I(inode)->i_data_sem);
+ if (IS_SYNC(inode))
+ ext4_handle_sync(handle);
+
+out_stop:
+ ext4_journal_stop(handle);
+out_dio:
+ ext4_inode_resume_unlocked_dio(inode);
+out_mutex:
+ mutex_unlock(&inode->i_mutex);
+ return ret;
+}
+
/**
* ext4_swap_extents - Swap extents between two inodes
*
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index 6e5abd6..2a89d66 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -2478,6 +2478,31 @@ TRACE_EVENT(ext4_collapse_range,
__entry->offset, __entry->len)
);

+TRACE_EVENT(ext4_insert_range,
+ TP_PROTO(struct inode *inode, loff_t offset, loff_t len),
+
+ TP_ARGS(inode, offset, len),
+
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(ino_t, ino)
+ __field(loff_t, offset)
+ __field(loff_t, len)
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->offset = offset;
+ __entry->len = len;
+ ),
+
+ TP_printk("dev %d,%d ino %lu offset %lld len %lld",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+ __entry->offset, __entry->len)
+);
+
TRACE_EVENT(ext4_es_shrink,
TP_PROTO(struct super_block *sb, int nr_shrunk, u64 scan_time,
int nr_skipped, int retried),
--
1.7.9.5
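
The right-shift walk in the ext4 patch above can be pictured with a small userspace model. This is only a sketch and not the kernel code: `struct model_extent`, `is_aligned` and `shift_right` are hypothetical names invented for illustration, and the model assumes a flat, sorted array of extents instead of the real extent tree. It shows the power-of-two alignment test applied to offset and len, and why the extents are walked from the last one backwards when shifting right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent: logical start block and length. */
struct model_extent {
	uint32_t start;	/* first logical block */
	uint32_t len;	/* number of blocks */
};

/* Same mask test the patch applies to offset and len; size must be 2^n. */
static int is_aligned(uint64_t value, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (value & (size - 1)) == 0;
}

/*
 * Shift every extent starting at or after @from right by @shift blocks.
 * The array is sorted by start block and walked from the last extent
 * backwards, so a moved extent never lands on one that has not moved yet.
 * Returns the number of extents that were moved.
 */
static size_t shift_right(struct model_extent *ext, size_t nr,
			  uint32_t from, uint32_t shift)
{
	size_t moved = 0;

	while (nr > 0 && ext[nr - 1].start >= from) {
		ext[nr - 1].start += shift;
		nr--;
		moved++;
	}
	return moved;
}
```

With extents starting at blocks 0, 2 and 10, a call such as shift_right(ext, 3, 2, 4) would leave the first extent alone and move the other two to blocks 6 and 14. A left shift (collapse) would walk in the opposite direction for the same reason.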

2015-02-16 15:47:52

by Namjae Jeon

Subject: [PATCH RESEND 5/12] xfstests: generic/042: Standard insert range tests

From: Namjae Jeon <[email protected]>

This test case (042) exercises various corner cases of the finsert range
functionality over different types of extents.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
---
common/punch | 5 ++++
common/rc | 2 +-
tests/generic/042 | 65 +++++++++++++++++++++++++++++++++++++++++
tests/generic/042.out | 78 +++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/group | 1 +
5 files changed, 150 insertions(+), 1 deletion(-)
create mode 100644 tests/generic/042
create mode 100644 tests/generic/042.out

diff --git a/common/punch b/common/punch
index 237b4d8..a75f4cf 100644
--- a/common/punch
+++ b/common/punch
@@ -527,6 +527,11 @@ _test_generic_punch()
return
fi

+ # If zero_cmd is finsert, don't check unaligned offsets
+ if [ "$zero_cmd" == "finsert" ]; then
+ return
+ fi
+
echo " 16. data -> cache cold ->hole"
if [ "$remove_testfile" ]; then
rm -f $testfile
diff --git a/common/rc b/common/rc
index 5377ba0..4388e29 100644
--- a/common/rc
+++ b/common/rc
@@ -1520,7 +1520,7 @@ _require_xfs_io_command()
"falloc" )
testio=`$XFS_IO_PROG -F -f -c "falloc 0 1m" $testfile 2>&1`
;;
- "fpunch" | "fcollapse" | "zero" | "fzero" )
+ "fpunch" | "fcollapse" | "zero" | "fzero" | "finsert" )
testio=`$XFS_IO_PROG -F -f -c "pwrite 0 20k" -c "fsync" \
-c "$command 4k 8k" $testfile 2>&1`
;;
diff --git a/tests/generic/042 b/tests/generic/042
new file mode 100644
index 0000000..9b83e8d
--- /dev/null
+++ b/tests/generic/042
@@ -0,0 +1,65 @@
+#! /bin/bash
+# FS QA Test No. generic/042
+#
+# Standard insert range tests
+# This testcase is one of the 4 testcases which try to
+# test various corner cases for finsert range functionality over different
+# types of extents. These tests are based on the generic/255 test case.
+# For the type of tests, check the description of _test_generic_punch
+# in common/punch.
+#-----------------------------------------------------------------------
+# Copyright (c) 2015 Samsung Electronics. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+
+_cleanup()
+{
+ rm -f $tmp.*
+}
+
+trap "_cleanup ; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+# we need to include common/punch to get definition of filter functions
+. ./common/rc
+. ./common/filter
+. ./common/punch
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+
+_require_xfs_io_command "fpunch"
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "fiemap"
+_require_xfs_io_command "finsert"
+
+testfile=$TEST_DIR/$seq.$$
+
+_test_generic_punch falloc fpunch finsert fiemap _filter_hole_fiemap $testfile
+_check_test_fs
+
+status=0
+exit
diff --git a/tests/generic/042.out b/tests/generic/042.out
new file mode 100644
index 0000000..2406d71
--- /dev/null
+++ b/tests/generic/042.out
@@ -0,0 +1,78 @@
+QA output created by 042
+ 1. into a hole
+cf845a781c107ec1346e849c9dd1b7e8
+ 2. into allocated space
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..55]: extent
+64e72217eebcbdf31b1b058f9f5f476a
+ 3. into unwritten space
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..55]: extent
+cf845a781c107ec1346e849c9dd1b7e8
+ 4. hole -> data
+0: [0..31]: hole
+1: [32..47]: extent
+2: [48..55]: hole
+adb08a6d94a3b5eff90fdfebb2366d31
+ 5. hole -> unwritten
+0: [0..31]: hole
+1: [32..47]: extent
+2: [48..55]: hole
+cf845a781c107ec1346e849c9dd1b7e8
+ 6. data -> hole
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..31]: extent
+3: [32..55]: hole
+be0f35d4292a20040766d87883b0abd1
+ 7. data -> unwritten
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..47]: extent
+3: [48..55]: hole
+be0f35d4292a20040766d87883b0abd1
+ 8. unwritten -> hole
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..31]: extent
+3: [32..55]: hole
+cf845a781c107ec1346e849c9dd1b7e8
+ 9. unwritten -> data
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..47]: extent
+3: [48..55]: hole
+adb08a6d94a3b5eff90fdfebb2366d31
+ 10. hole -> data -> hole
+0: [0..39]: hole
+1: [40..47]: extent
+2: [48..63]: hole
+0487b3c52810f994c541aa166215375f
+ 11. data -> hole -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..39]: extent
+3: [40..47]: hole
+4: [48..63]: extent
+e3a8d52acc4d91a8ed19d7b6f4f26a71
+ 12. unwritten -> data -> unwritten
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..63]: extent
+0487b3c52810f994c541aa166215375f
+ 13. data -> unwritten -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..63]: extent
+2b22165f4a24a2c36fd05ef00b41df88
+ 14. data -> hole @ EOF
+0: [0..23]: extent
+1: [24..39]: hole
+2: [40..55]: extent
+aa0f20d1edcdbce60d8ef82700ba30c3
+ 15. data -> hole @ 0
+0: [0..15]: hole
+1: [16..55]: extent
+86c9d033be2761385c9cfa203c426bb2
diff --git a/tests/generic/group b/tests/generic/group
index fb67b57..0d41c72 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -44,6 +44,7 @@
039 metadata auto quick
040 metadata auto quick
041 metadata auto quick
+042 auto quick prealloc
053 acl repair auto quick
062 attr udf auto quick
068 other auto freeze dangerous stress
--
1.7.9.5

_______________________________________________
xfs mailing list
[email protected]
http://oss.sgi.com/mailman/listinfo/xfs
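
The extent maps in 042.out are printed in 512-byte units, so the "into allocated space" case can be checked by hand. The sketch below is a hypothetical model, not part of xfstests: `struct model_range` and `model_finsert` are made-up names, and the 20k file with an 8k hole inserted at 4k is an assumption borrowed from the `_require_xfs_io_command` probe in this patch (`pwrite 0 20k`, `$command 4k 8k`). Under that assumption it reproduces the [0..7] / [8..23] / [24..55] layout listed in the golden output.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SECTOR 512u	/* unit used in the fiemap listings */

struct model_range {
	uint64_t first;	/* first 512-byte sector */
	uint64_t last;	/* last 512-byte sector, inclusive */
	int is_hole;
};

/*
 * Layout of a fully allocated file of @size bytes after a hole of @len
 * bytes is inserted at @offset. Fills @out with head, hole and tail, and
 * returns the new file size in bytes. The model only covers an insertion
 * strictly inside the file.
 */
static uint64_t model_finsert(uint64_t size, uint64_t offset, uint64_t len,
			      struct model_range out[3])
{
	assert(offset > 0 && offset < size && len > 0);
	assert(size % MODEL_SECTOR == 0);
	assert(offset % MODEL_SECTOR == 0 && len % MODEL_SECTOR == 0);

	/* Data before the insertion point stays where it was. */
	out[0].first = 0;
	out[0].last = offset / MODEL_SECTOR - 1;
	out[0].is_hole = 0;

	/* The inserted range shows up as a hole. */
	out[1].first = offset / MODEL_SECTOR;
	out[1].last = (offset + len) / MODEL_SECTOR - 1;
	out[1].is_hole = 1;

	/* Everything from offset to EOF moves right by len bytes. */
	out[2].first = (offset + len) / MODEL_SECTOR;
	out[2].last = (size + len) / MODEL_SECTOR - 1;
	out[2].is_hole = 0;

	return size + len;
}
```

Calling model_finsert(20480, 4096, 8192, out) gives a 28672-byte file: 8 sectors of data, a 16-sector hole, then the remaining 32 sectors of data shifted right.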

2015-02-16 15:47:53

by Namjae Jeon

Subject: [PATCH RESEND 6/12] xfstests: generic/043: Delayed allocation insert range

From: Namjae Jeon <[email protected]>

This test case (043) exercises various corner cases of the finsert range
functionality with delayed extents, over different types of extents.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
---
tests/generic/043 | 65 +++++++++++++++++++++++++++++++++++++++++
tests/generic/043.out | 78 +++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/group | 1 +
3 files changed, 144 insertions(+)
create mode 100644 tests/generic/043
create mode 100644 tests/generic/043.out

diff --git a/tests/generic/043 b/tests/generic/043
new file mode 100644
index 0000000..e70644d
--- /dev/null
+++ b/tests/generic/043
@@ -0,0 +1,65 @@
+#! /bin/bash
+# FS QA Test No. generic/043
+#
+# Delayed allocation insert range tests
+# This testcase is one of the 4 testcases which try to
+# test various corner cases for finsert range functionality over different
+# types of extents. These tests are based on the generic/255 test case.
+# For the type of tests, check the description of _test_generic_punch
+# in common/punch.
+#-----------------------------------------------------------------------
+# Copyright (c) 2015 Samsung Electronics. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+
+_cleanup()
+{
+ rm -f $tmp.*
+}
+
+trap "_cleanup ; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+# we need to include common/punch to get definition of filter functions
+. ./common/rc
+. ./common/filter
+. ./common/punch
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+
+_require_xfs_io_command "fpunch"
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "fiemap"
+_require_xfs_io_command "finsert"
+
+testfile=$TEST_DIR/$seq.$$
+
+_test_generic_punch -d falloc fpunch finsert fiemap _filter_hole_fiemap $testfile
+_check_test_fs
+
+status=0
+exit
diff --git a/tests/generic/043.out b/tests/generic/043.out
new file mode 100644
index 0000000..817ed09
--- /dev/null
+++ b/tests/generic/043.out
@@ -0,0 +1,78 @@
+QA output created by 043
+ 1. into a hole
+cf845a781c107ec1346e849c9dd1b7e8
+ 2. into allocated space
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..55]: extent
+64e72217eebcbdf31b1b058f9f5f476a
+ 3. into unwritten space
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..55]: extent
+cf845a781c107ec1346e849c9dd1b7e8
+ 4. hole -> data
+0: [0..31]: hole
+1: [32..47]: extent
+2: [48..55]: hole
+adb08a6d94a3b5eff90fdfebb2366d31
+ 5. hole -> unwritten
+0: [0..31]: hole
+1: [32..47]: extent
+2: [48..55]: hole
+cf845a781c107ec1346e849c9dd1b7e8
+ 6. data -> hole
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..31]: extent
+3: [32..55]: hole
+be0f35d4292a20040766d87883b0abd1
+ 7. data -> unwritten
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..47]: extent
+3: [48..55]: hole
+be0f35d4292a20040766d87883b0abd1
+ 8. unwritten -> hole
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..31]: extent
+3: [32..55]: hole
+cf845a781c107ec1346e849c9dd1b7e8
+ 9. unwritten -> data
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..47]: extent
+3: [48..55]: hole
+adb08a6d94a3b5eff90fdfebb2366d31
+ 10. hole -> data -> hole
+0: [0..39]: hole
+1: [40..47]: extent
+2: [48..63]: hole
+0487b3c52810f994c541aa166215375f
+ 11. data -> hole -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..39]: extent
+3: [40..47]: hole
+4: [48..63]: extent
+e3a8d52acc4d91a8ed19d7b6f4f26a71
+ 12. unwritten -> data -> unwritten
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..63]: extent
+0487b3c52810f994c541aa166215375f
+ 13. data -> unwritten -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..63]: extent
+2b22165f4a24a2c36fd05ef00b41df88
+ 14. data -> hole @ EOF
+0: [0..23]: extent
+1: [24..39]: hole
+2: [40..55]: extent
+aa0f20d1edcdbce60d8ef82700ba30c3
+ 15. data -> hole @ 0
+0: [0..15]: hole
+1: [16..55]: extent
+86c9d033be2761385c9cfa203c426bb2
diff --git a/tests/generic/group b/tests/generic/group
index 0d41c72..c2156a1 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -45,6 +45,7 @@
040 metadata auto quick
041 metadata auto quick
042 auto quick prealloc
+043 auto quick prealloc
053 acl repair auto quick
062 attr udf auto quick
068 other auto freeze dangerous stress
--
1.7.9.5

2015-02-16 15:47:54

by Namjae Jeon

Subject: [PATCH RESEND 7/12] xfstests: generic/044: Multi insert range tests

From: Namjae Jeon <[email protected]>

This test case (044) exercises various corner cases of the finsert range
functionality with pre-existing holes, over different types of extents.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
---
tests/generic/044 | 65 ++++++++++++++++++++++++++++++++++++++++
tests/generic/044.out | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/group | 1 +
3 files changed, 146 insertions(+)
create mode 100644 tests/generic/044
create mode 100644 tests/generic/044.out

diff --git a/tests/generic/044 b/tests/generic/044
new file mode 100644
index 0000000..4d6be1b
--- /dev/null
+++ b/tests/generic/044
@@ -0,0 +1,65 @@
+#! /bin/bash
+# FS QA Test No. generic/044
+#
+# Multi insert range tests
+# This testcase is one of the 4 testcases which try to
+# test various corner cases for finsert range functionality over different
+# types of extents. These tests are based on the generic/255 test case.
+# For the type of tests, check the description of _test_generic_punch
+# in common/punch.
+#-----------------------------------------------------------------------
+# Copyright (c) 2015 Samsung Electronics. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+
+_cleanup()
+{
+ rm -f $tmp.*
+}
+
+trap "_cleanup ; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+# we need to include common/punch to get definition of filter functions
+. ./common/rc
+. ./common/filter
+. ./common/punch
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+
+_require_xfs_io_command "fpunch"
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "fiemap"
+_require_xfs_io_command "finsert"
+
+testfile=$TEST_DIR/$seq.$$
+
+_test_generic_punch -k falloc fpunch finsert fiemap _filter_hole_fiemap $testfile
+_check_test_fs
+
+status=0
+exit
diff --git a/tests/generic/044.out b/tests/generic/044.out
new file mode 100644
index 0000000..4ddfb65
--- /dev/null
+++ b/tests/generic/044.out
@@ -0,0 +1,80 @@
+QA output created by 044
+ 1. into a hole
+cf845a781c107ec1346e849c9dd1b7e8
+ 2. into allocated space
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..55]: extent
+64e72217eebcbdf31b1b058f9f5f476a
+ 3. into unwritten space
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..55]: extent
+22b7303d274481990b5401b6263effe0
+ 4. hole -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..55]: extent
+c4fef62ba1de9d91a977cfeec6632f19
+ 5. hole -> unwritten
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..55]: extent
+1ca74f7572a0f4ab477fdbb5682e5f61
+ 6. data -> hole
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..31]: extent
+3: [32..47]: hole
+4: [48..55]: extent
+be0f35d4292a20040766d87883b0abd1
+ 7. data -> unwritten
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..47]: extent
+3: [48..55]: hole
+bddb1f3895268acce30d516a99cb0f2f
+ 8. unwritten -> hole
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..31]: extent
+3: [32..39]: hole
+4: [40..55]: extent
+f8fc47adc45b7cf72f988b3ddf5bff64
+ 9. unwritten -> data
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..47]: extent
+3: [48..55]: hole
+c4fef62ba1de9d91a977cfeec6632f19
+ 10. hole -> data -> hole
+0: [0..7]: extent
+1: [8..39]: hole
+2: [40..63]: extent
+52af1bfcbf43f28af2328de32e0567e5
+ 11. data -> hole -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..39]: extent
+3: [40..47]: hole
+4: [48..63]: extent
+e3a8d52acc4d91a8ed19d7b6f4f26a71
+ 12. unwritten -> data -> unwritten
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..63]: extent
+52af1bfcbf43f28af2328de32e0567e5
+ 13. data -> unwritten -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..63]: extent
+2b22165f4a24a2c36fd05ef00b41df88
+ 14. data -> hole @ EOF
+0: [0..23]: extent
+1: [24..39]: hole
+2: [40..55]: extent
+aa0f20d1edcdbce60d8ef82700ba30c3
+ 15. data -> hole @ 0
+0: [0..15]: hole
+1: [16..55]: extent
+86c9d033be2761385c9cfa203c426bb2
diff --git a/tests/generic/group b/tests/generic/group
index c2156a1..70444a3 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -46,6 +46,7 @@
041 metadata auto quick
042 auto quick prealloc
043 auto quick prealloc
+044 auto quick prealloc
053 acl repair auto quick
062 attr udf auto quick
068 other auto freeze dangerous stress
--
1.7.9.5

2015-02-16 15:47:56

by Namjae Jeon

Subject: [PATCH RESEND 9/12] xfstests: generic/046: Test multiple fallocate insert/collapse range calls

From: Namjae Jeon <[email protected]>

This test case (046) calls finsert range on a single alternate block
multiple times and then exercises the merge code of collapse range.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
---
tests/generic/046 | 95 +++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/046.out | 2 ++
tests/generic/group | 1 +
3 files changed, 98 insertions(+)
create mode 100644 tests/generic/046
create mode 100644 tests/generic/046.out

diff --git a/tests/generic/046 b/tests/generic/046
new file mode 100644
index 0000000..5d036e0
--- /dev/null
+++ b/tests/generic/046
@@ -0,0 +1,95 @@
+#! /bin/bash
+# FS QA Test No. generic/046
+#
+# Test multiple fallocate insert/collapse range calls on same file.
+# Call insert range on a single alternate block multiple times until the
+# file is left with 100 extents. Then call collapse range on the
+# previously inserted ranges to test the merge code of collapse
+# range. Also check for data integrity and file system consistency.
+#-----------------------------------------------------------------------
+# Copyright (c) 2015 Samsung Electronics. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+trap "rm -f $tmp.*; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+
+_require_scratch
+_require_xfs_io_command "fiemap"
+_require_xfs_io_command "finsert"
+_require_xfs_io_command "fcollapse"
+src=$SCRATCH_MNT/testfile
+dest=$SCRATCH_MNT/testfile.dest
+BLOCKS=100
+BSIZE=`get_block_size $SCRATCH_MNT`
+rm -f $seqres.full
+
+_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
+_scratch_mount || _fail "mount failed"
+length=$(($BLOCKS * $BSIZE))
+
+# Write file
+_do "$XFS_IO_PROG -f -c \"pwrite 0 $length\" -c fsync $src"
+cp $src $dest
+extent_before=`$XFS_IO_PROG -c "fiemap -v" $dest | grep "^ *[0-9]*:" |wc -l`
+
+# Insert alternate blocks
+for (( j=0; j < $(($BLOCKS/2)); j++ )); do
+ offset=$((($j*$BSIZE)*2))
+ _do "$XFS_IO_PROG -c \"finsert $offset $BSIZE\" $dest"
+done
+
+# Check if 100 extents are present
+$XFS_IO_PROG -c "fiemap -v" $dest | grep "^ *[0-9]*:" |wc -l
+
+_check_scratch_fs
+if [ $? -ne 0 ]; then
+ status=1
+ exit
+fi
+
+# Collapse alternate blocks
+for (( j=0; j < $(($BLOCKS/2)); j++ )); do
+ offset=$((($j*$BSIZE)))
+ _do "$XFS_IO_PROG -c \"fcollapse $offset $BSIZE\" $dest"
+done
+
+extent_after=`$XFS_IO_PROG -c "fiemap -v" $dest | grep "^ *[0-9]*:" |wc -l`
+if [ $extent_before -ne $extent_after ]; then
+ echo "extents mismatched before = $extent_before after = $extent_after"
+fi
+
+# compare original file and test file.
+cmp $src $dest || _fail "file bytes check failed"
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/046.out b/tests/generic/046.out
new file mode 100644
index 0000000..2a6a862
--- /dev/null
+++ b/tests/generic/046.out
@@ -0,0 +1,2 @@
+QA output created by 046
+100
diff --git a/tests/generic/group b/tests/generic/group
index 772f910..75e567e 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -48,6 +48,7 @@
043 auto quick prealloc
044 auto quick prealloc
045 auto quick prealloc
+046 auto quick prealloc
053 acl repair auto quick
062 attr udf auto quick
068 other auto freeze dangerous stress
--
1.7.9.5
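
The offsets used by the two loops in generic/046 can be checked with a small model. The sketch below is not part of the test: it treats the file as an array of block numbers and replays the insert and collapse sequence to show that the original layout comes back. The names `insert_block`, `collapse_block` and `round_trip_ok` are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define BLOCKS 100            /* same block count as the test */
#define MAXBLK (BLOCKS * 2)   /* room for the inserted holes */
#define HOLE   (-1)           /* marker for a hole block in this model */

/* Insert one hole block at index pos, shifting the tail right. */
static void insert_block(int *f, int *n, int pos)
{
	assert(*n < MAXBLK && pos <= *n);
	memmove(&f[pos + 1], &f[pos], (size_t)(*n - pos) * sizeof(*f));
	f[pos] = HOLE;
	(*n)++;
}

/* Remove the block at index pos, shifting the tail left. */
static void collapse_block(int *f, int *n, int pos)
{
	assert(pos < *n);
	memmove(&f[pos], &f[pos + 1], (size_t)(*n - pos - 1) * sizeof(*f));
	(*n)--;
}

/* Returns 1 if the insert/collapse sequence restores the original file. */
static int round_trip_ok(void)
{
	int f[MAXBLK], n = BLOCKS, j;

	for (j = 0; j < BLOCKS; j++)
		f[j] = j;			/* block j holds data "j" */

	for (j = 0; j < BLOCKS / 2; j++)
		insert_block(f, &n, 2 * j);	/* offset = j * BSIZE * 2 */
	if (n != BLOCKS + BLOCKS / 2)
		return 0;

	for (j = 0; j < BLOCKS / 2; j++) {
		if (f[j] != HOLE)		/* must only remove holes */
			return 0;
		collapse_block(f, &n, j);	/* offset = j * BSIZE */
	}

	for (j = 0; j < BLOCKS; j++)
		if (f[j] != j)
			return 0;
	return n == BLOCKS;
}
```

In this model the file holds 150 blocks after the insert loop, with a hole in front of each of the first 50 data blocks. Presumably the expected output of 100 is those 50 holes plus 50 data extents, assuming the original data was allocated contiguously.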

2015-02-16 15:49:41

by Namjae Jeon

Subject: [PATCH RESEND 11/12] xfstests: fsx: Add fallocate insert range operation

From: Namjae Jeon <[email protected]>

This commit adds fallocate FALLOC_FL_INSERT_RANGE support for fsx.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
---
ltp/fsx.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 114 insertions(+), 10 deletions(-)

diff --git a/ltp/fsx.c b/ltp/fsx.c
index 3709419..9fed5b2 100644
--- a/ltp/fsx.c
+++ b/ltp/fsx.c
@@ -95,7 +95,8 @@ int logcount = 0; /* total ops */
#define OP_PUNCH_HOLE 6
#define OP_ZERO_RANGE 7
#define OP_COLLAPSE_RANGE 8
-#define OP_MAX_FULL 9
+#define OP_INSERT_RANGE 9
+#define OP_MAX_FULL 10

/* operation modifiers */
#define OP_CLOSEOPEN 100
@@ -145,6 +146,7 @@ int fallocate_calls = 1; /* -F flag disables */
int punch_hole_calls = 1; /* -H flag disables */
int zero_range_calls = 1; /* -z flag disables */
int collapse_range_calls = 1; /* -C flag disables */
+int insert_range_calls = 1; /* -i flag disables */
int mapped_reads = 1; /* -R flag disables it */
int fsxgoodfd = 0;
int o_direct; /* -Z */
@@ -339,6 +341,14 @@ logdump(void)
lp->args[0] + lp->args[1])
prt("\t******CCCC");
break;
+ case OP_INSERT_RANGE:
+ prt("INSERT 0x%x thru 0x%x\t(0x%x bytes)",
+ lp->args[0], lp->args[0] + lp->args[1] - 1,
+ lp->args[1]);
+ if (badoff >= lp->args[0] && badoff <
+ lp->args[0] + lp->args[1])
+ prt("\t******CCCC");
+ break;
case OP_SKIPPED:
prt("SKIPPED (no operation)");
break;
@@ -1012,6 +1022,59 @@ do_collapse_range(unsigned offset, unsigned length)
}
#endif

+#ifdef FALLOC_FL_INSERT_RANGE
+void
+do_insert_range(unsigned offset, unsigned length)
+{
+ unsigned end_offset;
+ int mode = FALLOC_FL_INSERT_RANGE;
+
+ if (length == 0) {
+ if (!quiet && testcalls > simulatedopcount)
+ prt("skipping zero length insert range\n");
+ log4(OP_SKIPPED, OP_INSERT_RANGE, offset, length);
+ return;
+ }
+
+ if ((loff_t)offset >= file_size) {
+ if (!quiet && testcalls > simulatedopcount)
+ prt("skipping insert range behind EOF\n");
+ log4(OP_SKIPPED, OP_INSERT_RANGE, offset, length);
+ return;
+ }
+
+ log4(OP_INSERT_RANGE, offset, length, 0);
+
+ if (testcalls <= simulatedopcount)
+ return;
+
+ end_offset = offset + length;
+ if ((progressinterval && testcalls % progressinterval == 0) ||
+ (debug && (monitorstart == -1 || monitorend == -1 ||
+ end_offset <= monitorend))) {
+ prt("%lu insert\tfrom 0x%x to 0x%x, (0x%x bytes)\n", testcalls,
+ offset, offset+length, length);
+ }
+ if (fallocate(fd, mode, (loff_t)offset, (loff_t)length) == -1) {
+ prt("insert range: %x to %x\n", offset, length);
+ prterr("do_insert_range: fallocate");
+ report_failure(161);
+ }
+
+ memmove(good_buf + end_offset, good_buf + offset,
+ file_size - offset);
+ memset(good_buf + offset, '\0', length);
+ file_size += length;
+}
+
+#else
+void
+do_insert_range(unsigned offset, unsigned length)
+{
+ return;
+}
+#endif
+
#ifdef HAVE_LINUX_FALLOC_H
/* fallocate is basically a no-op unless extending, then a lot like a truncate */
void
@@ -1117,14 +1180,25 @@ docloseopen(void)
}
}

-#define TRIM_OFF_LEN(off, len, size) \
-do { \
- if (size) \
- (off) %= (size); \
- else \
- (off) = 0; \
- if ((off) + (len) > (size)) \
- (len) = (size) - (off); \
+
+#define TRIM_OFF(off, size) \
+do { \
+ if (size) \
+ (off) %= (size); \
+ else \
+ (off) = 0; \
+} while (0)
+
+#define TRIM_LEN(off, len, size) \
+do { \
+ if ((off) + (len) > (size)) \
+ (len) = (size) - (off); \
+} while (0)
+
+#define TRIM_OFF_LEN(off, len, size) \
+do { \
+ TRIM_OFF(off, size); \
+ TRIM_LEN(off, len, size); \
} while (0)

void
@@ -1192,6 +1266,12 @@ test(void)
goto out;
}
break;
+ case OP_INSERT_RANGE:
+ if (!insert_range_calls) {
+ log4(OP_SKIPPED, OP_INSERT_RANGE, offset, size);
+ goto out;
+ }
+ break;
}

switch (op) {
@@ -1244,6 +1324,22 @@ test(void)
}
do_collapse_range(offset, size);
break;
+ case OP_INSERT_RANGE:
+ TRIM_OFF(offset, file_size);
+ TRIM_LEN(file_size, size, maxfilelen);
+ offset = offset & ~(block_size - 1);
+ size = size & ~(block_size - 1);
+ if (size == 0) {
+ log4(OP_SKIPPED, OP_INSERT_RANGE, offset, size);
+ goto out;
+ }
+ if (file_size + size > maxfilelen) {
+ log4(OP_SKIPPED, OP_INSERT_RANGE, offset, size);
+ goto out;
+ }
+
+ do_insert_range(offset, size);
+ break;
default:
prterr("test: unknown operation");
report_failure(42);
@@ -1307,6 +1403,9 @@ usage(void)
#ifdef FALLOC_FL_COLLAPSE_RANGE
" -C: Do not use collapse range calls\n"
#endif
+#ifdef FALLOC_FL_INSERT_RANGE
+" -i: Do not use insert range calls\n"
+#endif
" -L: fsxLite - no file creations & no file size changes\n\
-N numops: total # operations to do (default infinity)\n\
-O: use oplen (see -o flag) for every op (default random)\n\
@@ -1493,7 +1592,7 @@ main(int argc, char **argv)

setvbuf(stdout, (char *)0, _IOLBF, 0); /* line buffered stdout */

- while ((ch = getopt(argc, argv, "b:c:dfl:m:no:p:qr:s:t:w:xyAD:FHzCLN:OP:RS:WZ"))
+ while ((ch = getopt(argc, argv, "b:c:dfl:m:no:p:qr:s:t:w:xyAD:FHzCiLN:OP:RS:WZ"))
!= EOF)
switch (ch) {
case 'b':
@@ -1599,6 +1698,9 @@ main(int argc, char **argv)
case 'C':
collapse_range_calls = 0;
break;
+ case 'i':
+ insert_range_calls = 0;
+ break;
case 'L':
lite = 1;
break;
@@ -1758,6 +1860,8 @@ main(int argc, char **argv)
zero_range_calls = test_fallocate(FALLOC_FL_ZERO_RANGE);
if (collapse_range_calls)
collapse_range_calls = test_fallocate(FALLOC_FL_COLLAPSE_RANGE);
+ if (insert_range_calls)
+ insert_range_calls = test_fallocate(FALLOC_FL_INSERT_RANGE);

while (numops == -1 || numops--)
test();
--
1.7.9.5
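
The bookkeeping at the end of do_insert_range() can be read in isolation: fsx keeps the expected file contents in good_buf and updates it to match what the filesystem should now contain. Below is a minimal sketch of that step as a standalone function. `model_insert_range` and its return convention are assumptions for illustration, not fsx code.

```c
#include <assert.h>
#include <string.h>

/*
 * Model of the expected-contents update in do_insert_range(): buf holds
 * the expected file data and *size the expected file size. buf must have
 * room for *size + length bytes. Returns 0 on success, or -1 if the
 * request is one fsx would skip (zero length, or offset at or after EOF).
 */
static int model_insert_range(char *buf, size_t *size,
			      size_t offset, size_t length)
{
	assert(buf && size);

	if (length == 0 || offset >= *size)
		return -1;

	/* shift [offset, EOF) right by length; the ranges may overlap */
	memmove(buf + offset + length, buf + offset, *size - offset);
	/* the inserted range is a hole and must read back as zeroes */
	memset(buf + offset, 0, length);
	*size += length;
	return 0;
}
```

Inserting 3 bytes at offset 2 of a 6-byte buffer "abcdef" gives a 9-byte buffer holding "ab", three zero bytes, then "cdef".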


2015-02-16 15:47:57

by Namjae Jeon

Subject: [PATCH RESEND 10/12] xfstests: fsstress: Add fallocate insert range operation

From: Namjae Jeon <[email protected]>

This commit adds insert operation support for fsstress, which is
meant to exercise fallocate FALLOC_FL_INSERT_RANGE support.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
---
ltp/fsstress.c | 19 ++++++++++++++++---
src/global.h | 4 ++++
2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/ltp/fsstress.c b/ltp/fsstress.c
index b56fe5c..aa3e0c3 100644
--- a/ltp/fsstress.c
+++ b/ltp/fsstress.c
@@ -72,6 +72,7 @@ typedef enum {
OP_PUNCH,
OP_ZERO,
OP_COLLAPSE,
+ OP_INSERT,
OP_READ,
OP_READLINK,
OP_RENAME,
@@ -170,6 +171,7 @@ void mknod_f(int, long);
void punch_f(int, long);
void zero_f(int, long);
void collapse_f(int, long);
+void insert_f(int, long);
void read_f(int, long);
void readlink_f(int, long);
void rename_f(int, long);
@@ -209,6 +211,7 @@ opdesc_t ops[] = {
{ OP_PUNCH, "punch", punch_f, 1, 1 },
{ OP_ZERO, "zero", zero_f, 1, 1 },
{ OP_COLLAPSE, "collapse", collapse_f, 1, 1 },
+ { OP_INSERT, "insert", insert_f, 1, 1 },
{ OP_READ, "read", read_f, 1, 0 },
{ OP_READLINK, "readlink", readlink_f, 1, 0 },
{ OP_RENAME, "rename", rename_f, 2, 1 },
@@ -2176,6 +2179,7 @@ struct print_flags falloc_flags [] = {
{ FALLOC_FL_NO_HIDE_STALE, "NO_HIDE_STALE"},
{ FALLOC_FL_COLLAPSE_RANGE, "COLLAPSE_RANGE"},
{ FALLOC_FL_ZERO_RANGE, "ZERO_RANGE"},
+ { FALLOC_FL_INSERT_RANGE, "INSERT_RANGE"},
{ -1, NULL}
};

@@ -2227,10 +2231,11 @@ do_fallocate(int opno, long r, int mode)
off %= maxfsize;
len = (off64_t)(random() % (1024 * 1024));
/*
- * Collapse range requires off and len to be block aligned, make it
- * more likely to be the case.
+ * Collapse/insert range requires off and len to be block aligned,
+ * make it more likely to be the case.
*/
- if ((mode & FALLOC_FL_COLLAPSE_RANGE) && (opno % 2)) {
+ if ((mode & (FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_INSERT_RANGE)) &&
+ (opno % 2)) {
off = ((off + stb.st_blksize - 1) & ~(stb.st_blksize - 1));
len = ((len + stb.st_blksize - 1) & ~(stb.st_blksize - 1));
}
@@ -2656,6 +2661,14 @@ collapse_f(int opno, long r)
}

void
+insert_f(int opno, long r)
+{
+#ifdef HAVE_LINUX_FALLOC_H
+ do_fallocate(opno, r, FALLOC_FL_INSERT_RANGE);
+#endif
+}
+
+void
read_f(int opno, long r)
{
char *buf;
diff --git a/src/global.h b/src/global.h
index 8180f66..f63246b 100644
--- a/src/global.h
+++ b/src/global.h
@@ -172,6 +172,10 @@
#define FALLOC_FL_ZERO_RANGE 0x10
#endif

+#ifndef FALLOC_FL_INSERT_RANGE
+#define FALLOC_FL_INSERT_RANGE 0x20
+#endif
+
#endif /* HAVE_LINUX_FALLOC_H */

#endif /* GLOBAL_H */
--
1.7.9.5
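
The fsstress change above and the fsx patch both align with a bit mask, which only works when the block size is a power of two. fsstress rounds up, `(off + blksize - 1) & ~(blksize - 1)`, whereas fsx rounds down, `offset & ~(block_size - 1)`. A small sketch of the two operations follows; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static int is_pow2(uint64_t x)
{
	return x && !(x & (x - 1));
}

/* Round x up to the next multiple of blksize (as fsstress does). */
static uint64_t round_up_blk(uint64_t x, uint64_t blksize)
{
	assert(is_pow2(blksize));
	return (x + blksize - 1) & ~(blksize - 1);
}

/* Round x down to a multiple of blksize (as fsx does). */
static uint64_t round_down_blk(uint64_t x, uint64_t blksize)
{
	assert(is_pow2(blksize));
	return x & ~(blksize - 1);
}
```

Rounding a length down can yield zero, which is why the fsx patch skips the operation when the aligned size is 0. Rounding up never turns a non-zero length into zero.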

2015-02-16 15:47:59

by Namjae Jeon

Subject: [PATCH RESEND 12/12] manpage: update FALLOC_FL_INSERT_RANGE flag in fallocate

From: Namjae Jeon <[email protected]>

Update FALLOC_FL_INSERT_RANGE flag in fallocate.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
---
man2/fallocate.2 | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 82 insertions(+), 6 deletions(-)

diff --git a/man2/fallocate.2 b/man2/fallocate.2
index adf42db..9b3c460 100644
--- a/man2/fallocate.2
+++ b/man2/fallocate.2
@@ -8,7 +8,7 @@
.\" 2011-09-19: Added FALLOC_FL_PUNCH_HOLE
.\" 2011-09-19: Substantial restructuring of the page
.\"
-.TH FALLOCATE 2 2015-01-22 "Linux" "Linux Programmer's Manual"
+.TH FALLOCATE 2 2015-02-14 "Linux" "Linux Programmer's Manual"
.SH NAME
fallocate \- manipulate file space
.SH SYNOPSIS
@@ -225,6 +225,56 @@ XFS (since Linux 3.14)
.IP *
ext4, for extent-based files (since Linux 3.14)
.\" commit b8a8684502a0fc852afa0056c6bb2a9273f6fcc0
+.SS Increasing file space
+.\" TODO: Mention commit id and supporting Linux version
+Specifying the
+.BR FALLOC_FL_INSERT_RANGE
+flag in
+.I mode
+will increase the file space by inserting a hole within the file size without
+overwriting any existing data. The hole will start at
+.I offset
+and continue for
+.I len
+bytes. To insert the hole, the contents of the file starting at
+.I offset
+will be shifted to the right by
+.I len
+bytes. Inserting a hole inside the file will increase the file size by
+.I len
+bytes.
+
+This mode has the same limitation as
+.BR FALLOC_FL_COLLAPSE_RANGE
+regarding the
+granularity of the operation.
+If the granularity requirements are not met,
+.BR fallocate ()
+will fail with the error
+.BR EINVAL .
+If the
+.I offset
+is at or beyond the end of the file, an error is returned.
+For such operations, i.e., inserting a hole at the end of the
+file,
+.BR ftruncate (2)
+should be used.
+In case
+.IR offset + len
+exceeds the maximum file size, errno will be set to
+.BR EFBIG .
+
+No other flags may be specified in
+.IR mode
+in conjunction with
+.BR FALLOC_FL_INSERT_RANGE .
+
+As of Linux XXXX,
+.\" TODO: Mention commit id and supporting Linux version
+.B FALLOC_FL_INSERT_RANGE
+is supported by
+ext4 (only for extent-based files) and XFS.
+
.SH RETURN VALUE
On success,
.BR fallocate ()
@@ -242,6 +292,12 @@ is not a valid file descriptor, or is not opened for writing.
.IR offset + len
exceeds the maximum file size.
.TP
+.B EFBIG
+.I mode
+is
+.BR FALLOC_FL_INSERT_RANGE ,
+and the current file size plus len exceeds the maximum file size.
+.TP
.B EINTR
A signal was caught during execution.
.TP
@@ -270,7 +326,17 @@ reaches or passes the end of the file.
.B EINVAL
.I mode
is
-.BR FALLOC_FL_COLLAPSE_RANGE ,
+.BR FALLOC_FL_INSERT_RANGE
+and the range specified by
+.I offset
+reaches or passes the end of the file.
+.TP
+.B EINVAL
+.I mode
+is
+.BR FALLOC_FL_COLLAPSE_RANGE
+or
+.BR FALLOC_FL_INSERT_RANGE ,
but either
.I offset
or
@@ -279,18 +345,24 @@ is not a multiple of the filesystem block size.
.TP
.B EINVAL
.I mode
-contains both
+contains either of
.B FALLOC_FL_COLLAPSE_RANGE
+or
+.B FALLOC_FL_INSERT_RANGE
and other flags;
no other flags are permitted with
-.BR FALLOC_FL_COLLAPSE_RANGE .
+.BR FALLOC_FL_COLLAPSE_RANGE
+or
+.BR FALLOC_FL_INSERT_RANGE .
.TP
.B EINVAL
.I mode
is
.BR FALLOC_FL_COLLAPSE_RANGE
or
-.BR FALLOC_FL_ZERO_RANGE ,
+.BR FALLOC_FL_ZERO_RANGE
+or
+.BR FALLOC_FL_INSERT_RANGE ,
but the file referred to by
.I fd
is not a regular file.
@@ -342,6 +414,8 @@ specifies
.BR FALLOC_FL_PUNCH_HOLE
or
.BR FALLOC_FL_COLLAPSE_RANGE
+or
+.BR FALLOC_FL_INSERT_RANGE
and
the file referred to by
.I fd
@@ -360,7 +434,9 @@ refers to a pipe or FIFO.
.B ETXTBSY
.I mode
specifies
-.BR FALLOC_FL_COLLAPSE_RANGE ,
+.BR FALLOC_FL_COLLAPSE_RANGE
+or
+.BR FALLOC_FL_INSERT_RANGE ,
but the file referred to by
.IR fd
is currently being executed.
--
1.7.9.5
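
The argument rules the man page text gives for FALLOC_FL_INSERT_RANGE can be summarised as a pre-check. The function below is only a sketch of the documented behaviour, assuming the block size is a power of two. `insert_range_precheck` is a hypothetical name and the order of the checks is a guess; a real filesystem may check in a different order or report other errors first. For EFBIG it follows the ERRORS entry (file size plus len).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 if an insert-range request looks valid according to the
 * man page text, otherwise the errno value the page documents.
 */
static int insert_range_precheck(uint64_t offset, uint64_t len,
				 uint64_t file_size, uint64_t blksize,
				 uint64_t max_file_size)
{
	assert(blksize && !(blksize & (blksize - 1)));
	assert(file_size <= max_file_size);

	if (len == 0)
		return EINVAL;		/* fallocate() rejects len <= 0 */
	if ((offset | len) & (blksize - 1))
		return EINVAL;		/* not a multiple of the block size */
	if (offset >= file_size)
		return EINVAL;		/* at or past EOF: use ftruncate(2) */
	if (len > max_file_size - file_size)
		return EFBIG;		/* file size + len exceeds the limit */
	return 0;
}
```

A caller would run such a check only to produce a clearer diagnostic. The result of fallocate() itself stays authoritative, since the filesystem may also fail with EOPNOTSUPP or other errors.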

2015-02-16 15:47:55

by Namjae Jeon

Subject: [PATCH RESEND 8/12] xfstests: generic/045: Delayed allocation multi insert

From: Namjae Jeon <[email protected]>

This test case (045) exercises various corner cases of the finsert range
functionality with delayed extents and pre-existing holes, over different
types of extents.

Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Ashish Sangwan <[email protected]>
---
tests/generic/045 | 65 ++++++++++++++++++++++++++++++++++++++++
tests/generic/045.out | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
tests/generic/group | 1 +
3 files changed, 146 insertions(+)
create mode 100644 tests/generic/045
create mode 100644 tests/generic/045.out

diff --git a/tests/generic/045 b/tests/generic/045
new file mode 100644
index 0000000..de4de02
--- /dev/null
+++ b/tests/generic/045
@@ -0,0 +1,65 @@
+#! /bin/bash
+# FS QA Test No. generic/045
+#
+# Delayed allocation multi insert range tests
+# This test case is one of the 4 test cases that
+# test various corner cases of the finsert range functionality over different
+# types of extents. These tests are based on the generic/255 test case.
+# For the type of tests, check the description of _test_generic_punch
+# in common/rc.
+#-----------------------------------------------------------------------
+# Copyright (c) 2015 Samsung Electronics. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+
+_cleanup()
+{
+ rm -f $tmp.*
+}
+
+trap "_cleanup ; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+# we need to include common/punch to get the definitions of the filter functions
+. ./common/rc
+. ./common/filter
+. ./common/punch
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+
+_require_xfs_io_command "fpunch"
+_require_xfs_io_command "falloc"
+_require_xfs_io_command "fiemap"
+_require_xfs_io_command "finsert"
+
+testfile=$TEST_DIR/$seq.$$
+
+_test_generic_punch -d -k falloc fpunch finsert fiemap _filter_hole_fiemap $testfile
+_check_test_fs
+
+status=0
+exit
diff --git a/tests/generic/045.out b/tests/generic/045.out
new file mode 100644
index 0000000..5bfd760
--- /dev/null
+++ b/tests/generic/045.out
@@ -0,0 +1,80 @@
+QA output created by 045
+ 1. into a hole
+cf845a781c107ec1346e849c9dd1b7e8
+ 2. into allocated space
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..55]: extent
+64e72217eebcbdf31b1b058f9f5f476a
+ 3. into unwritten space
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..55]: extent
+22b7303d274481990b5401b6263effe0
+ 4. hole -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..55]: extent
+c4fef62ba1de9d91a977cfeec6632f19
+ 5. hole -> unwritten
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..55]: extent
+1ca74f7572a0f4ab477fdbb5682e5f61
+ 6. data -> hole
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..31]: extent
+3: [32..47]: hole
+4: [48..55]: extent
+be0f35d4292a20040766d87883b0abd1
+ 7. data -> unwritten
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..47]: extent
+3: [48..55]: hole
+bddb1f3895268acce30d516a99cb0f2f
+ 8. unwritten -> hole
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..31]: extent
+3: [32..39]: hole
+4: [40..55]: extent
+f8fc47adc45b7cf72f988b3ddf5bff64
+ 9. unwritten -> data
+0: [0..7]: extent
+1: [8..23]: hole
+2: [24..47]: extent
+3: [48..55]: hole
+c4fef62ba1de9d91a977cfeec6632f19
+ 10. hole -> data -> hole
+0: [0..7]: extent
+1: [8..39]: hole
+2: [40..63]: extent
+52af1bfcbf43f28af2328de32e0567e5
+ 11. data -> hole -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..39]: extent
+3: [40..47]: hole
+4: [48..63]: extent
+e3a8d52acc4d91a8ed19d7b6f4f26a71
+ 12. unwritten -> data -> unwritten
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..63]: extent
+52af1bfcbf43f28af2328de32e0567e5
+ 13. data -> unwritten -> data
+0: [0..7]: extent
+1: [8..31]: hole
+2: [32..63]: extent
+2b22165f4a24a2c36fd05ef00b41df88
+ 14. data -> hole @ EOF
+0: [0..23]: extent
+1: [24..39]: hole
+2: [40..55]: extent
+aa0f20d1edcdbce60d8ef82700ba30c3
+ 15. data -> hole @ 0
+0: [0..15]: hole
+1: [16..55]: extent
+86c9d033be2761385c9cfa203c426bb2
diff --git a/tests/generic/group b/tests/generic/group
index 70444a3..772f910 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -47,6 +47,7 @@
042 auto quick prealloc
043 auto quick prealloc
044 auto quick prealloc
+045 auto quick prealloc
053 acl repair auto quick
062 attr udf auto quick
068 other auto freeze dangerous stress
--
1.7.9.5

_______________________________________________
xfs mailing list
[email protected]
http://oss.sgi.com/mailman/listinfo/xfs

2015-02-16 23:53:50

by Dave Chinner

Subject: Re: [PATCH RESEND 1/12] fs: Add support FALLOC_FL_INSERT_RANGE for fallocate

On Tue, Feb 17, 2015 at 12:47:48AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <[email protected]>
>
> The FALLOC_FL_INSERT_RANGE command is the opposite of
> FALLOC_FL_COLLAPSE_RANGE. It is needed by advertisers or anyone who wants to
> add some data in the middle of a file. FALLOC_FL_INSERT_RANGE creates space
> for writing new data within a file by shifting extents to the right by the
> given length. This command has the same limitations as
> FALLOC_FL_COLLAPSE_RANGE: offset and len must be block aligned, and
> ftruncate(2) should be used when the range crosses EOF.
>
> Signed-off-by: Namjae Jeon <[email protected]>
> Signed-off-by: Ashish Sangwan <[email protected]>
> Cc: Brian Foster<[email protected]>
> ---
> fs/open.c | 8 +++++++-
> include/uapi/linux/falloc.h | 17 +++++++++++++++++
> 2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/fs/open.c b/fs/open.c
> index 813be03..762fb45 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -232,7 +232,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
>
> /* Return error if mode is not supported */
> if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> - FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
> + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> + FALLOC_FL_INSERT_RANGE))
> return -EOPNOTSUPP;

Can we create a FALLOC_FL_SUPPORTED_MASK define in falloc.h
so that we only need to add new flags to the mask rather than
change this code every time we add a new flag?
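
A rough sketch of what such a mask could look like, written as standalone userspace C rather than kernel code. The flag values follow include/uapi/linux/falloc.h, with FALLOC_FL_INSERT_RANGE taken from this series. `check_falloc_mode` is a made-up helper standing in for the check in vfs_fallocate().

```c
#include <assert.h>
#include <errno.h>

#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE		0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE		0x02
#endif
#ifndef FALLOC_FL_NO_HIDE_STALE
#define FALLOC_FL_NO_HIDE_STALE		0x04
#endif
#ifndef FALLOC_FL_COLLAPSE_RANGE
#define FALLOC_FL_COLLAPSE_RANGE	0x08
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE		0x10
#endif
#ifndef FALLOC_FL_INSERT_RANGE
#define FALLOC_FL_INSERT_RANGE		0x20
#endif

/* One place to extend when a new flag is added. */
#define FALLOC_FL_SUPPORTED_MASK	(FALLOC_FL_KEEP_SIZE |		\
					 FALLOC_FL_PUNCH_HOLE |		\
					 FALLOC_FL_COLLAPSE_RANGE |	\
					 FALLOC_FL_ZERO_RANGE |		\
					 FALLOC_FL_INSERT_RANGE)

/* The five flags accepted by the quoted check, and nothing else. */
static_assert(FALLOC_FL_SUPPORTED_MASK == 0x3b,
	      "mask should cover exactly the supported flags");

/* Return -EOPNOTSUPP if mode contains any flag outside the mask. */
static int check_falloc_mode(int mode)
{
	if (mode & ~FALLOC_FL_SUPPORTED_MASK)
		return -EOPNOTSUPP;
	return 0;
}
```

With this, the check in fs/open.c would not need to change when a flag is added; only the mask definition would.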

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-02-17 00:54:08

by Dave Chinner

Subject: Re: [PATCH RESEND 2/12] xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate

On Tue, Feb 17, 2015 at 12:47:49AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <[email protected]>
>
> This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS.
>
> 1) Make sure that both offset and len are block size aligned.
> 2) Update the i_size of inode by len bytes.
> 3) Compute the file's logical block number against offset. If the computed
> block number is not the starting block of the extent, split the extent
> such that the block number is the starting block of the extent.
> 4) Shift all the extents which are lying between [offset, last allocated extent]
> towards right by len bytes. This step will make a hole of len bytes
> at offset.
>
> Signed-off-by: Namjae Jeon <[email protected]>
> Signed-off-by: Ashish Sangwan <[email protected]>
> Reviewed-by: Brian Foster <[email protected]>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 358 ++++++++++++++++++++++++++++++++++++++++------
> fs/xfs/libxfs/xfs_bmap.h | 13 +-
> fs/xfs/xfs_bmap_util.c | 126 +++++++++++-----
> fs/xfs/xfs_bmap_util.h | 2 +
> fs/xfs/xfs_file.c | 38 ++++-
> fs/xfs/xfs_trace.h | 1 +
> 6 files changed, 455 insertions(+), 83 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 61ec015..6699e53 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -5518,50 +5518,86 @@ xfs_bmse_shift_one(
> int *current_ext,
> struct xfs_bmbt_rec_host *gotp,
> struct xfs_btree_cur *cur,
> - int *logflags)
> + int *logflags,
> + enum SHIFT_DIRECTION SHIFT)

Please don't shout. ;)

Lower case for types and variables, upper case for the enum values.
I also think the "shift" variable should be named "direction",
too, so the code reads "if (direction == SHIFT_LEFT)" and so is
clearly self documenting...

(only commenting once on this, please change it in other places
as well ;)
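
To illustrate the naming, a small standalone sketch: a lower-case enum type, upper-case values, and a parameter called "direction". `next_extent_index` is a hypothetical helper that mirrors the index update at the end of xfs_bmse_shift_one(); it is not code from the patch.

```c
#include <assert.h>

enum shift_direction {
	SHIFT_LEFT = 0,
	SHIFT_RIGHT,
};

/* Index of the next extent to process after shifting current_ext. */
static int next_extent_index(int current_ext, enum shift_direction direction)
{
	assert(direction == SHIFT_LEFT || direction == SHIFT_RIGHT);

	if (direction == SHIFT_LEFT)
		return current_ext + 1;	/* collapse walks towards EOF */
	return current_ext - 1;		/* insert walks back from the last extent */
}
```

Written this way, the condition reads as a sentence, and the enum values stand out from the variable that carries them.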

> {
> struct xfs_ifork *ifp;
> xfs_fileoff_t startoff;
> - struct xfs_bmbt_rec_host *leftp;
> + struct xfs_bmbt_rec_host *contp;
> struct xfs_bmbt_irec got;
> - struct xfs_bmbt_irec left;
> + struct xfs_bmbt_irec cont;

Not sure what "cont" is short for. It's used as the "adjacent
extent" record, so that would be a better name IMO.

> int error;
> int i;
> + int total_extents;
>
> ifp = XFS_IFORK_PTR(ip, whichfork);
> + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
>
> xfs_bmbt_get_all(gotp, &got);
> - startoff = got.br_startoff - offset_shift_fsb;
>
> /* delalloc extents should be prevented by caller */
> XFS_WANT_CORRUPTED_RETURN(!isnullstartblock(got.br_startblock));
>
> - /*
> - * Check for merge if we've got an extent to the left, otherwise make
> - * sure there's enough room at the start of the file for the shift.
> - */
> - if (*current_ext) {
> - /* grab the left extent and check for a large enough hole */
> - leftp = xfs_iext_get_ext(ifp, *current_ext - 1);
> - xfs_bmbt_get_all(leftp, &left);
> + if (SHIFT == SHIFT_LEFT) {
> + startoff = got.br_startoff - offset_shift_fsb;
>
> - if (startoff < left.br_startoff + left.br_blockcount)
> + /*
> + * Check for merge if we've got an extent to the left,
> + * otherwise make sure there's enough room at the start
> + * of the file for the shift.
> + */
> + if (*current_ext) {
> + /*
> + * grab the left extent and check for a large
> + * enough hole.
> + */
> + contp = xfs_iext_get_ext(ifp, *current_ext - 1);
> + xfs_bmbt_get_all(contp, &cont);
> +
> + if (startoff < cont.br_startoff + cont.br_blockcount)
> + return -EINVAL;
> +
> + /* check whether to merge the extent or shift it down */
> + if (xfs_bmse_can_merge(&cont, &got, offset_shift_fsb)) {
> + return xfs_bmse_merge(ip, whichfork,
> + offset_shift_fsb,
> + *current_ext, gotp, contp,
> + cur, logflags);
> + }
> + } else if (got.br_startoff < offset_shift_fsb)
> return -EINVAL;

This would be better written:

if (!*current_ext) {
if (got.br_startoff < offset_shift_fsb)
return -EINVAL;
goto update_current_ext;
}

and then the rest of the code in the shift left branch can drop a
level of indent and hence become less congested and easier to read.


> + } else {
> + startoff = got.br_startoff + offset_shift_fsb;
> + /*
> + * If this is not the last extent in the file, make sure there's
> + * enough room between current extent and next extent for
> + * accommodating the shift.
> + */
> + if (*current_ext < (total_extents - 1)) {
> + contp = xfs_iext_get_ext(ifp, *current_ext + 1);
> + xfs_bmbt_get_all(contp, &cont);
> + if (startoff + got.br_blockcount > cont.br_startoff)
> + return -EINVAL;
>
> - /* check whether to merge the extent or shift it down */
> - if (xfs_bmse_can_merge(&left, &got, offset_shift_fsb)) {
> - return xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
> - *current_ext, gotp, leftp, cur,
> - logflags);
> + /*
> + * Unlike a left shift (which involves a hole punch),
> + * a right shift does not modify extent neighbors
> + * in any way. We should never find mergeable extents
> + * in this scenario. Check anyways and warn if we
> + * encounter two extents that could be one.
> + */
> + if (xfs_bmse_can_merge(&got, &cont, offset_shift_fsb))
> + WARN_ON_ONCE(1);
> }

Similarly:
/* nothing to move if this is the last extent */
if (*current_ext >= total_extents)
goto update_current_ext;

> - } else if (got.br_startoff < offset_shift_fsb)
> - return -EINVAL;
> -
> + }
> /*
> * Increment the extent index for the next iteration, update the start
> * offset of the in-core extent and update the btree if applicable.
> */
> - (*current_ext)++;

update_current_ext:
> + if (SHIFT == SHIFT_LEFT)
> + (*current_ext)++;
> + else
> + (*current_ext)--;
> xfs_bmbt_set_startoff(gotp, startoff);
> *logflags |= XFS_ILOG_CORE;
> if (!cur) {
> @@ -5581,10 +5617,10 @@ xfs_bmse_shift_one(
> }
>
> /*
> - * Shift extent records to the left to cover a hole.
> + * Shift extent records to the left/right to cover/create a hole.
> *
> * The maximum number of extents to be shifted in a single operation is
> - * @num_exts. @start_fsb specifies the file offset to start the shift and the
> + * @num_exts. @stop_fsb specifies the file offset at which to stop shift and the
> * file offset where we've left off is returned in @next_fsb. @offset_shift_fsb
> * is the length by which each extent is shifted. If there is no hole to shift
> * the extents into, this will be considered invalid operation and we abort
> @@ -5594,12 +5630,13 @@ int
> xfs_bmap_shift_extents(
> struct xfs_trans *tp,
> struct xfs_inode *ip,
> - xfs_fileoff_t start_fsb,
> + xfs_fileoff_t *next_fsb,
> xfs_fileoff_t offset_shift_fsb,
> int *done,
> - xfs_fileoff_t *next_fsb,
> + xfs_fileoff_t stop_fsb,
> xfs_fsblock_t *firstblock,
> struct xfs_bmap_free *flist,
> + enum SHIFT_DIRECTION SHIFT,
> int num_exts)
> {
> struct xfs_btree_cur *cur = NULL;
> @@ -5609,10 +5646,11 @@ xfs_bmap_shift_extents(
> struct xfs_ifork *ifp;
> xfs_extnum_t nexts = 0;
> xfs_extnum_t current_ext;
> + xfs_extnum_t total_extents;
> + xfs_extnum_t stop_extent;
> int error = 0;
> int whichfork = XFS_DATA_FORK;
> int logflags = 0;
> - int total_extents;
>
> if (unlikely(XFS_TEST_ERROR(
> (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> @@ -5628,6 +5666,7 @@ xfs_bmap_shift_extents(
>
> ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> + ASSERT(SHIFT == SHIFT_LEFT || SHIFT == SHIFT_RIGHT);
>
> ifp = XFS_IFORK_PTR(ip, whichfork);
> if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> @@ -5645,43 +5684,85 @@ xfs_bmap_shift_extents(
> }
>
> /*
> + * There may be delalloc extents in the data fork before the range we
> + * are collapsing out, so we cannot use the count of real extents here.
> + * Instead we have to calculate it from the incore fork.
> + */
> + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> + if (total_extents == 0) {
> + *done = 1;
> + goto del_cursor;
> + }
> +
> + /*
> + * In case of first right shift, we need to initialize next_fsb
> + */
> + if (*next_fsb == NULLFSBLOCK) {
> + ASSERT(SHIFT == SHIFT_RIGHT);

This should be at the top of the function. i.e.

ASSERT(*next_fsb != NULLFSBLOCK || direction == SHIFT_RIGHT)

> + gotp = xfs_iext_get_ext(ifp, total_extents - 1);
> + xfs_bmbt_get_all(gotp, &got);
> + *next_fsb = got.br_startoff;
> + if (stop_fsb > *next_fsb) {
> + *done = 1;
> + goto del_cursor;
> + }
> + }
> +
> + /* Lookup the extent index at which we have to stop */
> + if (SHIFT == SHIFT_RIGHT) {
> + gotp = xfs_iext_bno_to_ext(ifp, stop_fsb, &stop_extent);
> + /* Make stop_extent exclusive of shift range */
> + stop_extent--;
> + } else
> + stop_extent = total_extents;
> +
> + /*
> * Look up the extent index for the fsb where we start shifting. We can
> * henceforth iterate with current_ext as extent list changes are locked
> * out via ilock.
> *
> * gotp can be null in 2 cases: 1) if there are no extents or 2)
> - * start_fsb lies in a hole beyond which there are no extents. Either
> + * *next_fsb lies in a hole beyond which there are no extents. Either
> * way, we are done.
> */
> - gotp = xfs_iext_bno_to_ext(ifp, start_fsb, &current_ext);
> + gotp = xfs_iext_bno_to_ext(ifp, *next_fsb, &current_ext);
> if (!gotp) {
> *done = 1;
> goto del_cursor;
> }
>
> - /*
> - * There may be delalloc extents in the data fork before the range we
> - * are collapsing out, so we cannot use the count of real extents here.
> - * Instead we have to calculate it from the incore fork.
> - */
> - total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> - while (nexts++ < num_exts && current_ext < total_extents) {
> + /* some sanity checking before we finally start shifting extents */
> + if ((SHIFT == SHIFT_LEFT && current_ext >= stop_extent) ||
> + (SHIFT == SHIFT_RIGHT && current_ext <= stop_extent)) {
> + error = EIO;

error = -EIO;

> + goto del_cursor;
> + }
> +
> + while (nexts++ < num_exts) {
> error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb,
> - &current_ext, gotp, cur, &logflags);
> + &current_ext, gotp, cur, &logflags,
> + SHIFT);
> if (error)
> goto del_cursor;
> + /*
> + * In case there was an extent merge after shifting extent,
> + * extent numbers would change.
> + * Update total extent count and grab the next record.
> + */

/*
* If there was an extent merge during the shift, the extent
* count can change. Update the total and grab the next record.
*/

> + if (SHIFT == SHIFT_LEFT) {
> + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> + stop_extent = total_extents;
> + }
>
> - /* update total extent count and grab the next record */
> - total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> - if (current_ext >= total_extents)
> + if (current_ext == stop_extent) {
> + *done = 1;
> + *next_fsb = NULLFSBLOCK;
> break;
> + }
> gotp = xfs_iext_get_ext(ifp, current_ext);
> }
>
> - /* Check if we are done */
> - if (current_ext == total_extents) {
> - *done = 1;
> - } else if (next_fsb) {
> + if (!*done) {
> xfs_bmbt_get_all(gotp, &got);
> *next_fsb = got.br_startoff;
> }
> @@ -5696,3 +5777,192 @@ del_cursor:
>
> return error;
> }
> +
> +/*
> + * Split an extent into two at block split_fsb, so that split_fsb becomes
> + * the first block of the second extent. @current_ext is the target extent
> + * to be split. @split_fsb is the block at which the extent is split.
> + * If split_fsb lies in a hole or is the first block of an extent, return 0.
> + */
> +STATIC int
> +xfs_bmap_split_extent_at(
> + struct xfs_trans *tp,
> + struct xfs_inode *ip,
> + xfs_fileoff_t split_fsb,
> + xfs_fsblock_t *firstfsb,
> + struct xfs_bmap_free *free_list)
> +{
> + int whichfork = XFS_DATA_FORK;
> + struct xfs_btree_cur *cur = NULL;
> + struct xfs_bmbt_rec_host *gotp;
> + struct xfs_bmbt_irec got;
> + struct xfs_bmbt_irec new; /* split extent */
> + struct xfs_mount *mp = ip->i_mount;
> + struct xfs_ifork *ifp;
> + xfs_fsblock_t gotblkcnt; /* new block count for got */
> + xfs_extnum_t current_ext;
> + int error = 0;
> + int logflags = 0;
> + int i = 0;
> +
> + if (unlikely(XFS_TEST_ERROR(
> + (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> + XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
> + mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
> + XFS_ERROR_REPORT("xfs_bmap_split_extent_at",
> + XFS_ERRLEVEL_LOW, mp);
> + return -EFSCORRUPTED;
> + }
> +
> + if (XFS_FORCED_SHUTDOWN(mp))
> + return -EIO;
> +
> + ifp = XFS_IFORK_PTR(ip, whichfork);
> + if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> + /* Read in all the extents */
> + error = xfs_iread_extents(tp, ip, whichfork);
> + if (error)
> + return error;
> + }
> +
> + gotp = xfs_iext_bno_to_ext(ifp, split_fsb, &current_ext);
> + /*
> + * gotp can be null in 2 cases: 1) if there are no extents
> + * or 2) split_fsb lies in a hole beyond which there are
> + * no extents. Either way, we are done.
> + */
> + if (!gotp)
> + return 0;

Comment can go before the call to xfs_iext_bno_to_ext().

> +
> + xfs_bmbt_get_all(gotp, &got);
> +
> + /*
> + * Check split_fsb lies in a hole or the start boundary offset
> + * of the extent.
> + */
> + if (got.br_startoff >= split_fsb)
> + return 0;
> +
> + gotblkcnt = split_fsb - got.br_startoff;
> + new.br_startoff = split_fsb;
> + new.br_startblock = got.br_startblock + gotblkcnt;
> + new.br_blockcount = got.br_blockcount - gotblkcnt;
> + new.br_state = got.br_state;
> +
> + if (ifp->if_flags & XFS_IFBROOT) {
> + cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
> + cur->bc_private.b.firstblock = *firstfsb;
> + cur->bc_private.b.flist = free_list;
> + cur->bc_private.b.flags = 0;
> + }
> +
> + if (cur) {

No need to close the XFS_IFBROOT branch and then check for cur;
we just allocated it inside the XFS_IFBROOT branch!

> + error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
> + got.br_startblock,
> + got.br_blockcount,
> + &i);
> + if (error)
> + goto del_cursor;
> + XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
> + }

....

> @@ -1427,20 +1429,23 @@ xfs_collapse_file_space(
>
> /*
> * Writeback and invalidate cache for the remainder of the file as we're
> - * about to shift down every extent from the collapse range to EOF. The
> - * free of the collapse range above might have already done some of
> - * this, but we shouldn't rely on it to do anything outside of the range
> - * that was freed.
> + * about to shift down every extent from offset to EOF.
> */
> error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
> - offset + len, -1);
> + offset, -1);
> if (error)
> return error;
> error = invalidate_inode_pages2_range(VFS_I(ip)->i_mapping,
> - (offset + len) >> PAGE_CACHE_SHIFT, -1);
> + offset >> PAGE_CACHE_SHIFT, -1);
> if (error)
> return error;
>
> + if (SHIFT == SHIFT_RIGHT) {
> + error = xfs_bmap_split_extent(ip, stop_fsb);
> + if (error)
> + return error;
> + }

This needs a comment explaining why we are splitting an extent here.

> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 1cdba95..222a91a 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -823,11 +823,13 @@ xfs_file_fallocate(
> long error;
> enum xfs_prealloc_flags flags = 0;
> loff_t new_size = 0;
> + int do_file_insert = 0;

bool rather than int.

>
> if (!S_ISREG(inode->i_mode))
> return -EINVAL;
> if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> - FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
> + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> + FALLOC_FL_INSERT_RANGE))
> return -EOPNOTSUPP;

This should use a local define before the function such as:

#define XFS_FALLOC_FL_SUPPORTED \
(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | \
FALLOC_FL_INSERT_RANGE)

This is similar to how we define supported checks for FIEMAP
operations in xfs_vn_fiemap().
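
A minimal userspace sketch of the mask-based check suggested above. The EX_-prefixed names and check_mode() are hypothetical and exist only for illustration; the flag values are assumed to mirror include/uapi/linux/falloc.h and are not taken from this patch.

```c
#include <errno.h>

/* Illustrative flag values, assumed to mirror include/uapi/linux/falloc.h. */
#define EX_FL_KEEP_SIZE		0x01
#define EX_FL_PUNCH_HOLE	0x02
#define EX_FL_COLLAPSE_RANGE	0x08
#define EX_FL_ZERO_RANGE	0x10
#define EX_FL_INSERT_RANGE	0x20

/* One place to list every supported mode bit (hypothetical name). */
#define EX_FALLOC_FL_SUPPORTED \
	(EX_FL_KEEP_SIZE | EX_FL_PUNCH_HOLE | \
	 EX_FL_COLLAPSE_RANGE | EX_FL_ZERO_RANGE | \
	 EX_FL_INSERT_RANGE)

/* Returns 0 if every bit in mode is supported, -EOPNOTSUPP otherwise. */
static int check_mode(int mode)
{
	if (mode & ~EX_FALLOC_FL_SUPPORTED)
		return -EOPNOTSUPP;
	return 0;
}
```

With a single mask, adding a new flag later means touching only the define, not every caller that validates the mode.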

>
> xfs_ilock(ip, XFS_IOLOCK_EXCL);
> @@ -857,6 +859,28 @@ xfs_file_fallocate(
> error = xfs_collapse_file_space(ip, offset, len);
> if (error)
> goto out_unlock;
> + } else if (mode & FALLOC_FL_INSERT_RANGE) {
> + unsigned blksize_mask = (1 << inode->i_blkbits) - 1;
> +
> + if (offset & blksize_mask || len & blksize_mask) {
> + error = -EINVAL;
> + goto out_unlock;
> + }
> +
> + /* Check for wrap through zero */
> + if (inode->i_size + len > inode->i_sb->s_maxbytes) {
> + error = -EFBIG;
> + goto out_unlock;
> + }

At first I thought that was a duplicate check of what is in
vfs_fallocate() (i.e. off + len > s_maxbytes). Can you change the
comment to read something like:

/* check the new inode size does not wrap through zero */
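
A minimal sketch of a size check of this kind, written as a standalone userspace function rather than kernel code. check_insert_size() and its parameters are hypothetical names. The comparison is rearranged into a subtraction so that the signed addition itself cannot overflow; that rearrangement is an assumption about how one might harden the check, not what the patch does.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: would growing a file of `size` bytes by `len`
 * bytes exceed `maxbytes`? Comparing len against the remaining headroom
 * avoids computing size + len, which could overflow a signed 64-bit
 * value before the comparison is made.
 */
static int check_insert_size(int64_t size, int64_t len, int64_t maxbytes)
{
	if (size < 0 || len < 0 || size > maxbytes)
		return -EINVAL;
	if (len > maxbytes - size)
		return -EFBIG;
	return 0;
}
```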

> +
> + /* Offset should be less than i_size */
> + if (offset >= i_size_read(inode)) {
> + error = -EINVAL;
> + goto out_unlock;
> + }
> +
> + new_size = i_size_read(inode) + len;
> + do_file_insert = 1;

Why do you use inode->i_size on the wrap check, yet i_size_read()
twice here?

> } else {
> flags |= XFS_PREALLOC_SET;
>
> @@ -891,8 +915,20 @@ xfs_file_fallocate(
> iattr.ia_valid = ATTR_SIZE;
> iattr.ia_size = new_size;
> error = xfs_setattr_size(ip, &iattr);
> + if (error)
> + goto out_unlock;
> }
>
> + /*
> + * Some operations are performed after the inode size is updated. For
> + * example, insert range expands the address space of the file, shifts
> + * all subsequent extents to create a hole inside the file. Updating
> + * the size first ensures that shifted extents aren't left hanging
> + * past EOF in the event of a crash or failure.
> + */

/*
* Perform hole insertion now that the file size has been
* updated so that if we crash during the operation we don't
* leave shifted extents past EOF and hence losing access to
* the data that is contained within them.
*/
> + if (do_file_insert)
> + error = xfs_insert_file_space(ip, offset, len);
> +
> out_unlock:
> xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> return error;

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-02-17 01:00:33

by Dave Chinner

Subject: Re: [PATCH RESEND 11/12] xfstests: fsx: Add fallocate insert range operation

On Tue, Feb 17, 2015 at 12:47:58AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <[email protected]>
>
> This commit adds fallocate FALLOC_FL_INSERT_RANGE support for fsx.
>
> Signed-off-by: Namjae Jeon <[email protected]>
> Signed-off-by: Ashish Sangwan <[email protected]>
> Reviewed-by: Brian Foster <[email protected]>
> ---
> ltp/fsx.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 114 insertions(+), 10 deletions(-)
.....
> @@ -339,6 +341,14 @@ logdump(void)
> lp->args[0] + lp->args[1])
> prt("\t******CCCC");
> break;
> + case OP_INSERT_RANGE:
> + prt("INSERT 0x%x thru 0x%x\t(0x%x bytes)",
> + lp->args[0], lp->args[0] + lp->args[1] - 1,
> + lp->args[1]);
> + if (badoff >= lp->args[0] && badoff <
> + lp->args[0] + lp->args[1])
> + prt("\t******CCCC");

Probably should output "*****IIII" so we can distinguish it from
collapse operations easily.

> @@ -1307,6 +1403,9 @@ usage(void)
> #ifdef FALLOC_FL_COLLAPSE_RANGE
> " -C: Do not use collapse range calls\n"
> #endif
> +#ifdef FALLOC_FL_INSERT_RANGE
> +" -i: Do not use insert range calls\n"
> +#endif

I'd make that "-I" rather than "-i" so it matches with the "-C" of
collapse range.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-02-17 01:02:20

by Dave Chinner

Subject: Re: [PATCH RESEND 3/12] ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate

On Tue, Feb 17, 2015 at 12:47:50AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <[email protected]>
>
> This patch implements fallocate's FALLOC_FL_INSERT_RANGE for Ext4.
>
> 1) Make sure that both offset and len are block size aligned.
> 2) Update the i_size of inode by len bytes.
> 3) Compute the file's logical block number against offset. If the computed
> block number is not the starting block of the extent, split the extent
> such that the block number is the starting block of the extent.
> 4) Shift all the extents which are lying between [offset, last allocated extent]
> towards right by len bytes. This step will make a hole of len bytes
> at offset.
>
> Signed-off-by: Namjae Jeon <[email protected]>
> Signed-off-by: Ashish Sangwan <[email protected]>

I'll leave this for the ext4 folk to review. If I don't get a review
by the time we're ready to merge the VFS and XFS code, then I'll
leave it out and let Ted merge it in his own time.
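
Steps 3 and 4 of the patch description quoted above can be sketched as a toy model on a sorted array of extents. This is not the ext4 or XFS implementation: struct toy_extent and toy_insert_range() are hypothetical, the extents carry only a logical start and a length, and the caller is assumed to provide room for one extra array entry.

```c
#include <stddef.h>

/* Toy extent: logical start block and length in blocks (hypothetical type). */
struct toy_extent {
	unsigned long startoff;
	unsigned long blockcount;
};

/*
 * Split the extent that straddles split_blk (step 3), then shift every
 * extent at or beyond split_blk right by shift_blks (step 4).
 * ext[] is sorted by startoff and must have room for *n + 1 entries.
 * Returns the number of extents that were shifted.
 */
static size_t toy_insert_range(struct toy_extent *ext, size_t *n,
			       unsigned long split_blk,
			       unsigned long shift_blks)
{
	size_t i, shifted = 0;

	/* Step 3: make split_blk the first block of an extent. */
	for (i = 0; i < *n; i++) {
		unsigned long end = ext[i].startoff + ext[i].blockcount;

		if (ext[i].startoff < split_blk && split_blk < end) {
			size_t j;

			/* Open a slot after i for the right-hand piece. */
			for (j = *n; j > i + 1; j--)
				ext[j] = ext[j - 1];
			ext[i + 1].startoff = split_blk;
			ext[i + 1].blockcount = end - split_blk;
			ext[i].blockcount = split_blk - ext[i].startoff;
			(*n)++;
			break;
		}
	}

	/*
	 * Step 4: walk backwards from the last extent so that an extent
	 * is never moved onto a neighbour that has not been shifted yet.
	 */
	for (i = *n; i-- > 0; ) {
		if (ext[i].startoff < split_blk)
			break;
		ext[i].startoff += shift_blks;
		shifted++;
	}
	return shifted;
}
```

After the call, no extent covers the blocks from split_blk to split_blk + shift_blks - 1, which is the hole the description refers to.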

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-02-17 01:43:52

by Namjae Jeon

Subject: RE: [PATCH RESEND 11/12] xfstests: fsx: Add fallocate insert range operation

>
> On Tue, Feb 17, 2015 at 12:47:58AM +0900, Namjae Jeon wrote:
> > From: Namjae Jeon <[email protected]>
> >
> > This commit adds fallocate FALLOC_FL_INSERT_RANGE support for fsx.
> >
> > Signed-off-by: Namjae Jeon <[email protected]>
> > Signed-off-by: Ashish Sangwan <[email protected]>
> > Reviewed-by: Brian Foster <[email protected]>
> > ---
> > ltp/fsx.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 114 insertions(+), 10 deletions(-)
> .....
> > @@ -339,6 +341,14 @@ logdump(void)
> > lp->args[0] + lp->args[1])
> > prt("\t******CCCC");
> > break;
> > + case OP_INSERT_RANGE:
> > + prt("INSERT 0x%x thru 0x%x\t(0x%x bytes)",
> > + lp->args[0], lp->args[0] + lp->args[1] - 1,
> > + lp->args[1]);
> > + if (badoff >= lp->args[0] && badoff <
> > + lp->args[0] + lp->args[1])
> > + prt("\t******CCCC");
>
Hi Dave,
> Probably should output "*****IIII" so we can distinguish it from
> collapse operations easily.
Right. I will change it.
>
> > @@ -1307,6 +1403,9 @@ usage(void)
> > #ifdef FALLOC_FL_COLLAPSE_RANGE
> > " -C: Do not use collapse range calls\n"
> > #endif
> > +#ifdef FALLOC_FL_INSERT_RANGE
> > +" -i: Do not use insert range calls\n"
> > +#endif
>
> I'd make that "-I" rather than "-i" so it matches with the "-C" of
> collapse range.
Okay.

Thanks for your review!
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]

2015-02-17 01:47:10

by Namjae Jeon

Subject: RE: [PATCH RESEND 2/12] xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate

Hi Dave,

I have checked all of your review points.
I will share the updated patch soon.

Thanks for your review!

> On Tue, Feb 17, 2015 at 12:47:49AM +0900, Namjae Jeon wrote:
> > From: Namjae Jeon <[email protected]>
> >
> > This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS.
> >
> > 1) Make sure that both offset and len are block size aligned.
> > 2) Update the i_size of inode by len bytes.
> > 3) Compute the file's logical block number against offset. If the computed
> > block number is not the starting block of the extent, split the extent
> > such that the block number is the starting block of the extent.
> > 4) Shift all the extents which are lying between [offset, last allocated extent]
> > towards right by len bytes. This step will make a hole of len bytes
> > at offset.
> >
> > Signed-off-by: Namjae Jeon <[email protected]>
> > Signed-off-by: Ashish Sangwan <[email protected]>
> > Reviewed-by: Brian Foster <[email protected]>
> > ---
> > fs/xfs/libxfs/xfs_bmap.c | 358 ++++++++++++++++++++++++++++++++++++++++------
> > fs/xfs/libxfs/xfs_bmap.h | 13 +-
> > fs/xfs/xfs_bmap_util.c | 126 +++++++++++-----
> > fs/xfs/xfs_bmap_util.h | 2 +
> > fs/xfs/xfs_file.c | 38 ++++-
> > fs/xfs/xfs_trace.h | 1 +
> > 6 files changed, 455 insertions(+), 83 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 61ec015..6699e53 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -5518,50 +5518,86 @@ xfs_bmse_shift_one(
> > int *current_ext,
> > struct xfs_bmbt_rec_host *gotp,
> > struct xfs_btree_cur *cur,
> > - int *logflags)
> > + int *logflags,
> > + enum SHIFT_DIRECTION SHIFT)
>
> Please don't shout. ;)
>
> Lower case for types and variables, upper case for the enum values.
> I also think the "shift" variable should be named "direction",
> too, so the code reads "if (direction == SHIFT_LEFT)" and so is
> clearly self documenting...
>
> (only commenting once on this, please change it in other places)
> as well ;)
>
> > {
> > struct xfs_ifork *ifp;
> > xfs_fileoff_t startoff;
> > - struct xfs_bmbt_rec_host *leftp;
> > + struct xfs_bmbt_rec_host *contp;
> > struct xfs_bmbt_irec got;
> > - struct xfs_bmbt_irec left;
> > + struct xfs_bmbt_irec cont;
>
> Not sure what "cont" is short for. It's used as the "adjacent
> extent" record, so that would be a better name IMO.
>
> > int error;
> > int i;
> > + int total_extents;
> >
> > ifp = XFS_IFORK_PTR(ip, whichfork);
> > + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> >
> > xfs_bmbt_get_all(gotp, &got);
> > - startoff = got.br_startoff - offset_shift_fsb;
> >
> > /* delalloc extents should be prevented by caller */
> > XFS_WANT_CORRUPTED_RETURN(!isnullstartblock(got.br_startblock));
> >
> > - /*
> > - * Check for merge if we've got an extent to the left, otherwise make
> > - * sure there's enough room at the start of the file for the shift.
> > - */
> > - if (*current_ext) {
> > - /* grab the left extent and check for a large enough hole */
> > - leftp = xfs_iext_get_ext(ifp, *current_ext - 1);
> > - xfs_bmbt_get_all(leftp, &left);
> > + if (SHIFT == SHIFT_LEFT) {
> > + startoff = got.br_startoff - offset_shift_fsb;
> >
> > - if (startoff < left.br_startoff + left.br_blockcount)
> > + /*
> > + * Check for merge if we've got an extent to the left,
> > + * otherwise make sure there's enough room at the start
> > + * of the file for the shift.
> > + */
> > + if (*current_ext) {
> > + /*
> > + * grab the left extent and check for a large
> > + * enough hole.
> > + */
> > + contp = xfs_iext_get_ext(ifp, *current_ext - 1);
> > + xfs_bmbt_get_all(contp, &cont);
> > +
> > + if (startoff < cont.br_startoff + cont.br_blockcount)
> > + return -EINVAL;
> > +
> > + /* check whether to merge the extent or shift it down */
> > + if (xfs_bmse_can_merge(&cont, &got, offset_shift_fsb)) {
> > + return xfs_bmse_merge(ip, whichfork,
> > + offset_shift_fsb,
> > + *current_ext, gotp, contp,
> > + cur, logflags);
> > + }
> > + } else if (got.br_startoff < offset_shift_fsb)
> > return -EINVAL;
>
> This would be better written:
>
> if (!*current_ext) {
> if (got.br_startoff < offset_shift_fsb)
> return -EINVAL;
> goto update_current_ext;
> }
>
> and then the rest of the code in the shift left branch can drop a
> level of indent and hence become less congested and easier to read.
>
>
> > + } else {
> > + startoff = got.br_startoff + offset_shift_fsb;
> > + /*
> > + * If this is not the last extent in the file, make sure there's
> > + * enough room between current extent and next extent for
> > + * accommodating the shift.
> > + */
> > + if (*current_ext < (total_extents - 1)) {
> > + contp = xfs_iext_get_ext(ifp, *current_ext + 1);
> > + xfs_bmbt_get_all(contp, &cont);
> > + if (startoff + got.br_blockcount > cont.br_startoff)
> > + return -EINVAL;
> >
> > - /* check whether to merge the extent or shift it down */
> > - if (xfs_bmse_can_merge(&left, &got, offset_shift_fsb)) {
> > - return xfs_bmse_merge(ip, whichfork, offset_shift_fsb,
> > - *current_ext, gotp, leftp, cur,
> > - logflags);
> > + /*
> > + * Unlike a left shift (which involves a hole punch),
> > + * a right shift does not modify extent neighbors
> > + * in any way. We should never find mergeable extents
> > + * in this scenario. Check anyways and warn if we
> > + * encounter two extents that could be one.
> > + */
> > + if (xfs_bmse_can_merge(&got, &cont, offset_shift_fsb))
> > + WARN_ON_ONCE(1);
> > }
>
> Similarly:
> /* nothing to move if this is the last extent */
> if (*current_ext >= total_extents)
> goto update_current_ext;
>
> > - } else if (got.br_startoff < offset_shift_fsb)
> > - return -EINVAL;
> > -
> > + }
> > /*
> > * Increment the extent index for the next iteration, update the start
> > * offset of the in-core extent and update the btree if applicable.
> > */
> > - (*current_ext)++;
>
> update_current_ext:
> > + if (SHIFT == SHIFT_LEFT)
> > + (*current_ext)++;
> > + else
> > + (*current_ext)--;
> > xfs_bmbt_set_startoff(gotp, startoff);
> > *logflags |= XFS_ILOG_CORE;
> > if (!cur) {
> > @@ -5581,10 +5617,10 @@ xfs_bmse_shift_one(
> > }
> >
> > /*
> > - * Shift extent records to the left to cover a hole.
> > + * Shift extent records to the left/right to cover/create a hole.
> > *
> > * The maximum number of extents to be shifted in a single operation is
> > - * @num_exts. @start_fsb specifies the file offset to start the shift and the
> > + * @num_exts. @stop_fsb specifies the file offset at which to stop shift and the
> > * file offset where we've left off is returned in @next_fsb. @offset_shift_fsb
> > * is the length by which each extent is shifted. If there is no hole to shift
> > * the extents into, this will be considered invalid operation and we abort
> > @@ -5594,12 +5630,13 @@ int
> > xfs_bmap_shift_extents(
> > struct xfs_trans *tp,
> > struct xfs_inode *ip,
> > - xfs_fileoff_t start_fsb,
> > + xfs_fileoff_t *next_fsb,
> > xfs_fileoff_t offset_shift_fsb,
> > int *done,
> > - xfs_fileoff_t *next_fsb,
> > + xfs_fileoff_t stop_fsb,
> > xfs_fsblock_t *firstblock,
> > struct xfs_bmap_free *flist,
> > + enum SHIFT_DIRECTION SHIFT,
> > int num_exts)
> > {
> > struct xfs_btree_cur *cur = NULL;
> > @@ -5609,10 +5646,11 @@ xfs_bmap_shift_extents(
> > struct xfs_ifork *ifp;
> > xfs_extnum_t nexts = 0;
> > xfs_extnum_t current_ext;
> > + xfs_extnum_t total_extents;
> > + xfs_extnum_t stop_extent;
> > int error = 0;
> > int whichfork = XFS_DATA_FORK;
> > int logflags = 0;
> > - int total_extents;
> >
> > if (unlikely(XFS_TEST_ERROR(
> > (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> > @@ -5628,6 +5666,7 @@ xfs_bmap_shift_extents(
> >
> > ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> > ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> > + ASSERT(SHIFT == SHIFT_LEFT || SHIFT == SHIFT_RIGHT);
> >
> > ifp = XFS_IFORK_PTR(ip, whichfork);
> > if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> > @@ -5645,43 +5684,85 @@ xfs_bmap_shift_extents(
> > }
> >
> > /*
> > + * There may be delalloc extents in the data fork before the range we
> > + * are collapsing out, so we cannot use the count of real extents here.
> > + * Instead we have to calculate it from the incore fork.
> > + */
> > + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> > + if (total_extents == 0) {
> > + *done = 1;
> > + goto del_cursor;
> > + }
> > +
> > + /*
> > + * In case of first right shift, we need to initialize next_fsb
> > + */
> > + if (*next_fsb == NULLFSBLOCK) {
> > + ASSERT(SHIFT == SHIFT_RIGHT);
>
> This should be at the top of the function. i.e.
>
> ASSERT(*next_fsb != NULLFSBLOCK || direction == SHIFT_RIGHT)
>
> > + gotp = xfs_iext_get_ext(ifp, total_extents - 1);
> > + xfs_bmbt_get_all(gotp, &got);
> > + *next_fsb = got.br_startoff;
> > + if (stop_fsb > *next_fsb) {
> > + *done = 1;
> > + goto del_cursor;
> > + }
> > + }
> > +
> > + /* Lookup the extent index at which we have to stop */
> > + if (SHIFT == SHIFT_RIGHT) {
> > + gotp = xfs_iext_bno_to_ext(ifp, stop_fsb, &stop_extent);
> > + /* Make stop_extent exclusive of shift range */
> > + stop_extent--;
> > + } else
> > + stop_extent = total_extents;
> > +
> > + /*
> > * Look up the extent index for the fsb where we start shifting. We can
> > * henceforth iterate with current_ext as extent list changes are locked
> > * out via ilock.
> > *
> > * gotp can be null in 2 cases: 1) if there are no extents or 2)
> > - * start_fsb lies in a hole beyond which there are no extents. Either
> > + * *next_fsb lies in a hole beyond which there are no extents. Either
> > * way, we are done.
> > */
> > - gotp = xfs_iext_bno_to_ext(ifp, start_fsb, &current_ext);
> > + gotp = xfs_iext_bno_to_ext(ifp, *next_fsb, &current_ext);
> > if (!gotp) {
> > *done = 1;
> > goto del_cursor;
> > }
> >
> > - /*
> > - * There may be delalloc extents in the data fork before the range we
> > - * are collapsing out, so we cannot use the count of real extents here.
> > - * Instead we have to calculate it from the incore fork.
> > - */
> > - total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> > - while (nexts++ < num_exts && current_ext < total_extents) {
> > + /* some sanity checking before we finally start shifting extents */
> > + if ((SHIFT == SHIFT_LEFT && current_ext >= stop_extent) ||
> > + (SHIFT == SHIFT_RIGHT && current_ext <= stop_extent)) {
> > + error = EIO;
>
> error = -EIO;
>
> > + goto del_cursor;
> > + }
> > +
> > + while (nexts++ < num_exts) {
> > error = xfs_bmse_shift_one(ip, whichfork, offset_shift_fsb,
> > - &current_ext, gotp, cur, &logflags);
> > + &current_ext, gotp, cur, &logflags,
> > + SHIFT);
> > if (error)
> > goto del_cursor;
> > + /*
> > + * In case there was an extent merge after shifting extent,
> > + * extent numbers would change.
> > + * Update total extent count and grab the next record.
> > + */
>
> /*
> * If there was an extent merge during the shift, the extent
> * count can change. Update the total and grab the next record.
> */
>
> > + if (SHIFT == SHIFT_LEFT) {
> > + total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> > + stop_extent = total_extents;
> > + }
> >
> > - /* update total extent count and grab the next record */
> > - total_extents = ifp->if_bytes / sizeof(xfs_bmbt_rec_t);
> > - if (current_ext >= total_extents)
> > + if (current_ext == stop_extent) {
> > + *done = 1;
> > + *next_fsb = NULLFSBLOCK;
> > break;
> > + }
> > gotp = xfs_iext_get_ext(ifp, current_ext);
> > }
> >
> > - /* Check if we are done */
> > - if (current_ext == total_extents) {
> > - *done = 1;
> > - } else if (next_fsb) {
> > + if (!*done) {
> > xfs_bmbt_get_all(gotp, &got);
> > *next_fsb = got.br_startoff;
> > }
> > @@ -5696,3 +5777,192 @@ del_cursor:
> >
> > return error;
> > }
> > +
> > +/*
> > + * Splits an extent into two extents at split_fsb block that it is
> > + * the first block of the current_ext. @current_ext is a target extent
> > + * to be split. @split_fsb is a block where the extents is split.
> > + * If split_fsb lies in a hole or the first block of extents, just return 0.
> > + */
> > +STATIC int
> > +xfs_bmap_split_extent_at(
> > + struct xfs_trans *tp,
> > + struct xfs_inode *ip,
> > + xfs_fileoff_t split_fsb,
> > + xfs_fsblock_t *firstfsb,
> > + struct xfs_bmap_free *free_list)
> > +{
> > + int whichfork = XFS_DATA_FORK;
> > + struct xfs_btree_cur *cur = NULL;
> > + struct xfs_bmbt_rec_host *gotp;
> > + struct xfs_bmbt_irec got;
> > + struct xfs_bmbt_irec new; /* split extent */
> > + struct xfs_mount *mp = ip->i_mount;
> > + struct xfs_ifork *ifp;
> > + xfs_fsblock_t gotblkcnt; /* new block count for got */
> > + xfs_extnum_t current_ext;
> > + int error = 0;
> > + int logflags = 0;
> > + int i = 0;
> > +
> > + if (unlikely(XFS_TEST_ERROR(
> > + (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
> > + XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
> > + mp, XFS_ERRTAG_BMAPIFORMAT, XFS_RANDOM_BMAPIFORMAT))) {
> > + XFS_ERROR_REPORT("xfs_bmap_split_extent_at",
> > + XFS_ERRLEVEL_LOW, mp);
> > + return -EFSCORRUPTED;
> > + }
> > +
> > + if (XFS_FORCED_SHUTDOWN(mp))
> > + return -EIO;
> > +
> > + ifp = XFS_IFORK_PTR(ip, whichfork);
> > + if (!(ifp->if_flags & XFS_IFEXTENTS)) {
> > + /* Read in all the extents */
> > + error = xfs_iread_extents(tp, ip, whichfork);
> > + if (error)
> > + return error;
> > + }
> > +
> > + gotp = xfs_iext_bno_to_ext(ifp, split_fsb, &current_ext);
> > + /*
> > + * gotp can be null in 2 cases: 1) if there are no extents
> > + * or 2) split_fsb lies in a hole beyond which there are
> > + * no extents. Either way, we are done.
> > + */
> > + if (!gotp)
> > + return 0;
>
> Comment can go before the call to xfs_iext_bno_to_ext().
>
> > +
> > + xfs_bmbt_get_all(gotp, &got);
> > +
> > + /*
> > + * Check split_fsb lies in a hole or the start boundary offset
> > + * of the extent.
> > + */
> > + if (got.br_startoff >= split_fsb)
> > + return 0;
> > +
> > + gotblkcnt = split_fsb - got.br_startoff;
> > + new.br_startoff = split_fsb;
> > + new.br_startblock = got.br_startblock + gotblkcnt;
> > + new.br_blockcount = got.br_blockcount - gotblkcnt;
> > + new.br_state = got.br_state;
> > +
> > + if (ifp->if_flags & XFS_IFBROOT) {
> > + cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork);
> > + cur->bc_private.b.firstblock = *firstfsb;
> > + cur->bc_private.b.flist = free_list;
> > + cur->bc_private.b.flags = 0;
> > + }
> > +
> > + if (cur) {
>
> No need to close the XFS_IFBROOT branch and then check for cur;
> we just allocated it inside the XFS_IFBROOT branch!
>
> > + error = xfs_bmbt_lookup_eq(cur, got.br_startoff,
> > + got.br_startblock,
> > + got.br_blockcount,
> > + &i);
> > + if (error)
> > + goto del_cursor;
> > + XFS_WANT_CORRUPTED_GOTO(i == 1, del_cursor);
> > + }
>
> ....
>
> > @@ -1427,20 +1429,23 @@ xfs_collapse_file_space(
> >
> > /*
> > * Writeback and invalidate cache for the remainder of the file as we're
> > - * about to shift down every extent from the collapse range to EOF. The
> > - * free of the collapse range above might have already done some of
> > - * this, but we shouldn't rely on it to do anything outside of the range
> > - * that was freed.
> > + * about to shift down every extent from offset to EOF.
> > */
> > error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
> > - offset + len, -1);
> > + offset, -1);
> > if (error)
> > return error;
> > error = invalidate_inode_pages2_range(VFS_I(ip)->i_mapping,
> > - (offset + len) >> PAGE_CACHE_SHIFT, -1);
> > + offset >> PAGE_CACHE_SHIFT, -1);
> > if (error)
> > return error;
> >
> > + if (SHIFT == SHIFT_RIGHT) {
> > + error = xfs_bmap_split_extent(ip, stop_fsb);
> > + if (error)
> > + return error;
> > + }
>
> This needs a comment explaining why we are splitting an extent here.
>
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 1cdba95..222a91a 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -823,11 +823,13 @@ xfs_file_fallocate(
> > long error;
> > enum xfs_prealloc_flags flags = 0;
> > loff_t new_size = 0;
> > + int do_file_insert = 0;
>
> bool rather than int.
>
> >
> > if (!S_ISREG(inode->i_mode))
> > return -EINVAL;
> > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> > - FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
> > + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> > + FALLOC_FL_INSERT_RANGE))
> > return -EOPNOTSUPP;
>
> This should use a local define before the function such as:
>
> #define XFS_FALLOC_FL_SUPPORTED \
> (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
> FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE | \
> FALLOC_FL_INSERT_RANGE)
>
> This is similar to how we define supported checks for FIEMAP
> operations in xfs_vn_fiemap().
>
> >
> > xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > @@ -857,6 +859,28 @@ xfs_file_fallocate(
> > error = xfs_collapse_file_space(ip, offset, len);
> > if (error)
> > goto out_unlock;
> > + } else if (mode & FALLOC_FL_INSERT_RANGE) {
> > + unsigned blksize_mask = (1 << inode->i_blkbits) - 1;
> > +
> > + if (offset & blksize_mask || len & blksize_mask) {
> > + error = -EINVAL;
> > + goto out_unlock;
> > + }
> > +
> > + /* Check for wrap through zero */
> > + if (inode->i_size + len > inode->i_sb->s_maxbytes) {
> > + error = -EFBIG;
> > + goto out_unlock;
> > + }
>
> At first I thought that was a duplicate check of what is in
> vfs_fallocate() (i.e. off + len > s_maxbytes). Can you change the
> comment to read something like:
>
> /* check the new inode size does not wrap through zero */
>
> > +
> > + /* Offset should be less than i_size */
> > + if (offset >= i_size_read(inode)) {
> > + error = -EINVAL;
> > + goto out_unlock;
> > + }
> > +
> > + new_size = i_size_read(inode) + len;
> > + do_file_insert = 1;
>
> Why do you use inode->i_size on the wrap check, yet i_size_read()
> twice here?
>
> > } else {
> > flags |= XFS_PREALLOC_SET;
> >
> > @@ -891,8 +915,20 @@ xfs_file_fallocate(
> > iattr.ia_valid = ATTR_SIZE;
> > iattr.ia_size = new_size;
> > error = xfs_setattr_size(ip, &iattr);
> > + if (error)
> > + goto out_unlock;
> > }
> >
> > + /*
> > + * Some operations are performed after the inode size is updated. For
> > + * example, insert range expands the address space of the file, shifts
> > + * all subsequent extents to create a hole inside the file. Updating
> > + * the size first ensures that shifted extents aren't left hanging
> > + * past EOF in the event of a crash or failure.
> > + */
>
> /*
> * Perform hole insertion now that the file size has been
> * updated so that if we crash during the operation we don't
> * leave shifted extents past EOF and hence losing access to
> * the data that is contained within them.
> */
> > + if (do_file_insert)
> > + error = xfs_insert_file_space(ip, offset, len);
> > +
> > out_unlock:
> > xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > return error;
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]

2015-02-17 01:49:37

by Namjae Jeon

Subject: RE: [PATCH RESEND 1/12] fs: Add support FALLOC_FL_INSERT_RANGE for fallocate

> On Tue, Feb 17, 2015 at 12:47:48AM +0900, Namjae Jeon wrote:
> > From: Namjae Jeon <[email protected]>
> >
> > FALLOC_FL_INSERT_RANGE is the opposite of FALLOC_FL_COLLAPSE_RANGE. It is
> > needed by advertisers or anyone who wants to add data in the middle of a
> > file. FALLOC_FL_INSERT_RANGE creates space for writing new data within a
> > file by shifting extents to the right by the given length. It has the same
> > limitations as FALLOC_FL_COLLAPSE_RANGE: offset and length must be block
> > aligned, and ftruncate(2) should be used for a range that crosses EOF.
> >
> > Signed-off-by: Namjae Jeon <[email protected]>
> > Signed-off-by: Ashish Sangwan <[email protected]>
> > Cc: Brian Foster<[email protected]>
> > ---
> > fs/open.c | 8 +++++++-
> > include/uapi/linux/falloc.h | 17 +++++++++++++++++
> > 2 files changed, 24 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/open.c b/fs/open.c
> > index 813be03..762fb45 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -232,7 +232,8 @@ int vfs_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
> >
> > /* Return error if mode is not supported */
> > if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
> > - FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE))
> > + FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_ZERO_RANGE |
> > + FALLOC_FL_INSERT_RANGE))
> > return -EOPNOTSUPP;
>
> Can we create a FALLOC_FL_SUPPORTED_MASK define in falloc.h
> so that we only need to add new flags to the mask in rather than
> change this code every time we add a new flag?
Sure, I will do it, and I will share the updated patch addressing your other review points soon.

Thanks for the review!
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]