2015-08-26 08:18:37

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 0/9] NFS/NFSD: add NFSv42 CLONE operation support

[resend ccing linux-fsdevel list and dropping the two nfs cleanup patches,
which were sent to nfs list separately.]

This patchset adds NFSv42 COPY support to nfs and nfsd. As suggested by Christoph,
it pulls btrfs BTRFS_IOC_CLONE/BTRFS_IOC_CLONE_RANGE ioctls to generic layer and
adds a new file operation to call down to file systems. Then the CLONE ioctl
is implemented for NFS and NFSD.

I took a slightly different approach from Anna's NFS42 COPY implementation
(http://www.spinics.net/lists/linux-nfs/msg53009.html) mainly because CLONE
has different requirements than COPY. And the main differences between the two are:
1. well, CLONE is supported rather than COPY.
2. vfs_file_clone_range() does not expect file systems to do data copy, and thus
no rw_verify_area() required.
3. clone does not expect partial success.
4. clone does not fall back to data copy if underlying storage does not support it.
5. no new system call required. We reuse existing BTRFS_IOC_CLONE/BTRFS_IOC_CLONE_RANGE
and make it generic to all file systems.

This overlaps with Anna's COPY work. We'll deal with the conflicts after figuring out
what to do with both patchsets.

Cheers,
Tao

Anna Schumaker (2):
nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
NFSD: Implement the CLONE call

Peng Tao (7):
vfs: pull btrfs clone API to vfs layer
vfs/btrfs: add .clone_range file operation
nfs42: add CLONE xdr functions
nfs42: add CLONE proc functions
nfs42: add .copy_range file operation
nfs: get clone_blksize when probing fsinfo
nfs42: respect clone_blksize

fs/btrfs/ctree.h | 2 +
fs/btrfs/file.c | 1 +
fs/btrfs/ioctl.c | 68 ++++----------------------------
fs/ioctl.c | 30 ++++++++++++++
fs/nfs/client.c | 1 +
fs/nfs/nfs42.h | 1 +
fs/nfs/nfs42proc.c | 71 +++++++++++++++++++++++++++++++++
fs/nfs/nfs42xdr.c | 97 +++++++++++++++++++++++++++++++++++++++++++++-
fs/nfs/nfs4file.c | 58 +++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 4 +-
fs/nfs/nfs4xdr.c | 26 +++++++++++++
fs/nfsd/nfs4proc.c | 95 +++++++++++++++++++++++++++++++++++++++++----
fs/nfsd/nfs4state.c | 5 +--
fs/nfsd/nfs4xdr.c | 21 ++++++++++
fs/nfsd/state.h | 4 +-
fs/nfsd/vfs.c | 7 ++++
fs/nfsd/vfs.h | 1 +
fs/nfsd/xdr4.h | 10 +++++
fs/read_write.c | 45 +++++++++++++++++++++
include/linux/fs.h | 4 ++
include/linux/nfs4.h | 7 +++-
include/linux/nfs_fs_sb.h | 2 +
include/linux/nfs_xdr.h | 20 ++++++++++
include/uapi/linux/btrfs.h | 10 -----
include/uapi/linux/fs.h | 9 +++++
25 files changed, 512 insertions(+), 87 deletions(-)

--
1.8.3.1



2015-08-26 08:19:20

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 8/9] nfsd: Pass filehandle to nfs4_preprocess_stateid_op()

From: Anna Schumaker <[email protected]>

This will be needed so COPY can look up the saved_fh in addition to the
current_fh.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4proc.c | 16 +++++++++-------
fs/nfsd/nfs4state.c | 5 ++---
fs/nfsd/state.h | 4 ++--
3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 90cfda7..d34c967 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -776,8 +776,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
clear_bit(RQ_SPLICE_OK, &rqstp->rq_flags);

/* check stateid */
- status = nfs4_preprocess_stateid_op(rqstp, cstate, &read->rd_stateid,
- RD_STATE, &read->rd_filp, &read->rd_tmp_file);
+ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+ &read->rd_stateid, RD_STATE,
+ &read->rd_filp, &read->rd_tmp_file);
if (status) {
dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
goto out;
@@ -923,7 +924,8 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,

if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
status = nfs4_preprocess_stateid_op(rqstp, cstate,
- &setattr->sa_stateid, WR_STATE, NULL, NULL);
+ &cstate->current_fh, &setattr->sa_stateid,
+ WR_STATE, NULL, NULL);
if (status) {
dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
return status;
@@ -987,8 +989,8 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (write->wr_offset >= OFFSET_MAX)
return nfserr_inval;

- status = nfs4_preprocess_stateid_op(rqstp, cstate, stateid, WR_STATE,
- &filp, NULL);
+ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+ stateid, WR_STATE, &filp, NULL);
if (status) {
dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
return status;
@@ -1018,7 +1020,7 @@ nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
__be32 status = nfserr_notsupp;
struct file *file;

- status = nfs4_preprocess_stateid_op(rqstp, cstate,
+ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
&fallocate->falloc_stateid,
WR_STATE, &file, NULL);
if (status != nfs_ok) {
@@ -1057,7 +1059,7 @@ nfsd4_seek(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
__be32 status;
struct file *file;

- status = nfs4_preprocess_stateid_op(rqstp, cstate,
+ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
&seek->seek_stateid,
RD_STATE, &file, NULL);
if (status) {
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 61dfb33..7b0059d 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4645,10 +4645,9 @@ nfs4_check_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfs4_stid *s,
*/
__be32
nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
- struct nfsd4_compound_state *cstate, stateid_t *stateid,
- int flags, struct file **filpp, bool *tmp_file)
+ struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
+ stateid_t *stateid, int flags, struct file **filpp, bool *tmp_file)
{
- struct svc_fh *fhp = &cstate->current_fh;
struct inode *ino = d_inode(fhp->fh_dentry);
struct net *net = SVC_NET(rqstp);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 4874ce5..d3e81ce 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -584,8 +584,8 @@ struct nfsd4_compound_state;
struct nfsd_net;

extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
- struct nfsd4_compound_state *cstate, stateid_t *stateid,
- int flags, struct file **filp, bool *tmp_file);
+ struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
+ stateid_t *stateid, int flags, struct file **filp, bool *tmp_file);
__be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
stateid_t *stateid, unsigned char typemask,
struct nfs4_stid **s, struct nfsd_net *nn);
--
1.8.3.1


2015-08-26 08:19:28

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 9/9] NFSD: Implement the CLONE call

From: Anna Schumaker <[email protected]>

I can simply call vfs_file_clone_range() and have the vfs do the
right thing for the filesystem being exported.

Signed-off-by: Anna Schumaker <[email protected]>
[hch: change to implement the CLONE op instead of COPY]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfsd/nfs4proc.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4xdr.c | 21 ++++++++++++++
fs/nfsd/vfs.c | 7 +++++
fs/nfsd/vfs.h | 1 +
fs/nfsd/xdr4.h | 10 +++++++
include/linux/nfs4.h | 4 +--
6 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index d34c967..c2d3558 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1014,6 +1014,79 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
}

static __be32
+nfsd4_verify_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ struct nfsd4_clone *clone, struct file **src, struct file **dst)
+{
+ struct inode *src_ino, *dst_ino;
+ __be32 status;
+
+ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh,
+ &clone->cl_src_stateid, RD_STATE,
+ src, NULL);
+ if (status) {
+ dprintk("NFSD: %s: couldn't process src stateid!\n", __func__);
+ return status;
+ }
+
+ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+ &clone->cl_dst_stateid, WR_STATE,
+ dst, NULL);
+ if (status) {
+ dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
+ fput(*src);
+ }
+
+ /* a few extra check to make sure we send back proper errors per RFC */
+ src_ino = file_inode(*src);
+ dst_ino = file_inode(*dst);
+
+ if (S_ISDIR(src_ino->i_mode) || S_ISDIR(dst_ino->i_mode)) {
+ status = nfserr_wrong_type;
+ goto out_fput;
+ }
+
+ if (src_ino == dst_ino) {
+ status = nfserr_inval;
+ goto out_fput;
+ }
+
+ if (!(*src)->f_op || !(*src)->f_op->clone_range) {
+ status = nfserr_notsupp;
+ goto out_fput;
+ }
+out:
+ return status;
+out_fput:
+ fput(*src);
+ fput(*dst);
+ goto out;
+}
+
+static __be32
+nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ struct nfsd4_clone *clone)
+{
+ int ret;
+ __be32 status;
+ struct file *src = NULL, *dst = NULL;
+
+ status = nfsd4_verify_clone(rqstp, cstate, clone, &src, &dst);
+ if (status)
+ return status;
+
+ ret = nfsd4_clone_range(src, dst, clone->cl_src_pos,
+ clone->cl_count, clone->cl_dst_pos);
+ if (ret < 0)
+ status = nfserrno(ret);
+ else
+ status = nfs_ok;
+
+ fput(src);
+ fput(dst);
+ return status;
+}
+
+static __be32
nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
struct nfsd4_fallocate *fallocate, int flags)
{
@@ -2283,6 +2356,12 @@ static struct nfsd4_operation nfsd4_ops[] = {
.op_name = "OP_DEALLOCATE",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
},
+ [OP_CLONE] = {
+ .op_func = (nfsd4op_func)nfsd4_clone,
+ .op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME,
+ .op_name = "OP_CLONE",
+ .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
+ },
[OP_SEEK] = {
.op_func = (nfsd4op_func)nfsd4_seek,
.op_name = "OP_SEEK",
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 5463385..a845db1 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1675,6 +1675,25 @@ nfsd4_decode_fallocate(struct nfsd4_compoundargs *argp,
}

static __be32
+nfsd4_decode_clone(struct nfsd4_compoundargs *argp, struct nfsd4_clone *clone)
+{
+ DECODE_HEAD;
+
+ status = nfsd4_decode_stateid(argp, &clone->cl_src_stateid);
+ if (status)
+ return status;
+ status = nfsd4_decode_stateid(argp, &clone->cl_dst_stateid);
+ if (status)
+ return status;
+
+ READ_BUF(8 + 8 + 8);
+ p = xdr_decode_hyper(p, &clone->cl_src_pos);
+ p = xdr_decode_hyper(p, &clone->cl_dst_pos);
+ p = xdr_decode_hyper(p, &clone->cl_count);
+ DECODE_TAIL;
+}
+
+static __be32
nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek)
{
DECODE_HEAD;
@@ -1785,6 +1804,7 @@ static nfsd4_dec nfsd4_dec_ops[] = {
[OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek,
[OP_WRITE_SAME] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_CLONE] = (nfsd4_dec)nfsd4_decode_clone,
};

static inline bool
@@ -4249,6 +4269,7 @@ static nfsd4_enc nfsd4_enc_ops[] = {
[OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
[OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek,
[OP_WRITE_SAME] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_CLONE] = (nfsd4_enc)nfsd4_encode_noop,
};

/*
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index b5e077a..7d43097 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -36,6 +36,7 @@
#endif /* CONFIG_NFSD_V3 */

#ifdef CONFIG_NFSD_V4
+#include "../internal.h"
#include "acl.h"
#include "idmap.h"
#endif /* CONFIG_NFSD_V4 */
@@ -498,6 +499,12 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
}
#endif

+int nfsd4_clone_range(struct file *src, struct file *dst, u64 src_pos,
+ u64 count, u64 dst_pos)
+{
+ return vfs_file_clone_range(src, dst, src_pos, count, dst_pos);
+}
+
__be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct file *file, loff_t offset, loff_t len,
int flags)
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 5be875e..31d56a3 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -56,6 +56,7 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *,
struct xdr_netobj *);
__be32 nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *,
struct file *, loff_t, loff_t, int);
+int nfsd4_clone_range(struct file *, struct file *, u64, u64, u64);
#endif /* CONFIG_NFSD_V4 */
__be32 nfsd_create(struct svc_rqst *, struct svc_fh *,
char *name, int len, struct iattr *attrs,
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 9f99100..3d70712 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -491,6 +491,15 @@ struct nfsd4_fallocate {
u64 falloc_length;
};

+struct nfsd4_clone {
+ /* request */
+ stateid_t cl_src_stateid;
+ stateid_t cl_dst_stateid;
+ u64 cl_src_pos;
+ u64 cl_dst_pos;
+ u64 cl_count;
+};
+
struct nfsd4_seek {
/* request */
stateid_t seek_stateid;
@@ -555,6 +564,7 @@ struct nfsd4_op {
/* NFSv4.2 */
struct nfsd4_fallocate allocate;
struct nfsd4_fallocate deallocate;
+ struct nfsd4_clone clone;
struct nfsd4_seek seek;
} u;
struct nfs4_replay * replay;
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index e7e7853..21c6612 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -139,10 +139,10 @@ enum nfs_opnum4 {
Needs to be updated if more operations are defined in future.*/

#define FIRST_NFS4_OP OP_ACCESS
-#define LAST_NFS4_OP OP_WRITE_SAME
#define LAST_NFS40_OP OP_RELEASE_LOCKOWNER
#define LAST_NFS41_OP OP_RECLAIM_COMPLETE
-#define LAST_NFS42_OP OP_WRITE_SAME
+#define LAST_NFS42_OP OP_CLONE
+#define LAST_NFS4_OP LAST_NFS42_OP

enum nfsstat4 {
NFS4_OK = 0,
--
1.8.3.1


2015-08-26 08:19:10

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 6/9] nfs: get clone_blksize when probing fsinfo

NFSv42 CLONE operation is supposed to respect it.

Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/client.c | 1 +
fs/nfs/nfs4proc.c | 1 +
fs/nfs/nfs4xdr.c | 25 +++++++++++++++++++++++++
include/linux/nfs4.h | 1 +
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfs_xdr.h | 1 +
6 files changed, 30 insertions(+)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 57c5a02..d6d5d2a 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -764,6 +764,7 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,

server->time_delta = fsinfo->time_delta;

+ server->clone_blksize = fsinfo->clone_blksize;
/* We're airborne Set socket buffersize */
rpc_setbufsize(server->client, server->wsize + 100, server->rsize + 100);
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index fb51b75..ecf792c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -239,6 +239,7 @@ const u32 nfs4_fsinfo_bitmap[3] = { FATTR4_WORD0_MAXFILESIZE
FATTR4_WORD1_TIME_DELTA
| FATTR4_WORD1_FS_LAYOUT_TYPES,
FATTR4_WORD2_LAYOUT_BLKSIZE
+ | FATTR4_WORD2_CLONE_BLKSIZE
};

const u32 nfs4_fs_locations_bitmap[3] = {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index a453bd1..f0dce533 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -4737,6 +4737,28 @@ static int decode_attr_layout_blksize(struct xdr_stream *xdr, uint32_t *bitmap,
return 0;
}

+/*
+ * The granularity of a CLONE operation.
+ */
+static int decode_attr_clone_blksize(struct xdr_stream *xdr, uint32_t *bitmap,
+ uint32_t *res)
+{
+ __be32 *p;
+
+ dprintk("%s: bitmap is %x\n", __func__, bitmap[2]);
+ *res = 0;
+ if (bitmap[2] & FATTR4_WORD2_CLONE_BLKSIZE) {
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p)) {
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+ }
+ *res = be32_to_cpup(p);
+ bitmap[2] &= ~FATTR4_WORD2_CLONE_BLKSIZE;
+ }
+ return 0;
+}
+
static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
{
unsigned int savep;
@@ -4771,6 +4793,9 @@ static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
status = decode_attr_layout_blksize(xdr, bitmap, &fsinfo->blksize);
if (status)
goto xdr_error;
+ status = decode_attr_clone_blksize(xdr, bitmap, &fsinfo->clone_blksize);
+ if (status)
+ goto xdr_error;

status = verify_attr_len(xdr, savep, attrlen);
xdr_error:
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index c0c695b..e7e7853 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -422,6 +422,7 @@ enum lock_type4 {
#define FATTR4_WORD2_LAYOUT_TYPES (1UL << 0)
#define FATTR4_WORD2_LAYOUT_BLKSIZE (1UL << 1)
#define FATTR4_WORD2_MDSTHRESHOLD (1UL << 4)
+#define FATTR4_WORD2_CLONE_BLKSIZE (1UL << 13)
#define FATTR4_WORD2_SECURITY_LABEL (1UL << 16)

/* MDS threshold bitmap bits */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 2fb163a..db00311 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -147,6 +147,7 @@ struct nfs_server {
unsigned int acdirmax;
unsigned int namelen;
unsigned int options; /* extra options enabled by mount */
+ unsigned int clone_blksize; /* granularity of a CLONE operation */
#define NFS_OPTION_FSCACHE 0x00000001 /* - local caching enabled */
#define NFS_OPTION_MIGRATION 0x00000002 /* - NFSv4 migration enabled */

diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 79918d6..e89dafd 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -141,6 +141,7 @@ struct nfs_fsinfo {
__u32 lease_time; /* in seconds */
__u32 layouttype; /* supported pnfs layout driver */
__u32 blksize; /* preferred pnfs io block size */
+ __u32 clone_blksize; /* granularity of a CLONE operation */
};

struct nfs_fsstat {
--
1.8.3.1


2015-08-26 08:19:14

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 7/9] nfs42: respect clone_blksize

draft-ietf-nfsv4-minorversion2-38.txt says:
Both cl_src_offset and
cl_dst_offset must be aligned to the clone block size Section 12.2.1.
The number of bytes to be cloned must be a multiple of the clone
block size, except in the case in which cl_src_offset plus the number
of bytes to be cloned is equal to the source file size.

Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/nfs4file.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index c335cb0..ee8c014 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -174,12 +174,20 @@ nfs42_file_clone_range(struct file *src_file, struct file *dst_file,
{
struct inode *dst_inode = file_inode(dst_file);
struct inode *src_inode = file_inode(src_file);
+ struct nfs_server *server = NFS_SERVER(dst_inode);
+ unsigned int bs = server->clone_blksize;
int ret;

/* src and dst must be different files */
if (src_inode == dst_inode)
return -EINVAL;

+ /* check alignment w.r.t. clone_blksize */
+ if (bs)
+ if (!IS_ALIGNED(src_off, bs) || !IS_ALIGNED(dst_off, bs) ||
+ (!IS_ALIGNED(count, bs) && i_size_read(src_inode) != (src_off + count)))
+ return -EINVAL;
+
/* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
if (dst_inode < src_inode) {
mutex_lock_nested(&dst_inode->i_mutex, I_MUTEX_PARENT);
--
1.8.3.1


2015-08-26 08:19:05

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 5/9] nfs42: add .copy_range file operation

Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/nfs4file.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index dcd39d4..c335cb0 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -4,6 +4,7 @@
* Copyright (C) 1992 Rick Sladkey
*/
#include <linux/fs.h>
+#include <linux/file.h>
#include <linux/falloc.h>
#include <linux/nfs_fs.h>
#include "internal.h"
@@ -166,6 +167,54 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t
return nfs42_proc_deallocate(filep, offset, len);
return nfs42_proc_allocate(filep, offset, len);
}
+
+static noinline int
+nfs42_file_clone_range(struct file *src_file, struct file *dst_file,
+ loff_t src_off, size_t dst_off, loff_t count)
+{
+ struct inode *dst_inode = file_inode(dst_file);
+ struct inode *src_inode = file_inode(src_file);
+ int ret;
+
+ /* src and dst must be different files */
+ if (src_inode == dst_inode)
+ return -EINVAL;
+
+ /* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
+ if (dst_inode < src_inode) {
+ mutex_lock_nested(&dst_inode->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_CHILD);
+ } else {
+ mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_PARENT);
+ mutex_lock_nested(&dst_inode->i_mutex, I_MUTEX_CHILD);
+ }
+
+ /* flush all pending writes on both src and dst so that server
+ * has the latest data */
+ ret = nfs_sync_inode(src_inode);
+ if (ret)
+ goto out_unlock;
+ ret = nfs_sync_inode(dst_inode);
+ if (ret)
+ goto out_unlock;
+
+ ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count);
+
+ /* truncate inode page cache of the dst range so that future reads can fetch
+ * new data from server */
+ if (!ret)
+ truncate_inode_pages_range(&dst_inode->i_data, dst_off, dst_off + count - 1);
+
+out_unlock:
+ if (dst_inode < src_inode) {
+ mutex_unlock(&src_inode->i_mutex);
+ mutex_unlock(&dst_inode->i_mutex);
+ } else {
+ mutex_unlock(&dst_inode->i_mutex);
+ mutex_unlock(&src_inode->i_mutex);
+ }
+ return ret;
+}
#endif /* CONFIG_NFS_V4_2 */

const struct file_operations nfs4_file_operations = {
@@ -187,6 +236,7 @@ const struct file_operations nfs4_file_operations = {
.splice_write = iter_file_splice_write,
#ifdef CONFIG_NFS_V4_2
.fallocate = nfs42_fallocate,
+ .clone_range = nfs42_file_clone_range,
#endif /* CONFIG_NFS_V4_2 */
.check_flags = nfs_check_flags,
.setlease = simple_nosetlease,
--
1.8.3.1


2015-08-26 08:19:01

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 4/9] nfs42: add CLONE proc functions

Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/nfs42.h | 1 +
fs/nfs/nfs42proc.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 3 +-
include/linux/nfs_fs_sb.h | 1 +
4 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
index ff66ae7..3585f3c 100644
--- a/fs/nfs/nfs42.h
+++ b/fs/nfs/nfs42.h
@@ -17,6 +17,7 @@ int nfs42_proc_deallocate(struct file *, loff_t, loff_t);
loff_t nfs42_proc_llseek(struct file *, loff_t, int);
int nfs42_proc_layoutstats_generic(struct nfs_server *,
struct nfs42_layoutstat_data *);
+int nfs42_proc_clone(struct file *, struct file *, loff_t, loff_t, loff_t);
/* nfs4.2xdr.h */
extern struct rpc_procinfo nfs4_2_procedures[];

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index d731bbf..baa3141 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -269,3 +269,74 @@ int nfs42_proc_layoutstats_generic(struct nfs_server *server,
return PTR_ERR(task);
return 0;
}
+
+static int _nfs42_proc_clone(struct rpc_message *msg, struct file *src_f,
+ struct file *dst_f, loff_t src_offset,
+ loff_t dst_offset, loff_t count)
+{
+ struct inode *src_inode = file_inode(src_f);
+ struct inode *dst_inode = file_inode(dst_f);
+ struct nfs_server *server = NFS_SERVER(dst_inode);
+ struct nfs42_clone_args args = {
+ .src_fh = NFS_FH(src_inode),
+ .dst_fh = NFS_FH(dst_inode),
+ .src_offset = src_offset,
+ .dst_offset = dst_offset,
+ .dst_bitmask = server->cache_consistency_bitmask,
+ };
+ struct nfs42_clone_res res = {
+ .server = server,
+ };
+ int status;
+
+ msg->rpc_argp = &args;
+ msg->rpc_resp = &res;
+
+ status = nfs42_set_rw_stateid(&args.src_stateid, src_f, FMODE_READ);
+ if (status)
+ return status;
+
+ status = nfs42_set_rw_stateid(&args.dst_stateid, dst_f, FMODE_WRITE);
+ if (status)
+ return status;
+
+ res.dst_fattr = nfs_alloc_fattr();
+ if (!res.dst_fattr)
+ return -ENOMEM;
+
+ status = nfs4_call_sync(server->client, server, msg,
+ &args.seq_args, &res.seq_res, 0);
+ if (status == 0)
+ status = nfs_post_op_update_inode(dst_inode, res.dst_fattr);
+
+ kfree(res.dst_fattr);
+ return status;
+}
+
+int nfs42_proc_clone(struct file *src_f, struct file *dst_f,
+ loff_t src_offset, loff_t dst_offset, loff_t count)
+{
+ struct rpc_message msg = {
+ .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_CLONE],
+ };
+ struct inode *inode = file_inode(src_f);
+ struct nfs_server *server = NFS_SERVER(file_inode(src_f));
+ struct nfs4_exception exception = { };
+ int err;
+
+ if (!nfs_server_capable(inode, NFS_CAP_CLONE))
+ return -EOPNOTSUPP;
+
+ do {
+ err = _nfs42_proc_clone(&msg, src_f, dst_f, src_offset,
+ dst_offset, count);
+ if (err == -ENOTSUPP || err == -EOPNOTSUPP) {
+ NFS_SERVER(inode)->caps &= ~NFS_CAP_CLONE;
+ return -EOPNOTSUPP;
+ }
+ err = nfs4_handle_exception(server, err, &exception);
+ } while (exception.retry);
+
+ return err;
+
+}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6e988fd..fb51b75 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -8659,7 +8659,8 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
| NFS_CAP_ALLOCATE
| NFS_CAP_DEALLOCATE
| NFS_CAP_SEEK
- | NFS_CAP_LAYOUTSTATS,
+ | NFS_CAP_LAYOUTSTATS
+ | NFS_CAP_CLONE,
.init_client = nfs41_init_client,
.shutdown_client = nfs41_shutdown_client,
.match_stateid = nfs41_match_stateid,
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 20bc8e5..2fb163a 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -238,5 +238,6 @@ struct nfs_server {
#define NFS_CAP_ALLOCATE (1U << 20)
#define NFS_CAP_DEALLOCATE (1U << 21)
#define NFS_CAP_LAYOUTSTATS (1U << 22)
+#define NFS_CAP_CLONE (1U << 23)

#endif
--
1.8.3.1


2015-08-26 08:18:55

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 3/9] nfs42: add CLONE xdr functions

xdr definitions per draft-ietf-nfsv4-minorversion2-38.txt

Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/nfs42xdr.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++-
fs/nfs/nfs4xdr.c | 1 +
include/linux/nfs4.h | 2 +
include/linux/nfs_xdr.h | 19 ++++++++++
4 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index a6bd27d..a761421 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -34,6 +34,12 @@
1 /* opaque devaddr4 length */ + \
XDR_QUADLEN(PNFS_LAYOUTSTATS_MAXSIZE))
#define decode_layoutstats_maxsz (op_decode_hdr_maxsz)
+#define encode_clone_maxsz (encode_stateid_maxsz + \
+ encode_stateid_maxsz + \
+ 2 /* src offset */ + \
+ 2 /* dst offset */ + \
+ 2 /* count */)
+#define decode_clone_maxsz (op_decode_hdr_maxsz)

#define NFS4_enc_allocate_sz (compound_encode_hdr_maxsz + \
encode_putfh_maxsz + \
@@ -65,7 +71,20 @@
decode_sequence_maxsz + \
decode_putfh_maxsz + \
PNFS_LAYOUTSTATS_MAXDEV * decode_layoutstats_maxsz)
-
+#define NFS4_enc_clone_sz (compound_encode_hdr_maxsz + \
+ encode_sequence_maxsz + \
+ encode_putfh_maxsz + \
+ encode_savefh_maxsz + \
+ encode_putfh_maxsz + \
+ encode_clone_maxsz + \
+ encode_getattr_maxsz)
+#define NFS4_dec_clone_sz (compound_decode_hdr_maxsz + \
+ decode_sequence_maxsz + \
+ decode_putfh_maxsz + \
+ decode_savefh_maxsz + \
+ decode_putfh_maxsz + \
+ decode_clone_maxsz + \
+ decode_getattr_maxsz)

static void encode_fallocate(struct xdr_stream *xdr,
struct nfs42_falloc_args *args)
@@ -128,6 +147,21 @@ static void encode_layoutstats(struct xdr_stream *xdr,
encode_uint32(xdr, 0);
}

+static void encode_clone(struct xdr_stream *xdr,
+ struct nfs42_clone_args *args,
+ struct compound_hdr *hdr)
+{
+ __be32 *p;
+
+ encode_op_hdr(xdr, OP_CLONE, decode_clone_maxsz, hdr);
+ encode_nfs4_stateid(xdr, &args->src_stateid);
+ encode_nfs4_stateid(xdr, &args->dst_stateid);
+ p = reserve_space(xdr, 3*8);
+ p = xdr_encode_hyper(p, args->src_offset);
+ p = xdr_encode_hyper(p, args->dst_offset);
+ xdr_encode_hyper(p, args->count);
+}
+
/*
* Encode ALLOCATE request
*/
@@ -206,6 +240,27 @@ static void nfs4_xdr_enc_layoutstats(struct rpc_rqst *req,
encode_nops(&hdr);
}

+/*
+ * Encode CLONE request
+ */
+static void nfs4_xdr_enc_clone(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ struct nfs42_clone_args *args)
+{
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->src_fh, &hdr);
+ encode_savefh(xdr, &hdr);
+ encode_putfh(xdr, args->dst_fh, &hdr);
+ encode_clone(xdr, args, &hdr);
+ encode_getfattr(xdr, args->dst_bitmask, &hdr);
+ encode_nops(&hdr);
+}
+
static int decode_allocate(struct xdr_stream *xdr, struct nfs42_falloc_res *res)
{
return decode_op_hdr(xdr, OP_ALLOCATE);
@@ -244,6 +299,11 @@ static int decode_layoutstats(struct xdr_stream *xdr,
return decode_op_hdr(xdr, OP_LAYOUTSTATS);
}

+static int decode_clone(struct xdr_stream *xdr)
+{
+ return decode_op_hdr(xdr, OP_CLONE);
+}
+
/*
* Decode ALLOCATE request
*/
@@ -352,4 +412,39 @@ out:
return status;
}

+/*
+ * Decode CLONE request
+ */
+static int nfs4_xdr_dec_clone(struct rpc_rqst *rqstp,
+ struct xdr_stream *xdr,
+ struct nfs42_clone_res *res)
+{
+ struct compound_hdr hdr;
+ int status;
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, rqstp);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_savefh(xdr);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_clone(xdr);
+ if (status)
+ goto out;
+ status = decode_getfattr(xdr, res->dst_fattr, res->server);
+
+out:
+ res->rpc_status = status;
+ return status;
+}
+
#endif /* __LINUX_FS_NFS_NFS4_2XDR_H */
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index c42459e..a453bd1 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7434,6 +7434,7 @@ struct rpc_procinfo nfs4_procedures[] = {
PROC(ALLOCATE, enc_allocate, dec_allocate),
PROC(DEALLOCATE, enc_deallocate, dec_deallocate),
PROC(LAYOUTSTATS, enc_layoutstats, dec_layoutstats),
+ PROC(CLONE, enc_clone, dec_clone),
#endif /* CONFIG_NFS_V4_2 */
};

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 00121f2..c0c695b 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -130,6 +130,7 @@ enum nfs_opnum4 {
OP_READ_PLUS = 68,
OP_SEEK = 69,
OP_WRITE_SAME = 70,
+ OP_CLONE = 71,

OP_ILLEGAL = 10044,
};
@@ -501,6 +502,7 @@ enum {
NFSPROC4_CLNT_ALLOCATE,
NFSPROC4_CLNT_DEALLOCATE,
NFSPROC4_CLNT_LAYOUTSTATS,
+ NFSPROC4_CLNT_CLONE,
};

/* nfs41 types */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index b9b5304..79918d6 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -359,6 +359,25 @@ struct nfs42_layoutstat_data {
struct nfs42_layoutstat_res res;
};

+struct nfs42_clone_args {
+ struct nfs4_sequence_args seq_args;
+ struct nfs_fh *src_fh;
+ struct nfs_fh *dst_fh;
+ nfs4_stateid src_stateid;
+ nfs4_stateid dst_stateid;
+ __u64 src_offset;
+ __u64 dst_offset;
+ __u64 count;
+ const u32 *dst_bitmask;
+};
+
+struct nfs42_clone_res {
+ struct nfs4_sequence_res seq_res;
+ unsigned int rpc_status;
+ struct nfs_fattr *dst_fattr;
+ const struct nfs_server *server;
+};
+
struct stateowner_id {
__u64 create_time;
__u32 uniquifier;
--
1.8.3.1


2015-08-26 08:18:42

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 1/9] vfs: pull btrfs clone API to vfs layer

Now that a few file systems are adding clone functionality, namingly
btrfs, NFS (later in the series) and XFS
(ttp://oss.sgi.com/archives/xfs/2015-06/msg00407.html), it makes sense
to pull the ioctl to common code.

Add vfs_file_clone_range() helper and .clone_range file operation interface
to allow underlying filesystems to clone between regular files.

The change in do_vfs_ioctl() is defered to next patch where btrfs
.clone_range is added, just so that we don't break btrfs CLONE ioctl
with this patch.

Signed-off-by: Peng Tao <[email protected]>
---
fs/ioctl.c | 24 ++++++++++++++++++++++++
fs/read_write.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/fs.h | 4 ++++
include/uapi/linux/fs.h | 9 +++++++++
4 files changed, 82 insertions(+)

diff --git a/fs/ioctl.c b/fs/ioctl.c
index 5d01d26..726c5d7 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -215,6 +215,30 @@ static int ioctl_fiemap(struct file *filp, unsigned long arg)
return error;
}

+static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
+ u64 off, u64 olen, u64 destoff)
+{
+ struct fd src_file = fdget(srcfd);
+ int ret;
+
+ if (!src_file.file)
+ return -EBADF;
+ ret = vfs_file_clone_range(src_file.file, dst_file, off, olen, destoff);
+
+ fdput(src_file);
+ return ret;
+}
+
+static long ioctl_file_clone_range(struct file *file, void __user *argp)
+{
+ struct file_clone_range args;
+
+ if (copy_from_user(&args, argp, sizeof(args)))
+ return -EFAULT;
+ return ioctl_file_clone(file, args.src_fd, args.src_offset,
+ args.src_length, args.dest_offset);
+}
+
#ifdef CONFIG_BLOCK

static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
diff --git a/fs/read_write.c b/fs/read_write.c
index 819ef3f..beaad2c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -16,6 +16,7 @@
#include <linux/pagemap.h>
#include <linux/splice.h>
#include <linux/compat.h>
+#include <linux/mount.h>
#include "internal.h"

#include <asm/uaccess.h>
@@ -1327,3 +1328,47 @@ COMPAT_SYSCALL_DEFINE4(sendfile64, int, out_fd, int, in_fd,
return do_sendfile(out_fd, in_fd, NULL, count, 0);
}
#endif
+
+int vfs_file_clone_range(struct file *src_file, struct file *dst_file,
+ loff_t off, size_t len, loff_t dstoff)
+{
+ struct inode *src_ino;
+ struct inode *dst_ino;
+ ssize_t ret;
+
+ if (!(src_file->f_mode & FMODE_READ) ||
+ !(dst_file->f_mode & FMODE_WRITE) ||
+ (dst_file->f_flags & O_APPEND) ||
+ !src_file->f_op || !src_file->f_op->clone_range)
+ return -EINVAL;
+
+ src_ino = file_inode(src_file);
+ dst_ino = file_inode(dst_file);
+
+ if (S_ISDIR(src_ino->i_mode) || S_ISDIR(dst_ino->i_mode))
+ return -EISDIR;
+
+ /* sanity check on offsets and length */
+ if (off + len < off || dstoff + len < dstoff ||
+ off + len > i_size_read(src_ino))
+ return -EINVAL;
+
+ if (src_ino->i_sb != dst_ino->i_sb ||
+ src_file->f_path.mnt != dst_file->f_path.mnt)
+ return -EXDEV;
+
+ ret = mnt_want_write_file(dst_file);
+ if (ret)
+ return ret;
+
+ ret = src_file->f_op->clone_range(src_file, dst_file, off, len, dstoff);
+ if (!ret) {
+ fsnotify_access(src_file);
+ fsnotify_modify(dst_file);
+ }
+
+ mnt_drop_write_file(dst_file);
+
+ return ret;
+}
+EXPORT_SYMBOL(vfs_file_clone_range);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cc008c3..612d7f4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1628,6 +1628,8 @@ struct file_operations {
long (*fallocate)(struct file *file, int mode, loff_t offset,
loff_t len);
void (*show_fdinfo)(struct seq_file *m, struct file *f);
+ int (*clone_range)(struct file *src_file, struct file *dst_file,
+ loff_t off, size_t len, loff_t dstoff);
#ifndef CONFIG_MMU
unsigned (*mmap_capabilities)(struct file *);
#endif
@@ -2678,6 +2680,8 @@ int __dax_fault(struct vm_area_struct *, struct vm_fault *, get_block_t,
int dax_pfn_mkwrite(struct vm_area_struct *, struct vm_fault *);
#define dax_mkwrite(vma, vmf, gb, iod) dax_fault(vma, vmf, gb, iod)
#define __dax_mkwrite(vma, vmf, gb, iod) __dax_fault(vma, vmf, gb, iod)
+int vfs_file_clone_range(struct file *src_file, struct file *dst_file,
+ loff_t off, size_t len, loff_t dstoff);

#ifdef CONFIG_BLOCK
typedef void (dio_submit_t)(int rw, struct bio *bio, struct inode *inode,
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 9b964a5..ac7f1c5 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -39,6 +39,13 @@
#define RENAME_EXCHANGE (1 << 1) /* Exchange source and dest */
#define RENAME_WHITEOUT (1 << 2) /* Whiteout source */

+struct file_clone_range {
+ __s64 src_fd;
+ __u64 src_offset;
+ __u64 src_length;
+ __u64 dest_offset;
+};
+
struct fstrim_range {
__u64 start;
__u64 len;
@@ -159,6 +166,8 @@ struct inodes_stat_t {
#define FIFREEZE _IOWR('X', 119, int) /* Freeze */
#define FITHAW _IOWR('X', 120, int) /* Thaw */
#define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */
+#define FICLONE _IOW(0x94, 9, int) /* Clone */
+#define FICLONERANGE _IOW(0x94, 13, struct file_clone_range) /* Clone range */

#define FS_IOC_GETFLAGS _IOR('f', 1, long)
#define FS_IOC_SETFLAGS _IOW('f', 2, long)
--
1.8.3.1


2015-08-26 08:18:47

by Peng Tao

[permalink] [raw]
Subject: [PATCH-RFC-RESEND 2/9] vfs/btrfs: add .clone_range file operation

Also add vfs callers in do_vfs_ioctl(). Now btrfs CLONE
ioctl goes through vfs_file_clone_range().

One question to btrfs developers:
Does btrfs_clone() ever return a positive value? If it does, what does it mean
to users?

Signed-off-by: Peng Tao <[email protected]>
---
fs/btrfs/ctree.h | 2 ++
fs/btrfs/file.c | 1 +
fs/btrfs/ioctl.c | 68 +++++-----------------------------------------
fs/ioctl.c | 6 ++++
include/uapi/linux/btrfs.h | 10 -------
5 files changed, 16 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aac314e..af3e224 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3969,6 +3969,8 @@ void btrfs_get_block_group_info(struct list_head *groups_list,
struct btrfs_ioctl_space_info *space);
void update_ioctl_balance_args(struct btrfs_fs_info *fs_info, int lock,
struct btrfs_ioctl_balance_args *bargs);
+int btrfs_file_clone_range(struct file *src_file, struct file *dst_file,
+ loff_t off, size_t olen, loff_t destoff);


/* file.c */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b823fac..c9170fd 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2813,6 +2813,7 @@ const struct file_operations btrfs_file_operations = {
.fsync = btrfs_sync_file,
.fallocate = btrfs_fallocate,
.unlocked_ioctl = btrfs_ioctl,
+ .clone_range = btrfs_file_clone_range,
#ifdef CONFIG_COMPAT
.compat_ioctl = btrfs_ioctl,
#endif
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0770c91..3d026c8 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3719,13 +3719,12 @@ out:
return ret;
}

-static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
- u64 off, u64 olen, u64 destoff)
+int btrfs_file_clone_range(struct file *src_file, struct file *dst_file,
+ loff_t off, size_t olen, loff_t destoff)
{
- struct inode *inode = file_inode(file);
+ struct inode *inode = file_inode(dst_file);
struct btrfs_root *root = BTRFS_I(inode)->root;
- struct fd src_file;
- struct inode *src;
+ struct inode *src = file_inode(src_file);
int ret;
u64 len = olen;
u64 bs = root->fs_info->sb->s_blocksize;
@@ -3742,49 +3741,16 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
* be either compressed or non-compressed.
*/

- /* the destination must be opened for writing */
- if (!(file->f_mode & FMODE_WRITE) || (file->f_flags & O_APPEND))
- return -EINVAL;
-
if (btrfs_root_readonly(root))
return -EROFS;

- ret = mnt_want_write_file(file);
- if (ret)
- return ret;
-
- src_file = fdget(srcfd);
- if (!src_file.file) {
- ret = -EBADF;
- goto out_drop_write;
- }
-
- ret = -EXDEV;
- if (src_file.file->f_path.mnt != file->f_path.mnt)
- goto out_fput;
-
- src = file_inode(src_file.file);
-
- ret = -EINVAL;
- if (src == inode)
- same_inode = 1;
-
- /* the src must be open for reading */
- if (!(src_file.file->f_mode & FMODE_READ))
- goto out_fput;
-
/* don't make the dst file partly checksummed */
if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
- goto out_fput;
-
- ret = -EISDIR;
- if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
- goto out_fput;
+ return -EINVAL;

- ret = -EXDEV;
- if (src->i_sb != inode->i_sb)
- goto out_fput;
+ if (src == inode)
+ same_inode = 1;

if (!same_inode) {
if (inode < src) {
@@ -3800,8 +3766,6 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,

/* determine range to clone */
ret = -EINVAL;
- if (off + len > src->i_size || off + len < off)
- goto out_unlock;
if (len == 0)
olen = len = src->i_size - off;
/* if we extend to eof, continue to block boundary */
@@ -3877,23 +3841,9 @@ out_unlock:
} else {
mutex_unlock(&src->i_mutex);
}
-out_fput:
- fdput(src_file);
-out_drop_write:
- mnt_drop_write_file(file);
return ret;
}

-static long btrfs_ioctl_clone_range(struct file *file, void __user *argp)
-{
- struct btrfs_ioctl_clone_range_args args;
-
- if (copy_from_user(&args, argp, sizeof(args)))
- return -EFAULT;
- return btrfs_ioctl_clone(file, args.src_fd, args.src_offset,
- args.src_length, args.dest_offset);
-}
-
/*
* there are many ways the trans_start and trans_end ioctls can lead
* to deadlocks. They should only be used by applications that
@@ -5433,10 +5383,6 @@ long btrfs_ioctl(struct file *file, unsigned int
return btrfs_ioctl_dev_info(root, argp);
case BTRFS_IOC_BALANCE:
return btrfs_ioctl_balance(file, NULL);
- case BTRFS_IOC_CLONE:
- return btrfs_ioctl_clone(file, arg, 0, 0, 0);
- case BTRFS_IOC_CLONE_RANGE:
- return btrfs_ioctl_clone_range(file, argp);
case BTRFS_IOC_TRANS_START:
return btrfs_ioctl_trans_start(file);
case BTRFS_IOC_TRANS_END:
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 726c5d7..41c6080 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -624,6 +624,12 @@ int do_vfs_ioctl(struct file *filp, unsigned int fd, unsigned int cmd,
case FIGETBSZ:
return put_user(inode->i_sb->s_blocksize, argp);

+ case FICLONE:
+ return ioctl_file_clone(filp, arg, 0, 0, 0);
+
+ case FICLONERANGE:
+ return ioctl_file_clone_range(filp, argp);
+
default:
if (S_ISREG(inode->i_mode))
error = file_ioctl(filp, cmd, arg);
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index b6dec05..c3a23d3 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -316,12 +316,6 @@ struct btrfs_ioctl_search_args_v2 {
__u64 buf[0]; /* out - found items */
};

-struct btrfs_ioctl_clone_range_args {
- __s64 src_fd;
- __u64 src_offset, src_length;
- __u64 dest_offset;
-};
-
/* flags for the defrag range ioctl */
#define BTRFS_DEFRAG_RANGE_COMPRESS 1
#define BTRFS_DEFRAG_RANGE_START_IO 2
@@ -548,7 +542,6 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
#define BTRFS_IOC_TRANS_END _IO(BTRFS_IOCTL_MAGIC, 7)
#define BTRFS_IOC_SYNC _IO(BTRFS_IOCTL_MAGIC, 8)

-#define BTRFS_IOC_CLONE _IOW(BTRFS_IOCTL_MAGIC, 9, int)
#define BTRFS_IOC_ADD_DEV _IOW(BTRFS_IOCTL_MAGIC, 10, \
struct btrfs_ioctl_vol_args)
#define BTRFS_IOC_RM_DEV _IOW(BTRFS_IOCTL_MAGIC, 11, \
@@ -556,9 +549,6 @@ static inline char *btrfs_err_str(enum btrfs_err_code err_code)
#define BTRFS_IOC_BALANCE _IOW(BTRFS_IOCTL_MAGIC, 12, \
struct btrfs_ioctl_vol_args)

-#define BTRFS_IOC_CLONE_RANGE _IOW(BTRFS_IOCTL_MAGIC, 13, \
- struct btrfs_ioctl_clone_range_args)
-
#define BTRFS_IOC_SUBVOL_CREATE _IOW(BTRFS_IOCTL_MAGIC, 14, \
struct btrfs_ioctl_vol_args)
#define BTRFS_IOC_SNAP_DESTROY _IOW(BTRFS_IOCTL_MAGIC, 15, \
--
1.8.3.1


2015-08-26 13:11:57

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 0/9] NFS/NFSD: add NFSv42 CLONE operation support

On Wed, Aug 26, 2015 at 04:16:41PM +0800, Peng Tao wrote:
> 2. vfs_file_clone_range() does not expect file systems to do data copy, and thus
> no rw_verify_area() required.

While there is no physical copy involved, the data logically is copied from
file A to file B. So we still need all the checks from rw_verify_area.


2015-08-26 13:17:41

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 2/9] vfs/btrfs: add .clone_range file operation

You need to eep the existing defintion, either as a duplicate with a
comment, or by including the uapi fs.h and defininin it to the new name.

Also this will break cifs, which uses the BTRFS_IOC_CLONE defintion
already.

2015-08-26 13:38:00

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 1/9] vfs: pull btrfs clone API to vfs layer

On Wed, Aug 26, 2015 at 04:16:42PM +0800, Peng Tao wrote:
> +struct file_clone_range {
> + __s64 src_fd;
> + __u64 src_offset;
> + __u64 src_length;
> + __u64 dest_offset;
> +};

Might be a good idea to add some spare bytes to the structure.

> struct fstrim_range {
> __u64 start;
> __u64 len;
> @@ -159,6 +166,8 @@ struct inodes_stat_t {
> #define FIFREEZE _IOWR('X', 119, int) /* Freeze */
> #define FITHAW _IOWR('X', 120, int) /* Thaw */
> #define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */
> +#define FICLONE _IOW(0x94, 9, int) /* Clone */
> +#define FICLONERANGE _IOW(0x94, 13, struct file_clone_range) /* Clone range */

FICLONE is a special case of FICLONERANGE. The whole file clone had come
historically first and then was refined, I don't think this needs to be
copied to the generic API. A zeroed file_clone_range is simple to use
for that purpose.

2015-08-26 16:22:29

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 1/9] vfs: pull btrfs clone API to vfs layer

On Wed, Aug 26, 2015 at 03:37:23PM +0200, David Sterba wrote:
> On Wed, Aug 26, 2015 at 04:16:42PM +0800, Peng Tao wrote:
> > +struct file_clone_range {
> > + __s64 src_fd;
> > + __u64 src_offset;
> > + __u64 src_length;
> > + __u64 dest_offset;
> > +};
>
> Might be a good idea to add some spare bytes to the structure.

But... structure size is encoded in the ioctl definition, so adding bytes
to struct file_clone_range now will change the ioctl number and break
userland.

--D

>
> > struct fstrim_range {
> > __u64 start;
> > __u64 len;
> > @@ -159,6 +166,8 @@ struct inodes_stat_t {
> > #define FIFREEZE _IOWR('X', 119, int) /* Freeze */
> > #define FITHAW _IOWR('X', 120, int) /* Thaw */
> > #define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */
> > +#define FICLONE _IOW(0x94, 9, int) /* Clone */
> > +#define FICLONERANGE _IOW(0x94, 13, struct file_clone_range) /* Clone range */
>
> FICLONE is a special case of FICLONERANGE. The whole file clone had come
> historically first and then was refined, I don't think this needs to be
> copied to the generic API. A zeroed file_clone_range is simple to use
> for that purpose.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-08-26 16:37:11

by David Sterba

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 1/9] vfs: pull btrfs clone API to vfs layer

On Wed, Aug 26, 2015 at 09:21:54AM -0700, Darrick J. Wong wrote:
> On Wed, Aug 26, 2015 at 03:37:23PM +0200, David Sterba wrote:
> > On Wed, Aug 26, 2015 at 04:16:42PM +0800, Peng Tao wrote:
> > > +struct file_clone_range {
> > > + __s64 src_fd;
> > > + __u64 src_offset;
> > > + __u64 src_length;
> > > + __u64 dest_offset;
> > > +};
> >
> > Might be a good idea to add some spare bytes to the structure.
>
> But... structure size is encoded in the ioctl definition, so adding bytes
> to struct file_clone_range now will change the ioctl number and break
> userland.

Oh right, I somehow did not left idea of a new ioctl definition while
writing it.

2015-08-26 22:49:11

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 5/9] nfs42: add .copy_range file operation

On Wed, Aug 26, 2015 at 04:16:46PM +0800, Peng Tao wrote:
> Signed-off-by: Peng Tao <[email protected]>
> ---
> fs/nfs/nfs4file.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 50 insertions(+)
>
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index dcd39d4..c335cb0 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -4,6 +4,7 @@
> * Copyright (C) 1992 Rick Sladkey
> */
> #include <linux/fs.h>
> +#include <linux/file.h>
> #include <linux/falloc.h>
> #include <linux/nfs_fs.h>
> #include "internal.h"
> @@ -166,6 +167,54 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t
> return nfs42_proc_deallocate(filep, offset, len);
> return nfs42_proc_allocate(filep, offset, len);
> }
> +
> +static noinline int
> +nfs42_file_clone_range(struct file *src_file, struct file *dst_file,
> + loff_t src_off, size_t dst_off, loff_t count)
> +{
> + struct inode *dst_inode = file_inode(dst_file);
> + struct inode *src_inode = file_inode(src_file);
> + int ret;
> +
> + /* src and dst must be different files */
> + if (src_inode == dst_inode)
> + return -EINVAL;
> +
> + /* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
> + if (dst_inode < src_inode) {
> + mutex_lock_nested(&dst_inode->i_mutex, I_MUTEX_PARENT);
> + mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_CHILD);
> + } else {
> + mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_PARENT);
> + mutex_lock_nested(&dst_inode->i_mutex, I_MUTEX_CHILD);
> + }

Is that safe? Between two operations, the inode code be reclaimed
and re-instantiated, resulting in the second operation having a
different locking order for the same files compared to the
first operation...

> +out_unlock:
> + if (dst_inode < src_inode) {
> + mutex_unlock(&src_inode->i_mutex);
> + mutex_unlock(&dst_inode->i_mutex);
> + } else {
> + mutex_unlock(&dst_inode->i_mutex);
> + mutex_unlock(&src_inode->i_mutex);
> + }

You don't have to care about lock order on unlock.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-08-26 22:52:49

by Dave Chinner

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 1/9] vfs: pull btrfs clone API to vfs layer

On Wed, Aug 26, 2015 at 04:16:42PM +0800, Peng Tao wrote:
> Now that a few file systems are adding clone functionality, namingly
> btrfs, NFS (later in the series) and XFS
> (ttp://oss.sgi.com/archives/xfs/2015-06/msg00407.html), it makes sense
> to pull the ioctl to common code.
>
> Add vfs_file_clone_range() helper and .clone_range file operation interface
> to allow underlying filesystems to clone between regular files.
>
> The change in do_vfs_ioctl() is defered to next patch where btrfs
> .clone_range is added, just so that we don't break btrfs CLONE ioctl
> with this patch.
>
> Signed-off-by: Peng Tao <[email protected]>
> ---
> fs/ioctl.c | 24 ++++++++++++++++++++++++
> fs/read_write.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 4 ++++
> include/uapi/linux/fs.h | 9 +++++++++
> 4 files changed, 82 insertions(+)
.....
> +int vfs_file_clone_range(struct file *src_file, struct file *dst_file,
> + loff_t off, size_t len, loff_t dstoff)
> +{
> + struct inode *src_ino;
> + struct inode *dst_ino;
> + ssize_t ret;
> +
> + if (!(src_file->f_mode & FMODE_READ) ||
> + !(dst_file->f_mode & FMODE_WRITE) ||
> + (dst_file->f_flags & O_APPEND) ||
> + !src_file->f_op || !src_file->f_op->clone_range)
> + return -EINVAL;
> +
> + src_ino = file_inode(src_file);
> + dst_ino = file_inode(dst_file);
> +
> + if (S_ISDIR(src_ino->i_mode) || S_ISDIR(dst_ino->i_mode))
> + return -EISDIR;

Whacky whitespace.

Also, shouldn't this call be restricted to S_ISREG() inodes? This
only checks for directories...

Cheers,

Dave.
--
Dave Chinner
[email protected]

2015-08-27 06:16:11

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 2/9] vfs/btrfs: add .clone_range file operation

On Wed, Aug 26, 2015 at 9:17 PM, Christoph Hellwig <[email protected]> wrote:
> You need to eep the existing defintion, either as a duplicate with a
> comment, or by including the uapi fs.h and defininin it to the new name.
OK. will keep it.

>
> Also this will break cifs, which uses the BTRFS_IOC_CLONE defintion
> already.
I'll look into it as well.

Thanks,
Tao

2015-08-27 06:17:56

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 5/9] nfs42: add .copy_range file operation

On Thu, Aug 27, 2015 at 6:48 AM, Dave Chinner <[email protected]> wrote:
> On Wed, Aug 26, 2015 at 04:16:46PM +0800, Peng Tao wrote:
>> Signed-off-by: Peng Tao <[email protected]>
>> ---
>> fs/nfs/nfs4file.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 50 insertions(+)
>>
>> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
>> index dcd39d4..c335cb0 100644
>> --- a/fs/nfs/nfs4file.c
>> +++ b/fs/nfs/nfs4file.c
>> @@ -4,6 +4,7 @@
>> * Copyright (C) 1992 Rick Sladkey
>> */
>> #include <linux/fs.h>
>> +#include <linux/file.h>
>> #include <linux/falloc.h>
>> #include <linux/nfs_fs.h>
>> #include "internal.h"
>> @@ -166,6 +167,54 @@ static long nfs42_fallocate(struct file *filep, int mode, loff_t offset, loff_t
>> return nfs42_proc_deallocate(filep, offset, len);
>> return nfs42_proc_allocate(filep, offset, len);
>> }
>> +
>> +static noinline int
>> +nfs42_file_clone_range(struct file *src_file, struct file *dst_file,
>> + loff_t src_off, size_t dst_off, loff_t count)
>> +{
>> + struct inode *dst_inode = file_inode(dst_file);
>> + struct inode *src_inode = file_inode(src_file);
>> + int ret;
>> +
>> + /* src and dst must be different files */
>> + if (src_inode == dst_inode)
>> + return -EINVAL;
>> +
>> + /* XXX: do we lock at all? what if server needs CB_RECALL_LAYOUT? */
>> + if (dst_inode < src_inode) {
>> + mutex_lock_nested(&dst_inode->i_mutex, I_MUTEX_PARENT);
>> + mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_CHILD);
>> + } else {
>> + mutex_lock_nested(&src_inode->i_mutex, I_MUTEX_PARENT);
>> + mutex_lock_nested(&dst_inode->i_mutex, I_MUTEX_CHILD);
>> + }
>
> Is that safe? Between two operations, the inode code be reclaimed
> and re-instantiated, resulting in the second operation having a
> different locking order for the same files compared to the
> first operation...
Both files are still open so I don't think we need to worry about
inode going away.

>
>> +out_unlock:
>> + if (dst_inode < src_inode) {
>> + mutex_unlock(&src_inode->i_mutex);
>> + mutex_unlock(&dst_inode->i_mutex);
>> + } else {
>> + mutex_unlock(&dst_inode->i_mutex);
>> + mutex_unlock(&src_inode->i_mutex);
>> + }
>
> You don't have to care about lock order on unlock.
Good point!

Thanks,
Tao

2015-08-27 06:23:59

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 1/9] vfs: pull btrfs clone API to vfs layer

On Thu, Aug 27, 2015 at 6:52 AM, Dave Chinner <[email protected]> wrote:
> On Wed, Aug 26, 2015 at 04:16:42PM +0800, Peng Tao wrote:
>> Now that a few file systems are adding clone functionality, namingly
>> btrfs, NFS (later in the series) and XFS
>> (ttp://oss.sgi.com/archives/xfs/2015-06/msg00407.html), it makes sense
>> to pull the ioctl to common code.
>>
>> Add vfs_file_clone_range() helper and .clone_range file operation interface
>> to allow underlying filesystems to clone between regular files.
>>
>> The change in do_vfs_ioctl() is defered to next patch where btrfs
>> .clone_range is added, just so that we don't break btrfs CLONE ioctl
>> with this patch.
>>
>> Signed-off-by: Peng Tao <[email protected]>
>> ---
>> fs/ioctl.c | 24 ++++++++++++++++++++++++
>> fs/read_write.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>> include/linux/fs.h | 4 ++++
>> include/uapi/linux/fs.h | 9 +++++++++
>> 4 files changed, 82 insertions(+)
> .....
>> +int vfs_file_clone_range(struct file *src_file, struct file *dst_file,
>> + loff_t off, size_t len, loff_t dstoff)
>> +{
>> + struct inode *src_ino;
>> + struct inode *dst_ino;
>> + ssize_t ret;
>> +
>> + if (!(src_file->f_mode & FMODE_READ) ||
>> + !(dst_file->f_mode & FMODE_WRITE) ||
>> + (dst_file->f_flags & O_APPEND) ||
>> + !src_file->f_op || !src_file->f_op->clone_range)
>> + return -EINVAL;
>> +
>> + src_ino = file_inode(src_file);
>> + dst_ino = file_inode(dst_file);
>> +
>> + if (S_ISDIR(src_ino->i_mode) || S_ISDIR(dst_ino->i_mode))
>> + return -EISDIR;
>
> Whacky whitespace.
>
> Also, shouldn't this call be restricted to S_ISREG() inodes? This
> only checks for directories...
Good point. I'll change it. Thanks!

Cheers,
Tao

2015-08-27 06:24:31

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 0/9] NFS/NFSD: add NFSv42 CLONE operation support

On Wed, Aug 26, 2015 at 9:11 PM, Christoph Hellwig <[email protected]> wrote:
> On Wed, Aug 26, 2015 at 04:16:41PM +0800, Peng Tao wrote:
>> 2. vfs_file_clone_range() does not expect file systems to do data copy, and thus
>> no rw_verify_area() required.
>
> While there is no physical copy involved, the data logically is copied from
> file A to file B. So we still need all the checks from rw_verify_area.
I see. Will add the checks.

Cheers,
Tao

2015-08-28 03:09:41

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH-RFC-RESEND 1/9] vfs: pull btrfs clone API to vfs layer

On Wed, Aug 26, 2015 at 9:37 PM, David Sterba <[email protected]> wrote:
> On Wed, Aug 26, 2015 at 04:16:42PM +0800, Peng Tao wrote:
>> +struct file_clone_range {
>> + __s64 src_fd;
>> + __u64 src_offset;
>> + __u64 src_length;
>> + __u64 dest_offset;
>> +};
>
> Might be a good idea to add some spare bytes to the structure.
>
>> struct fstrim_range {
>> __u64 start;
>> __u64 len;
>> @@ -159,6 +166,8 @@ struct inodes_stat_t {
>> #define FIFREEZE _IOWR('X', 119, int) /* Freeze */
>> #define FITHAW _IOWR('X', 120, int) /* Thaw */
>> #define FITRIM _IOWR('X', 121, struct fstrim_range) /* Trim */
>> +#define FICLONE _IOW(0x94, 9, int) /* Clone */
>> +#define FICLONERANGE _IOW(0x94, 13, struct file_clone_range) /* Clone range */
>
> FICLONE is a special case of FICLONERANGE. The whole file clone had come
> historically first and then was refined, I don't think this needs to be
> copied to the generic API. A zeroed file_clone_range is simple to use
> for that purpose.
oh, sorry I missed this one...

BTRFS_IOC_CLONE is being used by cp(1) and widely shipped with
distros. So we cannot just abandon the API. And my intention is to
keep the clone ioctl "JUST WORK" (TM) with cp(1).

Cheers,
Tao