2013-07-19 21:03:53

by Anna Schumaker

Subject: [RFC 0/5] NFS Server Side Copy

From: Bryan Schumaker <[email protected]>

These patches build on Zach Brown's copyfile patches to add server side
copy to both the NFS client and the NFS server.

The first patch improves on the copyfile syscall to make it usable on my
machine and also includes notes on other potential problems that I've
found. The remaining patches first implement a sync copy, then expand to
async.

My testing was done on a server exporting an ext4 filesystem. I compared
copying with the cp command to copying with the copyfile system call.


File size: 512 MB
cp: 4.244 seconds
copyfile: 0.961 seconds

File size: 1024 MB
cp: 9.091 seconds
copyfile: 1.919 seconds

File size: 1536 MB
cp: 15.291 seconds
copyfile: 6.016 seconds


Repeating these tests on an exported btrfs filesystem, which supports the
copyfile system call natively, drops the copyfile time to about 0.01 seconds.
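
For reference, the copyfile timings above were collected with a small test
program along these lines. This is only a rough sketch: the syscall number
and the exact argument types/order are assumptions based on the copy_range
patches, not a final ABI.

/* copyfile_test.c - minimal userspace sketch of the benchmark above */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_copy_range
#define __NR_copy_range 314	/* hypothetical number, adjust for your tree */
#endif

int main(int argc, char **argv)
{
	struct stat st;
	long copied;
	int src, dst;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return 1;
	}

	src = open(argv[1], O_RDONLY);
	dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (src < 0 || dst < 0 || fstat(src, &st) < 0) {
		perror("open/fstat");
		return 1;
	}

	/* copy the whole file, starting at offset 0 in source and destination;
	 * offsets are assumed to be passed by value here */
	copied = syscall(__NR_copy_range, src, 0, dst, 0, st.st_size);
	if (copied < 0) {
		perror("copy_range");
		return 1;
	}
	printf("copied %ld bytes\n", copied);
	return 0;
}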

Feel free to send me any questions, comments or other thoughts!

- Bryan

Bryan Schumaker (5):
Improve on the copyfile system call
NFSD: Implement the COPY call
NFS: Add COPY nfs operation
NFSD: Defer copying
NFS: Change copy to support async servers

fs/copy_range.c | 10 +++-
fs/nfs/callback.h | 13 ++++
fs/nfs/callback_proc.c | 9 +++
fs/nfs/callback_xdr.c | 54 ++++++++++++++++-
fs/nfs/inode.c | 2 +
fs/nfs/nfs4_fs.h | 7 +++
fs/nfs/nfs4file.c | 101 +++++++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 16 +++++
fs/nfs/nfs4xdr.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4callback.c | 136 ++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4proc.c | 104 ++++++++++++++++++++++++++++++--
fs/nfsd/nfs4state.c | 15 ++++-
fs/nfsd/nfs4xdr.c | 121 +++++++++++++++++++++++++++++++++++++-
fs/nfsd/state.h | 23 +++++++-
fs/nfsd/vfs.c | 9 +++
fs/nfsd/vfs.h | 1 +
fs/nfsd/xdr4.h | 24 ++++++++
fs/nfsd/xdr4cb.h | 9 +++
include/linux/nfs4.h | 14 ++++-
include/linux/nfs_xdr.h | 33 +++++++++++
include/linux/syscalls.h | 1 +
21 files changed, 836 insertions(+), 16 deletions(-)

--
1.8.3.3



2013-07-19 21:03:56

by Anna Schumaker

Subject: [RFC 3/5] NFS: Add COPY nfs operation

From: Bryan Schumaker <[email protected]>

This adds the copy_range file_operations pointer used by the
sys_copy_range() system call. This patch only implements sync copies,
so if the server performs an async copy we decode the returned stateid
and ignore it.

Signed-off-by: Bryan Schumaker <[email protected]>
---
fs/nfs/inode.c | 2 +
fs/nfs/nfs4_fs.h | 4 ++
fs/nfs/nfs4file.c | 53 +++++++++++++++++
fs/nfs/nfs4proc.c | 16 ++++++
fs/nfs/nfs4xdr.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/nfs4.h | 11 +++-
include/linux/nfs_xdr.h | 30 ++++++++++
7 files changed, 265 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index af6e806..80849a0 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -677,6 +677,7 @@ struct nfs_lock_context *nfs_get_lock_context(struct nfs_open_context *ctx)
kfree(new);
return res;
}
+EXPORT_SYMBOL_GPL(nfs_get_lock_context);

void nfs_put_lock_context(struct nfs_lock_context *l_ctx)
{
@@ -689,6 +690,7 @@ void nfs_put_lock_context(struct nfs_lock_context *l_ctx)
spin_unlock(&inode->i_lock);
kfree(l_ctx);
}
+EXPORT_SYMBOL_GPL(nfs_put_lock_context);

/**
* nfs_close_context - Common close_context() routine NFSv2/v3
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index ee81e35..26c7cf0 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -300,6 +300,10 @@ is_ds_client(struct nfs_client *clp)
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+int nfs42_proc_copy(struct nfs_server *, struct nfs42_copy_args *, struct nfs42_copy_res *);
+#endif /* CONFIG_NFS_V4_2 */
+
extern const struct nfs4_minor_version_ops *nfs_v4_minor_ops[];

extern const u32 nfs4_fattr_bitmap[3];
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index e5b804d..ca77ab4 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -117,6 +117,56 @@ nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
return ret;
}

+#ifdef CONFIG_NFS_V4_2
+static int nfs4_find_copy_stateid(struct file *file, nfs4_stateid *stateid,
+ fmode_t mode)
+{
+ struct nfs_open_context *open;
+ struct nfs_lock_context *lock;
+ int ret;
+
+ open = nfs_file_open_context(file);
+ if (!open)
+ return PTR_ERR(open);
+
+ lock = nfs_get_lock_context(open);
+ ret = nfs4_set_rw_stateid(stateid, open, lock, mode);
+
+ if (lock)
+ nfs_put_lock_context(lock);
+ return ret;
+}
+
+static ssize_t nfs4_copy_range(struct file *file_in, loff_t pos_in,
+ struct file *file_out, loff_t pos_out,
+ size_t count)
+{
+ int err;
+ struct nfs42_copy_args args = {
+ .src_fh = NFS_FH(file_inode(file_in)),
+ .src_pos = pos_in,
+ .dst_fh = NFS_FH(file_inode(file_out)),
+ .dst_pos = pos_out,
+ .count = count,
+ };
+ struct nfs42_copy_res res;
+
+ err = nfs4_find_copy_stateid(file_in, &args.src_stateid, FMODE_READ);
+ if (err)
+ return err;
+
+ err = nfs4_find_copy_stateid(file_out, &args.dst_stateid, FMODE_WRITE);
+ if (err)
+ return err;
+
+ err = nfs42_proc_copy(NFS_SERVER(file_inode(file_out)), &args, &res);
+ if (err)
+ return err;
+
+ return res.cp_res.wr_bytes_copied;
+}
+#endif /* CONFIG_NFS_V4_2 */
+
const struct file_operations nfs4_file_operations = {
.llseek = nfs_file_llseek,
.read = do_sync_read,
@@ -134,4 +184,7 @@ const struct file_operations nfs4_file_operations = {
.splice_write = nfs_file_splice_write,
.check_flags = nfs_check_flags,
.setlease = nfs_setlease,
+#ifdef CONFIG_NFS_V4_2
+ .copy_range = nfs4_copy_range,
+#endif /* CONFIG_NFS_V4_2 */
};
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index cf11799..f7eb4fd 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7346,6 +7346,22 @@ static bool nfs41_match_stateid(const nfs4_stateid *s1,

#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+int nfs42_proc_copy(struct nfs_server *server, struct nfs42_copy_args *args,
+ struct nfs42_copy_res *res)
+{
+ struct rpc_message msg = {
+ .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_COPY],
+ .rpc_argp = args,
+ .rpc_resp = res,
+ };
+
+ dprintk("NFS call copy %p\n", &args);
+ return nfs4_call_sync(server->client, server, &msg,
+ &(args->seq_args), &(res->seq_res), 0);
+}
+#endif /* CONFIG_NFS_V4_2 */
+
static bool nfs4_match_stateid(const nfs4_stateid *s1,
const nfs4_stateid *s2)
{
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 0abfb846..d70c6bc 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -416,6 +416,16 @@ static int nfs4_stat_to_errno(int);
#define decode_sequence_maxsz 0
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+#define encode_copy_maxsz (op_encode_hdr_maxsz + \
+ XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ 2 + 2 + 2 + 1 + 1 + 1)
+#define decode_copy_maxsz (op_decode_hdr_maxsz + \
+ 1 + XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ 2 + 1 + XDR_QUADLEN(NFS4_VERIFIER_SIZE))
+#endif /* CONFIG_NFS_V4_2 */
+
#define NFS4_enc_compound_sz (1024) /* XXX: large enough? */
#define NFS4_dec_compound_sz (1024) /* XXX: large enough? */
#define NFS4_enc_read_sz (compound_encode_hdr_maxsz + \
@@ -875,6 +885,19 @@ const u32 nfs41_maxgetdevinfo_overhead = ((RPC_MAX_REPHEADER_WITH_AUTH +
EXPORT_SYMBOL_GPL(nfs41_maxgetdevinfo_overhead);
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+#define NFS4_enc_copy_sz (compound_encode_hdr_maxsz + \
+ encode_putfh_maxsz + \
+ encode_savefh_maxsz + \
+ encode_putfh_maxsz + \
+ encode_copy_maxsz)
+#define NFS4_dec_copy_sz (compound_decode_hdr_maxsz + \
+ decode_putfh_maxsz + \
+ decode_savefh_maxsz + \
+ decode_putfh_maxsz + \
+ decode_copy_maxsz)
+#endif /* CONFIG_NFS_V4_2 */
+
static const umode_t nfs_type2fmt[] = {
[NF4BAD] = 0,
[NF4REG] = S_IFREG,
@@ -2048,6 +2071,27 @@ static void encode_free_stateid(struct xdr_stream *xdr,
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+static void encode_copy(struct xdr_stream *xdr,
+ struct nfs42_copy_args *args,
+ struct compound_hdr *hdr)
+{
+ encode_op_hdr(xdr, OP_COPY, decode_copy_maxsz, hdr);
+ encode_nfs4_stateid(xdr, &args->src_stateid);
+ encode_nfs4_stateid(xdr, &args->dst_stateid);
+
+ /* TODO: Partial file copy with changeable offsets */
+ encode_uint64(xdr, args->src_pos); /* src offset */
+ encode_uint64(xdr, args->dst_pos); /* dst offset */
+ encode_uint64(xdr, args->count); /* count */
+
+ encode_uint32(xdr, COPY4_METADATA); /* flags */
+
+ encode_uint32(xdr, 0); /* ca_destination */
+ encode_uint32(xdr, 0); /* src server list */
+}
+#endif /* CONFIG_NFS_V4_2 */
+
/*
* END OF "GENERIC" ENCODE ROUTINES.
*/
@@ -2994,6 +3038,29 @@ static void nfs4_xdr_enc_free_stateid(struct rpc_rqst *req,
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+/*
+ * Encode COPY request
+ */
+static void nfs4_xdr_enc_copy(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ struct nfs42_copy_args *args)
+{
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->src_fh, &hdr);
+ encode_savefh(xdr, &hdr);
+ encode_putfh(xdr, args->dst_fh, &hdr);
+ encode_copy(xdr, args, &hdr);
+ encode_nops(&hdr);
+ return;
+}
+#endif /* CONFIG_NFS_V4_2 */
+
static void print_overflow_msg(const char *func, const struct xdr_stream *xdr)
{
dprintk("nfs: %s: prematurely hit end of receive buffer. "
@@ -5943,6 +6010,54 @@ out_overflow:
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+static int decode_write_response(struct xdr_stream *xdr,
+ struct nfs42_write_response *write_res)
+{
+ __be32 *p;
+ int num_ids;
+
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ goto out_overflow;
+ num_ids = be32_to_cpup(p);
+
+ if (num_ids == 0)
+ write_res->wr_stateid = NULL;
+ else {
+ write_res->wr_stateid = kmalloc(sizeof(nfs4_stateid), GFP_KERNEL);
+ if (decode_stateid(xdr, write_res->wr_stateid) != 0)
+ goto out_free;
+ }
+
+ p = xdr_inline_decode(xdr, 12);
+ if (unlikely(!p))
+ goto out_free;
+ p = xdr_decode_hyper(p, &write_res->wr_bytes_copied);
+ write_res->wr_committed = be32_to_cpup(p);
+
+ return decode_write_verifier(xdr, &write_res->wr_verf);
+
+out_free:
+ kfree(write_res->wr_stateid);
+
+out_overflow:
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+}
+
+static int decode_copy(struct xdr_stream *xdr, struct nfs42_copy_res *res)
+{
+ int status;
+
+ status = decode_op_hdr(xdr, OP_COPY);
+ if (status)
+ return status;
+
+ return decode_write_response(xdr, &res->cp_res);
+}
+#endif /* CONFIG_NFS_V4_2 */
+
/*
* END OF "GENERIC" DECODE ROUTINES.
*/
@@ -7155,6 +7270,38 @@ out:
}
#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+/*
+ * Decode COPY request
+ */
+static int nfs4_xdr_dec_copy(struct rpc_rqst *rqstp,
+ struct xdr_stream *xdr,
+ struct nfs42_copy_res *res)
+{
+ struct compound_hdr hdr;
+ int status;
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, rqstp);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_savefh(xdr);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_copy(xdr, res);
+out:
+ return status;
+}
+#endif /* CONFIG_NFS_V4_2 */
+
/**
* nfs4_decode_dirent - Decode a single NFSv4 directory entry stored in
* the local page cache.
@@ -7364,6 +7511,9 @@ struct rpc_procinfo nfs4_procedures[] = {
enc_bind_conn_to_session, dec_bind_conn_to_session),
PROC(DESTROY_CLIENTID, enc_destroy_clientid, dec_destroy_clientid),
#endif /* CONFIG_NFS_V4_1 */
+#if defined(CONFIG_NFS_V4_2)
+ PROC(COPY, enc_copy, dec_copy),
+#endif /* CONFIG_NFS_V4_2 */
};

const struct rpc_version nfs_version4 = {
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index ebf60c6..347de63 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -120,7 +120,7 @@ enum nfs_opnum4 {
Needs to be updated if more operations are defined in future.*/

#define FIRST_NFS4_OP OP_ACCESS
-#define LAST_NFS4_OP OP_RECLAIM_COMPLETE
+#define LAST_NFS4_OP OP_COPY

enum nfsstat4 {
NFS4_OK = 0,
@@ -333,6 +333,12 @@ enum lock_type4 {
NFS4_WRITEW_LT = 4
};

+#ifdef CONFIG_NFS_V4_2
+enum copy_flags4 {
+ COPY4_GUARDED = (1 << 0),
+ COPY4_METADATA = (1 << 1),
+};
+#endif

/* Mandatory Attributes */
#define FATTR4_WORD0_SUPPORTED_ATTRS (1UL << 0)
@@ -481,6 +487,9 @@ enum {
NFSPROC4_CLNT_GETDEVICELIST,
NFSPROC4_CLNT_BIND_CONN_TO_SESSION,
NFSPROC4_CLNT_DESTROY_CLIENTID,
+
+ /* nfs42 */
+ NFSPROC4_CLNT_COPY,
};

/* nfs41 types */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 8651574..0bc6b14 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1204,6 +1204,36 @@ struct pnfs_ds_commit_info {

#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+struct nfs42_write_response
+{
+ nfs4_stateid *wr_stateid;
+ u64 wr_bytes_copied;
+ int wr_committed;
+ struct nfs_write_verifier wr_verf;
+};
+
+struct nfs42_copy_args {
+ struct nfs4_sequence_args seq_args;
+
+ struct nfs_fh *src_fh;
+ nfs4_stateid src_stateid;
+ u64 src_pos;
+
+ struct nfs_fh *dst_fh;
+ nfs4_stateid dst_stateid;
+ u64 dst_pos;
+
+ u64 count;
+};
+
+struct nfs42_copy_res {
+ struct nfs4_sequence_res seq_res;
+ unsigned int status;
+ struct nfs42_write_response cp_res;
+};
+#endif
+
struct nfs_page;

#define NFS_PAGEVEC_SIZE (8U)
--
1.8.3.3


2013-07-22 19:30:03

by J. Bruce Fields

Subject: Re: [RFC 4/5] NFSD: Defer copying

On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
> On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
> > On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
> >> From: Bryan Schumaker <[email protected]>
> >>
> >> Rather than performing the copy right away, schedule it to run later and
> >> reply to the client. Later, send a callback to notify the client that
> >> the copy has finished.
> >
> > I believe you need to implement the referring triple support described
> > in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
> > described in
> > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> > .
>
> I'll re-read and re-write.
>
> >
> > I see cb_delay initialized below, but not otherwise used. Am I missing
> > anything?
>
> Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
>
> >
> > What about OFFLOAD_STATUS and OFFLOAD_ABORT?
>
> I haven't thought out those too much... I haven't thought about a use for them on the client yet.

If it might be a long-running copy, I assume the client needs the
ability to abort if the caller is killed.

(Dumb question: what happens on the network partition? Does the server
abort the copy when it expires the client state?)

In any case,
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
says "If a server's COPY operation returns a stateid, then the server
MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
OFFLOAD_STATUS."

So even if we've no use for them on the client then we still need to
implement them (and probably just write a basic pynfs test). Either
that or update the spec.

> > In some common cases the reply will be very quick, and we might be
> > better off handling it synchronously. Could we implement a heuristic
> > like "copy synchronously if the filesystem has special support or the
> > range is less than the maximum iosize, otherwise copy asynchronously"?
>
> I'm sure that can be done, I'm just not sure how to do it yet...

OK, thanks.

--b.
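
(A minimal sketch of the heuristic suggested above, on the nfsd side. This
is not part of the posted patches: the helper name is hypothetical, it
assumes the ->copy_range file operation added by the copy_range patches as
the "special support" signal, and it uses the existing svc_max_payload()
as the iosize bound. nfsd4_copy() would copy inline when this returns true
and take the deferred/callback path otherwise.)

/* Not part of the posted patches: decide whether to copy inline
 * or defer to the async/callback path. */
static bool nfsd4_copy_is_cheap(struct svc_rqst *rqstp, struct file *src,
				u64 count)
{
	/* filesystem can offload the copy itself, e.g. a btrfs clone */
	if (src->f_op->copy_range)
		return true;
	/* small enough to finish within one normal RPC-sized I/O */
	return count <= svc_max_payload(rqstp);
}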

2013-07-19 21:03:57

by Anna Schumaker

Subject: [RFC 4/5] NFSD: Defer copying

From: Bryan Schumaker <[email protected]>

Rather than performing the copy right away, schedule it to run later and
reply to the client. Later, send a callback to notify the client that
the copy has finished.

Signed-off-by: Bryan Schumaker <[email protected]>
---
fs/nfsd/nfs4callback.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfs4proc.c | 59 +++++++++++++++------
fs/nfsd/nfs4state.c | 11 ++++
fs/nfsd/state.h | 21 ++++++++
fs/nfsd/xdr4cb.h | 9 ++++
5 files changed, 221 insertions(+), 15 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 7f05cd1..8f797e1 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -52,6 +52,9 @@ enum {
NFSPROC4_CLNT_CB_NULL = 0,
NFSPROC4_CLNT_CB_RECALL,
NFSPROC4_CLNT_CB_SEQUENCE,
+
+ /* NFS v4.2 callback */
+ NFSPROC4_CLNT_CB_OFFLOAD,
};

struct nfs4_cb_compound_hdr {
@@ -110,6 +113,7 @@ enum nfs_cb_opnum4 {
OP_CB_WANTS_CANCELLED = 12,
OP_CB_NOTIFY_LOCK = 13,
OP_CB_NOTIFY_DEVICEID = 14,
+ OP_CB_OFFLOAD = 15,
OP_CB_ILLEGAL = 10044
};

@@ -469,6 +473,31 @@ out_default:
return nfs_cb_stat_to_errno(nfserr);
}

+static void encode_cb_offload4args(struct xdr_stream *xdr,
+ const struct nfs4_cb_offload *offload,
+ struct nfs4_cb_compound_hdr *hdr)
+{
+ __be32 *p;
+
+ if (hdr->minorversion < 2)
+ return;
+
+ encode_nfs_cb_opnum4(xdr, OP_CB_OFFLOAD);
+ encode_nfs_fh4(xdr, &offload->co_dst_fh);
+ encode_stateid4(xdr, &offload->co_stid->sc_stateid);
+
+ p = xdr_reserve_space(xdr, 4);
+ *p = cpu_to_be32(1);
+ encode_stateid4(xdr, &offload->co_stid->sc_stateid);
+
+ p = xdr_reserve_space(xdr, 12 + NFS4_VERIFIER_SIZE);
+ p = xdr_encode_hyper(p, offload->co_count);
+ *p++ = cpu_to_be32(offload->co_stable_how);
+ xdr_encode_opaque_fixed(p, offload->co_verifier.data, NFS4_VERIFIER_SIZE);
+
+ hdr->nops++;
+}
+
/*
* NFSv4.0 and NFSv4.1 XDR encode functions
*
@@ -505,6 +534,23 @@ static void nfs4_xdr_enc_cb_recall(struct rpc_rqst *req, struct xdr_stream *xdr,
encode_cb_nops(&hdr);
}

+/*
+ * CB_OFFLOAD
+ */
+static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, struct xdr_stream *xdr,
+ const struct nfsd4_callback *cb)
+{
+ const struct nfs4_cb_offload *args = cb->cb_op;
+ struct nfs4_cb_compound_hdr hdr = {
+ .ident = cb->cb_clp->cl_cb_ident,
+ .minorversion = cb->cb_minorversion,
+ };
+
+ encode_cb_compound4args(xdr, &hdr);
+ encode_cb_sequence4args(xdr, cb, &hdr);
+ encode_cb_offload4args(xdr, args, &hdr);
+ encode_cb_nops(&hdr);
+}

/*
* NFSv4.0 and NFSv4.1 XDR decode functions
@@ -552,6 +598,36 @@ out:
}

/*
+ * CB_OFFLOAD
+ */
+static int nfs4_xdr_dec_cb_offload(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
+ struct nfsd4_callback *cb)
+{
+ struct nfs4_cb_compound_hdr hdr;
+ enum nfsstat4 nfserr;
+ int status;
+
+ status = decode_cb_compound4res(xdr, &hdr);
+ if (unlikely(status))
+ goto out;
+
+ if (cb != NULL) {
+ status = decode_cb_sequence4res(xdr, cb);
+ if (unlikely(status))
+ goto out;
+ }
+
+ status = decode_cb_op_status(xdr, OP_CB_OFFLOAD, &nfserr);
+ if (unlikely(status))
+ goto out;
+ if (unlikely(nfserr != NFS4_OK))
+ status = nfs_cb_stat_to_errno(nfserr);
+
+out:
+ return status;
+}
+
+/*
* RPC procedure tables
*/
#define PROC(proc, call, argtype, restype) \
@@ -568,6 +644,7 @@ out:
static struct rpc_procinfo nfs4_cb_procedures[] = {
PROC(CB_NULL, NULL, cb_null, cb_null),
PROC(CB_RECALL, COMPOUND, cb_recall, cb_recall),
+ PROC(CB_OFFLOAD, COMPOUND, cb_offload, cb_offload),
};

static struct rpc_version nfs_cb_version4 = {
@@ -1017,6 +1094,11 @@ void nfsd4_init_callback(struct nfsd4_callback *cb)
INIT_WORK(&cb->cb_work, nfsd4_do_callback_rpc);
}

+void nfsd4_init_delayed_callback(struct nfsd4_callback *cb)
+{
+ INIT_DELAYED_WORK(&cb->cb_delay, nfsd4_do_callback_rpc);
+}
+
void nfsd4_cb_recall(struct nfs4_delegation *dp)
{
struct nfsd4_callback *cb = &dp->dl_recall;
@@ -1036,3 +1118,57 @@ void nfsd4_cb_recall(struct nfs4_delegation *dp)

run_nfsd4_cb(&dp->dl_recall);
}
+
+static void nfsd4_cb_offload_done(struct rpc_task *task, void *calldata)
+{
+ struct nfsd4_callback *cb = calldata;
+ struct nfs4_client *clp = cb->cb_clp;
+ struct rpc_clnt *current_rpc_client = clp->cl_cb_client;
+
+ nfsd4_cb_done(task, calldata);
+
+ if (current_rpc_client != task->tk_client)
+ return;
+
+ if (cb->cb_done)
+ return;
+
+ if (task->tk_status != 0)
+ nfsd4_mark_cb_down(clp, task->tk_status);
+ cb->cb_done = true;
+}
+
+static void nfsd4_cb_offload_release(void *calldata)
+{
+ struct nfsd4_callback *cb = calldata;
+ struct nfs4_cb_offload *offload = container_of(cb, struct nfs4_cb_offload, co_callback);
+
+ if (cb->cb_done) {
+ nfs4_free_offload_stateid(offload->co_stid);
+ kfree(offload);
+ }
+}
+
+static const struct rpc_call_ops nfsd4_cb_offload_ops = {
+ .rpc_call_prepare = nfsd4_cb_prepare,
+ .rpc_call_done = nfsd4_cb_offload_done,
+ .rpc_release = nfsd4_cb_offload_release,
+};
+
+void nfsd4_cb_offload(struct nfs4_cb_offload *offload)
+{
+ struct nfsd4_callback *cb = &offload->co_callback;
+
+ cb->cb_op = offload;
+ cb->cb_clp = offload->co_stid->sc_client;
+ cb->cb_msg.rpc_proc = &nfs4_cb_procedures[NFSPROC4_CLNT_CB_OFFLOAD];
+ cb->cb_msg.rpc_argp = cb;
+ cb->cb_msg.rpc_resp = cb;
+
+ cb->cb_ops = &nfsd4_cb_offload_ops;
+
+ INIT_LIST_HEAD(&cb->cb_per_client);
+ cb->cb_done = true;
+
+ run_nfsd4_cb(cb);
+}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index d4584ea..66a787f 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -35,6 +35,7 @@
#include <linux/file.h>
#include <linux/slab.h>

+#include "state.h"
#include "idmap.h"
#include "cache.h"
#include "xdr4.h"
@@ -1062,29 +1063,57 @@ out:
return status;
}

+static void
+nfsd4_copy_async(struct work_struct *w)
+{
+ __be32 status;
+ struct nfs4_cb_offload *offload;
+
+ offload = container_of(w, struct nfs4_cb_offload, co_work);
+ status = nfsd_copy_range(offload->co_src_file, offload->co_src_pos,
+ offload->co_dst_file, offload->co_dst_pos,
+ offload->co_count);
+
+ if (status == nfs_ok) {
+ offload->co_stable_how = NFS_FILE_SYNC;
+ gen_boot_verifier(&offload->co_verifier, offload->co_net);
+ fput(offload->co_src_file);
+ fput(offload->co_dst_file);
+ }
+ nfsd4_cb_offload(offload);
+}
+
static __be32
nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
struct nfsd4_copy *copy)
{
- __be32 status;
struct file *src = NULL, *dst = NULL;
+ struct nfs4_cb_offload *offload;

- status = nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst);
- if (status)
- return status;
-
- status = nfsd_copy_range(src, copy->cp_src_pos,
- dst, copy->cp_dst_pos,
- copy->cp_count);
+ if (nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst))
+ return nfserr_jukebox;

- if (status == nfs_ok) {
- copy->cp_res.wr_stateid = NULL;
- copy->cp_res.wr_bytes_written = copy->cp_count;
- copy->cp_res.wr_stable_how = NFS_FILE_SYNC;
- gen_boot_verifier(&copy->cp_res.wr_verifier, SVC_NET(rqstp));
- }
+ offload = kmalloc(sizeof(struct nfs4_cb_offload), GFP_KERNEL);
+ if (!offload)
+ return nfserr_jukebox;

- return status;
+ offload->co_src_file = get_file(src);
+ offload->co_dst_file = get_file(dst);
+ offload->co_src_pos = copy->cp_src_pos;
+ offload->co_dst_pos = copy->cp_dst_pos;
+ offload->co_count = copy->cp_count;
+ offload->co_stid = nfs4_alloc_offload_stateid(cstate->session->se_client);
+ offload->co_net = SVC_NET(rqstp);
+ INIT_WORK(&offload->co_work, nfsd4_copy_async);
+ nfsd4_init_callback(&offload->co_callback);
+ memcpy(&offload->co_dst_fh, &cstate->current_fh, sizeof(struct knfsd_fh));
+
+ copy->cp_res.wr_stateid = &offload->co_stid->sc_stateid;
+ copy->cp_res.wr_bytes_written = 0;
+ copy->cp_res.wr_stable_how = NFS_UNSTABLE;
+
+ schedule_work(&offload->co_work);
+ return nfs_ok;
}

/* This routine never returns NFS_OK! If there are no other errors, it
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index c4e270e..582edb5 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -364,6 +364,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
return openlockstateid(nfs4_alloc_stid(clp, stateid_slab));
}

+struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *clp)
+{
+ return nfs4_alloc_stid(clp, stateid_slab);
+}
+
static struct nfs4_delegation *
alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct svc_fh *current_fh)
{
@@ -617,6 +622,12 @@ static void free_generic_stateid(struct nfs4_ol_stateid *stp)
kmem_cache_free(stateid_slab, stp);
}

+void nfs4_free_offload_stateid(struct nfs4_stid *stid)
+{
+ remove_stid(stid);
+ kmem_cache_free(stateid_slab, stid);
+}
+
static void release_lock_stateid(struct nfs4_ol_stateid *stp)
{
struct file *file;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 2478805..56682fb 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -70,6 +70,7 @@ struct nfsd4_callback {
struct rpc_message cb_msg;
const struct rpc_call_ops *cb_ops;
struct work_struct cb_work;
+ struct delayed_work cb_delay;
bool cb_done;
};

@@ -101,6 +102,22 @@ struct nfs4_delegation {
struct nfsd4_callback dl_recall;
};

+struct nfs4_cb_offload {
+ struct file *co_src_file;
+ struct file *co_dst_file;
+ u64 co_src_pos;
+ u64 co_dst_pos;
+ u64 co_count;
+ u32 co_stable_how;
+ struct knfsd_fh co_dst_fh;
+ nfs4_verifier co_verifier;
+ struct net *co_net;
+
+ struct nfs4_stid *co_stid;
+ struct work_struct co_work;
+ struct nfsd4_callback co_callback;
+};
+
/* client delegation callback info */
struct nfs4_cb_conn {
/* SETCLIENTID info */
@@ -468,10 +485,12 @@ extern void nfs4_free_openowner(struct nfs4_openowner *);
extern void nfs4_free_lockowner(struct nfs4_lockowner *);
extern int set_callback_cred(void);
extern void nfsd4_init_callback(struct nfsd4_callback *);
+extern void nfsd4_init_delayed_callback(struct nfsd4_callback *);
extern void nfsd4_probe_callback(struct nfs4_client *clp);
extern void nfsd4_probe_callback_sync(struct nfs4_client *clp);
extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *);
extern void nfsd4_cb_recall(struct nfs4_delegation *dp);
+extern void nfsd4_cb_offload(struct nfs4_cb_offload *);
extern int nfsd4_create_callback_queue(void);
extern void nfsd4_destroy_callback_queue(void);
extern void nfsd4_shutdown_callback(struct nfs4_client *);
@@ -480,6 +499,8 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
struct nfsd_net *nn);
extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
extern void put_client_renew(struct nfs4_client *clp);
+extern struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *);
+extern void nfs4_free_offload_stateid(struct nfs4_stid *);

/* nfs4recover operations */
extern int nfsd4_client_tracking_init(struct net *net);
diff --git a/fs/nfsd/xdr4cb.h b/fs/nfsd/xdr4cb.h
index c5c55df..75b0ef7 100644
--- a/fs/nfsd/xdr4cb.h
+++ b/fs/nfsd/xdr4cb.h
@@ -21,3 +21,12 @@
#define NFS4_dec_cb_recall_sz (cb_compound_dec_hdr_sz + \
cb_sequence_dec_sz + \
op_dec_sz)
+
+#define NFS4_enc_cb_offload_sz (cb_compound_enc_hdr_sz + \
+ cb_sequence_enc_sz + \
+ 1 + enc_stateid_sz + 2 + 1 + \
+ XDR_QUADLEN(NFS4_VERIFIER_SIZE))
+
+#define NFS4_dec_cb_offload_sz (cb_compound_dec_hdr_sz + \
+ cb_sequence_dec_sz + \
+ op_dec_sz)
--
1.8.3.3


2013-07-22 19:43:38

by J. Bruce Fields

Subject: Re: [RFC 4/5] NFSD: Defer copying

On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
> On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
> > On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
> >> On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
> >>> On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
> >>>> From: Bryan Schumaker <[email protected]>
> >>>>
> >>>> Rather than performing the copy right away, schedule it to run later and
> >>>> reply to the client. Later, send a callback to notify the client that
> >>>> the copy has finished.
> >>>
> >>> I believe you need to implement the referring triple support described
> >>> in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
> >>> described in
> >>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> >>> .
> >>
> >> I'll re-read and re-write.
> >>
> >>>
> >>> I see cb_delay initialized below, but not otherwise used. Am I missing
> >>> anything?
> >>
> >> Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
> >>
> >>>
> >>> What about OFFLOAD_STATUS and OFFLOAD_ABORT?
> >>
> >> I haven't thought out those too much... I haven't thought about a use for them on the client yet.
> >
> > If it might be a long-running copy, I assume the client needs the
> > ability to abort if the caller is killed.
> >
> > (Dumb question: what happens on the network partition? Does the server
> > abort the copy when it expires the client state?)
> >
> > In any case,
> > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> > says "If a server's COPY operation returns a stateid, then the server
> > MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
> > OFFLOAD_STATUS."
> >
> > So even if we've no use for them on the client then we still need to
> > implement them (and probably just write a basic pynfs test). Either
> > that or update the spec.
>
> Fair enough. I'll think it out and do something! Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.

I can't remember--does the spec give the server a clear way to bail out
and tell the client to fall back on a normal copy in cases where the
server knows the copy could take an unreasonable amount of time?

--b.

>
> - Bryan
>
> >
> >>> In some common cases the reply will be very quick, and we might be
> >>> better off handling it synchronously. Could we implement a heuristic
> >>> like "copy synchronously if the filesystem has special support or the
> >>> range is less than the maximum iosize, otherwise copy asynchronously"?
> >>
> >> I'm sure that can be done, I'm just not sure how to do it yet...
> >
> > OK, thanks.
> >
> > --b.
> >
>

2013-07-22 18:59:03

by Anna Schumaker

Subject: Re: [RFC 2/5] NFSD: Implement the COPY call

On 07/22/2013 02:05 PM, J. Bruce Fields wrote:
> On Fri, Jul 19, 2013 at 05:03:47PM -0400, [email protected] wrote:
>> From: Bryan Schumaker <[email protected]>
>>
>> I only implemented the sync version of this call, since it's the
>> easiest. I can simply call vfs_copy_range() and have the vfs do the
>> right thing for the filesystem being exported.
>>
>> Signed-off-by: Bryan Schumaker <[email protected]>
>> ---
>> fs/nfsd/nfs4proc.c | 75 ++++++++++++++++++++++++++++---
>> fs/nfsd/nfs4state.c | 4 +-
>> fs/nfsd/nfs4xdr.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>> fs/nfsd/state.h | 2 +-
>> fs/nfsd/vfs.c | 9 ++++
>> fs/nfsd/vfs.h | 1 +
>> fs/nfsd/xdr4.h | 24 ++++++++++
>> include/linux/nfs4.h | 3 ++
>> 8 files changed, 230 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index a7cee86..d4584ea 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -780,8 +780,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>
>> nfs4_lock_state();
>> /* check stateid */
>> - if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp),
>> - cstate, &read->rd_stateid,
>> + if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
>> + &cstate->current_fh,
>> + &read->rd_stateid,
>> RD_STATE, &read->rd_filp))) {
>> dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
>> goto out;
>> @@ -931,7 +932,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
>> nfs4_lock_state();
>> status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
>> - &setattr->sa_stateid, WR_STATE, NULL);
>> + &cstate->current_fh, &setattr->sa_stateid, WR_STATE, NULL);
>> nfs4_unlock_state();
>> if (status) {
>> dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
>> @@ -999,8 +1000,9 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> return nfserr_inval;
>>
>> nfs4_lock_state();
>> - status = nfs4_preprocess_stateid_op(SVC_NET(rqstp),
>> - cstate, stateid, WR_STATE, &filp);
>> + status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
>> + &cstate->current_fh, stateid,
>> + WR_STATE, &filp);
>> if (status) {
>> nfs4_unlock_state();
>> dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
>> @@ -1028,6 +1030,63 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> return status;
>> }
>>
>> +static __be32
>> +nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> + struct nfsd4_copy *copy, struct file **src, struct file **dst)
>> +{
>> + __be32 status;
>> + /* only support copying data to an existing file */
>> + if (!cstate->current_fh.fh_dentry || !cstate->save_fh.fh_dentry)
>> + return nfserr_nofilehandle;
>> +
>> + nfs4_lock_state();
>> + /* check stateids */
>> + if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
>> + &cstate->save_fh,
>> + &copy->cp_src_stateid,
>> + RD_STATE, src))){
>> + dprintk("NFSD: nfsd4_copy: couldn't process src stateid!\n");
>> + goto out;
>> + }
>> +
>> + if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
>> + &cstate->current_fh,
>> + &copy->cp_dst_stateid,
>> + WR_STATE, dst))){
>> + dprintk("NFSD: nfsd4_copy: couldn't process dst stateid!\n");
>> + goto out;
>> + }
>> +
>> +out:
>> + nfs4_unlock_state();
>> + return status;
>> +}
>> +
>> +static __be32
>> +nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> + struct nfsd4_copy *copy)
>> +{
>> + __be32 status;
>> + struct file *src = NULL, *dst = NULL;
>> +
>> + status = nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst);
>> + if (status)
>> + return status;
>> +
>> + status = nfsd_copy_range(src, copy->cp_src_pos,
>> + dst, copy->cp_dst_pos,
>> + copy->cp_count);
>> +
>> + if (status == nfs_ok) {
>> + copy->cp_res.wr_stateid = NULL;
>> + copy->cp_res.wr_bytes_written = copy->cp_count;
>> + copy->cp_res.wr_stable_how = NFS_FILE_SYNC;
>> + gen_boot_verifier(&copy->cp_res.wr_verifier, SVC_NET(rqstp));
>> + }
>> +
>> + return status;
>> +}
>> +
>> /* This routine never returns NFS_OK! If there are no other errors, it
>> * will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
>> * attributes matched. VERIFY is implemented by mapping NFSERR_SAME
>> @@ -1840,6 +1899,12 @@ static struct nfsd4_operation nfsd4_ops[] = {
>> .op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
>> .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
>> },
>> +
>> + /* NFSv4.2 operations */
>> + [OP_COPY] = {
>> + .op_func = (nfsd4op_func)nfsd4_copy,
>> + .op_name = "OP_COPY",
>
> This needs more fields filled in. Probably take the OP_WRITE entry as a
> starting point.

I'll look, thanks for the tip!

>
>> + }
>> };
>>
>> #ifdef NFSD_DEBUG
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 280acef..c4e270e 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -3613,12 +3613,12 @@ static __be32 nfsd4_lookup_stateid(stateid_t *stateid, unsigned char typemask,
>> */
>> __be32
>> nfs4_preprocess_stateid_op(struct net *net, struct nfsd4_compound_state *cstate,
>> - stateid_t *stateid, int flags, struct file **filpp)
>> + struct svc_fh *current_fh, stateid_t *stateid,
>> + int flags, struct file **filpp)
>> {
>> struct nfs4_stid *s;
>> struct nfs4_ol_stateid *stp = NULL;
>> struct nfs4_delegation *dp = NULL;
>> - struct svc_fh *current_fh = &cstate->current_fh;
>> struct inode *ino = current_fh->fh_dentry->d_inode;
>> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>> __be32 status;
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 0c0f3ea9..8f84e9e 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -1485,6 +1485,26 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
>> }
>>
>> static __be32
>> +nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
>> +{
>> + DECODE_HEAD;
>> +
>> + status = nfsd4_decode_stateid(argp, &copy->cp_src_stateid);
>> + if (status)
>> + return status;
>> + status = nfsd4_decode_stateid(argp, &copy->cp_dst_stateid);
>> + if (status)
>> + return status;
>> +
>> + READ_BUF(24);
>> + READ64(copy->cp_src_pos);
>> + READ64(copy->cp_dst_pos);
>> + READ64(copy->cp_count);
>> +
>> + DECODE_TAIL;
>> +}
>> +
>> +static __be32
>> nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
>> {
>> return nfs_ok;
>> @@ -1599,6 +1619,70 @@ static nfsd4_dec nfsd41_dec_ops[] = {
>> [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
>> };
>>
>> +static nfsd4_dec nfsd42_dec_ops[] = {
>> + [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access,
>> + [OP_CLOSE] = (nfsd4_dec)nfsd4_decode_close,
>> + [OP_COMMIT] = (nfsd4_dec)nfsd4_decode_commit,
>> + [OP_CREATE] = (nfsd4_dec)nfsd4_decode_create,
>> + [OP_DELEGPURGE] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_DELEGRETURN] = (nfsd4_dec)nfsd4_decode_delegreturn,
>> + [OP_GETATTR] = (nfsd4_dec)nfsd4_decode_getattr,
>> + [OP_GETFH] = (nfsd4_dec)nfsd4_decode_noop,
>> + [OP_LINK] = (nfsd4_dec)nfsd4_decode_link,
>> + [OP_LOCK] = (nfsd4_dec)nfsd4_decode_lock,
>> + [OP_LOCKT] = (nfsd4_dec)nfsd4_decode_lockt,
>> + [OP_LOCKU] = (nfsd4_dec)nfsd4_decode_locku,
>> + [OP_LOOKUP] = (nfsd4_dec)nfsd4_decode_lookup,
>> + [OP_LOOKUPP] = (nfsd4_dec)nfsd4_decode_noop,
>> + [OP_NVERIFY] = (nfsd4_dec)nfsd4_decode_verify,
>> + [OP_OPEN] = (nfsd4_dec)nfsd4_decode_open,
>> + [OP_OPENATTR] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_OPEN_CONFIRM] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_OPEN_DOWNGRADE] = (nfsd4_dec)nfsd4_decode_open_downgrade,
>> + [OP_PUTFH] = (nfsd4_dec)nfsd4_decode_putfh,
>> + [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_PUTROOTFH] = (nfsd4_dec)nfsd4_decode_noop,
>> + [OP_READ] = (nfsd4_dec)nfsd4_decode_read,
>> + [OP_READDIR] = (nfsd4_dec)nfsd4_decode_readdir,
>> + [OP_READLINK] = (nfsd4_dec)nfsd4_decode_noop,
>> + [OP_REMOVE] = (nfsd4_dec)nfsd4_decode_remove,
>> + [OP_RENAME] = (nfsd4_dec)nfsd4_decode_rename,
>> + [OP_RENEW] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_RESTOREFH] = (nfsd4_dec)nfsd4_decode_noop,
>> + [OP_SAVEFH] = (nfsd4_dec)nfsd4_decode_noop,
>> + [OP_SECINFO] = (nfsd4_dec)nfsd4_decode_secinfo,
>> + [OP_SETATTR] = (nfsd4_dec)nfsd4_decode_setattr,
>> + [OP_SETCLIENTID] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_SETCLIENTID_CONFIRM]= (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_VERIFY] = (nfsd4_dec)nfsd4_decode_verify,
>> + [OP_WRITE] = (nfsd4_dec)nfsd4_decode_write,
>> + [OP_RELEASE_LOCKOWNER] = (nfsd4_dec)nfsd4_decode_notsupp,
>> +
>> + /* new operations for NFSv4.1 */
>> + [OP_BACKCHANNEL_CTL] = (nfsd4_dec)nfsd4_decode_backchannel_ctl,
>> + [OP_BIND_CONN_TO_SESSION]= (nfsd4_dec)nfsd4_decode_bind_conn_to_session,
>> + [OP_EXCHANGE_ID] = (nfsd4_dec)nfsd4_decode_exchange_id,
>> + [OP_CREATE_SESSION] = (nfsd4_dec)nfsd4_decode_create_session,
>> + [OP_DESTROY_SESSION] = (nfsd4_dec)nfsd4_decode_destroy_session,
>> + [OP_FREE_STATEID] = (nfsd4_dec)nfsd4_decode_free_stateid,
>> + [OP_GET_DIR_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_SECINFO_NO_NAME] = (nfsd4_dec)nfsd4_decode_secinfo_no_name,
>> + [OP_SEQUENCE] = (nfsd4_dec)nfsd4_decode_sequence,
>> + [OP_SET_SSV] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_TEST_STATEID] = (nfsd4_dec)nfsd4_decode_test_stateid,
>> + [OP_WANT_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
>> + [OP_DESTROY_CLIENTID] = (nfsd4_dec)nfsd4_decode_destroy_clientid,
>> + [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
>> +
>> + /* new operations for NFS v4.2 */
>> + [OP_COPY] = (nfsd4_dec)nfsd4_decode_copy,
>> +};
>> +
>> struct nfsd4_minorversion_ops {
>> nfsd4_dec *decoders;
>> int nops;
>> @@ -1607,7 +1691,7 @@ struct nfsd4_minorversion_ops {
>> static struct nfsd4_minorversion_ops nfsd4_minorversion[] = {
>> [0] = { nfsd4_dec_ops, ARRAY_SIZE(nfsd4_dec_ops) },
>> [1] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
>> - [2] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
>> + [2] = { nfsd42_dec_ops, ARRAY_SIZE(nfsd42_dec_ops) },
>> };
>>
>> static __be32
>> @@ -3518,6 +3602,38 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
>> return nfserr;
>> }
>>
>> +static void
>> +nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write)
>> +{
>> + __be32 *p;
>> +
>> + RESERVE_SPACE(4);
>> +
>> + if (write->wr_stateid == NULL) {
>> + WRITE32(0);
>> + ADJUST_ARGS();
>> + } else {
>> + WRITE32(1);
>> + ADJUST_ARGS();
>> + nfsd4_encode_stateid(resp, write->wr_stateid);
>> + }
>> +
>> + RESERVE_SPACE(12 + NFS4_VERIFIER_SIZE);
>> + WRITE64(write->wr_bytes_written);
>> + WRITE32(write->wr_stable_how);
>> + WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
>> + ADJUST_ARGS();
>
> If I'm reading
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19 14.1.2
> correctly.... This should be just an offset in the error case, right?

Yeah, I think it should be... I'll fix that up once I figure out where I can find an offset!

>
> Also, may as well share code in the successful case with
> nfsd4_encode_write().

Okay, I'll see what I can do. Thanks!

- Bryan
>
> --b.
>
>> +}
>> +
>> +static __be32
>> +nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
>> + struct nfsd4_copy *copy)
>> +{
>> + if (!nfserr)
>> + nfsd42_encode_write_res(resp, &copy->cp_res);
>> + return nfserr;
>> +}
>> +
>> static __be32
>> nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
>> {
>> @@ -3590,6 +3706,9 @@ static nfsd4_enc nfsd4_enc_ops[] = {
>> [OP_WANT_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_DESTROY_CLIENTID] = (nfsd4_enc)nfsd4_encode_noop,
>> [OP_RECLAIM_COMPLETE] = (nfsd4_enc)nfsd4_encode_noop,
>> +
>> + /* NFSv4.2 operations */
>> + [OP_COPY] = (nfsd4_enc)nfsd4_encode_copy,
>> };
>>
>> /*
>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>> index 424d8f5..2478805 100644
>> --- a/fs/nfsd/state.h
>> +++ b/fs/nfsd/state.h
>> @@ -455,7 +455,7 @@ struct nfsd4_compound_state;
>> struct nfsd_net;
>>
>> extern __be32 nfs4_preprocess_stateid_op(struct net *net,
>> - struct nfsd4_compound_state *cstate,
>> + struct nfsd4_compound_state *cstate, struct svc_fh *,
>> stateid_t *stateid, int flags, struct file **filp);
>> extern void nfs4_lock_state(void);
>> extern void nfs4_unlock_state(void);
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 8ff6a00..d77958d 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -649,6 +649,15 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
>> }
>> #endif
>>
>> +__be32 nfsd_copy_range(struct file *src, u64 src_pos,
>> + struct file *dst, u64 dst_pos,
>> + u64 count)
>> +{
>> + int err = vfs_copy_range(src, src_pos, dst, dst_pos, count);
>> + if (err < 0)
>> + return nfserrno(err);
>> + return vfs_fsync_range(dst, dst_pos, dst_pos + count, 0);
>> +}
>> #endif /* defined(CONFIG_NFSD_V4) */
>>
>> #ifdef CONFIG_NFSD_V3
>> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
>> index a4be2e3..0f26c3b 100644
>> --- a/fs/nfsd/vfs.h
>> +++ b/fs/nfsd/vfs.h
>> @@ -86,6 +86,7 @@ __be32 nfsd_symlink(struct svc_rqst *, struct svc_fh *,
>> struct svc_fh *res, struct iattr *);
>> __be32 nfsd_link(struct svc_rqst *, struct svc_fh *,
>> char *, int, struct svc_fh *);
>> +__be32 nfsd_copy_range(struct file *, u64, struct file *, u64, u64);
>> __be32 nfsd_rename(struct svc_rqst *,
>> struct svc_fh *, char *, int,
>> struct svc_fh *, char *, int);
>> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
>> index b3ed644..55b9ef7 100644
>> --- a/fs/nfsd/xdr4.h
>> +++ b/fs/nfsd/xdr4.h
>> @@ -430,6 +430,27 @@ struct nfsd4_reclaim_complete {
>> u32 rca_one_fs;
>> };
>>
>> +struct nfsd42_write_res {
>> + stateid_t *wr_stateid;
>> + u64 wr_bytes_written;
>> + u32 wr_stable_how;
>> + nfs4_verifier wr_verifier;
>> +};
>> +
>> +struct nfsd4_copy {
>> + /* request */
>> + stateid_t cp_src_stateid;
>> + stateid_t cp_dst_stateid;
>> +
>> + u64 cp_src_pos;
>> + u64 cp_dst_pos;
>> +
>> + u64 cp_count;
>> +
>> + /* response */
>> + struct nfsd42_write_res cp_res;
>> +};
>> +
>> struct nfsd4_op {
>> int opnum;
>> __be32 status;
>> @@ -475,6 +496,9 @@ struct nfsd4_op {
>> struct nfsd4_reclaim_complete reclaim_complete;
>> struct nfsd4_test_stateid test_stateid;
>> struct nfsd4_free_stateid free_stateid;
>> +
>> + /* NFSv4.2 */
>> + struct nfsd4_copy copy;
>> } u;
>> struct nfs4_replay * replay;
>> };
>> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
>> index e36dee5..ebf60c6 100644
>> --- a/include/linux/nfs4.h
>> +++ b/include/linux/nfs4.h
>> @@ -110,6 +110,9 @@ enum nfs_opnum4 {
>> OP_DESTROY_CLIENTID = 57,
>> OP_RECLAIM_COMPLETE = 58,
>>
>> + /* nfs42 */
>> + OP_COPY = 59,
>> +
>> OP_ILLEGAL = 10044,
>> };
>>
>> --
>> 1.8.3.3
>>


2013-07-22 19:55:58

by J. Bruce Fields

Subject: Re: [RFC 4/5] NFSD: Defer copying

On Mon, Jul 22, 2013 at 03:54:00PM -0400, Bryan Schumaker wrote:
> On 07/22/2013 03:43 PM, J. Bruce Fields wrote:
> > On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
> >> On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
> >>> On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
> >>>> On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
> >>>>> On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
> >>>>>> From: Bryan Schumaker <[email protected]>
> >>>>>>
> >>>>>> Rather than performing the copy right away, schedule it to run later and
> >>>>>> reply to the client. Later, send a callback to notify the client that
> >>>>>> the copy has finished.
> >>>>>
> >>>>> I believe you need to implement the referring triple support described
> >>>>> in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
> >>>>> described in
> >>>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> >>>>> .
> >>>>
> >>>> I'll re-read and re-write.
> >>>>
> >>>>>
> >>>>> I see cb_delay initialized below, but not otherwise used. Am I missing
> >>>>> anything?
> >>>>
> >>>> Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
> >>>>
> >>>>>
> >>>>> What about OFFLOAD_STATUS and OFFLOAD_ABORT?
> >>>>
> >>>> I haven't thought out those too much... I haven't thought about a use for them on the client yet.
> >>>
> >>> If it might be a long-running copy, I assume the client needs the
> >>> ability to abort if the caller is killed.
> >>>
> >>> (Dumb question: what happens on the network partition? Does the server
> >>> abort the copy when it expires the client state?)
> >>>
> >>> In any case,
> >>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> >>> says "If a server's COPY operation returns a stateid, then the server
> >>> MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
> >>> OFFLOAD_STATUS."
> >>>
> >>> So even if we've no use for them on the client then we still need to
> >>> implement them (and probably just write a basic pynfs test). Either
> >>> that or update the spec.
> >>
> >> Fair enough. I'll think it out and do something! Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
> >
> > I can't remember--does the spec give the server a clear way to bail out
> > and tell the client to fall back on a normal copy in cases where the
> > server knows the copy could take an unreasonable amount of time?
> >
> > --b.
>
> I don't think so. Is there ever a case where copying over the network would be faster than copying on the server?

Maybe not, but if the copy will take a minute, then we don't want to tie
up an rpc slot for a minute.

--b.

2013-07-19 21:03:55

by Anna Schumaker

Subject: [RFC 2/5] NFSD: Implement the COPY call

From: Bryan Schumaker <[email protected]>

I only implemented the sync version of this call, since it's the
easiest. I can simply call vfs_copy_range() and have the vfs do the
right thing for the filesystem being exported.

Signed-off-by: Bryan Schumaker <[email protected]>
---
fs/nfsd/nfs4proc.c | 75 ++++++++++++++++++++++++++++---
fs/nfsd/nfs4state.c | 4 +-
fs/nfsd/nfs4xdr.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/nfsd/state.h | 2 +-
fs/nfsd/vfs.c | 9 ++++
fs/nfsd/vfs.h | 1 +
fs/nfsd/xdr4.h | 24 ++++++++++
include/linux/nfs4.h | 3 ++
8 files changed, 230 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a7cee86..d4584ea 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -780,8 +780,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,

nfs4_lock_state();
/* check stateid */
- if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp),
- cstate, &read->rd_stateid,
+ if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
+ &cstate->current_fh,
+ &read->rd_stateid,
RD_STATE, &read->rd_filp))) {
dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
goto out;
@@ -931,7 +932,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
nfs4_lock_state();
status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
- &setattr->sa_stateid, WR_STATE, NULL);
+ &cstate->current_fh, &setattr->sa_stateid, WR_STATE, NULL);
nfs4_unlock_state();
if (status) {
dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
@@ -999,8 +1000,9 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfserr_inval;

nfs4_lock_state();
- status = nfs4_preprocess_stateid_op(SVC_NET(rqstp),
- cstate, stateid, WR_STATE, &filp);
+ status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
+ &cstate->current_fh, stateid,
+ WR_STATE, &filp);
if (status) {
nfs4_unlock_state();
dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
@@ -1028,6 +1030,63 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return status;
}

+static __be32
+nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ struct nfsd4_copy *copy, struct file **src, struct file **dst)
+{
+ __be32 status;
+ /* only support copying data to an existing file */
+ if (!cstate->current_fh.fh_dentry || !cstate->save_fh.fh_dentry)
+ return nfserr_nofilehandle;
+
+ nfs4_lock_state();
+ /* check stateids */
+ if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
+ &cstate->save_fh,
+ &copy->cp_src_stateid,
+ RD_STATE, src))){
+ dprintk("NFSD: nfsd4_copy: couldn't process src stateid!\n");
+ goto out;
+ }
+
+ if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
+ &cstate->current_fh,
+ &copy->cp_dst_stateid,
+ WR_STATE, dst))){
+ dprintk("NFSD: nfsd4_copy: couldn't process dst stateid!\n");
+ goto out;
+ }
+
+out:
+ nfs4_unlock_state();
+ return status;
+}
+
+static __be32
+nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ struct nfsd4_copy *copy)
+{
+ __be32 status;
+ struct file *src = NULL, *dst = NULL;
+
+ status = nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst);
+ if (status)
+ return status;
+
+ status = nfsd_copy_range(src, copy->cp_src_pos,
+ dst, copy->cp_dst_pos,
+ copy->cp_count);
+
+ if (status == nfs_ok) {
+ copy->cp_res.wr_stateid = NULL;
+ copy->cp_res.wr_bytes_written = copy->cp_count;
+ copy->cp_res.wr_stable_how = NFS_FILE_SYNC;
+ gen_boot_verifier(&copy->cp_res.wr_verifier, SVC_NET(rqstp));
+ }
+
+ return status;
+}
+
/* This routine never returns NFS_OK! If there are no other errors, it
* will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
* attributes matched. VERIFY is implemented by mapping NFSERR_SAME
@@ -1840,6 +1899,12 @@ static struct nfsd4_operation nfsd4_ops[] = {
.op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
},
+
+ /* NFSv4.2 operations */
+ [OP_COPY] = {
+ .op_func = (nfsd4op_func)nfsd4_copy,
+ .op_name = "OP_COPY",
+ }
};

#ifdef NFSD_DEBUG
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 280acef..c4e270e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3613,12 +3613,12 @@ static __be32 nfsd4_lookup_stateid(stateid_t *stateid, unsigned char typemask,
*/
__be32
nfs4_preprocess_stateid_op(struct net *net, struct nfsd4_compound_state *cstate,
- stateid_t *stateid, int flags, struct file **filpp)
+ struct svc_fh *current_fh, stateid_t *stateid,
+ int flags, struct file **filpp)
{
struct nfs4_stid *s;
struct nfs4_ol_stateid *stp = NULL;
struct nfs4_delegation *dp = NULL;
- struct svc_fh *current_fh = &cstate->current_fh;
struct inode *ino = current_fh->fh_dentry->d_inode;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
__be32 status;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 0c0f3ea9..8f84e9e 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1485,6 +1485,26 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
}

static __be32
+nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
+{
+ DECODE_HEAD;
+
+ status = nfsd4_decode_stateid(argp, &copy->cp_src_stateid);
+ if (status)
+ return status;
+ status = nfsd4_decode_stateid(argp, &copy->cp_dst_stateid);
+ if (status)
+ return status;
+
+ READ_BUF(24);
+ READ64(copy->cp_src_pos);
+ READ64(copy->cp_dst_pos);
+ READ64(copy->cp_count);
+
+ DECODE_TAIL;
+}
+
+static __be32
nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
{
return nfs_ok;
@@ -1599,6 +1619,70 @@ static nfsd4_dec nfsd41_dec_ops[] = {
[OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
};

+static nfsd4_dec nfsd42_dec_ops[] = {
+ [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access,
+ [OP_CLOSE] = (nfsd4_dec)nfsd4_decode_close,
+ [OP_COMMIT] = (nfsd4_dec)nfsd4_decode_commit,
+ [OP_CREATE] = (nfsd4_dec)nfsd4_decode_create,
+ [OP_DELEGPURGE] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_DELEGRETURN] = (nfsd4_dec)nfsd4_decode_delegreturn,
+ [OP_GETATTR] = (nfsd4_dec)nfsd4_decode_getattr,
+ [OP_GETFH] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_LINK] = (nfsd4_dec)nfsd4_decode_link,
+ [OP_LOCK] = (nfsd4_dec)nfsd4_decode_lock,
+ [OP_LOCKT] = (nfsd4_dec)nfsd4_decode_lockt,
+ [OP_LOCKU] = (nfsd4_dec)nfsd4_decode_locku,
+ [OP_LOOKUP] = (nfsd4_dec)nfsd4_decode_lookup,
+ [OP_LOOKUPP] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_NVERIFY] = (nfsd4_dec)nfsd4_decode_verify,
+ [OP_OPEN] = (nfsd4_dec)nfsd4_decode_open,
+ [OP_OPENATTR] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_OPEN_CONFIRM] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_OPEN_DOWNGRADE] = (nfsd4_dec)nfsd4_decode_open_downgrade,
+ [OP_PUTFH] = (nfsd4_dec)nfsd4_decode_putfh,
+ [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_PUTROOTFH] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_READ] = (nfsd4_dec)nfsd4_decode_read,
+ [OP_READDIR] = (nfsd4_dec)nfsd4_decode_readdir,
+ [OP_READLINK] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_REMOVE] = (nfsd4_dec)nfsd4_decode_remove,
+ [OP_RENAME] = (nfsd4_dec)nfsd4_decode_rename,
+ [OP_RENEW] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_RESTOREFH] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_SAVEFH] = (nfsd4_dec)nfsd4_decode_noop,
+ [OP_SECINFO] = (nfsd4_dec)nfsd4_decode_secinfo,
+ [OP_SETATTR] = (nfsd4_dec)nfsd4_decode_setattr,
+ [OP_SETCLIENTID] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_SETCLIENTID_CONFIRM]= (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_VERIFY] = (nfsd4_dec)nfsd4_decode_verify,
+ [OP_WRITE] = (nfsd4_dec)nfsd4_decode_write,
+ [OP_RELEASE_LOCKOWNER] = (nfsd4_dec)nfsd4_decode_notsupp,
+
+ /* new operations for NFSv4.1 */
+ [OP_BACKCHANNEL_CTL] = (nfsd4_dec)nfsd4_decode_backchannel_ctl,
+ [OP_BIND_CONN_TO_SESSION]= (nfsd4_dec)nfsd4_decode_bind_conn_to_session,
+ [OP_EXCHANGE_ID] = (nfsd4_dec)nfsd4_decode_exchange_id,
+ [OP_CREATE_SESSION] = (nfsd4_dec)nfsd4_decode_create_session,
+ [OP_DESTROY_SESSION] = (nfsd4_dec)nfsd4_decode_destroy_session,
+ [OP_FREE_STATEID] = (nfsd4_dec)nfsd4_decode_free_stateid,
+ [OP_GET_DIR_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_SECINFO_NO_NAME] = (nfsd4_dec)nfsd4_decode_secinfo_no_name,
+ [OP_SEQUENCE] = (nfsd4_dec)nfsd4_decode_sequence,
+ [OP_SET_SSV] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_TEST_STATEID] = (nfsd4_dec)nfsd4_decode_test_stateid,
+ [OP_WANT_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_DESTROY_CLIENTID] = (nfsd4_dec)nfsd4_decode_destroy_clientid,
+ [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
+
+ /* new operations for NFS v4.2 */
+ [OP_COPY] = (nfsd4_dec)nfsd4_decode_copy,
+};
+
struct nfsd4_minorversion_ops {
nfsd4_dec *decoders;
int nops;
@@ -1607,7 +1691,7 @@ struct nfsd4_minorversion_ops {
static struct nfsd4_minorversion_ops nfsd4_minorversion[] = {
[0] = { nfsd4_dec_ops, ARRAY_SIZE(nfsd4_dec_ops) },
[1] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
- [2] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
+ [2] = { nfsd42_dec_ops, ARRAY_SIZE(nfsd42_dec_ops) },
};

static __be32
@@ -3518,6 +3602,38 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr;
}

+static void
+nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write)
+{
+ __be32 *p;
+
+ RESERVE_SPACE(4);
+
+ if (write->wr_stateid == NULL) {
+ WRITE32(0);
+ ADJUST_ARGS();
+ } else {
+ WRITE32(1);
+ ADJUST_ARGS();
+ nfsd4_encode_stateid(resp, write->wr_stateid);
+ }
+
+ RESERVE_SPACE(12 + NFS4_VERIFIER_SIZE);
+ WRITE64(write->wr_bytes_written);
+ WRITE32(write->wr_stable_how);
+ WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
+ ADJUST_ARGS();
+}
+
+static __be32
+nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_copy *copy)
+{
+ if (!nfserr)
+ nfsd42_encode_write_res(resp, &copy->cp_res);
+ return nfserr;
+}
+
static __be32
nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
{
@@ -3590,6 +3706,9 @@ static nfsd4_enc nfsd4_enc_ops[] = {
[OP_WANT_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop,
[OP_DESTROY_CLIENTID] = (nfsd4_enc)nfsd4_encode_noop,
[OP_RECLAIM_COMPLETE] = (nfsd4_enc)nfsd4_encode_noop,
+
+ /* NFSv4.2 operations */
+ [OP_COPY] = (nfsd4_enc)nfsd4_encode_copy,
};

/*
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 424d8f5..2478805 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -455,7 +455,7 @@ struct nfsd4_compound_state;
struct nfsd_net;

extern __be32 nfs4_preprocess_stateid_op(struct net *net,
- struct nfsd4_compound_state *cstate,
+ struct nfsd4_compound_state *cstate, struct svc_fh *,
stateid_t *stateid, int flags, struct file **filp);
extern void nfs4_lock_state(void);
extern void nfs4_unlock_state(void);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 8ff6a00..d77958d 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -649,6 +649,15 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
}
#endif

+__be32 nfsd_copy_range(struct file *src, u64 src_pos,
+ struct file *dst, u64 dst_pos,
+ u64 count)
+{
+ int err = vfs_copy_range(src, src_pos, dst, dst_pos, count);
+ if (err < 0)
+ return nfserrno(err);
+ return vfs_fsync_range(dst, dst_pos, dst_pos + count, 0);
+}
#endif /* defined(CONFIG_NFSD_V4) */

#ifdef CONFIG_NFSD_V3
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index a4be2e3..0f26c3b 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -86,6 +86,7 @@ __be32 nfsd_symlink(struct svc_rqst *, struct svc_fh *,
struct svc_fh *res, struct iattr *);
__be32 nfsd_link(struct svc_rqst *, struct svc_fh *,
char *, int, struct svc_fh *);
+__be32 nfsd_copy_range(struct file *, u64, struct file *, u64, u64);
__be32 nfsd_rename(struct svc_rqst *,
struct svc_fh *, char *, int,
struct svc_fh *, char *, int);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index b3ed644..55b9ef7 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -430,6 +430,27 @@ struct nfsd4_reclaim_complete {
u32 rca_one_fs;
};

+struct nfsd42_write_res {
+ stateid_t *wr_stateid;
+ u64 wr_bytes_written;
+ u32 wr_stable_how;
+ nfs4_verifier wr_verifier;
+};
+
+struct nfsd4_copy {
+ /* request */
+ stateid_t cp_src_stateid;
+ stateid_t cp_dst_stateid;
+
+ u64 cp_src_pos;
+ u64 cp_dst_pos;
+
+ u64 cp_count;
+
+ /* response */
+ struct nfsd42_write_res cp_res;
+};
+
struct nfsd4_op {
int opnum;
__be32 status;
@@ -475,6 +496,9 @@ struct nfsd4_op {
struct nfsd4_reclaim_complete reclaim_complete;
struct nfsd4_test_stateid test_stateid;
struct nfsd4_free_stateid free_stateid;
+
+ /* NFSv4.2 */
+ struct nfsd4_copy copy;
} u;
struct nfs4_replay * replay;
};
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index e36dee5..ebf60c6 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -110,6 +110,9 @@ enum nfs_opnum4 {
OP_DESTROY_CLIENTID = 57,
OP_RECLAIM_COMPLETE = 58,

+ /* nfs42 */
+ OP_COPY = 59,
+
OP_ILLEGAL = 10044,
};

--
1.8.3.3


2013-07-22 19:42:09

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 0/5] NFS Server Side Copy

On Mon, Jul 22, 2013 at 03:38:47PM -0400, Bryan Schumaker wrote:
> On 07/22/2013 02:53 PM, J. Bruce Fields wrote:
> > On Fri, Jul 19, 2013 at 05:03:45PM -0400, [email protected] wrote:
> >> From: Bryan Schumaker <[email protected]>
> >>
> >> These patches build on Zach Brown's copyfile patches to add server side
> >> copy to both the NFS client and the NFS server.
> >>
> >> The first patch improves on the copyfile syscall to make it usable on my
> >> machine and also includes notes on other potential problems that I've
> >> found. The remaining patches first implement a sync copy, then expand to
> >> async.
> >>
> >> My testing was done on a server exporting an ext4 filesystem exporting an
> >> ext4 filesystem. I compared copying using the cp command to copying with
> >> the copyfile system call.
> >
> > Were these tests using the full series of patches? (So, using the
> > asynchronous mechanism?)
>
> Yes. Want me to re-run them without it?

I don't think it's urgent.

--b.

>
> - Bryan
>
> >
> > --b.
> >
> >>
> >>
> >> File size: 512 MB
> >> cp: 4.244 seconds
> >> copyfile: 0.961 seconds
> >>
> >> File size: 1024 MB
> >> cp: 9.091 seconds
> >> copyfile: 1.919 seconds
> >>
> >> File size: 1536 MB
> >> cp: 15.291 seconds
> >> copyfile: 6.016 seconds
> >>
> >>
> >> Repeating these tests on a btrfs exported filesystem supporting the copyfile
> >> system call drops the time for copyfile to about 0.01 seconds.
> >>
> >> Feel free to send me any questions, comments or other thoughts!
> >>
> >> - Bryan
> >>
> >> Bryan Schumaker (5):
> >> Improve on the copyfile systemcall
> >> NFSD: Implement the COPY call
> >> NFS: Add COPY nfs operation
> >> NFSD: Defer copying
> >> NFS: Change copy to support async servers
> >>
> >> fs/copy_range.c | 10 +++-
> >> fs/nfs/callback.h | 13 ++++
> >> fs/nfs/callback_proc.c | 9 +++
> >> fs/nfs/callback_xdr.c | 54 ++++++++++++++++-
> >> fs/nfs/inode.c | 2 +
> >> fs/nfs/nfs4_fs.h | 7 +++
> >> fs/nfs/nfs4file.c | 101 +++++++++++++++++++++++++++++++
> >> fs/nfs/nfs4proc.c | 16 +++++
> >> fs/nfs/nfs4xdr.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++
> >> fs/nfsd/nfs4callback.c | 136 ++++++++++++++++++++++++++++++++++++++++++
> >> fs/nfsd/nfs4proc.c | 104 ++++++++++++++++++++++++++++++--
> >> fs/nfsd/nfs4state.c | 15 ++++-
> >> fs/nfsd/nfs4xdr.c | 121 +++++++++++++++++++++++++++++++++++++-
> >> fs/nfsd/state.h | 23 +++++++-
> >> fs/nfsd/vfs.c | 9 +++
> >> fs/nfsd/vfs.h | 1 +
> >> fs/nfsd/xdr4.h | 24 ++++++++
> >> fs/nfsd/xdr4cb.h | 9 +++
> >> include/linux/nfs4.h | 14 ++++-
> >> include/linux/nfs_xdr.h | 33 +++++++++++
> >> include/linux/syscalls.h | 1 +
> >> 21 files changed, 836 insertions(+), 16 deletions(-)
> >>
> >> --
> >> 1.8.3.3
> >>
>

2013-07-22 18:05:35

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 2/5] NFSD: Implement the COPY call

On Fri, Jul 19, 2013 at 05:03:47PM -0400, [email protected] wrote:
> From: Bryan Schumaker <[email protected]>
>
> I only implemented the sync version of this call, since it's the
> easiest. I can simply call vfs_copy_range() and have the vfs do the
> right thing for the filesystem being exported.
>
> Signed-off-by: Bryan Schumaker <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 75 ++++++++++++++++++++++++++++---
> fs/nfsd/nfs4state.c | 4 +-
> fs/nfsd/nfs4xdr.c | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> fs/nfsd/state.h | 2 +-
> fs/nfsd/vfs.c | 9 ++++
> fs/nfsd/vfs.h | 1 +
> fs/nfsd/xdr4.h | 24 ++++++++++
> include/linux/nfs4.h | 3 ++
> 8 files changed, 230 insertions(+), 9 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index a7cee86..d4584ea 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -780,8 +780,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>
> nfs4_lock_state();
> /* check stateid */
> - if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp),
> - cstate, &read->rd_stateid,
> + if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
> + &cstate->current_fh,
> + &read->rd_stateid,
> RD_STATE, &read->rd_filp))) {
> dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
> goto out;
> @@ -931,7 +932,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
> nfs4_lock_state();
> status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
> - &setattr->sa_stateid, WR_STATE, NULL);
> + &cstate->current_fh, &setattr->sa_stateid, WR_STATE, NULL);
> nfs4_unlock_state();
> if (status) {
> dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
> @@ -999,8 +1000,9 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> return nfserr_inval;
>
> nfs4_lock_state();
> - status = nfs4_preprocess_stateid_op(SVC_NET(rqstp),
> - cstate, stateid, WR_STATE, &filp);
> + status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
> + &cstate->current_fh, stateid,
> + WR_STATE, &filp);
> if (status) {
> nfs4_unlock_state();
> dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
> @@ -1028,6 +1030,63 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> return status;
> }
>
> +static __be32
> +nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> + struct nfsd4_copy *copy, struct file **src, struct file **dst)
> +{
> + __be32 status;
> + /* only support copying data to an existing file */
> + if (!cstate->current_fh.fh_dentry || !cstate->save_fh.fh_dentry)
> + return nfserr_nofilehandle;
> +
> + nfs4_lock_state();
> + /* check stateids */
> + if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
> + &cstate->save_fh,
> + &copy->cp_src_stateid,
> + RD_STATE, src))){
> + dprintk("NFSD: nfsd4_copy: couldn't process src stateid!\n");
> + goto out;
> + }
> +
> + if ((status = nfs4_preprocess_stateid_op(SVC_NET(rqstp), cstate,
> + &cstate->current_fh,
> + &copy->cp_dst_stateid,
> + WR_STATE, dst))){
> + dprintk("NFSD: nfsd4_copy: couldn't process dst stateid!\n");
> + goto out;
> + }
> +
> +out:
> + nfs4_unlock_state();
> + return status;
> +}
> +
> +static __be32
> +nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> + struct nfsd4_copy *copy)
> +{
> + __be32 status;
> + struct file *src = NULL, *dst = NULL;
> +
> + status = nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst);
> + if (status)
> + return status;
> +
> + status = nfsd_copy_range(src, copy->cp_src_pos,
> + dst, copy->cp_dst_pos,
> + copy->cp_count);
> +
> + if (status == nfs_ok) {
> + copy->cp_res.wr_stateid = NULL;
> + copy->cp_res.wr_bytes_written = copy->cp_count;
> + copy->cp_res.wr_stable_how = NFS_FILE_SYNC;
> + gen_boot_verifier(&copy->cp_res.wr_verifier, SVC_NET(rqstp));
> + }
> +
> + return status;
> +}
> +
> /* This routine never returns NFS_OK! If there are no other errors, it
> * will return NFSERR_SAME or NFSERR_NOT_SAME depending on whether the
> * attributes matched. VERIFY is implemented by mapping NFSERR_SAME
> @@ -1840,6 +1899,12 @@ static struct nfsd4_operation nfsd4_ops[] = {
> .op_get_currentstateid = (stateid_getter)nfsd4_get_freestateid,
> .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
> },
> +
> + /* NFSv4.2 operations */
> + [OP_COPY] = {
> + .op_func = (nfsd4op_func)nfsd4_copy,
> + .op_name = "OP_COPY",

This needs more fields filled in. Probably take the OP_WRITE entry as a
starting point.
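
For comparison, a rough sketch of what a filled-in entry might look like, modeled on the existing OP_WRITE entry in the same table; the op_flags choice and the nfsd4_copy_rsize helper are assumptions for illustration, not something the posted patch provides:

	[OP_COPY] = {
		.op_func = (nfsd4op_func)nfsd4_copy,
		.op_flags = OP_MODIFIES_SOMETHING | OP_CACHEME,
		.op_name = "OP_COPY",
		/* hypothetical helper, bounding the reply size the way
		 * nfsd4_write_rsize does for WRITE */
		.op_rsize_bop = (nfsd4op_rsize)nfsd4_copy_rsize,
	},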

> + }
> };
>
> #ifdef NFSD_DEBUG
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 280acef..c4e270e 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -3613,12 +3613,12 @@ static __be32 nfsd4_lookup_stateid(stateid_t *stateid, unsigned char typemask,
> */
> __be32
> nfs4_preprocess_stateid_op(struct net *net, struct nfsd4_compound_state *cstate,
> - stateid_t *stateid, int flags, struct file **filpp)
> + struct svc_fh *current_fh, stateid_t *stateid,
> + int flags, struct file **filpp)
> {
> struct nfs4_stid *s;
> struct nfs4_ol_stateid *stp = NULL;
> struct nfs4_delegation *dp = NULL;
> - struct svc_fh *current_fh = &cstate->current_fh;
> struct inode *ino = current_fh->fh_dentry->d_inode;
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> __be32 status;
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 0c0f3ea9..8f84e9e 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1485,6 +1485,26 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
> }
>
> static __be32
> +nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
> +{
> + DECODE_HEAD;
> +
> + status = nfsd4_decode_stateid(argp, &copy->cp_src_stateid);
> + if (status)
> + return status;
> + status = nfsd4_decode_stateid(argp, &copy->cp_dst_stateid);
> + if (status)
> + return status;
> +
> + READ_BUF(24);
> + READ64(copy->cp_src_pos);
> + READ64(copy->cp_dst_pos);
> + READ64(copy->cp_count);
> +
> + DECODE_TAIL;
> +}
> +
> +static __be32
> nfsd4_decode_noop(struct nfsd4_compoundargs *argp, void *p)
> {
> return nfs_ok;
> @@ -1599,6 +1619,70 @@ static nfsd4_dec nfsd41_dec_ops[] = {
> [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
> };
>
> +static nfsd4_dec nfsd42_dec_ops[] = {
> + [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access,
> + [OP_CLOSE] = (nfsd4_dec)nfsd4_decode_close,
> + [OP_COMMIT] = (nfsd4_dec)nfsd4_decode_commit,
> + [OP_CREATE] = (nfsd4_dec)nfsd4_decode_create,
> + [OP_DELEGPURGE] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_DELEGRETURN] = (nfsd4_dec)nfsd4_decode_delegreturn,
> + [OP_GETATTR] = (nfsd4_dec)nfsd4_decode_getattr,
> + [OP_GETFH] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_LINK] = (nfsd4_dec)nfsd4_decode_link,
> + [OP_LOCK] = (nfsd4_dec)nfsd4_decode_lock,
> + [OP_LOCKT] = (nfsd4_dec)nfsd4_decode_lockt,
> + [OP_LOCKU] = (nfsd4_dec)nfsd4_decode_locku,
> + [OP_LOOKUP] = (nfsd4_dec)nfsd4_decode_lookup,
> + [OP_LOOKUPP] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_NVERIFY] = (nfsd4_dec)nfsd4_decode_verify,
> + [OP_OPEN] = (nfsd4_dec)nfsd4_decode_open,
> + [OP_OPENATTR] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_OPEN_CONFIRM] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_OPEN_DOWNGRADE] = (nfsd4_dec)nfsd4_decode_open_downgrade,
> + [OP_PUTFH] = (nfsd4_dec)nfsd4_decode_putfh,
> + [OP_PUTPUBFH] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_PUTROOTFH] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_READ] = (nfsd4_dec)nfsd4_decode_read,
> + [OP_READDIR] = (nfsd4_dec)nfsd4_decode_readdir,
> + [OP_READLINK] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_REMOVE] = (nfsd4_dec)nfsd4_decode_remove,
> + [OP_RENAME] = (nfsd4_dec)nfsd4_decode_rename,
> + [OP_RENEW] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_RESTOREFH] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_SAVEFH] = (nfsd4_dec)nfsd4_decode_noop,
> + [OP_SECINFO] = (nfsd4_dec)nfsd4_decode_secinfo,
> + [OP_SETATTR] = (nfsd4_dec)nfsd4_decode_setattr,
> + [OP_SETCLIENTID] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_SETCLIENTID_CONFIRM]= (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_VERIFY] = (nfsd4_dec)nfsd4_decode_verify,
> + [OP_WRITE] = (nfsd4_dec)nfsd4_decode_write,
> + [OP_RELEASE_LOCKOWNER] = (nfsd4_dec)nfsd4_decode_notsupp,
> +
> + /* new operations for NFSv4.1 */
> + [OP_BACKCHANNEL_CTL] = (nfsd4_dec)nfsd4_decode_backchannel_ctl,
> + [OP_BIND_CONN_TO_SESSION]= (nfsd4_dec)nfsd4_decode_bind_conn_to_session,
> + [OP_EXCHANGE_ID] = (nfsd4_dec)nfsd4_decode_exchange_id,
> + [OP_CREATE_SESSION] = (nfsd4_dec)nfsd4_decode_create_session,
> + [OP_DESTROY_SESSION] = (nfsd4_dec)nfsd4_decode_destroy_session,
> + [OP_FREE_STATEID] = (nfsd4_dec)nfsd4_decode_free_stateid,
> + [OP_GET_DIR_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_GETDEVICEINFO] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_GETDEVICELIST] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_LAYOUTCOMMIT] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_LAYOUTGET] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_LAYOUTRETURN] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_SECINFO_NO_NAME] = (nfsd4_dec)nfsd4_decode_secinfo_no_name,
> + [OP_SEQUENCE] = (nfsd4_dec)nfsd4_decode_sequence,
> + [OP_SET_SSV] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_TEST_STATEID] = (nfsd4_dec)nfsd4_decode_test_stateid,
> + [OP_WANT_DELEGATION] = (nfsd4_dec)nfsd4_decode_notsupp,
> + [OP_DESTROY_CLIENTID] = (nfsd4_dec)nfsd4_decode_destroy_clientid,
> + [OP_RECLAIM_COMPLETE] = (nfsd4_dec)nfsd4_decode_reclaim_complete,
> +
> + /* new operations for NFS v4.2 */
> + [OP_COPY] = (nfsd4_dec)nfsd4_decode_copy,
> +};
> +
> struct nfsd4_minorversion_ops {
> nfsd4_dec *decoders;
> int nops;
> @@ -1607,7 +1691,7 @@ struct nfsd4_minorversion_ops {
> static struct nfsd4_minorversion_ops nfsd4_minorversion[] = {
> [0] = { nfsd4_dec_ops, ARRAY_SIZE(nfsd4_dec_ops) },
> [1] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
> - [2] = { nfsd41_dec_ops, ARRAY_SIZE(nfsd41_dec_ops) },
> + [2] = { nfsd42_dec_ops, ARRAY_SIZE(nfsd42_dec_ops) },
> };
>
> static __be32
> @@ -3518,6 +3602,38 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
> return nfserr;
> }
>
> +static void
> +nfsd42_encode_write_res(struct nfsd4_compoundres *resp, struct nfsd42_write_res *write)
> +{
> + __be32 *p;
> +
> + RESERVE_SPACE(4);
> +
> + if (write->wr_stateid == NULL) {
> + WRITE32(0);
> + ADJUST_ARGS();
> + } else {
> + WRITE32(1);
> + ADJUST_ARGS();
> + nfsd4_encode_stateid(resp, write->wr_stateid);
> + }
> +
> + RESERVE_SPACE(12 + NFS4_VERIFIER_SIZE);
> + WRITE64(write->wr_bytes_written);
> + WRITE32(write->wr_stable_how);
> + WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
> + ADJUST_ARGS();

If I'm reading section 14.1.2 of
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19
correctly, this should be just an offset in the error case, right?

Also, may as well share code in the successful case with
nfsd4_encode_write().

--b.

> +}
> +
> +static __be32
> +nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
> + struct nfsd4_copy *copy)
> +{
> + if (!nfserr)
> + nfsd42_encode_write_res(resp, &copy->cp_res);
> + return nfserr;
> +}
> +
> static __be32
> nfsd4_encode_noop(struct nfsd4_compoundres *resp, __be32 nfserr, void *p)
> {
> @@ -3590,6 +3706,9 @@ static nfsd4_enc nfsd4_enc_ops[] = {
> [OP_WANT_DELEGATION] = (nfsd4_enc)nfsd4_encode_noop,
> [OP_DESTROY_CLIENTID] = (nfsd4_enc)nfsd4_encode_noop,
> [OP_RECLAIM_COMPLETE] = (nfsd4_enc)nfsd4_encode_noop,
> +
> + /* NFSv4.2 operations */
> + [OP_COPY] = (nfsd4_enc)nfsd4_encode_copy,
> };
>
> /*
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 424d8f5..2478805 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -455,7 +455,7 @@ struct nfsd4_compound_state;
> struct nfsd_net;
>
> extern __be32 nfs4_preprocess_stateid_op(struct net *net,
> - struct nfsd4_compound_state *cstate,
> + struct nfsd4_compound_state *cstate, struct svc_fh *,
> stateid_t *stateid, int flags, struct file **filp);
> extern void nfs4_lock_state(void);
> extern void nfs4_unlock_state(void);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 8ff6a00..d77958d 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -649,6 +649,15 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
> }
> #endif
>
> +__be32 nfsd_copy_range(struct file *src, u64 src_pos,
> + struct file *dst, u64 dst_pos,
> + u64 count)
> +{
> + int err = vfs_copy_range(src, src_pos, dst, dst_pos, count);
> + if (err < 0)
> + return nfserrno(err);
> + return vfs_fsync_range(dst, dst_pos, dst_pos + count, 0);
> +}
> #endif /* defined(CONFIG_NFSD_V4) */
>
> #ifdef CONFIG_NFSD_V3
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index a4be2e3..0f26c3b 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -86,6 +86,7 @@ __be32 nfsd_symlink(struct svc_rqst *, struct svc_fh *,
> struct svc_fh *res, struct iattr *);
> __be32 nfsd_link(struct svc_rqst *, struct svc_fh *,
> char *, int, struct svc_fh *);
> +__be32 nfsd_copy_range(struct file *, u64, struct file *, u64, u64);
> __be32 nfsd_rename(struct svc_rqst *,
> struct svc_fh *, char *, int,
> struct svc_fh *, char *, int);
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index b3ed644..55b9ef7 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -430,6 +430,27 @@ struct nfsd4_reclaim_complete {
> u32 rca_one_fs;
> };
>
> +struct nfsd42_write_res {
> + stateid_t *wr_stateid;
> + u64 wr_bytes_written;
> + u32 wr_stable_how;
> + nfs4_verifier wr_verifier;
> +};
> +
> +struct nfsd4_copy {
> + /* request */
> + stateid_t cp_src_stateid;
> + stateid_t cp_dst_stateid;
> +
> + u64 cp_src_pos;
> + u64 cp_dst_pos;
> +
> + u64 cp_count;
> +
> + /* response */
> + struct nfsd42_write_res cp_res;
> +};
> +
> struct nfsd4_op {
> int opnum;
> __be32 status;
> @@ -475,6 +496,9 @@ struct nfsd4_op {
> struct nfsd4_reclaim_complete reclaim_complete;
> struct nfsd4_test_stateid test_stateid;
> struct nfsd4_free_stateid free_stateid;
> +
> + /* NFSv4.2 */
> + struct nfsd4_copy copy;
> } u;
> struct nfs4_replay * replay;
> };
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index e36dee5..ebf60c6 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -110,6 +110,9 @@ enum nfs_opnum4 {
> OP_DESTROY_CLIENTID = 57,
> OP_RECLAIM_COMPLETE = 58,
>
> + /* nfs42 */
> + OP_COPY = 59,
> +
> OP_ILLEGAL = 10044,
> };
>
> --
> 1.8.3.3
>

2013-07-22 19:54:04

by Anna Schumaker

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On 07/22/2013 03:43 PM, J. Bruce Fields wrote:
> On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
>> On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
>>> On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
>>>> On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
>>>>> On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
>>>>>> From: Bryan Schumaker <[email protected]>
>>>>>>
>>>>>> Rather than performing the copy right away, schedule it to run later and
>>>>>> reply to the client. Later, send a callback to notify the client that
>>>>>> the copy has finished.
>>>>>
>>>>> I believe you need to implement the referring triple support described
>>>>> in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
>>>>> described in
>>>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
>>>>> .
>>>>
>>>> I'll re-read and re-write.
>>>>
>>>>>
>>>>> I see cb_delay initialized below, but not otherwise used. Am I missing
>>>>> anything?
>>>>
>>>> Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
>>>>
>>>>>
>>>>> What about OFFLOAD_STATUS and OFFLOAD_ABORT?
>>>>
>>>> I haven't thought out those too much... I haven't thought about a use for them on the client yet.
>>>
>>> If it might be a long-running copy, I assume the client needs the
>>> ability to abort if the caller is killed.
>>>
>>> (Dumb question: what happens on the network partition? Does the server
>>> abort the copy when it expires the client state?)
>>>
>>> In any case,
>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
>>> says "If a server's COPY operation returns a stateid, then the server
>>> MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
>>> OFFLOAD_STATUS."
>>>
>>> So even if we've no use for them on the client then we still need to
>>> implement them (and probably just write a basic pynfs test). Either
>>> that or update the spec.
>>
>> Fair enough. I'll think it out and do something! Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
>
> I can't remember--does the spec give the server a clear way to bail out
> and tell the client to fall back on a normal copy in cases where the
> server knows the copy could take an unreasonable amount of time?
>
> --b.

I don't think so. Is there ever a case where copying over the network would be faster than copying on the server?

>
>>
>> - Bryan
>>
>>>
>>>>> In some common cases the reply will be very quick, and we might be
>>>>> better off handling it synchronously. Could we implement a heuristic
>>>>> like "copy synchronously if the filesystem has special support or the
>>>>> range is less than the maximum iosize, otherwise copy asynchronously"?
>>>>
>>>> I'm sure that can be done, I'm just not sure how to do it yet...
>>>
>>> OK, thanks.
>>>
>>> --b.
>>>
>>


2013-07-24 14:21:52

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC 3/5] NFS: Add COPY nfs operation

On Fri, 2013-07-19 at 17:03 -0400, [email protected] wrote:
> From: Bryan Schumaker <[email protected]>
>
> This adds the copy_range file_ops function pointer used by the
> sys_copy_range() function call. This patch only implements sync copies,
> so if an async copy happens we decode the stateid and ignore it.

[...]

> +	open = nfs_file_open_context(file);
> +	if (!open)
> +		return PTR_ERR(open);

PTR_ERR(open) == 0 here. Was that really the intention?

> +	lock = nfs_get_lock_context(open);

nfs_get_lock_context() will return an ERR_PTR on failure.

[...]

> +	struct rpc_message msg = {
> +		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_COPY],
> +		.rpc_argp = args,
> +		.rpc_resp = res,

Shouldn't you set .rpc_cred to the correct value too? I assume that
needs to reflect the credential used when opening file_out?

> +	};
> +
> +	dprintk("NFS call copy %p\n", &args);
> +	return nfs4_call_sync(server->client, server, &msg,
> +				&(args->seq_args), &(res->seq_res), 0);

The () around args->seq_args and res->seq_res are redundant here.

[...]

> +	encode_uint32(xdr, COPY4_METADATA); /* flags */

Are we sure that we want COPY4_METADATA here? I assumed the copy_range
would copy file data ranges only.

[...]

> +	if (num_ids == 0)
> +		write_res->wr_stateid = NULL;
> +	else {
> +		write_res->wr_stateid = kmalloc(sizeof(nfs4_stateid), GFP_KERNEL);

Please don't allocate from XDR routines. You are better off
preallocating the above structure in the nfs42_proc_ routine. Better
yet, embed it in struct nfs42_write_response.

[...]

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2013-07-19 21:03:54

by Anna Schumaker

[permalink] [raw]
Subject: [RFC 1/5] Improve on the copyfile systemcall

From: Bryan Schumaker <[email protected]>

I added in a fallback to do_splice_direct() if the filesystem doesn't
support the copy_range call. This is because the declaration of
do_splice_direct() is now found in fs/internal.h and can't be used by
other filesystems.

I also had to add sys_copy_range to include/linux/syscalls.h to get my
test program to recognize the new syscall.

Other thoughts:
- Pass count = 0 to mean "copy the entire file"
- rw_verify_area() limits count to values that can fit in an int, so
files larger than about 2GB cannot be copied.
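
Putting the notes above together, a minimal user-space sketch of how a test program might drive the new syscall through syscall(2); __NR_copy_range is an assumption (the number is not in installed headers, so a test would have to define it by hand for its architecture), and the argument order simply mirrors the prototype added to include/linux/syscalls.h:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifndef __NR_copy_range
	#define __NR_copy_range 313	/* illustrative only; substitute the real number */
	#endif

	int main(int argc, char **argv)
	{
		int in, out;
		long ret;

		if (argc < 3)
			return 1;
		in = open(argv[1], O_RDONLY);
		out = open(argv[2], O_WRONLY);	/* destination must already exist */
		if (in < 0 || out < 0)
			return 1;
		/* NULL offsets: copy from each file's current position;
		 * count == 0: copy the entire file, per the note above. */
		ret = syscall(__NR_copy_range, in, NULL, out, NULL, (size_t)0);
		if (ret < 0)
			perror("copy_range");
		else
			printf("copied %ld bytes\n", ret);
		return ret < 0;
	}
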
---
fs/copy_range.c | 10 +++++++---
include/linux/syscalls.h | 1 +
2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/copy_range.c b/fs/copy_range.c
index 3000b9f..bcf6e67 100644
--- a/fs/copy_range.c
+++ b/fs/copy_range.c
@@ -10,6 +10,8 @@
#include <linux/export.h>
#include <linux/fsnotify.h>

+#include "internal.h"
+
/**
* vfs_copy_range - copy range of bytes from source file to existing file
* @file_in: source regular file
@@ -52,7 +54,7 @@ ssize_t vfs_copy_range(struct file *file_in, loff_t pos_in,
if (!(file_in->f_mode & FMODE_READ) ||
!(file_out->f_mode & FMODE_WRITE) ||
(file_out->f_flags & O_APPEND) ||
- !file_in->f_op || !file_in->f_op->copy_range)
+ !file_in->f_op)
return -EINVAL;

inode_in = file_inode(file_in);
@@ -82,8 +84,10 @@ ssize_t vfs_copy_range(struct file *file_in, loff_t pos_in,
if (ret)
return ret;

- ret = file_in->f_op->copy_range(file_in, pos_in, file_out, pos_out,
- count);
+ if (file_in->f_op->copy_range)
+ ret = file_in->f_op->copy_range(file_in, pos_in, file_out, pos_out, count);
+ else
+ ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out, count, 0);
if (ret > 0) {
fsnotify_access(file_in);
add_rchar(current, ret);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 4147d70..5afcd00 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -485,6 +485,7 @@ asmlinkage long sys_sendfile(int out_fd, int in_fd,
off_t __user *offset, size_t count);
asmlinkage long sys_sendfile64(int out_fd, int in_fd,
loff_t __user *offset, size_t count);
+asmlinkage long sys_copy_range(int, loff_t __user *, int, loff_t __user *, size_t);
asmlinkage long sys_readlink(const char __user *path,
char __user *buf, int bufsiz);
asmlinkage long sys_creat(const char __user *pathname, umode_t mode);
--
1.8.3.3


2013-07-24 14:28:29

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC 5/5] NFS: Change copy to support async servers

T24gRnJpLCAyMDEzLTA3LTE5IGF0IDE3OjAzIC0wNDAwLCBianNjaHVtYUBuZXRhcHAuY29tIHdy
b3RlOg0KPiBGcm9tOiBCcnlhbiBTY2h1bWFrZXIgPGJqc2NodW1hQG5ldGFwcC5jb20+DQo+IA0K
PiBTdXBwb3J0aW5nIENCX09GRkxPQUQgaXMgcmVxdWlyZWQgYnkgdGhlIHNwZWMsIHNvIGlmIGEg
c2VydmVyIGNob29zZXMgdG8NCj4gY29weSBpbiB0aGUgYmFja2dyb3VuZCB3ZSBoYXZlIHRvIHdh
aXQgZm9yIHRoZSBjb3B5IHRvIGZpbmlzaCBiZWZvcmUNCj4gcmV0dXJuaW5nIHRvIHVzZXJzcGFj
ZS4NCj4gDQo+IFNpZ25lZC1vZmYtYnk6IEJyeWFuIFNjaHVtYWtlciA8YmpzY2h1bWFAbmV0YXBw
LmNvbT4NCj4gLS0tDQo+ICBmcy9uZnMvY2FsbGJhY2suaCAgICAgICB8IDEzICsrKysrKysrKysr
Kw0KPiAgZnMvbmZzL2NhbGxiYWNrX3Byb2MuYyAgfCAgOSArKysrKysrKysNCj4gIGZzL25mcy9j
YWxsYmFja194ZHIuYyAgIHwgNTQgKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr
KysrKysrKysrKy0tLQ0KPiAgZnMvbmZzL25mczRfZnMuaCAgICAgICAgfCAgMyArKysNCj4gIGZz
L25mcy9uZnM0ZmlsZS5jICAgICAgIHwgNDggKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr
KysrKysrKysrKysrKw0KPiAgZnMvbmZzL25mczR4ZHIuYyAgICAgICAgfCAgNCArKy0tDQo+ICBp
bmNsdWRlL2xpbnV4L25mc194ZHIuaCB8ICAzICsrKw0KPiAgNyBmaWxlcyBjaGFuZ2VkLCAxMjkg
aW5zZXJ0aW9ucygrKSwgNSBkZWxldGlvbnMoLSkNCj4gDQo+IGRpZmYgLS1naXQgYS9mcy9uZnMv
Y2FsbGJhY2suaCBiL2ZzL25mcy9jYWxsYmFjay5oDQo+IGluZGV4IDg0MzI2ZTkuLmFlNWE1ZDIg
MTAwNjQ0DQo+IC0tLSBhL2ZzL25mcy9jYWxsYmFjay5oDQo+ICsrKyBiL2ZzL25mcy9jYWxsYmFj
ay5oDQo+IEBAIC0xODcsNiArMTg3LDE5IEBAIGV4dGVybiBfX2JlMzIgbmZzNF9jYWxsYmFja19k
ZXZpY2Vub3RpZnkoDQo+ICAJdm9pZCAqZHVtbXksIHN0cnVjdCBjYl9wcm9jZXNzX3N0YXRlICpj
cHMpOw0KPiAgDQo+ICAjZW5kaWYgLyogQ09ORklHX05GU19WNF8xICovDQo+ICsNCj4gKyNpZmRl
ZiBDT05GSUdfTkZTX1Y0XzINCj4gK3N0cnVjdCBjYl9vZmZsb2FkYXJncyB7DQo+ICsJc3RydWN0
IG5mc19maAkJCWRzdF9maDsNCj4gKwluZnM0X3N0YXRlaWQJCQlzdGF0ZWlkOw0KPiArCXN0cnVj
dCBuZnM0Ml93cml0ZV9yZXNwb25zZQl3cml0ZV9yZXM7DQo+ICt9Ow0KPiArDQo+ICtleHRlcm4g
X19iZTMyIG5mczRfY2FsbGJhY2tfb2ZmbG9hZChzdHJ1Y3QgY2Jfb2ZmbG9hZGFyZ3MgKiwgdm9p
ZCAqLA0KPiArCQkJCSAgICBzdHJ1Y3QgY2JfcHJvY2Vzc19zdGF0ZSAqKTsNCj4gK3ZvaWQgd2Fr
ZV9jb3B5X29mZmxvYWQoc3RydWN0IGNiX29mZmxvYWRhcmdzICopOw0KPiArI2VuZGlmIC8qIENP
TkZJR19ORlNfVjRfMiAqLw0KPiArDQo+ICBleHRlcm4gaW50IGNoZWNrX2dzc19jYWxsYmFja19w
cmluY2lwYWwoc3RydWN0IG5mc19jbGllbnQgKiwgc3RydWN0IHN2Y19ycXN0ICopOw0KPiAgZXh0
ZXJuIF9fYmUzMiBuZnM0X2NhbGxiYWNrX2dldGF0dHIoc3RydWN0IGNiX2dldGF0dHJhcmdzICph
cmdzLA0KPiAgCQkJCSAgICBzdHJ1Y3QgY2JfZ2V0YXR0cnJlcyAqcmVzLA0KPiBkaWZmIC0tZ2l0
IGEvZnMvbmZzL2NhbGxiYWNrX3Byb2MuYyBiL2ZzL25mcy9jYWxsYmFja19wcm9jLmMNCj4gaW5k
ZXggZTZlYmM0Yy4uY2RmNDE4MCAxMDA2NDQNCj4gLS0tIGEvZnMvbmZzL2NhbGxiYWNrX3Byb2Mu
Yw0KPiArKysgYi9mcy9uZnMvY2FsbGJhY2tfcHJvYy5jDQo+IEBAIC01MzMsMyArNTMzLDEyIEBA
IG91dDoNCj4gIAlyZXR1cm4gc3RhdHVzOw0KPiAgfQ0KPiAgI2VuZGlmIC8qIENPTkZJR19ORlNf
VjRfMSAqLw0KPiArDQo+ICsjaWZkZWYgQ09ORklHX05GU19WNF8yDQo+ICtfX2JlMzIgbmZzNF9j
YWxsYmFja19vZmZsb2FkKHN0cnVjdCBjYl9vZmZsb2FkYXJncyAqYXJncywgdm9pZCAqZHVtbXks
DQo+ICsJCQkgICAgIHN0cnVjdCBjYl9wcm9jZXNzX3N0YXRlICpjcHMpDQo+ICt7DQo+ICsJd2Fr
ZV9jb3B5X29mZmxvYWQoYXJncyk7DQo+ICsJcmV0dXJuIGh0b25sKE5GUzRfT0spOw0KPiArfQ0K
PiArI2VuZGlmIC8qIENPTkZJR19ORlNfVjRfMiAqLw0KPiBkaWZmIC0tZ2l0IGEvZnMvbmZzL2Nh
bGxiYWNrX3hkci5jIGIvZnMvbmZzL2NhbGxiYWNrX3hkci5jDQo+IGluZGV4IGY0Y2NmZTYuLmQ4
ZmNmMWEgMTAwNjQ0DQo+IC0tLSBhL2ZzL25mcy9jYWxsYmFja194ZHIuYw0KPiArKysgYi9mcy9u
ZnMvY2FsbGJhY2tfeGRyLmMNCj4gQEAgLTM1LDYgKzM1LDE0IEBADQo+ICAjZGVmaW5lIENCX09Q
X1JFQ0FMTFNMT1RfUkVTX01BWFNaCShDQl9PUF9IRFJfUkVTX01BWFNaKQ0KPiAgI2VuZGlmIC8q
IENPTkZJR19ORlNfVjRfMSAqLw0KPiAgDQo+ICsjaWYgZGVmaW5lZChDT05GSUdfTkZTX1Y0XzIp
DQo+ICsjZGVmaW5lIENCX09QX09GRkxPQURfUkVTX01BWFNaCQkoQ0JfT1BfSERSX1JFU19NQVhT
WiArIFwNCj4gKwkJCQkJIDEgKyBYRFJfUVVBRExFTihORlM0X0ZIU0laRSkgKyBcDQo+ICsJCQkJ
CSBYRFJfUVVBRExFTihORlM0X1NUQVRFSURfU0laRSkgKyBcDQo+ICsJCQkJCSAxICsgWERSX1FV
QURMRU4oTkZTNF9TVEFURUlEX1NJWkUpICsgXA0KPiArCQkJCQkgMiArIDEgKyBYRFJfUVVBRExF
TihORlM0X1ZFUklGSUVSX1NJWkUpKQ0KPiArI2VuZGlmIC8qIENPTkZJR19ORlNfVjRfMiAqLw0K
PiArDQo+ICAjZGVmaW5lIE5GU0RCR19GQUNJTElUWSBORlNEQkdfQ0FMTEJBQ0sNCj4gIA0KPiAg
LyogSW50ZXJuYWwgZXJyb3IgY29kZSAqLw0KPiBAQCAtNTI3LDYgKzUzNSwzNyBAQCBzdGF0aWMg
X19iZTMyIGRlY29kZV9yZWNhbGxzbG90X2FyZ3Moc3RydWN0IHN2Y19ycXN0ICpycXN0cCwNCj4g
IA0KPiAgI2VuZGlmIC8qIENPTkZJR19ORlNfVjRfMSAqLw0KPiAgDQo+ICsjaWZkZWYgQ09ORklH
X05GU19WNF8yDQo+ICtzdGF0aWMgaW5saW5lIF9fYmUzMiBkZWNvZGVfd3JpdGVfcmVzKHN0cnVj
dCB4ZHJfc3RyZWFtICp4ZHIsDQo+ICsJCQkJICAgICAgc3RydWN0IG5mczQyX3dyaXRlX3Jlc3Bv
bnNlICp3cml0ZV9yZXMpDQo+ICt7DQo+ICsJX19iZTMyIHN0YXR1cyA9IGRlY29kZV93cml0ZV9y
ZXNwb25zZSh4ZHIsIHdyaXRlX3Jlcyk7DQo+ICsJaWYgKHN0YXR1cyA9PSAtRUlPKQ0KPiArCQly
ZXR1cm4gaHRvbmwoTkZTNEVSUl9SRVNPVVJDRSk7DQo+ICsJcmV0dXJuIGh0b25sKHN0YXR1cyk7
DQo+ICt9DQo+ICsNCj4gK3N0YXRpYyBfX2JlMzIgZGVjb2RlX29mZmxvYWRfYXJncyhzdHJ1Y3Qg
c3ZjX3Jxc3QgKnJxc3RwLA0KPiArCQkJCSAgc3RydWN0IHhkcl9zdHJlYW0gKnhkciwNCj4gKwkJ
CQkgIHN0cnVjdCBjYl9vZmZsb2FkYXJncyAqYXJncykNCj4gK3sNCj4gKwlfX2JlMzIgc3RhdHVz
Ow0KPiArDQo+ICsJc3RhdHVzID0gZGVjb2RlX2ZoKHhkciwgJmFyZ3MtPmRzdF9maCk7DQo+ICsJ
aWYgKHVubGlrZWx5KHN0YXR1cyAhPSAwKSkNCj4gKwkJZ290byBvdXQ7DQo+ICsNCj4gKwlzdGF0
dXMgPSBkZWNvZGVfc3RhdGVpZCh4ZHIsICZhcmdzLT5zdGF0ZWlkKTsNCj4gKwlpZiAodW5saWtl
bHkoc3RhdHVzICE9IDApKQ0KPiArCQlnb3RvIG91dDsNCj4gKw0KPiArCXN0YXR1cyA9IGRlY29k
ZV93cml0ZV9yZXMoeGRyLCAmYXJncy0+d3JpdGVfcmVzKTsNCj4gK291dDoNCj4gKwlkcHJpbnRr
KCIlczogZXhpdCB3aXRoIHN0YXR1cyA9ICVkXG4iLCBfX2Z1bmNfXywgbnRvaGwoc3RhdHVzKSk7
DQo+ICsJcmV0dXJuIHN0YXR1czsNCj4gK30NCj4gKyNlbmRpZiAvKiBDT05GSUdfTkZTX1Y0XzIg
Ki8NCj4gKw0KPiAgc3RhdGljIF9fYmUzMiBlbmNvZGVfc3RyaW5nKHN0cnVjdCB4ZHJfc3RyZWFt
ICp4ZHIsIHVuc2lnbmVkIGludCBsZW4sIGNvbnN0IGNoYXIgKnN0cikNCj4gIHsNCj4gIAlfX2Jl
MzIgKnA7DQo+IEBAIC03OTQsOSArODMzLDExIEBAIHByZXByb2Nlc3NfbmZzNDJfb3AoaW50IG5v
cCwgdW5zaWduZWQgaW50IG9wX25yLCBzdHJ1Y3QgY2FsbGJhY2tfb3AgKipvcCkNCj4gIAlpZiAo
c3RhdHVzICE9IGh0b25sKE5GUzRFUlJfT1BfSUxMRUdBTCkpDQo+ICAJCXJldHVybiBzdGF0dXM7
DQo+ICANCj4gLQlpZiAob3BfbnIgPT0gT1BfQ0JfT0ZGTE9BRCkNCj4gLQkJcmV0dXJuIGh0b25s
KE5GUzRFUlJfTk9UU1VQUCk7DQo+IC0JcmV0dXJuIGh0b25sKE5GUzRFUlJfT1BfSUxMRUdBTCk7
DQo+ICsJaWYgKG9wX25yICE9IE9QX0NCX09GRkxPQUQpDQo+ICsJCXJldHVybiBodG9ubChORlM0
RVJSX09QX0lMTEVHQUwpOw0KPiArDQo+ICsJKm9wID0gJmNhbGxiYWNrX29wc1tvcF9ucl07DQo+
ICsJcmV0dXJuIGh0b25sKE5GUzRfT0spOw0KPiAgfQ0KPiAgI2Vsc2UgLyogQ09ORklHX05GU19W
NF8yICovDQo+ICBzdGF0aWMgX19iZTMyDQo+IEBAIC05OTEsNiArMTAzMiwxMyBAQCBzdGF0aWMg
c3RydWN0IGNhbGxiYWNrX29wIGNhbGxiYWNrX29wc1tdID0gew0KPiAgCQkucmVzX21heHNpemUg
PSBDQl9PUF9SRUNBTExTTE9UX1JFU19NQVhTWiwNCj4gIAl9LA0KPiAgI2VuZGlmIC8qIENPTkZJ
R19ORlNfVjRfMSAqLw0KPiArI2lmIGRlZmluZWQoQ09ORklHX05GU19WNF8yKQ0KPiArCVtPUF9D
Ql9PRkZMT0FEXSA9IHsNCj4gKwkJLnByb2Nlc3Nfb3AgPSAoY2FsbGJhY2tfcHJvY2Vzc19vcF90
KW5mczRfY2FsbGJhY2tfb2ZmbG9hZCwNCj4gKwkJLmRlY29kZV9hcmdzID0gKGNhbGxiYWNrX2Rl
Y29kZV9hcmdfdClkZWNvZGVfb2ZmbG9hZF9hcmdzLA0KPiArCQkucmVzX21heHNpemUgPSBDQl9P
UF9PRkZMT0FEX1JFU19NQVhTWiwNCj4gKwl9LA0KPiArI2VuZGlmDQo+ICB9Ow0KPiAgDQo+ICAv
Kg0KPiBkaWZmIC0tZ2l0IGEvZnMvbmZzL25mczRfZnMuaCBiL2ZzL25mcy9uZnM0X2ZzLmgNCj4g
aW5kZXggMjZjN2NmMC4uNWMzMmZiNSAxMDA2NDQNCj4gLS0tIGEvZnMvbmZzL25mczRfZnMuaA0K
PiArKysgYi9mcy9uZnMvbmZzNF9mcy5oDQo+IEBAIC00MDcsNiArNDA3LDkgQEAgc3RhdGljIGlu
bGluZSB2b2lkIG5mczRfdW5yZWdpc3Rlcl9zeXNjdGwodm9pZCkNCj4gIA0KPiAgLyogbmZzNHhk
ci5jICovDQo+ICBleHRlcm4gc3RydWN0IHJwY19wcm9jaW5mbyBuZnM0X3Byb2NlZHVyZXNbXTsN
> +#if defined(CONFIG_NFS_V4_2)
> +int decode_write_response(struct xdr_stream *, struct nfs42_write_response *);
> +#endif /* CONFIG_NFS_V4_2 */
>
>  struct nfs4_mount_data;
>
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index ca77ab4..fbd5f77 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -4,6 +4,7 @@
>   *  Copyright (C) 1992  Rick Sladkey
>   */
>  #include <linux/nfs_fs.h>
> +#include "callback.h"
>  #include "internal.h"
>  #include "fscache.h"
>  #include "pnfs.h"
> @@ -118,6 +119,9 @@ nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
>  }
>
>  #ifdef CONFIG_NFS_V4_2
> +static LIST_HEAD(nfs_copy_async_list);
> +static DEFINE_SPINLOCK(async_copy_lock);
> +
>  static int nfs4_find_copy_stateid(struct file *file, nfs4_stateid *stateid,
>      fmode_t mode)
>  {
> @@ -137,6 +141,43 @@ static int nfs4_find_copy_stateid(struct file *file, nfs4_stateid *stateid,
>   return ret;
>  }
>
> +static void wait_for_offload(struct nfs42_copy_res *res)
> +{
> + spin_lock(&async_copy_lock);
> + list_add(&res->wait_list, &nfs_copy_async_list);
> + spin_unlock(&async_copy_lock);
> +
> + wait_for_completion(&res->completion);
> +}
> +
> +static struct nfs42_copy_res *find_async_copy(nfs4_stateid *stateid)
> +{
> + struct nfs42_copy_res *cur;
> +
> + list_for_each_entry(cur, &nfs_copy_async_list, wait_list) {
> +  if (memcmp(stateid, cur->cp_res.wr_stateid, sizeof(nfs4_stateid)) == 0)
> +   return cur;
> + }
> + return NULL;
> +}
> +
> +void wake_copy_offload(struct cb_offloadargs *offload)

Please use a 'nfs_' prefix here to avoid namespace pollution. Ditto for
the above routines.

> +{
> + struct nfs42_copy_res *copy;
> +
> + spin_lock(&async_copy_lock);
> + copy = find_async_copy(&offload->stateid);
> + if (copy == NULL) {
> +  spin_unlock(&async_copy_lock);
> +  return;
> + }
> + list_del(&copy->wait_list);

Would it be better to have wait_for_offload() call list_del? That way,
you can make the completion interruptible.

> + spin_unlock(&async_copy_lock);
> +
> + copy->cp_res.wr_bytes_copied = offload->write_res.wr_bytes_copied;
> + complete(&copy->completion);

You might want to hold the async_copy_lock if the completion is
interruptible to prevent wait_for_offload() from doing list_del().

> +}
> +
>  static ssize_t nfs4_copy_range(struct file *file_in, loff_t pos_in,
>         struct file *file_out, loff_t pos_out,
>         size_t count)
> @@ -159,10 +200,17 @@ static ssize_t nfs4_copy_range(struct file *file_in, loff_t pos_in,
>   if (err)
>    return err;
>
> + init_completion(&res.completion);
> +
>   err = nfs42_proc_copy(NFS_SERVER(file_inode(file_out)), &args, &res);
>   if (err)
>    return err;
>
> + if (res.cp_res.wr_stateid != NULL) {
> +  wait_for_offload(&res);
> +  kfree(res.cp_res.wr_stateid);
> + }
> +
>   return res.cp_res.wr_bytes_copied;
>  }
>  #endif /* CONFIG_NFS_V4_2 */
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index d70c6bc..465d1bc 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -6011,8 +6011,8 @@ out_overflow:
>  #endif /* CONFIG_NFS_V4_1 */
>
>  #ifdef CONFIG_NFS_V4_2
> -static int decode_write_response(struct xdr_stream *xdr,
> -     struct nfs42_write_response *write_res)
> +int decode_write_response(struct xdr_stream *xdr,
> +    struct nfs42_write_response *write_res)
>  {
>   __be32 *p;
>   int num_ids;
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 0bc6b14..f603793 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1231,6 +1231,9 @@ struct nfs42_copy_res {
>   struct nfs4_sequence_res  seq_res;
>   unsigned int              status;
>   struct nfs42_write_response  cp_res;
> +
> + struct list_head   wait_list;
> + struct completion  completion;
>  };
>  #endif
>

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com
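A minimal sketch of the rework the two comments above point at, assuming the
nfs42_copy_res fields added by this series: wait_for_offload() owns the
list_del() so the wait can be made interruptible, and the callback side
completes while still holding async_copy_lock so the two paths cannot race.
The nfs_-prefixed names are illustrative only, not part of the posted patches.

static int nfs_wait_for_offload(struct nfs42_copy_res *res)
{
        int ret;

        spin_lock(&async_copy_lock);
        list_add(&res->wait_list, &nfs_copy_async_list);
        spin_unlock(&async_copy_lock);

        /* Interruptible, per the review comment above */
        ret = wait_for_completion_interruptible(&res->completion);

        /* The waiter, not the waker, removes the entry */
        spin_lock(&async_copy_lock);
        list_del(&res->wait_list);
        spin_unlock(&async_copy_lock);
        return ret;
}

void nfs_wake_copy_offload(struct cb_offloadargs *offload)
{
        struct nfs42_copy_res *copy;

        spin_lock(&async_copy_lock);
        copy = find_async_copy(&offload->stateid);
        if (copy != NULL) {
                copy->cp_res.wr_bytes_copied = offload->write_res.wr_bytes_copied;
                /* Complete while holding the lock; an interrupted waiter
                 * cannot free or unlink the entry until we drop it */
                complete(&copy->completion);
        }
        spin_unlock(&async_copy_lock);
}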

2013-07-22 18:50:04

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
> From: Bryan Schumaker <[email protected]>
>
> Rather than performing the copy right away, schedule it to run later and
> reply to the client. Later, send a callback to notify the client that
> the copy has finished.

I believe you need to implement the referring triple support described
in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
described in
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
.

I see cb_delay initialized below, but not otherwise used. Am I missing
anything?

What about OFFLOAD_STATUS and OFFLOAD_ABORT?

In some common cases the reply will be very quick, and we might be
better off handling it synchronously. Could we implement a heuristic
like "copy synchronously if the filesystem has special support or the
range is less than the maximum iosize, otherwise copy asynchronously"?
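One possible shape for that heuristic in nfsd4_copy(), sketched under the
assumption that nfsd4_do_copy_sync() and nfsd4_do_copy_async() wrap the
existing inline path and the deferred path from this patch; both helper names
(and using the new copy_range file op as the "special support" test) are made
up here for illustration:

        /* Copy synchronously if the filesystem can offload the copy or the
         * range is no bigger than one RPC's worth of data; otherwise defer
         * the work and reply with a stateid. */
        if (src->f_op->copy_range || copy->cp_count <= svc_max_payload(rqstp))
                return nfsd4_do_copy_sync(rqstp, cstate, copy);
        return nfsd4_do_copy_async(rqstp, cstate, copy);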

--b.


> Signed-off-by: Bryan Schumaker <[email protected]>
> ---
> fs/nfsd/nfs4callback.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/nfs4proc.c | 59 +++++++++++++++------
> fs/nfsd/nfs4state.c | 11 ++++
> fs/nfsd/state.h | 21 ++++++++
> fs/nfsd/xdr4cb.h | 9 ++++
> 5 files changed, 221 insertions(+), 15 deletions(-)
>
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 7f05cd1..8f797e1 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -52,6 +52,9 @@ enum {
> NFSPROC4_CLNT_CB_NULL = 0,
> NFSPROC4_CLNT_CB_RECALL,
> NFSPROC4_CLNT_CB_SEQUENCE,
> +
> + /* NFS v4.2 callback */
> + NFSPROC4_CLNT_CB_OFFLOAD,
> };
>
> struct nfs4_cb_compound_hdr {
> @@ -110,6 +113,7 @@ enum nfs_cb_opnum4 {
> OP_CB_WANTS_CANCELLED = 12,
> OP_CB_NOTIFY_LOCK = 13,
> OP_CB_NOTIFY_DEVICEID = 14,
> + OP_CB_OFFLOAD = 15,
> OP_CB_ILLEGAL = 10044
> };
>
> @@ -469,6 +473,31 @@ out_default:
> return nfs_cb_stat_to_errno(nfserr);
> }
>
> +static void encode_cb_offload4args(struct xdr_stream *xdr,
> + const struct nfs4_cb_offload *offload,
> + struct nfs4_cb_compound_hdr *hdr)
> +{
> + __be32 *p;
> +
> + if (hdr->minorversion < 2)
> + return;
> +
> + encode_nfs_cb_opnum4(xdr, OP_CB_OFFLOAD);
> + encode_nfs_fh4(xdr, &offload->co_dst_fh);
> + encode_stateid4(xdr, &offload->co_stid->sc_stateid);
> +
> + p = xdr_reserve_space(xdr, 4);
> + *p = cpu_to_be32(1);
> + encode_stateid4(xdr, &offload->co_stid->sc_stateid);
> +
> + p = xdr_reserve_space(xdr, 12 + NFS4_VERIFIER_SIZE);
> + p = xdr_encode_hyper(p, offload->co_count);
> + *p++ = cpu_to_be32(offload->co_stable_how);
> + xdr_encode_opaque_fixed(p, offload->co_verifier.data, NFS4_VERIFIER_SIZE);
> +
> + hdr->nops++;
> +}
> +
> /*
> * NFSv4.0 and NFSv4.1 XDR encode functions
> *
> @@ -505,6 +534,23 @@ static void nfs4_xdr_enc_cb_recall(struct rpc_rqst *req, struct xdr_stream *xdr,
> encode_cb_nops(&hdr);
> }
>
> +/*
> + * CB_OFFLOAD
> + */
> +static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, struct xdr_stream *xdr,
> + const struct nfsd4_callback *cb)
> +{
> + const struct nfs4_cb_offload *args = cb->cb_op;
> + struct nfs4_cb_compound_hdr hdr = {
> + .ident = cb->cb_clp->cl_cb_ident,
> + .minorversion = cb->cb_minorversion,
> + };
> +
> + encode_cb_compound4args(xdr, &hdr);
> + encode_cb_sequence4args(xdr, cb, &hdr);
> + encode_cb_offload4args(xdr, args, &hdr);
> + encode_cb_nops(&hdr);
> +}
>
> /*
> * NFSv4.0 and NFSv4.1 XDR decode functions
> @@ -552,6 +598,36 @@ out:
> }
>
> /*
> + * CB_OFFLOAD
> + */
> +static int nfs4_xdr_dec_cb_offload(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
> + struct nfsd4_callback *cb)
> +{
> + struct nfs4_cb_compound_hdr hdr;
> + enum nfsstat4 nfserr;
> + int status;
> +
> + status = decode_cb_compound4res(xdr, &hdr);
> + if (unlikely(status))
> + goto out;
> +
> + if (cb != NULL) {
> + status = decode_cb_sequence4res(xdr, cb);
> + if (unlikely(status))
> + goto out;
> + }
> +
> + status = decode_cb_op_status(xdr, OP_CB_OFFLOAD, &nfserr);
> + if (unlikely(status))
> + goto out;
> + if (unlikely(nfserr != NFS4_OK))
> + status = nfs_cb_stat_to_errno(nfserr);
> +
> +out:
> + return status;
> +}
> +
> +/*
> * RPC procedure tables
> */
> #define PROC(proc, call, argtype, restype) \
> @@ -568,6 +644,7 @@ out:
> static struct rpc_procinfo nfs4_cb_procedures[] = {
> PROC(CB_NULL, NULL, cb_null, cb_null),
> PROC(CB_RECALL, COMPOUND, cb_recall, cb_recall),
> + PROC(CB_OFFLOAD, COMPOUND, cb_offload, cb_offload),
> };
>
> static struct rpc_version nfs_cb_version4 = {
> @@ -1017,6 +1094,11 @@ void nfsd4_init_callback(struct nfsd4_callback *cb)
> INIT_WORK(&cb->cb_work, nfsd4_do_callback_rpc);
> }
>
> +void nfsd4_init_delayed_callback(struct nfsd4_callback *cb)
> +{
> + INIT_DELAYED_WORK(&cb->cb_delay, nfsd4_do_callback_rpc);
> +}
> +
> void nfsd4_cb_recall(struct nfs4_delegation *dp)
> {
> struct nfsd4_callback *cb = &dp->dl_recall;
> @@ -1036,3 +1118,57 @@ void nfsd4_cb_recall(struct nfs4_delegation *dp)
>
> run_nfsd4_cb(&dp->dl_recall);
> }
> +
> +static void nfsd4_cb_offload_done(struct rpc_task *task, void *calldata)
> +{
> + struct nfsd4_callback *cb = calldata;
> + struct nfs4_client *clp = cb->cb_clp;
> + struct rpc_clnt *current_rpc_client = clp->cl_cb_client;
> +
> + nfsd4_cb_done(task, calldata);
> +
> + if (current_rpc_client != task->tk_client)
> + return;
> +
> + if (cb->cb_done)
> + return;
> +
> + if (task->tk_status != 0)
> + nfsd4_mark_cb_down(clp, task->tk_status);
> + cb->cb_done = true;
> +}
> +
> +static void nfsd4_cb_offload_release(void *calldata)
> +{
> + struct nfsd4_callback *cb = calldata;
> + struct nfs4_cb_offload *offload = container_of(cb, struct nfs4_cb_offload, co_callback);
> +
> + if (cb->cb_done) {
> + nfs4_free_offload_stateid(offload->co_stid);
> + kfree(offload);
> + }
> +}
> +
> +static const struct rpc_call_ops nfsd4_cb_offload_ops = {
> + .rpc_call_prepare = nfsd4_cb_prepare,
> + .rpc_call_done = nfsd4_cb_offload_done,
> + .rpc_release = nfsd4_cb_offload_release,
> +};
> +
> +void nfsd4_cb_offload(struct nfs4_cb_offload *offload)
> +{
> + struct nfsd4_callback *cb = &offload->co_callback;
> +
> + cb->cb_op = offload;
> + cb->cb_clp = offload->co_stid->sc_client;
> + cb->cb_msg.rpc_proc = &nfs4_cb_procedures[NFSPROC4_CLNT_CB_OFFLOAD];
> + cb->cb_msg.rpc_argp = cb;
> + cb->cb_msg.rpc_resp = cb;
> +
> + cb->cb_ops = &nfsd4_cb_offload_ops;
> +
> + INIT_LIST_HEAD(&cb->cb_per_client);
> + cb->cb_done = true;
> +
> + run_nfsd4_cb(cb);
> +}
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index d4584ea..66a787f 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -35,6 +35,7 @@
> #include <linux/file.h>
> #include <linux/slab.h>
>
> +#include "state.h"
> #include "idmap.h"
> #include "cache.h"
> #include "xdr4.h"
> @@ -1062,29 +1063,57 @@ out:
> return status;
> }
>
> +static void
> +nfsd4_copy_async(struct work_struct *w)
> +{
> + __be32 status;
> + struct nfs4_cb_offload *offload;
> +
> + offload = container_of(w, struct nfs4_cb_offload, co_work);
> + status = nfsd_copy_range(offload->co_src_file, offload->co_src_pos,
> + offload->co_dst_file, offload->co_dst_pos,
> + offload->co_count);
> +
> + if (status == nfs_ok) {
> + offload->co_stable_how = NFS_FILE_SYNC;
> + gen_boot_verifier(&offload->co_verifier, offload->co_net);
> + fput(offload->co_src_file);
> + fput(offload->co_dst_file);
> + }
> + nfsd4_cb_offload(offload);
> +}
> +
> static __be32
> nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> struct nfsd4_copy *copy)
> {
> - __be32 status;
> struct file *src = NULL, *dst = NULL;
> + struct nfs4_cb_offload *offload;
>
> - status = nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst);
> - if (status)
> - return status;
> -
> - status = nfsd_copy_range(src, copy->cp_src_pos,
> - dst, copy->cp_dst_pos,
> - copy->cp_count);
> + if (nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst))
> + return nfserr_jukebox;
>
> - if (status == nfs_ok) {
> - copy->cp_res.wr_stateid = NULL;
> - copy->cp_res.wr_bytes_written = copy->cp_count;
> - copy->cp_res.wr_stable_how = NFS_FILE_SYNC;
> - gen_boot_verifier(&copy->cp_res.wr_verifier, SVC_NET(rqstp));
> - }
> + offload = kmalloc(sizeof(struct nfs4_cb_offload), GFP_KERNEL);
> + if (!offload)
> + return nfserr_jukebox;
>
> - return status;
> + offload->co_src_file = get_file(src);
> + offload->co_dst_file = get_file(dst);
> + offload->co_src_pos = copy->cp_src_pos;
> + offload->co_dst_pos = copy->cp_dst_pos;
> + offload->co_count = copy->cp_count;
> + offload->co_stid = nfs4_alloc_offload_stateid(cstate->session->se_client);
> + offload->co_net = SVC_NET(rqstp);
> + INIT_WORK(&offload->co_work, nfsd4_copy_async);
> + nfsd4_init_callback(&offload->co_callback);
> + memcpy(&offload->co_dst_fh, &cstate->current_fh, sizeof(struct knfsd_fh));
> +
> + copy->cp_res.wr_stateid = &offload->co_stid->sc_stateid;
> + copy->cp_res.wr_bytes_written = 0;
> + copy->cp_res.wr_stable_how = NFS_UNSTABLE;
> +
> + schedule_work(&offload->co_work);
> + return nfs_ok;
> }
>
> /* This routine never returns NFS_OK! If there are no other errors, it
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index c4e270e..582edb5 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -364,6 +364,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
> return openlockstateid(nfs4_alloc_stid(clp, stateid_slab));
> }
>
> +struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *clp)
> +{
> + return nfs4_alloc_stid(clp, stateid_slab);
> +}
> +
> static struct nfs4_delegation *
> alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct svc_fh *current_fh)
> {
> @@ -617,6 +622,12 @@ static void free_generic_stateid(struct nfs4_ol_stateid *stp)
> kmem_cache_free(stateid_slab, stp);
> }
>
> +void nfs4_free_offload_stateid(struct nfs4_stid *stid)
> +{
> + remove_stid(stid);
> + kmem_cache_free(stateid_slab, stid);
> +}
> +
> static void release_lock_stateid(struct nfs4_ol_stateid *stp)
> {
> struct file *file;
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 2478805..56682fb 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -70,6 +70,7 @@ struct nfsd4_callback {
> struct rpc_message cb_msg;
> const struct rpc_call_ops *cb_ops;
> struct work_struct cb_work;
> + struct delayed_work cb_delay;
> bool cb_done;
> };
>
> @@ -101,6 +102,22 @@ struct nfs4_delegation {
> struct nfsd4_callback dl_recall;
> };
>
> +struct nfs4_cb_offload {
> + struct file *co_src_file;
> + struct file *co_dst_file;
> + u64 co_src_pos;
> + u64 co_dst_pos;
> + u64 co_count;
> + u32 co_stable_how;
> + struct knfsd_fh co_dst_fh;
> + nfs4_verifier co_verifier;
> + struct net *co_net;
> +
> + struct nfs4_stid *co_stid;
> + struct work_struct co_work;
> + struct nfsd4_callback co_callback;
> +};
> +
> /* client delegation callback info */
> struct nfs4_cb_conn {
> /* SETCLIENTID info */
> @@ -468,10 +485,12 @@ extern void nfs4_free_openowner(struct nfs4_openowner *);
> extern void nfs4_free_lockowner(struct nfs4_lockowner *);
> extern int set_callback_cred(void);
> extern void nfsd4_init_callback(struct nfsd4_callback *);
> +extern void nfsd4_init_delayed_callback(struct nfsd4_callback *);
> extern void nfsd4_probe_callback(struct nfs4_client *clp);
> extern void nfsd4_probe_callback_sync(struct nfs4_client *clp);
> extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *);
> extern void nfsd4_cb_recall(struct nfs4_delegation *dp);
> +extern void nfsd4_cb_offload(struct nfs4_cb_offload *);
> extern int nfsd4_create_callback_queue(void);
> extern void nfsd4_destroy_callback_queue(void);
> extern void nfsd4_shutdown_callback(struct nfs4_client *);
> @@ -480,6 +499,8 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
> struct nfsd_net *nn);
> extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
> extern void put_client_renew(struct nfs4_client *clp);
> +extern struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *);
> +extern void nfs4_free_offload_stateid(struct nfs4_stid *);
>
> /* nfs4recover operations */
> extern int nfsd4_client_tracking_init(struct net *net);
> diff --git a/fs/nfsd/xdr4cb.h b/fs/nfsd/xdr4cb.h
> index c5c55df..75b0ef7 100644
> --- a/fs/nfsd/xdr4cb.h
> +++ b/fs/nfsd/xdr4cb.h
> @@ -21,3 +21,12 @@
> #define NFS4_dec_cb_recall_sz (cb_compound_dec_hdr_sz + \
> cb_sequence_dec_sz + \
> op_dec_sz)
> +
> +#define NFS4_enc_cb_offload_sz (cb_compound_enc_hdr_sz + \
> + cb_sequence_enc_sz + \
> + 1 + enc_stateid_sz + 2 + 1 + \
> + XDR_QUADLEN(NFS4_VERIFIER_SIZE))
> +
> +#define NFS4_dec_cb_offload_sz (cb_compound_dec_hdr_sz + \
> + cb_sequence_dec_sz + \
> + op_dec_sz)
> --
> 1.8.3.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-07-19 21:03:57

by Anna Schumaker

[permalink] [raw]
Subject: [RFC 5/5] NFS: Change copy to support async servers

From: Bryan Schumaker <[email protected]>

Supporting CB_OFFLOAD is required by the spec, so if a server chooses to
copy in the background we have to wait for the copy to finish before
returning to userspace.

Signed-off-by: Bryan Schumaker <[email protected]>
---
fs/nfs/callback.h | 13 ++++++++++++
fs/nfs/callback_proc.c | 9 +++++++++
fs/nfs/callback_xdr.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++---
fs/nfs/nfs4_fs.h | 3 +++
fs/nfs/nfs4file.c | 48 +++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs4xdr.c | 4 ++--
include/linux/nfs_xdr.h | 3 +++
7 files changed, 129 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 84326e9..ae5a5d2 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -187,6 +187,19 @@ extern __be32 nfs4_callback_devicenotify(
void *dummy, struct cb_process_state *cps);

#endif /* CONFIG_NFS_V4_1 */
+
+#ifdef CONFIG_NFS_V4_2
+struct cb_offloadargs {
+ struct nfs_fh dst_fh;
+ nfs4_stateid stateid;
+ struct nfs42_write_response write_res;
+};
+
+extern __be32 nfs4_callback_offload(struct cb_offloadargs *, void *,
+ struct cb_process_state *);
+void wake_copy_offload(struct cb_offloadargs *);
+#endif /* CONFIG_NFS_V4_2 */
+
extern int check_gss_callback_principal(struct nfs_client *, struct svc_rqst *);
extern __be32 nfs4_callback_getattr(struct cb_getattrargs *args,
struct cb_getattrres *res,
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index e6ebc4c..cdf4180 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -533,3 +533,12 @@ out:
return status;
}
#endif /* CONFIG_NFS_V4_1 */
+
+#ifdef CONFIG_NFS_V4_2
+__be32 nfs4_callback_offload(struct cb_offloadargs *args, void *dummy,
+ struct cb_process_state *cps)
+{
+ wake_copy_offload(args);
+ return htonl(NFS4_OK);
+}
+#endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index f4ccfe6..d8fcf1a 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -35,6 +35,14 @@
#define CB_OP_RECALLSLOT_RES_MAXSZ (CB_OP_HDR_RES_MAXSZ)
#endif /* CONFIG_NFS_V4_1 */

+#if defined(CONFIG_NFS_V4_2)
+#define CB_OP_OFFLOAD_RES_MAXSZ (CB_OP_HDR_RES_MAXSZ + \
+ 1 + XDR_QUADLEN(NFS4_FHSIZE) + \
+ XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ 1 + XDR_QUADLEN(NFS4_STATEID_SIZE) + \
+ 2 + 1 + XDR_QUADLEN(NFS4_VERIFIER_SIZE))
+#endif /* CONFIG_NFS_V4_2 */
+
#define NFSDBG_FACILITY NFSDBG_CALLBACK

/* Internal error code */
@@ -527,6 +535,37 @@ static __be32 decode_recallslot_args(struct svc_rqst *rqstp,

#endif /* CONFIG_NFS_V4_1 */

+#ifdef CONFIG_NFS_V4_2
+static inline __be32 decode_write_res(struct xdr_stream *xdr,
+ struct nfs42_write_response *write_res)
+{
+ __be32 status = decode_write_response(xdr, write_res);
+ if (status == -EIO)
+ return htonl(NFS4ERR_RESOURCE);
+ return htonl(status);
+}
+
+static __be32 decode_offload_args(struct svc_rqst *rqstp,
+ struct xdr_stream *xdr,
+ struct cb_offloadargs *args)
+{
+ __be32 status;
+
+ status = decode_fh(xdr, &args->dst_fh);
+ if (unlikely(status != 0))
+ goto out;
+
+ status = decode_stateid(xdr, &args->stateid);
+ if (unlikely(status != 0))
+ goto out;
+
+ status = decode_write_res(xdr, &args->write_res);
+out:
+ dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
+ return status;
+}
+#endif /* CONFIG_NFS_V4_2 */
+
static __be32 encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
{
__be32 *p;
@@ -794,9 +833,11 @@ preprocess_nfs42_op(int nop, unsigned int op_nr, struct callback_op **op)
if (status != htonl(NFS4ERR_OP_ILLEGAL))
return status;

- if (op_nr == OP_CB_OFFLOAD)
- return htonl(NFS4ERR_NOTSUPP);
- return htonl(NFS4ERR_OP_ILLEGAL);
+ if (op_nr != OP_CB_OFFLOAD)
+ return htonl(NFS4ERR_OP_ILLEGAL);
+
+ *op = &callback_ops[op_nr];
+ return htonl(NFS4_OK);
}
#else /* CONFIG_NFS_V4_2 */
static __be32
@@ -991,6 +1032,13 @@ static struct callback_op callback_ops[] = {
.res_maxsize = CB_OP_RECALLSLOT_RES_MAXSZ,
},
#endif /* CONFIG_NFS_V4_1 */
+#if defined(CONFIG_NFS_V4_2)
+ [OP_CB_OFFLOAD] = {
+ .process_op = (callback_process_op_t)nfs4_callback_offload,
+ .decode_args = (callback_decode_arg_t)decode_offload_args,
+ .res_maxsize = CB_OP_OFFLOAD_RES_MAXSZ,
+ },
+#endif
};

/*
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 26c7cf0..5c32fb5 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -407,6 +407,9 @@ static inline void nfs4_unregister_sysctl(void)

/* nfs4xdr.c */
extern struct rpc_procinfo nfs4_procedures[];
+#if defined(CONFIG_NFS_V4_2)
+int decode_write_response(struct xdr_stream *, struct nfs42_write_response *);
+#endif /* CONFIG_NFS_V4_2 */

struct nfs4_mount_data;

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index ca77ab4..fbd5f77 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -4,6 +4,7 @@
* Copyright (C) 1992 Rick Sladkey
*/
#include <linux/nfs_fs.h>
+#include "callback.h"
#include "internal.h"
#include "fscache.h"
#include "pnfs.h"
@@ -118,6 +119,9 @@ nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
}

#ifdef CONFIG_NFS_V4_2
+static LIST_HEAD(nfs_copy_async_list);
+static DEFINE_SPINLOCK(async_copy_lock);
+
static int nfs4_find_copy_stateid(struct file *file, nfs4_stateid *stateid,
fmode_t mode)
{
@@ -137,6 +141,43 @@ static int nfs4_find_copy_stateid(struct file *file, nfs4_stateid *stateid,
return ret;
}

+static void wait_for_offload(struct nfs42_copy_res *res)
+{
+ spin_lock(&async_copy_lock);
+ list_add(&res->wait_list, &nfs_copy_async_list);
+ spin_unlock(&async_copy_lock);
+
+ wait_for_completion(&res->completion);
+}
+
+static struct nfs42_copy_res *find_async_copy(nfs4_stateid *stateid)
+{
+ struct nfs42_copy_res *cur;
+
+ list_for_each_entry(cur, &nfs_copy_async_list, wait_list) {
+ if (memcmp(stateid, cur->cp_res.wr_stateid, sizeof(nfs4_stateid)) == 0)
+ return cur;
+ }
+ return NULL;
+}
+
+void wake_copy_offload(struct cb_offloadargs *offload)
+{
+ struct nfs42_copy_res *copy;
+
+ spin_lock(&async_copy_lock);
+ copy = find_async_copy(&offload->stateid);
+ if (copy == NULL) {
+ spin_unlock(&async_copy_lock);
+ return;
+ }
+ list_del(&copy->wait_list);
+ spin_unlock(&async_copy_lock);
+
+ copy->cp_res.wr_bytes_copied = offload->write_res.wr_bytes_copied;
+ complete(&copy->completion);
+}
+
static ssize_t nfs4_copy_range(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
size_t count)
@@ -159,10 +200,17 @@ static ssize_t nfs4_copy_range(struct file *file_in, loff_t pos_in,
if (err)
return err;

+ init_completion(&res.completion);
+
err = nfs42_proc_copy(NFS_SERVER(file_inode(file_out)), &args, &res);
if (err)
return err;

+ if (res.cp_res.wr_stateid != NULL) {
+ wait_for_offload(&res);
+ kfree(res.cp_res.wr_stateid);
+ }
+
return res.cp_res.wr_bytes_copied;
}
#endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index d70c6bc..465d1bc 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -6011,8 +6011,8 @@ out_overflow:
#endif /* CONFIG_NFS_V4_1 */

#ifdef CONFIG_NFS_V4_2
-static int decode_write_response(struct xdr_stream *xdr,
- struct nfs42_write_response *write_res)
+int decode_write_response(struct xdr_stream *xdr,
+ struct nfs42_write_response *write_res)
{
__be32 *p;
int num_ids;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 0bc6b14..f603793 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1231,6 +1231,9 @@ struct nfs42_copy_res {
struct nfs4_sequence_res seq_res;
unsigned int status;
struct nfs42_write_response cp_res;
+
+ struct list_head wait_list;
+ struct completion completion;
};
#endif

--
1.8.3.3


2013-07-22 19:17:32

by Anna Schumaker

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
> On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
>> From: Bryan Schumaker <[email protected]>
>>
>> Rather than performing the copy right away, schedule it to run later and
>> reply to the client. Later, send a callback to notify the client that
>> the copy has finished.
>
> I believe you need to implement the referring triple support described
> in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
> described in
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> .

I'll re-read and re-write.

>
> I see cb_delay initialized below, but not otherwise used. Am I missing
> anything?

Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(

>
> What about OFFLOAD_STATUS and OFFLOAD_ABORT?

I haven't thought out those too much... I haven't thought about a use for them on the client yet.

>
> In some common cases the reply will be very quick, and we might be
> better off handling it synchronously. Could we implement a heuristic
> like "copy synchronously if the filesystem has special support or the
> range is less than the maximum iosize, otherwise copy asynchronously"?

I'm sure that can be done, I'm just not sure how to do it yet...

>
> --b.
>
>
>> Signed-off-by: Bryan Schumaker <[email protected]>
>> ---
>> fs/nfsd/nfs4callback.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++++
>> fs/nfsd/nfs4proc.c | 59 +++++++++++++++------
>> fs/nfsd/nfs4state.c | 11 ++++
>> fs/nfsd/state.h | 21 ++++++++
>> fs/nfsd/xdr4cb.h | 9 ++++
>> 5 files changed, 221 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>> index 7f05cd1..8f797e1 100644
>> --- a/fs/nfsd/nfs4callback.c
>> +++ b/fs/nfsd/nfs4callback.c
>> @@ -52,6 +52,9 @@ enum {
>> NFSPROC4_CLNT_CB_NULL = 0,
>> NFSPROC4_CLNT_CB_RECALL,
>> NFSPROC4_CLNT_CB_SEQUENCE,
>> +
>> + /* NFS v4.2 callback */
>> + NFSPROC4_CLNT_CB_OFFLOAD,
>> };
>>
>> struct nfs4_cb_compound_hdr {
>> @@ -110,6 +113,7 @@ enum nfs_cb_opnum4 {
>> OP_CB_WANTS_CANCELLED = 12,
>> OP_CB_NOTIFY_LOCK = 13,
>> OP_CB_NOTIFY_DEVICEID = 14,
>> + OP_CB_OFFLOAD = 15,
>> OP_CB_ILLEGAL = 10044
>> };
>>
>> @@ -469,6 +473,31 @@ out_default:
>> return nfs_cb_stat_to_errno(nfserr);
>> }
>>
>> +static void encode_cb_offload4args(struct xdr_stream *xdr,
>> + const struct nfs4_cb_offload *offload,
>> + struct nfs4_cb_compound_hdr *hdr)
>> +{
>> + __be32 *p;
>> +
>> + if (hdr->minorversion < 2)
>> + return;
>> +
>> + encode_nfs_cb_opnum4(xdr, OP_CB_OFFLOAD);
>> + encode_nfs_fh4(xdr, &offload->co_dst_fh);
>> + encode_stateid4(xdr, &offload->co_stid->sc_stateid);
>> +
>> + p = xdr_reserve_space(xdr, 4);
>> + *p = cpu_to_be32(1);
>> + encode_stateid4(xdr, &offload->co_stid->sc_stateid);
>> +
>> + p = xdr_reserve_space(xdr, 12 + NFS4_VERIFIER_SIZE);
>> + p = xdr_encode_hyper(p, offload->co_count);
>> + *p++ = cpu_to_be32(offload->co_stable_how);
>> + xdr_encode_opaque_fixed(p, offload->co_verifier.data, NFS4_VERIFIER_SIZE);
>> +
>> + hdr->nops++;
>> +}
>> +
>> /*
>> * NFSv4.0 and NFSv4.1 XDR encode functions
>> *
>> @@ -505,6 +534,23 @@ static void nfs4_xdr_enc_cb_recall(struct rpc_rqst *req, struct xdr_stream *xdr,
>> encode_cb_nops(&hdr);
>> }
>>
>> +/*
>> + * CB_OFFLOAD
>> + */
>> +static void nfs4_xdr_enc_cb_offload(struct rpc_rqst *req, struct xdr_stream *xdr,
>> + const struct nfsd4_callback *cb)
>> +{
>> + const struct nfs4_cb_offload *args = cb->cb_op;
>> + struct nfs4_cb_compound_hdr hdr = {
>> + .ident = cb->cb_clp->cl_cb_ident,
>> + .minorversion = cb->cb_minorversion,
>> + };
>> +
>> + encode_cb_compound4args(xdr, &hdr);
>> + encode_cb_sequence4args(xdr, cb, &hdr);
>> + encode_cb_offload4args(xdr, args, &hdr);
>> + encode_cb_nops(&hdr);
>> +}
>>
>> /*
>> * NFSv4.0 and NFSv4.1 XDR decode functions
>> @@ -552,6 +598,36 @@ out:
>> }
>>
>> /*
>> + * CB_OFFLOAD
>> + */
>> +static int nfs4_xdr_dec_cb_offload(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
>> + struct nfsd4_callback *cb)
>> +{
>> + struct nfs4_cb_compound_hdr hdr;
>> + enum nfsstat4 nfserr;
>> + int status;
>> +
>> + status = decode_cb_compound4res(xdr, &hdr);
>> + if (unlikely(status))
>> + goto out;
>> +
>> + if (cb != NULL) {
>> + status = decode_cb_sequence4res(xdr, cb);
>> + if (unlikely(status))
>> + goto out;
>> + }
>> +
>> + status = decode_cb_op_status(xdr, OP_CB_OFFLOAD, &nfserr);
>> + if (unlikely(status))
>> + goto out;
>> + if (unlikely(nfserr != NFS4_OK))
>> + status = nfs_cb_stat_to_errno(nfserr);
>> +
>> +out:
>> + return status;
>> +}
>> +
>> +/*
>> * RPC procedure tables
>> */
>> #define PROC(proc, call, argtype, restype) \
>> @@ -568,6 +644,7 @@ out:
>> static struct rpc_procinfo nfs4_cb_procedures[] = {
>> PROC(CB_NULL, NULL, cb_null, cb_null),
>> PROC(CB_RECALL, COMPOUND, cb_recall, cb_recall),
>> + PROC(CB_OFFLOAD, COMPOUND, cb_offload, cb_offload),
>> };
>>
>> static struct rpc_version nfs_cb_version4 = {
>> @@ -1017,6 +1094,11 @@ void nfsd4_init_callback(struct nfsd4_callback *cb)
>> INIT_WORK(&cb->cb_work, nfsd4_do_callback_rpc);
>> }
>>
>> +void nfsd4_init_delayed_callback(struct nfsd4_callback *cb)
>> +{
>> + INIT_DELAYED_WORK(&cb->cb_delay, nfsd4_do_callback_rpc);
>> +}
>> +
>> void nfsd4_cb_recall(struct nfs4_delegation *dp)
>> {
>> struct nfsd4_callback *cb = &dp->dl_recall;
>> @@ -1036,3 +1118,57 @@ void nfsd4_cb_recall(struct nfs4_delegation *dp)
>>
>> run_nfsd4_cb(&dp->dl_recall);
>> }
>> +
>> +static void nfsd4_cb_offload_done(struct rpc_task *task, void *calldata)
>> +{
>> + struct nfsd4_callback *cb = calldata;
>> + struct nfs4_client *clp = cb->cb_clp;
>> + struct rpc_clnt *current_rpc_client = clp->cl_cb_client;
>> +
>> + nfsd4_cb_done(task, calldata);
>> +
>> + if (current_rpc_client != task->tk_client)
>> + return;
>> +
>> + if (cb->cb_done)
>> + return;
>> +
>> + if (task->tk_status != 0)
>> + nfsd4_mark_cb_down(clp, task->tk_status);
>> + cb->cb_done = true;
>> +}
>> +
>> +static void nfsd4_cb_offload_release(void *calldata)
>> +{
>> + struct nfsd4_callback *cb = calldata;
>> + struct nfs4_cb_offload *offload = container_of(cb, struct nfs4_cb_offload, co_callback);
>> +
>> + if (cb->cb_done) {
>> + nfs4_free_offload_stateid(offload->co_stid);
>> + kfree(offload);
>> + }
>> +}
>> +
>> +static const struct rpc_call_ops nfsd4_cb_offload_ops = {
>> + .rpc_call_prepare = nfsd4_cb_prepare,
>> + .rpc_call_done = nfsd4_cb_offload_done,
>> + .rpc_release = nfsd4_cb_offload_release,
>> +};
>> +
>> +void nfsd4_cb_offload(struct nfs4_cb_offload *offload)
>> +{
>> + struct nfsd4_callback *cb = &offload->co_callback;
>> +
>> + cb->cb_op = offload;
>> + cb->cb_clp = offload->co_stid->sc_client;
>> + cb->cb_msg.rpc_proc = &nfs4_cb_procedures[NFSPROC4_CLNT_CB_OFFLOAD];
>> + cb->cb_msg.rpc_argp = cb;
>> + cb->cb_msg.rpc_resp = cb;
>> +
>> + cb->cb_ops = &nfsd4_cb_offload_ops;
>> +
>> + INIT_LIST_HEAD(&cb->cb_per_client);
>> + cb->cb_done = true;
>> +
>> + run_nfsd4_cb(cb);
>> +}
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index d4584ea..66a787f 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -35,6 +35,7 @@
>> #include <linux/file.h>
>> #include <linux/slab.h>
>>
>> +#include "state.h"
>> #include "idmap.h"
>> #include "cache.h"
>> #include "xdr4.h"
>> @@ -1062,29 +1063,57 @@ out:
>> return status;
>> }
>>
>> +static void
>> +nfsd4_copy_async(struct work_struct *w)
>> +{
>> + __be32 status;
>> + struct nfs4_cb_offload *offload;
>> +
>> + offload = container_of(w, struct nfs4_cb_offload, co_work);
>> + status = nfsd_copy_range(offload->co_src_file, offload->co_src_pos,
>> + offload->co_dst_file, offload->co_dst_pos,
>> + offload->co_count);
>> +
>> + if (status == nfs_ok) {
>> + offload->co_stable_how = NFS_FILE_SYNC;
>> + gen_boot_verifier(&offload->co_verifier, offload->co_net);
>> + fput(offload->co_src_file);
>> + fput(offload->co_dst_file);
>> + }
>> + nfsd4_cb_offload(offload);
>> +}
>> +
>> static __be32
>> nfsd4_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>> struct nfsd4_copy *copy)
>> {
>> - __be32 status;
>> struct file *src = NULL, *dst = NULL;
>> + struct nfs4_cb_offload *offload;
>>
>> - status = nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst);
>> - if (status)
>> - return status;
>> -
>> - status = nfsd_copy_range(src, copy->cp_src_pos,
>> - dst, copy->cp_dst_pos,
>> - copy->cp_count);
>> + if (nfsd4_verify_copy(rqstp, cstate, copy, &src, &dst))
>> + return nfserr_jukebox;
>>
>> - if (status == nfs_ok) {
>> - copy->cp_res.wr_stateid = NULL;
>> - copy->cp_res.wr_bytes_written = copy->cp_count;
>> - copy->cp_res.wr_stable_how = NFS_FILE_SYNC;
>> - gen_boot_verifier(&copy->cp_res.wr_verifier, SVC_NET(rqstp));
>> - }
>> + offload = kmalloc(sizeof(struct nfs4_cb_offload), GFP_KERNEL);
>> + if (!offload)
>> + return nfserr_jukebox;
>>
>> - return status;
>> + offload->co_src_file = get_file(src);
>> + offload->co_dst_file = get_file(dst);
>> + offload->co_src_pos = copy->cp_src_pos;
>> + offload->co_dst_pos = copy->cp_dst_pos;
>> + offload->co_count = copy->cp_count;
>> + offload->co_stid = nfs4_alloc_offload_stateid(cstate->session->se_client);
>> + offload->co_net = SVC_NET(rqstp);
>> + INIT_WORK(&offload->co_work, nfsd4_copy_async);
>> + nfsd4_init_callback(&offload->co_callback);
>> + memcpy(&offload->co_dst_fh, &cstate->current_fh, sizeof(struct knfsd_fh));
>> +
>> + copy->cp_res.wr_stateid = &offload->co_stid->sc_stateid;
>> + copy->cp_res.wr_bytes_written = 0;
>> + copy->cp_res.wr_stable_how = NFS_UNSTABLE;
>> +
>> + schedule_work(&offload->co_work);
>> + return nfs_ok;
>> }
>>
>> /* This routine never returns NFS_OK! If there are no other errors, it
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index c4e270e..582edb5 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -364,6 +364,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
>> return openlockstateid(nfs4_alloc_stid(clp, stateid_slab));
>> }
>>
>> +struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *clp)
>> +{
>> + return nfs4_alloc_stid(clp, stateid_slab);
>> +}
>> +
>> static struct nfs4_delegation *
>> alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct svc_fh *current_fh)
>> {
>> @@ -617,6 +622,12 @@ static void free_generic_stateid(struct nfs4_ol_stateid *stp)
>> kmem_cache_free(stateid_slab, stp);
>> }
>>
>> +void nfs4_free_offload_stateid(struct nfs4_stid *stid)
>> +{
>> + remove_stid(stid);
>> + kmem_cache_free(stateid_slab, stid);
>> +}
>> +
>> static void release_lock_stateid(struct nfs4_ol_stateid *stp)
>> {
>> struct file *file;
>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>> index 2478805..56682fb 100644
>> --- a/fs/nfsd/state.h
>> +++ b/fs/nfsd/state.h
>> @@ -70,6 +70,7 @@ struct nfsd4_callback {
>> struct rpc_message cb_msg;
>> const struct rpc_call_ops *cb_ops;
>> struct work_struct cb_work;
>> + struct delayed_work cb_delay;
>> bool cb_done;
>> };
>>
>> @@ -101,6 +102,22 @@ struct nfs4_delegation {
>> struct nfsd4_callback dl_recall;
>> };
>>
>> +struct nfs4_cb_offload {
>> + struct file *co_src_file;
>> + struct file *co_dst_file;
>> + u64 co_src_pos;
>> + u64 co_dst_pos;
>> + u64 co_count;
>> + u32 co_stable_how;
>> + struct knfsd_fh co_dst_fh;
>> + nfs4_verifier co_verifier;
>> + struct net *co_net;
>> +
>> + struct nfs4_stid *co_stid;
>> + struct work_struct co_work;
>> + struct nfsd4_callback co_callback;
>> +};
>> +
>> /* client delegation callback info */
>> struct nfs4_cb_conn {
>> /* SETCLIENTID info */
>> @@ -468,10 +485,12 @@ extern void nfs4_free_openowner(struct nfs4_openowner *);
>> extern void nfs4_free_lockowner(struct nfs4_lockowner *);
>> extern int set_callback_cred(void);
>> extern void nfsd4_init_callback(struct nfsd4_callback *);
>> +extern void nfsd4_init_delayed_callback(struct nfsd4_callback *);
>> extern void nfsd4_probe_callback(struct nfs4_client *clp);
>> extern void nfsd4_probe_callback_sync(struct nfs4_client *clp);
>> extern void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *);
>> extern void nfsd4_cb_recall(struct nfs4_delegation *dp);
>> +extern void nfsd4_cb_offload(struct nfs4_cb_offload *);
>> extern int nfsd4_create_callback_queue(void);
>> extern void nfsd4_destroy_callback_queue(void);
>> extern void nfsd4_shutdown_callback(struct nfs4_client *);
>> @@ -480,6 +499,8 @@ extern struct nfs4_client_reclaim *nfs4_client_to_reclaim(const char *name,
>> struct nfsd_net *nn);
>> extern bool nfs4_has_reclaimed_state(const char *name, struct nfsd_net *nn);
>> extern void put_client_renew(struct nfs4_client *clp);
>> +extern struct nfs4_stid *nfs4_alloc_offload_stateid(struct nfs4_client *);
>> +extern void nfs4_free_offload_stateid(struct nfs4_stid *);
>>
>> /* nfs4recover operations */
>> extern int nfsd4_client_tracking_init(struct net *net);
>> diff --git a/fs/nfsd/xdr4cb.h b/fs/nfsd/xdr4cb.h
>> index c5c55df..75b0ef7 100644
>> --- a/fs/nfsd/xdr4cb.h
>> +++ b/fs/nfsd/xdr4cb.h
>> @@ -21,3 +21,12 @@
>> #define NFS4_dec_cb_recall_sz (cb_compound_dec_hdr_sz + \
>> cb_sequence_dec_sz + \
>> op_dec_sz)
>> +
>> +#define NFS4_enc_cb_offload_sz (cb_compound_enc_hdr_sz + \
>> + cb_sequence_enc_sz + \
>> + 1 + enc_stateid_sz + 2 + 1 + \
>> + XDR_QUADLEN(NFS4_VERIFIER_SIZE))
>> +
>> +#define NFS4_dec_cb_offload_sz (cb_compound_dec_hdr_sz + \
>> + cb_sequence_dec_sz + \
>> + op_dec_sz)
>> --
>> 1.8.3.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html


2013-07-22 19:37:20

by Anna Schumaker

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
> On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
>> On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
>>> On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
>>>> From: Bryan Schumaker <[email protected]>
>>>>
>>>> Rather than performing the copy right away, schedule it to run later and
>>>> reply to the client. Later, send a callback to notify the client that
>>>> the copy has finished.
>>>
>>> I believe you need to implement the referring triple support described
>>> in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
>>> described in
>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
>>> .
>>
>> I'll re-read and re-write.
>>
>>>
>>> I see cb_delay initialized below, but not otherwise used. Am I missing
>>> anything?
>>
>> Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
>>
>>>
>>> What about OFFLOAD_STATUS and OFFLOAD_ABORT?
>>
>> I haven't thought out those too much... I haven't thought about a use for them on the client yet.
>
> If it might be a long-running copy, I assume the client needs the
> ability to abort if the caller is killed.
>
> (Dumb question: what happens on the network partition? Does the server
> abort the copy when it expires the client state?)
>
> In any case,
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> says "If a server's COPY operation returns a stateid, then the server
> MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
> OFFLOAD_STATUS."
>
> So even if we've no use for them on the client then we still need to
> implement them (and probably just write a basic pynfs test). Either
> that or update the spec.

Fair enough. I'll think it out and do something! Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
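One possible client-side shape for the abort case, sketched as if it lived in
nfs4_copy_range() from patch 5/5; nfs42_proc_offload_abort() is hypothetical
and does not exist in this series:

        int ret = wait_for_completion_interruptible(&res.completion);

        if (ret == -ERESTARTSYS) {
                /* Caller was signalled: ask the server to stop the copy.
                 * Hypothetical helper, not implemented in these patches. */
                nfs42_proc_offload_abort(NFS_SERVER(file_inode(file_out)),
                                         res.cp_res.wr_stateid);
                return ret;
        }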

- Bryan

>
>>> In some common cases the reply will be very quick, and we might be
>>> better off handling it synchronously. Could we implement a heuristic
>>> like "copy synchronously if the filesystem has special support or the
>>> range is less than the maximum iosize, otherwise copy asynchronously"?
>>
>> I'm sure that can be done, I'm just not sure how to do it yet...
>
> OK, thanks.
>
> --b.
>


2013-07-22 19:38:50

by Anna Schumaker

[permalink] [raw]
Subject: Re: [RFC 0/5] NFS Server Side Copy

On 07/22/2013 02:53 PM, J. Bruce Fields wrote:
> On Fri, Jul 19, 2013 at 05:03:45PM -0400, [email protected] wrote:
>> From: Bryan Schumaker <[email protected]>
>>
>> These patches build on Zach Brown's copyfile patches to add server side
>> copy to both the NFS client and the NFS server.
>>
>> The first patch improves on the copyfile syscall to make it usable on my
>> machine and also includes notes on other potential problems that I've
>> found. The remaining patches first implement a sync copy, then expand to
>> async.
>>
>> My testing was done on a server exporting an ext4 filesystem exporting an
>> ext4 filesystem. I compared copying using the cp command to copying with
>> the copyfile system call.
>
> Were these tests using the full series of patches? (So, using the
> asynchronous mechanism?)

Yes. Want me to re-run them without it?

- Bryan

>
> --b.
>
>>
>>
>> File size: 512 MB
>> cp: 4.244 seconds
>> copyfile: 0.961 seconds
>>
>> File size: 1024 MB
>> cp: 9.091 seconds
>> copyfile: 1.919 seconds
>>
>> File size: 1536 MB
>> cp: 15.291 seconds
>> copyfile: 6.016 seconds
>>
>>
>> Repeating these tests on a btrfs exported filesystem supporting the copyfile
>> system call drops the time for copyfile to about 0.01 seconds.
>>
>> Feel free to send me any questions, comments or other thoughts!
>>
>> - Bryan
>>
>> Bryan Schumaker (5):
>> Improve on the copyfile systemcall
>> NFSD: Implement the COPY call
>> NFS: Add COPY nfs operation
>> NFSD: Defer copying
>> NFS: Change copy to support async servers
>>
>> fs/copy_range.c | 10 +++-
>> fs/nfs/callback.h | 13 ++++
>> fs/nfs/callback_proc.c | 9 +++
>> fs/nfs/callback_xdr.c | 54 ++++++++++++++++-
>> fs/nfs/inode.c | 2 +
>> fs/nfs/nfs4_fs.h | 7 +++
>> fs/nfs/nfs4file.c | 101 +++++++++++++++++++++++++++++++
>> fs/nfs/nfs4proc.c | 16 +++++
>> fs/nfs/nfs4xdr.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++
>> fs/nfsd/nfs4callback.c | 136 ++++++++++++++++++++++++++++++++++++++++++
>> fs/nfsd/nfs4proc.c | 104 ++++++++++++++++++++++++++++++--
>> fs/nfsd/nfs4state.c | 15 ++++-
>> fs/nfsd/nfs4xdr.c | 121 +++++++++++++++++++++++++++++++++++++-
>> fs/nfsd/state.h | 23 +++++++-
>> fs/nfsd/vfs.c | 9 +++
>> fs/nfsd/vfs.h | 1 +
>> fs/nfsd/xdr4.h | 24 ++++++++
>> fs/nfsd/xdr4cb.h | 9 +++
>> include/linux/nfs4.h | 14 ++++-
>> include/linux/nfs_xdr.h | 33 +++++++++++
>> include/linux/syscalls.h | 1 +
>> 21 files changed, 836 insertions(+), 16 deletions(-)
>>
>> --
>> 1.8.3.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html


2013-07-22 18:53:37

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 0/5] NFS Server Side Copy

On Fri, Jul 19, 2013 at 05:03:45PM -0400, [email protected] wrote:
> From: Bryan Schumaker <[email protected]>
>
> These patches build on Zach Brown's copyfile patches to add server side
> copy to both the NFS client and the NFS server.
>
> The first patch improves on the copyfile syscall to make it usable on my
> machine and also includes notes on other potential problems that I've
> found. The remaining patches first implement a sync copy, then expand to
> async.
>
> My testing was done on a server exporting an ext4 filesystem exporting an
> ext4 filesystem. I compared copying using the cp command to copying with
> the copyfile system call.

Were these tests using the full series of patches? (So, using the
asynchronous mechanism?)

--b.

>
>
> File size: 512 MB
> cp: 4.244 seconds
> copyfile: 0.961 seconds
>
> File size: 1024 MB
> cp: 9.091 seconds
> copyfile: 1.919 seconds
>
> File size: 1536 MB
> cp: 15.291 seconds
> copyfile: 6.016 seconds
>
>
> Repeating these tests on a btrfs exported filesystem supporting the copyfile
> system call drops the time for copyfile to about 0.01 seconds.
>
> Feel free to send me any questions, comments or other thoughts!
>
> - Bryan
>
> Bryan Schumaker (5):
> Improve on the copyfile systemcall
> NFSD: Implement the COPY call
> NFS: Add COPY nfs operation
> NFSD: Defer copying
> NFS: Change copy to support async servers
>
> fs/copy_range.c | 10 +++-
> fs/nfs/callback.h | 13 ++++
> fs/nfs/callback_proc.c | 9 +++
> fs/nfs/callback_xdr.c | 54 ++++++++++++++++-
> fs/nfs/inode.c | 2 +
> fs/nfs/nfs4_fs.h | 7 +++
> fs/nfs/nfs4file.c | 101 +++++++++++++++++++++++++++++++
> fs/nfs/nfs4proc.c | 16 +++++
> fs/nfs/nfs4xdr.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/nfs4callback.c | 136 ++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/nfs4proc.c | 104 ++++++++++++++++++++++++++++++--
> fs/nfsd/nfs4state.c | 15 ++++-
> fs/nfsd/nfs4xdr.c | 121 +++++++++++++++++++++++++++++++++++++-
> fs/nfsd/state.h | 23 +++++++-
> fs/nfsd/vfs.c | 9 +++
> fs/nfsd/vfs.h | 1 +
> fs/nfsd/xdr4.h | 24 ++++++++
> fs/nfsd/xdr4cb.h | 9 +++
> include/linux/nfs4.h | 14 ++++-
> include/linux/nfs_xdr.h | 33 +++++++++++
> include/linux/syscalls.h | 1 +
> 21 files changed, 836 insertions(+), 16 deletions(-)
>
> --
> 1.8.3.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-08-05 14:41:33

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On Mon, Aug 05, 2013 at 09:38:20AM +0100, Ric Wheeler wrote:
> On 07/22/2013 08:55 PM, J. Bruce Fields wrote:
> >On Mon, Jul 22, 2013 at 03:54:00PM -0400, Bryan Schumaker wrote:
> >>On 07/22/2013 03:43 PM, J. Bruce Fields wrote:
> >>>On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
> >>>>On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
> >>>>>On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
> >>>>>>On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
> >>>>>>>On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
> >>>>>>>>From: Bryan Schumaker <[email protected]>
> >>>>>>>>
> >>>>>>>>Rather than performing the copy right away, schedule it to run later and
> >>>>>>>>reply to the client. Later, send a callback to notify the client that
> >>>>>>>>the copy has finished.
> >>>>>>>I believe you need to implement the referring triple support described
> >>>>>>>in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
> >>>>>>>described in
> >>>>>>>http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> >>>>>>>.
> >>>>>>I'll re-read and re-write.
> >>>>>>
> >>>>>>>I see cb_delay initialized below, but not otherwise used. Am I missing
> >>>>>>>anything?
> >>>>>>Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
> >>>>>>
> >>>>>>>What about OFFLOAD_STATUS and OFFLOAD_ABORT?
> >>>>>>I haven't thought out those too much... I haven't thought about a use for them on the client yet.
> >>>>>If it might be a long-running copy, I assume the client needs the
> >>>>>ability to abort if the caller is killed.
> >>>>>
> >>>>>(Dumb question: what happens on the network partition? Does the server
> >>>>>abort the copy when it expires the client state?)
> >>>>>
> >>>>>In any case,
> >>>>>http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> >>>>>says "If a server's COPY operation returns a stateid, then the server
> >>>>>MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
> >>>>>OFFLOAD_STATUS."
> >>>>>
> >>>>>So even if we've no use for them on the client then we still need to
> >>>>>implement them (and probably just write a basic pynfs test). Either
> >>>>>that or update the spec.
> >>>>Fair enough. I'll think it out and do something! Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
> >>>I can't remember--does the spec give the server a clear way to bail out
> >>>and tell the client to fall back on a normal copy in cases where the
> >>>server knows the copy could take an unreasonable amount of time?
> >>>
> >>>--b.
> >>I don't think so. Is there ever a case where copying over the network would be slower than copying on the server?
> >Maybe not, but if the copy will take a minute, then we don't want to tie
> >up an rpc slot for a minute.
> >
> >--b.
>
> I think that we need to be able to handle copies that would take a
> lot longer than just a minute - this offload could take a very long
> time I assume depending on the size of the data getting copied and
> the back end storage device....

Bryan suggested in offline discussion that one possibility might be to
copy, say, at most a gigabyte at a time before returning and making the
client continue the copy.

Where for "a gigabyte" read, "some amount that doesn't take too long to
copy but is still enough to allow close to full bandwidth". Hopefully
that's an easy number to find.

But based on
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
the COPY operation isn't designed for that--it doesn't give the option
of returning bytes_copied in the successful case.

Maybe we should fix that in the spec, or maybe we just need to implement
the asynchronous case. I guess it depends on which is easier,

a) implementing the asynchronous case (and the referring-triple
support to fix the COPY/callback races), or
b) implementing this sort of "short copy" loop in a way that gives
good performance.

On the client side it's clearly a) since you're forced to handle that
case anyway. (Unless we argue that *all* copies should work that way,
and that the spec should ditch the asynchronous case.) On the server
side, b) looks easier.
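
On the server, option b) could be as small as clamping each COPY to a fixed
chunk and reporting the shortened length back. A sketch against the
synchronous nfsd4_copy() from patch 2/5; nfsd_copy_chunk_bytes is a made-up
tunable, and returning a short wr_bytes_written on success is exactly the
spec change discussed above:

        u64 chunk = min_t(u64, copy->cp_count, nfsd_copy_chunk_bytes);

        status = nfsd_copy_range(src, copy->cp_src_pos,
                                 dst, copy->cp_dst_pos, chunk);
        if (status == nfs_ok) {
                copy->cp_res.wr_stateid = NULL;
                copy->cp_res.wr_bytes_written = chunk;
                copy->cp_res.wr_stable_how = NFS_FILE_SYNC;
                gen_boot_verifier(&copy->cp_res.wr_verifier, SVC_NET(rqstp));
        }
        /* The client then re-issues COPY with both offsets advanced by
         * wr_bytes_written until the whole range is done. */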

--b.

2013-08-05 08:38:26

by Ric Wheeler

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On 07/22/2013 08:55 PM, J. Bruce Fields wrote:
> On Mon, Jul 22, 2013 at 03:54:00PM -0400, Bryan Schumaker wrote:
>> On 07/22/2013 03:43 PM, J. Bruce Fields wrote:
>>> On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
>>>> On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
>>>>> On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
>>>>>> On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
>>>>>>> On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
>>>>>>>> From: Bryan Schumaker <[email protected]>
>>>>>>>>
>>>>>>>> Rather than performing the copy right away, schedule it to run later and
>>>>>>>> reply to the client. Later, send a callback to notify the client that
>>>>>>>> the copy has finished.
>>>>>>> I believe you need to implement the referring triple support described
>>>>>>> in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
>>>>>>> described in
>>>>>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
>>>>>>> .
>>>>>> I'll re-read and re-write.
>>>>>>
>>>>>>> I see cb_delay initialized below, but not otherwise used. Am I missing
>>>>>>> anything?
>>>>>> Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
>>>>>>
>>>>>>> What about OFFLOAD_STATUS and OFFLOAD_ABORT?
>>>>>> I haven't thought out those too much... I haven't thought about a use for them on the client yet.
>>>>> If it might be a long-running copy, I assume the client needs the
>>>>> ability to abort if the caller is killed.
>>>>>
>>>>> (Dumb question: what happens on the network partition? Does the server
>>>>> abort the copy when it expires the client state?)
>>>>>
>>>>> In any case,
>>>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
>>>>> says "If a server's COPY operation returns a stateid, then the server
>>>>> MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
>>>>> OFFLOAD_STATUS."
>>>>>
>>>>> So even if we've no use for them on the client then we still need to
>>>>> implement them (and probably just write a basic pynfs test). Either
>>>>> that or update the spec.
>>>> Fair enough. I'll think it out and do something! Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
>>> I can't remember--does the spec give the server a clear way to bail out
>>> and tell the client to fall back on a normal copy in cases where the
>>> server knows the copy could take an unreasonable amount of time?
>>>
>>> --b.
>> I don't think so. Is there ever a case where copying over the network would be slower than copying on the server?
> Maybe not, but if the copy will take a minute, then we don't want to tie
> up an rpc slot for a minute.
>
> --b.

I think that we need to be able to handle copies that would take a lot longer
than just a minute - this offload could take a very long time I assume depending
on the size of the data getting copied and the back end storage device....

ric



2013-08-05 14:50:39

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On Mon, 2013-08-05 at 10:41 -0400, J. Bruce Fields wrote:
> On Mon, Aug 05, 2013 at 09:38:20AM +0100, Ric Wheeler wrote:
> > On 07/22/2013 08:55 PM, J. Bruce Fields wrote:
> > >On Mon, Jul 22, 2013 at 03:54:00PM -0400, Bryan Schumaker wrote:
> > >>On 07/22/2013 03:43 PM, J. Bruce Fields wrote:
> > >>>On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
> > >>>>On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
> > >>>>>On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
> > >>>>>>On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
> > >>>>>>>On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
> > >>>>>>>>From: Bryan Schumaker <[email protected]>
> > >>>>>>>>
> > >>>>>>>>Rather than performing the copy right away, schedule it to run later and
> > >>>>>>>>reply to the client.  Later, send a callback to notify the client that
> > >>>>>>>>the copy has finished.
> > >>>>>>>I believe you need to implement the referring triple support described
> > >>>>>>>in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
> > >>>>>>>described in
> > >>>>>>>http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> > >>>>>>>.
> > >>>>>>I'll re-read and re-write.
> > >>>>>>
> > >>>>>>>I see cb_delay initialized below, but not otherwise used.  Am I missing
> > >>>>>>>anything?
> > >>>>>>Whoops!  I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously.  I must have forgotten to take it out :(
> > >>>>>>
> > >>>>>>>What about OFFLOAD_STATUS and OFFLOAD_ABORT?
> > >>>>>>I haven't thought out those too much... I haven't thought about a use for them on the client yet.
> > >>>>>If it might be a long-running copy, I assume the client needs the
> > >>>>>ability to abort if the caller is killed.
> > >>>>>
> > >>>>>(Dumb question: what happens on the network partition?  Does the server
> > >>>>>abort the copy when it expires the client state?)
> > >>>>>
> > >>>>>In any case,
> > >>>>>http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> > >>>>>says "If a server's COPY operation returns a stateid, then the server
> > >>>>>MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
> > >>>>>OFFLOAD_STATUS."
> > >>>>>
> > >>>>>So even if we've no use for them on the client then we still need to
> > >>>>>implement them (and probably just write a basic pynfs test).  Either
> > >>>>>that or update the spec.
> > >>>>Fair enough.  I'll think it out and do something!  Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
> > >>>I can't remember--does the spec give the server a clear way to bail out
> > >>>and tell the client to fall back on a normal copy in cases where the
> > >>>server knows the copy could take an unreasonable amount of time?
> > >>>
> > >>>--b.
> > >>I don't think so.  Is there ever a case where copying over the network would be slower than copying on the server?
> > >Maybe not, but if the copy will take a minute, then we don't want to tie
> > >up an rpc slot for a minute.
> > >
> > >--b.
> >
> > I think that we need to be able to handle copies that would take a
> > lot longer than just a minute - this offload could take a very long
> > time I assume depending on the size of the data getting copied and
> > the back end storage device....
>
> Bryan suggested in offline discussion that one possibility might be to
> copy, say, at most a gigabyte at a time before returning and making the
> client continue the copy.
>
> Where for "a gigabyte" read, "some amount that doesn't take too long to
> copy but is still enough to allow close to full bandwidth".  Hopefully
> that's an easy number to find.
>
> But based on
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
> the COPY operation isn't designed for that--it doesn't give the option
> of returning bytes_copied in the successful case.

The reason is that the spec writers did not want to force the server to
copy the data in sequential order (or any other particular order for
that matter).

If the copy was short, then the client can't know which bytes were
copied; they could be at the beginning of the file, in the middle, or
even the very end. Basically, it needs to redo the entire copy in order
to be certain.

> Maybe we should fix that in the spec, or maybe we just need to implement
> the asynchronous case.  I guess it depends on which is easier,
>
>    a) implementing the asynchronous case (and the referring-triple
>       support to fix the COPY/callback races), or
>    b) implementing this sort of "short copy" loop in a way that gives
>       good performance.
>
> On the client side it's clearly a) since you're forced to handle that
> case anyway.  (Unless we argue that *all* copies should work that way,
> and that the spec should ditch the asynchronous case.) On the server
> side, b) looks easier.
>
> --b.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2013-08-05 14:44:29

by Ric Wheeler

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On 08/05/2013 03:41 PM, J. Bruce Fields wrote:
> On Mon, Aug 05, 2013 at 09:38:20AM +0100, Ric Wheeler wrote:
>> On 07/22/2013 08:55 PM, J. Bruce Fields wrote:
>>> On Mon, Jul 22, 2013 at 03:54:00PM -0400, Bryan Schumaker wrote:
>>>> On 07/22/2013 03:43 PM, J. Bruce Fields wrote:
>>>>> On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
>>>>>> On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
>>>>>>> On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
>>>>>>>> On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
>>>>>>>>> On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
>>>>>>>>>> From: Bryan Schumaker <[email protected]>
>>>>>>>>>>
>>>>>>>>>> Rather than performing the copy right away, schedule it to run later and
>>>>>>>>>> reply to the client. Later, send a callback to notify the client that
>>>>>>>>>> the copy has finished.
>>>>>>>>> I believe you need to implement the referring triple support described
>>>>>>>>> in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
>>>>>>>>> described in
>>>>>>>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
>>>>>>>>> .
>>>>>>>> I'll re-read and re-write.
>>>>>>>>
>>>>>>>>> I see cb_delay initialized below, but not otherwise used. Am I missing
>>>>>>>>> anything?
>>>>>>>> Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
>>>>>>>>
>>>>>>>>> What about OFFLOAD_STATUS and OFFLOAD_ABORT?
>>>>>>>> I haven't thought out those too much... I haven't thought about a use for them on the client yet.
>>>>>>> If it might be a long-running copy, I assume the client needs the
>>>>>>> ability to abort if the caller is killed.
>>>>>>>
>>>>>>> (Dumb question: what happens on the network partition? Does the server
>>>>>>> abort the copy when it expires the client state?)
>>>>>>>
>>>>>>> In any case,
>>>>>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
>>>>>>> says "If a server's COPY operation returns a stateid, then the server
>>>>>>> MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
>>>>>>> OFFLOAD_STATUS."
>>>>>>>
>>>>>>> So even if we've no use for them on the client then we still need to
>>>>>>> implement them (and probably just write a basic pynfs test). Either
>>>>>>> that or update the spec.
>>>>>> Fair enough. I'll think it out and do something! Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
>>>>> I can't remember--does the spec give the server a clear way to bail out
>>>>> and tell the client to fall back on a normal copy in cases where the
>>>>> server knows the copy could take an unreasonable amount of time?
>>>>>
>>>>> --b.
>>>> I don't think so. Is there ever a case where copying over the network would be slower than copying on the server?
>>> Mybe not, but if the copy will take a minute, then we don't want to tie
>>> up an rpc slot for a minute.
>>>
>>> --b.
>> I think that we need to be able to handle copies that would take a
>> lot longer than just a minute - this offload could take a very long
>> time I assume depending on the size of the data getting copied and
>> the back end storage device....
> Bryan suggested in offline discussion that one possibility might be to
> copy, say, at most a gigabyte at a time before returning and making the
> client continue the copy.
>
> Where for "a gigabyte" read, "some amount that doesn't take too long to
> copy but is still enough to allow close to full bandwidth". Hopefully
> that's an easy number to find.
>
> But based on
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
> the COPY operation isn't designed for that--it doesn't give the option
> of returning bytes_copied in the successful case.
>
> Maybe we should fix that in the spec, or maybe we just need to implement
> the asynchronous case. I guess it depends on which is easier,
>
> a) implementing the asynchronous case (and the referring-triple
> support to fix the COPY/callback races), or
> b) implementing this sort of "short copy" loop in a way that gives
> good performance.
>
> On the client side it's clearly a) since you're forced to handle that
> case anyway. (Unless we argue that *all* copies should work that way,
> and that the spec should ditch the asynchronous case.) On the server
> side, b) looks easier.
>
> --b.

I am not sure that 1GB/time is enough - for a lot of servers, you could do an
enormous range since no data is actually moved inside of the target (just
pointers updated like in reflinked files for example)....

ric
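The "just pointers updated" copy Ric describes is what btrfs already exposes through its clone ioctl, the same call cp --reflink uses. As a rough userspace illustration only (it is not part of this patch series), cloning a whole file looks like the sketch below; BTRFS_IOC_CLONE is defined locally in case <linux/btrfs.h> is not available, mirroring the kernel's definition (magic 0x94, nr 9).

/* Clone src into dst on btrfs: the server-side cost is extent bookkeeping,
 * not data movement, which is why an "enormous range" is cheap here. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>

#ifndef BTRFS_IOC_CLONE
#define BTRFS_IOC_CLONE _IOW(0x94, 9, int)
#endif

int main(int argc, char **argv)
{
	int src, dst;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return 1;
	}
	src = open(argv[1], O_RDONLY);
	dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (src < 0 || dst < 0) {
		perror("open");
		return 1;
	}
	/* Share every extent of src with dst; fails (e.g. EOPNOTSUPP, EXDEV)
	 * where extents cannot be shared. */
	if (ioctl(dst, BTRFS_IOC_CLONE, src) < 0) {
		perror("BTRFS_IOC_CLONE");
		return 1;
	}
	return 0;
}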


2013-08-05 18:24:05

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On Mon, 2013-08-05 at 14:17 -0400, Chuck Lever wrote:
> On Aug 5, 2013, at 2:11 PM, "J. Bruce Fields" <[email protected]> wrote:
> 
> > On Mon, Aug 05, 2013 at 02:50:38PM +0000, Myklebust, Trond wrote:
> >> On Mon, 2013-08-05 at 10:41 -0400, J. Bruce Fields wrote:
> >>> Bryan suggested in offline discussion that one possibility might be to
> >>> copy, say, at most a gigabyte at a time before returning and making the
> >>> client continue the copy.
> >>> 
> >>> Where for "a gigabyte" read, "some amount that doesn't take too long to
> >>> copy but is still enough to allow close to full bandwidth".  Hopefully
> >>> that's an easy number to find.
> >>> 
> >>> But based on
> >>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
> >>> the COPY operation isn't designed for that--it doesn't give the option
> >>> of returning bytes_copied in the successful case.
> >> 
> >> The reason is that the spec writers did not want to force the server to
> >> copy the data in sequential order (or any other particular order for
> >> that matter).
> > 
> > Well, servers would still have the option not to return success unless
> > the whole copy succeeded, so I'm not sure this *forces* servers to do
> > sequential copies.
> > 
> > (Unless we also got rid of the callback.)
> 
> If the client initiates a full-file copy and the operation fails, I would think that the client itself can try copying sufficiently large chunks of the file via separate individual COPY operations.  If any of those operations fails, then the client can fall back again to a traditional over-the-wire copy operation.

How does the client determine what constitutes a "sufficiently large
chunk" in the mind of the server, and why do we want to add that
functionality in the first place? Fallback to traditional copy in the
case where the server doesn't support offload is fine, but all these
screwball special cases are not. We already have sync vs async. Now you
want to add chunked sync and chunked async too? NACK...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2013-08-05 18:11:24

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On Mon, Aug 05, 2013 at 02:50:38PM +0000, Myklebust, Trond wrote:
> On Mon, 2013-08-05 at 10:41 -0400, J. Bruce Fields wrote:
> > Bryan suggested in offline discussion that one possibility might be to
> > copy, say, at most a gigabyte at a time before returning and making the
> > client continue the copy.
> >
> > Where for "a gigabyte" read, "some amount that doesn't take too long to
> > copy but is still enough to allow close to full bandwidth". Hopefully
> > that's an easy number to find.
> >
> > But based on
> > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
> > the COPY operation isn't designed for that--it doesn't give the option
> > of returning bytes_copied in the successful case.
>
> The reason is that the spec writers did not want to force the server to
> copy the data in sequential order (or any other particular order for
> that matter).

Well, servers would still have the option not to return success unless
the whole copy succeeded, so I'm not sure this *forces* servers to do
sequential copies.

(Unless we also got rid of the callback.)

--b.

>
> If the copy was short, then the client can't know which bytes were
> copied; they could be at the beginning of the file, in the middle, or
> even the very end. Basically, it needs to redo the entire copy in order
> to be certain.
>
> > Maybe we should fix that in the spec, or maybe we just need to implement
> > the asynchronous case. I guess it depends on which is easier,
> >
> > a) implementing the asynchronous case (and the referring-triple
> > support to fix the COPY/callback races), or
> > b) implementing this sort of "short copy" loop in a way that gives
> > good performance.
> >
> > On the client side it's clearly a) since you're forced to handle that
> > case anyway. (Unless we argue that *all* copies should work that way,
> > and that the spec should ditch the asynchronous case.) On the server
> > side, b) looks easier.
> >
> > --b.
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com

2013-08-05 14:44:37

by Anna Schumaker

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On 08/05/2013 10:41 AM, J. Bruce Fields wrote:
> On Mon, Aug 05, 2013 at 09:38:20AM +0100, Ric Wheeler wrote:
>> On 07/22/2013 08:55 PM, J. Bruce Fields wrote:
>>> On Mon, Jul 22, 2013 at 03:54:00PM -0400, Bryan Schumaker wrote:
>>>> On 07/22/2013 03:43 PM, J. Bruce Fields wrote:
>>>>> On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
>>>>>> On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
>>>>>>> On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
>>>>>>>> On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
>>>>>>>>> On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
>>>>>>>>>> From: Bryan Schumaker <[email protected]>
>>>>>>>>>>
>>>>>>>>>> Rather than performing the copy right away, schedule it to run later and
>>>>>>>>>> reply to the client. Later, send a callback to notify the client that
>>>>>>>>>> the copy has finished.
>>>>>>>>> I believe you need to implement the referring triple support described
>>>>>>>>> in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
>>>>>>>>> described in
>>>>>>>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
>>>>>>>>> .
>>>>>>>> I'll re-read and re-write.
>>>>>>>>
>>>>>>>>> I see cb_delay initialized below, but not otherwise used. Am I missing
>>>>>>>>> anything?
>>>>>>>> Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
>>>>>>>>
>>>>>>>>> What about OFFLOAD_STATUS and OFFLOAD_ABORT?
>>>>>>>> I haven't thought out those too much... I haven't thought about a use for them on the client yet.
>>>>>>> If it might be a long-running copy, I assume the client needs the
>>>>>>> ability to abort if the caller is killed.
>>>>>>>
>>>>>>> (Dumb question: what happens on the network partition? Does the server
>>>>>>> abort the copy when it expires the client state?)
>>>>>>>
>>>>>>> In any case,
>>>>>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
>>>>>>> says "If a server's COPY operation returns a stateid, then the server
>>>>>>> MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
>>>>>>> OFFLOAD_STATUS."
>>>>>>>
>>>>>>> So even if we've no use for them on the client then we still need to
>>>>>>> implement them (and probably just write a basic pynfs test). Either
>>>>>>> that or update the spec.
>>>>>> Fair enough. I'll think it out and do something! Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
>>>>> I can't remember--does the spec give the server a clear way to bail out
>>>>> and tell the client to fall back on a normal copy in cases where the
>>>>> server knows the copy could take an unreasonable amount of time?
>>>>>
>>>>> --b.
>>>> I don't think so. Is there ever a case where copying over the network would be slower than copying on the server?
>>> Mybe not, but if the copy will take a minute, then we don't want to tie
>>> up an rpc slot for a minute.
>>>
>>> --b.
>>
>> I think that we need to be able to handle copies that would take a
>> lot longer than just a minute - this offload could take a very long
>> time I assume depending on the size of the data getting copied and
>> the back end storage device....
>
> Bryan suggested in offline discussion that one possibility might be to
> copy, say, at most a gigabyte at a time before returning and making the
> client continue the copy.
>
> Where for "a gigabyte" read, "some amount that doesn't take too long to
> copy but is still enough to allow close to full bandwidth". Hopefully
> that's an easy number to find.
>
> But based on
> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
> the COPY operation isn't designed for that--it doesn't give the option
> of returning bytes_copied in the successful case.

Wouldn't the wr_count field in a write_response4 struct be the bytes copied? I'm working on a patch for putting a limit on the amount copied - is there a "1 gigabyte in bytes" constant somewhere?

- Bryan

>
> Maybe we should fix that in the spec, or maybe we just need to implement
> the asynchronous case. I guess it depends on which is easier,
>
> a) implementing the asynchronous case (and the referring-triple
> support to fix the COPY/callback races), or
> b) implementing this sort of "short copy" loop in a way that gives
> good performance.
>
> On the client side it's clearly a) since you're forced to handle that
> case anyway. (Unless we argue that *all* copies should work that way,
> and that the spec should ditch the asynchronous case.) On the server
> side, b) looks easier.
>
> --b.
>
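As a rough sketch of the limit discussed in the message above (and only a sketch -- the names NFSD_MAX_COPY_BYTES and nfsd_do_copy are made up, and the VFS helper's name and signature are assumptions, not the actual patch): the closest thing to a ready-made "1 gigabyte in bytes" constant is SZ_1G from <linux/sizes.h>, and wr_count in the reply would carry the number of bytes actually copied.

#include <linux/fs.h>
#include <linux/sizes.h>

#define NFSD_MAX_COPY_BYTES	SZ_1G	/* 0x40000000: cap one COPY at 1 GB */

static ssize_t nfsd_do_copy(struct file *src, u64 src_pos,
			    struct file *dst, u64 dst_pos,
			    u64 count, u64 *wr_count)
{
	ssize_t copied;

	if (count > NFSD_MAX_COPY_BYTES)
		count = NFSD_MAX_COPY_BYTES;

	/* vfs_copy_range() stands in for whatever VFS helper the series
	 * ends up using; name and signature are assumed here. */
	copied = vfs_copy_range(src, src_pos, dst, dst_pos, count);
	if (copied < 0)
		return copied;

	/* Report the bytes actually copied so a client that understood
	 * short copies could resume at src_pos + *wr_count. */
	*wr_count = copied;
	return copied;
}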


2013-08-05 14:56:56

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On Mon, Aug 05, 2013 at 03:44:18PM +0100, Ric Wheeler wrote:
> On 08/05/2013 03:41 PM, J. Bruce Fields wrote:
> >On Mon, Aug 05, 2013 at 09:38:20AM +0100, Ric Wheeler wrote:
> >>On 07/22/2013 08:55 PM, J. Bruce Fields wrote:
> >>>On Mon, Jul 22, 2013 at 03:54:00PM -0400, Bryan Schumaker wrote:
> >>>>On 07/22/2013 03:43 PM, J. Bruce Fields wrote:
> >>>>>On Mon, Jul 22, 2013 at 03:37:00PM -0400, Bryan Schumaker wrote:
> >>>>>>On 07/22/2013 03:30 PM, J. Bruce Fields wrote:
> >>>>>>>On Mon, Jul 22, 2013 at 03:17:29PM -0400, Bryan Schumaker wrote:
> >>>>>>>>On 07/22/2013 02:50 PM, J. Bruce Fields wrote:
> >>>>>>>>>On Fri, Jul 19, 2013 at 05:03:49PM -0400, [email protected] wrote:
> >>>>>>>>>>From: Bryan Schumaker <[email protected]>
> >>>>>>>>>>
> >>>>>>>>>>Rather than performing the copy right away, schedule it to run later and
> >>>>>>>>>>reply to the client. Later, send a callback to notify the client that
> >>>>>>>>>>the copy has finished.
> >>>>>>>>>I believe you need to implement the referring triple support described
> >>>>>>>>>in http://tools.ietf.org/html/rfc5661#section-2.10.6.3 to fix the race
> >>>>>>>>>described in
> >>>>>>>>>http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> >>>>>>>>>.
> >>>>>>>>I'll re-read and re-write.
> >>>>>>>>
> >>>>>>>>>I see cb_delay initialized below, but not otherwise used. Am I missing
> >>>>>>>>>anything?
> >>>>>>>>Whoops! I was using that earlier to try to fake up a callback, but I eventually decided it's easier to just do the copy asynchronously. I must have forgotten to take it out :(
> >>>>>>>>
> >>>>>>>>>What about OFFLOAD_STATUS and OFFLOAD_ABORT?
> >>>>>>>>I haven't thought out those too much... I haven't thought about a use for them on the client yet.
> >>>>>>>If it might be a long-running copy, I assume the client needs the
> >>>>>>>ability to abort if the caller is killed.
> >>>>>>>
> >>>>>>>(Dumb question: what happens on the network partition? Does the server
> >>>>>>>abort the copy when it expires the client state?)
> >>>>>>>
> >>>>>>>In any case,
> >>>>>>>http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-15.1.3
> >>>>>>>says "If a server's COPY operation returns a stateid, then the server
> >>>>>>>MUST also support these operations: CB_OFFLOAD, OFFLOAD_ABORT, and
> >>>>>>>OFFLOAD_STATUS."
> >>>>>>>
> >>>>>>>So even if we've no use for them on the client then we still need to
> >>>>>>>implement them (and probably just write a basic pynfs test). Either
> >>>>>>>that or update the spec.
> >>>>>>Fair enough. I'll think it out and do something! Easy solution: save this patch for later and only support the sync version of copy for the final version of this patch series.
> >>>>>I can't remember--does the spec give the server a clear way to bail out
> >>>>>and tell the client to fall back on a normal copy in cases where the
> >>>>>server knows the copy could take an unreasonable amount of time?
> >>>>>
> >>>>>--b.
> >>>>I don't think so. Is there ever a case where copying over the network would be slower than copying on the server?
> >>>Mybe not, but if the copy will take a minute, then we don't want to tie
> >>>up an rpc slot for a minute.
> >>>
> >>>--b.
> >>I think that we need to be able to handle copies that would take a
> >>lot longer than just a minute - this offload could take a very long
> >>time I assume depending on the size of the data getting copied and
> >>the back end storage device....
> >Bryan suggested in offline discussion that one possibility might be to
> >copy, say, at most a gigabyte at a time before returning and making the
> >client continue the copy.
> >
> >Where for "a gigabyte" read, "some amount that doesn't take too long to
> >copy but is still enough to allow close to full bandwidth". Hopefully
> >that's an easy number to find.
> >
> >But based on
> >http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
> >the COPY operation isn't designed for that--it doesn't give the option
> >of returning bytes_copied in the successful case.
> >
> >Maybe we should fix that in the spec, or maybe we just need to implement
> >the asynchronous case. I guess it depends on which is easier,
> >
> > a) implementing the asynchronous case (and the referring-triple
> > support to fix the COPY/callback races), or
> > b) implementing this sort of "short copy" loop in a way that gives
> > good performance.
> >
> >On the client side it's clearly a) since you're forced to handle that
> >case anyway. (Unless we argue that *all* copies should work that way,
> >and that the spec should ditch the asynchronous case.) On the server
> >side, b) looks easier.
> >
> >--b.
>
> I am not sure that 1GB/time is enough - for a lot of servers, you
> could do an enormous range since no data is actually moved inside of
> the target (just pointers updated like in reflinked files for
> example)....

Right, but the short copy return would be optional on the server's
part--so the client would request the whole range, and the server could
copy it all in the quick update-some-pointers case while copying only
the first gigabyte (or whatever) in the "dumb" read-write-loop case.

But in the "dumb" case the server still needs a large enough range to
make its IO efficient and to amortize the cost of the extra round trips,
since the client is forced to serialize the COPY requests by the need to
see the bytes_copied result.

--b.
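For comparison, the client-side loop that option b) implies might look roughly like the sketch below. It assumes the protocol returned a usable bytes_copied value and that a per-chunk proc routine exists; nfs42_proc_copy() and its signature are assumptions for illustration, not code from this series. The loop also shows why the round trips serialize: each chunk's offsets depend on the previous reply.

#include <linux/fs.h>

static ssize_t nfs_copy_range_loop(struct file *src, loff_t src_pos,
				   struct file *dst, loff_t dst_pos,
				   size_t count)
{
	size_t left = count;
	ssize_t copied;

	while (left > 0) {
		/* One COPY at a time: the next request's offsets are not
		 * known until bytes_copied comes back in this reply. */
		copied = nfs42_proc_copy(src, src_pos, dst, dst_pos, left);
		if (copied < 0)
			return copied;	/* caller falls back to a normal copy */
		if (copied == 0)
			break;
		src_pos += copied;
		dst_pos += copied;
		left -= copied;
	}
	return count - left;
}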

2013-08-05 18:30:32

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying

On Mon, Aug 05, 2013 at 02:11:21PM -0400, J. Bruce Fields wrote:
> On Mon, Aug 05, 2013 at 02:50:38PM +0000, Myklebust, Trond wrote:
> > On Mon, 2013-08-05 at 10:41 -0400, J. Bruce Fields wrote:
> > > Bryan suggested in offline discussion that one possibility might be to
> > > copy, say, at most a gigabyte at a time before returning and making the
> > > client continue the copy.
> > >
> > > Where for "a gigabyte" read, "some amount that doesn't take too long to
> > > copy but is still enough to allow close to full bandwidth". Hopefully
> > > that's an easy number to find.
> > >
> > > But based on
> > > http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
> > > the COPY operation isn't designed for that--it doesn't give the option
> > > of returning bytes_copied in the successful case.
> >
> > The reason is that the spec writers did not want to force the server to
> > copy the data in sequential order (or any other particular order for
> > that matter).
>
> Well, servers would still have the option not to return success unless
> the whole copy succeeded, so I'm not sure this *forces* servers to do
> sequential copies.

Uh, sorry, I was confused, I missed the write_response4 in the COPY
result entirely.

Yeah obviously that's useless. (Why's it there anyway? No client or
application is going to care about anything other than whether it's 0 or
not, right?)

So maybe it would be useful to add a way for a server to optionally
communicate a sequential bytes_written, I don't know.

Without that, at least, I think the only reasonable implementation of
"dumb" server-side copies will need to implement the asynchronous case
(and referring triples). Which might be worth doing.

But for the first cut maybe we should instead *only* implement this on
btrfs (or whoever else can do quick copies).

--b.

>
> (Unless we also got rid of the callback.)
> > If the copy was short, then the client can't know which bytes were
> > copied; they could be at the beginning of the file, in the middle, or
> > even the very end. Basically, it needs to redo the entire copy in order
> > to be certain.
> >
> > > Maybe we should fix that in the spec, or maybe we just need to implement
> > > the asynchronous case. I guess it depends on which is easier,
> > >
> > > a) implementing the asynchronous case (and the referring-triple
> > > support to fix the COPY/callback races), or
> > > b) implementing this sort of "short copy" loop in a way that gives
> > > good performance.
> > >
> > > On the client side it's clearly a) since you're forced to handle that
> > > case anyway. (Unless we argue that *all* copies should work that way,
> > > and that the spec should ditch the asynchronous case.) On the server
> > > side, b) looks easier.
> > >
> > > --b.
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer
> >
> > NetApp
> > [email protected]
> > http://www.netapp.com
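A minimal sketch of that "first cut", assuming the ->copy_range file_operations hook this series builds on (the hook's exact signature is assumed, nfsd4_copy_quick() is a made-up name, and the usual nfsd status helpers nfserr_notsupp, nfserrno() and nfs_ok are taken as given): refuse the offload whenever the exported filesystem has no fast copy method, so the client simply falls back to an ordinary over-the-wire copy.

#include <linux/fs.h>

static __be32 nfsd4_copy_quick(struct file *src, u64 src_pos,
			       struct file *dst, u64 dst_pos, u64 count)
{
	ssize_t err;

	/* No quick copy on this filesystem: make the client fall back. */
	if (!dst->f_op->copy_range)
		return nfserr_notsupp;

	err = dst->f_op->copy_range(src, src_pos, dst, dst_pos, count);
	if (err < 0)
		return nfserrno(err);
	return nfs_ok;
}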

2013-08-05 18:18:06

by Chuck Lever III

[permalink] [raw]
Subject: Re: [RFC 4/5] NFSD: Defer copying


On Aug 5, 2013, at 2:11 PM, "J. Bruce Fields" <[email protected]> wrote:

> On Mon, Aug 05, 2013 at 02:50:38PM +0000, Myklebust, Trond wrote:
>> On Mon, 2013-08-05 at 10:41 -0400, J. Bruce Fields wrote:
>>> Bryan suggested in offline discussion that one possibility might be to
>>> copy, say, at most a gigabyte at a time before returning and making the
>>> client continue the copy.
>>>
>>> Where for "a gigabyte" read, "some amount that doesn't take too long to
>>> copy but is still enough to allow close to full bandwidth". Hopefully
>>> that's an easy number to find.
>>>
>>> But based on
>>> http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-19#section-14.1.2
>>> the COPY operation isn't designed for that--it doesn't give the option
>>> of returning bytes_copied in the successful case.
>>
>> The reason is that the spec writers did not want to force the server to
>> copy the data in sequential order (or any other particular order for
>> that matter).
>
> Well, servers would still have the option not to return success unless
> the whole copy succeeded, so I'm not sure this *forces* servers to do
> sequential copies.
>
> (Unless we also got rid of the callback.)

If the client initiates a full-file copy and the operation fails, I would think that the client itself can try copying sufficiently large chunks of the file via separate individual COPY operations. If any of those operations fails, then the client can fall back again to a traditional over-the-wire copy operation.


> --b.
>
>>
>> If the copy was short, then the client can't know which bytes were
>> copied; they could be at the beginning of the file, in the middle, or
>> even the very end. Basically, it needs to redo the entire copy in order
>> to be certain.
>>
>>> Maybe we should fix that in the spec, or maybe we just need to implement
>>> the asynchronous case. I guess it depends on which is easier,
>>>
>>> a) implementing the asynchronous case (and the referring-triple
>>> support to fix the COPY/callback races), or
>>> b) implementing this sort of "short copy" loop in a way that gives
>>> good performance.
>>>
>>> On the client side it's clearly a) since you're forced to handle that
>>> case anyway. (Unless we argue that *all* copies should work that way,
>>> and that the spec should ditch the asynchronous case.) On the server
>>> side, b) looks easier.
>>>
>>> --b.
>>
>> --
>> Trond Myklebust
>> Linux NFS client maintainer
>>
>> NetApp
>> [email protected]
>> http://www.netapp.com

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com