2018-11-30 20:03:52

by Olga Kornievskaia

Subject: [PATCH v2 00/10] server-side support for "inter" SSC copy

This patch series adds support for the NFSv4.2 copy offload feature,
allowing a copy between two different NFS servers.

This functionality depends on the VFS's ability to support a generic
copy_file_range() in which the copy is done between an NFS file and
a file on a local file system.

This feature is enabled by the kernel module parameter
inter_copy_offload_enable and is disabled by default. There is
also a kernel build configuration option, NFSD_V4_2_INTER_SSC, that
adds a dependency on the NFS client-side functions called from the
server.

These patches apply on top of the existing async intra copy offload
patches. For the "inter" SSC, the implementation supports only
asynchronous inter copy.

On the source server, upon receiving a COPY_NOTIFY, the server generates
a unique stateid that is kept in a global list. Upon receiving a READ
with a stateid, the code checks the normal list of open stateids and
now also checks the copy state list before deciding either to fail
with BAD_STATEID or to use the matching stateid.
The stored stateid must be used for the first time within a chosen
lease period (currently 90s). When the source server receives an
OFFLOAD_CANCEL, it removes the stateid from the global list.
Otherwise, the copy stateid is removed along with its "parent"
stateid (open/lock/delegation stateid).
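
The READ-side check described above can be modeled in user-space C. This is a minimal sketch, not the nfsd code. The `cpntf_model` struct, the `check_read_stateid()` helper and the integer IDs are all hypothetical, and time is passed in as seconds rather than jiffies. The model also assumes that once a copy stateid has been used for the first time, later READs are no longer subject to the timeout. That assumption is loosely based on the `cp_active`/`cp_timeout` fields added in patch 06.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes standing in for nfs_ok and NFS4ERR_BAD_STATEID. */
enum rd_status { RD_OK = 0, RD_BAD_STATEID = 1 };

/* Hypothetical model of a COPY_NOTIFY stateid (cf. struct nfs4_cpntf_state). */
struct cpntf_model {
	unsigned int id;   /* stands in for cp_stateid */
	bool active;       /* set on first use, i.e. once the copy has started */
	long timeout;      /* absolute expiry time in seconds (jiffies in nfsd) */
};

/*
 * Check a READ stateid: first against ordinary open stateids, then against
 * COPY_NOTIFY stateids. A copy stateid must be used for the first time
 * before its timeout; after that it stays valid (an assumption of this model).
 */
enum rd_status check_read_stateid(const unsigned int *open_ids, size_t n_open,
				  struct cpntf_model *cps, size_t n_cps,
				  unsigned int id, long now)
{
	assert(n_open == 0 || open_ids != NULL);
	assert(n_cps == 0 || cps != NULL);

	for (size_t i = 0; i < n_open; i++)
		if (open_ids[i] == id)
			return RD_OK;

	for (size_t i = 0; i < n_cps; i++) {
		if (cps[i].id != id)
			continue;
		if (cps[i].active)
			return RD_OK;            /* copy already in progress */
		if (now > cps[i].timeout)
			return RD_BAD_STATEID;   /* first use came too late */
		cps[i].active = true;            /* first use within the lease */
		return RD_OK;
	}
	return RD_BAD_STATEID;                   /* found in neither list */
}
```

With a 90 s lease, a copy stateid issued at t=0 and first used at t=30 is accepted. The same stateid presented for the first time at t=200 is rejected.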

On the destination server, upon receiving a COPY request, the server
establishes the necessary clientid/session with the source server.
It calls into the NFS client code to establish the necessary
open stateid, filehandle, and file description (without doing an NFS OPEN).
Then the server calls copy_file_range() to perform the copy, in which
the source file issues NFS READs and the data is then written to the
local file system (this depends on the VFS ability to do a cross-device
copy_file_range()).
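
In user-space terms, the destination's data path resembles a copy_file_range(2) loop. The loop keeps calling until the requested length has been moved, because the call may return a short count. This is a sketch under stated assumptions, not nfsd code. It works on file descriptors in one process rather than on kernel `struct file` objects. `copy_all()` and `copy_by_rw()` are hypothetical helpers, and the pread/pwrite fallback stands in for the kernel's splice-based generic path. It assumes Linux with glibc 2.27 or later, which declares copy_file_range().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback: copy through a bounce buffer with pread(2)/pwrite(2). */
static ssize_t copy_by_rw(int fd_in, loff_t *off_in, int fd_out,
			  loff_t *off_out, size_t len)
{
	const size_t bufsz = 64 * 1024;
	char *buf = malloc(bufsz);
	size_t done = 0;

	if (!buf)
		return -1;
	while (done < len) {
		size_t want = len - done < bufsz ? len - done : bufsz;
		ssize_t r = pread(fd_in, buf, want, *off_in);
		if (r < 0) {
			free(buf);
			return -1;
		}
		if (r == 0)                      /* source EOF */
			break;
		for (ssize_t w = 0; w < r; ) {   /* handle short writes */
			ssize_t n = pwrite(fd_out, buf + w, (size_t)(r - w),
					   *off_out);
			if (n <= 0) {
				free(buf);
				return -1;
			}
			w += n;
			*off_out += n;
		}
		*off_in += r;
		done += (size_t)r;
	}
	free(buf);
	return (ssize_t)done;
}

/*
 * Copy up to 'len' bytes, looping because copy_file_range(2) may copy less
 * than requested. If the kernel refuses with EXDEV, EOPNOTSUPP or ENOSYS,
 * copy the remainder with copy_by_rw() instead.
 */
ssize_t copy_all(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out,
		 size_t len)
{
	size_t done = 0;

	assert(off_in != NULL && off_out != NULL);
	while (done < len) {
		ssize_t n = copy_file_range(fd_in, off_in, fd_out, off_out,
					    len - done, 0);
		if (n < 0) {
			if (errno == EXDEV || errno == EOPNOTSUPP ||
			    errno == ENOSYS) {
				ssize_t m = copy_by_rw(fd_in, off_in, fd_out,
						       off_out, len - done);
				return m < 0 ? -1 : (ssize_t)(done + (size_t)m);
			}
			return -1;
		}
		if (n == 0)                      /* source EOF */
			break;
		done += (size_t)n;               /* kernel advanced both offsets */
	}
	return (ssize_t)done;
}
```

Passing explicit offset pointers leaves the file descriptors' own file positions untouched, which loosely mirrors how the server drives the copy with explicit source and destination offsets.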

v2:
-- is on top of 4.20-rc4 + the client-side inter patch series
-- VFS changes to enable a generic copy_file_range(); NFS then falls
back to generic_copy_file_range() where it previously returned
EXDEV/EOPNOTSUPP errors
-- hopefully addressed Bruce's review comments (highlights are):
--- copy_notify patch: addressed naming; sc_cp_list access is
now protected by s2s_cp_lock
--- fill-in netloc4 patch: addressed the size and added a WARN_ON
--- add ca_source to COPY: decode only one address and don't allocate
memory (the remaining addresses are decoded into a dummy)
--- check stateid against stored: moved the refcount under the lock
--- allow stale filehandle: added a loop to go through the ops in
the compound, store/manage the putfh if a COPY is present in the
compound, and mark the source putfh as "no verify".

All the patches (client inter) and this patch series are available
from git://linux-nfs.org/projects/aglo/linux.git in the "linux-ssc"
branch.

Olga Kornievskaia (10):
VFS generic copy_file_range() support
NFS fallback to generic_copy_file_range
NFSD fill-in netloc4 structure
NFSD add ca_source_server<> to COPY
NFSD return nfs4_stid in nfs4_preprocess_stateid_op
NFSD add COPY_NOTIFY operation
NFSD check stateids against copy stateids
NFSD generalize nfsd4_compound_state flag names
NFSD: allow inter server COPY to have a STALE source server fh
NFSD add nfs4 inter ssc to nfsd4_copy

fs/nfs/nfs4file.c | 9 +-
fs/nfsd/Kconfig | 10 ++
fs/nfsd/nfs4proc.c | 406 ++++++++++++++++++++++++++++++++++++++++++++++-----
fs/nfsd/nfs4state.c | 124 ++++++++++++++--
fs/nfsd/nfs4xdr.c | 166 ++++++++++++++++++++-
fs/nfsd/nfsd.h | 32 ++++
fs/nfsd/nfsfh.h | 5 +-
fs/nfsd/nfssvc.c | 6 +
fs/nfsd/state.h | 21 ++-
fs/nfsd/xdr4.h | 37 ++++-
fs/read_write.c | 66 +++++++--
include/linux/fs.h | 7 +
include/linux/nfs4.h | 1 +
mm/filemap.c | 6 +-
14 files changed, 810 insertions(+), 86 deletions(-)

--
1.8.3.1



2018-11-30 20:03:55

by Olga Kornievskaia

Subject: [PATCH v2 01/10] VFS generic copy_file_range() support

Relax the condition that input files must be on the same
file system.

Add checks that the input parameters adhere to the copy_file_range()
semantics.

If the file system provides no copy_file_range() support, do generic
checks for unsupported page cache ranges and LFS limits, and clear
setuid/setgid if not running as root, before calling
do_splice_direct(). Update atime, ctime, and mtime afterwards.

Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/read_write.c | 66 ++++++++++++++++++++++++++++++++++++++++++------------
include/linux/fs.h | 7 ++++++
mm/filemap.c | 6 ++---
3 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 7b9e59d..2d309b0 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1540,6 +1540,44 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
}
#endif

+ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
+ struct file *file_out, loff_t pos_out,
+ loff_t len, unsigned int flags)
+{
+ ssize_t ret;
+ loff_t size_in = i_size_read(file_inode(file_in)), count;
+
+ /* perform generic checks for unsupported page cache ranges and LFS
+ * limits. If pos exceeds the limit, -EFBIG is returned.
+ */
+ count = min(len, size_in - pos_in);
+ ret = generic_access_check_limits(file_in, pos_in, &count);
+ if (ret)
+ goto done;
+ ret = generic_write_check_limits(file_out, pos_out, &count);
+ if (ret)
+ goto done;
+ /* If not running as root, clear setuid/setgid bits. This keeps
+ * people from modifying setuid and setgid binaries.
+ */
+ if (!IS_NOSEC(file_inode(file_out))) {
+ ret = file_remove_privs(file_out);
+ if (ret)
+ goto done;
+ }
+
+ ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
+ count > MAX_RW_COUNT ? MAX_RW_COUNT : count, 0);
+
+ file_accessed(file_in);
+ if (!(file_out->f_mode & FMODE_NOCMTIME))
+ file_update_time(file_out);
+
+done:
+ return ret;
+}
+EXPORT_SYMBOL(generic_copy_file_range);
+
/*
* copy_file_range() differs from regular file read and write in that it
* specifically allows return partial success. When it does so is up to
@@ -1552,6 +1590,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
struct inode *inode_in = file_inode(file_in);
struct inode *inode_out = file_inode(file_out);
ssize_t ret;
+ loff_t size_in;

if (flags != 0)
return -EINVAL;
@@ -1577,6 +1616,15 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
if (len == 0)
return 0;

+ /* Ensure offsets don't wrap. */
+ if (pos_in + len < pos_in || pos_out + len < pos_out)
+ return -EINVAL;
+
+ size_in = i_size_read(inode_in);
+ /* Ensure that source range is within EOF. */
+ if (pos_in >= size_in || pos_in + len > size_in)
+ return -EINVAL;
+
file_start_write(file_out);

/*
@@ -1597,22 +1645,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
}
}

- if (file_out->f_op->copy_file_range) {
+ if (file_out->f_op->copy_file_range)
ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
pos_out, len, flags);
- if (ret != -EOPNOTSUPP)
- goto done;
- }
-
- /* this could be relaxed once generic cross fs support is added */
- if (inode_in->i_sb != inode_out->i_sb) {
- ret = -EXDEV;
- goto done;
- }
-
- ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
- len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
-
+ else
+ ret = generic_copy_file_range(file_in, pos_in, file_out,
+ pos_out, len, flags);
done:
if (ret > 0) {
fsnotify_access(file_in);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c95c080..c88ad09 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1874,6 +1874,9 @@ extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
unsigned long, loff_t *, rwf_t);
extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
loff_t, size_t, unsigned int);
+extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
+ struct file *file_out, loff_t pos_out,
+ loff_t len, unsigned int flags);
extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t *count,
@@ -3016,6 +3019,10 @@ static inline void remove_inode_hash(struct inode *inode)
extern int generic_file_mmap(struct file *, struct vm_area_struct *);
extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
extern ssize_t generic_write_checks(struct kiocb *, struct iov_iter *);
+extern int generic_access_check_limits(struct file *file, loff_t pos,
+ loff_t *count);
+extern int generic_write_check_limits(struct file *file, loff_t pos,
+ loff_t *count);
extern int generic_remap_checks(struct file *file_in, loff_t pos_in,
struct file *file_out, loff_t pos_out,
loff_t *count, unsigned int remap_flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8..894f3ae 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2829,8 +2829,7 @@ struct page *read_cache_page_gfp(struct address_space *mapping,
* LFS limits. If pos is under the limit it becomes a short access. If it
* exceeds the limit we return -EFBIG.
*/
-static int generic_access_check_limits(struct file *file, loff_t pos,
- loff_t *count)
+int generic_access_check_limits(struct file *file, loff_t pos, loff_t *count)
{
struct inode *inode = file->f_mapping->host;
loff_t max_size = inode->i_sb->s_maxbytes;
@@ -2844,8 +2843,7 @@ static int generic_access_check_limits(struct file *file, loff_t pos,
return 0;
}

-static int generic_write_check_limits(struct file *file, loff_t pos,
- loff_t *count)
+int generic_write_check_limits(struct file *file, loff_t pos, loff_t *count)
{
loff_t limit = rlimit(RLIMIT_FSIZE);

--
1.8.3.1


2018-11-30 20:03:57

by Olga Kornievskaia

Subject: [PATCH v2 02/10] NFS fallback to generic_copy_file_range

If NFS is unable to handle the copy, fall back to the generic VFS
copy_file_range() functionality.

Also remove the offset check, since the check is now done in the VFS.
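
After this patch, the early exits in nfs4_copy_file_range() reduce to a small decision. The model below is a hypothetical illustration: the enum and the boolean inputs are invented, and the real function additionally handles the inter-server COPY_NOTIFY path and retries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the NFS copy_file_range entry point. */
enum nfs_copy_path {
	PATH_GENERIC,   /* fall back to generic_copy_file_range() */
	PATH_EINVAL,    /* source and destination are the same inode */
	PATH_OFFLOAD    /* send an NFSv4.2 COPY to the server */
};

/* Decide how a copy would be handled once this patch is applied. */
enum nfs_copy_path nfs_copy_choose(bool src_is_nfs4, bool same_inode,
				   bool server_has_copy)
{
	assert(!(same_inode && !src_is_nfs4) || true); /* inputs are independent */
	if (!src_is_nfs4)
		return PATH_GENERIC;   /* previously returned -EXDEV */
	if (same_inode)
		return PATH_EINVAL;
	if (!server_has_copy)
		return PATH_GENERIC;   /* previously returned -EOPNOTSUPP */
	return PATH_OFFLOAD;
}
```

Because the VFS now rejects a source offset at or beyond EOF before calling into NFS, the model needs no offset check of its own.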

Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfs/nfs4file.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 4fe9fc1..78e163a 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -139,17 +139,16 @@ static ssize_t nfs4_copy_file_range(struct file *file_in, loff_t pos_in,
nfs4_stateid *cnrs = NULL;
ssize_t ret;

- if (pos_in >= i_size_read(file_inode(file_in)))
- return -EINVAL;
-
if (file_in->f_op != &nfs4_file_operations)
- return -EXDEV;
+ return generic_copy_file_range(file_in, pos_in, file_out,
+ pos_out, count, flags);

if (file_inode(file_in) == file_inode(file_out))
return -EINVAL;

if (!nfs_server_capable(file_inode(file_out), NFS_CAP_COPY))
- return -EOPNOTSUPP;
+ return generic_copy_file_range(file_in, pos_in, file_out,
+ pos_out, count, flags);
retry:
if (!nfs42_files_from_same_server(file_in, file_out)) {
cn_resp = kzalloc(sizeof(struct nfs42_copy_notify_res),
--
1.8.3.1


2018-11-30 20:03:59

by Olga Kornievskaia

Subject: [PATCH v2 03/10] NFSD fill-in netloc4 structure

nfs4.h defines the nfs42_netaddr structure, which represents netloc4.

Populate the needed fields from the sockaddr structure.

This will be used by flexfiles and NFSv4.2 inter copy.
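
nfsd4_set_netaddr() produces a universal address: the host address followed by the port split into high and low bytes, so port 2049 becomes `.8.1`. The same format can be reproduced in user space with inet_ntop(3). This is a hedged sketch. `format_uaddr()` is a hypothetical helper that returns the netid and uaddr as C strings instead of filling struct nfs42_netaddr, and it ignores details such as IPv6 scope IDs.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill 'netid' ("tcp" or "tcp6") and 'uaddr' (address plus ".hi.lo" port
 * suffix) from a socket address. Returns 0, or -1 for an unsupported
 * family or a truncated result.
 */
int format_uaddr(const struct sockaddr *sa, char *netid, size_t netid_sz,
		 char *uaddr, size_t uaddr_sz)
{
	const void *src;
	unsigned int port;
	size_t n;
	int m;

	assert(sa && netid && uaddr);
	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
		src = &sin->sin_addr;
		port = ntohs(sin->sin_port);
		snprintf(netid, netid_sz, "tcp");
		break;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
		src = &sin6->sin6_addr;
		port = ntohs(sin6->sin6_port);
		snprintf(netid, netid_sz, "tcp6");
		break;
	}
	default:
		return -1;
	}
	if (!inet_ntop(sa->sa_family, src, uaddr, (socklen_t)uaddr_sz))
		return -1;
	n = strlen(uaddr);
	/* Append the port as two decimal octets, high byte first. */
	m = snprintf(uaddr + n, uaddr_sz - n, ".%u.%u", port >> 8, port & 0xff);
	return (m < 0 || (size_t)m >= uaddr_sz - n) ? -1 : 0;
}
```

For example, 192.168.1.10 on port 2049 yields netid `tcp` and uaddr `192.168.1.10.8.1`.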

Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfsd/nfsd.h | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 0668999..a8fec63 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -18,6 +18,7 @@
#include <linux/nfs4.h>
#include <linux/sunrpc/svc.h>
#include <linux/sunrpc/msg_prot.h>
+#include <linux/sunrpc/addr.h>

#include <uapi/linux/nfsd/debug.h>

@@ -366,6 +367,37 @@ static inline bool nfsd4_spo_must_allow(struct svc_rqst *rqstp)

extern const u32 nfsd_suppattrs[3][3];

+static inline u32 nfsd4_set_netaddr(struct sockaddr *addr,
+ struct nfs42_netaddr *netaddr)
+{
+ struct sockaddr_in *sin = (struct sockaddr_in *)addr;
+ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)addr;
+ unsigned int port;
+ size_t ret_addr, ret_port;
+
+ switch (addr->sa_family) {
+ case AF_INET:
+ port = ntohs(sin->sin_port);
+ sprintf(netaddr->netid, "tcp");
+ netaddr->netid_len = 3;
+ break;
+ case AF_INET6:
+ port = ntohs(sin6->sin6_port);
+ sprintf(netaddr->netid, "tcp6");
+ netaddr->netid_len = 4;
+ break;
+ default:
+ return nfserr_inval;
+ }
+ ret_addr = rpc_ntop(addr, netaddr->addr, sizeof(netaddr->addr));
+ ret_port = snprintf(netaddr->addr + ret_addr,
+ RPCBIND_MAXUADDRLEN + 1 - ret_addr,
+ ".%u.%u", port >> 8, port & 0xff);
+ WARN_ON(ret_port >= RPCBIND_MAXUADDRLEN + 1 - ret_addr);
+ netaddr->addr_len = ret_addr + ret_port;
+ return 0;
+}
+
static inline bool bmval_is_subset(const u32 *bm1, const u32 *bm2)
{
return !((bm1[0] & ~bm2[0]) ||
--
1.8.3.1


2018-11-30 20:04:01

by Olga Kornievskaia

Subject: [PATCH v2 04/10] NFSD add ca_source_server<> to COPY

Decode the ca_source_server list that is sent, but use only the
first entry. The presence of a non-empty list indicates an "inter" copy.
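
The decode strategy can be sketched with a tiny big-endian XDR reader. It reads the entry count, keeps the first nl4_server, and consumes the rest into a throwaway. This is an illustration under assumptions, not nfsd's DECODE_HEAD/READ_BUF macros. It covers only the string arms (NL4_NAME and NL4_URL). The `xdr_buf_model` cursor and all helper names are hypothetical, and `MODEL_OPAQUE_LIMIT` stands in for NFS4_OPAQUE_LIMIT. The NL4_* values follow the netloc_type4 enum in RFC 7862.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NL4_NAME = 1, NL4_URL = 2, NL4_NETADDR = 3 };
#define MODEL_OPAQUE_LIMIT 1024   /* assumed stand-in for NFS4_OPAQUE_LIMIT */

struct xdr_buf_model { const uint8_t *p, *end; };   /* read cursor */

struct nl4_name_model {
	uint32_t type;
	uint32_t len;
	char str[MODEL_OPAQUE_LIMIT + 1];
};

/* Read one big-endian 32-bit word; returns -1 on a short buffer. */
static int get_u32(struct xdr_buf_model *x, uint32_t *v)
{
	if (x->end - x->p < 4)
		return -1;
	*v = (uint32_t)x->p[0] << 24 | (uint32_t)x->p[1] << 16 |
	     (uint32_t)x->p[2] << 8 | x->p[3];
	x->p += 4;
	return 0;
}

/* Decode one nl4_server whose arm is an XDR string (NAME or URL). */
static int decode_nl4_string(struct xdr_buf_model *x, struct nl4_name_model *ns)
{
	size_t padded;

	if (get_u32(x, &ns->type) || get_u32(x, &ns->len))
		return -1;
	if ((ns->type != NL4_NAME && ns->type != NL4_URL) ||
	    ns->len > MODEL_OPAQUE_LIMIT)
		return -1;
	padded = ((size_t)ns->len + 3) & ~(size_t)3;  /* XDR pads to 4 bytes */
	if ((size_t)(x->end - x->p) < padded)
		return -1;
	memcpy(ns->str, x->p, ns->len);
	ns->str[ns->len] = '\0';
	x->p += padded;
	return 0;
}

/*
 * Decode the source-server list: count == 0 means an intra-server copy;
 * otherwise keep the first entry and decode the others into a dummy.
 */
int decode_src_list(struct xdr_buf_model *x, struct nl4_name_model *first,
		    int *intra)
{
	struct nl4_name_model dummy;
	uint32_t count;

	assert(x && first && intra);
	if (get_u32(x, &count))
		return -1;
	*intra = (count == 0);
	if (count == 0)
		return 0;
	if (decode_nl4_string(x, first))
		return -1;
	for (uint32_t i = 1; i < count; i++)
		if (decode_nl4_string(x, &dummy))
			return -1;
	return 0;
}
```

Decoding the extra entries rather than skipping them blindly keeps the cursor aligned for whatever follows COPY in the compound.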

Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfsd/nfs4xdr.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/nfsd/xdr4.h | 12 ++++++----
2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 3de42a7..879ddc6 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -40,6 +40,7 @@
#include <linux/utsname.h>
#include <linux/pagemap.h>
#include <linux/sunrpc/svcauth_gss.h>
+#include <linux/sunrpc/addr.h>

#include "idmap.h"
#include "acl.h"
@@ -1743,11 +1744,58 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
DECODE_TAIL;
}

+static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
+ struct nl4_server *ns)
+{
+ DECODE_HEAD;
+ struct nfs42_netaddr *naddr;
+
+ READ_BUF(4);
+ ns->nl4_type = be32_to_cpup(p++);
+
+ /* currently support for 1 inter-server source server */
+ switch (ns->nl4_type) {
+ case NL4_NAME:
+ case NL4_URL:
+ READ_BUF(4);
+ ns->u.nl4_str_sz = be32_to_cpup(p++);
+ if (ns->u.nl4_str_sz > NFS4_OPAQUE_LIMIT)
+ goto xdr_error;
+
+ READ_BUF(ns->u.nl4_str_sz);
+ COPYMEM(ns->u.nl4_str,
+ ns->u.nl4_str_sz);
+ break;
+ case NL4_NETADDR:
+ naddr = &ns->u.nl4_addr;
+
+ READ_BUF(4);
+ naddr->netid_len = be32_to_cpup(p++);
+ if (naddr->netid_len > RPCBIND_MAXNETIDLEN)
+ goto xdr_error;
+
+ READ_BUF(naddr->netid_len + 4); /* 4 for uaddr len */
+ COPYMEM(naddr->netid, naddr->netid_len);
+
+ naddr->addr_len = be32_to_cpup(p++);
+ if (naddr->addr_len > RPCBIND_MAXUADDRLEN)
+ goto xdr_error;
+
+ READ_BUF(naddr->addr_len);
+ COPYMEM(naddr->addr, naddr->addr_len);
+ break;
+ default:
+ goto xdr_error;
+ }
+ DECODE_TAIL;
+}
+
static __be32
nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
{
DECODE_HEAD;
- unsigned int tmp;
+ struct nl4_server ns_dummy;
+ int i, count;

status = nfsd4_decode_stateid(argp, &copy->cp_src_stateid);
if (status)
@@ -1762,8 +1810,25 @@ static __be32 nfsd4_decode_reclaim_complete(struct nfsd4_compoundargs *argp, str
p = xdr_decode_hyper(p, &copy->cp_count);
p++; /* ca_consecutive: we always do consecutive copies */
copy->cp_synchronous = be32_to_cpup(p++);
- tmp = be32_to_cpup(p); /* Source server list not supported */
+ count = be32_to_cpup(p++);

+ copy->cp_intra = false;
+ if (count == 0) { /* intra-server copy */
+ copy->cp_intra = true;
+ goto intra;
+ }
+
+ /* decode all the supplied server addresses but use first */
+ status = nfsd4_decode_nl4_server(argp, &copy->cp_src);
+ if (status)
+ return status;
+
+ for (i = 0; i < count - 1; i++) {
+ status = nfsd4_decode_nl4_server(argp, &ns_dummy);
+ if (status)
+ return status;
+ }
+intra:
DECODE_TAIL;
}

diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index feeb6d4..513c9ff 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -516,11 +516,13 @@ struct nfsd42_write_res {

struct nfsd4_copy {
/* request */
- stateid_t cp_src_stateid;
- stateid_t cp_dst_stateid;
- u64 cp_src_pos;
- u64 cp_dst_pos;
- u64 cp_count;
+ stateid_t cp_src_stateid;
+ stateid_t cp_dst_stateid;
+ u64 cp_src_pos;
+ u64 cp_dst_pos;
+ u64 cp_count;
+ struct nl4_server cp_src;
+ bool cp_intra;

/* both */
bool cp_synchronous;
--
1.8.3.1


2018-11-30 20:04:03

by Olga Kornievskaia

Subject: [PATCH v2 05/10] NFSD return nfs4_stid in nfs4_preprocess_stateid_op

This is needed for COPY to add an nfs4_cp_state to the nfs4_stid.

Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfsd/nfs4proc.c | 17 ++++++++++-------
fs/nfsd/nfs4state.c | 8 ++++++--
fs/nfsd/state.h | 3 ++-
3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index d505990..0152b34 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -781,7 +781,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
/* check stateid */
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
&read->rd_stateid, RD_STATE,
- &read->rd_filp, &read->rd_tmp_file);
+ &read->rd_filp, &read->rd_tmp_file,
+ NULL);
if (status) {
dprintk("NFSD: nfsd4_read: couldn't process stateid!\n");
goto out;
@@ -954,7 +955,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
status = nfs4_preprocess_stateid_op(rqstp, cstate,
&cstate->current_fh, &setattr->sa_stateid,
- WR_STATE, NULL, NULL);
+ WR_STATE, NULL, NULL, NULL);
if (status) {
dprintk("NFSD: nfsd4_setattr: couldn't process stateid!\n");
return status;
@@ -1005,7 +1006,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
trace_nfsd_write_start(rqstp, &cstate->current_fh,
write->wr_offset, cnt);
status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
- stateid, WR_STATE, &filp, NULL);
+ stateid, WR_STATE, &filp, NULL, NULL);
if (status) {
dprintk("NFSD: nfsd4_write: couldn't process stateid!\n");
return status;
@@ -1042,14 +1043,16 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
return nfserr_nofilehandle;

status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->save_fh,
- src_stateid, RD_STATE, src, NULL);
+ src_stateid, RD_STATE, src, NULL,
+ NULL);
if (status) {
dprintk("NFSD: %s: couldn't process src stateid!\n", __func__);
goto out;
}

status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
- dst_stateid, WR_STATE, dst, NULL);
+ dst_stateid, WR_STATE, dst, NULL,
+ NULL);
if (status) {
dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
goto out_put_src;
@@ -1353,7 +1356,7 @@ struct nfsd4_copy *

status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
&fallocate->falloc_stateid,
- WR_STATE, &file, NULL);
+ WR_STATE, &file, NULL, NULL);
if (status != nfs_ok) {
dprintk("NFSD: nfsd4_fallocate: couldn't process stateid!\n");
return status;
@@ -1412,7 +1415,7 @@ struct nfsd4_copy *

status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
&seek->seek_stateid,
- RD_STATE, &file, NULL);
+ RD_STATE, &file, NULL, NULL);
if (status) {
dprintk("NFSD: nfsd4_seek: couldn't process stateid!\n");
return status;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f093fbe..be3e967 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5158,7 +5158,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
__be32
nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
- stateid_t *stateid, int flags, struct file **filpp, bool *tmp_file)
+ stateid_t *stateid, int flags, struct file **filpp,
+ bool *tmp_file, struct nfs4_stid **cstid)
{
struct inode *ino = d_inode(fhp->fh_dentry);
struct net *net = SVC_NET(rqstp);
@@ -5209,8 +5210,11 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
if (!status && filpp)
status = nfs4_check_file(rqstp, fhp, s, filpp, tmp_file, flags);
out:
- if (s)
+ if (s) {
+ if (!status && cstid)
+ *cstid = s;
nfs4_put_stid(s);
+ }
return status;
}

diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 6aacb32..304de3b 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -606,7 +606,8 @@ struct nfsd4_blocked_lock {

extern __be32 nfs4_preprocess_stateid_op(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate, struct svc_fh *fhp,
- stateid_t *stateid, int flags, struct file **filp, bool *tmp_file);
+ stateid_t *stateid, int flags, struct file **filp,
+ bool *tmp_file, struct nfs4_stid **cstid);
__be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
stateid_t *stateid, unsigned char typemask,
struct nfs4_stid **s, struct nfsd_net *nn);
--
1.8.3.1


2018-11-30 20:04:05

by Olga Kornievskaia

Subject: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation

Introduce the COPY_NOTIFY operation.

Create a new unique stateid that will keep track of the copy
state and of the upcoming READs that will use that stateid. Keep
it in the list associated with the parent stateid.

Return a single netaddr, the address the client used to reach this
server, to advertise for the copy.
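
These lifetime rules can be modeled without the kernel's IDR or list_head: each COPY_NOTIFY gets a fresh ID, is linked to its parent stateid, and is torn down with it. This is a hedged sketch. The fixed-size `id_table` with a cyclic cursor imitates idr_alloc_cyclic(), the singly linked child list imitates sc_cp_list, and every name is hypothetical. The model has no locking, whereas the patch protects both structures with s2s_cp_lock.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_IDS 8

struct child_state;

/* Hypothetical ID table; slot i holds the object that owns ID i. */
struct id_table {
	struct child_state *slot[MODEL_MAX_IDS];
	int next;                       /* where the cyclic search resumes */
};

struct parent_stid {
	struct child_state *children;   /* cf. sc_cp_list */
};

struct child_state {
	int id;                         /* cf. cp_stateid.si_opaque.so_id */
	struct parent_stid *parent;     /* cf. cp_p_stid */
	struct child_state *next;
};

/* Allocate the next free ID at or after t->next, wrapping around. */
static int id_alloc_cyclic(struct id_table *t, struct child_state *c)
{
	for (int i = 0; i < MODEL_MAX_IDS; i++) {
		int id = (t->next + i) % MODEL_MAX_IDS;
		if (!t->slot[id]) {
			t->slot[id] = c;
			t->next = (id + 1) % MODEL_MAX_IDS;
			return id;
		}
	}
	return -1;                      /* table full */
}

/* Create a COPY_NOTIFY-style child state and link it to its parent. */
struct child_state *child_new(struct id_table *t, struct parent_stid *p)
{
	struct child_state *c = calloc(1, sizeof(*c));

	if (!c)
		return NULL;
	c->id = id_alloc_cyclic(t, c);
	if (c->id < 0) {
		free(c);
		return NULL;
	}
	c->parent = p;
	c->next = p->children;
	p->children = c;
	return c;
}

/* When the parent goes away, remove every child from the table and free it. */
void parent_free_children(struct id_table *t, struct parent_stid *p)
{
	assert(t && p);
	while (p->children) {
		struct child_state *c = p->children;
		p->children = c->next;
		t->slot[c->id] = NULL;
		free(c);
	}
}
```

The cyclic cursor means a freed ID is not immediately handed out again. This lowers the chance that a stale copy stateid from a client matches a newly issued one.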

Signed-off-by: Andy Adamson <[email protected]>
Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfsd/nfs4proc.c | 72 +++++++++++++++++++++++++++++++++++----
fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
fs/nfsd/nfs4xdr.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
fs/nfsd/state.h | 18 ++++++++--
fs/nfsd/xdr4.h | 13 +++++++
5 files changed, 248 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 0152b34..51fca9e 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -37,6 +37,7 @@
#include <linux/falloc.h>
#include <linux/slab.h>
#include <linux/kthread.h>
+#include <linux/sunrpc/addr.h>

#include "idmap.h"
#include "cache.h"
@@ -1035,7 +1036,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
static __be32
nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
stateid_t *src_stateid, struct file **src,
- stateid_t *dst_stateid, struct file **dst)
+ stateid_t *dst_stateid, struct file **dst,
+ struct nfs4_stid **stid)
{
__be32 status;

@@ -1052,7 +1054,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)

status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
dst_stateid, WR_STATE, dst, NULL,
- NULL);
+ stid);
if (status) {
dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
goto out_put_src;
@@ -1083,7 +1085,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
__be32 status;

status = nfsd4_verify_copy(rqstp, cstate, &clone->cl_src_stateid, &src,
- &clone->cl_dst_stateid, &dst);
+ &clone->cl_dst_stateid, &dst, NULL);
if (status)
goto out;

@@ -1230,7 +1232,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)

static void cleanup_async_copy(struct nfsd4_copy *copy)
{
- nfs4_free_cp_state(copy);
+ nfs4_free_copy_state(copy);
fput(copy->file_dst);
fput(copy->file_src);
spin_lock(&copy->cp_clp->async_lock);
@@ -1270,7 +1272,7 @@ static int nfsd4_do_async_copy(void *data)

status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
&copy->file_src, &copy->cp_dst_stateid,
- &copy->file_dst);
+ &copy->file_dst, NULL);
if (status)
goto out;

@@ -1284,7 +1286,7 @@ static int nfsd4_do_async_copy(void *data)
async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
if (!async_copy)
goto out;
- if (!nfs4_init_cp_state(nn, copy)) {
+ if (!nfs4_init_copy_state(nn, copy)) {
kfree(async_copy);
goto out;
}
@@ -1348,6 +1350,43 @@ struct nfsd4_copy *
}

static __be32
+nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ union nfsd4_op_u *u)
+{
+ struct nfsd4_copy_notify *cn = &u->copy_notify;
+ __be32 status;
+ struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+ struct nfs4_stid *stid;
+ struct nfs4_cpntf_state *cps;
+
+ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+ &cn->cpn_src_stateid, RD_STATE, NULL,
+ NULL, &stid);
+ if (status)
+ return status;
+
+ cn->cpn_sec = nn->nfsd4_lease;
+ cn->cpn_nsec = 0;
+
+ status = nfserrno(-ENOMEM);
+ cps = nfs4_alloc_init_cpntf_state(nn, stid);
+ if (!cps)
+ return status;
+ memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid, sizeof(stateid_t));
+
+ /**
+ * For now, only return one server address in cpn_src, the
+ * address used by the client to connect to this server.
+ */
+ cn->cpn_src.nl4_type = NL4_NETADDR;
+ status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr,
+ &cn->cpn_src.u.nl4_addr);
+ WARN_ON_ONCE(status);
+
+ return status;
+}
+
+static __be32
nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
struct nfsd4_fallocate *fallocate, int flags)
{
@@ -2299,6 +2338,21 @@ static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp,
1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32);
}

+static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp,
+ struct nfsd4_op *op)
+{
+ return (op_encode_hdr_size +
+ 3 /* cnr_lease_time */ +
+ 1 /* We support one cnr_source_server */ +
+ 1 /* cnr_stateid seq */ +
+ op_encode_stateid_maxsz /* cnr_stateid */ +
+ 1 /* num cnr_source_server*/ +
+ 1 /* nl4_type */ +
+ 1 /* nl4 size */ +
+ XDR_QUADLEN(NFS4_OPAQUE_LIMIT) /*nl4_loc + nl4_loc_sz */)
+ * sizeof(__be32);
+}
+
#ifdef CONFIG_NFSD_PNFS
static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
@@ -2723,6 +2777,12 @@ static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
.op_name = "OP_OFFLOAD_CANCEL",
.op_rsize_bop = nfsd4_only_status_rsize,
},
+ [OP_COPY_NOTIFY] = {
+ .op_func = nfsd4_copy_notify,
+ .op_flags = OP_MODIFIES_SOMETHING,
+ .op_name = "OP_COPY_NOTIFY",
+ .op_rsize_bop = nfsd4_copy_notify_rsize,
+ },
};

/**
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index be3e967..eaa136f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -697,6 +697,7 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
/* Will be incremented before return to client: */
refcount_set(&stid->sc_count, 1);
spin_lock_init(&stid->sc_lock);
+ INIT_LIST_HEAD(&stid->sc_cp_list);

/*
* It shouldn't be a problem to reuse an opaque stateid value.
@@ -716,24 +717,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
/*
* Create a unique stateid_t to represent each COPY.
*/
-int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
+static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
{
int new_id;

idr_preload(GFP_KERNEL);
spin_lock(&nn->s2s_cp_lock);
- new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
+ new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
spin_unlock(&nn->s2s_cp_lock);
idr_preload_end();
if (new_id < 0)
return 0;
- copy->cp_stateid.si_opaque.so_id = new_id;
- copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
- copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
+ stid->si_opaque.so_id = new_id;
+ stid->si_opaque.so_clid.cl_boot = nn->boot_time;
+ stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
return 1;
}

-void nfs4_free_cp_state(struct nfsd4_copy *copy)
+int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
+{
+ return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
+}
+
+struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
+ struct nfs4_stid *p_stid)
+{
+ struct nfs4_cpntf_state *cps;
+
+ cps = kzalloc(sizeof(struct nfs4_cpntf_state), GFP_KERNEL);
+ if (!cps)
+ return NULL;
+ if (!nfs4_init_cp_state(nn, cps, &cps->cp_stateid))
+ goto out_free;
+ cps->cp_p_stid = p_stid;
+ cps->cp_active = false;
+ cps->cp_timeout = jiffies + (nn->nfsd4_lease * HZ);
+ INIT_LIST_HEAD(&cps->cp_list);
+ spin_lock(&nn->s2s_cp_lock);
+ list_add(&cps->cp_list, &p_stid->sc_cp_list);
+ spin_unlock(&nn->s2s_cp_lock);
+
+ return cps;
+out_free:
+ kfree(cps);
+ return NULL;
+}
+
+void nfs4_free_copy_state(struct nfsd4_copy *copy)
{
struct nfsd_net *nn;

@@ -743,6 +773,27 @@ void nfs4_free_cp_state(struct nfsd4_copy *copy)
spin_unlock(&nn->s2s_cp_lock);
}

+static void nfs4_free_cpntf_statelist(struct net *net, struct nfs4_stid *stid)
+{
+ struct nfs4_cpntf_state *cps;
+ struct nfsd_net *nn;
+
+ nn = net_generic(net, nfsd_net_id);
+
+ might_sleep();
+
+ spin_lock(&nn->s2s_cp_lock);
+ while (!list_empty(&stid->sc_cp_list)) {
+ cps = list_first_entry(&stid->sc_cp_list,
+ struct nfs4_cpntf_state, cp_list);
+ list_del(&cps->cp_list);
+ idr_remove(&nn->s2s_cp_stateids,
+ cps->cp_stateid.si_opaque.so_id);
+ kfree(cps);
+ }
+ spin_unlock(&nn->s2s_cp_lock);
+}
+
static struct nfs4_ol_stateid * nfs4_alloc_open_stateid(struct nfs4_client *clp)
{
struct nfs4_stid *stid;
@@ -891,6 +942,7 @@ static void block_delegations(struct knfsd_fh *fh)
}
idr_remove(&clp->cl_stateids, s->sc_stateid.si_opaque.so_id);
spin_unlock(&clp->cl_lock);
+ nfs4_free_cpntf_statelist(clp->net, s);
s->sc_free(s);
if (fp)
put_nfs4_file(fp);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 879ddc6..c9fb625 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1840,6 +1840,22 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
}

static __be32
+nfsd4_decode_copy_notify(struct nfsd4_compoundargs *argp,
+ struct nfsd4_copy_notify *cn)
+{
+ int status;
+
+ status = nfsd4_decode_stateid(argp, &cn->cpn_src_stateid);
+ if (status)
+ return status;
+ status = nfsd4_decode_nl4_server(argp, &cn->cpn_dst);
+ if (status)
+ return status;
+
+ return status;
+}
+
+static __be32
nfsd4_decode_seek(struct nfsd4_compoundargs *argp, struct nfsd4_seek *seek)
{
DECODE_HEAD;
@@ -1940,7 +1956,7 @@ static __be32 nfsd4_decode_nl4_server(struct nfsd4_compoundargs *argp,
/* new operations for NFSv4.2 */
[OP_ALLOCATE] = (nfsd4_dec)nfsd4_decode_fallocate,
[OP_COPY] = (nfsd4_dec)nfsd4_decode_copy,
- [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_COPY_NOTIFY] = (nfsd4_dec)nfsd4_decode_copy_notify,
[OP_DEALLOCATE] = (nfsd4_dec)nfsd4_decode_fallocate,
[OP_IO_ADVISE] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_LAYOUTERROR] = (nfsd4_dec)nfsd4_decode_notsupp,
@@ -4325,6 +4341,45 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
}

static __be32
+nfsd42_encode_nl4_server(struct nfsd4_compoundres *resp, struct nl4_server *ns)
+{
+ struct xdr_stream *xdr = &resp->xdr;
+ struct nfs42_netaddr *addr;
+ __be32 *p;
+
+ p = xdr_reserve_space(xdr, 4);
+ *p++ = cpu_to_be32(ns->nl4_type);
+
+ switch (ns->nl4_type) {
+ case NL4_NETADDR:
+ addr = &ns->u.nl4_addr;
+
+ /** netid_len, netid, uaddr_len, uaddr (port included
+ * in RPCBIND_MAXUADDRLEN)
+ */
+ p = xdr_reserve_space(xdr,
+ 4 /* netid len */ +
+ (XDR_QUADLEN(addr->netid_len) * 4) +
+ 4 /* uaddr len */ +
+ (XDR_QUADLEN(addr->addr_len) * 4));
+ if (!p)
+ return nfserr_resource;
+
+ *p++ = cpu_to_be32(addr->netid_len);
+ p = xdr_encode_opaque_fixed(p, addr->netid,
+ addr->netid_len);
+ *p++ = cpu_to_be32(addr->addr_len);
+ p = xdr_encode_opaque_fixed(p, addr->addr,
+ addr->addr_len);
+ break;
+ default:
+ WARN_ON(ns->nl4_type != NL4_NETADDR);
+ }
+
+ return 0;
+}
+
+static __be32
nfsd4_encode_copy(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_copy *copy)
{
@@ -4358,6 +4413,44 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
}

static __be32
+nfsd4_encode_copy_notify(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_copy_notify *cn)
+{
+ struct xdr_stream *xdr = &resp->xdr;
+ __be32 *p;
+
+ if (nfserr)
+ return nfserr;
+
+ /* 8 sec, 4 nsec */
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
+ return nfserr_resource;
+
+ /* cnr_lease_time */
+ p = xdr_encode_hyper(p, cn->cpn_sec);
+ *p++ = cpu_to_be32(cn->cpn_nsec);
+
+ /* cnr_stateid */
+ nfserr = nfsd4_encode_stateid(xdr, &cn->cpn_cnr_stateid);
+ if (nfserr)
+ return nfserr;
+
+ /* cnr_src.nl_nsvr */
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
+
+ *p++ = cpu_to_be32(1);
+
+ nfserr = nfsd42_encode_nl4_server(resp, &cn->cpn_src);
+ if (nfserr)
+ return nfserr;
+
+ return nfserr;
+}
+
+static __be32
nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_seek *seek)
{
@@ -4454,7 +4547,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
/* NFSv4.2 operations */
[OP_ALLOCATE] = (nfsd4_enc)nfsd4_encode_noop,
[OP_COPY] = (nfsd4_enc)nfsd4_encode_copy,
- [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_COPY_NOTIFY] = (nfsd4_enc)nfsd4_encode_copy_notify,
[OP_DEALLOCATE] = (nfsd4_enc)nfsd4_encode_noop,
[OP_IO_ADVISE] = (nfsd4_enc)nfsd4_encode_noop,
[OP_LAYOUTERROR] = (nfsd4_enc)nfsd4_encode_noop,
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 304de3b..31b12b1 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -94,6 +94,7 @@ struct nfs4_stid {
#define NFS4_REVOKED_DELEG_STID 16
#define NFS4_CLOSED_DELEG_STID 32
#define NFS4_LAYOUT_STID 64
+ struct list_head sc_cp_list;
unsigned char sc_type;
stateid_t sc_stateid;
spinlock_t sc_lock;
@@ -102,6 +103,17 @@ struct nfs4_stid {
void (*sc_free)(struct nfs4_stid *);
};

+/* Keep a list of stateids issued by COPY_NOTIFY, each associated with
+ * its parent OPEN/LOCK/DELEG stateid.
+ */
+struct nfs4_cpntf_state {
+ stateid_t cp_stateid;
+ struct list_head cp_list; /* per parent nfs4_stid */
+ struct nfs4_stid *cp_p_stid; /* pointer to parent */
+ bool cp_active; /* has the copy started */
+ unsigned long cp_timeout; /* copy timeout */
+};
+
/*
* Represents a delegation stateid. The nfs4_client holds references to these
* and they are put when it is being destroyed or when the delegation is
@@ -613,8 +625,10 @@ __be32 nfsd4_lookup_stateid(struct nfsd4_compound_state *cstate,
struct nfs4_stid **s, struct nfsd_net *nn);
struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *slab,
void (*sc_free)(struct nfs4_stid *));
-int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy);
-void nfs4_free_cp_state(struct nfsd4_copy *copy);
+int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy);
+void nfs4_free_copy_state(struct nfsd4_copy *copy);
+struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
+ struct nfs4_stid *p_stid);
void nfs4_unhash_stid(struct nfs4_stid *s);
void nfs4_put_stid(struct nfs4_stid *s);
void nfs4_inc_and_copy_stateid(stateid_t *dst, struct nfs4_stid *stid);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 513c9ff..bade8e5 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -568,6 +568,18 @@ struct nfsd4_offload_status {
u32 status;
};

+struct nfsd4_copy_notify {
+ /* request */
+ stateid_t cpn_src_stateid;
+ struct nl4_server cpn_dst;
+
+ /* response */
+ stateid_t cpn_cnr_stateid;
+ u64 cpn_sec;
+ u32 cpn_nsec;
+ struct nl4_server cpn_src;
+};
+
struct nfsd4_op {
int opnum;
const struct nfsd4_operation * opdesc;
@@ -627,6 +639,7 @@ struct nfsd4_op {
struct nfsd4_clone clone;
struct nfsd4_copy copy;
struct nfsd4_offload_status offload_status;
+ struct nfsd4_copy_notify copy_notify;
struct nfsd4_seek seek;
} u;
struct nfs4_replay * replay;
--
1.8.3.1
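The nfsd42_encode_nl4_server() encoder in the patch above writes the location type followed by two XDR strings (netid and universal address), each a 4-byte length word plus the bytes, zero-padded to a 4-byte boundary. The following userspace sketch reproduces that wire layout so the reservation arithmetic (4 + XDR_QUADLEN(netid_len) * 4 + 4 + XDR_QUADLEN(addr_len) * 4) can be checked outside the kernel. It is an illustration under stated assumptions, not nfsd code: encode_nl4_netaddr(), put_be32() and put_xdr_string() are hypothetical helpers, and the value NL4_NETADDR = 3 is taken from the netloc_type4 enum in RFC 7862. This sketch checks the buffer size before writing anything, including the type word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round a byte count up to whole 4-byte XDR units (same arithmetic as
 * the kernel's XDR_QUADLEN()). */
#define XDR_QUADLEN(l) (((l) + 3) >> 2)

/* netloc_type4 value for a netaddr4 location (RFC 7862). */
enum { NL4_NETADDR = 3 };

/* Store a 32-bit value in network (big-endian) byte order. */
static uint8_t *put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* Encode an XDR string: length word, bytes, zero padding to 4 bytes. */
static uint8_t *put_xdr_string(uint8_t *p, const char *s, uint32_t len)
{
	size_t padded = (size_t)XDR_QUADLEN(len) * 4;

	p = put_be32(p, len);
	memcpy(p, s, len);
	memset(p + len, 0, padded - len);
	return p + padded;
}

/* Encode a netloc4 of type NL4_NETADDR: type, na_r_netid, na_r_addr.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t encode_nl4_netaddr(uint8_t *buf, size_t buflen,
			  const char *netid, const char *uaddr)
{
	uint32_t netid_len = (uint32_t)strlen(netid);
	uint32_t addr_len = (uint32_t)strlen(uaddr);
	size_t need = 4 + 4 + (size_t)XDR_QUADLEN(netid_len) * 4 +
		      4 + (size_t)XDR_QUADLEN(addr_len) * 4;
	uint8_t *p;

	if (need > buflen)
		return 0;	/* plays the role of nfserr_resource */
	p = put_be32(buf, NL4_NETADDR);
	p = put_xdr_string(p, netid, netid_len);
	p = put_xdr_string(p, uaddr, addr_len);
	return (size_t)(p - buf);
}
```

For example, netid "tcp" with uaddr "192.0.2.10.8.1" (the last two dotted fields encode port 8 * 256 + 1 = 2049) takes 4 + 8 + 20 = 32 bytes: the 3-byte netid is padded to 4 and the 14-byte address to 16.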


2018-11-30 20:04:06

by Olga Kornievskaia

Subject: [PATCH v2 07/10] NFSD check stateids against copy stateids

An incoming stateid (used by a READ) could be a saved copy stateid.
On first use, mark it active and check that the copy has started
within the allowed lease time.

Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfsd/nfs4state.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index eaa136f..7b3586ab 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5203,6 +5203,49 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)

return 0;
}
+/*
+ * A READ from an inter server to server COPY will have a
+ * copy stateid. Return the parent nfs4_stid.
+ */
+static __be32 _find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+ struct nfs4_cpntf_state **cps)
+{
+ struct nfs4_cpntf_state *state = NULL;
+
+ if (st->si_opaque.so_clid.cl_id != nn->s2s_cp_cl_id)
+ return nfserr_bad_stateid;
+ spin_lock(&nn->s2s_cp_lock);
+ state = idr_find(&nn->s2s_cp_stateids, st->si_opaque.so_id);
+ if (state)
+ refcount_inc(&state->cp_p_stid->sc_count);
+ spin_unlock(&nn->s2s_cp_lock);
+ if (!state)
+ return nfserr_bad_stateid;
+ *cps = state;
+ return 0;
+}
+
+static __be32 find_cpntf_state(struct nfsd_net *nn, stateid_t *st,
+ struct nfs4_stid **stid)
+{
+ __be32 status;
+ struct nfs4_cpntf_state *cps = NULL;
+
+ status = _find_cpntf_state(nn, st, &cps);
+ if (status)
+ return status;
+
+ /* Did the inter server to server copy start in time? */
+ if (cps->cp_active == false && !time_after(cps->cp_timeout, jiffies)) {
+ nfs4_put_stid(cps->cp_p_stid);
+ return nfserr_partner_no_auth;
+ } else
+ cps->cp_active = true;
+
+ *stid = cps->cp_p_stid;
+
+ return nfs_ok;
+}

/*
* Checks for stateid operations
@@ -5235,6 +5278,8 @@ static __be32 nfsd4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
status = nfsd4_lookup_stateid(cstate, stateid,
NFS4_DELEG_STID|NFS4_OPEN_STID|NFS4_LOCK_STID,
&s, nn);
+ if (status == nfserr_bad_stateid)
+ status = find_cpntf_state(nn, stateid, &s);
if (status)
return status;
status = nfsd4_stid_check_stateid_generation(stateid, s,
--
1.8.3.1
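The rule in find_cpntf_state() is: a copy stateid that has never been used must be presented before its deadline; once a READ has used it, it stays valid. A userspace model of just that decision, with a wrap-safe comparison written the same way as the kernel's time_after(). The names (struct cpntf_model, check_cpntf_use(), CPNTF_PARTNER_NO_AUTH) are hypothetical stand-ins for struct nfs4_cpntf_state, the in-kernel check and nfserr_partner_no_auth. The model leaves out the idr lookup, the s2s_cp_lock spinlock and the reference taken on the parent stateid; note that in the patch cp_active is updated after that spinlock has been dropped.

```c
#include <stdbool.h>

/* Wrap-safe "a is later than b" for a free-running tick counter, using
 * the same signed-difference trick as the kernel's time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Simplified stand-in for struct nfs4_cpntf_state. */
struct cpntf_model {
	bool active;		/* has a READ already used this stateid? */
	unsigned long timeout;	/* deadline for the first use, in ticks */
};

enum cpntf_check { CPNTF_OK, CPNTF_PARTNER_NO_AUTH };

/* The first use must happen before the deadline; later uses always pass. */
enum cpntf_check check_cpntf_use(struct cpntf_model *cps, unsigned long now)
{
	if (!cps->active && !ticks_after(cps->timeout, now))
		return CPNTF_PARTNER_NO_AUTH;
	cps->active = true;
	return CPNTF_OK;
}
```

Because the comparison is done on the signed difference, a deadline that lies just past a counter wraparound is still treated as being in the future.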


2018-11-30 20:04:08

by Olga Kornievskaia

Subject: [PATCH v2 08/10] NFSD generalize nfsd4_compound_state flag names

From: Olga Kornievskaia <[email protected]>

Allow the sid_flags field to also carry flags that are not about stateids.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfsd/nfs4proc.c | 8 ++++----
fs/nfsd/nfs4state.c | 7 ++++---
fs/nfsd/xdr4.h | 6 +++---
3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 51fca9e..70d03e9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -530,9 +530,9 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
return nfserr_restorefh;

fh_dup2(&cstate->current_fh, &cstate->save_fh);
- if (HAS_STATE_ID(cstate, SAVED_STATE_ID_FLAG)) {
+ if (HAS_CSTATE_FLAG(cstate, SAVED_STATE_ID_FLAG)) {
memcpy(&cstate->current_stateid, &cstate->save_stateid, sizeof(stateid_t));
- SET_STATE_ID(cstate, CURRENT_STATE_ID_FLAG);
+ SET_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG);
}
return nfs_ok;
}
@@ -542,9 +542,9 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
union nfsd4_op_u *u)
{
fh_dup2(&cstate->save_fh, &cstate->current_fh);
- if (HAS_STATE_ID(cstate, CURRENT_STATE_ID_FLAG)) {
+ if (HAS_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG)) {
memcpy(&cstate->save_stateid, &cstate->current_stateid, sizeof(stateid_t));
- SET_STATE_ID(cstate, SAVED_STATE_ID_FLAG);
+ SET_CSTATE_FLAG(cstate, SAVED_STATE_ID_FLAG);
}
return nfs_ok;
}
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 7b3586ab..3f5fb0b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7423,7 +7423,8 @@ static int nfs4_state_create_net(struct net *net)
static void
get_stateid(struct nfsd4_compound_state *cstate, stateid_t *stateid)
{
- if (HAS_STATE_ID(cstate, CURRENT_STATE_ID_FLAG) && CURRENT_STATEID(stateid))
+ if (HAS_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG) &&
+ CURRENT_STATEID(stateid))
memcpy(stateid, &cstate->current_stateid, sizeof(stateid_t));
}

@@ -7432,14 +7433,14 @@ static int nfs4_state_create_net(struct net *net)
{
if (cstate->minorversion) {
memcpy(&cstate->current_stateid, stateid, sizeof(stateid_t));
- SET_STATE_ID(cstate, CURRENT_STATE_ID_FLAG);
+ SET_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG);
}
}

void
clear_current_stateid(struct nfsd4_compound_state *cstate)
{
- CLEAR_STATE_ID(cstate, CURRENT_STATE_ID_FLAG);
+ CLEAR_CSTATE_FLAG(cstate, CURRENT_STATE_ID_FLAG);
}

/*
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index bade8e5..9d7318c 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -46,9 +46,9 @@
#define CURRENT_STATE_ID_FLAG (1<<0)
#define SAVED_STATE_ID_FLAG (1<<1)

-#define SET_STATE_ID(c, f) ((c)->sid_flags |= (f))
-#define HAS_STATE_ID(c, f) ((c)->sid_flags & (f))
-#define CLEAR_STATE_ID(c, f) ((c)->sid_flags &= ~(f))
+#define SET_CSTATE_FLAG(c, f) ((c)->sid_flags |= (f))
+#define HAS_CSTATE_FLAG(c, f) ((c)->sid_flags & (f))
+#define CLEAR_CSTATE_FLAG(c, f) ((c)->sid_flags &= ~(f))

struct nfsd4_compound_state {
struct svc_fh current_fh;
--
1.8.3.1
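Patch 08 only renames the accessors so that sid_flags can hold bits unrelated to stateids. A small userspace model of that idea: EXAMPLE_NO_VERIFY_FLAG is a hypothetical non-stateid bit, and struct cstate_model and model_savefh() are illustrative stand-ins for struct nfsd4_compound_state and the SAVEFH handler, not nfsd code. The two stateid flags and the three macros are copied from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Flag bits kept in sid_flags; the first two come from xdr4.h. */
#define CURRENT_STATE_ID_FLAG	(1<<0)
#define SAVED_STATE_ID_FLAG	(1<<1)
/* Hypothetical non-stateid flag, the kind of use the rename allows. */
#define EXAMPLE_NO_VERIFY_FLAG	(1<<2)

#define SET_CSTATE_FLAG(c, f)	((c)->sid_flags |= (f))
#define HAS_CSTATE_FLAG(c, f)	((c)->sid_flags & (f))
#define CLEAR_CSTATE_FLAG(c, f)	((c)->sid_flags &= ~(f))

/* Cut-down stand-in for struct nfsd4_compound_state (a stateid is 16 bytes). */
struct cstate_model {
	uint32_t sid_flags;
	uint8_t current_stateid[16];
	uint8_t save_stateid[16];
};

/* Mirrors SAVEFH: copy the current stateid only if one has been set. */
void model_savefh(struct cstate_model *c)
{
	if (HAS_CSTATE_FLAG(c, CURRENT_STATE_ID_FLAG)) {
		memcpy(c->save_stateid, c->current_stateid,
		       sizeof(c->save_stateid));
		SET_CSTATE_FLAG(c, SAVED_STATE_ID_FLAG);
	}
}
```

Setting or clearing the extra bit leaves the stateid bits untouched, which is what makes sharing the field safe.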


2018-11-30 20:04:09

by Olga Kornievskaia

Subject: [PATCH v2 09/10] NFSD: allow inter server COPY to have a STALE source server fh

In an inter server-to-server COPY, the source server's filehandle
is a foreign filehandle, because the COPY is sent to the destination
server. Instead of failing PUTFH with NFS4ERR_STALE, mark such a
filehandle as foreign.

Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfsd/Kconfig | 10 ++++++++++
fs/nfsd/nfs4proc.c | 41 ++++++++++++++++++++++++++++++++++++-----
fs/nfsd/nfsfh.h | 5 ++++-
fs/nfsd/xdr4.h | 1 +
4 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index 20b1c17..37ff3d5 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -131,6 +131,16 @@ config NFSD_FLEXFILELAYOUT

If unsure, say N.

+config NFSD_V4_2_INTER_SSC
+ bool "NFSv4.2 inter server to server COPY"
+ depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2
+ help
+ This option enables support for NFSv4.2 inter server to
+ server copy where the destination server calls the NFSv4.2
+ client to read the data to copy from the source server.
+
+ If unsure, say N.
+
config NFSD_V4_SECURITY_LABEL
bool "Provide Security Label support for NFSv4 server"
depends on NFSD_V4 && SECURITY
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 70d03e9..2e28254 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -503,12 +503,20 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
union nfsd4_op_u *u)
{
struct nfsd4_putfh *putfh = &u->putfh;
+ __be32 ret;

fh_put(&cstate->current_fh);
cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen;
memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval,
putfh->pf_fhlen);
- return fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
+ ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC
+ if (ret == nfserr_stale && putfh->no_verify) {
+ SET_FH_FLAG(&cstate->current_fh, NFSD4_FH_FOREIGN);
+ ret = 0;
+ }
+#endif
+ return ret;
}

static __be32
@@ -1967,11 +1975,12 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
{
struct nfsd4_compoundargs *args = rqstp->rq_argp;
struct nfsd4_compoundres *resp = rqstp->rq_resp;
- struct nfsd4_op *op;
+ struct nfsd4_op *op, *current_op = NULL, *saved_op = NULL;
struct nfsd4_compound_state *cstate = &resp->cstate;
struct svc_fh *current_fh = &cstate->current_fh;
struct svc_fh *save_fh = &cstate->save_fh;
__be32 status;
+ int i;

svcxdr_init_encode(rqstp, resp);
resp->tagp = resp->xdr.p;
@@ -2006,6 +2015,27 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
resp->opcnt = 1;
goto encode_op;
}
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC
+ /* traverse all operations and, if this is a COPY compound, mark the
+ * source filehandle to skip verification
+ */
+ for (i = 0; i < args->opcnt; i++) {
+ op = &args->ops[i];
+ if (op->opnum == OP_PUTFH)
+ current_op = op;
+ else if (op->opnum == OP_SAVEFH)
+ saved_op = current_op;
+ else if (op->opnum == OP_RESTOREFH)
+ current_op = saved_op;
+ else if (op->opnum == OP_COPY) {
+ struct nfsd4_copy *copy = (struct nfsd4_copy *)&op->u;
+ struct nfsd4_putfh *putfh = saved_op ?
+ (struct nfsd4_putfh *)&saved_op->u : NULL;
+ if (!copy->cp_intra && putfh)
+ putfh->no_verify = true;
+ }
+ }
+#endif

trace_nfsd_compound(rqstp, args->opcnt);
while (!status && resp->opcnt < args->opcnt) {
@@ -2021,13 +2051,14 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
op->status = nfsd4_open_omfg(rqstp, cstate, op);
goto encode_op;
}
-
- if (!current_fh->fh_dentry) {
+ if (!current_fh->fh_dentry &&
+ !HAS_FH_FLAG(current_fh, NFSD4_FH_FOREIGN)) {
if (!(op->opdesc->op_flags & ALLOWED_WITHOUT_FH)) {
op->status = nfserr_nofilehandle;
goto encode_op;
}
- } else if (current_fh->fh_export->ex_fslocs.migrated &&
+ } else if (current_fh->fh_export &&
+ current_fh->fh_export->ex_fslocs.migrated &&
!(op->opdesc->op_flags & ALLOWED_ON_ABSENT_FS)) {
op->status = nfserr_moved;
goto encode_op;
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 755e256..b9c7568 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -35,7 +35,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)

bool fh_locked; /* inode locked by us */
bool fh_want_write; /* remount protection taken */
-
+ int fh_flags; /* FH flags */
#ifdef CONFIG_NFSD_V3
bool fh_post_saved; /* post-op attrs saved */
bool fh_pre_saved; /* pre-op attrs saved */
@@ -56,6 +56,9 @@ static inline ino_t u32_to_ino_t(__u32 uino)
#endif /* CONFIG_NFSD_V3 */

} svc_fh;
+#define NFSD4_FH_FOREIGN (1<<0)
+#define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
+#define HAS_FH_FLAG(c, f) ((c)->fh_flags & (f))

enum nfsd_fsid {
FSID_DEV = 0,
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 9d7318c..fbd18d6 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -221,6 +221,7 @@ struct nfsd4_lookup {
struct nfsd4_putfh {
u32 pf_fhlen; /* request */
char *pf_fhval; /* request */
+ bool no_verify; /* represents foreign fh */
};

struct nfsd4_open {
--
1.8.3.1
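The traversal that patch 09 adds to the compound processing tracks which PUTFH currently supplies the current and saved filehandles, so that for an inter-server COPY the source (saved) filehandle can be flagged instead of verified. A userspace sketch of the same walk, using hypothetical stand-in types (struct op_model, mark_foreign_source_fh()) and the NFSv4 operation numbers for PUTFH, RESTOREFH, SAVEFH and COPY. The sketch takes the COPY arguments from &op->u, because op already points at args->ops[i], and it marks nothing when no SAVEFH preceded the COPY.

```c
#include <stdbool.h>
#include <stddef.h>

/* NFSv4 operation numbers (RFC 7530 / RFC 7862). */
enum { OP_PUTFH = 22, OP_RESTOREFH = 31, OP_SAVEFH = 32, OP_COPY = 60 };

/* Cut-down stand-in for struct nfsd4_op. */
struct op_model {
	int opnum;
	union {
		struct { bool no_verify; } putfh;
		struct { bool intra; } copy;
	} u;
};

/* Follow which PUTFH supplies the current and saved filehandles and, for
 * an inter-server COPY, flag the saved (source) one as foreign. */
void mark_foreign_source_fh(struct op_model *ops, size_t n)
{
	struct op_model *current = NULL, *saved = NULL;

	for (size_t i = 0; i < n; i++) {
		struct op_model *op = &ops[i];	/* already the i-th op */

		switch (op->opnum) {
		case OP_PUTFH:
			current = op;
			break;
		case OP_SAVEFH:
			saved = current;
			break;
		case OP_RESTOREFH:
			current = saved;
			break;
		case OP_COPY:
			if (!op->u.copy.intra && saved)
				saved->u.putfh.no_verify = true;
			break;
		}
	}
}
```

For the usual PUTFH(src), SAVEFH, PUTFH(dst), COPY sequence, only the first PUTFH is flagged; an intra-server COPY leaves both filehandles subject to normal verification.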


2018-11-30 20:04:12

by Olga Kornievskaia

Subject: [PATCH v2 10/10] NFSD add nfs4 inter ssc to nfsd4_copy

Given a universal address, mount the source server from the destination
server using an internal mount. Call the NFS client's nfs42_ssc_open()
to obtain an NFS struct file suitable for nfsd_copy_range.

The ability to do an "inter" server-to-server copy depends on the nfsd
kernel module parameter "inter_copy_offload_enable".

Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfsd/nfs4proc.c | 274 +++++++++++++++++++++++++++++++++++++++++++++++----
fs/nfsd/nfssvc.c | 6 ++
fs/nfsd/xdr4.h | 5 +
include/linux/nfs4.h | 1 +
4 files changed, 269 insertions(+), 17 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 2e28254..238c4b7 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1155,6 +1155,209 @@ void nfsd4_shutdown_copy(struct nfs4_client *clp)
while ((copy = nfsd4_get_copy(clp)) != NULL)
nfsd4_stop_copy(copy);
}
+#ifdef CONFIG_NFSD_V4_2_INTER_SSC
+
+extern struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
+ struct nfs_fh *src_fh,
+ nfs4_stateid *stateid);
+extern void nfs42_ssc_close(struct file *filep);
+
+extern void nfs_sb_deactive(struct super_block *sb);
+
+#define NFSD42_INTERSSC_MOUNTOPS "minorversion=2,vers=4,addr=%s"
+
+/**
+ * Support one copy source server for now.
+ */
+static __be32
+nfsd4_interssc_connect(struct nl4_server *nss, struct svc_rqst *rqstp,
+ struct vfsmount **mount)
+{
+ struct file_system_type *type;
+ struct vfsmount *ss_mnt;
+ struct nfs42_netaddr *naddr;
+ struct sockaddr_storage tmp_addr;
+ size_t tmp_addrlen, match_netid_len = 3;
+ char *startsep = "", *endsep = "", *match_netid = "tcp";
+ char *ipaddr, *dev_name, *raw_data;
+ int len, raw_len, status = -EINVAL;
+
+ naddr = &nss->u.nl4_addr;
+ tmp_addrlen = rpc_uaddr2sockaddr(SVC_NET(rqstp), naddr->addr,
+ naddr->addr_len,
+ (struct sockaddr *)&tmp_addr,
+ sizeof(tmp_addr));
+ if (tmp_addrlen == 0)
+ goto out_err;
+
+ if (tmp_addr.ss_family == AF_INET6) {
+ startsep = "[";
+ endsep = "]";
+ match_netid = "tcp6";
+ match_netid_len = 4;
+ }
+
+ if (naddr->netid_len != match_netid_len ||
+ strncmp(naddr->netid, match_netid, naddr->netid_len))
+ goto out_err;
+
+ /* Construct the raw data for the vfs_kern_mount call */
+ len = RPC_MAX_ADDRBUFLEN + 1;
+ ipaddr = kzalloc(len, GFP_KERNEL);
+ if (!ipaddr)
+ goto out_err;
+
+ rpc_ntop((struct sockaddr *)&tmp_addr, ipaddr, len);
+
+ /* 2 for ipv6 endsep and startsep. 3 for ":/" and trailing '\0' */
+
+ raw_len = strlen(NFSD42_INTERSSC_MOUNTOPS) + strlen(ipaddr);
+ raw_data = kzalloc(raw_len, GFP_KERNEL);
+ if (!raw_data)
+ goto out_free_ipaddr;
+
+ snprintf(raw_data, raw_len, NFSD42_INTERSSC_MOUNTOPS, ipaddr);
+
+ status = -ENODEV;
+ type = get_fs_type("nfs");
+ if (!type)
+ goto out_free_rawdata;
+
+ /* Set the server:<export> for the vfs_kern_mount call */
+ dev_name = kzalloc(len + 5, GFP_KERNEL);
+ if (!dev_name)
+ goto out_free_rawdata;
+ snprintf(dev_name, len + 5, "%s%s%s:/", startsep, ipaddr, endsep);
+
+ /* Use an 'internal' mount: MS_KERNMOUNT -> MNT_INTERNAL */
+ ss_mnt = vfs_kern_mount(type, MS_KERNMOUNT, dev_name, raw_data);
+ module_put(type->owner);
+ if (IS_ERR(ss_mnt))
+ goto out_free_devname;
+
+ status = 0;
+ *mount = ss_mnt;
+
+out_free_devname:
+ kfree(dev_name);
+out_free_rawdata:
+ kfree(raw_data);
+out_free_ipaddr:
+ kfree(ipaddr);
+out_err:
+ return status;
+}
+
+static void
+nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
+{
+ nfs_sb_deactive(ss_mnt->mnt_sb);
+ mntput(ss_mnt);
+}
+
+/**
+ * nfsd4_setup_inter_ssc
+ *
+ * Verify COPY destination stateid.
+ * Connect to the source server with NFSv4.2.
+ * Save the source filehandle and stateid for nfs42_ssc_open().
+ * Called with COPY cstate:
+ * SAVED_FH: source filehandle
+ * CURRENT_FH: destination filehandle
+ *
+ * Returns 0 on success; on failure, a nonzero nfserr or errno value.
+ */
+static __be32
+nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
+ struct nfsd4_compound_state *cstate,
+ struct nfsd4_copy *copy, struct vfsmount **mount)
+{
+ struct svc_fh *s_fh = NULL;
+ stateid_t *s_stid = &copy->cp_src_stateid;
+ __be32 status = -EINVAL;
+
+ /* Verify the destination stateid and set dst struct file*/
+ status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
+ &copy->cp_dst_stateid,
+ WR_STATE, &copy->file_dst, NULL,
+ NULL);
+ if (status)
+ goto out;
+
+ status = nfsd4_interssc_connect(&copy->cp_src, rqstp, mount);
+ if (status)
+ goto out;
+
+ s_fh = &cstate->save_fh;
+
+ copy->c_fh.size = s_fh->fh_handle.fh_size;
+ memcpy(copy->c_fh.data, &s_fh->fh_handle.fh_base, copy->c_fh.size);
+ copy->stateid.seqid = s_stid->si_generation;
+ memcpy(copy->stateid.other, (void *)&s_stid->si_opaque,
+ sizeof(stateid_opaque_t));
+
+ status = 0;
+out:
+ return status;
+}
+
+static void
+nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
+ struct file *dst)
+{
+ nfs42_ssc_close(src);
+ fput(src);
+ fput(dst);
+ mntput(ss_mnt);
+}
+
+#else /* CONFIG_NFSD_V4_2_INTER_SSC */
+
+static __be32
+nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
+ struct nfsd4_compound_state *cstate,
+ struct nfsd4_copy *copy,
+ struct vfsmount **mount)
+{
+ *mount = NULL;
+ return -EINVAL;
+}
+
+static void
+nfsd4_cleanup_inter_ssc(struct vfsmount *ss_mnt, struct file *src,
+ struct file *dst)
+{
+}
+
+static void
+nfsd4_interssc_disconnect(struct vfsmount *ss_mnt)
+{
+}
+
+static struct file *nfs42_ssc_open(struct vfsmount *ss_mnt,
+ struct nfs_fh *src_fh,
+ nfs4_stateid *stateid)
+{
+ return NULL;
+}
+#endif /* CONFIG_NFSD_V4_2_INTER_SSC */
+
+static __be32
+nfsd4_setup_intra_ssc(struct svc_rqst *rqstp,
+ struct nfsd4_compound_state *cstate,
+ struct nfsd4_copy *copy)
+{
+ return nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
+ &copy->file_src, &copy->cp_dst_stateid,
+ &copy->file_dst, NULL);
+}
+
+static void
+nfsd4_cleanup_intra_ssc(struct file *src, struct file *dst)
+{
+ fput(src);
+ fput(dst);
+}

static void nfsd4_cb_offload_release(struct nfsd4_callback *cb)
{
@@ -1219,12 +1422,16 @@ static __be32 nfsd4_do_copy(struct nfsd4_copy *copy, bool sync)
status = nfs_ok;
}

- fput(copy->file_src);
- fput(copy->file_dst);
+ if (!copy->cp_intra) /* Inter server SSC */
+ nfsd4_cleanup_inter_ssc(copy->ss_mnt, copy->file_src,
+ copy->file_dst);
+ else
+ nfsd4_cleanup_intra_ssc(copy->file_src, copy->file_dst);
+
return status;
}

-static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
+static int dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
{
dst->cp_src_pos = src->cp_src_pos;
dst->cp_dst_pos = src->cp_dst_pos;
@@ -1234,8 +1441,17 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
memcpy(&dst->fh, &src->fh, sizeof(src->fh));
dst->cp_clp = src->cp_clp;
dst->file_dst = get_file(src->file_dst);
- dst->file_src = get_file(src->file_src);
+ dst->cp_intra = src->cp_intra;
+ if (src->cp_intra) /* for inter, file_src doesn't exist yet */
+ dst->file_src = get_file(src->file_src);
memcpy(&dst->cp_stateid, &src->cp_stateid, sizeof(src->cp_stateid));
+ memcpy(&dst->cp_src, &src->cp_src, sizeof(struct nl4_server));
+ memcpy(&dst->stateid, &src->stateid, sizeof(src->stateid));
+ memcpy(&dst->c_fh, &src->c_fh, sizeof(src->c_fh));
+ dst->ss_mnt = src->ss_mnt;
+
+ return 0;
+
}

static void cleanup_async_copy(struct nfsd4_copy *copy)
@@ -1254,7 +1470,18 @@ static int nfsd4_do_async_copy(void *data)
struct nfsd4_copy *copy = (struct nfsd4_copy *)data;
struct nfsd4_copy *cb_copy;

+ if (!copy->cp_intra) { /* Inter server SSC */
+ copy->file_src = nfs42_ssc_open(copy->ss_mnt, &copy->c_fh,
+ &copy->stateid);
+ if (IS_ERR(copy->file_src)) {
+ copy->nfserr = nfserr_offload_denied;
+ nfsd4_interssc_disconnect(copy->ss_mnt);
+ goto do_callback;
+ }
+ }
+
copy->nfserr = nfsd4_do_copy(copy, 0);
+do_callback:
cb_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
if (!cb_copy)
goto out;
@@ -1278,11 +1505,20 @@ static int nfsd4_do_async_copy(void *data)
__be32 status;
struct nfsd4_copy *async_copy = NULL;

- status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
- &copy->file_src, &copy->cp_dst_stateid,
- &copy->file_dst, NULL);
- if (status)
- goto out;
+ if (!copy->cp_intra) { /* Inter server SSC */
+ if (!inter_copy_offload_enable || copy->cp_synchronous) {
+ status = nfserr_notsupp;
+ goto out;
+ }
+ status = nfsd4_setup_inter_ssc(rqstp, cstate, copy,
+ &copy->ss_mnt);
+ if (status)
+ return nfserr_offload_denied;
+ } else {
+ status = nfsd4_setup_intra_ssc(rqstp, cstate, copy);
+ if (status)
+ return status;
+ }

copy->cp_clp = cstate->clp;
memcpy(&copy->fh, &cstate->current_fh.fh_handle,
@@ -1293,15 +1529,15 @@ static int nfsd4_do_async_copy(void *data)
status = nfserrno(-ENOMEM);
async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
if (!async_copy)
- goto out;
- if (!nfs4_init_copy_state(nn, copy)) {
- kfree(async_copy);
- goto out;
- }
+ goto out_err;
+ if (!nfs4_init_copy_state(nn, copy))
+ goto out_err;
refcount_set(&async_copy->refcount, 1);
memcpy(&copy->cp_res.cb_stateid, &copy->cp_stateid,
sizeof(copy->cp_stateid));
- dup_copy_fields(copy, async_copy);
+ status = dup_copy_fields(copy, async_copy);
+ if (status)
+ goto out_err;
async_copy->copy_task = kthread_create(nfsd4_do_async_copy,
async_copy, "%s", "copy thread");
if (IS_ERR(async_copy->copy_task))
@@ -1312,13 +1548,17 @@ static int nfsd4_do_async_copy(void *data)
spin_unlock(&async_copy->cp_clp->async_lock);
wake_up_process(async_copy->copy_task);
status = nfs_ok;
- } else
+ } else {
status = nfsd4_do_copy(copy, 1);
+ }
out:
return status;
out_err:
cleanup_async_copy(async_copy);
- goto out;
+ status = nfserrno(-ENOMEM);
+ if (!copy->cp_intra)
+ nfsd4_interssc_disconnect(copy->ss_mnt);
+ goto out;
}

struct nfsd4_copy *
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 89cb484..9d254e7 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -30,6 +30,12 @@

#define NFSDDBG_FACILITY NFSDDBG_SVC

+bool inter_copy_offload_enable;
+EXPORT_SYMBOL_GPL(inter_copy_offload_enable);
+module_param(inter_copy_offload_enable, bool, 0644);
+MODULE_PARM_DESC(inter_copy_offload_enable,
+ "Enable inter server to server copy offload. Default: false");
+
extern struct svc_program nfsd_program;
static int nfsd(void *vrqstp);

diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index fbd18d6..bb2f8e5 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -547,7 +547,12 @@ struct nfsd4_copy {
struct task_struct *copy_task;
refcount_t refcount;
bool stopped;
+
+ struct vfsmount *ss_mnt;
+ struct nfs_fh c_fh;
+ nfs4_stateid stateid;
};
+extern bool inter_copy_offload_enable;

struct nfsd4_seek {
/* request */
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 9e49a6c..fa4b411 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -17,6 +17,7 @@
#include <linux/uidgid.h>
#include <uapi/linux/nfs4.h>
#include <linux/sunrpc/msg_prot.h>
+#include <linux/nfs.h>

enum nfs4_acl_whotype {
NFS4_ACL_WHO_NAMED = 0,
--
1.8.3.1
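nfsd4_interssc_connect() accepts only the netid that matches the address family ("tcp" for IPv4, "tcp6" for IPv6), then builds two strings for vfs_kern_mount(): the mount data from NFSD42_INTERSSC_MOUNTOPS and a "server:/" device name with IPv6 literals in brackets. The sketch below reproduces only that string handling in userspace. It assumes an already-formatted address string in place of rpc_uaddr2sockaddr() and rpc_ntop(), and netid_matches() and build_interssc_mount_strings() are hypothetical helpers. It checks snprintf()'s return value to make truncation explicit, whereas the patch relies on its size arithmetic: raw_len = strlen(fmt) + strlen(ipaddr) is two bytes longer than the formatted text (because of the "%s"), which leaves room for the terminating NUL.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Mount options string used by the patch. */
#define NFSD42_INTERSSC_MOUNTOPS "minorversion=2,vers=4,addr=%s"

/* The netid sent in COPY must match the address family of the uaddr. */
bool netid_matches(const char *netid, size_t netid_len, bool is_ipv6)
{
	const char *want = is_ipv6 ? "tcp6" : "tcp";

	return netid_len == strlen(want) &&
	       strncmp(netid, want, netid_len) == 0;
}

/* Build the mount data and the "server:/" device name; IPv6 literals are
 * bracketed. Returns 0 on success, -1 if either buffer would truncate. */
int build_interssc_mount_strings(const char *ipaddr, bool is_ipv6,
				 char *opts, size_t opts_len,
				 char *dev, size_t dev_len)
{
	int n = snprintf(opts, opts_len, NFSD42_INTERSSC_MOUNTOPS, ipaddr);

	if (n < 0 || (size_t)n >= opts_len)
		return -1;
	n = snprintf(dev, dev_len, "%s%s%s:/", is_ipv6 ? "[" : "",
		     ipaddr, is_ipv6 ? "]" : "");
	if (n < 0 || (size_t)n >= dev_len)
		return -1;
	return 0;
}
```

For 192.0.2.1 this yields "minorversion=2,vers=4,addr=192.0.2.1" and "192.0.2.1:/"; for 2001:db8::1 the device name becomes "[2001:db8::1]:/".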


2018-12-01 08:12:02

by Amir Goldstein

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
<[email protected]> wrote:
>
> Relax the condition that input files must be from the same
> file systems.
>
> Add checks that input parameters adhere semantics.
>
> If no copy_file_range() support is found, then do generic
> checks for the unsupported page cache ranges, LFS, limits,
> and clear setuid/setgid if not running as root before calling
> do_splice_direct(). Update atime,ctime,mtime afterwards.
>
> Signed-off-by: Olga Kornievskaia <[email protected]>
> ---

This patch is either going to bring you down or make you stronger ;-)

This is not how it's done. Behavior change and refactoring mixed into
one patch is wrong for several reasons. And when you relax same sb
check you need to restrict it inside filesystems, like your previous patch
did.

You already had v7 patch reviewed-by 4 developers.
What made you go and change it (and posted as v2)?

Your intentions were good trying to fix the broken syscall, but
I hope you understood that Dave didn't mean that you *have* to
add the missing generic checks as part of your work. He just
pointed out how broken the current interface is in the context of
reviewing your patch.

In any case, I hear that Dave is neck deep in fixing copy_file_range()
so changes to this function should be collaborated with him. Or better
yet, wait until he posts his fixes and carry on from there.

If I were you, I would just go back to the reviewed v7 vfs patch.

Thanks,
Amir.

2018-12-01 13:24:11

by Olga Kornievskaia

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Sat, Dec 1, 2018 at 3:11 AM Amir Goldstein <[email protected]> wrote:
>
> On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> <[email protected]> wrote:
> >
> > Relax the condition that input files must be from the same
> > file systems.
> >
> > Add checks that input parameters adhere semantics.
> >
> > If no copy_file_range() support is found, then do generic
> > checks for the unsupported page cache ranges, LFS, limits,
> > and clear setuid/setgid if not running as root before calling
> > do_splice_direct(). Update atime,ctime,mtime afterwards.
> >
> > Signed-off-by: Olga Kornievskaia <[email protected]>
> > ---
>
> This patch is either going to bring you down or make you stronger ;-)
>
> This is not how it's done. Behavior change and refactoring mixed into
> one patch is wrong for several reasons. And when you relax same sb
> check you need to restrict it inside filesystems, like your previous patch
> did.
>
> You already had v7 patch reviewed-by 4 developers.
> What made you go and change it (and posted as v2)?
>
> Your intentions were good trying to fix the broken syscall, but
> I hope you understood that Dave didn't mean that you *have* to
> add the missing generic checks as part of your work. He just
> pointed out how broken the current interface is in the context of
> reviewing your patch.
>
> In any case, I hear that Dave is neck deep in fixing copy_file_range()
> so changes to this function should be collaborated with him. Or better
> yet, wait until he posts his fixes and carry on from there.
>
> If I were you, I would just go back to the reviewed v7 vfs patch.

This is NOT a replacement for the v7 vfs patch??? This is a new patch
on top of that one.

I assume that the v7 patch has been OK-ed by everybody and is ready to go in???

As you recall, what was left was to provide the functionality to relax
the check that the superblocks be the same before calling
do_splice_direct(). This patch attempts to do that. I was under the
impression that doing so required adding extra checks, which I
added.


>
> Thanks,
> Amir.

2018-12-01 13:44:39

by Olga Kornievskaia

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Sat, Dec 1, 2018 at 8:23 AM Olga Kornievskaia
<[email protected]> wrote:
>
> On Sat, Dec 1, 2018 at 3:11 AM Amir Goldstein <[email protected]> wrote:
> >
> > [...]
>
> This is NOT a replacement for the v7 vfs patch??? This is a new patch
> on top of that one.
>
> I assume that the v7 patch has been OK-ed by everybody and is ready to go in???
>
> As you recall, what was left was to provide the functionality to relax
> the check that the superblocks be the same before calling
> do_splice_direct(). This patch attempts to do that. I was under the
> impression that doing so required adding extra checks, which I
> added.
>

To clarify, previously I had a VFS patch with the client-side series
to support "server to server" copy offload. It needed the
functionality to call copy_file_range() on files with different
superblocks.

This patch series adds server-side support for the "server to
server" copy offload. It requires the ability to call copy_file_range()
and do a copy between NFS and a local file system. Thus it needs
generic_copy_file_range.

>
> >
> > Thanks,
> > Amir.
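Olga's point is that the server side needs copy_file_range() to work between an NFS file and a local file. From userspace the same limitation shows up as a copy_file_range(2) call that fails for a cross-file-system pair (at the time of this thread such calls returned EXDEV). A minimal sketch of the usual pattern, assuming glibc 2.27 or later for the copy_file_range() wrapper: try the in-kernel copy first and fall back to pread()/pwrite(). copy_range_with_fallback() and copy_by_rw() are hypothetical helpers, not kernel or glibc functions, and the set of errno values treated as "not supported here" is a judgment call.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback: copy through a user-space buffer with pread()/pwrite(),
 * advancing *off_in and *off_out like copy_file_range() would. */
static ssize_t copy_by_rw(int fd_in, off_t *off_in, int fd_out,
			  off_t *off_out, size_t len)
{
	char buf[65536];
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t n = pread(fd_in, buf, want, *off_in);

		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0)
			return done ? (ssize_t)done : -1;
		if (n == 0)
			break;		/* end of the source file */

		ssize_t w = 0;
		while (w < n) {
			ssize_t m = pwrite(fd_out, buf + w, (size_t)(n - w),
					   *off_out + w);
			if (m < 0 && errno == EINTR)
				continue;
			if (m < 0) {	/* account for the partial write */
				*off_in += w;
				*off_out += w;
				done += (size_t)w;
				return done ? (ssize_t)done : -1;
			}
			w += m;
		}
		*off_in += n;
		*off_out += n;
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* Try the in-kernel copy; fall back to read/write when the kernel or the
 * file systems cannot do it. Returns bytes copied, or -1 with errno set. */
ssize_t copy_range_with_fallback(int fd_in, off_t *off_in, int fd_out,
				 off_t *off_out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = copy_file_range(fd_in, off_in, fd_out, off_out,
					    len - done, 0);
		if (n > 0) {
			done += (size_t)n;	/* kernel advanced both offsets */
			continue;
		}
		if (n == 0)
			break;			/* end of the source file */
		if (errno == EINTR)
			continue;
		/* EXDEV: cross-file-system copy refused; ENOSYS/EOPNOTSUPP/
		 * EINVAL: no support; EPERM: some sandboxes filter syscalls. */
		if (errno == EXDEV || errno == ENOSYS || errno == EOPNOTSUPP ||
		    errno == EINVAL || errno == EPERM) {
			ssize_t r = copy_by_rw(fd_in, off_in, fd_out, off_out,
					       len - done);
			if (r < 0)
				return done ? (ssize_t)done : -1;
			return (ssize_t)(done + (size_t)r);
		}
		return done ? (ssize_t)done : -1;
	}
	return (ssize_t)done;
}
```

Passing explicit offsets keeps the file positions of both descriptors unchanged, the same way nfsd supplies explicit source and destination positions for a COPY.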

2018-12-01 16:59:23

by Amir Goldstein

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Sat, Dec 1, 2018 at 5:57 PM Olga Kornievskaia
<[email protected]> wrote:
>
> On Sat, Dec 1, 2018 at 9:03 AM Amir Goldstein <[email protected]> wrote:
> >
> >
> >
> > On Sat, Dec 1, 2018, 3:44 PM Olga Kornievskaia <[email protected] wrote:
> >>
> >> On Sat, Dec 1, 2018 at 8:23 AM Olga Kornievskaia
> >> <[email protected]> wrote:
> >> >
> >> > On Sat, Dec 1, 2018 at 3:11 AM Amir Goldstein <[email protected]> wrote:
> >> > >
> >> > > [...]
> >> >
> >> > This is NOT a replacement to the v7 vfs patch??? This is a new patch
> >> > on top of that one.
> >> >
> >> > I assume that v7 patch has been OK-ed by everybody and is ready to go in???
> >> >
> >> > As you recall, what was left is to provide the functionality to relax
> >> > the check for the superblocks to be the same before calling the
> >> > do_splice_direct(). This patch attempt do this. I was under the
> >> > impression that to do so extra checks were needed to be added which I
> >> > added.
> >> >
> >>
> >> To clarify, previously I had a VFS patch with the client-side series
> >> to support "server to server" copy offload. It needed the
> >> functionality to be able to call copy_file_range with different super
> >> blocks.
> >>
> >> This patch series is for the server side support for the "server to
> >> server" copy offload. It requires ability to call copy_file_range()
> >> and do a copy between NFS and a local file system. Thus it needs
> >> generic_copy_file_range.
> >
> >
> > Ah. Sorry for the confusion.
> > My comment on change of behavior and refactoring in same patch still hold.
> > My comment about coordinate your work with Dave Chinner still hold.
>
> Understood. I will email Dave directly and coordinate.
>
> > Raise that with a comment about adding test coverage to the new
> > generic cross fs copy API to xfstest.
>
> What kind of extra coverage are you envisioning? Something that
> requires two different file systems mounted and then does a fs copy?
>

Yes, if you add this functionality you should add test coverage for the
added functionality. It's not going to be trivial to add cross fs type tests
to xfstests, but adding cross fs (same type) should be relatively easy
(copy_file_range from test fs to scratch fs).
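A minimal sketch of the kind of check Amir describes, written as plain userspace C rather than as an actual fstests script. It writes a pattern to a source file, copies it with copy_file_range(2) into a destination file, and compares the bytes. The function name copy_and_verify and the paths a caller would pass are hypothetical; in an fstests-style run the two paths would sit on the test and scratch mounts so that the copy crosses filesystems. It assumes glibc 2.27 or later for the copy_file_range() wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical test helper: write len bytes of a known pattern to
 * src_path, copy them to dst_path with copy_file_range(2), then read
 * dst_path back and compare.
 * Returns 0 on match, 1 on data mismatch, or -errno if a call failed
 * (e.g. -EXDEV where cross-filesystem copies are still refused).
 */
int copy_and_verify(const char *src_path, const char *dst_path, size_t len)
{
	char wbuf[4096], rbuf[4096];
	int in, out, ret = 0;
	size_t done = 0;

	if (len > sizeof(wbuf))
		return -EINVAL;
	for (size_t i = 0; i < len; i++)
		wbuf[i] = (char)(i * 31 + 7);	/* non-constant pattern */

	in = open(src_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (in < 0)
		return -errno;
	out = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (out < 0) {
		ret = -errno;
		close(in);
		return ret;
	}

	if (pwrite(in, wbuf, len, 0) != (ssize_t)len) {
		ret = -EIO;
		goto cleanup;
	}
	/* NULL offsets: use (and advance) both file positions, starting at 0. */
	while (done < len) {
		ssize_t n = copy_file_range(in, NULL, out, NULL, len - done, 0);

		if (n < 0) {
			ret = -errno;
			goto cleanup;
		}
		if (n == 0)
			break;		/* source shorter than expected */
		done += (size_t)n;
	}
	if (pread(out, rbuf, len, 0) != (ssize_t)len ||
	    memcmp(wbuf, rbuf, len) != 0)
		ret = 1;
cleanup:
	close(in);
	close(out);
	return ret;
}
```

Returning -EXDEV separately from 1 lets a test tell "this kernel refuses cross-filesystem copies" apart from "the copy produced the wrong data".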

> > Am I mistaken that this change affects any cross fs copy file range
> > by userspace and not only by kernel nfsd?
>
> That's correct, any cross fs copy is what I'm going for here.
>

Forgive me for being thick. After briefly going over the patches, I still don't
understand if you *need* to add generic cross fs copy to implement
server side copy support in nfsd? Or if you are adding it as an added bonus
to the community along with your SSC patch set?

The first two patches of the series seem unrelated to the rest, but maybe
I'm just not getting the connection?

Thanks,
Amir.

2018-12-01 21:18:08

by Matthew Wilcox

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Fri, Nov 30, 2018 at 03:03:39PM -0500, Olga Kornievskaia wrote:
> Relax the condition that input files must be from the same
> file systems.

> + ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> + count > MAX_RW_COUNT ? MAX_RW_COUNT : count, 0);

Wasn't there a concern about splicing between filesystems with different
block sizes mentioned the last time this came up? I can't find a citation
for that now.

> - /* this could be relaxed once generic cross fs support is added */
> - if (inode_in->i_sb != inode_out->i_sb) {
> - ret = -EXDEV;
> - goto done;
> - }
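The clamp in the quoted hunk bounds the byte count passed to each do_splice_direct() call. A userspace sketch of the same arithmetic, assuming 4 KiB pages: in the kernel, MAX_RW_COUNT is defined in include/linux/fs.h as (INT_MAX & PAGE_MASK), while the SKETCH_* names here are hypothetical stand-ins.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins, assuming a 4 KiB page size. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))
/* Largest page-aligned count that still fits in an int: 0x7ffff000. */
#define SKETCH_MAX_RW_COUNT	((size_t)(INT_MAX & SKETCH_PAGE_MASK))

/*
 * Same expression as the patch's
 * "count > MAX_RW_COUNT ? MAX_RW_COUNT : count". A larger request is
 * cut short, so copy_file_range() may return fewer bytes than asked
 * for and callers are expected to loop.
 */
static size_t clamp_rw_count(size_t count)
{
	return count > SKETCH_MAX_RW_COUNT ? SKETCH_MAX_RW_COUNT : count;
}
```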

2018-12-01 22:00:53

by Dave Chinner

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> <[email protected]> wrote:
> >
> > Relax the condition that input files must be from the same
> > file systems.
> >
> > Add checks that input parameters adhere semantics.
> >
> > If no copy_file_range() support is found, then do generic
> > checks for the unsupported page cache ranges, LFS, limits,
> > and clear setuid/setgid if not running as root before calling
> > do_splice_direct(). Update atime,ctime,mtime afterwards.
> >
> > Signed-off-by: Olga Kornievskaia <[email protected]>
> > ---
>
> This patch is either going to bring you down or make you stronger ;-)
>
> This is not how its done. Behavior change and refactoring mixed into
> one patch is wrong for several reasons. And when you relax same sb
> check you need to restrict it inside filesystems, like your previous patch
> did.
.....
> In any case, I hear that Dave is neck deep in fixing copy_file_range()
> so changes to this function should be collaborated with him. Or better
> yet, wait until he posts his fixes and carry on from there.

Yeah, because I've heard nothing for a month and this is kinda
important, I have a series of 8-9 patches that make all the fixes we
need, push the cross-filesystem checks down into the filesystems,
and let filesystems handle the fallback to a splice based copy
themselves (because there are way more fallback cases than just
EOPNOTSUPP and EXDEV).
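For comparison, here is roughly what a fallback looks like when the caller does it: try copy_file_range(2) first and, on errors that mean "cannot offload", copy through a bounce buffer with pread/pwrite. This is a userspace sketch under stated assumptions, not the in-kernel generic_copy_file_range() Dave describes; the name copy_with_fallback and the set of fallback errors are assumptions. In line with the -EINVAL ambiguity discussed later in the thread, it deliberately does not treat EINVAL as a reason to fall back.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Errors after which this sketch switches to a plain read/write copy.
 * The list is an assumption for illustration; EINVAL is left out on
 * purpose because it can also mean "bad arguments".
 */
static bool should_fall_back(int err)
{
	return err == EXDEV || err == EOPNOTSUPP || err == ENOSYS;
}

/*
 * Hypothetical helper: copy up to len bytes from in_fd at off_in to
 * out_fd at off_out. Returns the number of bytes copied or -errno.
 */
ssize_t copy_with_fallback(int in_fd, loff_t off_in, int out_fd,
			   loff_t off_out, size_t len)
{
	char buf[65536];
	size_t done = 0;

	/* First choice: let the kernel (and the filesystem) do the copy. */
	while (done < len) {
		ssize_t n = copy_file_range(in_fd, &off_in, out_fd, &off_out,
					    len - done, 0);
		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)
			return (ssize_t)done;	/* EOF on the source */
		if (!should_fall_back(errno))
			return -errno;
		break;				/* fall back below */
	}

	/* Fallback: bounce buffer; offsets already include any partial copy. */
	while (done < len) {
		size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
		ssize_t r = pread(in_fd, buf, want, off_in);

		if (r < 0)
			return -errno;
		if (r == 0)
			break;			/* EOF on the source */
		for (ssize_t w = 0; w < r; ) {
			ssize_t k = pwrite(out_fd, buf + w, (size_t)(r - w), off_out);

			if (k < 0)
				return -errno;
			w += k;
			off_out += k;
		}
		off_in += r;
		done += (size_t)r;
	}
	return (ssize_t)done;
}
```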

I also have a patch for the man page that documents all the missing
failure cases, and documents where things are filesystem specific or
not.

And I also have a fstests patch that exercises all the failure cases
so that all filesystems will end up behaving the same way for all
the same cases they should.

I'm still sorting out the fstests patch (it requires changes
to xfs_io's copy-range command) so I've got some confidence that the
code actually does what it says in the man page, but I should have
that sorted in a couple of days.

Cheers,

Dave.

--
Dave Chinner
[email protected]

2018-12-01 22:36:22

by Dave Chinner

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Sat, Dec 01, 2018 at 01:18:06PM -0800, Matthew Wilcox wrote:
> On Fri, Nov 30, 2018 at 03:03:39PM -0500, Olga Kornievskaia wrote:
> > Relax the condition that input files must be from the same
> > file systems.
>
> > + ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> > + count > MAX_RW_COUNT ? MAX_RW_COUNT : count, 0);
>
> Wasn't there a concern about splicing between filesystems with different
> block sizes mentioned the last time this came up? I can't find a citation
> for that now.

The filesystems should be able to handle that themselves - they are
just passed an iter that has a range of data regions in pages that
they copy the required data into/out of. The data transfer mechanism
itself is completely independent of filesystem block sizes....

There are lots of other problems with do_splice_direct, but I don't
think this is one of them. I could be wrong - this code has pretty
much zero documentation on how it is supposed to work and what it is
supposed to do - so don't take my word for it...
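do_splice_direct() moves data through an internal pipe, so the transfer unit is the page-backed pipe buffer rather than either filesystem's block size. The same shape can be sketched from userspace with splice(2). The name splice_copy is hypothetical, and the sketch illustrates the file-to-pipe-to-file pattern, not the kernel internals themselves.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Move up to len bytes from in_fd (at off_in) to out_fd (at off_out)
 * through a pipe: file -> pipe, then pipe -> file. Nothing here depends
 * on either filesystem's block size. Returns bytes moved or -errno.
 */
ssize_t splice_copy(int in_fd, loff_t off_in, int out_fd, loff_t off_out,
		    size_t len)
{
	int p[2];
	size_t done = 0;
	ssize_t ret = 0;

	if (pipe(p) < 0)
		return -errno;
	while (done < len) {
		/* Fill the pipe; the kernel caps this at the pipe's capacity. */
		ssize_t in = splice(in_fd, &off_in, p[1], NULL, len - done, 0);

		if (in < 0) {
			ret = -errno;
			break;
		}
		if (in == 0)
			break;			/* EOF on the source */
		/* Drain exactly what was just queued before refilling. */
		for (ssize_t left = in; left > 0; ) {
			ssize_t out = splice(p[0], NULL, out_fd, &off_out,
					     (size_t)left, 0);

			if (out <= 0) {
				ret = out < 0 ? -errno : -EIO;
				goto cleanup;
			}
			left -= out;
		}
		done += (size_t)in;
	}
cleanup:
	close(p[0]);
	close(p[1]);
	return ret < 0 ? ret : (ssize_t)done;
}
```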

Cheers,

Dave.
--
Dave Chinner
[email protected]

2018-12-02 03:17:23

by Olga Kornievskaia

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Sat, Dec 1, 2018 at 5:00 PM Dave Chinner <[email protected]> wrote:
>
> On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > <[email protected]> wrote:
> > >
> > > Relax the condition that input files must be from the same
> > > file systems.
> > >
> > > Add checks that input parameters adhere semantics.
> > >
> > > If no copy_file_range() support is found, then do generic
> > > checks for the unsupported page cache ranges, LFS, limits,
> > > and clear setuid/setgid if not running as root before calling
> > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > >
> > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > ---
> >
> > This patch is either going to bring you down or make you stronger ;-)
> >
> > This is not how its done. Behavior change and refactoring mixed into
> > one patch is wrong for several reasons. And when you relax same sb
> > check you need to restrict it inside filesystems, like your previous patch
> > did.
> .....
> > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > so changes to this function should be collaborated with him. Or better
> > yet, wait until he posts his fixes and carry on from there.
>
> Yeah, because I've heard nothing for a month and this is kinda
> important

Dave I think that's unfair. It is important. NFS is actually the file
system that needed VFS support for cross fs copy_file_range and I was
working on it. If you were in doubt, you could have emailed and asked
me.

I'm now unsure what this means. I have a patch series with a VFS
patch that went through extensive review (people spent time on it)
and an NFS patch series that depends on it that is ready for the
upstream push. Are you saying that the VFS patch is no longer welcome
and thus the NFS series is no longer viable either?

> , I have a series of 8-9 patches that make all the fixes we
> need, push the cross-filesystem checks down into the filesystems,
> and let filesystems handle the fallback to a splice based copy
> themselves (because there are way more fallback cases than just
> EOPNOPSUPP and EXDEV).

Are you saying it is each individual filesystem's responsibility to
fall back on splice? Isn't that a step backwards? Each individual
filesystem is going to implement the same code calling
do_splice_direct() for functionality that could and should be in
the VFS?

>
> I also have a patch for the man page that document all the missing
> failure cases, and document where things are filesystem specific or
> not.
>
> And I also have a fstests patch that exercises all the failure cases
> so that all filesystems will end up behaving the same way for all
> the same cases they should.
>
> I'm still sorting out the fstests patch (it requires changes
> to xfs_io's copy-range command) so I've got some confidence that the
> code actually does what it says in the man page, but I should have
> that sorted in a couple of days.
>
> Cheers,
>
> Dave.
>
> --
> Dave Chinner
> [email protected]

2018-12-02 15:20:01

by Olga Kornievskaia

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Sat, Dec 1, 2018 at 10:12 PM Olga Kornievskaia
<[email protected]> wrote:
>
> On Sat, Dec 1, 2018 at 5:00 PM Dave Chinner <[email protected]> wrote:
> >
> > On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> > > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > > <[email protected]> wrote:
> > > >
> > > > Relax the condition that input files must be from the same
> > > > file systems.
> > > >
> > > > Add checks that input parameters adhere semantics.
> > > >
> > > > If no copy_file_range() support is found, then do generic
> > > > checks for the unsupported page cache ranges, LFS, limits,
> > > > and clear setuid/setgid if not running as root before calling
> > > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > > ---
> > >
> > > This patch is either going to bring you down or make you stronger ;-)
> > >
> > > This is not how its done. Behavior change and refactoring mixed into
> > > one patch is wrong for several reasons. And when you relax same sb
> > > check you need to restrict it inside filesystems, like your previous patch
> > > did.
> > .....
> > > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > > so changes to this function should be collaborated with him. Or better
> > > yet, wait until he posts his fixes and carry on from there.
> >
> > Yeah, because I've heard nothing for a month and this is kinda
> > important
>
> Dave I think that's unfair. It is important. NFS is actually the file
> system that needed VFS support for cross fs copy_file_range and I was
> working on it. If you were in doubt, you could have emailed and asked
> me.

Just to be clear: what I think was unfair in that comment was the
wording "this is kinda important". I think a lot stems from a lack of
clarity in the mailing list communications. I object to the fact
that it wasn't clear who was going to implement the functionality.
Since the work was needed by NFS, I didn't want to assume that somebody
in VFS would just do it for us. At the time nobody in VFS stood up and
said they would do the work, and thus I tried to do my best.

I'm grateful (and would have been in the first place) that somebody
is implementing generic cross-filesystem support. Thus I'm by no
means speaking against Dave's work.

> I'm unsure now what does this mean. I have a patch series with a VFS
> patch that went thru the extensive review (people spend time on it)
> and an NFS patch series that depends on it that is ready for the
> upstream push. Are you saying that the VFS patch is no longer welcomed
> and thus NFS series is no longer viable either?

I'm unclear on the fate of the patch set that has the (v7) VFS patch
that was reviewed and approved and was expected to be pushed for 4.21.
It is unclear whether the new work is on top of that or not.

> > , I have a series of 8-9 patches that make all the fixes we
> > need, push the cross-filesystem checks down into the filesystems,
> > and let filesystems handle the fallback to a splice based copy
> > themselves (because there are way more fallback cases than just
> > EOPNOPSUPP and EXDEV).
>
> Are you saying it is each individual filesystem responsibility to
> fallback on splice? Isn't that a step backwards? Each individual
> filesystem is going to implement the same code of calling
> do_splice_direct() to do the functionally that could and should be in
> VFS?
>
> >
> > I also have a patch for the man page that document all the missing
> > failure cases, and document where things are filesystem specific or
> > not.
> >
> > And I also have a fstests patch that exercises all the failure cases
> > so that all filesystems will end up behaving the same way for all
> > the same cases they should.
> >
> > I'm still sorting out the fstests patch (it requires changes
> > to xfs_io's copy-range command) so I've got some confidence that the
> > code actually does what it says in the man page, but I should have
> > that sorted in a couple of days.
> >
> > Cheers,
> >
> > Dave.
> >
> > --
> > Dave Chinner
> > [email protected]

2018-12-02 20:47:57

by Dave Chinner

Subject: Re: [PATCH v2 01/10] VFS generic copy_file_range() support

On Sat, Dec 01, 2018 at 10:12:05PM -0500, Olga Kornievskaia wrote:
> On Sat, Dec 1, 2018 at 5:00 PM Dave Chinner <[email protected]> wrote:
> >
> > On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> > > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > > <[email protected]> wrote:
> > > >
> > > > Relax the condition that input files must be from the same
> > > > file systems.
> > > >
> > > > Add checks that input parameters adhere semantics.
> > > >
> > > > If no copy_file_range() support is found, then do generic
> > > > checks for the unsupported page cache ranges, LFS, limits,
> > > > and clear setuid/setgid if not running as root before calling
> > > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <[email protected]>
> > > > ---
> > >
> > > This patch is either going to bring you down or make you stronger ;-)
> > >
> > > This is not how its done. Behavior change and refactoring mixed into
> > > one patch is wrong for several reasons. And when you relax same sb
> > > check you need to restrict it inside filesystems, like your previous patch
> > > did.
> > .....
> > > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > > so changes to this function should be collaborated with him. Or better
> > > yet, wait until he posts his fixes and carry on from there.
> >
> > Yeah, because I've heard nothing for a month and this is kinda
> > important
>
> Dave I think that's unfair. It is important. NFS is actually the file
> system that needed VFS support for cross fs copy_file_range and I was
> working on it. If you were in doubt, you could have emailed and asked
> me.

Last I heard from you was "this isn't my problem and I don't have
time to deal with it". You were fairly unambiguous in saying you
weren't going to spend any time on it.

> I'm unsure now what does this mean. I have a patch series with a VFS
> patch that went thru the extensive review (people spend time on it)
> and an NFS patch series that depends on it that is ready for the
> upstream push. Are you saying that the VFS patch is no longer welcomed
> and thus NFS series is no longer viable either?

No, I'm saying that this is urgent work and needs to be separated
from the NFS patch series, of which there are now two and you've
split copy_file_range() changes across both patch sets.
copy_file_range() is broken for *everyone*, not just NFS. i.e.
fixing these problems should not be tied to some other filesystem
feature patchset.

> > , I have a series of 8-9 patches that make all the fixes we
> > need, push the cross-filesystem checks down into the filesystems,
> > and let filesystems handle the fallback to a splice based copy
> > themselves (because there are way more fallback cases than just
> > EOPNOPSUPP and EXDEV).
>
> Are you saying it is each individual filesystem responsibility to
> fallback on splice? Isn't that a step backwards? Each individual
> filesystem is going to implement the same code of calling
> do_splice_direct() to do the functionally that could and should be in
> VFS?

I've done this because one of the problems I've found is that
different filesystems *do not fall back consistently*. e.g. the NFS
client will return -EINVAL if src/dst are the same file, but -EINVAL
is not one of the errors that the vfs code falls back to a data copy
on.

This is despite the fact that the fallback path can copy to/from
the same file, we support same file copy through the
->remap_file_range offload, etc. IOWs, the behaviour of the syscall
when it comes to single file ranges is completely inconsistent
because fallbacks are implemented on a filesystem-by-filesystem
basis.

I called the fallback generic_copy_file_range(), and filesystems that
implement ->copy_file_range() are responsible for calling it
themselves if they want a fallback. That's because there may be
different error/constraint conditions at the filesystem level that
prevent offloading the copy, and we can't distinguish at the VFS
between "-EINVAL means fallback because it was a single file copy"
and "-EINVAL means fail, parameter out of range".
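A toy model of the split described here, in which the filesystem's own copy method decides whether an -EINVAL from its offload path means "fall back" or "fail". Everything in it is hypothetical (struct copy_req, fake_offload(), fake_generic_copy(), fs_copy_file_range()); it only illustrates why that decision is easier to make inside the filesystem than in the VFS, which sees nothing but the bare error code.

```c
#include <errno.h>

/* Hypothetical request: two "inode numbers" and a length. */
struct copy_req {
	int src_ino;
	int dst_ino;
	long len;
};

/* Stand-in for an offload path that, like the NFS client behaviour
 * described in this message, rejects a copy within a single file
 * with -EINVAL. */
static long fake_offload(const struct copy_req *r)
{
	if (r->src_ino == r->dst_ino)
		return -EINVAL;
	return r->len;
}

/* Stand-in for a generic page-cache copy; assumed to always succeed. */
static long fake_generic_copy(const struct copy_req *r)
{
	return r->len;
}

/*
 * The filesystem's copy method. It knows that its own -EINVAL for a
 * same-file copy means "cannot offload", so it falls back itself; a
 * genuinely bad argument is still reported as -EINVAL.
 */
static long fs_copy_file_range(const struct copy_req *r)
{
	long ret;

	if (r->len < 0)
		return -EINVAL;		/* parameter error: no fallback */
	ret = fake_offload(r);
	if (ret == -EINVAL && r->src_ino == r->dst_ino)
		return fake_generic_copy(r);
	return ret;
}
```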

IOWs, if you implement ->copy_file_range() you take full
responsibility for implementing the copying function. This is
exactly what we do for all the other file methods, so this is just
making the implementation behaviour consistent with the rest of the
code.

FWIW, this also points out a problem with the copy_file_range()
definition - it does not say WTF should happen if the copy ranges
/overlap/ in the same file. clone is clear on that - support is
determined by the filesystem (i.e. "EINVAL [...] XFS and Btrfs do
not support overlapping reflink ranges in the same file."). For
copying, the fallback code can't copy the file data correctly if the
ranges overlap, so I've added checks to make this illegal and added
that overlapping ranges are not supported to the man page.....
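When source and destination are the same file, the overlap rule reduces to a half-open interval test on the two ranges. A sketch, assuming non-negative offsets whose sums do not overflow; the helper name copy_ranges_overlap is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [pos_in, pos_in + len) and [pos_out, pos_out + len) share any
 * byte. Only relevant when source and destination are the same file.
 * Assumes non-negative offsets and no overflow in pos + len.
 */
static bool copy_ranges_overlap(int64_t pos_in, int64_t pos_out, int64_t len)
{
	if (len <= 0)
		return false;
	return pos_in < pos_out + len && pos_out < pos_in + len;
}
```

Adjacent ranges such as [0, 20) and [20, 40) do not overlap under this test, so a same-file copy into the bytes immediately following the source stays legal.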

These are the sort of API definition problems that I'm dealing with
right now, and I'm writing tests to make sure that all filesystems
will behave the same way for given copy scenarios.

i.e. I'm not doing this so I can get a NFS feature patchset merged,
I'm doing this to make the copy_file_range API well defined and
robust and allow implementations to be verified against the
specification the man page lays out.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2018-12-05 18:06:12

by Olga Kornievskaia

Subject: Re: [PATCH v2 09/10] NFSD: allow inter server COPY to have a STALE source server fh

On Fri, Nov 30, 2018 at 3:04 PM Olga Kornievskaia
<[email protected]> wrote:
>
> The inter server to server COPY source server filehandle
> is a foreign filehandle as the COPY is sent to the destination
> server.
>
> Signed-off-by: Olga Kornievskaia <[email protected]>
> ---
> fs/nfsd/Kconfig | 10 ++++++++++
> fs/nfsd/nfs4proc.c | 41 ++++++++++++++++++++++++++++++++++++-----
> fs/nfsd/nfsfh.h | 5 ++++-
> fs/nfsd/xdr4.h | 1 +
> 4 files changed, 51 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> index 20b1c17..37ff3d5 100644
> --- a/fs/nfsd/Kconfig
> +++ b/fs/nfsd/Kconfig
> @@ -131,6 +131,16 @@ config NFSD_FLEXFILELAYOUT
>
> If unsure, say N.
>
> +config NFSD_V4_2_INTER_SSC
> + bool "NFSv4.2 inter server to server COPY"
> + depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2
> + help
> + This option enables support for NFSv4.2 inter server to
> + server copy where the destination server calls the NFSv4.2
> + client to read the data to copy from the source server.
> +
> + If unsure, say N.
> +
> config NFSD_V4_SECURITY_LABEL
> bool "Provide Security Label support for NFSv4 server"
> depends on NFSD_V4 && SECURITY
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 70d03e9..2e28254 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -503,12 +503,20 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
> union nfsd4_op_u *u)
> {
> struct nfsd4_putfh *putfh = &u->putfh;
> + __be32 ret;
>
> fh_put(&cstate->current_fh);
> cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen;
> memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval,
> putfh->pf_fhlen);
> - return fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> + ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> + if (ret == nfserr_stale && putfh->no_verify) {
> + SET_FH_FLAG(&cstate->current_fh, NFSD4_FH_FOREIGN);
> + ret = 0;
> + }
> +#endif
> + return ret;
> }
>
> static __be32
> @@ -1967,11 +1975,12 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> {
> struct nfsd4_compoundargs *args = rqstp->rq_argp;
> struct nfsd4_compoundres *resp = rqstp->rq_resp;
> - struct nfsd4_op *op;
> + struct nfsd4_op *op, *current_op, *saved_op;
> struct nfsd4_compound_state *cstate = &resp->cstate;
> struct svc_fh *current_fh = &cstate->current_fh;
> struct svc_fh *save_fh = &cstate->save_fh;
> __be32 status;
> + int i;
>
> svcxdr_init_encode(rqstp, resp);
> resp->tagp = resp->xdr.p;
> @@ -2006,6 +2015,27 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> resp->opcnt = 1;
> goto encode_op;
> }
> +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> + /* traverse all operation and if it's a COPY compound, mark the
> + * source filehandle to skip verification
> + */
> + for (i = 0; i < args->opcnt; i++) {
> + op = &args->ops[i];
> + if (op->opnum == OP_PUTFH)
> + current_op = op;
> + else if (op->opnum == OP_SAVEFH)
> + saved_op = current_op;
> + else if (op->opnum == OP_RESTOREFH)
> + current_op = saved_op;
> + else if (op->opnum == OP_COPY) {
> + struct nfsd4_copy *copy = (struct nfsd4_copy *)&op[i].u;

Bruce, I found an error: this needs to be "op->u", not "op[i].u".

> + struct nfsd4_putfh *putfh =
> + (struct nfsd4_putfh *)&saved_op->u;
> + if (!copy->cp_intra)
> + putfh->no_verify = true;
> + }
> + }
> +#endif
>
> trace_nfsd_compound(rqstp, args->opcnt);
> while (!status && resp->opcnt < args->opcnt) {
> @@ -2021,13 +2051,14 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> op->status = nfsd4_open_omfg(rqstp, cstate, op);
> goto encode_op;
> }
> -
> - if (!current_fh->fh_dentry) {
> + if (!current_fh->fh_dentry &&
> + !HAS_FH_FLAG(current_fh, NFSD4_FH_FOREIGN)) {
> if (!(op->opdesc->op_flags & ALLOWED_WITHOUT_FH)) {
> op->status = nfserr_nofilehandle;
> goto encode_op;
> }
> - } else if (current_fh->fh_export->ex_fslocs.migrated &&
> + } else if (current_fh->fh_export &&
> + current_fh->fh_export->ex_fslocs.migrated &&
> !(op->opdesc->op_flags & ALLOWED_ON_ABSENT_FS)) {
> op->status = nfserr_moved;
> goto encode_op;
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index 755e256..b9c7568 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -35,7 +35,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
>
> bool fh_locked; /* inode locked by us */
> bool fh_want_write; /* remount protection taken */
> -
> + int fh_flags; /* FH flags */
> #ifdef CONFIG_NFSD_V3
> bool fh_post_saved; /* post-op attrs saved */
> bool fh_pre_saved; /* pre-op attrs saved */
> @@ -56,6 +56,9 @@ static inline ino_t u32_to_ino_t(__u32 uino)
> #endif /* CONFIG_NFSD_V3 */
>
> } svc_fh;
> +#define NFSD4_FH_FOREIGN (1<<0)
> +#define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
> +#define HAS_FH_FLAG(c, f) ((c)->fh_flags & (f))
>
> enum nfsd_fsid {
> FSID_DEV = 0,
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index 9d7318c..fbd18d6 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -221,6 +221,7 @@ struct nfsd4_lookup {
> struct nfsd4_putfh {
> u32 pf_fhlen; /* request */
> char *pf_fhval; /* request */
> + bool no_verify; /* represents foreign fh */
> };
>
> struct nfsd4_open {
> --
> 1.8.3.1
>

2018-12-20 18:42:41

by Olga Kornievskaia

Subject: Re: [PATCH v2 00/10] server-side support for "inter" SSC copy

On Fri, Nov 30, 2018 at 3:03 PM Olga Kornievskaia
<[email protected]> wrote:
>
> This patch series adds support for NFSv4.2 copy offload feature
> allowing copy between two different NFS servers.
>
> This functionality depends on the VFS ability to support generic
> copy_file_range() where a copy is done between an NFS file and
> a local file system.
>
> This feature is enabled by the kernel module parameter --
> inter_copy_offload_enable -- and by default is disabled. There is
> also a kernel compile configuration of NFSD_V4_2_INTER_SSC that
> adds dependency on the NFS client side functions called from the
> server.
>
> These patches work on top of existing async intra copy offload
> patches. For the "inter" SSC, the implementation only supports
> asynchronous inter copy.
>
> On the source server, upon receiving a COPY_NOTIFY, the server generates a
> unique stateid that is kept in a global list. Upon receiving a READ
> with a stateid, the code checks the normal list of open stateids and
> now additionally checks the copy state list as well, before
> deciding either to fail with BAD_STATEID or to use the one that matches.
> The stored stateid is only valid if it is first used within
> a chosen lease period (currently 90s). When the source server
> receives an OFFLOAD_CANCEL, it removes the stateid from the
> global list. Otherwise, the copy stateid is removed upon the removal
> of its "parent" stateid (open/lock/delegation stateid).
>
> On the destination server, upon receiving a COPY request, the server
> establishes the necessary clientid/session with the source server.
> It calls into the NFS client code to establish the necessary
> open stateid, filehandle, and file description (without doing an NFS open).
> Then the server calls copy_file_range() to perform the copy,
> where the source file issues NFS READs and then does local file
> system writes (this depends on the VFS ability to do cross-device
> copy_file_range()).
>
> v2:
> -- is on top of 4.20-rc4 + client side inter patch series
> -- VFS changes to enable generic copy_file_range(), with NFS
> falling back on generic_copy_file_range() for the previous EXDEV/EOPNOTSUPP
> errors
> -- hopefully addressed Bruce's review comments (highlights are):
> --- copy_notify patch: addressed naming, sc_cp_list access is
> now protected by s2s_cp_lock
> --- fillin netloc4 patch: addressed the size and added WARN_ON
> --- add ca_source to COPY: decode only 1 address, don't allocate
> memory (the rest into dummy)
> --- check stateid against stored: moved the refcount under lock
> --- allow stale filehandle: added a loop to go through the ops in
> the compound, store/manage putfh if copy is present in the compound,
> and mark the source putfh as "no verify".
>
> All the patches (client inter) and this patch series are available
> from git://linux-nfs.org/projects/aglo/linux.git under the "linux-ssc"
> branch
>

Bruce,

Do you have comments on this v2? Once VFS has the patches for the
generic copy_file_range() functionality, NFS should be all set to just
use it.

> Olga Kornievskaia (10):
> VFS generic copy_file_range() support
> NFS fallback to generic_copy_file_range
> NFSD fill-in netloc4 structure
> NFSD add ca_source_server<> to COPY
> NFSD return nfs4_stid in nfs4_preprocess_stateid_op
> NFSD add COPY_NOTIFY operation
> NFSD check stateids against copy stateids
> NFSD generalize nfsd4_compound_state flag names
> NFSD: allow inter server COPY to have a STALE source server fh
> NFSD add nfs4 inter ssc to nfsd4_copy
>
> fs/nfs/nfs4file.c | 9 +-
> fs/nfsd/Kconfig | 10 ++
> fs/nfsd/nfs4proc.c | 406 ++++++++++++++++++++++++++++++++++++++++++++++-----
> fs/nfsd/nfs4state.c | 124 ++++++++++++++--
> fs/nfsd/nfs4xdr.c | 166 ++++++++++++++++++++-
> fs/nfsd/nfsd.h | 32 ++++
> fs/nfsd/nfsfh.h | 5 +-
> fs/nfsd/nfssvc.c | 6 +
> fs/nfsd/state.h | 21 ++-
> fs/nfsd/xdr4.h | 37 ++++-
> fs/read_write.c | 66 +++++++--
> include/linux/fs.h | 7 +
> include/linux/nfs4.h | 1 +
> mm/filemap.c | 6 +-
> 14 files changed, 810 insertions(+), 86 deletions(-)
>
> --
> 1.8.3.1
>
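The COPY_NOTIFY stateid lifetime described in the quoted cover letter can be modelled roughly as follows, reading "valid to be used for the first time within a lease period" as "the first READ must arrive within that window". This is an illustrative sketch with hypothetical names; it covers only the first-use window (90s in the patch), and removal by OFFLOAD_CANCEL or together with the parent stateid is not modelled.

```c
#include <stdbool.h>
#include <time.h>

#define SK_COPY_FIRST_USE_SECS	90	/* "90s currently" per the cover letter */

/* Hypothetical record for a stateid handed out by COPY_NOTIFY. */
struct sk_copy_stateid {
	time_t created;		/* when COPY_NOTIFY generated it */
	bool used;		/* set once a READ has presented it */
};

/*
 * Decide whether a READ presenting this stateid at time `now` is
 * accepted. The first use must fall within the window; after that it
 * stays usable until OFFLOAD_CANCEL or removal of its parent stateid,
 * neither of which is modelled here.
 */
static bool sk_copy_stateid_accept(struct sk_copy_stateid *st, time_t now)
{
	if (!st->used) {
		if (now - st->created > SK_COPY_FIRST_USE_SECS)
			return false;	/* would be reported as BAD_STATEID */
		st->used = true;
	}
	return true;
}
```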

2018-12-21 19:08:39

by J. Bruce Fields

Subject: Re: [PATCH v2 00/10] server-side support for "inter" SSC copy

On Thu, Dec 20, 2018 at 01:42:27PM -0500, Olga Kornievskaia wrote:
> On Fri, Nov 30, 2018 at 3:03 PM Olga Kornievskaia
> <[email protected]> wrote:
> > All the patches (client inter) and this patch series is available
> > from git://linux-nfs.org/projects/aglo/linux.git under the "linux-ssc"
> > branch
> >
>
> Bruce,
>
> Do you have comments on this v2? Once VFS has the patches for the
> generic copy_file_range() functionality, NFS should be all set to just
> used it.

Not yet, sorry. I probably won't be able to give it a proper review
till the second week in January, feel free to ping me again then.

--b.

>
> > Olga Kornievskaia (10):
> > VFS generic copy_file_range() support
> > NFS fallback to generic_copy_file_range
> > NFSD fill-in netloc4 structure
> > NFSD add ca_source_server<> to COPY
> > NFSD return nfs4_stid in nfs4_preprocess_stateid_op
> > NFSD add COPY_NOTIFY operation
> > NFSD check stateids against copy stateids
> > NFSD generalize nfsd4_compound_state flag names
> > NFSD: allow inter server COPY to have a STALE source server fh
> > NFSD add nfs4 inter ssc to nfsd4_copy
> >
> > fs/nfs/nfs4file.c | 9 +-
> > fs/nfsd/Kconfig | 10 ++
> > fs/nfsd/nfs4proc.c | 406 ++++++++++++++++++++++++++++++++++++++++++++++-----
> > fs/nfsd/nfs4state.c | 124 ++++++++++++++--
> > fs/nfsd/nfs4xdr.c | 166 ++++++++++++++++++++-
> > fs/nfsd/nfsd.h | 32 ++++
> > fs/nfsd/nfsfh.h | 5 +-
> > fs/nfsd/nfssvc.c | 6 +
> > fs/nfsd/state.h | 21 ++-
> > fs/nfsd/xdr4.h | 37 ++++-
> > fs/read_write.c | 66 +++++++--
> > include/linux/fs.h | 7 +
> > include/linux/nfs4.h | 1 +
> > mm/filemap.c | 6 +-
> > 14 files changed, 810 insertions(+), 86 deletions(-)
> >
> > --
> > 1.8.3.1
> >

2019-01-14 14:53:23

by Olga Kornievskaia

Subject: Re: [PATCH v2 00/10] server-side support for "inter" SSC copy

On Fri, Dec 21, 2018 at 2:08 PM J. Bruce Fields <[email protected]> wrote:
>
> On Thu, Dec 20, 2018 at 01:42:27PM -0500, Olga Kornievskaia wrote:
> > On Fri, Nov 30, 2018 at 3:03 PM Olga Kornievskaia
> > <[email protected]> wrote:
> > > All the patches (client inter) and this patch series are available
> > > from git://linux-nfs.org/projects/aglo/linux.git under the "linux-ssc"
> > > branch
> > >
> >
> > Bruce,
> >
> > Do you have comments on this v2? Once VFS has the patches for the
> > generic copy_file_range() functionality, NFS should be all set to just
> > use it.
>
> Not yet, sorry. I probably won't be able to give it a proper review
> till the second week in January, feel free to ping me again then.
>

Hi Bruce,

Any progress on this?

2019-01-16 22:08:21

by J. Bruce Fields

Subject: Re: [PATCH v2 00/10] server-side support for "inter" SSC copy

On Mon, Jan 14, 2019 at 09:53:10AM -0500, Olga Kornievskaia wrote:
> On Fri, Dec 21, 2018 at 2:08 PM J. Bruce Fields <[email protected]> wrote:
> >
> > On Thu, Dec 20, 2018 at 01:42:27PM -0500, Olga Kornievskaia wrote:
> > > On Fri, Nov 30, 2018 at 3:03 PM Olga Kornievskaia
> > > <[email protected]> wrote:
> > > > All the patches (client inter) and this patch series are available
> > > > from git://linux-nfs.org/projects/aglo/linux.git under the "linux-ssc"
> > > > branch
> > > >
> > >
> > > Bruce,
> > >
> > > Do you have comments on this v2? Once VFS has the patches for the
> > > generic copy_file_range() functionality, NFS should be all set to just
> > > use it.
> >
> > Not yet, sorry. I probably won't be able to give it a proper review
> > till the second week in January, feel free to ping me again then.
>
> Any progress on this?

I have some delegation patches I want to get debugged first, hopefully
just a couple more days.

Where does the VFS cross-superblock copy work stand? I remember Dave
Chinner sending some patches but don't see them in 5.0-rc1, on a quick
look.

--b.

2019-01-17 17:03:28

by Olga Kornievskaia

Subject: Re: [PATCH v2 00/10] server-side support for "inter" SSC copy

On Wed, Jan 16, 2019 at 5:08 PM J. Bruce Fields <[email protected]> wrote:
>
> On Mon, Jan 14, 2019 at 09:53:10AM -0500, Olga Kornievskaia wrote:
> > On Fri, Dec 21, 2018 at 2:08 PM J. Bruce Fields <[email protected]> wrote:
> > >
> > > On Thu, Dec 20, 2018 at 01:42:27PM -0500, Olga Kornievskaia wrote:
> > > > On Fri, Nov 30, 2018 at 3:03 PM Olga Kornievskaia
> > > > <[email protected]> wrote:
> > > > > All the patches (client inter) and this patch series are available
> > > > > from git://linux-nfs.org/projects/aglo/linux.git under the "linux-ssc"
> > > > > branch
> > > > >
> > > >
> > > > Bruce,
> > > >
> > > > Do you have comments on this v2? Once VFS has the patches for the
> > > > generic copy_file_range() functionality, NFS should be all set to just
> > > > use it.
> > >
> > > Not yet, sorry. I probably won't be able to give it a proper review
> > > till the second week in January, feel free to ping me again then.
> >
> > Any progress on this?
>
> I have some delegation patches I want to get debugged first, hopefully
> just a couple more days.
>
> Where does the VFS cross-superblock copy work stand? I remember Dave
> Chinner sending some patches but don't see them in 5.0-rc1, on a quick
> look.

I don't know where it stands. Dave didn't reply when I asked about it.

2019-02-19 15:53:09

by J. Bruce Fields

Subject: Re: [PATCH v2 09/10] NFSD: allow inter server COPY to have a STALE source server fh

On Fri, Nov 30, 2018 at 03:03:47PM -0500, Olga Kornievskaia wrote:
> The inter server to server COPY source server filehandle
> is a foreign filehandle as the COPY is sent to the destination
> server.
>
> Signed-off-by: Olga Kornievskaia <[email protected]>
> ---
> fs/nfsd/Kconfig | 10 ++++++++++
> fs/nfsd/nfs4proc.c | 41 ++++++++++++++++++++++++++++++++++++-----
> fs/nfsd/nfsfh.h | 5 ++++-
> fs/nfsd/xdr4.h | 1 +
> 4 files changed, 51 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> index 20b1c17..37ff3d5 100644
> --- a/fs/nfsd/Kconfig
> +++ b/fs/nfsd/Kconfig
> @@ -131,6 +131,16 @@ config NFSD_FLEXFILELAYOUT
>
> If unsure, say N.
>
> +config NFSD_V4_2_INTER_SSC
> + bool "NFSv4.2 inter server to server COPY"
> + depends on NFSD_V4 && NFS_V4_1 && NFS_V4_2
> + help
> + This option enables support for NFSv4.2 inter server to
> + server copy where the destination server calls the NFSv4.2
> + client to read the data to copy from the source server.
> +
> + If unsure, say N.
> +
> config NFSD_V4_SECURITY_LABEL
> bool "Provide Security Label support for NFSv4 server"
> depends on NFSD_V4 && SECURITY
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 70d03e9..2e28254 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -503,12 +503,20 @@ static __be32 nfsd4_open_omfg(struct svc_rqst *rqstp, struct nfsd4_compound_stat
> union nfsd4_op_u *u)
> {
> struct nfsd4_putfh *putfh = &u->putfh;
> + __be32 ret;
>
> fh_put(&cstate->current_fh);
> cstate->current_fh.fh_handle.fh_size = putfh->pf_fhlen;
> memcpy(&cstate->current_fh.fh_handle.fh_base, putfh->pf_fhval,
> putfh->pf_fhlen);
> - return fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> + ret = fh_verify(rqstp, &cstate->current_fh, 0, NFSD_MAY_BYPASS_GSS);
> +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> + if (ret == nfserr_stale && putfh->no_verify) {
> + SET_FH_FLAG(&cstate->current_fh, NFSD4_FH_FOREIGN);
> + ret = 0;
> + }
> +#endif
> + return ret;
> }
>
> static __be32
> @@ -1967,11 +1975,12 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> {
> struct nfsd4_compoundargs *args = rqstp->rq_argp;
> struct nfsd4_compoundres *resp = rqstp->rq_resp;
> - struct nfsd4_op *op;
> + struct nfsd4_op *op, *current_op, *saved_op;
> struct nfsd4_compound_state *cstate = &resp->cstate;
> struct svc_fh *current_fh = &cstate->current_fh;
> struct svc_fh *save_fh = &cstate->save_fh;
> __be32 status;
> + int i;
>
> svcxdr_init_encode(rqstp, resp);
> resp->tagp = resp->xdr.p;
> @@ -2006,6 +2015,27 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> resp->opcnt = 1;
> goto encode_op;
> }
> +#ifdef CONFIG_NFSD_V4_2_INTER_SSC
> + /* traverse all operation and if it's a COPY compound, mark the
> + * source filehandle to skip verification
> + */
> + for (i = 0; i < args->opcnt; i++) {
> + op = &args->ops[i];
> + if (op->opnum == OP_PUTFH)
> + current_op = op;
> + else if (op->opnum == OP_SAVEFH)
> + saved_op = current_op;
> + else if (op->opnum == OP_RESTOREFH)
> + current_op = saved_op;
> + else if (op->opnum == OP_COPY) {
> + struct nfsd4_copy *copy = (struct nfsd4_copy *)&op[i].u;
> + struct nfsd4_putfh *putfh =
> + (struct nfsd4_putfh *)&saved_op->u;
> + if (!copy->cp_intra)
> + putfh->no_verify = true;
> + }
> + }
> +#endif

This looks good, but could you please move this loop to a function of
its own? (And do the usual trick of making that function a no-op when
INTER_SSC isn't defined.)

--b.

>
> trace_nfsd_compound(rqstp, args->opcnt);
> while (!status && resp->opcnt < args->opcnt) {
> @@ -2021,13 +2051,14 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> op->status = nfsd4_open_omfg(rqstp, cstate, op);
> goto encode_op;
> }
> -
> - if (!current_fh->fh_dentry) {
> + if (!current_fh->fh_dentry &&
> + !HAS_FH_FLAG(current_fh, NFSD4_FH_FOREIGN)) {
> if (!(op->opdesc->op_flags & ALLOWED_WITHOUT_FH)) {
> op->status = nfserr_nofilehandle;
> goto encode_op;
> }
> - } else if (current_fh->fh_export->ex_fslocs.migrated &&
> + } else if (current_fh->fh_export &&
> + current_fh->fh_export->ex_fslocs.migrated &&
> !(op->opdesc->op_flags & ALLOWED_ON_ABSENT_FS)) {
> op->status = nfserr_moved;
> goto encode_op;
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index 755e256..b9c7568 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -35,7 +35,7 @@ static inline ino_t u32_to_ino_t(__u32 uino)
>
> bool fh_locked; /* inode locked by us */
> bool fh_want_write; /* remount protection taken */
> -
> + int fh_flags; /* FH flags */
> #ifdef CONFIG_NFSD_V3
> bool fh_post_saved; /* post-op attrs saved */
> bool fh_pre_saved; /* pre-op attrs saved */
> @@ -56,6 +56,9 @@ static inline ino_t u32_to_ino_t(__u32 uino)
> #endif /* CONFIG_NFSD_V3 */
>
> } svc_fh;
> +#define NFSD4_FH_FOREIGN (1<<0)
> +#define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
> +#define HAS_FH_FLAG(c, f) ((c)->fh_flags & (f))
>
> enum nfsd_fsid {
> FSID_DEV = 0,
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index 9d7318c..fbd18d6 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -221,6 +221,7 @@ struct nfsd4_lookup {
> struct nfsd4_putfh {
> u32 pf_fhlen; /* request */
> char *pf_fhval; /* request */
> + bool no_verify; /* represents foreign fh */
> };
>
> struct nfsd4_open {
> --
> 1.8.3.1
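Bruce's request in this review (move the COPY-scanning loop into its own function, with a no-op version when INTER_SSC is off) might look roughly like the userspace sketch below. The types and names (`struct sketch_op`, `SK_PUTFH`, `SKETCH_INTER_SSC`, `sketch_mark_foreign_fh()`) are hypothetical simplifications, not nfsd's. Two differences from the posted loop are deliberate: the op is indexed once (`&ops[i]`) instead of taking `&op[i].u` from a pointer that already points at `ops[i]`, and a COPY that arrives with no preceding SAVEFH is ignored rather than dereferencing the unset `saved_op`. As in the posted loop, other ops that set the current filehandle (PUTROOTFH, for example) are not tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_NFSD_V4_2_INTER_SSC; comment it out
 * to check that the stub branch compiles too. */
#define SKETCH_INTER_SSC 1

enum sketch_opnum { SK_PUTFH, SK_SAVEFH, SK_RESTOREFH, SK_COPY, SK_OTHER };

/* Simplified compound op: only the fields the scan needs. */
struct sketch_op {
	enum sketch_opnum opnum;
	bool no_verify;	/* PUTFH only: fh may be foreign (STALE allowed) */
	bool intra;	/* COPY only: true for an intra-server copy */
};

#ifdef SKETCH_INTER_SSC
/*
 * Walk the compound once before executing it. The COPY source is the
 * saved filehandle, so track the current PUTFH and the SAVEFH'd one.
 */
static void sketch_mark_foreign_fh(struct sketch_op *ops, int opcnt)
{
	struct sketch_op *current_op = NULL, *saved_op = NULL;
	int i;

	assert(ops != NULL || opcnt == 0);
	for (i = 0; i < opcnt; i++) {
		struct sketch_op *op = &ops[i];	/* index exactly once */

		switch (op->opnum) {
		case SK_PUTFH:
			current_op = op;
			break;
		case SK_SAVEFH:
			saved_op = current_op;
			break;
		case SK_RESTOREFH:
			current_op = saved_op;
			break;
		case SK_COPY:
			/* Only an inter-server COPY has a foreign source fh. */
			if (!op->intra && saved_op)
				saved_op->no_verify = true;
			break;
		default:
			break;
		}
	}
}
#else
/* Config off: does nothing, so the caller needs no #ifdef. */
static inline void sketch_mark_foreign_fh(struct sketch_op *ops, int opcnt)
{
	(void)ops;
	(void)opcnt;
}
#endif
```

With the helper in place, the compound processing code would call it unconditionally before the main loop, and the `#ifdef` lives in one spot.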

2019-02-19 15:54:24

by J. Bruce Fields

Subject: Re: [PATCH v2 10/10] NFSD add nfs4 inter ssc to nfsd4_copy

On Fri, Nov 30, 2018 at 03:03:48PM -0500, Olga Kornievskaia wrote:
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 2e28254..238c4b7 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
...
> +static __be32
> +nfsd4_setup_inter_ssc(struct svc_rqst *rqstp,
> + struct nfsd4_compound_state *cstate,
> + struct nfsd4_copy *copy,
> + struct vfs_mount **mount)

That should be struct vfsmount. Don't forget to check the compile with
the new config option both on and off.

--b.

> +{
> + *mount = NULL;
> + return -EINVAL;
> +}

2019-02-19 16:03:20

by J. Bruce Fields

Subject: Re: [PATCH v2 00/10] server-side support for "inter" SSC copy

On Wed, Jan 16, 2019 at 05:08:17PM -0500, J. Bruce Fields wrote:
> I have some delegation patches I want to get debugged first, hopefully
> just a couple more days.

Sorry for the delay, I should be back to the server-to-server copy
patches today and tomorrow.

> Where does the VFS cross-superblock copy work stand? I remember Dave
> Chinner sending some patches but don't see them in 5.0-rc1, on a quick
> look.

(Adding Dave to the cc.)

I don't recall what the remaining cross-device copy issues were. Any
opinion whether they'd be worth a discussion at lsf/mm?

https://lkml.org/lkml/2019/1/18/696

--b.
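For context on the cross-superblock question: on kernels where copy_file_range(2) does not support source and destination files on different superblocks, the call fails with EXDEV and the caller has to copy the data itself. The sketch below is roughly the userspace analogue of what the "VFS generic copy_file_range() support" and "NFS fallback to generic_copy_file_range" patches do in the kernel. It assumes glibc 2.27 or later (which declares copy_file_range() under _GNU_SOURCE); `sketch_copy_all()` is a hypothetical helper, and the set of errno values treated as "offload not possible here" is a judgment call, not taken from the patches.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Copy up to len bytes from in to out at their current file offsets.
 * Try copy_file_range(2) first; if the kernel refuses the offload
 * (EXDEV across superblocks, ENOSYS on old kernels, and so on), finish
 * with plain read/write. Returns 0 on success or early EOF, -1 on error.
 */
static int sketch_copy_all(int in, int out, size_t len)
{
	char buf[4096];

	while (len > 0) {
		ssize_t n = copy_file_range(in, NULL, out, NULL, len, 0);

		if (n < 0) {
			if (errno == EXDEV || errno == ENOSYS ||
			    errno == EOPNOTSUPP || errno == EINVAL ||
			    errno == EPERM)
				break;	/* fall back to read/write below */
			return -1;
		}
		if (n == 0)
			return 0;	/* EOF on the source */
		len -= (size_t)n;
	}

	while (len > 0) {
		ssize_t r = read(in, buf, len < sizeof(buf) ? len : sizeof(buf));
		ssize_t done = 0;

		if (r < 0)
			return -1;
		if (r == 0)
			return 0;
		while (done < r) {
			ssize_t w = write(out, buf + done, (size_t)(r - done));

			if (w < 0)
				return -1;
			done += w;
		}
		len -= (size_t)r;
	}
	return 0;
}
```

Because both paths use the file descriptors' own offsets, the fallback picks up exactly where a partially successful copy_file_range() loop stopped.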

2019-02-19 16:17:13

by J. Bruce Fields

Subject: Re: [PATCH v2 04/10] NFSD add ca_source_server<> to COPY

On Fri, Nov 30, 2018 at 03:03:42PM -0500, Olga Kornievskaia wrote:
> Decode the ca_source_server list that's sent but only use the
> first one. Presence of non-zero list indicates an "inter" copy.
>
> Signed-off-by: Andy Adamson <[email protected]>
> Signed-off-by: Olga Kornievskaia <[email protected]>
> ---
> fs/nfsd/nfs4xdr.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
> fs/nfsd/xdr4.h | 12 ++++++----
> 2 files changed, 74 insertions(+), 7 deletions(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 3de42a7..879ddc6 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
...
> static __be32
> nfsd4_decode_copy(struct nfsd4_compoundargs *argp, struct nfsd4_copy *copy)
> {
> DECODE_HEAD;
> - unsigned int tmp;
> + struct nl4_server ns_dummy;

This struct is much too big to put on the stack.

--b.
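One way to avoid the on-stack `struct nl4_server ns_dummy` is to decode only the first entry into the caller's structure and merely validate and skip the bytes of the remaining entries. A userspace sketch of that idea, assuming a simplified, hypothetical entry encoding of `{ uint32 type; opaque loc<> }` rather than the real netloc4 union (whose NL4_NETADDR arm carries two strings); `struct sketch_server` and `decode_server_list()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_OPAQUE_LIMIT 1024	/* hypothetical cap on one location */

/* Hypothetical, simplified stand-in for struct nl4_server. */
struct sketch_server {
	uint32_t type;
	uint32_t len;
	char loc[SKETCH_OPAQUE_LIMIT];
};

/* Read a big-endian (XDR) 32-bit word. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* XDR pads opaque data to a multiple of 4 bytes. */
static size_t xdr_padded(uint32_t len)
{
	return ((size_t)len + 3) & ~(size_t)3;
}

/*
 * Decode "count, then count x { type, len, bytes[len] }". Only the first
 * entry is copied into *first; later entries are checked and skipped, so
 * no second struct is needed. Returns bytes consumed, 0 if malformed.
 */
static size_t decode_server_list(const unsigned char *buf, size_t buflen,
				 struct sketch_server *first, uint32_t *count)
{
	size_t off = 4;
	uint32_t i, n;

	if (buflen < 4)
		return 0;
	n = get_be32(buf);
	for (i = 0; i < n; i++) {
		uint32_t type, len;

		if (buflen - off < 8)
			return 0;
		type = get_be32(buf + off);
		len = get_be32(buf + off + 4);
		off += 8;
		if (len > SKETCH_OPAQUE_LIMIT || buflen - off < xdr_padded(len))
			return 0;
		if (i == 0) {
			first->type = type;
			first->len = len;
			memcpy(first->loc, buf + off, len);
		}
		off += xdr_padded(len);
	}
	*count = n;
	return off;
}
```

A zero count leaves `*first` untouched, which matches the patch's rule that an empty ca_source_server list means an intra-server copy.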

2019-02-20 01:44:48

by J. Bruce Fields

Subject: Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation

On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> + /**
> + * For now, only return one server address in cpn_src, the
> + * address used by the client to connect to this server.
> + */

Could you check your code for any places where you use /** as the
beginning of a comment? The usual style is just /*, and /** is reserved
for specially-formatted comments meant to be extracted into
automatically-generated API documentation, so I think the above might
confuse kernel-doc. (See Documentation/doc-guide/kernel-doc.rst.)

--b.
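For reference, a small sketch of the two comment styles Bruce distinguishes: a `/**` block in kernel-doc format (name, short description, `@param:` lines, a `Return:` section) directly above a function, and a plain `/*` block for an ordinary note inside the body. The function itself is a hypothetical illustration, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/**
 * sketch_pick_src_addr() - choose the address to advertise in cnr_source_server
 * @addrs: candidate server addresses, in preference order
 * @n: number of entries in @addrs
 *
 * Return: the first candidate, or NULL when @n is zero.
 */
static const char *sketch_pick_src_addr(const char *const *addrs, size_t n)
{
	/*
	 * An ordinary explanatory comment like this one opens with a plain
	 * slash-star, so kernel-doc does not try to parse it.
	 */
	return n ? addrs[0] : NULL;
}
```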

2019-02-20 02:07:34

by J. Bruce Fields

Subject: Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation

On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 879ddc6..c9fb625 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
...
> + /* cnr_lease_time */
> + p = xdr_encode_hyper(p, cn->cpn_sec);
> + *p++ = cpu_to_be32(cn->cpn_nsec);

This is redundant; xdr_encode_hyper already wrote cn->cpn_sec.

--b.

2019-02-20 02:12:10

by J. Bruce Fields

Subject: Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation

On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 0152b34..51fca9e 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
...
> @@ -1348,6 +1350,43 @@ struct nfsd4_copy *
> }
>
> static __be32
> +nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> + union nfsd4_op_u *u)
> +{
> + struct nfsd4_copy_notify *cn = &u->copy_notify;
> + __be32 status;
> + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> + struct nfs4_stid *stid;
> + struct nfs4_cpntf_state *cps;
> +
> + status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> + &cn->cpn_src_stateid, RD_STATE, NULL,
> + NULL, &stid);
> + if (status)
> + return status;
> +
> + cn->cpn_sec = nn->nfsd4_lease;
> + cn->cpn_nsec = 0;

I'm pretty sure this should be cp_timeout, not 0.

--b.

2019-02-20 02:35:38

by J. Bruce Fields

Subject: Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation

On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> Introducing the COPY_NOTIFY operation.
>
> Create a new unique stateid that will keep track of the copy
> state and the upcoming READs that will use that stateid. Keep
> it in the list associated with parent stateid.
>
> Return single netaddr to advertise to the copy.
>
> Signed-off-by: Andy Adamson <[email protected]>
> Signed-off-by: Olga Kornievskaia <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 72 +++++++++++++++++++++++++++++++++++----
> fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
> fs/nfsd/nfs4xdr.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> fs/nfsd/state.h | 18 ++++++++--
> fs/nfsd/xdr4.h | 13 +++++++
> 5 files changed, 248 insertions(+), 16 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 0152b34..51fca9e 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -37,6 +37,7 @@
> #include <linux/falloc.h>
> #include <linux/slab.h>
> #include <linux/kthread.h>
> +#include <linux/sunrpc/addr.h>
>
> #include "idmap.h"
> #include "cache.h"
> @@ -1035,7 +1036,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> static __be32
> nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> stateid_t *src_stateid, struct file **src,
> - stateid_t *dst_stateid, struct file **dst)
> + stateid_t *dst_stateid, struct file **dst,
> + struct nfs4_stid **stid)
> {
> __be32 status;
>
> @@ -1052,7 +1054,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
>
> status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> dst_stateid, WR_STATE, dst, NULL,
> - NULL);
> + stid);
> if (status) {
> dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
> goto out_put_src;
> @@ -1083,7 +1085,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> __be32 status;
>
> status = nfsd4_verify_copy(rqstp, cstate, &clone->cl_src_stateid, &src,
> - &clone->cl_dst_stateid, &dst);
> + &clone->cl_dst_stateid, &dst, NULL);
> if (status)
> goto out;
>
> @@ -1230,7 +1232,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
>
> static void cleanup_async_copy(struct nfsd4_copy *copy)
> {
> - nfs4_free_cp_state(copy);
> + nfs4_free_copy_state(copy);
> fput(copy->file_dst);
> fput(copy->file_src);
> spin_lock(&copy->cp_clp->async_lock);
> @@ -1270,7 +1272,7 @@ static int nfsd4_do_async_copy(void *data)
>
> status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
> &copy->file_src, &copy->cp_dst_stateid,
> - &copy->file_dst);
> + &copy->file_dst, NULL);
> if (status)
> goto out;
>
> @@ -1284,7 +1286,7 @@ static int nfsd4_do_async_copy(void *data)
> async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
> if (!async_copy)
> goto out;
> - if (!nfs4_init_cp_state(nn, copy)) {
> + if (!nfs4_init_copy_state(nn, copy)) {
> kfree(async_copy);
> goto out;
> }
> @@ -1348,6 +1350,43 @@ struct nfsd4_copy *
> }
>
> static __be32
> +nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> + union nfsd4_op_u *u)
> +{
> + struct nfsd4_copy_notify *cn = &u->copy_notify;
> + __be32 status;
> + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> + struct nfs4_stid *stid;
> + struct nfs4_cpntf_state *cps;
> +
> + status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> + &cn->cpn_src_stateid, RD_STATE, NULL,
> + NULL, &stid);
> + if (status)
> + return status;
> +
> + cn->cpn_sec = nn->nfsd4_lease;
> + cn->cpn_nsec = 0;
> +
> + status = nfserrno(-ENOMEM);
> + cps = nfs4_alloc_init_cpntf_state(nn, stid);
> + if (!cps)
> + return status;
> + memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid, sizeof(stateid_t));
> +
> + /**
> + * For now, only return one server address in cpn_src, the
> + * address used by the client to connect to this server.
> + */
> + cn->cpn_src.nl4_type = NL4_NETADDR;
> + status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr,
> + &cn->cpn_src.u.nl4_addr);
> + WARN_ON_ONCE(status);
> +
> + return status;
> +}
> +
> +static __be32
> nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> struct nfsd4_fallocate *fallocate, int flags)
> {
> @@ -2299,6 +2338,21 @@ static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp,
> 1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32);
> }
>
> +static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp,
> + struct nfsd4_op *op)
> +{
> + return (op_encode_hdr_size +
> + 3 /* cnr_lease_time */ +
> + 1 /* We support one cnr_source_server */ +
> + 1 /* cnr_stateid seq */ +
> + op_encode_stateid_maxsz /* cnr_stateid */ +
> + 1 /* num cnr_source_server*/ +
> + 1 /* nl4_type */ +
> + 1 /* nl4 size */ +
> + XDR_QUADLEN(NFS4_OPAQUE_LIMIT) /*nl4_loc + nl4_loc_sz */)
> + * sizeof(__be32);
> +}
> +
> #ifdef CONFIG_NFSD_PNFS
> static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> {
> @@ -2723,6 +2777,12 @@ static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> .op_name = "OP_OFFLOAD_CANCEL",
> .op_rsize_bop = nfsd4_only_status_rsize,
> },
> + [OP_COPY_NOTIFY] = {
> + .op_func = nfsd4_copy_notify,
> + .op_flags = OP_MODIFIES_SOMETHING,
> + .op_name = "OP_COPY_NOTIFY",
> + .op_rsize_bop = nfsd4_copy_notify_rsize,
> + },
> };
>
> /**
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index be3e967..eaa136f 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -697,6 +697,7 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
> /* Will be incremented before return to client: */
> refcount_set(&stid->sc_count, 1);
> spin_lock_init(&stid->sc_lock);
> + INIT_LIST_HEAD(&stid->sc_cp_list);
>
> /*
> * It shouldn't be a problem to reuse an opaque stateid value.
> @@ -716,24 +717,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
> /*
> * Create a unique stateid_t to represent each COPY.
> */
> -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
> {
> int new_id;
>
> idr_preload(GFP_KERNEL);
> spin_lock(&nn->s2s_cp_lock);
> - new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
> + new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
> spin_unlock(&nn->s2s_cp_lock);
> idr_preload_end();
> if (new_id < 0)
> return 0;
> - copy->cp_stateid.si_opaque.so_id = new_id;
> - copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> - copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> + stid->si_opaque.so_id = new_id;
> + stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> + stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> return 1;
> }
>
> -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> +{
> + return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> +}
> +
> +struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
> + struct nfs4_stid *p_stid)
> +{
> + struct nfs4_cpntf_state *cps;
> +
> + cps = kzalloc(sizeof(struct nfs4_cpntf_state), GFP_KERNEL);
> + if (!cps)
> + return NULL;
> + if (!nfs4_init_cp_state(nn, cps, &cps->cp_stateid))
> + goto out_free;
> + cps->cp_p_stid = p_stid;
> + cps->cp_active = false;
> + cps->cp_timeout = jiffies + (nn->nfsd4_lease * HZ);
> + INIT_LIST_HEAD(&cps->cp_list);
> + spin_lock(&nn->s2s_cp_lock);
> + list_add(&cps->cp_list, &p_stid->sc_cp_list);
> + spin_unlock(&nn->s2s_cp_lock);
> +
> + return cps;
> +out_free:
> + kfree(cps);
> + return NULL;
> +}
> +
> +void nfs4_free_copy_state(struct nfsd4_copy *copy)
> {
> struct nfsd_net *nn;
>
> @@ -743,6 +773,27 @@ void nfs4_free_cp_state(struct nfsd4_copy *copy)
> spin_unlock(&nn->s2s_cp_lock);
> }
>
> +static void nfs4_free_cpntf_statelist(struct net *net, struct nfs4_stid *stid)
> +{
> + struct nfs4_cpntf_state *cps;
> + struct nfsd_net *nn;
> +
> + nn = net_generic(net, nfsd_net_id);
> +
> + might_sleep();

What's that for? Just remove it unless you've got some good reason.

--b.
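The COPY_NOTIFY patch quoted here allocates each copy stateid's `so_id` with `idr_alloc_cyclic()`, stamps it with the boot time and the server-to-server client id, records the parent stateid, and sets `cp_timeout` to the current time plus the lease period. A userspace sketch of that bookkeeping, assuming a tiny fixed-size table in place of the kernel IDR and seconds in place of jiffies; all `sketch_*` names are hypothetical. Cyclic allocation means a freed id is not handed out again right away, which presumably lowers the chance that a stale copy stateid matches a newer one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_MAX_IDS 8	/* tiny table standing in for the IDR */

struct sketch_idr {
	void *slots[SKETCH_MAX_IDS];
	int next;	/* search starts here, wrapping at the end */
};

/* Like idr_alloc_cyclic(): first free id at or after ->next, wrapping. */
static int sketch_alloc_cyclic(struct sketch_idr *idr, void *ptr)
{
	int i;

	for (i = 0; i < SKETCH_MAX_IDS; i++) {
		int id = (idr->next + i) % SKETCH_MAX_IDS;

		if (!idr->slots[id]) {
			idr->slots[id] = ptr;
			idr->next = (id + 1) % SKETCH_MAX_IDS;
			return id;
		}
	}
	return -1;	/* table full */
}

static void sketch_idr_remove(struct sketch_idr *idr, int id)
{
	idr->slots[id] = NULL;
}

/* Mirrors the si_opaque fields filled in by nfs4_init_cp_state(). */
struct sketch_stateid {
	uint32_t cl_boot, cl_id, so_id;
};

struct sketch_cpntf_state {
	struct sketch_stateid cp_stateid;
	const void *cp_p_stid;	/* parent open/lock/delegation stateid */
	time_t cp_timeout;	/* absolute expiry: now + lease */
};

/* Returns 1 on success, 0 if no id is available (as in the patch). */
static int sketch_init_cpntf(struct sketch_idr *idr,
			     struct sketch_cpntf_state *cps,
			     const void *parent, uint32_t boot_time,
			     uint32_t cl_id, time_t now, time_t lease)
{
	int id = sketch_alloc_cyclic(idr, cps);

	if (id < 0)
		return 0;
	cps->cp_stateid.so_id = (uint32_t)id;
	cps->cp_stateid.cl_boot = boot_time;
	cps->cp_stateid.cl_id = cl_id;
	cps->cp_p_stid = parent;
	cps->cp_timeout = now + lease;
	return 1;
}
```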

2019-02-20 14:04:07

by J. Bruce Fields

Subject: Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation

On Tue, Feb 19, 2019 at 09:07:32PM -0500, J. Bruce Fields wrote:
> On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 879ddc6..c9fb625 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> ...
> > + /* cnr_lease_time */
> > + p = xdr_encode_hyper(p, cn->cpn_sec);
> > + *p++ = cpu_to_be32(cn->cpn_nsec);
>
> This is redundant; xdr_encode_hyper already wrote cn->cpn_sec.

Um, I completely missed the sec/nsec distinction, never mind me.

--b.
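As the follow-up says, the two writes are not redundant: in RFC 7862, `cnr_lease_time` is an `nfstime4`, which RFC 5661 defines as a signed 64-bit `seconds` (an XDR hyper, two words) followed by a 32-bit `nseconds`, three XDR words in all. That also matches the `3 /* cnr_lease_time */` term in the patch's `nfsd4_copy_notify_rsize()`. A userspace sketch of the encoding; `sketch_encode_nfstime4()` and `put_be32()` are hypothetical helpers, not the kernel's `xdr_encode_hyper()`.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in XDR (big-endian) order; return the next byte. */
static unsigned char *put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
	return p + 4;
}

/*
 * nfstime4 { int64_t seconds; uint32_t nseconds; }: the hyper takes two
 * words (high word first), then nseconds takes a third.
 */
static unsigned char *sketch_encode_nfstime4(unsigned char *p, int64_t sec,
					     uint32_t nsec)
{
	uint64_t u = (uint64_t)sec;

	p = put_be32(p, (uint32_t)(u >> 32));
	p = put_be32(p, (uint32_t)u);
	return put_be32(p, nsec);
}
```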

2019-06-14 19:14:04

by Olga Kornievskaia

Subject: Re: [PATCH v2 06/10] NFSD add COPY_NOTIFY operation

On Tue, Feb 19, 2019 at 9:35 PM J. Bruce Fields <[email protected]> wrote:
>
> On Fri, Nov 30, 2018 at 03:03:44PM -0500, Olga Kornievskaia wrote:
> > Introducing the COPY_NOTIFY operation.
> >
> > Create a new unique stateid that will keep track of the copy
> > state and the upcoming READs that will use that stateid. Keep
> > it in the list associated with parent stateid.
> >
> > Return single netaddr to advertise to the copy.
> >
> > Signed-off-by: Andy Adamson <[email protected]>
> > Signed-off-by: Olga Kornievskaia <[email protected]>
> > ---
> > fs/nfsd/nfs4proc.c | 72 +++++++++++++++++++++++++++++++++++----
> > fs/nfsd/nfs4state.c | 64 +++++++++++++++++++++++++++++++----
> > fs/nfsd/nfs4xdr.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> > fs/nfsd/state.h | 18 ++++++++--
> > fs/nfsd/xdr4.h | 13 +++++++
> > 5 files changed, 248 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 0152b34..51fca9e 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -37,6 +37,7 @@
> > #include <linux/falloc.h>
> > #include <linux/slab.h>
> > #include <linux/kthread.h>
> > +#include <linux/sunrpc/addr.h>
> >
> > #include "idmap.h"
> > #include "cache.h"
> > @@ -1035,7 +1036,8 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> > static __be32
> > nfsd4_verify_copy(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > stateid_t *src_stateid, struct file **src,
> > - stateid_t *dst_stateid, struct file **dst)
> > + stateid_t *dst_stateid, struct file **dst,
> > + struct nfs4_stid **stid)
> > {
> > __be32 status;
> >
> > @@ -1052,7 +1054,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> >
> > status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> > dst_stateid, WR_STATE, dst, NULL,
> > - NULL);
> > + stid);
> > if (status) {
> > dprintk("NFSD: %s: couldn't process dst stateid!\n", __func__);
> > goto out_put_src;
> > @@ -1083,7 +1085,7 @@ static __be32 nfsd4_do_lookupp(struct svc_rqst *rqstp, struct svc_fh *fh)
> > __be32 status;
> >
> > status = nfsd4_verify_copy(rqstp, cstate, &clone->cl_src_stateid, &src,
> > - &clone->cl_dst_stateid, &dst);
> > + &clone->cl_dst_stateid, &dst, NULL);
> > if (status)
> > goto out;
> >
> > @@ -1230,7 +1232,7 @@ static void dup_copy_fields(struct nfsd4_copy *src, struct nfsd4_copy *dst)
> >
> > static void cleanup_async_copy(struct nfsd4_copy *copy)
> > {
> > - nfs4_free_cp_state(copy);
> > + nfs4_free_copy_state(copy);
> > fput(copy->file_dst);
> > fput(copy->file_src);
> > spin_lock(&copy->cp_clp->async_lock);
> > @@ -1270,7 +1272,7 @@ static int nfsd4_do_async_copy(void *data)
> >
> > status = nfsd4_verify_copy(rqstp, cstate, &copy->cp_src_stateid,
> > &copy->file_src, &copy->cp_dst_stateid,
> > - &copy->file_dst);
> > + &copy->file_dst, NULL);
> > if (status)
> > goto out;
> >
> > @@ -1284,7 +1286,7 @@ static int nfsd4_do_async_copy(void *data)
> > async_copy = kzalloc(sizeof(struct nfsd4_copy), GFP_KERNEL);
> > if (!async_copy)
> > goto out;
> > - if (!nfs4_init_cp_state(nn, copy)) {
> > + if (!nfs4_init_copy_state(nn, copy)) {
> > kfree(async_copy);
> > goto out;
> > }
> > @@ -1348,6 +1350,43 @@ struct nfsd4_copy *
> > }
> >
> > static __be32
> > +nfsd4_copy_notify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > + union nfsd4_op_u *u)
> > +{
> > + struct nfsd4_copy_notify *cn = &u->copy_notify;
> > + __be32 status;
> > + struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> > + struct nfs4_stid *stid;
> > + struct nfs4_cpntf_state *cps;
> > +
> > + status = nfs4_preprocess_stateid_op(rqstp, cstate, &cstate->current_fh,
> > + &cn->cpn_src_stateid, RD_STATE, NULL,
> > + NULL, &stid);
> > + if (status)
> > + return status;
> > +
> > + cn->cpn_sec = nn->nfsd4_lease;
> > + cn->cpn_nsec = 0;
> > +
> > + status = nfserrno(-ENOMEM);
> > + cps = nfs4_alloc_init_cpntf_state(nn, stid);
> > + if (!cps)
> > + return status;
> > + memcpy(&cn->cpn_cnr_stateid, &cps->cp_stateid, sizeof(stateid_t));
> > +
> > + /**
> > + * For now, only return one server address in cpn_src, the
> > + * address used by the client to connect to this server.
> > + */
> > + cn->cpn_src.nl4_type = NL4_NETADDR;
> > + status = nfsd4_set_netaddr((struct sockaddr *)&rqstp->rq_daddr,
> > + &cn->cpn_src.u.nl4_addr);
> > + WARN_ON_ONCE(status);
> > +
> > + return status;
> > +}
> > +
> > +static __be32
> > nfsd4_fallocate(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > struct nfsd4_fallocate *fallocate, int flags)
> > {
> > @@ -2299,6 +2338,21 @@ static inline u32 nfsd4_offload_status_rsize(struct svc_rqst *rqstp,
> > 1 /* osr_complete<1> optional 0 for now */) * sizeof(__be32);
> > }
> >
> > +static inline u32 nfsd4_copy_notify_rsize(struct svc_rqst *rqstp,
> > + struct nfsd4_op *op)
> > +{
> > + return (op_encode_hdr_size +
> > + 3 /* cnr_lease_time */ +
> > + 1 /* We support one cnr_source_server */ +
> > + 1 /* cnr_stateid seq */ +
> > + op_encode_stateid_maxsz /* cnr_stateid */ +
> > + 1 /* num cnr_source_server*/ +
> > + 1 /* nl4_type */ +
> > + 1 /* nl4 size */ +
> > + XDR_QUADLEN(NFS4_OPAQUE_LIMIT) /*nl4_loc + nl4_loc_sz */)
> > + * sizeof(__be32);
> > +}
> > +
> > #ifdef CONFIG_NFSD_PNFS
> > static inline u32 nfsd4_getdeviceinfo_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > {
> > @@ -2723,6 +2777,12 @@ static inline u32 nfsd4_seek_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > .op_name = "OP_OFFLOAD_CANCEL",
> > .op_rsize_bop = nfsd4_only_status_rsize,
> > },
> > + [OP_COPY_NOTIFY] = {
> > + .op_func = nfsd4_copy_notify,
> > + .op_flags = OP_MODIFIES_SOMETHING,
> > + .op_name = "OP_COPY_NOTIFY",
> > + .op_rsize_bop = nfsd4_copy_notify_rsize,
> > + },
> > };
> >
> > /**
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index be3e967..eaa136f 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -697,6 +697,7 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
> > /* Will be incremented before return to client: */
> > refcount_set(&stid->sc_count, 1);
> > spin_lock_init(&stid->sc_lock);
> > + INIT_LIST_HEAD(&stid->sc_cp_list);
> >
> > /*
> > * It shouldn't be a problem to reuse an opaque stateid value.
> > @@ -716,24 +717,53 @@ struct nfs4_stid *nfs4_alloc_stid(struct nfs4_client *cl, struct kmem_cache *sla
> > /*
> > * Create a unique stateid_t to represent each COPY.
> > */
> > -int nfs4_init_cp_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > +static int nfs4_init_cp_state(struct nfsd_net *nn, void *ptr, stateid_t *stid)
> > {
> > int new_id;
> >
> > idr_preload(GFP_KERNEL);
> > spin_lock(&nn->s2s_cp_lock);
> > - new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, copy, 0, 0, GFP_NOWAIT);
> > + new_id = idr_alloc_cyclic(&nn->s2s_cp_stateids, ptr, 0, 0, GFP_NOWAIT);
> > spin_unlock(&nn->s2s_cp_lock);
> > idr_preload_end();
> > if (new_id < 0)
> > return 0;
> > - copy->cp_stateid.si_opaque.so_id = new_id;
> > - copy->cp_stateid.si_opaque.so_clid.cl_boot = nn->boot_time;
> > - copy->cp_stateid.si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> > + stid->si_opaque.so_id = new_id;
> > + stid->si_opaque.so_clid.cl_boot = nn->boot_time;
> > + stid->si_opaque.so_clid.cl_id = nn->s2s_cp_cl_id;
> > return 1;
> > }
> >
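After this refactor, nfs4_init_cp_state() stores an untyped pointer, so one cyclic IDR can number both nfsd4_copy and nfs4_cpntf_state objects. Each ID is stamped into a stateid alongside the server's boot time and s2s_cp_cl_id. In the kernel, idr_preload(GFP_KERNEL) does any sleeping allocation before s2s_cp_lock is taken, so the GFP_NOWAIT idr_alloc_cyclic() under the spinlock does not have to sleep. The userspace sketch below only mimics the cyclic-allocate-then-stamp pattern. The sketch_* types and functions are hypothetical, a fixed 8-slot table stands in for the IDR, and a pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define SKETCH_MAX_IDS 8	/* tiny table so wraparound is easy to see */

/* Hypothetical stand-in for struct idr plus s2s_cp_lock. */
struct sketch_idr {
	pthread_mutex_t lock;
	void *slots[SKETCH_MAX_IDS];	/* NULL means the id is free */
	int next;			/* where the cyclic search resumes */
};

/* Hypothetical stand-in for the stateid fields the patch fills in. */
struct sketch_stateid {
	uint32_t so_id;
	uint32_t cl_boot;
	uint32_t cl_id;
};

/*
 * Cyclic allocation: search from the cursor and wrap once, so a
 * just-freed id is not handed out again right away. Returns the id,
 * or -1 if every slot is in use.
 */
static int sketch_alloc_cyclic(struct sketch_idr *idr, void *ptr)
{
	int i, id = -1;

	assert(ptr != NULL);	/* NULL marks an empty slot here */
	pthread_mutex_lock(&idr->lock);
	for (i = 0; i < SKETCH_MAX_IDS; i++) {
		int cand = (idr->next + i) % SKETCH_MAX_IDS;

		if (!idr->slots[cand]) {
			idr->slots[cand] = ptr;
			idr->next = (cand + 1) % SKETCH_MAX_IDS;
			id = cand;
			break;
		}
	}
	pthread_mutex_unlock(&idr->lock);
	return id;
}

/* Mirrors the shape of nfs4_init_cp_state(): 1 on success, 0 on failure. */
static int sketch_init_cp_state(struct sketch_idr *idr, uint32_t boot_time,
				uint32_t cl_id, void *ptr,
				struct sketch_stateid *stid)
{
	int new_id = sketch_alloc_cyclic(idr, ptr);

	if (new_id < 0)
		return 0;
	stid->so_id = (uint32_t)new_id;
	stid->cl_boot = boot_time;
	stid->cl_id = cl_id;
	return 1;
}
```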
> > -void nfs4_free_cp_state(struct nfsd4_copy *copy)
> > +int nfs4_init_copy_state(struct nfsd_net *nn, struct nfsd4_copy *copy)
> > +{
> > + return nfs4_init_cp_state(nn, copy, &copy->cp_stateid);
> > +}
> > +
> > +struct nfs4_cpntf_state *nfs4_alloc_init_cpntf_state(struct nfsd_net *nn,
> > + struct nfs4_stid *p_stid)
> > +{
> > + struct nfs4_cpntf_state *cps;
> > +
> > + cps = kzalloc(sizeof(struct nfs4_cpntf_state), GFP_KERNEL);
> > + if (!cps)
> > + return NULL;
> > + if (!nfs4_init_cp_state(nn, cps, &cps->cp_stateid))
> > + goto out_free;
> > + cps->cp_p_stid = p_stid;
> > + cps->cp_active = false;
> > + cps->cp_timeout = jiffies + (nn->nfsd4_lease * HZ);
> > + INIT_LIST_HEAD(&cps->cp_list);
> > + spin_lock(&nn->s2s_cp_lock);
> > + list_add(&cps->cp_list, &p_stid->sc_cp_list);
> > + spin_unlock(&nn->s2s_cp_lock);
> > +
> > + return cps;
> > +out_free:
> > + kfree(cps);
> > + return NULL;
> > +}
> > +
> > +void nfs4_free_copy_state(struct nfsd4_copy *copy)
> > {
> > struct nfsd_net *nn;
> >
> > @@ -743,6 +773,27 @@ void nfs4_free_cp_state(struct nfsd4_copy *copy)
> > spin_unlock(&nn->s2s_cp_lock);
> > }
> >
> > +static void nfs4_free_cpntf_statelist(struct net *net, struct nfs4_stid *stid)
> > +{
> > + struct nfs4_cpntf_state *cps;
> > + struct nfsd_net *nn;
> > +
> > + nn = net_generic(net, nfsd_net_id);
> > +
> > + might_sleep();
>
> What's that for? Just remove it unless you've got some good reason.

I think this function was modeled on free_ol_stateid_reaplist(), which
has a might_sleep() in it. I don't see any reason we'd be sleeping in
nfs4_free_cpntf_statelist(), so I'll remove it.
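For context, the teardown only needs to unlink and free the copy-notify states hanging off the parent's sc_cp_list, which nfs4_alloc_init_cpntf_state() populates under s2s_cp_lock. If the kernel version walks the list under that same spinlock, it could not sleep there anyway. The userspace sketch below shows the link-on-allocate and unlink-and-free-on-teardown pair. All sketch_* names are hypothetical. A singly linked list replaces list_head, a pthread mutex replaces the spinlock, and a plain time_t stands in for the jiffies-based cp_timeout.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-ins for nfs4_cpntf_state and nfs4_stid. */
struct sketch_cpntf {
	struct sketch_cpntf *next;
	bool active;
	time_t timeout;			/* now + lease, like cp_timeout */
};

struct sketch_parent {
	struct sketch_cpntf *cp_list;	/* like sc_cp_list */
};

static pthread_mutex_t sketch_cp_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Allocate a child state and link it to its parent. The allocation
 * happens before the lock is taken; only the list insertion is done
 * under it.
 */
static struct sketch_cpntf *sketch_alloc_cpntf(struct sketch_parent *p,
					       time_t now, time_t lease)
{
	struct sketch_cpntf *cps;

	assert(p != NULL);
	cps = calloc(1, sizeof(*cps));
	if (!cps)
		return NULL;
	cps->active = false;
	cps->timeout = now + lease;
	pthread_mutex_lock(&sketch_cp_lock);
	cps->next = p->cp_list;
	p->cp_list = cps;
	pthread_mutex_unlock(&sketch_cp_lock);
	return cps;
}

/*
 * Free every child when the parent goes away. The loop body only
 * unlinks and frees, so nothing inside it has a reason to block.
 * Returns the number of states freed.
 */
static int sketch_free_cpntf_list(struct sketch_parent *p)
{
	int n = 0;

	assert(p != NULL);
	pthread_mutex_lock(&sketch_cp_lock);
	while (p->cp_list) {
		struct sketch_cpntf *cps = p->cp_list;

		p->cp_list = cps->next;
		free(cps);
		n++;
	}
	pthread_mutex_unlock(&sketch_cp_lock);
	return n;
}
```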

>
> --b.