2015-03-16 21:17:54

by Anna Schumaker

Subject: [PATCH v3 0/5] NFS: Add READ_PLUS support

These patches add client support for the NFS v4.2 operation READ_PLUS. This
operation is triggered by doing any kind of read on a NFS v4.2 mounted
filesystem.

I tested these patches using xfstests and compared the runtime between NFS
v4.1, NFS v4.2 without READ_PLUS, and NFS v4.2 with READ_PLUS. Here are
the results:


Test        |   v4.1    | no READ_PLUS | READ_PLUS
---------------------------------------------------
generic/013 |   135s    |     143s     |   123s
generic/075 |     4s    |      11s     |     4s
generic/091 |     9s    |      16s     |    15s
generic/112 |     4s    |       7s     |     6s
generic/127 |    62s    |     117s     |   114s
generic/213 | [not run] |       1s     |     1s
generic/214 | [not run] |       0s     |     0s
generic/228 | [not run] |       1s     |     2s
generic/236 |     1s    |       1s     |     1s
generic/263 |     4s    |       6s     |     8s
generic/285 |     0s    |       1s     |     3s
generic/315 | [not run] |       2s     |     1s
---------------------------------------------------
Total       |  3:47.47  |    5:11.85   |  4:43.77


Using the READ_PLUS operation does improve read times for sparse
files over the network. There is still a big difference between the
v4.1 and v4.2 runtimes, but this looks to be a result of fallocate()
tests that only run over v4.2.


Changes since v2:
- Merge together patches adding DATA and HOLE segment support.
- Simplify hole zeroing code by using the zero_user_segment() function.

These patches and the corresponding server changes are available in the
[read_plus] branch of

git://git.linux-nfs.org/projects/anna/linux-nfs.git

Questions? Comments? Thoughts?

Anna


Anna Schumaker (5):
SUNRPC: Split out a function for setting current page
SUNRPC: Add the ability to expand holes in data pages
NFS: Add basic READ_PLUS support
SUNRPC: Add the ability to shift data to a specific offset
NFS: Add support for decoding multiple segments

fs/nfs/nfs42xdr.c | 162 ++++++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 30 +++++-
fs/nfs/nfs4xdr.c | 1 +
include/linux/nfs4.h | 1 +
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfs_xdr.h | 2 +-
include/linux/sunrpc/xdr.h | 2 +
net/sunrpc/xdr.c | 240 ++++++++++++++++++++++++++++++++++++++++++++-
8 files changed, 434 insertions(+), 5 deletions(-)

--
2.3.3



2015-03-16 21:17:55

by Anna Schumaker

Subject: [PATCH v3 1/5] SUNRPC: Split out a function for setting current page

I'm going to need this bit of code in a few places for READ_PLUS
decoding, so let's make it a helper function.

Signed-off-by: Anna Schumaker <[email protected]>
---
net/sunrpc/xdr.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 4439ac4..b1c4ffb 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -759,6 +759,12 @@ static int xdr_set_page_base(struct xdr_stream *xdr,
return 0;
}

+static void xdr_set_page(struct xdr_stream *xdr, unsigned int base)
+{
+ if (xdr_set_page_base(xdr, base, PAGE_SIZE) < 0)
+ xdr_set_iov(xdr, xdr->buf->tail, xdr->buf->len);
+}
+
static void xdr_set_next_page(struct xdr_stream *xdr)
{
unsigned int newbase;
@@ -766,8 +772,7 @@ static void xdr_set_next_page(struct xdr_stream *xdr)
newbase = (1 + xdr->page_ptr - xdr->buf->pages) << PAGE_SHIFT;
newbase -= xdr->buf->page_base;

- if (xdr_set_page_base(xdr, newbase, PAGE_SIZE) < 0)
- xdr_set_iov(xdr, xdr->buf->tail, xdr->buf->len);
+ xdr_set_page(xdr, newbase);
}

static bool xdr_set_next_buffer(struct xdr_stream *xdr)
--
2.3.3


2015-03-16 21:17:56

by Anna Schumaker

Subject: [PATCH v3 2/5] SUNRPC: Add the ability to expand holes in data pages

This patch adds the ability to "read a hole" into a set of XDR data
pages by taking the following steps:

1) Shift all data after the current xdr->p to the right, possibly into
the tail,
2) Zero the specified range, and
3) Update xdr->p to point beyond the hole.

Signed-off-by: Anna Schumaker <[email protected]>
---
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/xdr.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 100 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 70c6b92..81c5a3f 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -229,6 +229,7 @@ extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes);
extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len);
extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len);
extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data);
+extern size_t xdr_expand_hole(struct xdr_stream *, size_t, size_t);

#endif /* __KERNEL__ */

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index b1c4ffb..be154e7 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -219,6 +219,40 @@ _shift_data_right_pages(struct page **pages, size_t pgto_base,
} while ((len -= copy) != 0);
}

+static void
+_shift_data_right_tail(struct xdr_buf *buf, size_t pgfrom_base, size_t len)
+{
+ struct kvec *tail = buf->tail;
+
+ /* Make room for new data. */
+ if (tail->iov_len > 0)
+ memmove((char *)tail->iov_base + len, tail->iov_base, tail->iov_len);
+
+ _copy_from_pages((char *)tail->iov_base,
+ buf->pages,
+ buf->page_base + pgfrom_base,
+ len);
+
+ tail->iov_len += len;
+}
+
+static void
+_shift_data_right(struct xdr_buf *buf, size_t to, size_t from, size_t len)
+{
+ size_t shift = len;
+
+ if ((to + len) > buf->page_len) {
+ shift = (to + len) - buf->page_len;
+ _shift_data_right_tail(buf, (from + len) - shift, shift);
+ shift = len - shift;
+ }
+
+ _shift_data_right_pages(buf->pages,
+ buf->page_base + to,
+ buf->page_base + from,
+ shift);
+}
+
/**
* _copy_to_pages
* @pages: array of pages
@@ -304,6 +338,33 @@ _copy_from_pages(char *p, struct page **pages, size_t pgbase, size_t len)
EXPORT_SYMBOL_GPL(_copy_from_pages);

/**
+ * _zero_data_pages
+ * @pages: array of pages
+ * @pgbase: beginning page vector address
+ * @len: length
+ */
+static void
+_zero_data_pages(struct page **pages, size_t pgbase, size_t len)
+{
+ struct page **page;
+ size_t zero;
+
+ page = pages + (pgbase >> PAGE_CACHE_SHIFT);
+ pgbase &= ~PAGE_CACHE_MASK;
+
+ do {
+ zero = len;
+ if (pgbase + zero > PAGE_SIZE)
+ zero = PAGE_SIZE - pgbase;
+
+ zero_user_segment(*page, pgbase, pgbase + zero);
+ page++;
+ pgbase = 0;
+
+ } while ((len -= zero) != 0);
+}
+
+/**
* xdr_shrink_bufhead
* @buf: xdr_buf
* @len: bytes to remove from buf->head[0]
@@ -445,6 +506,24 @@ unsigned int xdr_stream_pos(const struct xdr_stream *xdr)
EXPORT_SYMBOL_GPL(xdr_stream_pos);

/**
+ * xdr_page_pos - Return the current offset from the start of the xdr->buf->pages
+ * @xdr: pointer to struct xdr_stream
+ */
+static size_t xdr_page_pos(const struct xdr_stream *xdr)
+{
+ unsigned int offset;
+ unsigned int base = xdr->buf->page_len;
+ void *kaddr = xdr->buf->tail->iov_base;
+
+ if (xdr->page_ptr) {
+ base = (xdr->page_ptr - xdr->buf->pages) * PAGE_SIZE;
+ kaddr = page_address(*xdr->page_ptr);
+ }
+ offset = xdr->p - (__be32 *)kaddr;
+ return base + (offset * sizeof(__be32));
+}
+
+/**
* xdr_init_encode - Initialize a struct xdr_stream for sending data.
* @xdr: pointer to xdr_stream struct
* @buf: pointer to XDR buffer in which to encode data
@@ -976,6 +1055,26 @@ unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len)
}
EXPORT_SYMBOL_GPL(xdr_read_pages);

+size_t xdr_expand_hole(struct xdr_stream *xdr, size_t offset, size_t length)
+{
+ struct xdr_buf *buf = xdr->buf;
+ size_t from = 0;
+
+ if ((offset + length) > buf->page_len)
+ length = buf->page_len - offset;
+
+ if (offset == 0)
+ xdr_align_pages(xdr, xdr->nwords << 2);
+ else
+ from = xdr_page_pos(xdr);
+
+ _shift_data_right(buf, offset + length, from, xdr->nwords << 2);
+ _zero_data_pages(buf->pages, buf->page_base + offset, length);
+ xdr_set_page(xdr, offset + length);
+ return length;
+}
+EXPORT_SYMBOL_GPL(xdr_expand_hole);
+
/**
* xdr_enter_page - decode data from the XDR page
* @xdr: pointer to xdr_stream struct
--
2.3.3


2015-03-16 21:17:57

by Anna Schumaker

Subject: [PATCH v3 3/5] NFS: Add basic READ_PLUS support

This patch adds support for decoding a single NFS4_CONTENT_DATA or
NFS4_CONTENT_HOLE segment returned by the server. This gives a simple
implementation that does not need to spend a lot of time shifting data
around.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfs/nfs42xdr.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 30 ++++++++-
fs/nfs/nfs4xdr.c | 1 +
include/linux/nfs4.h | 1 +
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfs_xdr.h | 2 +-
6 files changed, 190 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 1a25b27..4d7bd89 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -13,6 +13,14 @@
#define encode_deallocate_maxsz (op_encode_hdr_maxsz + \
encode_fallocate_maxsz)
#define decode_deallocate_maxsz (op_decode_hdr_maxsz)
+#define encode_read_plus_maxsz (op_encode_hdr_maxsz + \
+ encode_stateid_maxsz + 3)
+#define decode_read_plus_maxsz (op_decode_hdr_maxsz + \
+ 1 /* rpr_eof */ + \
+ 1 /* rpr_contents count */ + \
+ 1 /* data_content4 */ + \
+ 2 /* data_info4.di_offset */ + \
+ 2 /* data_info4.di_length */)
#define encode_seek_maxsz (op_encode_hdr_maxsz + \
encode_stateid_maxsz + \
2 /* offset */ + \
@@ -39,6 +47,12 @@
decode_putfh_maxsz + \
decode_deallocate_maxsz + \
decode_getattr_maxsz)
+#define NFS4_enc_read_plus_sz (compound_encode_hdr_maxsz + \
+ encode_putfh_maxsz + \
+ encode_read_plus_maxsz)
+#define NFS4_dec_read_plus_sz (compound_decode_hdr_maxsz + \
+ decode_putfh_maxsz + \
+ decode_read_plus_maxsz)
#define NFS4_enc_seek_sz (compound_encode_hdr_maxsz + \
encode_putfh_maxsz + \
encode_seek_maxsz)
@@ -71,6 +85,16 @@ static void encode_deallocate(struct xdr_stream *xdr,
encode_fallocate(xdr, args);
}

+static void encode_read_plus(struct xdr_stream *xdr,
+ struct nfs_pgio_args *args,
+ struct compound_hdr *hdr)
+{
+ encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
+ encode_nfs4_stateid(xdr, &args->stateid);
+ encode_uint64(xdr, args->offset);
+ encode_uint32(xdr, args->count);
+}
+
static void encode_seek(struct xdr_stream *xdr,
struct nfs42_seek_args *args,
struct compound_hdr *hdr)
@@ -120,6 +144,28 @@ static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
}

/*
+ * Encode READ_PLUS request
+ */
+static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ struct nfs_pgio_args *args)
+{
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->fh, &hdr);
+ encode_read_plus(xdr, args, &hdr);
+
+ xdr_inline_pages(&req->rq_rcv_buf, hdr.replen << 2,
+ args->pages, args->pgbase, args->count);
+ req->rq_rcv_buf.flags |= XDRBUF_READ;
+ encode_nops(&hdr);
+}
+
+/*
* Encode SEEK request
*/
static void nfs4_xdr_enc_seek(struct rpc_rqst *req,
@@ -147,6 +193,92 @@ static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *re
return decode_op_hdr(xdr, OP_DEALLOCATE);
}

+static int decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *res)
+{
+ __be32 *p;
+ uint32_t count, recvd;
+ uint64_t offset;
+
+ p = xdr_inline_decode(xdr, 8 + 4);
+ if (unlikely(!p))
+ goto out_overflow;
+
+ p = xdr_decode_hyper(p, &offset);
+ count = be32_to_cpup(p);
+
+ recvd = xdr_read_pages(xdr, count);
+ if (recvd < count)
+ res->eof = 0;
+
+ res->count = recvd;
+ return 0;
+out_overflow:
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+}
+
+static int decode_read_plus_hole(struct xdr_stream *xdr, struct nfs_pgio_res *res)
+{
+ __be32 *p;
+ uint64_t offset, length;
+ size_t recvd;
+
+ p = xdr_inline_decode(xdr, 8 + 8);
+ if (unlikely(!p))
+ goto out_overflow;
+
+ p = xdr_decode_hyper(p, &offset);
+ p = xdr_decode_hyper(p, &length);
+
+ recvd = xdr_expand_hole(xdr, 0, length);
+ if (recvd < length)
+ res->eof = 0;
+
+ res->count = recvd;
+ return 0;
+out_overflow:
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+}
+
+static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
+{
+ __be32 *p;
+ int status, type;
+ uint32_t segments;
+
+ status = decode_op_hdr(xdr, OP_READ_PLUS);
+ if (status)
+ return status;
+
+ p = xdr_inline_decode(xdr, 4 + 4);
+ if (unlikely(!p))
+ goto out_overflow;
+
+ res->count = 0;
+ res->eof = be32_to_cpup(p++);
+ segments = be32_to_cpup(p++);
+ if (segments == 0)
+ return 0;
+
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ goto out_overflow;
+
+ type = be32_to_cpup(p++);
+ if (type == NFS4_CONTENT_DATA)
+ status = decode_read_plus_data(xdr, res);
+ else
+ status = decode_read_plus_hole(xdr, res);
+
+ if (segments > 1)
+ res->eof = 0;
+ return status;
+out_overflow:
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+}
+
static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res *res)
{
int status;
@@ -224,6 +356,32 @@ out:
}

/*
+ * Decode READ_PLUS request
+ */
+static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
+ struct xdr_stream *xdr,
+ struct nfs_pgio_res *res)
+{
+ struct compound_hdr hdr;
+ int status;
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, rqstp);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_read_plus(xdr, res);
+ if (!status)
+ status = res->count;
+out:
+ return status;
+}
+
+/*
* Decode SEEK request
*/
static int nfs4_xdr_dec_seek(struct rpc_rqst *rqstp,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9ff8c63..175d01f 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -68,6 +68,10 @@

#include "nfs4trace.h"

+#ifdef CONFIG_NFS_V4_2
+#include "nfs42.h"
+#endif /* CONFIG_NFS_V4_2 */
+
#define NFSDBG_FACILITY NFSDBG_PROC

#define NFS4_POLL_RETRY_MIN (HZ/10)
@@ -4187,9 +4191,15 @@ static bool nfs4_read_stateid_changed(struct rpc_task *task,

static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
{
-
+ struct nfs_server *server = NFS_SERVER(hdr->inode);
dprintk("--> %s\n", __func__);

+ if ((server->caps & NFS_CAP_READ_PLUS) && (task->tk_status == -ENOTSUPP)) {
+ server->caps &= ~NFS_CAP_READ_PLUS;
+ if (rpc_restart_call_prepare(task))
+ task->tk_status = 0;
+ return -EAGAIN;
+ }
if (!nfs4_sequence_done(task, &hdr->res.seq_res))
return -EAGAIN;
if (nfs4_read_stateid_changed(task, &hdr->args))
@@ -4198,12 +4208,27 @@ static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
nfs4_read_done_cb(task, hdr);
}

+#ifdef CONFIG_NFS_V4_2
+static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
+{
+ if (server->caps & NFS_CAP_READ_PLUS)
+ msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
+ else
+ msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+}
+#else
+static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
+{
+ msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+}
+#endif /* CONFIG_NFS_V4_2 */
+
static void nfs4_proc_read_setup(struct nfs_pgio_header *hdr,
struct rpc_message *msg)
{
hdr->timestamp = jiffies;
hdr->pgio_done_cb = nfs4_read_done_cb;
- msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+ nfs42_read_plus_support(NFS_SERVER(hdr->inode), msg);
nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0);
}

@@ -8556,6 +8581,7 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
| NFS_CAP_ATOMIC_OPEN_V1
| NFS_CAP_ALLOCATE
| NFS_CAP_DEALLOCATE
+ | NFS_CAP_READ_PLUS
| NFS_CAP_SEEK,
.init_client = nfs41_init_client,
.shutdown_client = nfs41_shutdown_client,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 6b28a60..39e521a 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7421,6 +7421,7 @@ struct rpc_procinfo nfs4_procedures[] = {
PROC(SEEK, enc_seek, dec_seek),
PROC(ALLOCATE, enc_allocate, dec_allocate),
PROC(DEALLOCATE, enc_deallocate, dec_deallocate),
+ PROC(READ_PLUS, enc_read_plus, dec_read_plus),
#endif /* CONFIG_NFS_V4_2 */
};

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index ed43cb7..2470e62 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -493,6 +493,7 @@ enum {
NFSPROC4_CLNT_SEEK,
NFSPROC4_CLNT_ALLOCATE,
NFSPROC4_CLNT_DEALLOCATE,
+ NFSPROC4_CLNT_READ_PLUS,
};

/* nfs41 types */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 5e1273d..728d928 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -237,5 +237,6 @@ struct nfs_server {
#define NFS_CAP_SEEK (1U << 19)
#define NFS_CAP_ALLOCATE (1U << 20)
#define NFS_CAP_DEALLOCATE (1U << 21)
+#define NFS_CAP_READ_PLUS (1U << 22)

#endif
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 93ab607..35651b2 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -520,7 +520,7 @@ struct nfs_pgio_args {
struct nfs_pgio_res {
struct nfs4_sequence_res seq_res;
struct nfs_fattr * fattr;
- __u32 count;
+ __u64 count;
__u32 op_status;
int eof; /* used by read */
struct nfs_writeverf * verf; /* used by write */
--
2.3.3


2015-03-16 21:17:58

by Anna Schumaker

Subject: [PATCH v3 4/5] SUNRPC: Add the ability to shift data to a specific offset

Expanding holes tends to put the data content a few bytes to the right
of where we want it. This patch implements a left-shift operation to
line everything up properly.

Signed-off-by: Anna Schumaker <[email protected]>
---
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/xdr.c | 132 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 133 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 81c5a3f..3670bf6 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -230,6 +230,7 @@ extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len);
extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len);
extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data);
extern size_t xdr_expand_hole(struct xdr_stream *, size_t, size_t);
+extern uint64_t xdr_align_data(struct xdr_stream *, uint64_t, uint64_t);

#endif /* __KERNEL__ */

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index be154e7..38efc8f 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -16,6 +16,9 @@
#include <linux/sunrpc/xdr.h>
#include <linux/sunrpc/msg_prot.h>

+static void _copy_to_pages(struct page **, size_t, const char *, size_t);
+
+
/*
* XDR functions for basic NFS types
*/
@@ -253,6 +256,117 @@ _shift_data_right(struct xdr_buf *buf, size_t to, size_t from, size_t len)
shift);
}

+
+/**
+ * _shift_data_left_pages
+ * @pages: vector of pages containing both the source and dest memory area.
+ * @pgto_base: page vector address of destination
+ * @pgfrom_base: page vector address of source
+ * @len: number of bytes to copy
+ *
+ * Note: the addresses pgto_base and pgfrom_base are both calculated in
+ * the same way:
+ * if a memory area starts at byte 'base' in page 'pages[i]',
+ * then its address is given as (i << PAGE_CACHE_SHIFT) + base
+ * Also note: pgto_base must be < pgfrom_base, but the memory areas
+ * they point to may overlap.
+ */
+static void
+_shift_data_left_pages(struct page **pages, size_t pgto_base,
+ size_t pgfrom_base, size_t len)
+{
+ struct page **pgfrom, **pgto;
+ char *vfrom, *vto;
+ size_t copy;
+
+ BUG_ON(pgfrom_base <= pgto_base);
+
+ pgto = pages + (pgto_base >> PAGE_CACHE_SHIFT);
+ pgfrom = pages + (pgfrom_base >> PAGE_CACHE_SHIFT);
+
+ pgto_base = pgto_base % PAGE_CACHE_SIZE;
+ pgfrom_base = pgfrom_base % PAGE_CACHE_SIZE;
+
+ do {
+ if (pgto_base >= PAGE_CACHE_SIZE) {
+ pgto_base = 0;
+ pgto++;
+ }
+ if (pgfrom_base >= PAGE_CACHE_SIZE) {
+ pgfrom_base = 0;
+ pgfrom++;
+ }
+
+ copy = len;
+ if (copy > (PAGE_CACHE_SIZE - pgto_base))
+ copy = PAGE_CACHE_SIZE - pgto_base;
+ if (copy > (PAGE_CACHE_SIZE - pgfrom_base))
+ copy = PAGE_CACHE_SIZE - pgfrom_base;
+
+ vto = kmap_atomic(*pgto);
+ if (*pgto != *pgfrom) {
+ vfrom = kmap_atomic(*pgfrom);
+ memcpy(vto + pgto_base, vfrom + pgfrom_base, copy);
+ kunmap_atomic(vfrom);
+ } else
+ memmove(vto + pgto_base, vto + pgfrom_base, copy);
+ flush_dcache_page(*pgto);
+ kunmap_atomic(vto);
+
+ pgto_base += copy;
+ pgfrom_base += copy;
+
+ } while ((len -= copy) != 0);
+}
+
+static void
+_shift_data_left_tail(struct xdr_buf *buf, size_t pgto_base,
+ size_t tail_from, size_t len)
+{
+ struct kvec *tail = buf->tail;
+ size_t shift = len;
+
+ if (len == 0)
+ return;
+ if (pgto_base + len > buf->page_len)
+ shift = buf->page_len - pgto_base;
+
+ _copy_to_pages(buf->pages,
+ buf->page_base + pgto_base,
+ (char *)(tail->iov_base + tail_from),
+ shift);
+
+ memmove((char *)tail->iov_base, tail->iov_base + tail_from + shift, tail->iov_len - (tail_from + shift));
+ tail->iov_len -= (tail_from + shift);
+}
+
+static void
+_shift_data_left(struct xdr_buf *buf, size_t to, size_t from, size_t len)
+{
+ size_t shift = len;
+
+ if (from < buf->page_len) {
+ shift = min(len, buf->page_len - from);
+ _shift_data_left_pages(buf->pages,
+ buf->page_base + to,
+ buf->page_base + from,
+ shift);
+ to += shift;
+ from += shift;
+ shift = len - shift;
+ }
+
+ if (shift == 0)
+ return;
+ if (from >= buf->page_len)
+ from -= buf->page_len;
+
+ _shift_data_left_tail(buf, to, from, shift);
+}
+
/**
* _copy_to_pages
* @pages: array of pages
@@ -1075,6 +1189,24 @@ size_t xdr_expand_hole(struct xdr_stream *xdr, size_t offset, size_t length)
}
EXPORT_SYMBOL_GPL(xdr_expand_hole);

+uint64_t xdr_align_data(struct xdr_stream *xdr, uint64_t offset, uint64_t length)
+{
+ struct xdr_buf *buf = xdr->buf;
+
+ if (offset + length > buf->page_len)
+ length = buf->page_len - offset;
+
+ if (offset == 0)
+ xdr_align_pages(xdr, xdr->nwords << 2);
+ else
+ _shift_data_left(buf, offset, xdr_page_pos(xdr), xdr->nwords << 2);
+
+ xdr->nwords -= XDR_QUADLEN(length);
+ xdr_set_page(xdr, offset + length);
+ return length;
+}
+EXPORT_SYMBOL_GPL(xdr_align_data);
+
/**
* xdr_enter_page - decode data from the XDR page
* @xdr: pointer to xdr_stream struct
--
2.3.3


2015-03-16 21:17:59

by Anna Schumaker

Subject: [PATCH v3 5/5] NFS: Add support for decoding multiple segments

We now have everything we need to read holes and then shift data to
where it's supposed to be.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfs/nfs42xdr.c | 40 ++++++++++++++++++++++------------------
1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 4d7bd89..2a93abc 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -206,11 +206,11 @@ static int decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *re
p = xdr_decode_hyper(p, &offset);
count = be32_to_cpup(p);

- recvd = xdr_read_pages(xdr, count);
+ recvd = xdr_align_data(xdr, res->count, count);
if (recvd < count)
res->eof = 0;

- res->count = recvd;
+ res->count += recvd;
return 0;
out_overflow:
print_overflow_msg(__func__, xdr);
@@ -230,11 +230,11 @@ static int decode_read_plus_hole(struct xdr_stream *xdr, struct nfs_pgio_res *re
p = xdr_decode_hyper(p, &offset);
p = xdr_decode_hyper(p, &length);

- recvd = xdr_expand_hole(xdr, 0, length);
+ recvd = xdr_expand_hole(xdr, res->count, length);
if (recvd < length)
res->eof = 0;

- res->count = recvd;
+ res->count += recvd;
return 0;
out_overflow:
print_overflow_msg(__func__, xdr);
@@ -245,7 +245,7 @@ static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
{
__be32 *p;
int status, type;
- uint32_t segments;
+ uint32_t i, segments;

status = decode_op_hdr(xdr, OP_READ_PLUS);
if (status)
@@ -258,20 +258,24 @@ static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
res->count = 0;
res->eof = be32_to_cpup(p++);
segments = be32_to_cpup(p++);
- if (segments == 0)
- return 0;

- p = xdr_inline_decode(xdr, 4);
- if (unlikely(!p))
- goto out_overflow;
-
- type = be32_to_cpup(p++);
- if (type == NFS4_CONTENT_DATA)
- status = decode_read_plus_data(xdr, res);
- else
- status = decode_read_plus_hole(xdr, res);
-
- if (segments > 1)
+ for (i = 0; i < segments; i++) {
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ goto out_overflow;
+
+ type = be32_to_cpup(p);
+ if (type == NFS4_CONTENT_DATA)
+ status = decode_read_plus_data(xdr, res);
+ else
+ status = decode_read_plus_hole(xdr, res);
+ if (status)
+ break;
+ if (res->count == xdr->buf->page_len)
+ break;
+ }
+
+ if (i < segments)
res->eof = 0;
return status;
out_overflow:
--
2.3.3


2015-03-17 10:31:54

by Christoph Hellwig

Subject: Re: [PATCH v3 0/5] NFS: Add READ_PLUS support

On Mon, Mar 16, 2015 at 05:17:41PM -0400, Anna Schumaker wrote:
> Test        |   v4.1    | no READ_PLUS | READ_PLUS
> ---------------------------------------------------
> generic/013 |   135s    |     143s     |   123s
> generic/075 |     4s    |      11s     |     4s
> generic/091 |     9s    |      16s     |    15s
> generic/112 |     4s    |       7s     |     6s
> generic/127 |    62s    |     117s     |   114s
> generic/213 | [not run] |       1s     |     1s
> generic/214 | [not run] |       0s     |     0s
> generic/228 | [not run] |       1s     |     2s
> generic/236 |     1s    |       1s     |     1s
> generic/263 |     4s    |       6s     |     8s
> generic/285 |     0s    |       1s     |     3s
> generic/315 | [not run] |       2s     |     1s
> ---------------------------------------------------
> Total       |  3:47.47  |    5:11.85   |  4:43.77

So why is 4.2 generally slower than 4.1? I guess we really should look
into that first.

2015-03-17 12:30:27

by Anna Schumaker

Subject: Re: [PATCH v3 0/5] NFS: Add READ_PLUS support

On 03/17/2015 06:31 AM, Christoph Hellwig wrote:
> On Mon, Mar 16, 2015 at 05:17:41PM -0400, Anna Schumaker wrote:
>> Test        |   v4.1    | no READ_PLUS | READ_PLUS
>> ---------------------------------------------------
>> generic/013 |   135s    |     143s     |   123s
>> generic/075 |     4s    |      11s     |     4s
>> generic/091 |     9s    |      16s     |    15s
>> generic/112 |     4s    |       7s     |     6s
>> generic/127 |    62s    |     117s     |   114s
>> generic/213 | [not run] |       1s     |     1s
>> generic/214 | [not run] |       0s     |     0s
>> generic/228 | [not run] |       1s     |     2s
>> generic/236 |     1s    |       1s     |     1s
>> generic/263 |     4s    |       6s     |     8s
>> generic/285 |     0s    |       1s     |     3s
>> generic/315 | [not run] |       2s     |     1s
>> ---------------------------------------------------
>> Total       |  3:47.47  |    5:11.85   |  4:43.77
>
> So why is 4.2 generally slower than 4.1? I guess we really should look
> into that first.
>

I did! Several tests run over v4.1 but skip their fallocate() portions, since v4.1 doesn't support fallocate(). Over v4.2 those portions actually run, adding to the total time.

2015-03-17 19:08:44

by J. Bruce Fields

Subject: Re: [PATCH v3 0/5] NFS: Add READ_PLUS support

On Mon, Mar 16, 2015 at 05:17:41PM -0400, Anna Schumaker wrote:
> These patches add client support for the NFS v4.2 operation READ_PLUS. This
> operation is triggered by doing any kind of read on a NFS v4.2 mounted
> filesystem.
>
> I tested these patches using xfstests and compared the runtime between NFS
> v4.1, NFS v4.2 without READ_PLUS, and NFS v4.2 with READ_PLUS. Here are
> the results:

These are interesting but don't tell us much on their own. As far as I
know xfstests isn't really intended to be used for performance
comparisons, and there are some mysteries here:

> Test        |   v4.1    | no READ_PLUS | READ_PLUS
> ---------------------------------------------------
> generic/013 |   135s    |     143s     |   123s

This is an fsstress run. Looking quickly at ltp/fsstress.c, I see it
can do fallocate calls, so presumably that explains the difference.

> generic/075 |     4s    |      11s     |     4s

Ditto for fsx.

> generic/091 |     9s    |      16s     |    15s
> generic/112 |     4s    |       7s     |     6s
> generic/127 |    62s    |     117s     |   114s

fsx again; this one must really have a ton of fallocate calls? Might be
interesting to look at strace or /proc/self/mountstats to confirm that.

> generic/213 | [not run] |       1s     |     1s
> generic/214 | [not run] |       0s     |     0s
> generic/228 | [not run] |       1s     |     2s
> generic/236 |     1s    |       1s     |     1s
> generic/263 |     4s    |       6s     |     8s
> generic/285 |     0s    |       1s     |     3s
> generic/315 | [not run] |       2s     |     1s
> ---------------------------------------------------
> Total       |  3:47.47  |    5:11.85   |  4:43.77

I'm already inclined to believe that the bandwidth savings are more
important than zero-copy in the bad cases. And the xfstests timings
count for something. But I'd still rather see more of an argument.

It'd be interesting to see the best possible case (reading a large file
that's all one hole), and the worst (reading a file with alternating 4K
data and holes?).

--b.

>
>
> Using the READ_PLUS operation does have an impact on reading sparse
> files over the network. There is still a big difference between v4.1
> and v4.2 runtimes, but this looks to be a result of fallocate() tests
> that only run over v4.2.
>
>
> Changes since v2:
> - Merge together patches adding DATA and HOLE segment support.
> - Simplify hole zeroing code by using the zero_user_segment() function.
>
> These patches and the corresponding server changes are available in the
> [read_plus] branch of
>
> git://git.linux-nfs.org/projects/anna/linux-nfs.git
>
> Questions? Comments? Thoughts?
>
> Anna
>
>
> Anna Schumaker (5):
> SUNRPC: Split out a function for setting current page
> SUNRPC: Add the ability to expand holes in data pages
> NFS: Add basic READ_PLUS support
> SUNRPC: Add the ability to shift data to a specific offset
> NFS: Add support for decoding multiple segments
>
> fs/nfs/nfs42xdr.c | 162 ++++++++++++++++++++++++++++++
> fs/nfs/nfs4proc.c | 30 +++++-
> fs/nfs/nfs4xdr.c | 1 +
> include/linux/nfs4.h | 1 +
> include/linux/nfs_fs_sb.h | 1 +
> include/linux/nfs_xdr.h | 2 +-
> include/linux/sunrpc/xdr.h | 2 +
> net/sunrpc/xdr.c | 240 ++++++++++++++++++++++++++++++++++++++++++++-
> 8 files changed, 434 insertions(+), 5 deletions(-)
>
> --
> 2.3.3
>

2015-03-17 19:14:46

by J. Bruce Fields

Subject: Re: [PATCH v3 3/5] NFS: Add basic READ_PLUS support

On Mon, Mar 16, 2015 at 05:17:44PM -0400, Anna Schumaker wrote:
> This patch adds support for decoding a single NFS4_CONTENT_DATA or
> NFS4_CONTENT_HOLE segment returned by the server. This gives a simple
> implementation that does not need to spend a lot of time shifting data
> around.

So to make sure I understand the code correctly: if the server returns
(say) 10 segments, the client won't error out--it will just use the
information from the first segment, ignore the remaining 9, and send
more reads if necessary to get the rest.

So that looks simple and correct. The worst case for performance would
probably be reading a file that alternates small data and hole segments.

--b.

>
> Signed-off-by: Anna Schumaker <[email protected]>
> ---
> fs/nfs/nfs42xdr.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfs4proc.c | 30 ++++++++-
> fs/nfs/nfs4xdr.c | 1 +
> include/linux/nfs4.h | 1 +
> include/linux/nfs_fs_sb.h | 1 +
> include/linux/nfs_xdr.h | 2 +-
> 6 files changed, 190 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> index 1a25b27..4d7bd89 100644
> --- a/fs/nfs/nfs42xdr.c
> +++ b/fs/nfs/nfs42xdr.c
> @@ -13,6 +13,14 @@
> #define encode_deallocate_maxsz (op_encode_hdr_maxsz + \
> encode_fallocate_maxsz)
> #define decode_deallocate_maxsz (op_decode_hdr_maxsz)
> +#define encode_read_plus_maxsz (op_encode_hdr_maxsz + \
> + encode_stateid_maxsz + 3)
> +#define decode_read_plus_maxsz (op_decode_hdr_maxsz + \
> + 1 /* rpr_eof */ + \
> + 1 /* rpr_contents count */ + \
> + 1 /* data_content4 */ + \
> + 2 /* data_info4.di_offset */ + \
> + 2 /* data_info4.di_length */)
> #define encode_seek_maxsz (op_encode_hdr_maxsz + \
> encode_stateid_maxsz + \
> 2 /* offset */ + \
> @@ -39,6 +47,12 @@
> decode_putfh_maxsz + \
> decode_deallocate_maxsz + \
> decode_getattr_maxsz)
> +#define NFS4_enc_read_plus_sz (compound_encode_hdr_maxsz + \
> + encode_putfh_maxsz + \
> + encode_read_plus_maxsz)
> +#define NFS4_dec_read_plus_sz (compound_decode_hdr_maxsz + \
> + decode_putfh_maxsz + \
> + decode_read_plus_maxsz)
> #define NFS4_enc_seek_sz (compound_encode_hdr_maxsz + \
> encode_putfh_maxsz + \
> encode_seek_maxsz)
> @@ -71,6 +85,16 @@ static void encode_deallocate(struct xdr_stream *xdr,
> encode_fallocate(xdr, args);
> }
>
> +static void encode_read_plus(struct xdr_stream *xdr,
> + struct nfs_pgio_args *args,
> + struct compound_hdr *hdr)
> +{
> + encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
> + encode_nfs4_stateid(xdr, &args->stateid);
> + encode_uint64(xdr, args->offset);
> + encode_uint32(xdr, args->count);
> +}
> +
> static void encode_seek(struct xdr_stream *xdr,
> struct nfs42_seek_args *args,
> struct compound_hdr *hdr)
> @@ -120,6 +144,28 @@ static void nfs4_xdr_enc_deallocate(struct rpc_rqst *req,
> }
>
> /*
> + * Encode READ_PLUS request
> + */
> +static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
> + struct xdr_stream *xdr,
> + struct nfs_pgio_args *args)
> +{
> + struct compound_hdr hdr = {
> + .minorversion = nfs4_xdr_minorversion(&args->seq_args),
> + };
> +
> + encode_compound_hdr(xdr, req, &hdr);
> + encode_sequence(xdr, &args->seq_args, &hdr);
> + encode_putfh(xdr, args->fh, &hdr);
> + encode_read_plus(xdr, args, &hdr);
> +
> + xdr_inline_pages(&req->rq_rcv_buf, hdr.replen << 2,
> + args->pages, args->pgbase, args->count);
> + req->rq_rcv_buf.flags |= XDRBUF_READ;
> + encode_nops(&hdr);
> +}
> +
> +/*
> * Encode SEEK request
> */
> static void nfs4_xdr_enc_seek(struct rpc_rqst *req,
> @@ -147,6 +193,92 @@ static int decode_deallocate(struct xdr_stream *xdr, struct nfs42_falloc_res *re
> return decode_op_hdr(xdr, OP_DEALLOCATE);
> }
>
> +static int decode_read_plus_data(struct xdr_stream *xdr, struct nfs_pgio_res *res)
> +{
> + __be32 *p;
> + uint32_t count, recvd;
> + uint64_t offset;
> +
> + p = xdr_inline_decode(xdr, 8 + 4);
> + if (unlikely(!p))
> + goto out_overflow;
> +
> + p = xdr_decode_hyper(p, &offset);
> + count = be32_to_cpup(p);
> +
> + recvd = xdr_read_pages(xdr, count);
> + if (recvd < count)
> + res->eof = 0;
> +
> + res->count = recvd;
> + return 0;
> +out_overflow:
> + print_overflow_msg(__func__, xdr);
> + return -EIO;
> +}
> +
> +static int decode_read_plus_hole(struct xdr_stream *xdr, struct nfs_pgio_res *res)
> +{
> + __be32 *p;
> + uint64_t offset, length;
> + size_t recvd;
> +
> + p = xdr_inline_decode(xdr, 8 + 8);
> + if (unlikely(!p))
> + goto out_overflow;
> +
> + p = xdr_decode_hyper(p, &offset);
> + p = xdr_decode_hyper(p, &length);
> +
> + recvd = xdr_expand_hole(xdr, 0, length);
> + if (recvd < length)
> + res->eof = 0;
> +
> + res->count = recvd;
> + return 0;
> +out_overflow:
> + print_overflow_msg(__func__, xdr);
> + return -EIO;
> +}
> +
> +static int decode_read_plus(struct xdr_stream *xdr, struct nfs_pgio_res *res)
> +{
> + __be32 *p;
> + int status, type;
> + uint32_t segments;
> +
> + status = decode_op_hdr(xdr, OP_READ_PLUS);
> + if (status)
> + return status;
> +
> + p = xdr_inline_decode(xdr, 4 + 4);
> + if (unlikely(!p))
> + goto out_overflow;
> +
> + res->count = 0;
> + res->eof = be32_to_cpup(p++);
> + segments = be32_to_cpup(p++);
> + if (segments == 0)
> + return 0;
> +
> + p = xdr_inline_decode(xdr, 4);
> + if (unlikely(!p))
> + goto out_overflow;
> +
> + type = be32_to_cpup(p++);
> + if (type == NFS4_CONTENT_DATA)
> + status = decode_read_plus_data(xdr, res);
> + else
> + status = decode_read_plus_hole(xdr, res);
> +
> + if (segments > 1)
> + res->eof = 0;
> + return status;
> +out_overflow:
> + print_overflow_msg(__func__, xdr);
> + return -EIO;
> +}
> +
> static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res *res)
> {
> int status;
> @@ -224,6 +356,32 @@ out:
> }
>
> /*
> + * Decode READ_PLUS request
> + */
> +static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
> + struct xdr_stream *xdr,
> + struct nfs_pgio_res *res)
> +{
> + struct compound_hdr hdr;
> + int status;
> +
> + status = decode_compound_hdr(xdr, &hdr);
> + if (status)
> + goto out;
> + status = decode_sequence(xdr, &res->seq_res, rqstp);
> + if (status)
> + goto out;
> + status = decode_putfh(xdr);
> + if (status)
> + goto out;
> + status = decode_read_plus(xdr, res);
> + if (!status)
> + status = res->count;
> +out:
> + return status;
> +}
> +
> +/*
> * Decode SEEK request
> */
> static int nfs4_xdr_dec_seek(struct rpc_rqst *rqstp,
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 9ff8c63..175d01f 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -68,6 +68,10 @@
>
> #include "nfs4trace.h"
>
> +#ifdef CONFIG_NFS_V4_2
> +#include "nfs42.h"
> +#endif /* CONFIG_NFS_V4_2 */
> +
> #define NFSDBG_FACILITY NFSDBG_PROC
>
> #define NFS4_POLL_RETRY_MIN (HZ/10)
> @@ -4187,9 +4191,15 @@ static bool nfs4_read_stateid_changed(struct rpc_task *task,
>
> static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
> {
> -
> + struct nfs_server *server = NFS_SERVER(hdr->inode);
> dprintk("--> %s\n", __func__);
>
> + if ((server->caps & NFS_CAP_READ_PLUS) && (task->tk_status == -ENOTSUPP)) {
> + server->caps &= ~NFS_CAP_READ_PLUS;
> + if (rpc_restart_call_prepare(task))
> + task->tk_status = 0;
> + return -EAGAIN;
> + }
> if (!nfs4_sequence_done(task, &hdr->res.seq_res))
> return -EAGAIN;
> if (nfs4_read_stateid_changed(task, &hdr->args))
> @@ -4198,12 +4208,27 @@ static int nfs4_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
> nfs4_read_done_cb(task, hdr);
> }
>
> +#ifdef CONFIG_NFS_V4_2
> +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
> +{
> + if (server->caps & NFS_CAP_READ_PLUS)
> + msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
> + else
> + msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> +}
> +#else
> +static void nfs42_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
> +{
> + msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> +}
> +#endif /* CONFIG_NFS_V4_2 */
> +
> static void nfs4_proc_read_setup(struct nfs_pgio_header *hdr,
> struct rpc_message *msg)
> {
> hdr->timestamp = jiffies;
> hdr->pgio_done_cb = nfs4_read_done_cb;
> - msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
> + nfs42_read_plus_support(NFS_SERVER(hdr->inode), msg);
> nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0);
> }
>
> @@ -8556,6 +8581,7 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
> | NFS_CAP_ATOMIC_OPEN_V1
> | NFS_CAP_ALLOCATE
> | NFS_CAP_DEALLOCATE
> + | NFS_CAP_READ_PLUS
> | NFS_CAP_SEEK,
> .init_client = nfs41_init_client,
> .shutdown_client = nfs41_shutdown_client,
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 6b28a60..39e521a 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -7421,6 +7421,7 @@ struct rpc_procinfo nfs4_procedures[] = {
> PROC(SEEK, enc_seek, dec_seek),
> PROC(ALLOCATE, enc_allocate, dec_allocate),
> PROC(DEALLOCATE, enc_deallocate, dec_deallocate),
> + PROC(READ_PLUS, enc_read_plus, dec_read_plus),
> #endif /* CONFIG_NFS_V4_2 */
> };
>
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index ed43cb7..2470e62 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -493,6 +493,7 @@ enum {
> NFSPROC4_CLNT_SEEK,
> NFSPROC4_CLNT_ALLOCATE,
> NFSPROC4_CLNT_DEALLOCATE,
> + NFSPROC4_CLNT_READ_PLUS,
> };
>
> /* nfs41 types */
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 5e1273d..728d928 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -237,5 +237,6 @@ struct nfs_server {
> #define NFS_CAP_SEEK (1U << 19)
> #define NFS_CAP_ALLOCATE (1U << 20)
> #define NFS_CAP_DEALLOCATE (1U << 21)
> +#define NFS_CAP_READ_PLUS (1U << 22)
>
> #endif
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 93ab607..35651b2 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -520,7 +520,7 @@ struct nfs_pgio_args {
> struct nfs_pgio_res {
> struct nfs4_sequence_res seq_res;
> struct nfs_fattr * fattr;
> - __u32 count;
> + __u64 count;
> __u32 op_status;
> int eof; /* used by read */
> struct nfs_writeverf * verf; /* used by write */
> --
> 2.3.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-03-17 19:16:12

by Anna Schumaker

Subject: Re: [PATCH v3 0/5] NFS: Add READ_PLUS support

On 03/17/2015 03:08 PM, J. Bruce Fields wrote:
> On Mon, Mar 16, 2015 at 05:17:41PM -0400, Anna Schumaker wrote:
>> These patches add client support for the NFS v4.2 operation READ_PLUS. This
>> operation is triggered by doing any kind of read on a NFS v4.2 mounted
>> filesystem.
>>
>> I tested these patches using xfstests and compared the runtime between NFS
>> v4.1, NFS v4.2 without READ_PLUS, and NFS v4.2 with READ_PLUS. Here are
>> the results:
>
> These are interesting but don't tell us much on their own. As far as I
> know xfstests isn't really intended to be used for performance
> comparisons, and there are some mysteries here:
>
>> Test | v4.1 | no READ_PLUS | READ_PLUS
>> ---------------------------------------------------
>> generic/013 | 135s | 143s | 123s
>
> This is an fsstress run. Looking quickly at ltp/fsstress.c, I see it
> can do fallocate calls, so presumably that explains the difference.
>
>> generic/075 | 4s | 11s | 4s
>
> Ditto for fsx.
>
>> generic/091 | 9s | 16s | 15s
>> generic/112 | 4s | 7s | 6s
>> generic/127 | 62s | 117s | 114s
>
> fsx again; this one must really have a ton of fallocate calls? Might be
> interesting to look at strace or /proc/self/mountstats to confirm that.
>
>> generic/213 | [not run] | 1s | 1s
>> generic/214 | [not run] | 0s | 0s
>> generic/228 | [not run] | 1s | 2s
>> generic/236 | 1s | 1s | 1s
>> generic/263 | 4s | 6s | 8s
>> generic/285 | 0s | 1s | 3s
>> generic/315 | [not run] | 2s | 1s
>> ---------------------------------------------------
>> Total | 3:47.47 | 5:11.85 | 4:43.77
>
> I'm already inclined to believe that the bandwidth savings are more
> important than zero-copy in the bad cases. And the xfstests timings
> count for something. But I'd still rather see more of an argument.
>
> It'd be interesting to see the best possible case (reading a large file
> that's all one hole), and the worst (reading a file with alternating 4K
> data and holes?).

I'm writing up this test right now, so I'll hopefully have this data soon!

Anna

>
> --b.
>
>>
>>
>> Using the READ_PLUS operation does have an impact on reading sparse
>> files over the network. There is still a big difference between v4.1
>> and v4.2 runtimes, but this looks to be a result of fallocate() tests
>> that only run over v4.2.
>>
>>
>> Changes since v2:
>> - Merge together patches adding DATA and HOLE segment support.
>> - Simplify hole zeroing code by using the zero_user_segment() function.
>>
>> These patches and the corresponding server changes are available in the
>> [read_plus] branch of
>>
>> git://git.linux-nfs.org/projects/anna/linux-nfs.git
>>
>> Questions? Comments? Thoughts?
>>
>> Anna
>>
>>
>> Anna Schumaker (5):
>> SUNRPC: Split out a function for setting current page
>> SUNRPC: Add the ability to expand holes in data pages
>> NFS: Add basic READ_PLUS support
>> SUNRPC: Add the ability to shift data to a specific offset
>> NFS: Add support for decoding multiple segments
>>
>> fs/nfs/nfs42xdr.c | 162 ++++++++++++++++++++++++++++++
>> fs/nfs/nfs4proc.c | 30 +++++-
>> fs/nfs/nfs4xdr.c | 1 +
>> include/linux/nfs4.h | 1 +
>> include/linux/nfs_fs_sb.h | 1 +
>> include/linux/nfs_xdr.h | 2 +-
>> include/linux/sunrpc/xdr.h | 2 +
>> net/sunrpc/xdr.c | 240 ++++++++++++++++++++++++++++++++++++++++++++-
>> 8 files changed, 434 insertions(+), 5 deletions(-)
>>
>> --
>> 2.3.3
>>