2020-02-14 21:13:19

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH v2 0/4] NFSD: Add support for the v4.2 READ_PLUS operation

From: Anna Schumaker <[email protected]>

These patches add server support for the READ_PLUS operation, which
breaks read requests into several "data" and "hole" segments when
replying to the client.

Here are the results of some performance tests I ran on Netapp lab
machines. I tested by reading various 2G files from a few different
underlying filesystems and across several NFS versions. I used the
`vmtouch` utility to make sure files were only cached when we wanted
them to be. In addition to the 100% data and 100% hole cases, I also
tested with files that alternate between data and hole segments. These
files have 4K, 8K, 16K, or 32K segment sizes and start with either a
data or a hole segment. So the file mixed-4d has a 4K segment size
beginning with a data segment, while mixed-32h has 32K segments
beginning with a hole. The units are seconds; for each NFS version, the
first number is the uncached read time and the second is the read time
when the file is cached on the server.
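For reference, a mixed layout like the one described above can be reproduced in userspace by seeking past the hole ranges instead of writing them. This is only a sketch of how such test files could be generated; the file name, helper names, and segment counts below are illustrative, not taken from the actual test harness.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a file that alternates between data and hole segments, in the
 * style of the mixed-* files above. Holes are left by simply not
 * writing those ranges; the final ftruncate() extends the file if it
 * ends in a hole. Returns 0 on success, -1 on error.
 */
static int create_mixed_file(const char *path, long seg_size, int nsegs,
                             int start_with_data)
{
	char buf[4096];
	int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	long off = 0;
	int i;

	if (fd < 0)
		return -1;
	memset(buf, 'x', sizeof(buf));

	for (i = 0; i < nsegs; i++, off += seg_size) {
		int is_data = (i % 2 == 0) ? start_with_data : !start_with_data;
		long done = 0;

		if (!is_data)
			continue;	/* leave a hole: skip this range */
		if (lseek(fd, off, SEEK_SET) < 0)
			goto fail;
		while (done < seg_size) {
			long n = seg_size - done;

			if (n > (long)sizeof(buf))
				n = sizeof(buf);
			if (write(fd, buf, n) != n)
				goto fail;
			done += n;
		}
	}
	if (ftruncate(fd, (off_t)seg_size * nsegs) < 0)
		goto fail;
	return close(fd);
fail:
	close(fd);
	return -1;
}

/* Small helper so the resulting size is easy to check. */
static long file_size(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 ? (long)st.st_size : -1;
}
```

Whether the skipped ranges become real holes on disk depends on the filesystem; on ext4/xfs/btrfs they do, which is what makes the SEEK_HOLE-based server side worth measuring.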

ext4 | v3 | v4.0 | v4.1 | v4.2 |
----------|-----------------|-----------------|-----------------|-----------------|
data | 22.909 : 18.253 | 22.934 : 18.252 | 22.902 : 18.253 | 23.485 : 18.253 |
hole | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 | 0.708 : 0.709 |
mixed-4d | 28.261 : 18.253 | 29.616 : 18.252 | 28.341 : 18.252 | 24.508 : 9.150 |
mixed-8d | 27.956 : 18.253 | 28.404 : 18.252 | 28.320 : 18.252 | 23.967 : 9.140 |
mixed-16d | 28.172 : 18.253 | 27.946 : 18.252 | 27.627 : 18.252 | 23.043 : 9.134 |
mixed-32d | 25.350 : 18.253 | 24.406 : 18.252 | 24.384 : 18.253 | 20.698 : 9.132 |
mixed-4h | 28.913 : 18.253 | 28.564 : 18.252 | 27.996 : 18.252 | 21.837 : 9.150 |
mixed-8h | 28.625 : 18.253 | 27.833 : 18.252 | 27.798 : 18.253 | 21.710 : 9.140 |
mixed-16h | 27.975 : 18.253 | 27.662 : 18.252 | 27.795 : 18.253 | 20.585 : 9.134 |
mixed-32h | 25.958 : 18.253 | 25.491 : 18.252 | 24.856 : 18.252 | 21.018 : 9.132 |

xfs | v3 | v4.0 | v4.1 | v4.2 |
----------|-----------------|-----------------|-----------------|-----------------|
data | 22.041 : 18.253 | 22.618 : 18.252 | 23.067 : 18.253 | 23.496 : 18.253 |
hole | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 | 0.723 : 0.708 |
mixed-4d | 29.417 : 18.253 | 28.503 : 18.252 | 28.671 : 18.253 | 24.957 : 9.150 |
mixed-8d | 29.080 : 18.253 | 29.401 : 18.252 | 29.251 : 18.252 | 24.625 : 9.140 |
mixed-16d | 27.638 : 18.253 | 28.606 : 18.252 | 27.871 : 18.253 | 25.511 : 9.135 |
mixed-32d | 24.967 : 18.253 | 25.239 : 18.252 | 25.434 : 18.252 | 21.728 : 9.132 |
mixed-4h | 34.816 : 18.253 | 36.243 : 18.252 | 35.837 : 18.252 | 32.332 : 9.150 |
mixed-8h | 43.469 : 18.253 | 44.009 : 18.252 | 43.810 : 18.253 | 37.962 : 9.140 |
mixed-16h | 29.280 : 18.253 | 28.563 : 18.252 | 28.241 : 18.252 | 22.116 : 9.134 |
mixed-32h | 29.428 : 18.253 | 29.378 : 18.252 | 28.808 : 18.253 | 27.378 : 9.134 |

btrfs | v3 | v4.0 | v4.1 | v4.2 |
----------|-----------------|-----------------|-----------------|-----------------|
data | 25.547 : 18.253 | 25.053 : 18.252 | 24.209 : 18.253 | 32.121 : 18.253 |
hole | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.252 | 0.702 : 0.724 |
mixed-4d | 19.016 : 18.253 | 18.822 : 18.252 | 18.955 : 18.253 | 18.697 : 9.150 |
mixed-8d | 19.186 : 18.253 | 19.444 : 18.252 | 18.841 : 18.253 | 18.452 : 9.140 |
mixed-16d | 18.480 : 18.253 | 19.010 : 18.252 | 19.167 : 18.252 | 16.000 : 9.134 |
mixed-32d | 18.635 : 18.253 | 18.565 : 18.252 | 18.550 : 18.252 | 15.930 : 9.132 |
mixed-4h | 19.079 : 18.253 | 18.990 : 18.252 | 19.157 : 18.253 | 27.834 : 9.150 |
mixed-8h | 18.613 : 18.253 | 19.234 : 18.252 | 18.616 : 18.253 | 20.177 : 9.140 |
mixed-16h | 18.590 : 18.253 | 19.221 : 18.252 | 19.654 : 18.253 | 17.273 : 9.135 |
mixed-32h | 18.768 : 18.253 | 19.122 : 18.252 | 18.535 : 18.252 | 15.791 : 9.132 |

ext3 | v3 | v4.0 | v4.1 | v4.2 |
----------|-----------------|-----------------|-----------------|-----------------|
data | 34.292 : 18.253 | 33.810 : 18.252 | 33.450 : 18.253 | 33.390 : 18.254 |
hole | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 | 0.718 : 0.728 |
mixed-4d | 46.818 : 18.253 | 47.140 : 18.252 | 48.385 : 18.253 | 42.887 : 9.150 |
mixed-8d | 58.554 : 18.253 | 59.277 : 18.252 | 59.673 : 18.253 | 56.760 : 9.140 |
mixed-16d | 44.631 : 18.253 | 44.291 : 18.252 | 44.729 : 18.253 | 40.237 : 9.135 |
mixed-32d | 39.110 : 18.253 | 38.735 : 18.252 | 38.902 : 18.252 | 35.270 : 9.132 |
mixed-4h | 56.396 : 18.253 | 56.387 : 18.252 | 56.573 : 18.253 | 67.661 : 9.150 |
mixed-8h | 58.483 : 18.253 | 58.484 : 18.252 | 59.099 : 18.253 | 77.958 : 9.140 |
mixed-16h | 42.511 : 18.253 | 42.338 : 18.252 | 42.356 : 18.252 | 51.805 : 9.135 |
mixed-32h | 38.419 : 18.253 | 38.504 : 18.252 | 38.643 : 18.252 | 40.411 : 9.132 |

Any questions?
Anna


Anna Schumaker (4):
NFSD: Return eof and maxcount to nfsd4_encode_read()
NFSD: Add READ_PLUS data support
NFSD: Add READ_PLUS hole segment encoding
NFSD: Encode a full READ_PLUS reply

fs/nfsd/nfs4proc.c | 17 ++++
fs/nfsd/nfs4xdr.c | 202 +++++++++++++++++++++++++++++++++++++--------
2 files changed, 183 insertions(+), 36 deletions(-)

--
2.25.0


2020-02-14 21:13:19

by Anna Schumaker

Subject: [PATCH v2 4/4] NFSD: Encode a full READ_PLUS reply

From: Anna Schumaker <[email protected]>

Reply to the client with multiple hole and data segments. This might
have performance issues due to the number of calls to vfs_llseek(),
depending on the underlying filesystem used on the server.
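The alternating walk this patch introduces can be illustrated outside the kernel. The sketch below mocks SEEK_DATA over a hypothetical extent map (all the types and helper names here are illustrative, not kernel APIs) and splits a read range into hole and data segments much the way the encoder loop does:

```c
#include <stddef.h>

/* A hypothetical extent map standing in for vfs_llseek(, SEEK_DATA). */
struct extent { long start, end; };	/* [start, end) contains data */

/* Mocked SEEK_DATA: first data byte at or after pos, or eof if none. */
static long seek_data(const struct extent *map, size_t n, long pos, long eof)
{
	for (size_t i = 0; i < n; i++) {
		if (pos < map[i].start)
			return map[i].start;
		if (pos < map[i].end)
			return pos;
	}
	return eof;
}

struct segment { int is_data; long offset, length; };

/*
 * Split a read request [offset, offset+length) into alternating data
 * and hole segments: find the next data offset, emit a hole up to it
 * or data up to the end of the current extent, then flip. Returns the
 * number of segments written to out[].
 */
static int split_read_plus(const struct extent *map, size_t n, long eof,
			   long offset, long length,
			   struct segment *out, int max_segs)
{
	int count = 0;

	while (length > 0 && offset < eof && count < max_segs) {
		long data_pos = seek_data(map, n, offset, eof);
		int is_data = (data_pos == offset);
		long seg_len;

		if (is_data) {
			/* data runs until the containing extent ends */
			long end = eof;

			for (size_t i = 0; i < n; i++)
				if (offset >= map[i].start && offset < map[i].end)
					end = map[i].end;
			seg_len = end - offset;
		} else {
			seg_len = data_pos - offset;	/* hole up to data */
		}
		if (seg_len > length)
			seg_len = length;

		out[count].is_data = is_data;
		out[count].offset = offset;
		out[count].length = seg_len;
		count++;
		offset += seg_len;
		length -= seg_len;
	}
	return count;
}
```

Each data segment here corresponds to one vfs_llseek() round trip in the real encoder, which is where the per-filesystem cost mentioned above comes from.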

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4xdr.c | 41 +++++++++++++++++++++++++++++------------
1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1a2f06de651d..44bd0b8deafb 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4385,14 +4385,18 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,

static __be32
nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
- struct nfsd4_read *read,
- unsigned long maxcount, u32 *eof)
+ struct nfsd4_read *read, u32 *eof)
{
struct xdr_stream *xdr = &resp->xdr;
struct file *file = read->rd_nf->nf_file;
+ unsigned long maxcount = read->rd_length;
+ loff_t hole_pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
__be32 nfserr;
__be32 *p;

+ if (hole_pos > read->rd_offset)
+ maxcount = min_t(unsigned long, maxcount, hole_pos - read->rd_offset);
+
/* Content type, offset, byte count */
p = xdr_reserve_space(xdr, 4 + 8 + 4);
if (!p)
@@ -4404,6 +4408,7 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
nfserr = nfsd4_encode_splice_read(resp, read, file, &maxcount, eof);
else
nfserr = nfsd4_encode_readv(resp, read, file, &maxcount, eof);
+ clear_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags);

if (nfserr)
return nfserr;
@@ -4418,18 +4423,24 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
}

static __be32
-nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, struct nfsd4_read *read,
- unsigned long maxcount, u32 *eof)
+nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
+ struct nfsd4_read *read, u32 *eof, loff_t data_pos)
{
struct file *file = read->rd_nf->nf_file;
+ unsigned long maxcount = read->rd_length;
__be32 *p;

+ if (data_pos == 0)
+ data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
+ if (data_pos == -ENXIO)
+ data_pos = i_size_read(file_inode(file));
+
/* Content type, offset, byte count */
p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
if (!p)
return nfserr_resource;

- maxcount = min_t(unsigned long, maxcount, read->rd_length);
+ maxcount = min_t(unsigned long, maxcount, data_pos - read->rd_offset);

*p++ = cpu_to_be32(NFS4_CONTENT_HOLE);
p = xdr_encode_hyper(p, read->rd_offset);
@@ -4453,6 +4464,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
int starting_len = xdr->buf->len;
unsigned int segments = 0;
loff_t data_pos;
+ bool is_data;
__be32 *p;

if (nfserr)
@@ -4476,21 +4488,26 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
maxcount = min_t(unsigned long, maxcount,
(xdr->buf->buflen - xdr->buf->len));
maxcount = min_t(unsigned long, maxcount, read->rd_length);
+ read->rd_length = maxcount;

data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
if (data_pos == -ENXIO)
data_pos = i_size_read(file_inode(file));
else if (data_pos < 0)
data_pos = read->rd_offset;
+ is_data = (data_pos == read->rd_offset);
+ eof = read->rd_offset > i_size_read(file_inode(file));

- if (data_pos > read->rd_offset) {
- nfserr = nfsd4_encode_read_plus_hole(resp, read,
- data_pos - read->rd_offset, &eof);
- segments++;
- }
+ while (read->rd_length > 0 && !eof) {
+ if (is_data)
+ nfserr = nfsd4_encode_read_plus_data(resp, read, &eof);
+ else
+ nfserr = nfsd4_encode_read_plus_hole(resp, read, &eof, data_pos);

- if (!nfserr && !eof && read->rd_length > 0) {
- nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
+ if (nfserr)
+ break;
+ is_data = !is_data;
+ data_pos = 0;
segments++;
}

--
2.25.0

2020-02-14 21:13:19

by Anna Schumaker

Subject: [PATCH v2 2/4] NFSD: Add READ_PLUS data support

From: Anna Schumaker <[email protected]>

This patch adds READ_PLUS support for returning a single
NFS4_CONTENT_DATA segment to the client. This is basically the same as
the READ operation, only with the extra information about data segments.

Note that Wireshark 3.0 will report "malformed packet" when trying to
decode NFS4_CONTENT_DATA segments. This is because the actual data is
encoded as a variable-length array, which RFC 4506 says should start
with a 32-bit length value; Wireshark incorrectly expects the length to
be a 64-bit value instead.
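For clarity, this is the RFC 4506 variable-length opaque layout the data segment uses: a 32-bit big-endian length, the bytes themselves, then zero padding up to a 4-byte boundary. A minimal userspace encoder sketch (the helper name is my own, not a kernel function):

```c
#include <stdint.h>
#include <string.h>

/*
 * Encode an XDR variable-length opaque (RFC 4506, section 4.10):
 * 32-bit big-endian length, data bytes, zero padding to a 4-byte
 * boundary. Wireshark 3.0 mis-decoded this by expecting a 64-bit
 * length. Returns the number of bytes written to out.
 */
static size_t xdr_encode_opaque(uint8_t *out, const uint8_t *data,
				uint32_t len)
{
	uint32_t pad = (4 - (len & 3)) & 3;

	out[0] = len >> 24;
	out[1] = len >> 16;
	out[2] = len >> 8;
	out[3] = len;
	memcpy(out + 4, data, len);
	memset(out + 4 + len, 0, pad);
	return 4 + len + pad;
}
```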

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4proc.c | 17 +++++++++
fs/nfsd/nfs4xdr.c | 90 ++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 0e75f7fb5fec..9643181591e3 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2522,6 +2522,16 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
}

+static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
+{
+ u32 maxcount = svc_max_payload(rqstp);
+ u32 rlen = min(op->u.read.rd_length, maxcount);
+ /* enough extra xdr space for encoding either a hole or data segment. */
+ u32 segments = 1 + 2 + 2;
+
+ return (op_encode_hdr_size + 2 + segments + XDR_QUADLEN(rlen)) * sizeof(__be32);
+}
+
static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
u32 maxcount = 0, rlen = 0;
@@ -3058,6 +3068,13 @@ static const struct nfsd4_operation nfsd4_ops[] = {
.op_name = "OP_COPY",
.op_rsize_bop = nfsd4_copy_rsize,
},
+ [OP_READ_PLUS] = {
+ .op_func = nfsd4_read,
+ .op_release = nfsd4_read_release,
+ .op_name = "OP_READ_PLUS",
+ .op_rsize_bop = nfsd4_read_plus_rsize,
+ .op_get_currentstateid = nfsd4_get_readstateid,
+ },
[OP_SEEK] = {
.op_func = nfsd4_seek,
.op_name = "OP_SEEK",
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 45f0623f6488..8efb59d4fda4 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1957,7 +1957,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = {
[OP_LAYOUTSTATS] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_OFFLOAD_CANCEL] = (nfsd4_dec)nfsd4_decode_offload_status,
[OP_OFFLOAD_STATUS] = (nfsd4_dec)nfsd4_decode_offload_status,
- [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
+ [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_read,
[OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek,
[OP_WRITE_SAME] = (nfsd4_dec)nfsd4_decode_notsupp,
[OP_CLONE] = (nfsd4_dec)nfsd4_decode_clone,
@@ -3664,10 +3664,11 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

if (nfserr)
xdr_truncate_encode(xdr, starting_len);
-
- read->rd_length = maxcount;
- *p++ = htonl(eof);
- *p++ = htonl(maxcount);
+ else {
+ read->rd_length = maxcount;
+ *p++ = htonl(eof);
+ *p++ = htonl(maxcount);
+ }

return nfserr;
}
@@ -4379,6 +4380,83 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr_resource;
p = xdr_encode_hyper(p, os->count);
*p++ = cpu_to_be32(0);
+ return nfserr;
+}
+
+static __be32
+nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
+ struct nfsd4_read *read,
+ unsigned long maxcount, u32 *eof)
+{
+ struct xdr_stream *xdr = &resp->xdr;
+ struct file *file = read->rd_nf->nf_file;
+ __be32 nfserr;
+ __be32 *p;
+
+ /* Content type, offset, byte count */
+ p = xdr_reserve_space(xdr, 4 + 8 + 4);
+ if (!p)
+ return nfserr_resource;
+ xdr_commit_encode(xdr);
+
+ if (file->f_op->splice_read &&
+ test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
+ nfserr = nfsd4_encode_splice_read(resp, read, file, &maxcount, eof);
+ else
+ nfserr = nfsd4_encode_readv(resp, read, file, &maxcount, eof);
+
+ if (nfserr)
+ return nfserr;
+
+ *p++ = htonl(NFS4_CONTENT_DATA);
+ p = xdr_encode_hyper(p, read->rd_offset);
+ *p++ = htonl(maxcount);
+
+ read->rd_offset += maxcount;
+ read->rd_length = (maxcount > 0) ? read->rd_length - maxcount : 0;
+ return nfserr;
+}
+
+static __be32
+nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_read *read)
+{
+ unsigned long maxcount;
+ u32 eof;
+ struct xdr_stream *xdr = &resp->xdr;
+ struct file *file;
+ int starting_len = xdr->buf->len;
+ __be32 *p;
+
+ if (nfserr)
+ return nfserr;
+ file = read->rd_nf->nf_file;
+
+ /* eof flag, segment count */
+ p = xdr_reserve_space(xdr, 4 + 4);
+ if (!p) {
+ WARN_ON_ONCE(test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags));
+ return nfserr_resource;
+ }
+ if (resp->xdr.buf->page_len &&
+ test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags)) {
+ WARN_ON_ONCE(1);
+ return nfserr_resource;
+ }
+ xdr_commit_encode(xdr);
+
+ maxcount = svc_max_payload(resp->rqstp);
+ maxcount = min_t(unsigned long, maxcount,
+ (xdr->buf->buflen - xdr->buf->len));
+ maxcount = min_t(unsigned long, maxcount, read->rd_length);
+
+ nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
+ if (nfserr)
+ xdr_truncate_encode(xdr, starting_len);
+ else {
+ *p++ = htonl(eof);
+ *p++ = htonl(1);
+ }

return nfserr;
}
@@ -4521,7 +4599,7 @@ static const nfsd4_enc nfsd4_enc_ops[] = {
[OP_LAYOUTSTATS] = (nfsd4_enc)nfsd4_encode_noop,
[OP_OFFLOAD_CANCEL] = (nfsd4_enc)nfsd4_encode_noop,
[OP_OFFLOAD_STATUS] = (nfsd4_enc)nfsd4_encode_offload_status,
- [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
+ [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_read_plus,
[OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek,
[OP_WRITE_SAME] = (nfsd4_enc)nfsd4_encode_noop,
[OP_CLONE] = (nfsd4_enc)nfsd4_encode_noop,
--
2.25.0

2020-02-14 21:13:49

by Anna Schumaker

Subject: [PATCH v2 3/4] NFSD: Add READ_PLUS hole segment encoding

From: Anna Schumaker <[email protected]>

This patch adds hole segment encoding. Note that we only encode a hole
if it is at the beginning of the requested range, and treat everything
else as data to keep things simple.
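The hole-at-start decision reduces to a small pure function. The sketch below uses an illustrative helper name, and assumes data_pos is the SEEK_DATA result already clamped to the file size when SEEK_DATA returned ENXIO (as the patch does):

```c
/*
 * Return the byte count of the hole at the start of the requested
 * range [rd_offset, rd_offset + rd_length), or 0 if the range begins
 * with data. data_pos is the offset of the first data byte at or
 * after rd_offset.
 */
static long leading_hole_len(long rd_offset, long rd_length, long data_pos)
{
	long hole = data_pos - rd_offset;

	if (hole <= 0)
		return 0;	/* range begins with data */
	return hole < rd_length ? hole : rd_length;
}
```

A nonzero result means the encoder emits one hole segment first; the remainder of the range is then handled as a single data segment.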

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
fs/nfsd/nfs4xdr.c | 47 ++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 9643181591e3..c65939a1e40c 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2527,7 +2527,7 @@ static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op
u32 maxcount = svc_max_payload(rqstp);
u32 rlen = min(op->u.read.rd_length, maxcount);
/* enough extra xdr space for encoding either a hole or data segment. */
- u32 segments = 1 + 2 + 2;
+ u32 segments = 2 * (1 + 2 + 2);

return (op_encode_hdr_size + 2 + segments + XDR_QUADLEN(rlen)) * sizeof(__be32);
}
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 8efb59d4fda4..1a2f06de651d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4417,6 +4417,31 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
return nfserr;
}

+static __be32
+nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, struct nfsd4_read *read,
+ unsigned long maxcount, u32 *eof)
+{
+ struct file *file = read->rd_nf->nf_file;
+ __be32 *p;
+
+ /* Content type, offset, byte count */
+ p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
+ if (!p)
+ return nfserr_resource;
+
+ maxcount = min_t(unsigned long, maxcount, read->rd_length);
+
+ *p++ = cpu_to_be32(NFS4_CONTENT_HOLE);
+ p = xdr_encode_hyper(p, read->rd_offset);
+ p = xdr_encode_hyper(p, maxcount);
+
+ *eof = (read->rd_offset + maxcount) >= i_size_read(file_inode(file));
+
+ read->rd_offset += maxcount;
+ read->rd_length = (maxcount > 0) ? read->rd_length - maxcount : 0;
+ return nfs_ok;
+}
+
static __be32
nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_read *read)
@@ -4426,6 +4451,8 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
struct xdr_stream *xdr = &resp->xdr;
struct file *file;
int starting_len = xdr->buf->len;
+ unsigned int segments = 0;
+ loff_t data_pos;
__be32 *p;

if (nfserr)
@@ -4450,12 +4477,28 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
(xdr->buf->buflen - xdr->buf->len));
maxcount = min_t(unsigned long, maxcount, read->rd_length);

- nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
+ data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
+ if (data_pos == -ENXIO)
+ data_pos = i_size_read(file_inode(file));
+ else if (data_pos < 0)
+ data_pos = read->rd_offset;
+
+ if (data_pos > read->rd_offset) {
+ nfserr = nfsd4_encode_read_plus_hole(resp, read,
+ data_pos - read->rd_offset, &eof);
+ segments++;
+ }
+
+ if (!nfserr && !eof && read->rd_length > 0) {
+ nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
+ segments++;
+ }
+
if (nfserr)
xdr_truncate_encode(xdr, starting_len);
else {
*p++ = htonl(eof);
- *p++ = htonl(1);
+ *p++ = htonl(segments);
}

return nfserr;
--
2.25.0

2020-02-14 21:13:49

by Anna Schumaker

Subject: [PATCH v2 1/4] NFSD: Return eof and maxcount to nfsd4_encode_read()

From: Anna Schumaker <[email protected]>

I want to reuse nfsd4_encode_readv() and nfsd4_encode_splice_read() in
READ_PLUS rather than reimplementing them. READ_PLUS returns a single
eof flag for the entire call and a separate maxcount for each data
segment, so we need to have the READ call encode these values in a
different place.

Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4xdr.c | 60 ++++++++++++++++++++---------------------------
1 file changed, 26 insertions(+), 34 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 9761512674a0..45f0623f6488 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3521,23 +3521,22 @@ nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struc

static __be32 nfsd4_encode_splice_read(
struct nfsd4_compoundres *resp,
- struct nfsd4_read *read,
- struct file *file, unsigned long maxcount)
+ struct nfsd4_read *read, struct file *file,
+ unsigned long *maxcount, u32 *eof)
{
struct xdr_stream *xdr = &resp->xdr;
struct xdr_buf *buf = xdr->buf;
- u32 eof;
+ long len;
int space_left;
__be32 nfserr;
- __be32 *p = xdr->p - 2;

/* Make sure there will be room for padding if needed */
if (xdr->end - xdr->p < 1)
return nfserr_resource;

+ len = *maxcount;
nfserr = nfsd_splice_read(read->rd_rqstp, read->rd_fhp,
- file, read->rd_offset, &maxcount, &eof);
- read->rd_length = maxcount;
+ file, read->rd_offset, maxcount, eof);
if (nfserr) {
/*
* nfsd_splice_actor may have already messed with the
@@ -3548,24 +3547,21 @@ static __be32 nfsd4_encode_splice_read(
return nfserr;
}

- *(p++) = htonl(eof);
- *(p++) = htonl(maxcount);
-
- buf->page_len = maxcount;
- buf->len += maxcount;
- xdr->page_ptr += (buf->page_base + maxcount + PAGE_SIZE - 1)
+ buf->page_len = *maxcount;
+ buf->len += *maxcount;
+ xdr->page_ptr += (buf->page_base + *maxcount + PAGE_SIZE - 1)
/ PAGE_SIZE;

/* Use rest of head for padding and remaining ops: */
buf->tail[0].iov_base = xdr->p;
buf->tail[0].iov_len = 0;
xdr->iov = buf->tail;
- if (maxcount&3) {
- int pad = 4 - (maxcount&3);
+ if (*maxcount&3) {
+ int pad = 4 - (*maxcount&3);

*(xdr->p++) = 0;

- buf->tail[0].iov_base += maxcount&3;
+ buf->tail[0].iov_base += *maxcount&3;
buf->tail[0].iov_len = pad;
buf->len += pad;
}
@@ -3579,22 +3575,20 @@ static __be32 nfsd4_encode_splice_read(
}

static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
- struct nfsd4_read *read,
- struct file *file, unsigned long maxcount)
+ struct nfsd4_read *read, struct file *file,
+ unsigned long *maxcount, u32 *eof)
{
struct xdr_stream *xdr = &resp->xdr;
- u32 eof;
int v;
int starting_len = xdr->buf->len - 8;
long len;
int thislen;
__be32 nfserr;
- __be32 tmp;
__be32 *p;
u32 zzz = 0;
int pad;

- len = maxcount;
+ len = *maxcount;
v = 0;

thislen = min_t(long, len, ((void *)xdr->end - (void *)xdr->p));
@@ -3616,22 +3610,15 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
}
read->rd_vlen = v;

- len = maxcount;
+ len = *maxcount;
nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
- resp->rqstp->rq_vec, read->rd_vlen, &maxcount,
- &eof);
- read->rd_length = maxcount;
+ resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
if (nfserr)
return nfserr;
- xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));
+ xdr_truncate_encode(xdr, starting_len + 8 + ((*maxcount+3)&~3));

- tmp = htonl(eof);
- write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4);
- tmp = htonl(maxcount);
- write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
-
- pad = (maxcount&3) ? 4 - (maxcount&3) : 0;
- write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + maxcount,
+ pad = (*maxcount&3) ? 4 - (*maxcount&3) : 0;
+ write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + *maxcount,
&zzz, pad);
return 0;

@@ -3642,6 +3629,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_read *read)
{
unsigned long maxcount;
+ u32 eof;
struct xdr_stream *xdr = &resp->xdr;
struct file *file;
int starting_len = xdr->buf->len;
@@ -3670,13 +3658,17 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

if (file->f_op->splice_read &&
test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
- nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
+ nfserr = nfsd4_encode_splice_read(resp, read, file, &maxcount, &eof);
else
- nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
+ nfserr = nfsd4_encode_readv(resp, read, file, &maxcount, &eof);

if (nfserr)
xdr_truncate_encode(xdr, starting_len);

+ read->rd_length = maxcount;
+ *p++ = htonl(eof);
+ *p++ = htonl(maxcount);
+
return nfserr;
}

--
2.25.0

2020-02-14 22:22:12

by Chuck Lever III

Subject: Re: [PATCH v2 1/4] NFSD: Return eof and maxcount to nfsd4_encode_read()



> On Feb 14, 2020, at 4:12 PM, [email protected] wrote:
>
> From: Anna Schumaker <[email protected]>
>
> I want to reuse nfsd4_encode_readv() and nfsd4_encode_splice_read() in
> READ_PLUS rather than reimplementing them. READ_PLUS returns a single
> eof flag for the entire call and a separate maxcount for each data
> segment, so we need to have the READ call encode these values in a
> different place.

This probably collides pretty nastily with the fix I posted today for
https://bugzilla.kernel.org/show_bug.cgi?id=198053 .

Can my fix go in first so that there is still opportunity to backport it?



--
Chuck Lever



2020-02-17 19:55:21

by J. Bruce Fields

Subject: Re: [PATCH v2 1/4] NFSD: Return eof and maxcount to nfsd4_encode_read()

On Fri, Feb 14, 2020 at 05:20:37PM -0500, Chuck Lever wrote:
>
>
> > On Feb 14, 2020, at 4:12 PM, [email protected] wrote:
> >
> > From: Anna Schumaker <[email protected]>
> >
> > I want to reuse nfsd4_encode_readv() and nfsd4_encode_splice_read() in
> > READ_PLUS rather than reimplementing them. READ_PLUS returns a single
> > eof flag for the entire call and a separate maxcount for each data
> > segment, so we need to have the READ call encode these values in a
> > different place.
>
> This probably collides pretty nastily with the fix I posted today for
> https://bugzilla.kernel.org/show_bug.cgi?id=198053 .
>
> Can my fix go in first so that there is still opportunity to backport it?

Sure, makes sense. --b.


2020-03-03 15:46:39

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] NFSD: Add support for the v4.2 READ_PLUS operation

Sorry for the delay, looking at this a little more carefully now....

Previously I remember you found a problem with very slow
SEEK_HOLE/SEEK_DATA on some filesystems--has that been fixed?
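(For the record, a quick way to spot-check SEEK_HOLE/SEEK_DATA cost on a given filesystem is to time a full SEEK_DATA/SEEK_HOLE walk directly from userspace. A rough sketch; the file layout and segment size here are made up for illustration, and on filesystems that don't report holes the walk collapses to a single data segment:)

```python
import os
import tempfile
import time

# Build a small sparse file: alternating 1 MiB data / 1 MiB hole segments.
fd, path = tempfile.mkstemp()
try:
    seg = 1024 * 1024
    for i in range(0, 16, 2):                 # data at even-numbered segments
        os.pwrite(fd, b"x" * seg, i * seg)
    os.ftruncate(fd, 16 * seg)                # trailing hole

    # Time a full SEEK_DATA/SEEK_HOLE walk over the file.
    size = os.fstat(fd).st_size
    start = time.monotonic()
    off = 0
    segments = 0
    while off < size:
        try:
            data = os.lseek(fd, off, os.SEEK_DATA)
        except OSError:                       # ENXIO: only a hole remains
            break
        off = os.lseek(fd, data, os.SEEK_HOLE)
        segments += 1
    elapsed = time.monotonic() - start
    print(segments, elapsed)
finally:
    os.close(fd)
    os.unlink(path)
```

Running the same walk on the problem filesystems, with the file uncached, should show whether the earlier slowness is still there.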

On Fri, Feb 14, 2020 at 04:12:02PM -0500, [email protected] wrote:
> From: Anna Schumaker <[email protected]>
>
> These patches add server support for the READ_PLUS operation, which
> breaks read requests into several "data" and "hole" segments when
> replying to the client.
>
> Here are the results of some performance tests I ran on Netapp lab
> machines.

Any details? Ideally we'd have enough detail about the hardware and
software used that someone else could reproduce your results if
necessary.

At a minimum I think it would be helpful to know your network latency
and round trip time. RPC statistics (e.g. number of round trips) might
also be interesting.

Is this a single run for each number?

> I tested by reading various 2G files from a few different
> underlying filesystems and across several NFS versions. I used the
> `vmtouch` utility to make sure files were only cached when we wanted
> them to be. In addition to 100% data and 100% hole cases, I also tested
> with files that alternate between data and hole segments. These files
> have either 4K, 8K, 16K, or 32K segment sizes and start with either data
> or hole segments. So the file mixed-4d has a 4K segment size beginning
> with a data segment, but mixed-32h has 32K segments beginning with a
> hole. The units are in seconds, with the first number for each NFS
> version being the uncached read time and the second number is for when
> the file is cached on the server.
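(For anyone wanting to reproduce files like those, alternating data/hole layouts are straightforward to generate with sparse writes. A rough sketch, scaled down from 2G; the helper name and fill byte are made up for illustration:)

```python
import os

def make_mixed(path, total, seg, start_with_data):
    """Create a sparse file of size `total` that alternates data and hole
    segments of size `seg`, starting with data or a hole as requested."""
    with open(path, "wb") as f:
        f.truncate(total)                     # sets size; unwritten ranges stay holes
        off = 0 if start_with_data else seg
        while off < total:
            f.seek(off)
            f.write(b"\xaa" * min(seg, total - off))
            off += 2 * seg

# e.g. a small stand-in for mixed-4d: 4K segments, data first
make_mixed("mixed-4d.img", 64 * 1024, 4 * 1024, True)
content = open("mixed-4d.img", "rb").read()
os.unlink("mixed-4d.img")
print(len(content), content.count(0xAA))
```

Half the bytes end up as data and half as holes; `du` versus `ls -l` on the result shows whether the filesystem actually allocated sparsely.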

OK, READ_PLUS is in 4.2, so it's the last column that's the most
interesting one:

>
> ext4 | v3 | v4.0 | v4.1 | v4.2 |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data | 22.909 : 18.253 | 22.934 : 18.252 | 22.902 : 18.253 | 23.485 : 18.253 |

So, the 4.2 case may be taking a couple percent longer in the case where
there are no holes.

> hole | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 | 0.708 : 0.709 |

And as expected READ_PLUS is a big advantage when the file is one big
hole. And there's no difference between cached and uncached reads in
this case since the server's got no data to read off its disk.

> mixed-4d | 28.261 : 18.253 | 29.616 : 18.252 | 28.341 : 18.252 | 24.508 : 9.150 |
> mixed-8d | 27.956 : 18.253 | 28.404 : 18.252 | 28.320 : 18.252 | 23.967 : 9.140 |
> mixed-16d | 28.172 : 18.253 | 27.946 : 18.252 | 27.627 : 18.252 | 23.043 : 9.134 |
> mixed-32d | 25.350 : 18.253 | 24.406 : 18.252 | 24.384 : 18.253 | 20.698 : 9.132 |
> mixed-4h | 28.913 : 18.253 | 28.564 : 18.252 | 27.996 : 18.252 | 21.837 : 9.150 |
> mixed-8h | 28.625 : 18.253 | 27.833 : 18.252 | 27.798 : 18.253 | 21.710 : 9.140 |
> mixed-16h | 27.975 : 18.253 | 27.662 : 18.252 | 27.795 : 18.253 | 20.585 : 9.134 |
> mixed-32h | 25.958 : 18.253 | 25.491 : 18.252 | 24.856 : 18.252 | 21.018 : 9.132 |

So it looks like READ_PLUS helps in every case, and there's a slight
improvement with larger hole/data segments, so the seeking does have
some overhead. (Either that or it's just the extra rpc round trips--I
seem to recall this READ_PLUS implementation only handles at most one
hole and one data segment. But the fact that the times are so similar
in the uncached case suggests rpc latency isn't a factor--what's your
network?)
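(For context, the segment walk a READ_PLUS server has to do amounts to alternating SEEK_DATA/SEEK_HOLE over the requested range. A rough sketch of that logic, ignoring the XDR encoding entirely and not the actual nfsd code:)

```python
import os
import tempfile

def read_plus_segments(fd, offset, count):
    """Split [offset, offset+count) into ('hole'|'data', offset, length)
    tuples, the way a READ_PLUS reply would describe the range."""
    end = offset + count
    segs = []
    while offset < end:
        try:
            data = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError:                       # ENXIO: nothing but hole to EOF
            data = end
        data = min(data, end)
        if data > offset:                     # leading hole segment
            segs.append(("hole", offset, data - offset))
        if data < end:
            hole = min(os.lseek(fd, data, os.SEEK_HOLE), end)
            segs.append(("data", data, hole - data))
            offset = hole
        else:
            offset = end
    return segs

# Demo: 4K hole, 4K data, 4K trailing hole.
fd, path = tempfile.mkstemp()
os.pwrite(fd, b"z" * 4096, 4096)
os.ftruncate(fd, 12288)
segs = read_plus_segments(fd, 0, 12288)
os.close(fd)
os.unlink(path)
print(segs)
```

On a filesystem that doesn't report holes, the whole range comes back as one data segment, which is the fallback behavior a server needs anyway.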

I wonder why the hole-first cases are faster than the data-first?

>
> xfs | v3 | v4.0 | v4.1 | v4.2 |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data | 22.041 : 18.253 | 22.618 : 18.252 | 23.067 : 18.253 | 23.496 : 18.253 |
> hole | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 | 0.723 : 0.708 |
> mixed-4d | 29.417 : 18.253 | 28.503 : 18.252 | 28.671 : 18.253 | 24.957 : 9.150 |
> mixed-8d | 29.080 : 18.253 | 29.401 : 18.252 | 29.251 : 18.252 | 24.625 : 9.140 |
> mixed-16d | 27.638 : 18.253 | 28.606 : 18.252 | 27.871 : 18.253 | 25.511 : 9.135 |
> mixed-32d | 24.967 : 18.253 | 25.239 : 18.252 | 25.434 : 18.252 | 21.728 : 9.132 |
> mixed-4h | 34.816 : 18.253 | 36.243 : 18.252 | 35.837 : 18.252 | 32.332 : 9.150 |
> mixed-8h | 43.469 : 18.253 | 44.009 : 18.252 | 43.810 : 18.253 | 37.962 : 9.140 |
> mixed-16h | 29.280 : 18.253 | 28.563 : 18.252 | 28.241 : 18.252 | 22.116 : 9.134 |
> mixed-32h | 29.428 : 18.253 | 29.378 : 18.252 | 28.808 : 18.253 | 27.378 : 9.134 |
>
> btrfs | v3 | v4.0 | v4.1 | v4.2 |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data | 25.547 : 18.253 | 25.053 : 18.252 | 24.209 : 18.253 | 32.121 : 18.253 |
> hole | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.252 | 0.702 : 0.724 |
> mixed-4d | 19.016 : 18.253 | 18.822 : 18.252 | 18.955 : 18.253 | 18.697 : 9.150 |
> mixed-8d | 19.186 : 18.253 | 19.444 : 18.252 | 18.841 : 18.253 | 18.452 : 9.140 |
> mixed-16d | 18.480 : 18.253 | 19.010 : 18.252 | 19.167 : 18.252 | 16.000 : 9.134 |
> mixed-32d | 18.635 : 18.253 | 18.565 : 18.252 | 18.550 : 18.252 | 15.930 : 9.132 |
> mixed-4h | 19.079 : 18.253 | 18.990 : 18.252 | 19.157 : 18.253 | 27.834 : 9.150 |
> mixed-8h | 18.613 : 18.253 | 19.234 : 18.252 | 18.616 : 18.253 | 20.177 : 9.140 |
> mixed-16h | 18.590 : 18.253 | 19.221 : 18.252 | 19.654 : 18.253 | 17.273 : 9.135 |
> mixed-32h | 18.768 : 18.253 | 19.122 : 18.252 | 18.535 : 18.252 | 15.791 : 9.132 |
>
> ext3 | v3 | v4.0 | v4.1 | v4.2 |
> ----------|-----------------|-----------------|-----------------|-----------------|
> data | 34.292 : 18.253 | 33.810 : 18.252 | 33.450 : 18.253 | 33.390 : 18.254 |
> hole | 18.256 : 18.253 | 18.255 : 18.252 | 18.256 : 18.253 | 0.718 : 0.728 |
> mixed-4d | 46.818 : 18.253 | 47.140 : 18.252 | 48.385 : 18.253 | 42.887 : 9.150 |
> mixed-8d | 58.554 : 18.253 | 59.277 : 18.252 | 59.673 : 18.253 | 56.760 : 9.140 |
> mixed-16d | 44.631 : 18.253 | 44.291 : 18.252 | 44.729 : 18.253 | 40.237 : 9.135 |
> mixed-32d | 39.110 : 18.253 | 38.735 : 18.252 | 38.902 : 18.252 | 35.270 : 9.132 |
> mixed-4h | 56.396 : 18.253 | 56.387 : 18.252 | 56.573 : 18.253 | 67.661 : 9.150 |
> mixed-8h | 58.483 : 18.253 | 58.484 : 18.252 | 59.099 : 18.253 | 77.958 : 9.140 |
> mixed-16h | 42.511 : 18.253 | 42.338 : 18.252 | 42.356 : 18.252 | 51.805 : 9.135 |
> mixed-32h | 38.419 : 18.253 | 38.504 : 18.252 | 38.643 : 18.252 | 40.411 : 9.132 |
>
> Any questions?

I'm surprised at the big differences between filesystems in the mixed
cases. Time for the uncached mixed-4h NFSv4.1 read is (19s, 28s, 36s,
57s) respectively for (btrfs, ext4, xfs, ext3).

READ_PLUS means giving up zero-copy on the client since the offset of
read data in the reply is no longer predictable; I wonder what sort of
test would show that.

--b.