2014-01-06 21:57:16

by Anna Schumaker

Subject: [PATCH 0/3] READ_PLUS rough draft

These patches are my initial implementation of READ_PLUS. I still have a
few issues to work out before they can be applied, but I wanted to submit
them anyway to get feedback before going much further. These patches were
developed on top of my earlier SEEK and WRITE_PLUS patches, and probably
won't apply cleanly without them (I am willing to reorder things if necessary!).

On the server side, I handle the cases where a file is 100% hole, 100% data
or hole followed by data. Any holes after a data segment will be expanded
to zeros on the wire. This is due to a limitation in the NFSD
encode-to-page function that will adjust pointers to point to the xdr tail
after reading a file to the "pages" section. Bruce, do you have any
suggestions here?
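
For reference, the reply generated for a hole-followed-by-data range
ends up laid out like this on the wire (field order taken from the
encode helpers in patch 1; this is still the draft protocol and may
change):

	eof                 (u32)  1 if the range reaches the end of the file
	content count       (u32)  2 in this case
	NFS4_CONTENT_HOLE   (u32)
	  offset            (u64)  start of the hole
	  length            (u64)  bytes until the data begins
	  allocated         (u32)  0
	NFS4_CONTENT_DATA   (u32)
	  offset            (u64)  where the data begins
	  allocated         (u32)  1
	  count             (u32)  data bytes, followed by the data itself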

The client side needs to punch holes after decoding page information, since
data on pages will be aligned at the start of the page array. So a file that
is <HOLE><DATA> will store the hole information, decode the data, and then
punch the hole before returning. I think it would be better to use the
provided offset field to decode everything to its final location, but I'm
struggling to come up with a clean way of doing so using the code that is
already there.

Let me know what you all think!
Anna

Anna Schumaker (3):
NFSD: Implement READ_PLUS support
SUNRPC: This patch adds functions for shifting page data
NFS: Client side changes for READ_PLUS

fs/nfs/nfs4client.c | 2 +-
fs/nfs/nfs4proc.c | 23 +++++-
fs/nfs/nfs4xdr.c | 191 +++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/Kconfig | 14 ++++
fs/nfsd/nfs4proc.c | 9 +++
fs/nfsd/nfs4xdr.c | 177 ++++++++++++++++++++++++++++++++++++-----
include/linux/nfs4.h | 1 +
include/linux/nfs_fs_sb.h | 1 +
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/xdr.c | 115 ++++++++++++++++++++++++++-
10 files changed, 511 insertions(+), 23 deletions(-)

--
1.8.5.2



2014-01-07 14:56:35

by J. Bruce Fields

Subject: Re: [PATCH 0/3] READ_PLUS rough draft

On Tue, Jan 07, 2014 at 09:42:04AM -0500, Anna Schumaker wrote:
> On 01/06/2014 05:32 PM, J. Bruce Fields wrote:
> > On Mon, Jan 06, 2014 at 04:57:10PM -0500, Anna Schumaker wrote:
> >> These patches are my initial implementation of READ_PLUS. I still have a
> >> few issues to work out before they can be applied, but I wanted to submit
> >> them anyway to get feedback before going much further. These patches were
> >> developed on top of my earlier SEEK and WRITE_PLUS patches, and probably
> >> won't apply cleanly without them (I am willing to reorder things if necessary!).
> >>
> >> On the server side, I handle the cases where a file is 100% hole, 100% data
> >> or hole followed by data. Any holes after a data segment will be expanded
> >> to zeros on the wire.
> >
> > I assume that for "a file" I should read "the requested range of the
> > file"?
>
> Yes.
>
> >
> > hole+data+hole should also be doable, shouldn't it? I'd think the real
> > problem would be multiple data extents.
>
> It might be, but I haven't tried it yet. I can soon!
>
> >
> >> This is due to a limitation in the NFSD
> >> encode-to-page function that will adjust pointers to point to the xdr tail
> >> after reading a file to the "pages" section. Bruce, do you have any
> >> suggestions here?
> >
> > The server xdr encoding needs a rewrite. I'll see if I can ignore you
> > all and put my head down and get a version of that posted this week.
>
> :)
>
> >
> > That should make it easier to return all the data, though it will turn
> > off zero-copy in the case of multiple data extents.
> >
> > If we want READ_PLUS to support zero copy in the case of multiple
> > extents then I think we need a new data structure to represent the
> > resulting rpc reply. An xdr buf only knows how to insert one array of
> > pages in the middle of the data. Maybe a list of xdr bufs?
> >
> > But that's an annoying job and possibly a premature optimization.
> >
> > It might be useful to first understand the typical distribution of holes
> > in a file and how likely various workloads are to produce reads with
> > multiple holes in the middle.
>
> I already have a few performance numbers, but nothing that can be trusted due to the number of debugging printk()s I used to make sure the client decoded everything correctly. My plan is to collect the following information using: v4.0, v4.1, v4.2 (SEEK), v4.2 (SEEK + WRITE_PLUS), and v4.2 (SEEK + WRITE_PLUS + READ_PLUS).

What's the workload and hardware setup?

--b.

2014-01-06 21:57:18

by Anna Schumaker

Subject: [PATCH 2/3] SUNRPC: This patch adds functions for shifting page data

Encoding a hole followed by data takes up more space than the xdr head
has allocated to it. As a result, the data segment will already start
some number of bytes into the page array (usually 20 bytes here: the
data segment's own header of content type, offset, allocated flag, and
byte count), so a shift-left operation is needed to place the decoded
data at the right location.

xdr_shift_hole() will be called to insert a hole into the page data
by shifting the existing contents right by the hole length and then
zeroing the requested range.

Ideally, I want to use the offset provided by READ_PLUS to place data
exactly where it needs to be. I have a rough (non-functioning) patch
for this that I want to hack on a little bit more before submitting.
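
Roughly, for a <HOLE><DATA> reply the page data goes through these
steps (a sketch only; sizes are illustrative):

	pages as received:                  [ 20B segment header | DATA .......... ]
	after the left shift
	(via xdr_align_pages):              [ DATA .......... ]
	after xdr_shift_hole(off, length):  [ 0000...0000 (length bytes) | DATA .......... ]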
---
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/xdr.c | 115 ++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 15f9204..1deb79b 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -227,6 +227,7 @@ extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len);
extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len);
extern int xdr_process_buf(struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data);

+extern void xdr_shift_hole(struct xdr_stream *, size_t, size_t);
#endif /* __KERNEL__ */

#endif /* _SUNRPC_XDR_H_ */
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 1504bb1..96973e3 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -219,6 +219,95 @@ _shift_data_right_pages(struct page **pages, size_t pgto_base,
} while ((len -= copy) != 0);
}

+/**
+ * _shift_data_left_pages
+ * @pages: vector of pages containing both the source and dest memory area
+ * @pgto_base: page vector address of destination
+ * @pgfrom_base: page vector address of source
+ * @len: number of bytes to move
+ *
+ * Note: This function does not copy data out of the tail. It only shifts
+ * data already in the pages.
+ */
+static void
+_shift_data_left_pages(struct page **pages, size_t pgto_base,
+ size_t pgfrom_base, size_t len)
+{
+ struct page **pgfrom, **pgto;
+ char *vto, *vfrom;
+ size_t copy;
+
+ BUG_ON(pgto_base >= pgfrom_base);
+
+ pgto = pages + (pgto_base >> PAGE_CACHE_SHIFT);
+ pgfrom = pages + (pgfrom_base >> PAGE_CACHE_SHIFT);
+
+ do {
+ /* Are any pointers crossing a page boundary? */
+ if (pgto_base == PAGE_SIZE) {
+ pgto_base = 0;
+ pgto++;
+ }
+ if (pgfrom_base == PAGE_SIZE) {
+ pgfrom_base = 0;
+ pgfrom++;
+ }
+
+ copy = len;
+ if (copy > PAGE_SIZE - pgto_base)
+ copy = PAGE_SIZE - pgto_base;
+ if (copy > PAGE_SIZE - pgfrom_base)
+ copy = PAGE_SIZE - pgfrom_base;
+
+ vto = kmap_atomic(*pgto);
+ if (*pgto != *pgfrom) {
+ vfrom = kmap_atomic(*pgfrom);
+ memcpy(vto + pgto_base, vfrom + pgfrom_base, copy);
+ kunmap_atomic(vfrom);
+ } else {
+ memmove(vto + pgto_base, vto + pgfrom_base, copy);
+ }
+ flush_dcache_page(*pgto);
+ kunmap_atomic(vto);
+
+ pgto_base += copy;
+ pgfrom_base += copy;
+
+ } while ((len -= copy) != 0);
+}
+
+/**
+ * _zero_data_pages
+ * @pages: array of pages
+ * @pgbase: beginning page vector address
+ * @len: length
+ */
+static void
+_zero_data_pages(struct page **pages, size_t pgbase, size_t len)
+{
+ struct page **page;
+ char *vpage;
+ size_t zero;
+
+ page = pages + (pgbase >> PAGE_CACHE_SHIFT);
+ pgbase &= ~PAGE_CACHE_MASK;
+
+ do {
+ zero = len;
+ if (pgbase + zero > PAGE_SIZE)
+ zero = PAGE_SIZE - pgbase;
+
+ vpage = kmap_atomic(*page);
+ memset(vpage + pgbase, 0, zero);
+ flush_dcache_page(*page);
+ kunmap_atomic(vpage);
+
+ page++;
+ pgbase = 0;
+
+ } while ((len -= zero) != 0);
+}
+
/**
* _copy_to_pages
* @pages: array of pages
@@ -434,6 +523,22 @@ xdr_shift_buf(struct xdr_buf *buf, size_t len)
}
EXPORT_SYMBOL_GPL(xdr_shift_buf);

+static unsigned int xdr_align_pages(struct xdr_stream *, unsigned int);
+void
+xdr_shift_hole(struct xdr_stream *xdr, size_t offset, size_t length)
+{
+ struct xdr_buf *buf = xdr->buf;
+
+ if (buf->page_len == length)
+ xdr_align_pages(xdr, length);
+ else
+ _shift_data_right_pages(buf->pages, buf->page_base + length,
+ buf->page_base, buf->page_len - length);
+
+ _zero_data_pages(buf->pages, buf->page_base, length);
+}
+EXPORT_SYMBOL_GPL(xdr_shift_hole);
+
/**
* xdr_stream_pos - Return the current offset from the start of the xdr_stream
* @xdr: pointer to struct xdr_stream
@@ -727,6 +832,12 @@ __be32 * xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes)
}
EXPORT_SYMBOL_GPL(xdr_inline_decode);

+static void xdr_align_pages_left(struct xdr_buf *buf, unsigned int len)
+{
+ _shift_data_left_pages(buf->pages, buf->page_base,
+ buf->page_base + len, buf->page_len - len);
+}
+
static unsigned int xdr_align_pages(struct xdr_stream *xdr, unsigned int len)
{
struct xdr_buf *buf = xdr->buf;
@@ -741,7 +852,9 @@ static unsigned int xdr_align_pages(struct xdr_stream *xdr, unsigned int len)
if (iov->iov_len > cur) {
xdr_shrink_bufhead(buf, iov->iov_len - cur);
xdr->nwords = XDR_QUADLEN(buf->len - cur);
- }
+ } else if (cur != iov->iov_len)
+ /* cur points somewhere in the page array */
+ xdr_align_pages_left(buf, cur - iov->iov_len);

if (nwords > xdr->nwords) {
nwords = xdr->nwords;
--
1.8.5.2


2014-01-06 22:32:03

by J. Bruce Fields

Subject: Re: [PATCH 0/3] READ_PLUS rough draft

On Mon, Jan 06, 2014 at 04:57:10PM -0500, Anna Schumaker wrote:
> These patches are my initial implementation of READ_PLUS. I still have a
> few issues to work out before they can be applied, but I wanted to submit
> them anyway to get feedback before going much further. These patches were
> developed on top of my earlier SEEK and WRITE_PLUS patches, and probably
> won't apply cleanly without them (I am willing to reorder things if necessary!).
>
> On the server side, I handle the cases where a file is 100% hole, 100% data
> or hole followed by data. Any holes after a data segment will be expanded
> to zeros on the wire.

I assume that for "a file" I should read "the requested range of the
file"?

hole+data+hole should also be doable, shouldn't it? I'd think the real
problem would be multiple data extents.

> This is due to a limitation in the NFSD
> encode-to-page function that will adjust pointers to point to the xdr tail
> after reading a file to the "pages" section. Bruce, do you have any
> suggestions here?

The server xdr encoding needs a rewrite. I'll see if I can ignore you
all and put my head down and get a version of that posted this week.

That should make it easier to return all the data, though it will turn
off zero-copy in the case of multiple data extents.

If we want READ_PLUS to support zero copy in the case of multiple
extents then I think we need a new data structure to represent the
resulting rpc reply. An xdr buf only knows how to insert one array of
pages in the middle of the data. Maybe a list of xdr bufs?
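
One possible shape for that, purely as a sketch of the idea (the struct
and field names here are made up):

	#include <linux/list.h>
	#include <linux/sunrpc/xdr.h>

	/* illustrative only: one xdr_buf per data extent, chained together */
	struct xdr_buf_chain {
		struct xdr_buf		buf;	/* head/pages/tail for one extent */
		struct list_head	link;	/* next extent's buf, if any */
	};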

But that's an annoying job and possibly a premature optimization.

It might be useful to first understand the typical distribution of holes
in a file and how likely various workloads are to produce reads with
multiple holes in the middle.

--b.

>
> The client side needs to punch holes after decoding page information, since
> data on pages will be aligned at the start of the page array. So a file that
> is <HOLE><DATA> will store the hole information, decode the data, and then
> punch the hole before returning. I think it would be better to use the
> provided offset field to decode everything to its final location, but I'm
> struggling to come up with a clean way of doing so using the code that is
> already there.
>
> Let me know what you all think!
> Anna
>
> Anna Schumaker (3):
> NFSD: Implement READ_PLUS support
> SUNRPC: This patch adds functions for shifting page data
> NFS: Client side changes for READ_PLUS
>
> fs/nfs/nfs4client.c | 2 +-
> fs/nfs/nfs4proc.c | 23 +++++-
> fs/nfs/nfs4xdr.c | 191 +++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/Kconfig | 14 ++++
> fs/nfsd/nfs4proc.c | 9 +++
> fs/nfsd/nfs4xdr.c | 177 ++++++++++++++++++++++++++++++++++++-----
> include/linux/nfs4.h | 1 +
> include/linux/nfs_fs_sb.h | 1 +
> include/linux/sunrpc/xdr.h | 1 +
> net/sunrpc/xdr.c | 115 ++++++++++++++++++++++++++-
> 10 files changed, 511 insertions(+), 23 deletions(-)
>
> --
> 1.8.5.2
>

2014-01-07 15:38:51

by Anna Schumaker

Subject: Re: [PATCH 0/3] READ_PLUS rough draft

On 01/07/2014 09:56 AM, J. Bruce Fields wrote:
> On Tue, Jan 07, 2014 at 09:42:04AM -0500, Anna Schumaker wrote:
>> On 01/06/2014 05:32 PM, J. Bruce Fields wrote:
>>> On Mon, Jan 06, 2014 at 04:57:10PM -0500, Anna Schumaker wrote:
>>>> These patches are my initial implementation of READ_PLUS. I still have a
>>>> few issues to work out before they can be applied, but I wanted to submit
>>>> them anyway to get feedback before going much further. These patches were
>>>> developed on top of my earlier SEEK and WRITE_PLUS patches, and probably
>>>> won't apply cleanly without them (I am willing to reorder things if necessary!).
>>>>
>>>> On the server side, I handle the cases where a file is 100% hole, 100% data
>>>> or hole followed by data. Any holes after a data segment will be expanded
>>>> to zeros on the wire.
>>>
>>> I assume that for "a file" I should read "the requested range of the
>>> file"?
>>
>> Yes.
>>
>>>
>>> hole+data+hole should also be doable, shouldn't it? I'd think the real
>>> problem would be multiple data extents.
>>
>> It might be, but I haven't tried it yet. I can soon!
>>
>>>
>>>> This is due to a limitation in the NFSD
>>>> encode-to-page function that will adjust pointers to point to the xdr tail
>>>> after reading a file to the "pages" section. Bruce, do you have any
>>>> suggestions here?
>>>
>>> The server xdr encoding needs a rewrite. I'll see if I can ignore you
>>> all and put my head down and get a version of that posted this week.
>>
>> :)
>>
>>>
>>> That should make it easier to return all the data, though it will turn
>>> off zero-copy in the case of multiple data extents.
>>>
>>> If we want READ_PLUS to support zero copy in the case of multiple
>>> extents then I think we need a new data structure to represent the
>>> resulting rpc reply. An xdr buf only knows how to insert one array of
>>> pages in the middle of the data. Maybe a list of xdr bufs?
>>>
>>> But that's an annoying job and possibly a premature optimization.
>>>
>>> It might be useful to first understand the typical distribution of holes
>>> in a file and how likely various workloads are to produce reads with
>>> multiple holes in the middle.
>>
>> I already have a few performance numbers, but nothing that can be trusted due to the number of debugging printk()s I used to make sure the client decoded everything correctly. My plan is to collect the following information using: v4.0, v4.1, v4.2 (SEEK), v4.2 (SEEK + WRITE_PLUS), and v4.2 (SEEK + WRITE_PLUS + READ_PLUS).
>
> What's the workload and hardware setup?

I was going to run filebench tests (fileserver, mongo, varmail) between two VMs. I only have the one laptop with me today, so I can't test between two real machines without asking for a volunteer from Workantile. I am planning to kill Firefox and Thunderbird before running anything!

Anna

>
> --b.
>


2014-01-07 14:42:08

by Anna Schumaker

Subject: Re: [PATCH 0/3] READ_PLUS rough draft

On 01/06/2014 05:32 PM, J. Bruce Fields wrote:
> On Mon, Jan 06, 2014 at 04:57:10PM -0500, Anna Schumaker wrote:
>> These patches are my initial implementation of READ_PLUS. I still have a
>> few issues to work out before they can be applied, but I wanted to submit
>> them anyway to get feedback before going much further. These patches were
>> developed on top of my earlier SEEK and WRITE_PLUS patches, and probably
>> won't apply cleanly without them (I am willing to reorder things if necessary!).
>>
>> On the server side, I handle the cases where a file is 100% hole, 100% data
>> or hole followed by data. Any holes after a data segment will be expanded
>> to zeros on the wire.
>
> I assume that for "a file" I should read "the requested range of the
> file"?

Yes.

>
> hole+data+hole should also be doable, shouldn't it? I'd think the real
> problem would be multiple data extents.

It might be, but I haven't tried it yet. I can soon!

>
>> This is due to a limitation in the NFSD
>> encode-to-page function that will adjust pointers to point to the xdr tail
>> after reading a file to the "pages" section. Bruce, do you have any
>> suggestions here?
>
> The server xdr encoding needs a rewrite. I'll see if I can ignore you
> all and put my head down and get a version of that posted this week.

:)

>
> That should make it easier to return all the data, though it will turn
> off zero-copy in the case of multiple data extents.
>
> If we want READ_PLUS to support zero copy in the case of multiple
> extents then I think we need a new data structure to represent the
> resulting rpc reply. An xdr buf only knows how to insert one array of
> pages in the middle of the data. Maybe a list of xdr bufs?
>
> But that's an annoying job and possibly a premature optimization.
>
> It might be useful to first understand the typical distribution of holes
> in a file and how likely various workloads are to produce reads with
> multiple holes in the middle.

I already have a few performance numbers, but nothing that can be trusted due to the number of debugging printk()s I used to make sure the client decoded everything correctly. My plan is to collect the following information using: v4.0, v4.1, v4.2 (SEEK), v4.2 (SEEK + WRITE_PLUS), and v4.2 (SEEK + WRITE_PLUS + READ_PLUS).

Anna

>
> --b.
>
>>
>> The client side needs to punch holes after decoding page information, since
>> data on pages will be aligned at the start of the page array. So a file that
>> is <HOLE><DATA> will store the hole information, decode the data, and then
>> punch the hole before returning. I think it would be better to use the
>> provided offset field to decode everything to its final location, but I'm
>> struggling to come up with a clean way of doing so using the code that is
>> already there.
>>
>> Let me know what you all think!
>> Anna
>>
>> Anna Schumaker (3):
>> NFSD: Implement READ_PLUS support
>> SUNRPC: This patch adds functions for shifting page data
>> NFS: Client side changes for READ_PLUS
>>
>> fs/nfs/nfs4client.c | 2 +-
>> fs/nfs/nfs4proc.c | 23 +++++-
>> fs/nfs/nfs4xdr.c | 191 +++++++++++++++++++++++++++++++++++++++++++++
>> fs/nfsd/Kconfig | 14 ++++
>> fs/nfsd/nfs4proc.c | 9 +++
>> fs/nfsd/nfs4xdr.c | 177 ++++++++++++++++++++++++++++++++++++-----
>> include/linux/nfs4.h | 1 +
>> include/linux/nfs_fs_sb.h | 1 +
>> include/linux/sunrpc/xdr.h | 1 +
>> net/sunrpc/xdr.c | 115 ++++++++++++++++++++++++++-
>> 10 files changed, 511 insertions(+), 23 deletions(-)
>>
>> --
>> 1.8.5.2
>>


2014-01-06 21:57:17

by Anna Schumaker

Subject: [PATCH 1/3] NFSD: Implement READ_PLUS support

I don't break the entire file into appropriate chunks. Instead, if the
first section is a hole, I send hole information. Everything else is
reported as data.
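
As a worked example of the hole-vs-data decision (illustrative numbers;
this is just the SEEK_HOLE semantics used by the check at the top of
nfsd4_encode_read_plus() below):

	/*
	 * File layout: 4K of data at offset 0, then a 4K hole up to i_size.
	 *
	 *   vfs_llseek(filp, 0,    SEEK_HOLE) == 4096 != 0     -> data comes first
	 *   vfs_llseek(filp, 4096, SEEK_HOLE) == 4096 == 4096  -> range starts in a hole
	 */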
---
fs/nfsd/Kconfig | 14 +++++
fs/nfsd/nfs4proc.c | 9 +++
fs/nfsd/nfs4xdr.c | 177 +++++++++++++++++++++++++++++++++++++++++++++++------
3 files changed, 181 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index 28d7f5d..cd35125 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -106,6 +106,20 @@ config NFSD_V4_2_WRITE_PLUS

If unsure, say N.

+config NFSD_V4_2_READ_PLUS
+ bool "Enable READ_PLUS support for the NFS v4.2 server"
+ depends on NFSD_V4
+ help
+ Say Y here if you want to enable support for the NFS v4.2 operation
+ READ_PLUS, which is used for reading a file that may contain data
+ holes (sparse files).
+
+ WARNING: there is still a chance of backwards-incompatible protocol
+ changes. This feature is targeted at developers and testers only.
+
+ If unsure, say N.
+
+
config NFSD_V4_SECURITY_LABEL
bool "Provide Security Label support for NFSv4 server"
depends on NFSD_V4 && SECURITY
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 67ed233..905019c 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1943,6 +1943,15 @@ static struct nfsd4_operation nfsd4_ops[] = {
.op_get_currentstateid = (stateid_getter)nfsd4_get_writestateid,
},
#endif /* CONFIG_NFSD_V4_2_WRITE_PLUS */
+#ifdef CONFIG_NFSD_V4_2_READ_PLUS
+ [OP_READ_PLUS] = {
+ .op_func = (nfsd4op_func)nfsd4_read,
+ .op_flags = OP_MODIFIES_SOMETHING,
+ .op_name = "OP_READ_PLUS",
+ .op_rsize_bop = (nfsd4op_rsize)nfsd4_read_rsize,
+ .op_get_currentstateid = (stateid_getter)nfsd4_get_readstateid,
+ },
+#endif /* CONFIG_NFSD_V4_2_READ_PLUS */
#ifdef CONFIG_NFSD_V4_2_SEEK
[OP_SEEK] = {
.op_func = (nfsd4op_func)nfsd4_seek,
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 92946bb..40b7793 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1636,7 +1636,11 @@ static nfsd4_dec nfsd4_dec_ops[] = {
#else
[OP_WRITE_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
#endif /* CONFIG_NFSD_V4_2_WRITE_PLUS */
+#ifdef CONFIG_NFSD_V4_2_READ_PLUS
+ [OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_read,
+#else
[OP_READ_PLUS] = (nfsd4_dec)nfsd4_decode_notsupp,
+#endif /* CONFIG_NFSD_V4_2_READ_PLUS */
#ifdef CONFIG_NFSD_V4_2_SEEK
[OP_SEEK] = (nfsd4_dec)nfsd4_decode_seek,
#else
@@ -3038,29 +3042,14 @@ nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struc
return nfserr;
}

-static __be32
-nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
- struct nfsd4_read *read)
+static void nfsd4_encode_read_setup_pages(struct nfsd4_compoundres *resp,
+ struct nfsd4_read *read,
+ unsigned long maxcount)
{
- u32 eof;
int v;
struct page *page;
- unsigned long maxcount;
- long len;
- __be32 *p;
-
- if (nfserr)
- return nfserr;
- if (resp->xbuf->page_len)
- return nfserr_resource;
-
- RESERVE_SPACE(8); /* eof flag and byte count */
-
- maxcount = svc_max_payload(resp->rqstp);
- if (maxcount > read->rd_length)
- maxcount = read->rd_length;
+ long len = maxcount;

- len = maxcount;
v = 0;
while (len > 0) {
page = *(resp->rqstp->rq_next_page);
@@ -3076,7 +3065,28 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
len -= PAGE_SIZE;
}
read->rd_vlen = v;
+}
+
+static __be32
+nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_read *read)
+{
+ u32 eof;
+ unsigned long maxcount;
+ __be32 *p;

+ if (nfserr)
+ return nfserr;
+ if (resp->xbuf->page_len)
+ return nfserr_resource;
+
+ RESERVE_SPACE(8); /* eof flag and byte count */
+
+ maxcount = svc_max_payload(resp->rqstp);
+ if (maxcount > read->rd_length)
+ maxcount = read->rd_length;
+
+ nfsd4_encode_read_setup_pages(resp, read, maxcount);
nfserr = nfsd_read_file(read->rd_rqstp, read->rd_fhp, read->rd_filp,
read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
&maxcount);
@@ -3631,6 +3641,131 @@ nfsd4_encode_write_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
}
#endif /* CONFIG_NFSD_V4_2_WRITE_PLUS */

+#ifdef CONFIG_NFSD_V4_2_READ_PLUS
+static __be32
+nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
+ struct nfsd4_read *read, u32 *eof)
+{
+ __be32 *p, nfserr;
+ unsigned long maxcount;
+
+ maxcount = svc_max_payload(resp->rqstp);
+ if (maxcount > read->rd_length)
+ maxcount = read->rd_length;
+
+ nfsd4_encode_read_setup_pages(resp, read, maxcount);
+ nfserr = nfsd_read_file(read->rd_rqstp, read->rd_fhp, read->rd_filp,
+ read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
+ &maxcount);
+ if (nfserr)
+ return nfserr;
+
+ RESERVE_SPACE(20);
+ WRITE32(NFS4_CONTENT_DATA);
+ WRITE64(read->rd_offset);
+ WRITE32(true); /* allocated flag */
+ WRITE32(maxcount);
+ ADJUST_ARGS();
+
+ *eof = (read->rd_offset + maxcount >=
+ read->rd_fhp->fh_dentry->d_inode->i_size);
+
+ resp->xbuf->head[0].iov_len = (char*)p
+ - (char*)resp->xbuf->head[0].iov_base;
+ resp->xbuf->page_len = maxcount;
+
+ /* Use rest of head for padding and remaining ops: */
+ resp->xbuf->tail[0].iov_base = p;
+ resp->xbuf->tail[0].iov_len = 0;
+ if (maxcount&3) {
+ RESERVE_SPACE(4);
+ WRITE32(0);
+ resp->xbuf->tail[0].iov_base += maxcount&3;
+ resp->xbuf->tail[0].iov_len = 4 - (maxcount&3);
+ ADJUST_ARGS();
+ }
+ return 0;
+}
+
+static __be32
+nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
+ struct nfsd4_read *read, u32 *eof)
+{
+ __be32 *p;
+ u64 data_pos, max_pos, count;
+
+ /*
+ * There may be no more data past rd_offset; if so, the hole runs to EOF.
+ */
+ data_pos = vfs_llseek(read->rd_filp, read->rd_offset, SEEK_DATA);
+ max_pos = read->rd_fhp->fh_dentry->d_inode->i_size;
+ if (data_pos > max_pos)
+ data_pos = max_pos;
+ count = data_pos - read->rd_offset;
+
+ RESERVE_SPACE(24);
+ WRITE32(NFS4_CONTENT_HOLE);
+ WRITE64(read->rd_offset);
+ WRITE64(count);
+ WRITE32(false);
+ ADJUST_ARGS();
+
+ *eof = (read->rd_offset + count >= max_pos);
+
+ read->rd_offset += count;
+ if (count > read->rd_length)
+ read->rd_length = 0;
+ else
+ read->rd_length -= count;
+
+ return 0;
+}
+
+static __be32
+nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_read *read)
+{
+ u32 eof, count = 0;
+ __be32 *p, *p_saved;
+
+ if (nfserr)
+ return nfserr;
+ if (resp->xbuf->page_len)
+ return nfserr_resource;
+
+ RESERVE_SPACE(8); /* eof flag and contents count */
+ p_saved = p;
+ WRITE32(0); /* eof flag, filled in below */
+ WRITE32(0); /* number of contents, filled in below */
+ ADJUST_ARGS();
+
+ /*
+ * Encode a hole only if we begin reading from one
+ */
+ if (read->rd_offset == vfs_llseek(read->rd_filp, read->rd_offset, SEEK_HOLE)) {
+ nfserr = nfsd4_encode_read_plus_hole(resp, read, &eof);
+ if (nfserr)
+ return nfserr;
+ count++;
+ if (read->rd_length == 0 || eof)
+ goto out_done;
+ }
+
+ /*
+ * Encode the rest as data
+ */
+ nfserr = nfsd4_encode_read_plus_data(resp, read, &eof);
+ if (nfserr)
+ return nfserr;
+ count++;
+
+out_done:
+ *p_saved++ = htonl(eof);
+ *p_saved = htonl(count);
+ return 0;
+}
+#endif /* CONFIG_NFSD_V4_2_READ_PLUS */
+
#ifdef CONFIG_NFSD_V4_2_SEEK
static __be32
nfsd4_encode_seek(struct nfsd4_compoundres *resp, __be32 nfserr,
@@ -3737,7 +3872,11 @@ static nfsd4_enc nfsd4_enc_ops[] = {
#else
[OP_WRITE_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
#endif /* CONFIG_NFSD_V4_2_WRITE_PLUS */
+#ifdef CONFIG_NFSD_V4_2_READ_PLUS
+ [OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_read_plus,
+#else
[OP_READ_PLUS] = (nfsd4_enc)nfsd4_encode_noop,
+#endif /* CONFIG_NFSD_V4_2_READ_PLUS */
#ifdef CONFIG_NFSD_V4_2_SEEK
[OP_SEEK] = (nfsd4_enc)nfsd4_encode_seek,
#else
--
1.8.5.2


2014-01-06 22:49:59

by Trond Myklebust

Subject: Re: [PATCH 0/3] READ_PLUS rough draft


On Jan 6, 2014, at 17:32, J. Bruce Fields <[email protected]> wrote:

> On Mon, Jan 06, 2014 at 04:57:10PM -0500, Anna Schumaker wrote:
>> These patches are my initial implementation of READ_PLUS. I still have a
>> few issues to work out before they can be applied, but I wanted to submit
>> them anyway to get feedback before going much further. These patches were
>> developed on top of my earlier SEEK and WRITE_PLUS patches, and probably
>> won't apply cleanly without them (I am willing to reorder things if necessary!).
>>
>> On the server side, I handle the cases where a file is 100% hole, 100% data
>> or hole followed by data. Any holes after a data segment will be expanded
>> to zeros on the wire.
>
> I assume that for "a file" I should read "the requested range of the
> file"?
>
> hole+data+hole should also be doable, shouldn't it? I'd think the real
> problem would be multiple data extents.
>
>> This is due to a limitation in the NFSD
>> encode-to-page function that will adjust pointers to point to the xdr tail
>> after reading a file to the "pages" section. Bruce, do you have any
>> suggestions here?
>
> The server xdr encoding needs a rewrite. I'll see if I can ignore you
> all and put my head down and get a version of that posted this week.
>
> That should make it easier to return all the data, though it will turn
> off zero-copy in the case of multiple data extents.
>
> If we want READ_PLUS to support zero copy in the case of multiple
> extents then I think we need a new data structure to represent the
> resulting rpc reply. An xdr buf only knows how to insert one array of
> pages in the middle of the data. Maybe a list of xdr bufs?
>
> But that's an annoying job and possibly a premature optimization.
>
> It might be useful to first understand the typical distribution of holes
> in a file and how likely various workloads are to produce reads with
> multiple holes in the middle.

Right. The main purpose of this patch set is to demonstrate that the READ_PLUS hole feature is pretty much useless in the cases you list above.

Cheers
Trond

2014-01-07 18:11:26

by J. Bruce Fields

Subject: Re: [PATCH 0/3] READ_PLUS rough draft

On Tue, Jan 07, 2014 at 10:38:46AM -0500, Anna Schumaker wrote:
> On 01/07/2014 09:56 AM, J. Bruce Fields wrote:
> > On Tue, Jan 07, 2014 at 09:42:04AM -0500, Anna Schumaker wrote:
> >> On 01/06/2014 05:32 PM, J. Bruce Fields wrote:
> >>> On Mon, Jan 06, 2014 at 04:57:10PM -0500, Anna Schumaker wrote:
> >>>> These patches are my initial implementation of READ_PLUS. I still have a
> >>>> few issues to work out before they can be applied, but I wanted to submit
> >>>> them anyway to get feedback before going much further. These patches were
> >>>> developed on top of my earlier SEEK and WRITE_PLUS patches, and probably
> >>>> won't apply cleanly without them (I am willing to reorder things if necessary!).
> >>>>
> >>>> On the server side, I handle the cases where a file is 100% hole, 100% data
> >>>> or hole followed by data. Any holes after a data segment will be expanded
> >>>> to zeros on the wire.
> >>>
> >>> I assume that for "a file" I should read "the requested range of the
> >>> file"?
> >>
> >> Yes.
> >>
> >>>
> >>> hole+data+hole should also be doable, shouldn't it? I'd think the real
> >>> problem would be multiple data extents.
> >>
> >> It might be, but I haven't tried it yet. I can soon!
> >>
> >>>
> >>>> This is due to a limitation in the NFSD
> >>>> encode-to-page function that will adjust pointers to point to the xdr tail
> >>>> after reading a file to the "pages" section. Bruce, do you have any
> >>>> suggestions here?
> >>>
> >>> The server xdr encoding needs a rewrite. I'll see if I can ignore you
> >>> all and put my head down and get a version of that posted this week.
> >>
> >> :)
> >>
> >>>
> >>> That should make it easier to return all the data, though it will turn
> >>> off zero-copy in the case of multiple data extents.
> >>>
> >>> If we want READ_PLUS to support zero copy in the case of multiple
> >>> extents then I think we need a new data structure to represent the
> >>> resulting rpc reply. An xdr buf only knows how to insert one array of
> >>> pages in the middle of the data. Maybe a list of xdr bufs?
> >>>
> >>> But that's an annoying job and possibly a premature optimization.
> >>>
> >>> It might be useful to first understand the typical distribution of holes
> >>> in a file and how likely various workloads are to produce reads with
> >>> multiple holes in the middle.
> >>
> >> I already have a few performance numbers, but nothing that can be trusted due to the number of debugging printk()s I used to make sure the client decoded everything correctly. My plan is to collect the following information using: v4.0, v4.1, v4.2 (SEEK), v4.2 (SEEK + WRITE_PLUS), and v4.2 (SEEK + WRITE_PLUS + READ_PLUS).
> >
> > What's the workload and hardware setup?
>
> I was going to run filebench tests (fileserver, mongo, varmail) between two VMs. I only have the one laptop with me today, so I can't test between two real machines without asking for a volunteer from Workantile. I am planning to kill Firefox and Thunderbird before running anything!

That doesn't seem like an interesting test.

Running some generic filesystem benchmarks sounds like a fine idea, if
nothing else to check for regressions. But if we want to figure out
whether it helps where it's supposed to, then I think we need to think
about what it's meant to do (reduce bandwidth use when transferring
sparse files, I guess). Maybe something copying VM images would be
interesting?
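
Something like this would do for generating a sparse file to read back
over the mount (a throwaway sketch, not part of the patches; the file
name and sizes are arbitrary). It gives a hole, one data extent, then
another hole:

	/* create a 1GB file that is all hole except 4K of data in the middle */
	#include <fcntl.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[4096];
		int fd = open("sparse-test", O_WRONLY | O_CREAT | O_TRUNC, 0644);

		if (fd < 0)
			return 1;
		memset(buf, 'a', sizeof(buf));
		if (ftruncate(fd, 1024 * 1024 * 1024) < 0)	/* hole out to 1GB */
			return 1;
		if (pwrite(fd, buf, sizeof(buf), 512 * 1024 * 1024) < 0) /* data at 512MB */
			return 1;
		close(fd);
		return 0;
	}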

--b.

2014-01-06 21:57:19

by Anna Schumaker

Subject: [PATCH 3/3] NFS: Client side changes for READ_PLUS

This patch implements client side READ_PLUS. At the moment, the client
is coded to expect files that are <HOLE>, <DATA>, or <HOLE><DATA>; this
may change eventually.
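
To illustrate the decode path (numbers made up), an 8K READ_PLUS at
offset 0 of a file whose first 4K is a hole comes back as two segments
and is handled like this:

	/*
	 * segment 1: HOLE  offset=0    length=4096 -> remembered in pending_hole,
	 *                                              res->count += 4096
	 * segment 2: DATA  offset=4096 count=4096  -> xdr_read_pages() aligns the
	 *                                              data to the start of the
	 *                                              page array, res->count += 4096
	 * after the loop: xdr_shift_hole() shifts the data right by 4096 bytes
	 *                 and zeroes the first 4096; res->count == 8192
	 */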
---
fs/nfs/nfs4client.c | 2 +-
fs/nfs/nfs4proc.c | 23 +++++-
fs/nfs/nfs4xdr.c | 191 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/nfs4.h | 1 +
include/linux/nfs_fs_sb.h | 1 +
5 files changed, 215 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index b2bce67..b93672b 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -912,7 +912,7 @@ static int nfs4_server_common_setup(struct nfs_server *server,

/* Set the basic capabilities */
server->caps |= server->nfs_client->cl_mvops->init_caps;
- server->caps |= NFS_CAP_SEEK | NFS_CAP_WRITE_PLUS_HOLE;
+ server->caps |= NFS_CAP_SEEK | NFS_CAP_WRITE_PLUS_HOLE | NFS_CAP_READ_PLUS;
if (server->flags & NFS_MOUNT_NORDIRPLUS)
server->caps &= ~NFS_CAP_READDIRPLUS;
/*
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ce4124c..925f8b9 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4069,9 +4069,13 @@ static bool nfs4_read_stateid_changed(struct rpc_task *task,

static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
{
-
+ struct nfs_server *server = NFS_SERVER(data->header->inode);
dprintk("--> %s\n", __func__);

+ if ((server->nfs_client->cl_minorversion >= 2) && (task->tk_status == -ENOTSUPP)) {
+ server->caps &= ~NFS_CAP_READ_PLUS;
+ return -EAGAIN;
+ }
if (!nfs4_sequence_done(task, &data->res.seq_res))
return -EAGAIN;
if (nfs4_read_stateid_changed(task, &data->args))
@@ -4080,11 +4084,26 @@ static int nfs4_read_done(struct rpc_task *task, struct nfs_read_data *data)
nfs4_read_done_cb(task, data);
}

+#ifdef CONFIG_NFS_V4_2
+static void nfs4_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
+{
+ if ((server->nfs_client->cl_minorversion < 2) || !(server->caps & NFS_CAP_READ_PLUS))
+ msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+ else
+ msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ_PLUS];
+}
+#else
+static void nfs4_read_plus_support(struct nfs_server *server, struct rpc_message *msg)
+{
+ msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+}
+#endif /* CONFIG_NFS_V4_2 */
+
static void nfs4_proc_read_setup(struct nfs_read_data *data, struct rpc_message *msg)
{
data->timestamp = jiffies;
data->read_done_cb = nfs4_read_done_cb;
- msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_READ];
+ nfs4_read_plus_support(NFS_SERVER(data->header->inode), msg);
nfs4_init_sequence(&data->args.seq_args, &data->res.seq_res, 0);
}

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 534afee..a45f19d 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -430,6 +430,15 @@ static int nfs4_stat_to_errno(int);
2 /* bytes written */ + \
1 /* committed */ + \
XDR_QUADLEN(NFS4_VERIFIER_SIZE))
+#define encode_read_plus_maxsz (op_encode_hdr_maxsz + \
+ encode_stateid_maxsz + 3)
+#define decode_read_plus_maxsz (op_decode_hdr_maxsz + \
+ 1 /* rpr_eof */ + \
+ 1 /* rpr_contents array size */ + \
+ 1 /* data_content4 */ + \
+ 2 /* data_info4.di_offset */ + \
+ 2 /* data_info4.di_length */ + \
+ 1 /* data_info4.di_allocated */)
#define encode_seek_maxsz (op_encode_hdr_maxsz + \
XDR_QUADLEN(NFS4_STATEID_SIZE) + \
2 /* offset */ + \
@@ -923,6 +932,12 @@ EXPORT_SYMBOL_GPL(nfs41_maxgetdevinfo_overhead);
#define NFS4_dec_write_plus_sz (compound_decode_hdr_maxsz + \
decode_putfh_maxsz + \
decode_write_plus_maxsz)
+#define NFS4_enc_read_plus_sz (compound_encode_hdr_maxsz + \
+ encode_putfh_maxsz +\
+ encode_read_plus_maxsz)
+#define NFS4_dec_read_plus_sz (compound_decode_hdr_maxsz + \
+ decode_putfh_maxsz + \
+ decode_read_plus_maxsz)
#define NFS4_enc_seek_sz (compound_encode_hdr_maxsz + \
encode_putfh_maxsz + \
encode_seek_maxsz)
@@ -2133,6 +2148,20 @@ static void encode_write_plus(struct xdr_stream *xdr,
encode_write_plus_hole(xdr, args);
}

+static void encode_read_plus(struct xdr_stream *xdr,
+ struct nfs_readargs *args,
+ struct compound_hdr *hdr)
+{
+ __be32 *p;
+
+ encode_op_hdr(xdr, OP_READ_PLUS, decode_read_plus_maxsz, hdr);
+ encode_nfs4_stateid(xdr, &args->stateid);
+
+ p = reserve_space(xdr, 12);
+ p = xdr_encode_hyper(p, args->offset);
+ *p = cpu_to_be32(args->count);
+}
+
static void encode_seek(struct xdr_stream *xdr,
struct nfs42_seek_args *args,
struct compound_hdr *hdr)
@@ -3139,6 +3168,28 @@ static void nfs4_xdr_enc_write_plus(struct rpc_rqst *req,
}

/*
+ * Encode READ_PLUS request
+ */
+static void nfs4_xdr_enc_read_plus(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ struct nfs_readargs *args)
+{
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->fh, &hdr);
+ encode_read_plus(xdr, args, &hdr);
+
+ xdr_inline_pages(&req->rq_rcv_buf, hdr.replen << 2,
+ args->pages, args->pgbase, args->count);
+ req->rq_rcv_buf.flags |= XDRBUF_READ;
+ encode_nops(&hdr);
+}
+
+/*
* Encode SEEK request
*/
static void nfs4_xdr_enc_seek(struct rpc_rqst *req,
@@ -6181,6 +6232,119 @@ static int decode_write_plus(struct xdr_stream *xdr, struct nfs42_write_res *res
return decode_write_response(xdr, res);
}

+struct read_plus_hole {
+ uint64_t offset;
+ uint64_t length;
+};
+
+static int decode_read_plus_data(struct xdr_stream *xdr, struct rpc_rqst *req,
+ struct nfs_readres *res)
+{
+ __be32 *p;
+ uint64_t offset;
+ uint32_t count, eof, recvd, allocated;
+
+ eof = res->eof;
+
+ p = xdr_inline_decode(xdr, 16);
+ if (unlikely(!p))
+ goto out_overflow;
+ p = xdr_decode_hyper(p, &offset);
+ allocated = be32_to_cpup(p++);
+ count = be32_to_cpup(p);
+
+ recvd = xdr_read_pages(xdr, count);
+ if (count > recvd) {
+ dprintk("NFS: server cheating in read reply: "
+ "count %u > recvd %u\n", count, recvd);
+ count = recvd;
+ eof = 0;
+ }
+
+ res->count += count;
+ res->eof = eof;
+ return 0;
+out_overflow:
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+}
+
+static int decode_read_plus_hole(struct xdr_stream *xdr, struct rpc_rqst *req,
+ struct nfs_readres *res, struct read_plus_hole *info)
+{
+ __be32 *p;
+ uint32_t allocated;
+
+ p = xdr_inline_decode(xdr, 20);
+ if (unlikely(!p))
+ goto out_overflow;
+
+ p = xdr_decode_hyper(p, &info->offset);
+ p = xdr_decode_hyper(p, &info->length);
+ allocated = be32_to_cpup(p);
+
+ res->count += info->length;
+ return 0;
+
+out_overflow:
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+}
+
+static int decode_read_plus(struct xdr_stream *xdr, struct rpc_rqst *req,
+ struct nfs_readres *res)
+{
+ __be32 *p;
+ uint32_t num_contents, content_type;
+ struct read_plus_hole pending_hole = {
+ .offset = 0,
+ .length = 0,
+ };
+ int status, i;
+
+ status = decode_op_hdr(xdr, OP_READ_PLUS);
+ if (status)
+ return status;
+
+ p = xdr_inline_decode(xdr, 8);
+ if (unlikely(!p))
+ goto out_overflow;
+ res->eof = be32_to_cpup(p++);
+ res->count = 0;
+
+ num_contents = be32_to_cpup(p);
+
+ for (i = 0; i < num_contents; i++) {
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ goto out_overflow;
+
+ content_type = be32_to_cpup(p);
+ switch (content_type) {
+ case NFS4_CONTENT_DATA:
+ status = decode_read_plus_data(xdr, req, res);
+ break;
+ case NFS4_CONTENT_HOLE:
+ status = decode_read_plus_hole(xdr, req, res, &pending_hole);
+ break;
+ default:
+ break;
+ }
+
+ if (status != 0)
+ return status;
+ }
+
+ if (pending_hole.length != 0)
+ xdr_shift_hole(xdr, pending_hole.offset, pending_hole.length);
+
+ return 0;
+
+out_overflow:
+ print_overflow_msg(__func__, xdr);
+ return -EIO;
+}
+
static int decode_seek(struct xdr_stream *xdr, struct nfs42_seek_res *res)
{
int status;
@@ -7486,6 +7650,32 @@ out:
}

/*
+ * Decode READ_PLUS response
+ */
+static int nfs4_xdr_dec_read_plus(struct rpc_rqst *rqstp,
+ struct xdr_stream *xdr,
+ struct nfs_readres *res)
+{
+ struct compound_hdr hdr;
+ int status;
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, rqstp);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_read_plus(xdr, rqstp, res);
+ if (!status)
+ status = res->count;
+out:
+ return status;
+}
+
+/*
* Decode SEEK request
*/
static int nfs4_xdr_dec_seek(struct rpc_rqst *rqstp,
@@ -7722,6 +7912,7 @@ struct rpc_procinfo nfs4_procedures[] = {
#endif /* CONFIG_NFS_V4_1 */
#if defined(CONFIG_NFS_V4_2)
PROC(WRITE_PLUS, enc_write_plus, dec_write_plus),
+ PROC(READ_PLUS, enc_read_plus, dec_read_plus),
PROC(SEEK, enc_seek, dec_seek),
#endif /* CONFIG_NFS_V4_2 */
};
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 6c0e0b9..c1175a3 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -488,6 +488,7 @@ enum {

/* nfs42 */
NFSPROC4_CLNT_WRITE_PLUS,
+ NFSPROC4_CLNT_READ_PLUS,
NFSPROC4_CLNT_SEEK,
};

diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index aba3053..f194cd2 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -231,5 +231,6 @@ struct nfs_server {
#define NFS_CAP_SECURITY_LABEL (1U << 18)
#define NFS_CAP_SEEK (1U << 19)
#define NFS_CAP_WRITE_PLUS_HOLE (1U << 20)
+#define NFS_CAP_READ_PLUS (1U << 21)

#endif
--
1.8.5.2