2014-05-11 20:52:54

by J. Bruce Fields

Subject: nfsd4 xdr encoding fixes v2

Since the previous posting I've delayed a change in error return (see
fc208d026be0c7d60db9118583fc62f6ca97743d 'Revert "nfsd4: fix
nfs4err_resource in 4.1 case"' for discussion) and beefed up some
changelogs and comments based on Christoph's review. I've also rebased;
this series applies on top of

git://linux-nfs.org/~bfields/linux.git for-3.16

which already includes a few more uncontroversial-looking patches from
the previous posting of this series.

Original introduction follows:

This is a collection of fixes for the NFS server's encoding of NFSv4
compounds, along with a few tangentially related cleanups and bugfixes I
noticed along the way.

The basic problem is that we've always assumed an rpc reply is either

- "small" (much less than a page), or
- looks like a read (a bunch of data with a little bit at the
beginning and the end).

That assumption has allowed us to cover the most important cases without
having to deal with some annoying details like how to encode arbitrary
data across a page boundary, but:

- The inability to encode attributes of more than a page annoys
some people who would like to get and set extraordinarily long
ACLs.

- The inability to encode attributes that cross page boundaries
also means we can't return more than a page of readdir data at
a time, limiting readdir performance on large directories.

- The NFSv4 protocol doesn't really allow us to place these
sorts of arbitrary limits on the types of compounds we handle.
(Well, 4.0 is a bit fuzzy on this point, but 4.1 I think
definitely considers it a bug if a server won't handle, e.g.,
multiple read ops in a compound.) This hasn't been an issue
because most of these exotic compounds aren't really useful to
clients. But maybe future clients will find a use for some of
them--in which case we'd prefer not to make them work around a
server that doesn't meet the spec.

So, the main goal is to fix those limitations. We also get to share a
little more code with the client.
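
As a rough illustration of the "encode arbitrary data across a page
boundary" detail mentioned above, here is a minimal userspace sketch. It
is not the kernel's xdr_buf code: struct toy_buf, toy_append() and the
tiny TOY_PAGE_SIZE are hypothetical names invented for this example. It
only shows the core idea of splitting a copy wherever it would run off
the end of the current page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16	/* tiny on purpose, so a boundary is easy to hit */
#define TOY_NPAGES    4

/*
 * Hypothetical stand-in for a reply buffer built from separate pages.
 * The pages are stored contiguously here for brevity, but the code
 * below never relies on that.
 */
struct toy_buf {
	unsigned char pages[TOY_NPAGES][TOY_PAGE_SIZE];
	size_t len;		/* bytes encoded so far */
};

/*
 * Append nbytes to the buffer, splitting the copy wherever it would
 * cross a page boundary. Returns 0 on success, -1 if the data does
 * not fit (in which case nothing is written).
 */
static int toy_append(struct toy_buf *buf, const void *data, size_t nbytes)
{
	const unsigned char *src = data;

	assert(buf->len <= sizeof(buf->pages));
	if (nbytes > sizeof(buf->pages) - buf->len)
		return -1;

	while (nbytes > 0) {
		size_t page = buf->len / TOY_PAGE_SIZE;
		size_t off = buf->len % TOY_PAGE_SIZE;
		size_t room = TOY_PAGE_SIZE - off;
		size_t chunk = nbytes < room ? nbytes : room;

		memcpy(&buf->pages[page][off], src, chunk);
		src += chunk;
		buf->len += chunk;
		nbytes -= chunk;
	}
	return 0;
}
```

An encoder that assumes every reply is either "small" or read-shaped
never needs the loop above, which is the simplification the old code
relied on.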

Further work may include:

- writing more pynfs tests for exotic compounds and odd corner
cases,
- auditing the annoying nfsd4_*_rsize() functions, which we now
depend on for more things,
- improving our (currently very sloppy) estimate of how much
space we need for krb5i/krb5p to checksum/encrypt the result,
- sharing some of this with the v2/v3 code (especially in the
read and readdir cases),
- allowing rpc's whose call and reply are both very large (our one
remaining dubious limit on compounds, though again something
clients seem unlikely to notice for now),
- on the decode side, eliminating the existing macros and
sharing more helpers with the client.

--b.


2014-05-13 14:48:49

by J. Bruce Fields

Subject: Re: nfsd4 xdr encoding fixes v2

On Tue, May 13, 2014 at 04:09:45AM -0700, Christoph Hellwig wrote:
> On Mon, May 12, 2014 at 09:11:28AM -0700, Christoph Hellwig wrote:
> > On Mon, May 12, 2014 at 12:07:41PM -0400, J. Bruce Fields wrote:
> > > On Mon, May 12, 2014 at 01:20:59AM -0700, Christoph Hellwig wrote:
> > > > This series seems to cause hangs during xfstests against a server on the
> > > > same VM. The trace is fairly similar every time the hang happens, but the
> > > > point at which it happens differs:
> > >
> > > Ouch, OK, and you're sure it starts with this series?
> > >
> > > I guess I should try to replicate it here. Might take a couple days.
>
> Seems like "nfsd4: allow exotic read compounds" is the culprit.

OK, it makes sense that the problem would be there. Looking....

--b.

2014-05-11 20:52:54

by J. Bruce Fields

Subject: [PATCH 01/43] nfsd4: embed xdr_stream in nfsd4_compoundres

From: "J. Bruce Fields" <[email protected]>

This is a mechanical transformation with no change in behavior.

Signed-off-by: J. Bruce Fields <[email protected]>
---
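
Note for readers: the transformation in this patch can be pictured with
the small userspace sketch below. The names struct toy_stream, struct
toy_compoundres, toy_slack_bytes() and toy_reserve() are hypothetical
and heavily cut down; they are not the kernel definitions. The sketch
assumes only what the diff shows: the loose p/end pointers move into an
embedded stream struct, and callers reach them as resp->xdr.p and
resp->xdr.end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the kernel's struct xdr_stream. */
struct toy_stream {
	uint32_t *p;	/* next free 32-bit word */
	uint32_t *end;	/* one past the last usable word */
};

/* Response state with the stream embedded, as in the patched struct. */
struct toy_compoundres {
	struct toy_stream xdr;
	uint32_t opcnt;
};

/* Bytes still available for encoding, in the style of slack_bytes. */
static ptrdiff_t toy_slack_bytes(const struct toy_compoundres *resp)
{
	assert(resp->xdr.p <= resp->xdr.end);
	return (const char *)resp->xdr.end - (const char *)resp->xdr.p;
}

/*
 * Reserve nwords words and advance the stream. Returns the old
 * position, or NULL if there is not enough room. (Returning NULL is a
 * choice made for this sketch; the RESERVE_SPACE macro in this patch
 * uses BUG_ON instead.)
 */
static uint32_t *toy_reserve(struct toy_compoundres *resp, size_t nwords)
{
	uint32_t *p = resp->xdr.p;

	if ((size_t)(resp->xdr.end - p) < nwords)
		return NULL;
	resp->xdr.p = p + nwords;
	return p;
}
```

Because only the spelling of the field accesses changes, behavior is the
same before and after, which matches the "mechanical transformation"
description.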
fs/nfsd/nfs4proc.c | 12 +++++-----
fs/nfsd/nfs4state.c | 8 +++----
fs/nfsd/nfs4xdr.c | 65 ++++++++++++++++++++++++++-------------------------
fs/nfsd/xdr4.h | 4 +---
4 files changed, 44 insertions(+), 45 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 2c1ee70..46370f5 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1268,13 +1268,13 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
u32 plen = 0;
__be32 status;

- resp->xbuf = &rqstp->rq_res;
- resp->p = rqstp->rq_res.head[0].iov_base +
+ resp->xdr.buf = &rqstp->rq_res;
+ resp->xdr.p = rqstp->rq_res.head[0].iov_base +
rqstp->rq_res.head[0].iov_len;
- resp->tagp = resp->p;
+ resp->tagp = resp->xdr.p;
/* reserve space for: taglen, tag, and opcnt */
- resp->p += 2 + XDR_QUADLEN(args->taglen);
- resp->end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
+ resp->xdr.p += 2 + XDR_QUADLEN(args->taglen);
+ resp->xdr.end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
resp->taglen = args->taglen;
resp->tag = args->tag;
resp->opcnt = 0;
@@ -1326,7 +1326,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
* failed response to the next operation. If we don't
* have enough room, fail with ERR_RESOURCE.
*/
- slack_bytes = (char *)resp->end - (char *)resp->p;
+ slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;
if (slack_bytes < COMPOUND_SLACK_SPACE
+ COMPOUND_ERR_SLACK_SPACE) {
BUG_ON(slack_bytes < COMPOUND_ERR_SLACK_SPACE);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fac2683..05cc3eb 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1569,10 +1569,10 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
slot->sl_datalen = 0;
return;
}
- slot->sl_datalen = (char *)resp->p - (char *)resp->cstate.datap;
+ slot->sl_datalen = (char *)resp->xdr.p - (char *)resp->cstate.datap;
base = (char *)resp->cstate.datap -
- (char *)resp->xbuf->head[0].iov_base;
- if (read_bytes_from_xdr_buf(resp->xbuf, base, slot->sl_data,
+ (char *)resp->xdr.buf->head[0].iov_base;
+ if (read_bytes_from_xdr_buf(resp->xdr.buf, base, slot->sl_data,
slot->sl_datalen))
WARN("%s: sessions DRC could not cache compound\n", __func__);
return;
@@ -1626,7 +1626,7 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
memcpy(resp->cstate.datap, slot->sl_data, slot->sl_datalen);

resp->opcnt = slot->sl_opcnt;
- resp->p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
+ resp->xdr.p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
status = slot->sl_status;

return status;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 18881f3..ef65ffc 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1747,10 +1747,10 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
}

#define RESERVE_SPACE(nbytes) do { \
- p = resp->p; \
- BUG_ON(p + XDR_QUADLEN(nbytes) > resp->end); \
+ p = resp->xdr.p; \
+ BUG_ON(p + XDR_QUADLEN(nbytes) > resp->xdr.end); \
} while (0)
-#define ADJUST_ARGS() resp->p = p
+#define ADJUST_ARGS() resp->xdr.p = p

/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
@@ -2751,9 +2751,9 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
if (nfserr)
return nfserr;

- buflen = resp->end - resp->p - (COMPOUND_ERR_SLACK_SPACE >> 2);
+ buflen = resp->xdr.end - resp->xdr.p - (COMPOUND_ERR_SLACK_SPACE >> 2);
nfserr = nfsd4_encode_fattr(fhp, fhp->fh_export, fhp->fh_dentry,
- &resp->p, buflen, getattr->ga_bmval,
+ &resp->xdr.p, buflen, getattr->ga_bmval,
resp->rqstp, 0);
return nfserr;
}
@@ -2953,7 +2953,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

if (nfserr)
return nfserr;
- if (resp->xbuf->page_len)
+ if (resp->xdr.buf->page_len)
return nfserr_resource;

RESERVE_SPACE(8); /* eof flag and byte count */
@@ -2991,18 +2991,18 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(eof);
WRITE32(maxcount);
ADJUST_ARGS();
- resp->xbuf->head[0].iov_len = (char*)p
- - (char*)resp->xbuf->head[0].iov_base;
- resp->xbuf->page_len = maxcount;
+ resp->xdr.buf->head[0].iov_len = (char*)p
+ - (char*)resp->xdr.buf->head[0].iov_base;
+ resp->xdr.buf->page_len = maxcount;

/* Use rest of head for padding and remaining ops: */
- resp->xbuf->tail[0].iov_base = p;
- resp->xbuf->tail[0].iov_len = 0;
+ resp->xdr.buf->tail[0].iov_base = p;
+ resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
RESERVE_SPACE(4);
WRITE32(0);
- resp->xbuf->tail[0].iov_base += maxcount&3;
- resp->xbuf->tail[0].iov_len = 4 - (maxcount&3);
+ resp->xdr.buf->tail[0].iov_base += maxcount&3;
+ resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
ADJUST_ARGS();
}
return 0;
@@ -3017,7 +3017,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd

if (nfserr)
return nfserr;
- if (resp->xbuf->page_len)
+ if (resp->xdr.buf->page_len)
return nfserr_resource;
if (!*resp->rqstp->rq_next_page)
return nfserr_resource;
@@ -3041,18 +3041,18 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd

WRITE32(maxcount);
ADJUST_ARGS();
- resp->xbuf->head[0].iov_len = (char*)p
- - (char*)resp->xbuf->head[0].iov_base;
- resp->xbuf->page_len = maxcount;
+ resp->xdr.buf->head[0].iov_len = (char*)p
+ - (char*)resp->xdr.buf->head[0].iov_base;
+ resp->xdr.buf->page_len = maxcount;

/* Use rest of head for padding and remaining ops: */
- resp->xbuf->tail[0].iov_base = p;
- resp->xbuf->tail[0].iov_len = 0;
+ resp->xdr.buf->tail[0].iov_base = p;
+ resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
RESERVE_SPACE(4);
WRITE32(0);
- resp->xbuf->tail[0].iov_base += maxcount&3;
- resp->xbuf->tail[0].iov_len = 4 - (maxcount&3);
+ resp->xdr.buf->tail[0].iov_base += maxcount&3;
+ resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
ADJUST_ARGS();
}
return 0;
@@ -3068,7 +3068,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

if (nfserr)
return nfserr;
- if (resp->xbuf->page_len)
+ if (resp->xdr.buf->page_len)
return nfserr_resource;
if (!*resp->rqstp->rq_next_page)
return nfserr_resource;
@@ -3080,7 +3080,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
WRITE32(0);
WRITE32(0);
ADJUST_ARGS();
- resp->xbuf->head[0].iov_len = ((char*)resp->p) - (char*)resp->xbuf->head[0].iov_base;
+ resp->xdr.buf->head[0].iov_len = ((char*)resp->xdr.p)
+ - (char*)resp->xdr.buf->head[0].iov_base;
tailbase = p;

maxcount = PAGE_SIZE;
@@ -3121,14 +3122,14 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
p = readdir->buffer;
*p++ = 0; /* no more entries */
*p++ = htonl(readdir->common.err == nfserr_eof);
- resp->xbuf->page_len = ((char*)p) -
+ resp->xdr.buf->page_len = ((char*)p) -
(char*)page_address(*(resp->rqstp->rq_next_page-1));

/* Use rest of head for padding and remaining ops: */
- resp->xbuf->tail[0].iov_base = tailbase;
- resp->xbuf->tail[0].iov_len = 0;
- resp->p = resp->xbuf->tail[0].iov_base;
- resp->end = resp->p + (PAGE_SIZE - resp->xbuf->head[0].iov_len)/4;
+ resp->xdr.buf->tail[0].iov_base = tailbase;
+ resp->xdr.buf->tail[0].iov_len = 0;
+ resp->xdr.p = resp->xdr.buf->tail[0].iov_base;
+ resp->xdr.end = resp->xdr.p + (PAGE_SIZE - resp->xdr.buf->head[0].iov_len)/4;

return 0;
err_no_verf:
@@ -3587,10 +3588,10 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
session = resp->cstate.session;

if (xb->page_len == 0) {
- length = (char *)resp->p - (char *)xb->head[0].iov_base + pad;
+ length = (char *)resp->xdr.p - (char *)xb->head[0].iov_base + pad;
} else {
if (xb->tail[0].iov_base && xb->tail[0].iov_len > 0)
- tlen = (char *)resp->p - (char *)xb->tail[0].iov_base;
+ tlen = (char *)resp->xdr.p - (char *)xb->tail[0].iov_base;

length = xb->head[0].iov_len + xb->page_len + tlen + pad;
}
@@ -3629,7 +3630,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
op->status = nfsd4_check_resp_size(resp, 0);
if (so) {
so->so_replay.rp_status = op->status;
- so->so_replay.rp_buflen = (char *)resp->p - (char *)(statp+1);
+ so->so_replay.rp_buflen = (char *)resp->xdr.p - (char *)(statp+1);
memcpy(so->so_replay.rp_buf, statp+1, so->so_replay.rp_buflen);
}
status:
@@ -3731,7 +3732,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
iov = &rqstp->rq_res.tail[0];
else
iov = &rqstp->rq_res.head[0];
- iov->iov_len = ((char*)resp->p) - (char*)iov->iov_base;
+ iov->iov_len = ((char*)resp->xdr.p) - (char*)iov->iov_base;
BUG_ON(iov->iov_len > PAGE_SIZE);
if (nfsd4_has_session(cs)) {
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 5ea7df3..6884d70 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -506,9 +506,7 @@ struct nfsd4_compoundargs {

struct nfsd4_compoundres {
/* scratch variables for XDR encode */
- __be32 * p;
- __be32 * end;
- struct xdr_buf * xbuf;
+ struct xdr_stream xdr;
struct svc_rqst * rqstp;

u32 taglen;
--
1.7.9.5


2014-05-12 08:20:59

by Christoph Hellwig

Subject: Re: nfsd4 xdr encoding fixes v2

This series seems to cause hangs during xfstests against a server on the
same VM. The trace is fairly similar every time the hang happens, but the
point at which it happens differs:

[ 3120.186527] INFO: task fill:26222 blocked for more than 120 seconds.
[ 3120.187607] Not tainted 3.15.0-rc1+ #22
[ 3120.188424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3120.189765] fill D ffff88007a5b3c20 0 26222 26130 0x00000002
[ 3120.191158] ffff88007a5b3b78 0000000000000046 ffff880079284f10 0000000000013dc0
[ 3120.192666] ffff88007a5b3fd8 0000000000013dc0 ffff88007350cf10 ffff880079284f10
[ 3120.195303] 0000000000000000 0000000000000002 0000000000000001 0000000000000000
[ 3120.197980] Call Trace:
[ 3120.198849] [<ffffffff8112ff2d>] ? __delayacct_blkio_start+0x1d/0x20
[ 3120.200791] [<ffffffff810ead35>] ? prepare_to_wait+0x25/0x90
[ 3120.202438] [<ffffffff811114f5>] ? ktime_get_ts+0x145/0x180
[ 3120.204033] [<ffffffff8115ef50>] ? __lock_page+0x70/0x70
[ 3120.205598] [<ffffffff8107c83f>] ? kvm_clock_read+0x1f/0x30
[ 3120.207236] [<ffffffff8107c859>] ? kvm_clock_get_cycles+0x9/0x10
[ 3120.209006] [<ffffffff81111464>] ? ktime_get_ts+0xb4/0x180
[ 3120.210828] [<ffffffff8112ff2d>] ? __delayacct_blkio_start+0x1d/0x20
[ 3120.212645] [<ffffffff8115ef50>] ? __lock_page+0x70/0x70
[ 3120.214290] [<ffffffff81ce5294>] schedule+0x24/0x70
[ 3120.216915] [<ffffffff81ce536a>] io_schedule+0x8a/0xd0
[ 3120.218484] [<ffffffff8115ef59>] sleep_on_page+0x9/0x10
[ 3120.219979] [<ffffffff81ce5a8a>] __wait_on_bit+0x5a/0x90
[ 3120.221543] [<ffffffff8115e9cf>] ? find_get_pages_tag+0x1f/0x190
[ 3120.223310] [<ffffffff8115f438>] wait_on_page_bit+0x78/0x80
[ 3120.224934] [<ffffffff810eb240>] ? wake_atomic_t_function+0x30/0x30
[ 3120.226755] [<ffffffff8115f5a2>] filemap_fdatawait_range+0x102/0x190
[ 3120.228615] [<ffffffff8116033a>] filemap_write_and_wait_range+0x4a/0x80
[ 3120.230640] [<ffffffff8135c00f>] nfs4_file_fsync+0x5f/0xb0
[ 3120.232230] [<ffffffff811d70c1>] vfs_fsync+0x21/0x30
[ 3120.233716] [<ffffffff8132a1fe>] nfs_file_flush+0x6e/0x90
[ 3120.235261] [<ffffffff811a4ac5>] filp_close+0x35/0x80
[ 3120.236758] [<ffffffff811c4844>] put_files_struct+0x94/0xe0
[ 3120.238361] [<ffffffff811c494d>] exit_files+0x4d/0x60
[ 3120.239863] [<ffffffff810ad947>] do_exit+0x297/0xa00
[ 3120.241336] [<ffffffff811a91b8>] ? __sb_end_write+0x78/0x80
[ 3120.242925] [<ffffffff81cea158>] ? retint_swapgs+0x13/0x1b
[ 3120.244541] [<ffffffff810ae1d7>] do_group_exit+0x47/0xc0
[ 3120.246129] [<ffffffff810ae262>] SyS_exit_group+0x12/0x20
[ 3120.247960] [<ffffffff81cf24f9>] system_call_fastpath+0x16/0x1b
[ 3120.249226] no locks held by fill/26222.


2014-05-11 20:53:04

by J. Bruce Fields

Subject: [PATCH 32/43] nfsd4: don't treat readlink like a zero-copy operation

From: "J. Bruce Fields" <[email protected]>

There's no advantage to this zero-copy-style readlink encoding, and it
unnecessarily limits the kinds of compounds we can handle. (In practice
I can't see why a client would want e.g. multiple readlink calls in a
compound, but it's probably a spec violation for us not to handle it.)

Signed-off-by: J. Bruce Fields <[email protected]>
---
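
Note for readers: the new encoding writes the link text inline as an
ordinary XDR opaque, that is, a 4-byte big-endian length, the bytes, and
zero padding up to a 4-byte boundary. The userspace sketch below shows
that layout under stated assumptions. toy_pad_bytes() and
toy_encode_opaque() are hypothetical helpers written for this note, not
kernel functions, and they write into a flat array rather than an
xdr_buf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of zero bytes needed to round len up to a multiple of 4. */
static size_t toy_pad_bytes(size_t len)
{
	return (len & 3) ? 4 - (len & 3) : 0;
}

/*
 * Encode a variable-length opaque into out: a 4-byte big-endian
 * length, the data, then zero padding. Returns the number of bytes
 * written, or 0 if out is too small (a valid encoding is never
 * shorter than 4 bytes, so 0 is unambiguous).
 */
static size_t toy_encode_opaque(unsigned char *out, size_t outlen,
				const void *data, size_t len)
{
	size_t pad = toy_pad_bytes(len);

	assert(len <= UINT32_MAX);
	if (outlen < 4 || len > outlen - 4 || pad > outlen - 4 - len)
		return 0;

	out[0] = (unsigned char)(len >> 24);
	out[1] = (unsigned char)(len >> 16);
	out[2] = (unsigned char)(len >> 8);
	out[3] = (unsigned char)len;
	memcpy(out + 4, data, len);
	memset(out + 4 + len, 0, pad);
	return 4 + len + pad;
}
```

The patch reaches the same layout in a different order: it reserves a
full page, lets nfsd_readlink() fill it, back-patches the length word,
truncates to the real length, and then writes the 4 - (maxcount&3)
padding bytes.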
fs/nfsd/nfs4xdr.c | 44 +++++++++++++-------------------------------
1 file changed, 13 insertions(+), 31 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 58717de..4dba311 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3154,8 +3154,9 @@ static __be32
nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readlink *readlink)
{
int maxcount;
+ __be32 wire_count;
+ int zero = 0;
struct xdr_stream *xdr = &resp->xdr;
- char *page;
int length_offset = xdr->buf->len;
__be32 *p;

@@ -3165,51 +3166,32 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
-
- if (resp->xdr.buf->page_len)
- return nfserr_resource;
- if (!*resp->rqstp->rq_next_page)
- return nfserr_resource;
-
- page = page_address(*(resp->rqstp->rq_next_page++));
-
maxcount = PAGE_SIZE;

- if (xdr->end - xdr->p < 1)
+ p = xdr_reserve_space(xdr, maxcount);
+ if (!p)
return nfserr_resource;
-
/*
* XXX: By default, the ->readlink() VFS op will truncate symlinks
* if they would overflow the buffer. Is this kosher in NFSv4? If
* not, one easy fix is: if ->readlink() precisely fills the buffer,
* assume that truncation occurred, and return NFS4ERR_RESOURCE.
*/
- nfserr = nfsd_readlink(readlink->rl_rqstp, readlink->rl_fhp, page, &maxcount);
+ nfserr = nfsd_readlink(readlink->rl_rqstp, readlink->rl_fhp, (void *)p, &maxcount);
+
if (nfserr == nfserr_isdir)
- nfserr = nfserr_inval;
+ nfserr= nfserr_inval;
if (nfserr) {
xdr_truncate_encode(xdr, length_offset);
return nfserr;
}

- WRITE32(maxcount);
- resp->xdr.buf->head[0].iov_len = (char*)p
- - (char*)resp->xdr.buf->head[0].iov_base;
- resp->xdr.buf->page_len = maxcount;
- xdr->buf->len += maxcount;
- xdr->page_ptr += 1;
- xdr->buf->buflen -= PAGE_SIZE;
- xdr->iov = xdr->buf->tail;
-
- /* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = p;
- resp->xdr.buf->tail[0].iov_len = 0;
- if (maxcount&3) {
- p = xdr_reserve_space(xdr, 4);
- WRITE32(0);
- resp->xdr.buf->tail[0].iov_base += maxcount&3;
- resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- }
+ wire_count = htonl(maxcount);
+ write_bytes_to_xdr_buf(xdr->buf, length_offset, &wire_count, 4);
+ xdr_truncate_encode(xdr, length_offset + 4 + maxcount);
+ if (maxcount & 3)
+ write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount,
+ &zero, 4 - (maxcount&3));
return 0;
}

--
1.7.9.5


2014-05-11 20:53:07

by J. Bruce Fields

Subject: [PATCH 37/43] nfsd4: separate splice and readv cases

From: "J. Bruce Fields" <[email protected]>

The splice and readv cases are actually quite different--for example the
former case ignores the array of vectors we build up for the latter.

It is probably clearer to separate the two cases entirely.

There's some code duplication between the split out encoders, but this
is only temporary and will be fixed by a later patch.

Signed-off-by: J. Bruce Fields <[email protected]>
---
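
Note for readers: the shape of the split can be sketched in userspace as
below. Everything prefixed toy_ is hypothetical and exists only for this
note. The sketch assumes three things visible in the diff: the path is
chosen by whether the file supports splice_read and the request allows
splicing, the splice path advances the page pointer by the number of
pages the data spans, and both paths funnel their result through one
shared "finish" step.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096

enum toy_read_path { TOY_SPLICE, TOY_READV };

/* Hypothetical stand-ins for the two conditions the patch tests. */
struct toy_file { int has_splice_read; };
struct toy_rqst { int splice_ok; };

/* Pick the encoder: splice only when both sides allow it. */
static enum toy_read_path toy_pick_path(const struct toy_file *f,
					const struct toy_rqst *rq)
{
	return (f->has_splice_read && rq->splice_ok) ? TOY_SPLICE : TOY_READV;
}

/* Pages spanned by count bytes, rounding up. */
static size_t toy_pages_spanned(size_t count)
{
	return (count + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/*
 * Shared tail of both read paths: a non-negative result is the byte
 * count actually read; a negative one is passed back as the error and
 * leaves *count untouched.
 */
static int toy_finish_read(unsigned long *count, long host_err)
{
	assert(count != NULL);
	if (host_err >= 0) {
		*count = (unsigned long)host_err;
		return 0;
	}
	return (int)host_err;
}
```

Keeping the common bookkeeping in one helper is what lets the two
encoders diverge without duplicating the result handling.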
fs/nfsd/nfs4xdr.c | 163 ++++++++++++++++++++++++++++++++++++++++-------------
fs/nfsd/vfs.c | 121 ++++++++++++++++++++++++---------------
fs/nfsd/vfs.h | 8 +++
3 files changed, 207 insertions(+), 85 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index f69906d..19539fc 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3069,36 +3069,77 @@ nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struc
return nfserr;
}

-static __be32
-nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
- struct nfsd4_read *read)
+static __be32 nfsd4_encode_splice_read(
+ struct nfsd4_compoundres *resp,
+ struct nfsd4_read *read,
+ struct file *file, unsigned long maxcount)
{
- u32 eof;
- int v;
- struct page *page;
- unsigned long maxcount;
struct xdr_stream *xdr = &resp->xdr;
- int starting_len = xdr->buf->len;
- long len;
+ u32 eof;
+ int starting_len = xdr->buf->len - 8;
+ __be32 nfserr;
+ __be32 tmp;
__be32 *p;

- if (nfserr)
+ /*
+ * Don't inline pages unless we know there's room for eof,
+ * count, and possible padding:
+ */
+ if (xdr->end - xdr->p < 3)
+ return nfserr_resource;
+
+ nfserr = nfsd_splice_read(read->rd_rqstp, file,
+ read->rd_offset, &maxcount);
+ if (nfserr) {
+ /*
+ * nfsd_splice_actor may have already messed with the
+ * page length; reset it so as not to confuse
+ * xdr_truncate_encode:
+ */
+ xdr->buf->page_len = 0;
return nfserr;
+ }

- p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
- if (!p)
- return nfserr_resource;
+ eof = (read->rd_offset + maxcount >=
+ read->rd_fhp->fh_dentry->d_inode->i_size);

- /* Make sure there will be room for padding if needed: */
- if (xdr->end - xdr->p < 1)
- return nfserr_resource;
+ tmp = htonl(eof);
+ write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4);
+ tmp = htonl(maxcount);
+ write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);

- if (resp->xdr.buf->page_len)
- return nfserr_resource;
+ resp->xdr.buf->page_len = maxcount;
+ xdr->buf->len += maxcount;
+ xdr->page_ptr += (maxcount + PAGE_SIZE - 1) / PAGE_SIZE;
+ xdr->buf->buflen = maxcount + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
+ xdr->iov = xdr->buf->tail;

- maxcount = svc_max_payload(resp->rqstp);
- if (maxcount > read->rd_length)
- maxcount = read->rd_length;
+ /* Use rest of head for padding and remaining ops: */
+ resp->xdr.buf->tail[0].iov_base = xdr->p;
+ resp->xdr.buf->tail[0].iov_len = 0;
+ if (maxcount&3) {
+ p = xdr_reserve_space(xdr, 4);
+ WRITE32(0);
+ resp->xdr.buf->tail[0].iov_base += maxcount&3;
+ resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
+ xdr->buf->len -= (maxcount&3);
+ }
+ return 0;
+}
+
+static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
+ struct nfsd4_read *read,
+ struct file *file, unsigned long maxcount)
+{
+ struct xdr_stream *xdr = &resp->xdr;
+ u32 eof;
+ int v;
+ struct page *page;
+ int starting_len = xdr->buf->len - 8;
+ long len;
+ __be32 nfserr;
+ __be32 tmp;
+ __be32 *p;

len = maxcount;
v = 0;
@@ -3119,27 +3160,19 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
}
read->rd_vlen = v;

- nfserr = nfsd_read_file(read->rd_rqstp, read->rd_fhp, read->rd_filp,
- read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
- &maxcount);
-
- if (nfserr) {
- /*
- * nfsd_splice_actor may have already messed with the
- * page length; reset it so as not to confuse
- * xdr_truncate_encode:
- */
- xdr->buf->page_len = 0;
- xdr_truncate_encode(xdr, starting_len);
+ nfserr = nfsd_readv(file, read->rd_offset, resp->rqstp->rq_vec,
+ read->rd_vlen, &maxcount);
+ if (nfserr)
return nfserr;
- }
+
eof = (read->rd_offset + maxcount >=
read->rd_fhp->fh_dentry->d_inode->i_size);

- WRITE32(eof);
- WRITE32(maxcount);
- WARN_ON_ONCE(resp->xdr.buf->head[0].iov_len != (char *)p
- - (char *)resp->xdr.buf->head[0].iov_base);
+ tmp = htonl(eof);
+ write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4);
+ tmp = htonl(maxcount);
+ write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+
resp->xdr.buf->page_len = maxcount;
xdr->buf->len += maxcount;
xdr->page_ptr += v;
@@ -3147,7 +3180,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = p;
+ resp->xdr.buf->tail[0].iov_base = xdr->p;
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
p = xdr_reserve_space(xdr, 4);
@@ -3157,6 +3190,58 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
xdr->buf->len -= (maxcount&3);
}
return 0;
+
+}
+
+static __be32
+nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_read *read)
+{
+ unsigned long maxcount;
+ struct xdr_stream *xdr = &resp->xdr;
+ struct file *file = read->rd_filp;
+ int starting_len = xdr->buf->len;
+ struct raparms *ra;
+ __be32 *p;
+ __be32 err;
+
+ if (nfserr)
+ return nfserr;
+
+ p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
+ if (!p) {
+ WARN_ON_ONCE(1);
+ return nfserr_resource;
+ }
+
+ if (resp->xdr.buf->page_len)
+ return nfserr_resource;
+
+ xdr_commit_encode(xdr);
+
+ maxcount = svc_max_payload(resp->rqstp);
+ if (maxcount > read->rd_length)
+ maxcount = read->rd_length;
+
+ if (!read->rd_filp) {
+ err = nfsd_get_tmp_read_open(resp->rqstp, read->rd_fhp,
+ &file, &ra);
+ if (err)
+ goto err_truncate;
+ }
+
+ if (file->f_op->splice_read && resp->rqstp->rq_splice_ok)
+ err = nfsd4_encode_splice_read(resp, read, file, maxcount);
+ else
+ err = nfsd4_encode_readv(resp, read, file, maxcount);
+
+ if (!read->rd_filp)
+ nfsd_put_tmp_read_open(file, ra);
+
+err_truncate:
+ if (err)
+ xdr_truncate_encode(xdr, starting_len);
+ return err;
}

static __be32
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index cfd83f6..45292e4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -820,41 +820,54 @@ static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
return __splice_from_pipe(pipe, sd, nfsd_splice_actor);
}

-static __be32
-nfsd_vfs_read(struct svc_rqst *rqstp, struct file *file,
- loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+__be32 nfsd_finish_read(struct file *file, unsigned long *count, int host_err)
{
- mm_segment_t oldfs;
- __be32 err;
- int host_err;
-
- err = nfserr_perm;
-
- if (file->f_op->splice_read && rqstp->rq_splice_ok) {
- struct splice_desc sd = {
- .len = 0,
- .total_len = *count,
- .pos = offset,
- .u.data = rqstp,
- };
-
- rqstp->rq_next_page = rqstp->rq_respages + 1;
- host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
- } else {
- oldfs = get_fs();
- set_fs(KERNEL_DS);
- host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
- set_fs(oldfs);
- }
-
if (host_err >= 0) {
nfsdstats.io_read += host_err;
*count = host_err;
- err = 0;
fsnotify_access(file);
+ return 0;
} else
- err = nfserrno(host_err);
- return err;
+ return nfserrno(host_err);
+}
+
+int nfsd_splice_read(struct svc_rqst *rqstp,
+ struct file *file, loff_t offset, unsigned long *count)
+{
+ struct splice_desc sd = {
+ .len = 0,
+ .total_len = *count,
+ .pos = offset,
+ .u.data = rqstp,
+ };
+ int host_err;
+
+ rqstp->rq_next_page = rqstp->rq_respages + 1;
+ host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
+ return nfsd_finish_read(file, count, host_err);
+}
+
+int nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
+ unsigned long *count)
+{
+ mm_segment_t oldfs;
+ int host_err;
+
+ oldfs = get_fs();
+ set_fs(KERNEL_DS);
+ host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
+ set_fs(oldfs);
+ return nfsd_finish_read(file, count, host_err);
+}
+
+static __be32
+nfsd_vfs_read(struct svc_rqst *rqstp, struct file *file,
+ loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+{
+ if (file->f_op->splice_read && rqstp->rq_splice_ok)
+ return nfsd_splice_read(rqstp, file, offset, count);
+ else
+ return nfsd_readv(file, offset, vec, vlen, count);
}

static void kill_suid(struct dentry *dentry)
@@ -962,33 +975,28 @@ out_nfserr:
return err;
}

-/*
- * Read data from a file. count must contain the requested read count
- * on entry. On return, *count contains the number of bytes actually read.
- * N.B. After this call fhp needs an fh_put
- */
-__be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
- loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+__be32 nfsd_get_tmp_read_open(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ struct file **file, struct raparms **ra)
{
- struct file *file;
struct inode *inode;
- struct raparms *ra;
__be32 err;

- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
+ err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, file);
if (err)
return err;

- inode = file_inode(file);
+ inode = file_inode(*file);

/* Get readahead parameters */
- ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino);
-
- if (ra && ra->p_set)
- file->f_ra = ra->p_ra;
+ *ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino);

- err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
+ if (*ra && (*ra)->p_set)
+ (*file)->f_ra = (*ra)->p_ra;
+ return nfs_ok;
+}

+void nfsd_put_tmp_read_open(struct file *file, struct raparms *ra)
+{
/* Write back readahead params */
if (ra) {
struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
@@ -998,8 +1006,29 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
ra->p_count--;
spin_unlock(&rab->pb_lock);
}
-
nfsd_close(file);
+}
+
+/*
+ * Read data from a file. count must contain the requested read count
+ * on entry. On return, *count contains the number of bytes actually read.
+ * N.B. After this call fhp needs an fh_put
+ */
+__be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+{
+ struct file *file;
+ struct raparms *ra;
+ __be32 err;
+
+ err = nfsd_get_tmp_read_open(rqstp, fhp, &file, &ra);
+ if (err)
+ return err;
+
+ err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
+
+ nfsd_put_tmp_read_open(file, ra);
+
return err;
}

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fbe90bd..7441e96 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -70,6 +70,14 @@ __be32 nfsd_commit(struct svc_rqst *, struct svc_fh *,
__be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
int, struct file **);
void nfsd_close(struct file *);
+struct raparms;
+__be32 nfsd_get_tmp_read_open(struct svc_rqst *, struct svc_fh *,
+ struct file **, struct raparms **);
+void nfsd_put_tmp_read_open(struct file *, struct raparms *);
+int nfsd_splice_read(struct svc_rqst *,
+ struct file *, loff_t, unsigned long *);
+int nfsd_readv(struct file *, loff_t, struct kvec *, int,
+ unsigned long *);
__be32 nfsd_read(struct svc_rqst *, struct svc_fh *,
loff_t, struct kvec *, int, unsigned long *);
__be32 nfsd_read_file(struct svc_rqst *, struct svc_fh *, struct file *,
--
1.7.9.5


2014-05-11 20:53:05

by J. Bruce Fields

Subject: [PATCH 31/43] nfsd4: enforce rd_dircount

From: "J. Bruce Fields" <[email protected]>

As long as we're here, let's enforce the protocol's limit on the number
of directory entries to return in a readdir.

I don't think anyone's ever noticed our lack of enforcement, but maybe
there's more of a chance they will now that we allow larger readdirs.

Signed-off-by: J. Bruce Fields <[email protected]>
---
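
Note for readers: the accounting added by this patch can be sketched in
userspace as below. struct toy_readdir and toy_emit_entry() are
hypothetical names for this note only. The sketch follows the patch in
treating rd_dircount as a count of entries; RFC 5661 describes dircount
as a hint in bytes of directory information, so that reading should be
taken as this patch's interpretation rather than the protocol
definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-readdir encoding state. */
struct toy_readdir {
	unsigned int maxcount;	/* remaining byte budget */
	unsigned int dircount;	/* remaining entry budget, as the patch counts it */
	unsigned int emitted;	/* entries accepted so far */
};

/*
 * Account for one directory entry of entry_bytes bytes. Returns 0 if
 * the entry is accepted, -1 if either budget is exhausted. The order
 * of the checks follows the hunk in nfsd4_encode_dirent(): the byte
 * budget is charged first, then the entry budget is tested.
 */
static int toy_emit_entry(struct toy_readdir *cd, unsigned int entry_bytes)
{
	assert(cd != NULL);
	if (entry_bytes > cd->maxcount)
		return -1;
	cd->maxcount -= entry_bytes;
	if (!cd->dircount)
		return -1;
	cd->dircount--;
	cd->emitted++;
	return 0;
}
```

With a byte budget of 100 and an entry budget of 2, two 40-byte entries
are accepted and a third is refused even though 20 bytes remain.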
fs/nfsd/nfs4xdr.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 731587c..58717de 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1033,7 +1033,7 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read
READ_BUF(24);
READ64(readdir->rd_cookie);
COPYMEM(readdir->rd_verf.data, sizeof(readdir->rd_verf.data));
- READ32(readdir->rd_dircount); /* just in case you needed a useless field... */
+ READ32(readdir->rd_dircount);
READ32(readdir->rd_maxcount);
if ((status = nfsd4_decode_bitmap(argp, readdir->rd_bmval)))
goto out;
@@ -2720,6 +2720,9 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
if (entry_bytes > cd->rd_maxcount)
goto fail;
cd->rd_maxcount -= entry_bytes;
+ if (!cd->rd_dircount)
+ goto fail;
+ cd->rd_dircount--;
cd->cookie_offset = cookie_offset;
skip_entry:
cd->common.err = nfs_ok;
--
1.7.9.5


2014-05-13 21:33:27

by J. Bruce Fields

Subject: Re: nfsd4 xdr encoding fixes v2

On Tue, May 13, 2014 at 05:18:41PM -0400, bfields wrote:
> On Tue, May 13, 2014 at 10:48:26AM -0400, J. Bruce Fields wrote:
> > On Tue, May 13, 2014 at 04:09:45AM -0700, Christoph Hellwig wrote:
> > > On Mon, May 12, 2014 at 09:11:28AM -0700, Christoph Hellwig wrote:
> > > > On Mon, May 12, 2014 at 12:07:41PM -0400, J. Bruce Fields wrote:
> > > > > On Mon, May 12, 2014 at 01:20:59AM -0700, Christoph Hellwig wrote:
> > > > > > This series seems to cause hangs during xfstests against a server on the
> > > > > > same VM. The trace is fairly similar every time the hang happens, but the
> > > > > > point at which it happens differs:
> > > > >
> > > > > Ouch, OK, and you're sure it starts with this series?
> > > > >
> > > > > I guess I should try to replicate it here. Might take a couple days.
> > >
> > > Seems like "nfsd4: allow exotic read compounds" is the culprit.
> >
> > OK, it makes sense that the problem would be there. Looking....
>
> I got xfstests set up and can reproduce some problems (a hang, the
> nfs4svc_encode_compoundres WARN, and some allocation failures), though
> not exactly what you reported.
>
> I also notice that commit ("nfsd4: allow exotic read compounds") has a
> lot of extraneous cleanup that I should split out.

In fact, the change to handle multiple reads per compound *should* only
affect the nfsd4_encode_readv() case. Whereas you're probably
exercising only the nfsd4_encode_splice_read() case. So probably some
of the extraneous cleanup in that patch is bogus.

--b.

2014-05-11 20:53:00

by J. Bruce Fields

Subject: [PATCH 20/43] nfsd4: size-checking cleanup

From: "J. Bruce Fields" <[email protected]>

Better variable name, some comments, etc.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 9 ++++++---
fs/nfsd/nfs4xdr.c | 29 +++++++++++++++--------------
2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index d50fc14..18063e0 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1228,7 +1228,6 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate = &resp->cstate;
struct svc_fh *current_fh = &cstate->current_fh;
struct svc_fh *save_fh = &cstate->save_fh;
- u32 plen = 0;
__be32 status;

svcxdr_init_encode(rqstp, resp);
@@ -1298,9 +1297,13 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,

/* If op is non-idempotent */
if (opdesc->op_flags & OP_MODIFIES_SOMETHING) {
- plen = opdesc->op_rsize_bop(rqstp, op);
/*
- * If there's still another operation, make sure
+ * Don't execute this op if we couldn't encode a
+ * succesful reply:
+ */
+ u32 plen = opdesc->op_rsize_bop(rqstp, op);
+ /*
+ * Plus if there's another operation, make sure
* we'll have space to at least encode an error:
*/
if (resp->opcnt < args->opcnt)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index a330dd7..def2ceb 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3722,35 +3722,36 @@ static nfsd4_enc nfsd4_enc_ops[] = {
};

/*
- * Calculate the total amount of memory that the compound response has taken
- * after encoding the current operation with pad.
+ * Calculate whether we still have space to encode repsize bytes.
+ * There are two considerations:
+ * - For NFS versions >=4.1, the size of the reply must stay within
+ * session limits
+ * - For all NFS versions, we must stay within limited preallocated
+ * buffer space.
*
- * pad: if operation is non-idempotent, pad was calculate by op_rsize_bop()
- * which was specified at nfsd4_operation, else pad is zero.
- *
- * Compare this length to the session se_fmaxresp_sz and se_fmaxresp_cached.
- *
- * Our se_fmaxresp_cached will always be a multiple of PAGE_SIZE, and so
- * will be at least a page and will therefore hold the xdr_buf head.
+ * This is called before the operation is processed, so can only provide
+ * an upper estimate. For some nonidempotent operations (such as
+ * getattr), it's not necessarily a problem if that estimate is wrong,
+ * as we can fail it after processing without significant side effects.
*/
-__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
+__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize)
{
struct xdr_buf *buf = &resp->rqstp->rq_res;
struct nfsd4_session *session = resp->cstate.session;
- struct nfsd4_slot *slot = resp->cstate.slot;
int slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;

if (nfsd4_has_session(&resp->cstate)) {
+ struct nfsd4_slot *slot = resp->cstate.slot;

- if (buf->len + pad > session->se_fchannel.maxresp_sz)
+ if (buf->len + respsize > session->se_fchannel.maxresp_sz)
return nfserr_rep_too_big;

if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
- buf->len + pad > session->se_fchannel.maxresp_cached)
+ buf->len + respsize > session->se_fchannel.maxresp_cached)
return nfserr_rep_too_big_to_cache;
}

- if (pad > slack_bytes) {
+ if (respsize > slack_bytes) {
WARN_ON_ONCE(nfsd4_has_session(&resp->cstate));
return nfserr_resource;
}
--
1.7.9.5
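
The two considerations listed in the new comment on nfsd4_check_resp_size() (session limits for 4.1 and later, preallocated buffer space for all versions) can be modelled outside the kernel. The following is a simplified sketch only: the types resp_limits and resp_check and the parameter names are hypothetical, and the real function reads these values from the session, slot, and xdr stream rather than taking them as arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum resp_check {
	RESP_OK,
	RESP_TOO_BIG,		/* would exceed the session's maxresp_sz */
	RESP_TOO_BIG_TO_CACHE,	/* would exceed maxresp_cached */
	RESP_NO_SPACE,		/* would not fit in the preallocated buffer */
};

/* Hypothetical summary of the session state consulted by the check. */
struct resp_limits {
	bool has_session;	/* true for NFSv4.1 and later */
	bool cache_this;	/* client asked for the reply to be cached */
	uint32_t maxresp_sz;
	uint32_t maxresp_cached;
};

enum resp_check check_resp_size(const struct resp_limits *lim,
				uint32_t encoded_len, uint32_t slack_bytes,
				uint32_t respsize)
{
	/* 64-bit sum so that a large estimate cannot wrap around */
	uint64_t total = (uint64_t)encoded_len + respsize;

	assert(lim);
	if (lim->has_session) {
		if (total > lim->maxresp_sz)
			return RESP_TOO_BIG;
		if (lim->cache_this && total > lim->maxresp_cached)
			return RESP_TOO_BIG_TO_CACHE;
	}
	if (respsize > slack_bytes)
		return RESP_NO_SPACE;
	return RESP_OK;
}
```

Here respsize plays the role of the renamed argument: an upper estimate of what the next operation will add, checked before that operation runs.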


2014-05-11 20:53:08

by J. Bruce Fields

Subject: [PATCH 43/43] nfsd4: kill write32, write64

From: "J. Bruce Fields" <[email protected]>

Also switch a couple of other functions from the encode(&p, ...) convention
to the p = encode(p, ...) convention mostly used elsewhere.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 51 +++++++++++++++++++++------------------------------
1 file changed, 21 insertions(+), 30 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index f875bda..2956e66 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1699,39 +1699,30 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_TAIL;
}

-static void write32(__be32 **p, u32 n)
-{
- *(*p)++ = htonl(n);
-}
-
-static void write64(__be32 **p, u64 n)
-{
- write32(p, (n >> 32));
- write32(p, (u32)n);
-}
-
-static void write_change(__be32 **p, struct kstat *stat, struct inode *inode)
+static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode)
{
if (IS_I_VERSION(inode)) {
- write64(p, inode->i_version);
+ p = xdr_encode_hyper(p, inode->i_version);
} else {
- write32(p, stat->ctime.tv_sec);
- write32(p, stat->ctime.tv_nsec);
+ *p++ = cpu_to_be32(stat->ctime.tv_sec);
+ *p++ = cpu_to_be32(stat->ctime.tv_nsec);
}
+ return p;
}

-static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
+static __be32 *encode_cinfo(__be32 *p, struct nfsd4_change_info *c)
{
- write32(p, c->atomic);
+ *p++ = cpu_to_be32(c->atomic);
if (c->change_supported) {
- write64(p, c->before_change);
- write64(p, c->after_change);
+ p = xdr_encode_hyper(p, c->before_change);
+ p = xdr_encode_hyper(p, c->after_change);
} else {
- write32(p, c->before_ctime_sec);
- write32(p, c->before_ctime_nsec);
- write32(p, c->after_ctime_sec);
- write32(p, c->after_ctime_nsec);
+ *p++ = cpu_to_be32(c->before_ctime_sec);
+ *p++ = cpu_to_be32(c->before_ctime_nsec);
+ *p++ = cpu_to_be32(c->after_ctime_sec);
+ *p++ = cpu_to_be32(c->after_ctime_nsec);
}
+ return p;
}

/* Encode as an array of strings the string given with components
@@ -2189,7 +2180,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- write_change(&p, &stat, dentry->d_inode);
+ p = encode_change(p, &stat, dentry->d_inode);
}
if (bmval0 & FATTR4_WORD0_SIZE) {
p = xdr_reserve_space(xdr, 8);
@@ -2815,7 +2806,7 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, 32);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &create->cr_cinfo);
+ p = encode_cinfo(p, &create->cr_cinfo);
*p++ = cpu_to_be32(2);
*p++ = cpu_to_be32(create->cr_bmval[0]);
*p++ = cpu_to_be32(create->cr_bmval[1]);
@@ -2938,7 +2929,7 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li
p = xdr_reserve_space(xdr, 20);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &link->li_cinfo);
+ p = encode_cinfo(p, &link->li_cinfo);
}
return nfserr;
}
@@ -2959,7 +2950,7 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
p = xdr_reserve_space(xdr, 40);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &open->op_cinfo);
+ p = encode_cinfo(p, &open->op_cinfo);
*p++ = cpu_to_be32(open->op_rflags);
*p++ = cpu_to_be32(2);
*p++ = cpu_to_be32(open->op_bmval[0]);
@@ -3378,7 +3369,7 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, 20);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &remove->rm_cinfo);
+ p = encode_cinfo(p, &remove->rm_cinfo);
}
return nfserr;
}
@@ -3393,8 +3384,8 @@ nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, 40);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &rename->rn_sinfo);
- write_cinfo(&p, &rename->rn_tinfo);
+ p = encode_cinfo(p, &rename->rn_sinfo);
+ p = encode_cinfo(p, &rename->rn_tinfo);
}
return nfserr;
}
--
1.7.9.5
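
The "p = encode(p, ...)" convention adopted in this patch has each helper take the current write position and return the position just past what it wrote, so calls chain without a pointer-to-pointer. A small userspace sketch of the same shape follows. It works on bytes rather than __be32 words, and put_be32, put_be64, put_change_info, and struct change_info are names assumed for the example, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Write a 32-bit value in big-endian order; return the next free byte. */
uint8_t *put_be32(uint8_t *p, uint32_t v)
{
	assert(p);
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return p + 4;
}

/* A 64-bit value is the high word followed by the low word. */
uint8_t *put_be64(uint8_t *p, uint64_t v)
{
	p = put_be32(p, (uint32_t)(v >> 32));
	return put_be32(p, (uint32_t)v);
}

/* Hypothetical, simplified change-info record. */
struct change_info {
	uint32_t atomic;
	uint64_t before;
	uint64_t after;
};

/* Composite encoders follow the same rule: take p, return the new p. */
uint8_t *put_change_info(uint8_t *p, const struct change_info *c)
{
	p = put_be32(p, c->atomic);
	p = put_be64(p, c->before);
	return put_be64(p, c->after);
}
```

A caller that has reserved 20 bytes at buf would write "p = put_change_info(buf, &c);" and carry on encoding at p, which mirrors how encode_cinfo() is used in the diff above.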


2014-05-11 20:52:55

by J. Bruce Fields

Subject: [PATCH 06/43] nfsd4: fix encoding of out-of-space replies

From: "J. Bruce Fields" <[email protected]>

If nfsd4_check_resp_size() returns an error then we should really be
truncating the reply here, otherwise we may leave extra garbage at the
end of the rpc reply.

Also add a warning to catch any cases where our reply-size estimates may
be wrong in the case of a non-idempotent operation.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 6cdd660..fb40dd1 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3633,6 +3633,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
{
struct nfs4_stateowner *so = resp->cstate.replay_owner;
__be32 *statp;
+ nfsd4_enc encoder;
__be32 *p;

RESERVE_SPACE(8);
@@ -3644,10 +3645,29 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
goto status;
BUG_ON(op->opnum < 0 || op->opnum >= ARRAY_SIZE(nfsd4_enc_ops) ||
!nfsd4_enc_ops[op->opnum]);
- op->status = nfsd4_enc_ops[op->opnum](resp, op->status, &op->u);
+ encoder = nfsd4_enc_ops[op->opnum];
+ op->status = encoder(resp, op->status, &op->u);
/* nfsd4_check_resp_size guarantees enough room for error status */
if (!op->status)
op->status = nfsd4_check_resp_size(resp, 0);
+ if (op->status == nfserr_resource ||
+ op->status == nfserr_rep_too_big ||
+ op->status == nfserr_rep_too_big_to_cache) {
+ /*
+ * The operation may have already been encoded or
+ * partially encoded. No op returns anything additional
+ * in the case of one of these three errors, so we can
+ * just truncate back to after the status. But it's a
+ * bug if we had to do this on a non-idempotent op:
+ */
+ if (OPDESC(op)->op_flags & OP_MODIFIES_SOMETHING) {
+ printk("unable to encode reply to nonidempotent op"
+ " %d (%s)\n", op->opnum,
+ nfsd4_op_name(op->opnum));
+ WARN_ON_ONCE(1);
+ }
+ resp->xdr.p = statp + 1;
+ }
if (so) {
so->so_replay.rp_status = op->status;
so->so_replay.rp_buflen = (char *)resp->xdr.p - (char *)(statp+1);
--
1.7.9.5
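
The truncation step in this patch can be illustrated with a toy reply buffer: remember where the status word lives, run the encoder, and if it reports an out-of-space error, rewind the write position to just after the status. This is a sketch under simplifying assumptions. The reply struct, the status codes, and the two example encoders are hypothetical, and the encoders do no bounds checking of their own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLY_WORDS 16

enum { ST_OK = 0, ST_RESOURCE = 1 };	/* hypothetical status codes */

struct reply {
	uint32_t words[REPLY_WORDS];
	size_t used;			/* words encoded so far */
};

typedef uint32_t (*op_encoder)(struct reply *r);

/* Example encoder: writes part of its result, then reports no space. */
uint32_t partial_encoder(struct reply *r)
{
	r->words[r->used++] = 0xdeadbeef;
	r->words[r->used++] = 0xdeadbeef;
	return ST_RESOURCE;
}

/* Example encoder that succeeds with one word of result. */
uint32_t ok_encoder(struct reply *r)
{
	r->words[r->used++] = 42;
	return ST_OK;
}

uint32_t encode_operation(struct reply *r, uint32_t opnum, op_encoder enc)
{
	size_t status_idx;
	uint32_t status;

	assert(r && enc && r->used + 2 <= REPLY_WORDS);
	r->words[r->used++] = opnum;
	status_idx = r->used++;		/* status is filled in below */
	status = enc(r);
	if (status == ST_RESOURCE)
		r->used = status_idx + 1;	/* drop all output after the status */
	r->words[status_idx] = status;
	return status;
}
```

Without the rewind, the two words written by partial_encoder would remain in the reply after an error status, which is the "extra garbage" the changelog describes.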


2014-05-11 20:52:55

by J. Bruce Fields

Subject: [PATCH 03/43] nfsd4: move proc_compound xdr encode init to helper

From: "J. Bruce Fields" <[email protected]>

Mechanical transformation with no change of behavior.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 1499aa4..6c049c4 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1251,6 +1251,17 @@ static bool need_wrongsec_check(struct svc_rqst *rqstp)
return !(nextd->op_flags & OP_HANDLES_WRONGSEC);
}

+static void svcxdr_init_encode(struct svc_rqst *rqstp, struct nfsd4_compoundres *resp)
+{
+ struct xdr_stream *xdr = &resp->xdr;
+ struct xdr_buf *buf = &rqstp->rq_res;
+ struct kvec *head = buf->head;
+
+ xdr->buf = buf;
+ xdr->p = head->iov_base + head->iov_len;
+ xdr->end = head->iov_base + PAGE_SIZE;
+}
+
/*
* COMPOUND call.
*/
@@ -1268,13 +1279,10 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
u32 plen = 0;
__be32 status;

- resp->xdr.buf = &rqstp->rq_res;
- resp->xdr.p = rqstp->rq_res.head[0].iov_base +
- rqstp->rq_res.head[0].iov_len;
+ svcxdr_init_encode(rqstp, resp);
resp->tagp = resp->xdr.p;
/* reserve space for: taglen, tag, and opcnt */
resp->xdr.p += 2 + XDR_QUADLEN(args->taglen);
- resp->xdr.end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
resp->taglen = args->taglen;
resp->tag = args->tag;
resp->opcnt = 0;
--
1.7.9.5


2014-05-11 20:53:08

by J. Bruce Fields

Subject: [PATCH 38/43] nfsd4: allow exotic read compounds

From: "J. Bruce Fields" <[email protected]>

---
Documentation/filesystems/nfs/nfs41-server.txt | 2 -
fs/nfsd/nfs4xdr.c | 100 ++++++++++++------------
2 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt
index b930ad0..c49cd7e 100644
--- a/Documentation/filesystems/nfs/nfs41-server.txt
+++ b/Documentation/filesystems/nfs/nfs41-server.txt
@@ -176,7 +176,5 @@ Nonstandard compound limitations:
ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
fail to live up to the promise we made in CREATE_SESSION fore channel
negotiation.
-* No more than one read-like operation allowed per compound; encoding
- replies that cross page boundaries (except for read data) not handled.

See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues.
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 19539fc..91a50a0 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3075,11 +3075,11 @@ static __be32 nfsd4_encode_splice_read(
struct file *file, unsigned long maxcount)
{
struct xdr_stream *xdr = &resp->xdr;
+ struct xdr_buf *buf = xdr->buf;
u32 eof;
- int starting_len = xdr->buf->len - 8;
+ int space_left;
__be32 nfserr;
- __be32 tmp;
- __be32 *p;
+ __be32 *p = xdr->p - 2;

/*
* Don't inline pages unless we know there's room for eof,
@@ -3096,34 +3096,38 @@ static __be32 nfsd4_encode_splice_read(
* page length; reset it so as not to confuse
* xdr_truncate_encode:
*/
- xdr->buf->page_len = 0;
+ buf->page_len = 0;
return nfserr;
}

eof = (read->rd_offset + maxcount >=
read->rd_fhp->fh_dentry->d_inode->i_size);

- tmp = htonl(eof);
- write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4);
- tmp = htonl(maxcount);
- write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+ *(p++) = htonl(eof);
+ *(p++) = htonl(maxcount);

- resp->xdr.buf->page_len = maxcount;
- xdr->buf->len += maxcount;
- xdr->page_ptr += (maxcount + PAGE_SIZE - 1) / PAGE_SIZE;
- xdr->buf->buflen = maxcount + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
- xdr->iov = xdr->buf->tail;
+ buf->page_len = maxcount;
+ buf->len += maxcount;

/* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = xdr->p;
- resp->xdr.buf->tail[0].iov_len = 0;
+ buf->tail[0].iov_base = xdr->p;
+ buf->tail[0].iov_len = 0;
+ xdr->iov = xdr->buf->tail;
if (maxcount&3) {
- p = xdr_reserve_space(xdr, 4);
- WRITE32(0);
- resp->xdr.buf->tail[0].iov_base += maxcount&3;
- resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- xdr->buf->len -= (maxcount&3);
+ int pad = 4 - (maxcount&3);
+
+ *(xdr->p++) = 0;
+
+ buf->tail[0].iov_base += maxcount&3;
+ buf->tail[0].iov_len = pad;
+ buf->len += pad;
}
+
+ space_left = min_t(int, (void *)xdr->end - (void *)xdr->p,
+ buf->buflen - buf->len);
+ buf->buflen = buf->len + space_left;
+ xdr->end = (__be32 *)((void *)xdr->end + space_left);
+
return 0;
}

@@ -3134,27 +3138,34 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
struct xdr_stream *xdr = &resp->xdr;
u32 eof;
int v;
- struct page *page;
int starting_len = xdr->buf->len - 8;
long len;
+ int thislen;
__be32 nfserr;
__be32 tmp;
__be32 *p;
+ u32 zzz = 0;
+ int pad;

len = maxcount;
v = 0;
- while (len) {
- int thislen;

- page = *(resp->rqstp->rq_next_page);
- if (!page) { /* ran out of pages */
- maxcount -= len;
- break;
- }
+ thislen = (void *)xdr->end - (void *)xdr->p;
+ if (len < thislen)
+ thislen = len;
+ p = xdr_reserve_space(xdr, (thislen+3)&~3);
+ WARN_ON_ONCE(!p);
+ resp->rqstp->rq_vec[v].iov_base = p;
+ resp->rqstp->rq_vec[v].iov_len = thislen;
+ v++;
+ len -= thislen;
+
+ while (len) {
thislen = min_t(long, len, PAGE_SIZE);
- resp->rqstp->rq_vec[v].iov_base = page_address(page);
+ p = xdr_reserve_space(xdr, (thislen+3)&~3);
+ WARN_ON_ONCE(!p);
+ resp->rqstp->rq_vec[v].iov_base = p;
resp->rqstp->rq_vec[v].iov_len = thislen;
- resp->rqstp->rq_next_page++;
v++;
len -= thislen;
}
@@ -3164,6 +3175,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
read->rd_vlen, &maxcount);
if (nfserr)
return nfserr;
+ xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));

eof = (read->rd_offset + maxcount >=
read->rd_fhp->fh_dentry->d_inode->i_size);
@@ -3173,22 +3185,9 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
tmp = htonl(maxcount);
write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);

- resp->xdr.buf->page_len = maxcount;
- xdr->buf->len += maxcount;
- xdr->page_ptr += v;
- xdr->buf->buflen = maxcount + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
- xdr->iov = xdr->buf->tail;
-
- /* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = xdr->p;
- resp->xdr.buf->tail[0].iov_len = 0;
- if (maxcount&3) {
- p = xdr_reserve_space(xdr, 4);
- WRITE32(0);
- resp->xdr.buf->tail[0].iov_base += maxcount&3;
- resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- xdr->buf->len -= (maxcount&3);
- }
+ pad = (maxcount&3) ? 4 - (maxcount&3) : 0;
+ write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + maxcount,
+ &zzz, pad);
return 0;

}
@@ -3210,16 +3209,19 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
if (!p) {
- WARN_ON_ONCE(1);
+ WARN_ON_ONCE(resp->rqstp->rq_splice_ok);
return nfserr_resource;
}

- if (resp->xdr.buf->page_len)
+ if (resp->rqstp->rq_splice_ok && resp->xdr.buf->page_len) {
+ WARN_ON_ONCE(1);
return nfserr_resource;
-
+ }
xdr_commit_encode(xdr);

maxcount = svc_max_payload(resp->rqstp);
+ if (maxcount > xdr->buf->buflen - xdr->buf->len)
+ maxcount = xdr->buf->buflen - xdr->buf->len;
if (maxcount > read->rd_length)
maxcount = read->rd_length;

--
1.7.9.5
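
Several expressions in this patch, such as (thislen+3)&~3 and 4 - (maxcount&3), are XDR alignment arithmetic: opaque data is padded with zero bytes up to a multiple of four. The helpers below spell that out in isolation. Their names (xdr_padded_len, xdr_pad_bytes) are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define XDR_UNIT 4u

/* Length of `len` bytes of opaque data once padded to a 4-byte boundary. */
uint32_t xdr_padded_len(uint32_t len)
{
	assert(len <= UINT32_MAX - (XDR_UNIT - 1));	/* avoid wraparound */
	return (len + (XDR_UNIT - 1)) & ~(XDR_UNIT - 1);
}

/* Number of zero bytes that follow the data: 0 to 3. */
uint32_t xdr_pad_bytes(uint32_t len)
{
	return (len & 3) ? 4 - (len & 3) : 0;
}
```

For any length the two agree: xdr_padded_len(len) equals len + xdr_pad_bytes(len). That is why the readv path can reserve the rounded-up size and later zero only the pad bytes.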


2014-05-22 19:18:07

by J. Bruce Fields

Subject: Re: nfsd4 xdr encoding fixes v2

On Tue, May 13, 2014 at 05:33:21PM -0400, J. Bruce Fields wrote:
> On Tue, May 13, 2014 at 05:18:41PM -0400, bfields wrote:
> > On Tue, May 13, 2014 at 10:48:26AM -0400, J. Bruce Fields wrote:
> > > On Tue, May 13, 2014 at 04:09:45AM -0700, Christoph Hellwig wrote:
> > > > On Mon, May 12, 2014 at 09:11:28AM -0700, Christoph Hellwig wrote:
> > > > > On Mon, May 12, 2014 at 12:07:41PM -0400, J. Bruce Fields wrote:
> > > > > > On Mon, May 12, 2014 at 01:20:59AM -0700, Christoph Hellwig wrote:
> > > > > > > This series seems to cause hangs during xfstests against a server on the
> > > > > > > same VM. The trace is fairly similar every time the hang happens, but the
> > > > > > > point at which it happens differs:
> > > > > >
> > > > > > Ouch, OK, and you're sure it starts with this series?
> > > > > >
> > > > > > I guess I should try to replicate it here. Might take a couple days.
> > > >
> > > > Seems like "nfsd4: allow exotic read compounds" is the culprit.
> > >
> > > OK, it makes sense that the problem would be there. Looking....
> >
> > I got xfstests set up and can reproduce some problems (a hang, the
> > nfs4svc_encode_compoundres WARN, and some allocation failures), though
> > not exactly what you reported.
> >
> > I also notice that commit ("nfsd4: allow exotic read compounds") has a
> > lot of extraneous cleanup that I should split out.
>
> In fact, the change to handle multiple reads per compound *should* only
> affect the nfsd4_encode_readv() case. Whereas you're probably
> exercising only the nfsd4_encode_splice_read() case. So probably some
> of the extraneous cleanup in that patch is bogus.

Yes, this removal:

- xdr->page_ptr += (maxcount + PAGE_SIZE - 1) / PAGE_SIZE;

buried in that cleanup is bogus and seems like it could result in
freeing a page twice. I've fixed that and split the confusing "cleanup"
out into separate patches. I can't get any hangs by running xfstests
any more.

On the other hand, I never saw exactly what you did, so more testing
wouldn't hurt.

I'll send out a v3 soon.

--b.
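
For reference, the line whose removal is called bogus above advances page_ptr by the number of pages the read data occupies, computed with round-up division. A standalone sketch of that arithmetic, assuming a 4096-byte page and using a made-up function name (pages_spanned):

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this example */

/* Number of whole or partial pages occupied by `count` bytes of data. */
size_t pages_spanned(size_t count)
{
	assert(count <= (size_t)-1 - (SKETCH_PAGE_SIZE - 1));
	return (count + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

A read of one byte still consumes a full page slot, so a pointer that is not advanced by this amount would keep referring to pages already handed to the read data.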

2014-05-11 20:52:58

by J. Bruce Fields

Subject: [PATCH 16/43] nfsd4: teach encoders to handle reserve_space failures

From: "J. Bruce Fields" <[email protected]>

We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
and special checking in those operations (getattr) whose result can vary
enormously.

However:
- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
more protocol.
- BUG_ON or page faulting on failure seems overly fragile.
- Especially in the 4.1 case, we prefer not to fail compounds
just because the returned result came *close* to session
limits. (Though perfect enforcement here may be difficult.)
- I'd prefer encoding to be uniform for all encoders instead of
having special exceptions for encoders containing, for
example, attributes.
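
The pattern this patch applies to each encoder is: ask the stream for space, and if the request fails, return an error instead of hitting BUG_ON. A userspace sketch of that shape follows. The stream type, stream_reserve(), the status codes, and the 16-byte stateid layout (4-byte generation plus 12 opaque bytes) are assumptions made for the example; the kernel's xdr_reserve_space() does more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ENC_OK = 0, ENC_RESOURCE = 1 };	/* hypothetical status codes */

struct stream {
	uint8_t *p;	/* next free byte */
	uint8_t *end;	/* one past the last usable byte */
};

/* Return a pointer to nbytes of space, or NULL if it does not fit. */
uint8_t *stream_reserve(struct stream *s, size_t nbytes)
{
	uint8_t *start;

	assert(s && s->p <= s->end);
	if ((size_t)(s->end - s->p) < nbytes)
		return NULL;
	start = s->p;
	s->p += nbytes;
	return start;
}

/* Encode a 4-byte generation plus 12 opaque bytes, or report failure. */
int encode_stateid(struct stream *s, uint32_t generation,
		   const uint8_t opaque[12])
{
	uint8_t *p = stream_reserve(s, 16);

	if (!p)
		return ENC_RESOURCE;	/* caller propagates this upward */
	p[0] = (uint8_t)(generation >> 24);
	p[1] = (uint8_t)(generation >> 16);
	p[2] = (uint8_t)(generation >> 8);
	p[3] = (uint8_t)generation;
	memcpy(p + 4, opaque, 12);
	return ENC_OK;
}
```

A failed reservation leaves the stream position untouched, so the caller can report the error without having written a partial result.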

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
fs/nfsd/nfs4xdr.c | 246 ++++++++++++++++++++++++++++++++++++++--------------
fs/nfsd/xdr4.h | 2 +-
3 files changed, 184 insertions(+), 66 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index ad0bc5f..d9fe000 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1349,7 +1349,7 @@ encode_op:
}
if (op->status == nfserr_replay_me) {
op->replay = &cstate->replay_owner->so_replay;
- nfsd4_encode_replay(resp, op);
+ nfsd4_encode_replay(&resp->xdr, op);
status = op->status = op->replay->rp_status;
} else {
nfsd4_encode_operation(resp, op);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index cc219b8..d0e59b9 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1746,11 +1746,6 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
}
}

-#define RESERVE_SPACE(nbytes) do { \
- p = xdr_reserve_space(&resp->xdr, nbytes); \
- BUG_ON(!p); \
-} while (0)
-
/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
*/
@@ -2722,23 +2717,29 @@ fail:
return -EINVAL;
}

-static void
-nfsd4_encode_stateid(struct nfsd4_compoundres *resp, stateid_t *sid)
+static __be32
+nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid)
{
__be32 *p;

- RESERVE_SPACE(sizeof(stateid_t));
+ p = xdr_reserve_space(xdr, sizeof(stateid_t));
+ if (!p)
+ return nfserr_resource;
WRITE32(sid->si_generation);
WRITEMEM(&sid->si_opaque, sizeof(stateid_opaque_t));
+ return 0;
}

static __be32
nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_access *access)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
+ return nfserr_resource;
WRITE32(access->ac_supported);
WRITE32(access->ac_resp_access);
}
@@ -2747,10 +2748,13 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_

static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_bind_conn_to_session *bcts)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(NFS4_MAX_SESSIONID_LEN + 8);
+ p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 8);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(bcts->sessionid.data, NFS4_MAX_SESSIONID_LEN);
WRITE32(bcts->dir);
/* Sorry, we do not yet support RDMA over 4.1: */
@@ -2762,8 +2766,10 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
static __be32
nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_close *close)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &close->cl_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &close->cl_stateid);

return nfserr;
}
@@ -2772,10 +2778,13 @@ nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_c
static __be32
nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_commit *commit)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(NFS4_VERIFIER_SIZE);
+ p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(commit->co_verf.data, NFS4_VERIFIER_SIZE);
}
return nfserr;
@@ -2784,10 +2793,13 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
static __be32
nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_create *create)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(32);
+ p = xdr_reserve_space(xdr, 32);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &create->cr_cinfo);
WRITE32(2);
WRITE32(create->cr_bmval[0]);
@@ -2814,13 +2826,16 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
static __be32
nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh **fhpp)
{
+ struct xdr_stream *xdr = &resp->xdr;
struct svc_fh *fhp = *fhpp;
unsigned int len;
__be32 *p;

if (!nfserr) {
len = fhp->fh_handle.fh_size;
- RESERVE_SPACE(len + 4);
+ p = xdr_reserve_space(xdr, len + 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(len);
WRITEMEM(&fhp->fh_handle.fh_base, len);
}
@@ -2831,13 +2846,15 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh
* Including all fields other than the name, a LOCK4denied structure requires
* 8(clientid) + 4(namelen) + 8(offset) + 8(length) + 4(type) = 32 bytes.
*/
-static void
-nfsd4_encode_lock_denied(struct nfsd4_compoundres *resp, struct nfsd4_lock_denied *ld)
+static __be32
+nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld)
{
struct xdr_netobj *conf = &ld->ld_owner;
__be32 *p;

- RESERVE_SPACE(32 + XDR_LEN(conf->len));
+ p = xdr_reserve_space(xdr, 32 + XDR_LEN(conf->len));
+ if (!p)
+ return nfserr_resource;
WRITE64(ld->ld_start);
WRITE64(ld->ld_length);
WRITE32(ld->ld_type);
@@ -2850,15 +2867,18 @@ nfsd4_encode_lock_denied(struct nfsd4_compoundres *resp, struct nfsd4_lock_denie
WRITE64((u64)0); /* clientid */
WRITE32(0); /* length of owner name */
}
+ return nfserr_denied;
}

static __be32
nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lock *lock)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &lock->lk_resp_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &lock->lk_resp_stateid);
else if (nfserr == nfserr_denied)
- nfsd4_encode_lock_denied(resp, &lock->lk_denied);
+ nfserr = nfsd4_encode_lock_denied(xdr, &lock->lk_denied);

return nfserr;
}
@@ -2866,16 +2886,20 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lo
static __be32
nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lockt *lockt)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (nfserr == nfserr_denied)
- nfsd4_encode_lock_denied(resp, &lockt->lt_denied);
+ nfsd4_encode_lock_denied(xdr, &lockt->lt_denied);
return nfserr;
}

static __be32
nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_locku *locku)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &locku->lu_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &locku->lu_stateid);

return nfserr;
}
@@ -2884,10 +2908,13 @@ nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l
static __be32
nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_link *link)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(20);
+ p = xdr_reserve_space(xdr, 20);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &link->li_cinfo);
}
return nfserr;
@@ -2897,13 +2924,18 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li
static __be32
nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open *open)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (nfserr)
goto out;

- nfsd4_encode_stateid(resp, &open->op_stateid);
- RESERVE_SPACE(40);
+ nfserr = nfsd4_encode_stateid(xdr, &open->op_stateid);
+ if (nfserr)
+ goto out;
+ p = xdr_reserve_space(xdr, 40);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &open->op_cinfo);
WRITE32(open->op_rflags);
WRITE32(2);
@@ -2915,8 +2947,12 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
case NFS4_OPEN_DELEGATE_NONE:
break;
case NFS4_OPEN_DELEGATE_READ:
- nfsd4_encode_stateid(resp, &open->op_delegate_stateid);
- RESERVE_SPACE(20);
+ nfserr = nfsd4_encode_stateid(xdr, &open->op_delegate_stateid);
+ if (nfserr)
+ return nfserr;
+ p = xdr_reserve_space(xdr, 20);
+ if (!p)
+ return nfserr_resource;
WRITE32(open->op_recall);

/*
@@ -2928,8 +2964,12 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
WRITE32(0); /* XXX: is NULL principal ok? */
break;
case NFS4_OPEN_DELEGATE_WRITE:
- nfsd4_encode_stateid(resp, &open->op_delegate_stateid);
- RESERVE_SPACE(32);
+ nfserr = nfsd4_encode_stateid(xdr, &open->op_delegate_stateid);
+ if (nfserr)
+ return nfserr;
+ p = xdr_reserve_space(xdr, 32);
+ if (!p)
+ return nfserr_resource;
WRITE32(0);

/*
@@ -2951,12 +2991,16 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
switch (open->op_why_no_deleg) {
case WND4_CONTENTION:
case WND4_RESOURCE:
- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
+ return nfserr_resource;
WRITE32(open->op_why_no_deleg);
WRITE32(0); /* deleg signaling not supported yet */
break;
default:
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(open->op_why_no_deleg);
}
break;
@@ -2971,8 +3015,10 @@ out:
static __be32
nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_confirm *oc)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &oc->oc_resp_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &oc->oc_resp_stateid);

return nfserr;
}
@@ -2980,8 +3026,10 @@ nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct
static __be32
nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_downgrade *od)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &od->od_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &od->od_stateid);

return nfserr;
}
@@ -3004,7 +3052,9 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
if (resp->xdr.buf->page_len)
return nfserr_resource;

- RESERVE_SPACE(8); /* eof flag and byte count */
+ p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
+ if (!p)
+ return nfserr_resource;

maxcount = svc_max_payload(resp->rqstp);
if (maxcount > read->rd_length)
@@ -3056,7 +3106,9 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
resp->xdr.buf->tail[0].iov_base = p;
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
@@ -3084,7 +3136,10 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
page = page_address(*(resp->rqstp->rq_next_page++));

maxcount = PAGE_SIZE;
- RESERVE_SPACE(4);
+
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;

/*
* XXX: By default, the ->readlink() VFS op will truncate symlinks
@@ -3111,7 +3166,9 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
resp->xdr.buf->tail[0].iov_base = p;
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
@@ -3136,7 +3193,9 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
if (!*resp->rqstp->rq_next_page)
return nfserr_resource;

- RESERVE_SPACE(NFS4_VERIFIER_SIZE);
+ p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
+ if (!p)
+ return nfserr_resource;

/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
@@ -3204,10 +3263,13 @@ err_no_verf:
static __be32
nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_remove *remove)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(20);
+ p = xdr_reserve_space(xdr, 20);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &remove->rm_cinfo);
}
return nfserr;
@@ -3216,10 +3278,13 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
static __be32
nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_rename *rename)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(40);
+ p = xdr_reserve_space(xdr, 40);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &rename->rn_sinfo);
write_cinfo(&p, &rename->rn_tinfo);
}
@@ -3227,7 +3292,7 @@ nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
}

static __be32
-nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
+nfsd4_do_encode_secinfo(struct xdr_stream *xdr,
__be32 nfserr, struct svc_export *exp)
{
u32 i, nflavs, supported;
@@ -3238,6 +3303,7 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,

if (nfserr)
goto out;
+ nfserr = nfserr_resource;
if (exp->ex_nflavors) {
flavs = exp->ex_flavors;
nflavs = exp->ex_nflavors;
@@ -3259,7 +3325,9 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
}

supported = 0;
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ goto out;
flavorsp = p++; /* to be backfilled later */

for (i = 0; i < nflavs; i++) {
@@ -3268,7 +3336,9 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,

if (rpcauth_get_gssinfo(pf, &info) == 0) {
supported++;
- RESERVE_SPACE(4 + 4 + XDR_LEN(info.oid.len) + 4 + 4);
+ p = xdr_reserve_space(xdr, 4 + 4 + XDR_LEN(info.oid.len) + 4 + 4);
+ if (!p)
+ goto out;
WRITE32(RPC_AUTH_GSS);
WRITE32(info.oid.len);
WRITEMEM(info.oid.data, info.oid.len);
@@ -3276,7 +3346,9 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
WRITE32(info.service);
} else if (pf < RPC_AUTH_MAXFLAVOR) {
supported++;
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ goto out;
WRITE32(pf);
} else {
if (report)
@@ -3288,7 +3360,7 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
if (nflavs != supported)
report = false;
*flavorsp = htonl(supported);
-
+ nfserr = 0;
out:
if (exp)
exp_put(exp);
@@ -3299,14 +3371,18 @@ static __be32
nfsd4_encode_secinfo(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_secinfo *secinfo)
{
- return nfsd4_do_encode_secinfo(resp, nfserr, secinfo->si_exp);
+ struct xdr_stream *xdr = &resp->xdr;
+
+ return nfsd4_do_encode_secinfo(xdr, nfserr, secinfo->si_exp);
}

static __be32
nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_secinfo_no_name *secinfo)
{
- return nfsd4_do_encode_secinfo(resp, nfserr, secinfo->sin_exp);
+ struct xdr_stream *xdr = &resp->xdr;
+
+ return nfsd4_do_encode_secinfo(xdr, nfserr, secinfo->sin_exp);
}

/*
@@ -3316,9 +3392,12 @@ nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32
nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setattr *setattr)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

- RESERVE_SPACE(16);
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
+ return nfserr_resource;
if (nfserr) {
WRITE32(3);
WRITE32(0);
@@ -3337,15 +3416,20 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
static __be32
nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setclientid *scd)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(8 + NFS4_VERIFIER_SIZE);
+ p = xdr_reserve_space(xdr, 8 + NFS4_VERIFIER_SIZE);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(&scd->se_clientid, 8);
WRITEMEM(&scd->se_confirm, NFS4_VERIFIER_SIZE);
}
else if (nfserr == nfserr_clid_inuse) {
- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
+ return nfserr_resource;
WRITE32(0);
WRITE32(0);
}
@@ -3355,10 +3439,13 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n
static __be32
nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_write *write)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(16);
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
+ return nfserr_resource;
WRITE32(write->wr_bytes_written);
WRITE32(write->wr_how_written);
WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
@@ -3378,6 +3465,7 @@ static __be32
nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_exchange_id *exid)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;
char *major_id;
char *server_scope;
@@ -3393,11 +3481,13 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
server_scope = utsname()->nodename;
server_scope_sz = strlen(server_scope);

- RESERVE_SPACE(
+ p = xdr_reserve_space(xdr,
8 /* eir_clientid */ +
4 /* eir_sequenceid */ +
4 /* eir_flags */ +
4 /* spr_how */);
+ if (!p)
+ return nfserr_resource;

WRITEMEM(&exid->clientid, 8);
WRITE32(exid->seqid);
@@ -3410,7 +3500,9 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
break;
case SP4_MACH_CRED:
/* spo_must_enforce, spo_must_allow */
- RESERVE_SPACE(16);
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
+ return nfserr_resource;

/* spo_must_enforce bitmap: */
WRITE32(2);
@@ -3424,13 +3516,15 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
WARN_ON_ONCE(1);
}

- RESERVE_SPACE(
+ p = xdr_reserve_space(xdr,
8 /* so_minor_id */ +
4 /* so_major_id.len */ +
(XDR_QUADLEN(major_id_sz) * 4) +
4 /* eir_server_scope.len */ +
(XDR_QUADLEN(server_scope_sz) * 4) +
4 /* eir_server_impl_id.count (0) */);
+ if (!p)
+ return nfserr_resource;

/* The server_owner struct */
WRITE64(minor_id); /* Minor id */
@@ -3451,17 +3545,22 @@ static __be32
nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_create_session *sess)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (nfserr)
return nfserr;

- RESERVE_SPACE(24);
+ p = xdr_reserve_space(xdr, 24);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(sess->sessionid.data, NFS4_MAX_SESSIONID_LEN);
WRITE32(sess->seqid);
WRITE32(sess->flags);

- RESERVE_SPACE(28);
+ p = xdr_reserve_space(xdr, 28);
+ if (!p)
+ return nfserr_resource;
WRITE32(0); /* headerpadsz */
WRITE32(sess->fore_channel.maxreq_sz);
WRITE32(sess->fore_channel.maxresp_sz);
@@ -3471,11 +3570,15 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(sess->fore_channel.nr_rdma_attrs);

if (sess->fore_channel.nr_rdma_attrs) {
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(sess->fore_channel.rdma_attrs);
}

- RESERVE_SPACE(28);
+ p = xdr_reserve_space(xdr, 28);
+ if (!p)
+ return nfserr_resource;
WRITE32(0); /* headerpadsz */
WRITE32(sess->back_channel.maxreq_sz);
WRITE32(sess->back_channel.maxresp_sz);
@@ -3485,7 +3588,9 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(sess->back_channel.nr_rdma_attrs);

if (sess->back_channel.nr_rdma_attrs) {
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(sess->back_channel.rdma_attrs);
}
return 0;
@@ -3495,12 +3600,15 @@ static __be32
nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_sequence *seq)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (nfserr)
return nfserr;

- RESERVE_SPACE(NFS4_MAX_SESSIONID_LEN + 20);
+ p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 20);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(seq->sessionid.data, NFS4_MAX_SESSIONID_LEN);
WRITE32(seq->seqid);
WRITE32(seq->slotid);
@@ -3517,13 +3625,16 @@ static __be32
nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_test_stateid *test_stateid)
{
+ struct xdr_stream *xdr = &resp->xdr;
struct nfsd4_test_stateid_id *stateid, *next;
__be32 *p;

if (nfserr)
return nfserr;

- RESERVE_SPACE(4 + (4 * test_stateid->ts_num_ids));
+ p = xdr_reserve_space(xdr, 4 + (4 * test_stateid->ts_num_ids));
+ if (!p)
+ return nfserr_resource;
*p++ = htonl(test_stateid->ts_num_ids);

list_for_each_entry_safe(stateid, next, &test_stateid->ts_stateid_list, ts_id_list) {
@@ -3662,7 +3773,11 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
nfsd4_enc encoder;
__be32 *p;

- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8);
+ if (!p) {
+ WARN_ON_ONCE(1);
+ return;
+ }
WRITE32(op->opnum);
post_err_offset = xdr->buf->len;

@@ -3719,18 +3834,21 @@ status:
* called with nfs4_lock_state() held
*/
void
-nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
+nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op)
{
__be32 *p;
struct nfs4_replay *rp = op->replay;

BUG_ON(!rp);

- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8 + rp->rp_buflen);
+ if (!p) {
+ WARN_ON_ONCE(1);
+ return;
+ }
WRITE32(op->opnum);
*p++ = rp->rp_status; /* already xdr'ed */

- RESERVE_SPACE(rp->rp_buflen);
WRITEMEM(rp->rp_buf, rp->rp_buflen);
}

diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index fa3a589..19bf3fc 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -614,7 +614,7 @@ int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *,
struct nfsd4_compoundres *);
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32);
void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *);
-void nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op);
+void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op);
__be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
struct svc_fh *fhp, struct svc_export *exp,
struct dentry *dentry,
--
1.7.9.5
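
The conversion in the diff above follows one pattern throughout: reserve the bytes, check for NULL, and only then write. A minimal userspace sketch of that pattern follows. The names toy_stream, toy_reserve, toy_encode_write and TOY_ERR_RESOURCE are hypothetical stand-ins for illustration, not the kernel's xdr_stream API, and the layout is simplified to a single flat buffer.

```c
#include <arpa/inet.h>	/* htonl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct xdr_stream: a cursor and a limit. */
struct toy_stream {
	uint32_t *p;	/* next free 32-bit word */
	uint32_t *end;	/* one past the last usable word */
};

#define TOY_OK			0
#define TOY_ERR_RESOURCE	1	/* plays the role of nfserr_resource */

/*
 * Round nbytes up to whole 32-bit words and return the start of the
 * reserved region, or NULL if it does not fit.  Nothing is consumed
 * on failure.
 */
static uint32_t *toy_reserve(struct toy_stream *xdr, size_t nbytes)
{
	size_t nwords = (nbytes + 3) / 4;
	uint32_t *p = xdr->p;

	assert(xdr->p <= xdr->end);
	if (nwords > (size_t)(xdr->end - xdr->p))
		return NULL;
	xdr->p += nwords;
	return p;
}

/* Encode two 32-bit fields and an 8-byte verifier (16 bytes in total). */
static int toy_encode_write(struct toy_stream *xdr, uint32_t count,
			    uint32_t how, const unsigned char verf[8])
{
	uint32_t *p = toy_reserve(xdr, 16);

	if (!p)
		return TOY_ERR_RESOURCE;	/* caller sees the failure */
	*p++ = htonl(count);
	*p++ = htonl(how);
	memcpy(p, verf, 8);
	return TOY_OK;
}
```

The point of the sketch is that the encoder never writes through a pointer it has not checked, which an unconditional reserve step cannot guarantee.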


2014-05-13 11:09:46

by Christoph Hellwig

Subject: Re: nfsd4 xdr encoding fixes v2

On Mon, May 12, 2014 at 09:11:28AM -0700, Christoph Hellwig wrote:
> On Mon, May 12, 2014 at 12:07:41PM -0400, J. Bruce Fields wrote:
> > On Mon, May 12, 2014 at 01:20:59AM -0700, Christoph Hellwig wrote:
> > > This series seems to cause hangs during xfstests against a server on the
> > > same VM. The trace is fairly similar every time the hang happens, but the
> > > point at which it happens differs:
> >
> > Ouch, OK, and you're sure it starts with this series?
> >
> > I guess I should try to replicate it here. Might take a couple days.

Seems like "nfsd4: allow exotic read compounds" is the culprit.

That patch is also missing a signoff while we're at it.


2014-05-12 05:33:17

by Christoph Hellwig

Subject: Re: [PATCH 39/43] nfsd4: really fix nfs4err_resource in 4.1 case

> + if (op->status == nfserr_resource && nfsd4_has_session(&resp->cstate)) {
> + struct nfsd4_slot *slot = resp->cstate.slot;
> +
> + if (slot->sl_flags & NFSD4_SLOT_CACHETHIS)
> + op->status = nfserr_rep_too_big_to_cache;
> + else
> + op->status = nfserr_rep_too_big;

There is a closing brace missing here, which breaks the compile for me.


2014-05-12 14:06:47

by J. Bruce Fields

Subject: Re: [PATCH 07/43] nfsd4: allow space for final error return

On Mon, May 12, 2014 at 01:18:46AM -0700, Christoph Hellwig wrote:
> On Sun, May 11, 2014 at 04:52:12PM -0400, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > This post-encoding check should be taking into account the need to
> > encode at least an out-of-space error to the following op (if any).
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
>
> Looks good,
>
> Reviewed-by: Christoph Hellwig <[email protected]>

Thanks, added Reviewed-by's for 1, 2, 3, 7.

--b.

2014-05-11 20:53:00

by J. Bruce Fields

Subject: [PATCH 17/43] nfsd4: reserve space before inlining 0-copy pages

From: "J. Bruce Fields" <[email protected]>

Once we've included page-cache pages in the encoding it's difficult to
remove them and restart encoding. (xdr_truncate_encode doesn't handle
that case.) So, make sure we'll have adequate space to finish the
operation first.

For now the COMPOUND_SLACK_SPACE checks should prevent this case from
happening, but we want to remove those checks.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d0e59b9..846d241 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3056,6 +3056,10 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
if (!p)
return nfserr_resource;

+ /* Make sure there will be room for padding if needed: */
+ if (xdr->end - xdr->p < 1)
+ return nfserr_resource;
+
maxcount = svc_max_payload(resp->rqstp);
if (maxcount > read->rd_length)
maxcount = read->rd_length;
@@ -3107,8 +3111,6 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
p = xdr_reserve_space(xdr, 4);
- if (!p)
- return nfserr_resource;
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
@@ -3141,6 +3143,9 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
if (!p)
return nfserr_resource;

+ if (xdr->end - xdr->p < 1)
+ return nfserr_resource;
+
/*
* XXX: By default, the ->readlink() VFS op will truncate symlinks
* if they would overflow the buffer. Is this kosher in NFSv4? If
@@ -3167,8 +3172,6 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
p = xdr_reserve_space(xdr, 4);
- if (!p)
- return nfserr_resource;
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
--
1.7.9.5
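
The ordering that patch 17 enforces can be illustrated outside the kernel: confirm that the padding word will fit before taking the step that cannot be undone, so the later padding write needs no error path. This is a rough sketch under simplified assumptions. toy_stream, toy_pad_bytes and toy_encode_read are hypothetical names, "attaching pages" is modelled as merely setting page_len, and the header words are stored in host byte order for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stream: inline words plus attached page data. */
struct toy_stream {
	uint32_t *p;		/* next free inline word */
	uint32_t *end;		/* one past the last inline word */
	size_t page_len;	/* bytes of page data attached so far */
};

/* Number of zero bytes needed to pad len up to a multiple of 4. */
static size_t toy_pad_bytes(size_t len)
{
	return (4 - (len & 3)) & 3;
}

/* Returns 0 on success, -1 if the reply could not be completed. */
static int toy_encode_read(struct toy_stream *xdr, size_t datalen)
{
	uint32_t *start = xdr->p;

	/* Header: eof flag and byte count. */
	if (xdr->end - xdr->p < 2)
		return -1;
	xdr->p += 2;

	/* Make sure there will be room for padding if needed. */
	if (xdr->end - xdr->p < 1) {
		xdr->p = start;	/* still cheap to back out here */
		return -1;
	}

	/* Point of no return: page data is now part of the reply. */
	xdr->page_len = datalen;
	start[0] = 0;			/* eof flag (host order for brevity) */
	start[1] = (uint32_t)datalen;	/* byte count */

	if (toy_pad_bytes(datalen)) {
		assert(xdr->p < xdr->end);	/* guaranteed by the check above */
		*xdr->p++ = 0;
	}
	return 0;
}
```

Failing early leaves page_len untouched, so the caller can report an error without having to unwind an already attached page.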


2014-05-12 05:35:54

by Christoph Hellwig

Subject: Re: [PATCH 02/43] nfsd4: tweak nfsd4_encode_getattr to take xdr_stream

> * Note: @fhp can be NULL; in this case, we might have to compose the filehandle
> * ourselves.
> - *
> - * countp is the buffer size in _words_
> */
> __be32
> -nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
> - struct dentry *dentry, __be32 **buffer, int count, u32 *bmval,
> +nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export *exp,

Can you make sure that the lines the patches touch don't exceed 80
characters? There are quite a few more throughout the series, including
new code.


Otherwise looks good,

Signed-off-by: Christoph Hellwig <[email protected]>

2014-05-11 20:53:00

by J. Bruce Fields

Subject: [PATCH 21/43] nfsd4: allow encoding across page boundaries

From: "J. Bruce Fields" <[email protected]>

After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 4 +++
fs/nfsd/nfs4xdr.c | 59 +++++++++++++++++++++++----------
include/linux/sunrpc/svc.h | 1 +
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/svc_xprt.c | 1 +
net/sunrpc/xdr.c | 78 ++++++++++++++++++++++++++++++++++++++++++--
6 files changed, 125 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 18063e0..787aa9f 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1213,6 +1213,10 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp, struct nfsd4_compoundres
xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
/* Tail and page_len should be zero at this point: */
buf->len = buf->head[0].iov_len;
+ xdr->scratch.iov_len = 0;
+ xdr->page_ptr = buf->pages;
+ buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages)
+ - 2 * RPC_MAX_AUTH_SIZE;
}

/*
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index def2ceb..aedf19a 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1624,6 +1624,7 @@ static int nfsd4_max_reply(u32 opnum)
* the head and tail in another page:
*/
return 2 * PAGE_SIZE;
+ case OP_GETATTR:
case OP_READ:
return INT_MAX;
default:
@@ -2547,21 +2548,30 @@ out_resource:
goto out;
}

+static void svcxdr_init_encode_from_buffer(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, int bytes)
+{
+ xdr->scratch.iov_len = 0;
+ memset(buf, 0, sizeof(struct xdr_buf));
+ buf->head[0].iov_base = p;
+ buf->head[0].iov_len = 0;
+ buf->len = 0;
+ xdr->buf = buf;
+ xdr->iov = buf->head;
+ xdr->p = p;
+ xdr->end = (void *)p + bytes;
+ buf->buflen = bytes;
+}
+
__be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
struct svc_fh *fhp, struct svc_export *exp,
struct dentry *dentry, u32 *bmval,
struct svc_rqst *rqstp, int ignore_crossmnt)
{
- struct xdr_buf dummy = {
- .head[0] = {
- .iov_base = *p,
- },
- .buflen = words << 2,
- };
+ struct xdr_buf dummy;
struct xdr_stream xdr;
__be32 ret;

- xdr_init_encode(&xdr, &dummy, NULL);
+ svcxdr_init_encode_from_buffer(&xdr, &dummy, *p, words << 2);
ret = nfsd4_encode_fattr(&xdr, fhp, exp, dentry, bmval, rqstp, ignore_crossmnt);
*p = xdr.p;
return ret;
@@ -3049,8 +3059,6 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

if (nfserr)
return nfserr;
- if (resp->xdr.buf->page_len)
- return nfserr_resource;

p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
if (!p)
@@ -3060,6 +3068,9 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
if (xdr->end - xdr->p < 1)
return nfserr_resource;

+ if (resp->xdr.buf->page_len)
+ return nfserr_resource;
+
maxcount = svc_max_payload(resp->rqstp);
if (maxcount > read->rd_length)
maxcount = read->rd_length;
@@ -3104,6 +3115,8 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
- (char *)resp->xdr.buf->head[0].iov_base);
resp->xdr.buf->page_len = maxcount;
xdr->buf->len += maxcount;
+ xdr->page_ptr += v;
+ xdr->buf->buflen = maxcount + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
@@ -3130,6 +3143,11 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd

if (nfserr)
return nfserr;
+
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
+
if (resp->xdr.buf->page_len)
return nfserr_resource;
if (!*resp->rqstp->rq_next_page)
@@ -3139,10 +3157,6 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd

maxcount = PAGE_SIZE;

- p = xdr_reserve_space(xdr, 4);
- if (!p)
- return nfserr_resource;
-
if (xdr->end - xdr->p < 1)
return nfserr_resource;

@@ -3165,6 +3179,8 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
- (char*)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
xdr->buf->len += maxcount;
+ xdr->page_ptr += 1;
+ xdr->buf->buflen -= PAGE_SIZE;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
@@ -3191,15 +3207,16 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

if (nfserr)
return nfserr;
- if (resp->xdr.buf->page_len)
- return nfserr_resource;
- if (!*resp->rqstp->rq_next_page)
- return nfserr_resource;

p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
if (!p)
return nfserr_resource;

+ if (resp->xdr.buf->page_len)
+ return nfserr_resource;
+ if (!*resp->rqstp->rq_next_page)
+ return nfserr_resource;
+
/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
WRITE32(0);
@@ -3251,6 +3268,10 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

xdr->iov = xdr->buf->tail;

+ xdr->page_ptr++;
+ xdr->buf->buflen -= PAGE_SIZE;
+ xdr->iov = xdr->buf->tail;
+
/* Use rest of head for padding and remaining ops: */
resp->xdr.buf->tail[0].iov_base = tailbase;
resp->xdr.buf->tail[0].iov_len = 0;
@@ -3783,6 +3804,8 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
!nfsd4_enc_ops[op->opnum]);
encoder = nfsd4_enc_ops[op->opnum];
op->status = encoder(resp, op->status, &op->u);
+ xdr_commit_encode(xdr);
+
/* nfsd4_check_resp_size guarantees enough room for error status */
if (!op->status) {
int space_needed = 0;
@@ -3907,6 +3930,8 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len +
buf->tail[0].iov_len);

+ rqstp->rq_next_page = resp->xdr.page_ptr + 1;
+
p = resp->tagp;
*p++ = htonl(resp->taglen);
memcpy(p, resp->tag, resp->taglen);
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 04e7632..39c50e1 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -244,6 +244,7 @@ struct svc_rqst {
struct page * rq_pages[RPCSVC_MAXPAGES];
struct page * *rq_respages; /* points into rq_pages */
struct page * *rq_next_page; /* next reply page to use */
+ struct page * *rq_page_end; /* one past the last page */

struct kvec rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index e7bb2e3..b23d69f 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -215,6 +215,7 @@ typedef int (*kxdrdproc_t)(void *rqstp, struct xdr_stream *xdr, void *obj);

extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
+extern void xdr_commit_encode(struct xdr_stream *xdr);
extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
unsigned int base, unsigned int len);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 06c6ff0..baec792 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -597,6 +597,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
}
rqstp->rq_pages[i] = p;
}
+ rqstp->rq_page_end = &rqstp->rq_pages[i];
rqstp->rq_pages[i++] = NULL; /* this might be seen in nfs_read_actor */

/* Make arg->head point to first page and arg->pages point to rest */
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 8ae8ee7..e65d6b6 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -462,6 +462,7 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p)
struct kvec *iov = buf->head;
int scratch_len = buf->buflen - buf->page_len - buf->tail[0].iov_len;

+ xdr_set_scratch_buffer(xdr, NULL, 0);
BUG_ON(scratch_len < 0);
xdr->buf = buf;
xdr->iov = iov;
@@ -482,6 +483,74 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p)
EXPORT_SYMBOL_GPL(xdr_init_encode);

/**
+ * xdr_commit_encode - Ensure all data is written to buffer
+ * @xdr: pointer to xdr_stream
+ *
+ * We handle encoding across page boundaries by giving the caller a
+ * temporary location to write to, then later copying the data into
+ * place; xdr_commit_encode does that copying.
+ *
+ * Normally the caller doesn't need to call this directly, as the
+ * following xdr_reserve_space will do it. But an explicit call may be
+ * required at the end of encoding, or any other time when the xdr_buf
+ * data might be read.
+ */
+void xdr_commit_encode(struct xdr_stream *xdr)
+{
+ int shift = xdr->scratch.iov_len;
+ void *page;
+
+ if (shift == 0)
+ return;
+ page = page_address(*xdr->page_ptr);
+ memcpy(xdr->scratch.iov_base, page, shift);
+ memmove(page, page + shift, (void *)xdr->p - page);
+ xdr->scratch.iov_len = 0;
+}
+EXPORT_SYMBOL_GPL(xdr_commit_encode);
+
+__be32 * xdr_get_next_encode_buffer(struct xdr_stream *xdr, size_t nbytes)
+{
+ static __be32 *p;
+ int space_left;
+ int frag1bytes, frag2bytes;
+
+ if (nbytes > PAGE_SIZE)
+ return NULL; /* Bigger buffers require special handling */
+ if (xdr->buf->len + nbytes > xdr->buf->buflen)
+ return NULL; /* Sorry, we're totally out of space */
+ frag1bytes = (xdr->end - xdr->p) << 2;
+ frag2bytes = nbytes - frag1bytes;
+ if (xdr->iov)
+ xdr->iov->iov_len += frag1bytes;
+ else {
+ xdr->buf->page_len += frag1bytes;
+ xdr->page_ptr++;
+ }
+ xdr->iov = NULL;
+ /*
+ * If the last encode didn't end exactly on a page boundary, the
+ * next one will straddle boundaries. Encode into the next
+ * page, then copy it back later in xdr_commit_encode. We use
+ * the "scratch" iov to track any temporarily unused fragment of
+ * space at the end of the previous buffer:
+ */
+ xdr->scratch.iov_base = xdr->p;
+ xdr->scratch.iov_len = frag1bytes;
+ p = page_address(*xdr->page_ptr);
+ /*
+ * Note this is where the next encode will start after we've
+ * shifted this one back:
+ */
+ xdr->p = (void *)p + frag2bytes;
+ space_left = xdr->buf->buflen - xdr->buf->len;
+ xdr->end = (void *)p + min_t(int, space_left, PAGE_SIZE);
+ xdr->buf->page_len += frag2bytes;
+ xdr->buf->len += nbytes;
+ return p;
+}
+
+/**
* xdr_reserve_space - Reserve buffer space for sending
* @xdr: pointer to xdr_stream
* @nbytes: number of bytes to reserve
@@ -495,14 +564,18 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
__be32 *p = xdr->p;
__be32 *q;

+ xdr_commit_encode(xdr);
/* align nbytes on the next 32-bit boundary */
nbytes += 3;
nbytes &= ~3;
q = p + (nbytes >> 2);
if (unlikely(q > xdr->end || q < p))
- return NULL;
+ return xdr_get_next_encode_buffer(xdr, nbytes);
xdr->p = q;
- xdr->iov->iov_len += nbytes;
+ if (xdr->iov)
+ xdr->iov->iov_len += nbytes;
+ else
+ xdr->buf->page_len += nbytes;
xdr->buf->len += nbytes;
return p;
}
@@ -539,6 +612,7 @@ void xdr_truncate_encode(struct xdr_stream *xdr, size_t len)
WARN_ON_ONCE(1);
return;
}
+ xdr_commit_encode(xdr);

fraglen = min_t(int, buf->len - len, tail->iov_len);
tail->iov_len -= fraglen;
--
1.7.9.5
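
The scratch-and-commit idea in patch 21 is easier to follow on a toy model. When a reservation would straddle a page boundary, the caller is handed contiguous space at the start of the next page; a later commit copies the first fragment back into the unused tail of the previous page and slides the remainder down. The sketch below is a userspace approximation only: toy_stream, toy_reserve and toy_commit are hypothetical names, pages are 16 bytes so the boundary is easy to hit, and the iov/buflen bookkeeping of the real code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE 16	/* tiny "page" so the boundary is easy to hit */

struct toy_stream {
	unsigned char (*pages)[TOY_PAGE];	/* array of pages */
	size_t npages;
	size_t cur;			/* index of the page being filled */
	size_t off;			/* write offset within pages[cur] */
	unsigned char *scratch_base;	/* unused tail of the previous page */
	size_t scratch_len;		/* 0 when nothing is pending */
};

/*
 * Copy the head of the temporarily misplaced data into the previous
 * page's tail, then slide the rest down to the start of this page.
 */
static void toy_commit(struct toy_stream *s)
{
	size_t shift = s->scratch_len;
	unsigned char *page = s->pages[s->cur];

	if (shift == 0)
		return;
	memcpy(s->scratch_base, page, shift);
	memmove(page, page + shift, s->off);
	s->scratch_len = 0;
}

/* Reserve nbytes of contiguous space for the caller to write into. */
static unsigned char *toy_reserve(struct toy_stream *s, size_t nbytes)
{
	size_t frag1;
	unsigned char *p;

	toy_commit(s);		/* settle any earlier straddling write */
	if (nbytes > TOY_PAGE)
		return NULL;	/* bigger buffers need special handling */
	frag1 = TOY_PAGE - s->off;
	if (nbytes <= frag1) {	/* fits in the current page */
		p = s->pages[s->cur] + s->off;
		s->off += nbytes;
		return p;
	}
	if (s->cur + 1 >= s->npages)
		return NULL;	/* totally out of space */

	/* Straddling case: remember the unused tail, encode in next page. */
	s->scratch_base = s->pages[s->cur] + s->off;
	s->scratch_len = frag1;
	s->cur++;
	s->off = nbytes - frag1;	/* where encoding resumes after commit */
	return s->pages[s->cur];
}
```

As in the patch, an explicit commit is needed before anything reads the buffer, because until then the last reservation may still sit in its temporary location.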


2014-05-11 20:53:08

by J. Bruce Fields

Subject: [PATCH 42/43] nfsd4: kill WRITEMEM

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 62 ++++++++++++++++++++++++-----------------------------
1 file changed, 28 insertions(+), 34 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index a2e34f5..f875bda 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1699,12 +1699,6 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_TAIL;
}

-#define WRITEMEM(ptr,nbytes) do { if (nbytes > 0) { \
- *(p + XDR_QUADLEN(nbytes) -1) = 0; \
- memcpy(p, ptr, nbytes); \
- p += XDR_QUADLEN(nbytes); \
-}} while (0)
-
static void write32(__be32 **p, u32 n)
{
*(*p)++ = htonl(n);
@@ -1783,8 +1777,7 @@ static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep, char
p = xdr_reserve_space(xdr, strlen + 4);
if (!p)
return nfserr_resource;
- *p++ = cpu_to_be32(strlen);
- WRITEMEM(str, strlen);
+ p = xdr_encode_opaque(p, str, strlen);
count++;
}
else
@@ -1875,8 +1868,7 @@ static __be32 nfsd4_encode_path(struct xdr_stream *xdr, const struct path *root,
spin_unlock(&dentry->d_lock);
goto out_free;
}
- *p++ = cpu_to_be32(len);
- WRITEMEM(dentry->d_name.name, len);
+ p = xdr_encode_opaque(p, dentry->d_name.name, len);
dprintk("/%s", dentry->d_name.name);
spin_unlock(&dentry->d_lock);
dput(dentry);
@@ -2242,7 +2234,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
*p++ = cpu_to_be32(MINOR(stat.dev));
break;
case FSIDSOURCE_UUID:
- WRITEMEM(exp->ex_uuid, 16);
+ p = xdr_encode_opaque_fixed(p, exp->ex_uuid, 16);
break;
}
}
@@ -2328,8 +2320,8 @@ out_acl:
p = xdr_reserve_space(xdr, fhp->fh_handle.fh_size + 4);
if (!p)
goto out_resource;
- *p++ = cpu_to_be32(fhp->fh_handle.fh_size);
- WRITEMEM(&fhp->fh_handle.fh_base, fhp->fh_handle.fh_size);
+ p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base,
+ fhp->fh_handle.fh_size);
}
if (bmval0 & FATTR4_WORD0_FILEID) {
p = xdr_reserve_space(xdr, 8);
@@ -2746,7 +2738,8 @@ nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid)
if (!p)
return nfserr_resource;
*p++ = cpu_to_be32(sid->si_generation);
- WRITEMEM(&sid->si_opaque, sizeof(stateid_opaque_t));
+ p = xdr_encode_opaque_fixed(p, &sid->si_opaque,
+ sizeof(stateid_opaque_t));
return 0;
}

@@ -2775,7 +2768,8 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 8);
if (!p)
return nfserr_resource;
- WRITEMEM(bcts->sessionid.data, NFS4_MAX_SESSIONID_LEN);
+ p = xdr_encode_opaque_fixed(p, bcts->sessionid.data,
+ NFS4_MAX_SESSIONID_LEN);
*p++ = cpu_to_be32(bcts->dir);
/* Sorry, we do not yet support RDMA over 4.1: */
*p++ = cpu_to_be32(0);
@@ -2805,7 +2799,8 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
if (!p)
return nfserr_resource;
- WRITEMEM(commit->co_verf.data, NFS4_VERIFIER_SIZE);
+ p = xdr_encode_opaque_fixed(p, commit->co_verf.data,
+ NFS4_VERIFIER_SIZE);
}
return nfserr;
}
@@ -2856,8 +2851,7 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh
p = xdr_reserve_space(xdr, len + 4);
if (!p)
return nfserr_resource;
- *p++ = cpu_to_be32(len);
- WRITEMEM(&fhp->fh_handle.fh_base, len);
+ p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base, len);
}
return nfserr;
}
@@ -2890,9 +2884,8 @@ again:
p = xdr_encode_hyper(p, ld->ld_length);
*p++ = cpu_to_be32(ld->ld_type);
if (conf->len) {
- WRITEMEM(&ld->ld_clientid, 8);
- *p++ = cpu_to_be32(conf->len);
- WRITEMEM(conf->data, conf->len);
+ p = xdr_encode_opaque_fixed(p, &ld->ld_clientid, 8);
+ p = xdr_encode_opaque(p, conf->data, conf->len);
} else { /* non - nfsv4 lock in conflict, no clientid nor owner */
p = xdr_encode_hyper(p, (u64)0); /* clientid */
*p++ = cpu_to_be32(0); /* length of owner name */
@@ -3455,8 +3448,7 @@ nfsd4_do_encode_secinfo(struct xdr_stream *xdr,
if (!p)
goto out;
*p++ = cpu_to_be32(RPC_AUTH_GSS);
- *p++ = cpu_to_be32(info.oid.len);
- WRITEMEM(info.oid.data, info.oid.len);
+ p = xdr_encode_opaque(p, info.oid.data, info.oid.len);
*p++ = cpu_to_be32(info.qop);
*p++ = cpu_to_be32(info.service);
} else if (pf < RPC_AUTH_MAXFLAVOR) {
@@ -3538,8 +3530,9 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n
p = xdr_reserve_space(xdr, 8 + NFS4_VERIFIER_SIZE);
if (!p)
return nfserr_resource;
- WRITEMEM(&scd->se_clientid, 8);
- WRITEMEM(&scd->se_confirm, NFS4_VERIFIER_SIZE);
+ p = xdr_encode_opaque_fixed(p, &scd->se_clientid, 8);
+ p = xdr_encode_opaque_fixed(p, &scd->se_confirm,
+ NFS4_VERIFIER_SIZE);
}
else if (nfserr == nfserr_clid_inuse) {
p = xdr_reserve_space(xdr, 8);
@@ -3563,7 +3556,8 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_w
return nfserr_resource;
*p++ = cpu_to_be32(write->wr_bytes_written);
*p++ = cpu_to_be32(write->wr_how_written);
- WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
+ p = xdr_encode_opaque_fixed(p, write->wr_verifier.data,
+ NFS4_VERIFIER_SIZE);
}
return nfserr;
}
@@ -3604,7 +3598,7 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
if (!p)
return nfserr_resource;

- WRITEMEM(&exid->clientid, 8);
+ p = xdr_encode_opaque_fixed(p, &exid->clientid, 8);
*p++ = cpu_to_be32(exid->seqid);
*p++ = cpu_to_be32(exid->flags);

@@ -3644,12 +3638,10 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
/* The server_owner struct */
p = xdr_encode_hyper(p, minor_id); /* Minor id */
/* major id */
- *p++ = cpu_to_be32(major_id_sz);
- WRITEMEM(major_id, major_id_sz);
+ p = xdr_encode_opaque(p, major_id, major_id_sz);

/* Server scope */
- *p++ = cpu_to_be32(server_scope_sz);
- WRITEMEM(server_scope, server_scope_sz);
+ p = xdr_encode_opaque(p, server_scope, server_scope_sz);

/* Implementation id */
*p++ = cpu_to_be32(0); /* zero length nfs_impl_id4 array */
@@ -3669,7 +3661,8 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
p = xdr_reserve_space(xdr, 24);
if (!p)
return nfserr_resource;
- WRITEMEM(sess->sessionid.data, NFS4_MAX_SESSIONID_LEN);
+ p = xdr_encode_opaque_fixed(p, sess->sessionid.data,
+ NFS4_MAX_SESSIONID_LEN);
*p++ = cpu_to_be32(sess->seqid);
*p++ = cpu_to_be32(sess->flags);

@@ -3724,7 +3717,8 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 20);
if (!p)
return nfserr_resource;
- WRITEMEM(seq->sessionid.data, NFS4_MAX_SESSIONID_LEN);
+ p = xdr_encode_opaque_fixed(p, seq->sessionid.data,
+ NFS4_MAX_SESSIONID_LEN);
*p++ = cpu_to_be32(seq->seqid);
*p++ = cpu_to_be32(seq->slotid);
/* Note slotid's are numbered from zero: */
@@ -3957,7 +3951,7 @@ nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op)
*p++ = cpu_to_be32(op->opnum);
*p++ = rp->rp_status; /* already xdr'ed */

- WRITEMEM(rp->rp_buf, rp->rp_buflen);
+ p = xdr_encode_opaque_fixed(p, rp->rp_buf, rp->rp_buflen);
}

int
--
1.7.9.5


2014-05-11 20:52:57

by J. Bruce Fields

Subject: [PATCH 14/43] nfsd4: use xdr_truncate_encode

From: "J. Bruce Fields" <[email protected]>

Now that lengths are reliable, we can use xdr_truncate_encode instead of
open-coding it everywhere.
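
As a rough illustration of the pattern this patch converts to (record the
buffer length on entry, truncate back to it on failure), here is a minimal
user-space sketch. The names demo_stream, demo_reserve, demo_truncate and
demo_encode_op are hypothetical stand-ins, not the kernel's xdr_stream API,
and the flat 64-byte buffer is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat encode stream; stands in for struct xdr_stream. */
struct demo_stream {
	uint8_t data[64];
	size_t len;		/* bytes encoded so far */
};

/* Reserve nbytes at the end of the stream; NULL if it does not fit. */
static uint8_t *demo_reserve(struct demo_stream *s, size_t nbytes)
{
	uint8_t *p;

	if (nbytes > sizeof(s->data) - s->len)
		return NULL;
	p = s->data + s->len;
	s->len += nbytes;
	return p;
}

/* Drop everything encoded after offset len. */
static void demo_truncate(struct demo_stream *s, size_t len)
{
	assert(len <= s->len);
	s->len = len;
}

/*
 * Encode a 4-byte zero header followed by a payload.  On failure, roll
 * back to the length recorded on entry so no partial result remains.
 */
static int demo_encode_op(struct demo_stream *s, const void *payload,
			  size_t payload_len)
{
	size_t starting_len = s->len;
	uint8_t *p;

	p = demo_reserve(s, 4);
	if (!p)
		return -1;
	memset(p, 0, 4);

	p = demo_reserve(s, payload_len);
	if (!p) {
		/* One call replaces per-site pointer/length fixups. */
		demo_truncate(s, starting_len);
		return -1;
	}
	memcpy(p, payload, payload_len);
	return 0;
}
```

The point of the sketch is only that the caller remembers a length rather
than a pointer, so the rollback does not depend on where in the buffer the
partial encoding ended up.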

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 2f16a80..da6b43d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2041,7 +2041,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
struct svc_fh *tempfh = NULL;
struct kstatfs statfs;
__be32 *p;
- __be32 *start = xdr->p;
+ int starting_len = xdr->buf->len;
__be32 *attrlenp;
u32 dummy;
u64 dummy64;
@@ -2534,13 +2534,8 @@ out:
fh_put(tempfh);
kfree(tempfh);
}
- if (status) {
- int nbytes = (char *)xdr->p - (char *)start;
- /* open code what *should* be xdr_truncate(xdr, len); */
- xdr->iov->iov_len -= nbytes;
- xdr->buf->len -= nbytes;
- xdr->p = start;
- }
+ if (status)
+ xdr_truncate_encode(xdr, starting_len);
return status;
out_nfserr:
status = nfserrno(err);
@@ -2993,6 +2988,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
struct page *page;
unsigned long maxcount;
struct xdr_stream *xdr = &resp->xdr;
+ int starting_len = xdr->buf->len;
long len;
__be32 *p;

@@ -3029,8 +3025,13 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
&maxcount);

if (nfserr) {
- xdr->p -= 2;
- xdr->iov->iov_len -= 8;
+ /*
+ * nfsd_splice_actor may have already messed with the
+ * page length; reset it so as not to confuse
+ * xdr_truncate_encode:
+ */
+ xdr->buf->page_len = 0;
+ xdr_truncate_encode(xdr, starting_len);
return nfserr;
}
eof = (read->rd_offset + maxcount >=
@@ -3063,6 +3064,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
int maxcount;
struct xdr_stream *xdr = &resp->xdr;
char *page;
+ int length_offset = xdr->buf->len;
__be32 *p;

if (nfserr)
@@ -3087,8 +3089,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
if (nfserr == nfserr_isdir)
nfserr = nfserr_inval;
if (nfserr) {
- xdr->p--;
- xdr->iov->iov_len -= 4;
+ xdr_truncate_encode(xdr, length_offset);
return nfserr;
}

@@ -3117,7 +3118,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
int maxcount;
loff_t offset;
struct xdr_stream *xdr = &resp->xdr;
- __be32 *page, *savep, *tailbase;
+ int starting_len = xdr->buf->len;
+ __be32 *page, *tailbase;
__be32 *p;

if (nfserr)
@@ -3128,7 +3130,6 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
return nfserr_resource;

RESERVE_SPACE(NFS4_VERIFIER_SIZE);
- savep = p;

/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
@@ -3189,9 +3190,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

return 0;
err_no_verf:
- xdr->p = savep;
- xdr->iov->iov_len = ((char *)resp->xdr.p)
- - (char *)resp->xdr.buf->head[0].iov_base;
+ xdr_truncate_encode(xdr, starting_len);
return nfserr;
}

--
1.7.9.5


2014-05-12 05:37:27

by Christoph Hellwig

Subject: Re: [PATCH 04/43] nfsd4: reserve head space for krb5 integ/priv info

On Sun, May 11, 2014 at 04:52:09PM -0400, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> Currently if the nfs-level part of a reply would be too large, we'll
> return an error to the client. But if the nfs-level part fits and
> leaves no room for krb5p or krb5i stuff, then we just drop the request
> entirely.
>
> That's no good. Instead, reserve some slack space at the end of the
> buffer and make sure we fail outright if we'd come close.
>
> The slack space here is a massive overestimate of what's required; we
> should probably try for a tighter limit at some point.

Don't we know the rpc auth scheme at this point and can at least
avoid it for non-krb setups?


2014-05-22 15:56:50

by J. Bruce Fields

Subject: Re: [PATCH 05/43] nfsd4: move nfsd4_operation to xdr4.h

On Sun, May 11, 2014 at 10:41:15PM -0700, Christoph Hellwig wrote:
> > +struct nfsd4_operation *OPDESC(struct nfsd4_op *op)
>
> I don't think OPDESC is a good name for a non-static function.
>
>
> But looking at the whole tree we only need the exposed information
> in two simple places in nfs4xdr.c, so I'd suggest to export something
> higher level instead, e.g. move nfsd4_max_reply into nfs4proc.c
> and have a helper to check that the op doesn't modify anything and
> warn if it does.
>

Sure, makes sense; done.

--b.

2014-05-12 21:46:08

by J. Bruce Fields

Subject: Re: [PATCH 04/43] nfsd4: reserve head space for krb5 integ/priv info

On Sun, May 11, 2014 at 10:37:27PM -0700, Christoph Hellwig wrote:
> On Sun, May 11, 2014 at 04:52:09PM -0400, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > Currently if the nfs-level part of a reply would be too large, we'll
> > return an error to the client. But if the nfs-level part fits and
> > leaves no room for krb5p or krb5i stuff, then we just drop the request
> > entirely.
> >
> > That's no good. Instead, reserve some slack space at the end of the
> > buffer and make sure we fail outright if we'd come close.
> >
> > The slack space here is a massive overestimate of what's required; we
> > should probably try for a tighter limit at some point.
>
> Don't we know the rpc auth scheme at this point and can at least
> avoid it for non-krb setups?

Yes. At the end of this series we have RPC_MAX_AUTH_SIZE scattered
around in a few different places. Rather than have each place have some
flavor-specific logic I think I'd like the auth code to set an
rq_auth_slack field in the struct svc_rqst for code like this to use.

--b.

2014-05-11 20:53:05

by J. Bruce Fields

Subject: [PATCH 30/43] nfsd4: allow large readdirs

From: "J. Bruce Fields" <[email protected]>

Currently we limit readdir results to a single page. This can result in
a performance regression compared to NFSv3 when reading large
directories.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 3 --
fs/nfsd/nfs4xdr.c | 134 +++++++++++++++++++++++++++++-----------------------
fs/nfsd/xdr4.h | 5 +-
3 files changed, 76 insertions(+), 66 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index be638c1..0ab65ae 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1451,9 +1451,6 @@ static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *o
{
u32 rlen = op->u.readdir.rd_maxcount;

- if (rlen > PAGE_SIZE)
- rlen = PAGE_SIZE;
-
return (op_encode_hdr_size + op_encode_verifier_maxsz)
* sizeof(__be32) + rlen;
}
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 0d8a18d..731587c 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2576,8 +2576,8 @@ static inline int attributes_need_mount(u32 *bmval)
}

static __be32
-nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd,
- const char *name, int namlen, __be32 **p, int buflen)
+nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
+ const char *name, int namlen)
{
struct svc_export *exp = cd->rd_fhp->fh_export;
struct dentry *dentry;
@@ -2629,7 +2629,7 @@ nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd,

}
out_encode:
- nfserr = nfsd4_encode_fattr_to_buf(p, buflen, NULL, exp, dentry, cd->rd_bmval,
+ nfserr = nfsd4_encode_fattr(xdr, NULL, exp, dentry, cd->rd_bmval,
cd->rd_rqstp, ignore_crossmnt);
out_put:
dput(dentry);
@@ -2638,9 +2638,12 @@ out_put:
}

static __be32 *
-nfsd4_encode_rdattr_error(__be32 *p, int buflen, __be32 nfserr)
+nfsd4_encode_rdattr_error(struct xdr_stream *xdr, __be32 nfserr)
{
- if (buflen < 6)
+ __be32 *p;
+
+ p = xdr_reserve_space(xdr, 6);
+ if (!p)
return NULL;
*p++ = htonl(2);
*p++ = htonl(FATTR4_WORD0_RDATTR_ERROR); /* bmval0 */
@@ -2657,10 +2660,13 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
{
struct readdir_cd *ccd = ccdv;
struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
- int buflen;
- __be32 *p = cd->buffer;
- __be32 *cookiep;
+ struct xdr_stream *xdr = cd->xdr;
+ int start_offset = xdr->buf->len;
+ int cookie_offset;
+ int entry_bytes;
__be32 nfserr = nfserr_toosmall;
+ __be64 wire_offset;
+ __be32 *p;

/* In nfsv4, "." and ".." never make it onto the wire.. */
if (name && isdotent(name, namlen)) {
@@ -2668,19 +2674,23 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
return 0;
}

- if (cd->offset)
- xdr_encode_hyper(cd->offset, (u64) offset);
+ if (cd->cookie_offset) {
+ wire_offset = cpu_to_be64(offset);
+ write_bytes_to_xdr_buf(xdr->buf, cd->cookie_offset, &wire_offset, 8);
+ }

- buflen = cd->buflen - 4 - XDR_QUADLEN(namlen);
- if (buflen < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto fail;
-
*p++ = xdr_one; /* mark entry present */
- cookiep = p;
+ cookie_offset = xdr->buf->len;
+ p = xdr_reserve_space(xdr, 3*4 + namlen);
+ if (!p)
+ goto fail;
p = xdr_encode_hyper(p, NFS_OFFSET_MAX); /* offset of next entry */
p = xdr_encode_array(p, name, namlen); /* name length & name */

- nfserr = nfsd4_encode_dirent_fattr(cd, name, namlen, &p, buflen);
+ nfserr = nfsd4_encode_dirent_fattr(xdr, cd, name, namlen);
switch (nfserr) {
case nfs_ok:
break;
@@ -2699,19 +2709,23 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
*/
if (!(cd->rd_bmval[0] & FATTR4_WORD0_RDATTR_ERROR))
goto fail;
- p = nfsd4_encode_rdattr_error(p, buflen, nfserr);
+ p = nfsd4_encode_rdattr_error(xdr, nfserr);
if (p == NULL) {
nfserr = nfserr_toosmall;
goto fail;
}
}
- cd->buflen -= (p - cd->buffer);
- cd->buffer = p;
- cd->offset = cookiep;
+ nfserr = nfserr_toosmall;
+ entry_bytes = xdr->buf->len - start_offset;
+ if (entry_bytes > cd->rd_maxcount)
+ goto fail;
+ cd->rd_maxcount -= entry_bytes;
+ cd->cookie_offset = cookie_offset;
skip_entry:
cd->common.err = nfs_ok;
return 0;
fail:
+ xdr_truncate_encode(xdr, start_offset);
cd->common.err = nfserr;
return -EINVAL;
}
@@ -3200,10 +3214,11 @@ static __be32
nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readdir *readdir)
{
int maxcount;
+ int bytes_left;
loff_t offset;
+ __be64 wire_offset;
struct xdr_stream *xdr = &resp->xdr;
int starting_len = xdr->buf->len;
- __be32 *page, *tailbase;
__be32 *p;

if (nfserr)
@@ -3213,38 +3228,38 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
if (!p)
return nfserr_resource;

- if (resp->xdr.buf->page_len)
- return nfserr_resource;
- if (!*resp->rqstp->rq_next_page)
- return nfserr_resource;
-
/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
WRITE32(0);
resp->xdr.buf->head[0].iov_len = ((char*)resp->xdr.p)
- (char*)resp->xdr.buf->head[0].iov_base;
- tailbase = p;
-
- maxcount = PAGE_SIZE;
- if (maxcount > readdir->rd_maxcount)
- maxcount = readdir->rd_maxcount;

/*
- * Convert from bytes to words, account for the two words already
- * written, make sure to leave two words at the end for the next
- * pointer and eof field.
+ * Number of bytes left for directory entries allowing for the
+ * final 8 bytes of the readdir and a following failed op:
+ */
+ bytes_left = xdr->buf->buflen - xdr->buf->len
+ - COMPOUND_ERR_SLACK_SPACE - 8;
+ if (bytes_left < 0) {
+ nfserr = nfserr_resource;
+ goto err_no_verf;
+ }
+ maxcount = min_t(u32, readdir->rd_maxcount, INT_MAX);
+ /*
+ * Note the rfc defines rd_maxcount as the size of the
+ * READDIR4resok structure, which includes the verifier above
+ * and the 8 bytes encoded at the end of this function:
*/
- maxcount = (maxcount >> 2) - 4;
- if (maxcount < 0) {
- nfserr = nfserr_toosmall;
+ if (maxcount < 16) {
+ nfserr = nfserr_toosmall;
goto err_no_verf;
}
+ maxcount = min_t(int, maxcount-16, bytes_left);

- page = page_address(*(resp->rqstp->rq_next_page++));
+ readdir->xdr = xdr;
+ readdir->rd_maxcount = maxcount;
readdir->common.err = 0;
- readdir->buflen = maxcount;
- readdir->buffer = page;
- readdir->offset = NULL;
+ readdir->cookie_offset = 0;

offset = readdir->rd_cookie;
nfserr = nfsd_readdir(readdir->rd_rqstp, readdir->rd_fhp,
@@ -3252,32 +3267,31 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
&readdir->common, nfsd4_encode_dirent);
if (nfserr == nfs_ok &&
readdir->common.err == nfserr_toosmall &&
- readdir->buffer == page)
- nfserr = nfserr_toosmall;
+ xdr->buf->len == starting_len + 8) {
+ /* nothing encoded; which limit did we hit?: */
+ if (maxcount - 16 < bytes_left)
+ /* It was the fault of rd_maxcount: */
+ nfserr = nfserr_toosmall;
+ else
+ /* We ran out of buffer space: */
+ nfserr = nfserr_resource;
+ }
if (nfserr)
goto err_no_verf;

- if (readdir->offset)
- xdr_encode_hyper(readdir->offset, offset);
+ if (readdir->cookie_offset) {
+ wire_offset = cpu_to_be64(offset);
+ write_bytes_to_xdr_buf(xdr->buf, readdir->cookie_offset,
+ &wire_offset, 8);
+ }

- p = readdir->buffer;
+ p = xdr_reserve_space(xdr, 8);
+ if (!p) {
+ WARN_ON_ONCE(1);
+ goto err_no_verf;
+ }
*p++ = 0; /* no more entries */
*p++ = htonl(readdir->common.err == nfserr_eof);
- resp->xdr.buf->page_len = ((char*)p) -
- (char*)page_address(*(resp->rqstp->rq_next_page-1));
- xdr->buf->len += xdr->buf->page_len;
-
- xdr->iov = xdr->buf->tail;
-
- xdr->page_ptr++;
- xdr->buf->buflen -= PAGE_SIZE;
- xdr->iov = xdr->buf->tail;
-
- /* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = tailbase;
- resp->xdr.buf->tail[0].iov_len = 0;
- resp->xdr.p = resp->xdr.buf->tail[0].iov_base;
- resp->xdr.end = resp->xdr.p + (PAGE_SIZE - resp->xdr.buf->head[0].iov_len)/4;

return 0;
err_no_verf:
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index d1c6e21..04b8a80 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -287,9 +287,8 @@ struct nfsd4_readdir {
struct svc_fh * rd_fhp; /* response */

struct readdir_cd common;
- __be32 * buffer;
- int buflen;
- __be32 * offset;
+ struct xdr_stream *xdr;
+ int cookie_offset;
};

struct nfsd4_release_lockowner {
--
1.7.9.5


2014-05-11 20:52:56

by J. Bruce Fields

Subject: [PATCH 13/43] rpc: xdr_truncate_encode

From: "J. Bruce Fields" <[email protected]>

This will be used on the server side in a few cases:
- when certain operations (read, readdir, readlink) fail after
encoding a partial response.
- when we run out of space after encoding a partial response.
- in readlink, where we initially reserve PAGE_SIZE bytes for
data, then truncate to the actual size.
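
To make the truncation order concrete, here is a simplified sketch that
tracks only the three segment lengths (head, pages, tail) and trims from
the end of the reply backwards. It is an assumption-laden model:
demo_buf and demo_truncate are hypothetical names, and the sketch leaves
out the pointer bookkeeping (xdr->p, xdr->end, xdr->iov, xdr->page_ptr)
that the real function also has to repair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xdr_buf: lengths only, no data. */
struct demo_buf {
	size_t head_len;
	size_t page_len;
	size_t tail_len;
	size_t len;		/* always head_len + page_len + tail_len */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Shrink the buffer to len bytes, removing bytes from the tail first,
 * then from the page data, then from the head.
 * Returns 0 on success, -1 if asked to grow the buffer.
 */
static int demo_truncate(struct demo_buf *buf, size_t len)
{
	size_t excess, cut;

	assert(buf->len == buf->head_len + buf->page_len + buf->tail_len);
	if (len > buf->len)
		return -1;
	excess = buf->len - len;

	cut = min_size(excess, buf->tail_len);
	buf->tail_len -= cut;
	excess -= cut;

	cut = min_size(excess, buf->page_len);
	buf->page_len -= cut;
	excess -= cut;

	/* Whatever is left to remove must come out of the head. */
	buf->head_len -= excess;
	buf->len = len;
	return 0;
}
```

For example, a buffer with a 100-byte head, 4096 bytes of page data and an
8-byte tail truncated to 150 bytes ends up with an empty tail, 50 bytes of
page data and an untouched head.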

Signed-off-by: J. Bruce Fields <[email protected]>
---
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/xdr.c | 67 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 68 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 15f9204..e7bb2e3 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -215,6 +215,7 @@ typedef int (*kxdrdproc_t)(void *rqstp, struct xdr_stream *xdr, void *obj);

extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
+extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
unsigned int base, unsigned int len);
extern unsigned int xdr_stream_pos(const struct xdr_stream *xdr);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index dd97ba3..8ae8ee7 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -509,6 +509,73 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
EXPORT_SYMBOL_GPL(xdr_reserve_space);

/**
+ * xdr_truncate_encode - truncate an encode buffer
+ * @xdr: pointer to xdr_stream
+ * @len: new length of buffer
+ *
+ * Truncates the xdr stream, so that xdr->buf->len == len,
+ * and xdr->p points at offset len from the start of the buffer, and
+ * head, tail, and page lengths are adjusted to correspond.
+ *
+ * If this means moving xdr->p to a different buffer, we assume that
+ * the end pointer should be set to the end of the current page,
+ * except in the case of the head buffer when we assume the head
+ * buffer's current length represents the end of the available buffer.
+ *
+ * This is *not* safe to use on a buffer that already has inlined page
+ * cache pages (as in a zero-copy server read reply), except for the
+ * simple case of truncating from one position in the tail to another.
+ *
+ */
+void xdr_truncate_encode(struct xdr_stream *xdr, size_t len)
+{
+ struct xdr_buf *buf = xdr->buf;
+ struct kvec *head = buf->head;
+ struct kvec *tail = buf->tail;
+ int fraglen;
+ int new, old;
+
+ if (len > buf->len) {
+ WARN_ON_ONCE(1);
+ return;
+ }
+
+ fraglen = min_t(int, buf->len - len, tail->iov_len);
+ tail->iov_len -= fraglen;
+ buf->len -= fraglen;
+ if (tail->iov_len && buf->len == len) {
+ xdr->p = tail->iov_base + tail->iov_len;
+ /* xdr->end, xdr->iov should be set already */
+ return;
+ }
+ WARN_ON_ONCE(fraglen);
+ fraglen = min_t(int, buf->len - len, buf->page_len);
+ buf->page_len -= fraglen;
+ buf->len -= fraglen;
+
+ new = buf->page_base + buf->page_len;
+ old = new + fraglen;
+ xdr->page_ptr -= (old >> PAGE_SHIFT) - (new >> PAGE_SHIFT);
+
+ if (buf->page_len && buf->len == len) {
+ xdr->p = page_address(*xdr->page_ptr);
+ xdr->end = (void *)xdr->p + PAGE_SIZE;
+ xdr->p = (void *)xdr->p + (new % PAGE_SIZE);
+ /* xdr->iov should already be NULL */
+ return;
+ }
+ if (fraglen)
+ xdr->end = head->iov_base + head->iov_len;
+ /* (otherwise assume xdr->end is already set) */
+ head->iov_len = len;
+ buf->len = len;
+ xdr->p = head->iov_base + head->iov_len;
+ xdr->iov = buf->head;
+ xdr->page_ptr -= 1;
+}
+EXPORT_SYMBOL(xdr_truncate_encode);
+
+/**
* xdr_write_pages - Insert a list of pages into an XDR buffer for sending
* @xdr: pointer to xdr_stream
* @pages: list of pages
--
1.7.9.5


2014-05-12 16:06:31

by J. Bruce Fields

Subject: Re: [PATCH 02/43] nfsd4: tweak nfsd4_encode_getattr to take xdr_stream

On Sun, May 11, 2014 at 10:35:54PM -0700, Christoph Hellwig wrote:
> > * Note: @fhp can be NULL; in this case, we might have to compose the filehandle
> > * ourselves.
> > - *
> > - * countp is the buffer size in _words_
> > */
> > __be32
> > -nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
> > - struct dentry *dentry, __be32 **buffer, int count, u32 *bmval,
> > +nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export *exp,
>
> Can you make sure that lines that the patches touch don't exceed 80
> columns? There's quite a few more through the series, including new code.

OK, this one is fixed and I'm working through the others and fixing a
few other checkpatch complaints while I'm at it (mainly leftover
(char*)'s).

--b.

>
>
> Otherwise looks good,
>
> Signed-off-by: Christoph Hellwig <[email protected]>

2014-05-11 20:52:58

by J. Bruce Fields

Subject: [PATCH 15/43] nfsd4: "backfill" using write_bytes_to_xdr_buf

From: "J. Bruce Fields" <[email protected]>

Normally xdr encoding proceeds in a single pass from start of a buffer
to end, but sometimes we have to write a few bytes to an earlier
position.

Use write_bytes_to_xdr_buf for these cases rather than saving a pointer
to write to. We plan to rewrite xdr_reserve_space to handle encoding
across page boundaries using a scratch buffer, and don't want to risk
writing to a pointer that was contained in a scratch buffer.

Also it will no longer be safe to calculate lengths by subtracting two
pointers, so use xdr_buf offsets instead.
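
A small user-space sketch of the offset-based backfill described above,
under the assumption of a flat buffer: remember the offset of the length
word, encode the body, then compute the length from offsets and write it
back at the saved offset. demo_buf, demo_append, demo_write_at and
demo_encode_counted are hypothetical names; demo_write_at only plays the
role that write_bytes_to_xdr_buf plays in the patch and does not mirror
its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat buffer; stands in for struct xdr_buf. */
struct demo_buf {
	uint8_t data[64];
	size_t len;		/* bytes encoded so far */
};

/* Store v at dst in big-endian (network) byte order. */
static void put_be32(uint8_t *dst, uint32_t v)
{
	dst[0] = (uint8_t)(v >> 24);
	dst[1] = (uint8_t)(v >> 16);
	dst[2] = (uint8_t)(v >> 8);
	dst[3] = (uint8_t)v;
}

/* Append n bytes to the end of the buffer; -1 if they do not fit. */
static int demo_append(struct demo_buf *b, const void *src, size_t n)
{
	assert(b->len <= sizeof(b->data));
	if (n > sizeof(b->data) - b->len)
		return -1;
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}

/* Overwrite n already-encoded bytes starting at offset. */
static int demo_write_at(struct demo_buf *b, size_t offset,
			 const void *src, size_t n)
{
	if (offset > b->len || n > b->len - offset)
		return -1;
	memcpy(b->data + offset, src, n);
	return 0;
}

/* Encode a 4-byte length word followed by body, backfilling the length. */
static int demo_encode_counted(struct demo_buf *b, const void *body,
			       size_t body_len)
{
	size_t len_offset = b->len;	/* an offset, not a pointer */
	uint8_t placeholder[4] = { 0 };
	uint8_t wire_len[4];

	if (demo_append(b, placeholder, 4))
		return -1;
	if (demo_append(b, body, body_len)) {
		b->len = len_offset;	/* roll back the placeholder */
		return -1;
	}
	/* Length computed from offsets, never from pointer subtraction. */
	put_be32(wire_len, (uint32_t)(b->len - len_offset - 4));
	return demo_write_at(b, len_offset, wire_len, 4);
}
```

Because only an offset is kept across the encoding of the body, the
backfill stays valid even if the bytes were staged somewhere else in the
meantime, which is the property the scratch-buffer plan relies on.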

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index da6b43d..cc219b8 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1757,16 +1757,19 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep, char *components, char esc_enter, char esc_exit)
{
__be32 *p;
- __be32 *countp;
+ __be32 pathlen;
+ int pathlen_offset;
int strlen, count=0;
char *str, *end, *next;

dprintk("nfsd4_encode_components(%s)\n", components);
+
+ pathlen_offset = xdr->buf->len;
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- countp = p;
- WRITE32(0); /* We will fill this in with @count later */
+ p++; /* We will fill this in with @count later */
+
end = str = components;
while (*end) {
bool found_esc = false;
@@ -1799,8 +1802,8 @@ static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep, char
end++;
str = end;
}
- p = countp;
- WRITE32(count);
+ pathlen = htonl(count);
+ write_bytes_to_xdr_buf(xdr->buf, pathlen_offset, &pathlen, 4);
return 0;
}

@@ -2042,7 +2045,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
struct kstatfs statfs;
__be32 *p;
int starting_len = xdr->buf->len;
- __be32 *attrlenp;
+ int attrlen_offset;
+ __be32 attrlen;
u32 dummy;
u64 dummy64;
u32 rdattr_err = 0;
@@ -2147,10 +2151,12 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
WRITE32(1);
WRITE32(bmval0);
}
+
+ attrlen_offset = xdr->buf->len;
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- attrlenp = p++; /* to be backfilled later */
+ p++; /* to be backfilled later */

if (bmval0 & FATTR4_WORD0_SUPPORTED_ATTRS) {
u32 word0 = nfsd_suppattrs0(minorversion);
@@ -2521,7 +2527,8 @@ out_acl:
WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD2);
}

- *attrlenp = htonl((char *)xdr->p - (char *)attrlenp - 4);
+ attrlen = htonl(xdr->buf->len - attrlen_offset - 4);
+ write_bytes_to_xdr_buf(xdr->buf, attrlen_offset, &attrlen, 4);
status = nfs_ok;

out:
@@ -3648,15 +3655,16 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
void
nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
{
+ struct xdr_stream *xdr = &resp->xdr;
struct nfs4_stateowner *so = resp->cstate.replay_owner;
struct svc_rqst *rqstp = resp->rqstp;
- __be32 *statp;
+ int post_err_offset;
nfsd4_enc encoder;
__be32 *p;

RESERVE_SPACE(8);
WRITE32(op->opnum);
- statp = p++; /* to be backfilled at the end */
+ post_err_offset = xdr->buf->len;

if (op->opnum == OP_ILLEGAL)
goto status;
@@ -3687,19 +3695,19 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
nfsd4_op_name(op->opnum));
WARN_ON_ONCE(1);
}
- resp->xdr.p = statp + 1;
+ xdr_truncate_encode(xdr, post_err_offset);
}
if (so) {
+ int len = xdr->buf->len - post_err_offset;
+
so->so_replay.rp_status = op->status;
- so->so_replay.rp_buflen = (char *)resp->xdr.p - (char *)(statp+1);
- memcpy(so->so_replay.rp_buf, statp+1, so->so_replay.rp_buflen);
+ so->so_replay.rp_buflen = len;
+ read_bytes_from_xdr_buf(xdr->buf, post_err_offset,
+ so->so_replay.rp_buf, len);
}
status:
- /*
- * Note: We write the status directly, instead of using WRITE32(),
- * since it is already in network byte order.
- */
- *statp = op->status;
+ /* Note that op->status is already in network byte order: */
+ write_bytes_to_xdr_buf(xdr->buf, post_err_offset - 4, &op->status, 4);
}

/*
--
1.7.9.5


2014-05-11 20:52:55

by J. Bruce Fields

Subject: [PATCH 10/43] nfsd4: remove ADJUST_ARGS

From: "J. Bruce Fields" <[email protected]>

It's just uninteresting debugging code at this point.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 40 ----------------------------------------
1 file changed, 40 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 53de2db..53708ce 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1750,7 +1750,6 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
p = xdr_reserve_space(&resp->xdr, nbytes); \
BUG_ON(!p); \
} while (0)
-#define ADJUST_ARGS() WARN_ON_ONCE(p != resp->xdr.p) \

/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
@@ -2729,7 +2728,6 @@ nfsd4_encode_stateid(struct nfsd4_compoundres *resp, stateid_t *sid)
RESERVE_SPACE(sizeof(stateid_t));
WRITE32(sid->si_generation);
WRITEMEM(&sid->si_opaque, sizeof(stateid_opaque_t));
- ADJUST_ARGS();
}

static __be32
@@ -2741,7 +2739,6 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
RESERVE_SPACE(8);
WRITE32(access->ac_supported);
WRITE32(access->ac_resp_access);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2756,7 +2753,6 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
WRITE32(bcts->dir);
/* Sorry, we do not yet support RDMA over 4.1: */
WRITE32(0);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2779,7 +2775,6 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
if (!nfserr) {
RESERVE_SPACE(NFS4_VERIFIER_SIZE);
WRITEMEM(commit->co_verf.data, NFS4_VERIFIER_SIZE);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2795,7 +2790,6 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
WRITE32(2);
WRITE32(create->cr_bmval[0]);
WRITE32(create->cr_bmval[1]);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2827,7 +2821,6 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh
RESERVE_SPACE(len + 4);
WRITE32(len);
WRITEMEM(&fhp->fh_handle.fh_base, len);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2855,7 +2848,6 @@ nfsd4_encode_lock_denied(struct nfsd4_compoundres *resp, struct nfsd4_lock_denie
WRITE64((u64)0); /* clientid */
WRITE32(0); /* length of owner name */
}
- ADJUST_ARGS();
}

static __be32
@@ -2895,7 +2887,6 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li
if (!nfserr) {
RESERVE_SPACE(20);
write_cinfo(&p, &link->li_cinfo);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2917,7 +2908,6 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
WRITE32(open->op_bmval[0]);
WRITE32(open->op_bmval[1]);
WRITE32(open->op_delegate_type);
- ADJUST_ARGS();

switch (open->op_delegate_type) {
case NFS4_OPEN_DELEGATE_NONE:
@@ -2934,7 +2924,6 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
WRITE32(0);
WRITE32(0);
WRITE32(0); /* XXX: is NULL principal ok? */
- ADJUST_ARGS();
break;
case NFS4_OPEN_DELEGATE_WRITE:
nfsd4_encode_stateid(resp, &open->op_delegate_stateid);
@@ -2955,7 +2944,6 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
WRITE32(0);
WRITE32(0);
WRITE32(0); /* XXX: is NULL principal ok? */
- ADJUST_ARGS();
break;
case NFS4_OPEN_DELEGATE_NONE_EXT: /* 4.1 */
switch (open->op_why_no_deleg) {
@@ -2969,7 +2957,6 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
RESERVE_SPACE(4);
WRITE32(open->op_why_no_deleg);
}
- ADJUST_ARGS();
break;
default:
BUG();
@@ -3051,7 +3038,6 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

WRITE32(eof);
WRITE32(maxcount);
- ADJUST_ARGS();
resp->xdr.buf->head[0].iov_len = (char*)p
- (char*)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
@@ -3065,7 +3051,6 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- ADJUST_ARGS();
}
return 0;
}
@@ -3106,7 +3091,6 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
}

WRITE32(maxcount);
- ADJUST_ARGS();
resp->xdr.buf->head[0].iov_len = (char*)p
- (char*)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
@@ -3120,7 +3104,6 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- ADJUST_ARGS();
}
return 0;
}
@@ -3147,7 +3130,6 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
WRITE32(0);
- ADJUST_ARGS();
resp->xdr.buf->head[0].iov_len = ((char*)resp->xdr.p)
- (char*)resp->xdr.buf->head[0].iov_base;
tailbase = p;
@@ -3217,7 +3199,6 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
if (!nfserr) {
RESERVE_SPACE(20);
write_cinfo(&p, &remove->rm_cinfo);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -3231,7 +3212,6 @@ nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
RESERVE_SPACE(40);
write_cinfo(&p, &rename->rn_sinfo);
write_cinfo(&p, &rename->rn_tinfo);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -3271,7 +3251,6 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
supported = 0;
RESERVE_SPACE(4);
flavorsp = p++; /* to be backfilled later */
- ADJUST_ARGS();

for (i = 0; i < nflavs; i++) {
rpc_authflavor_t pf = flavs[i].pseudoflavor;
@@ -3285,12 +3264,10 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
WRITEMEM(info.oid.data, info.oid.len);
WRITE32(info.qop);
WRITE32(info.service);
- ADJUST_ARGS();
} else if (pf < RPC_AUTH_MAXFLAVOR) {
supported++;
RESERVE_SPACE(4);
WRITE32(pf);
- ADJUST_ARGS();
} else {
if (report)
pr_warn("NFS: SECINFO: security flavor %u "
@@ -3344,7 +3321,6 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
WRITE32(setattr->sa_bmval[1]);
WRITE32(setattr->sa_bmval[2]);
}
- ADJUST_ARGS();
return nfserr;
}

@@ -3357,13 +3333,11 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n
RESERVE_SPACE(8 + NFS4_VERIFIER_SIZE);
WRITEMEM(&scd->se_clientid, 8);
WRITEMEM(&scd->se_confirm, NFS4_VERIFIER_SIZE);
- ADJUST_ARGS();
}
else if (nfserr == nfserr_clid_inuse) {
RESERVE_SPACE(8);
WRITE32(0);
WRITE32(0);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -3378,7 +3352,6 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_w
WRITE32(write->wr_bytes_written);
WRITE32(write->wr_how_written);
WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -3421,7 +3394,6 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(exid->flags);

WRITE32(exid->spa_how);
- ADJUST_ARGS();

switch (exid->spa_how) {
case SP4_NONE:
@@ -3437,7 +3409,6 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
/* empty spo_must_allow bitmap: */
WRITE32(0);

- ADJUST_ARGS();
break;
default:
WARN_ON_ONCE(1);
@@ -3463,7 +3434,6 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,

/* Implementation id */
WRITE32(0); /* zero length nfs_impl_id4 array */
- ADJUST_ARGS();
return 0;
}

@@ -3480,7 +3450,6 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITEMEM(sess->sessionid.data, NFS4_MAX_SESSIONID_LEN);
WRITE32(sess->seqid);
WRITE32(sess->flags);
- ADJUST_ARGS();

RESERVE_SPACE(28);
WRITE32(0); /* headerpadsz */
@@ -3490,12 +3459,10 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(sess->fore_channel.maxops);
WRITE32(sess->fore_channel.maxreqs);
WRITE32(sess->fore_channel.nr_rdma_attrs);
- ADJUST_ARGS();

if (sess->fore_channel.nr_rdma_attrs) {
RESERVE_SPACE(4);
WRITE32(sess->fore_channel.rdma_attrs);
- ADJUST_ARGS();
}

RESERVE_SPACE(28);
@@ -3506,12 +3473,10 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(sess->back_channel.maxops);
WRITE32(sess->back_channel.maxreqs);
WRITE32(sess->back_channel.nr_rdma_attrs);
- ADJUST_ARGS();

if (sess->back_channel.nr_rdma_attrs) {
RESERVE_SPACE(4);
WRITE32(sess->back_channel.rdma_attrs);
- ADJUST_ARGS();
}
return 0;
}
@@ -3534,7 +3499,6 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(seq->maxslots - 1); /* sr_target_highest_slotid */
WRITE32(seq->status_flags);

- ADJUST_ARGS();
resp->cstate.datap = p; /* DRC cache data pointer */
return 0;
}
@@ -3556,7 +3520,6 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
*p++ = stateid->ts_id_status;
}

- ADJUST_ARGS();
return nfserr;
}

@@ -3691,7 +3654,6 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
RESERVE_SPACE(8);
WRITE32(op->opnum);
statp = p++; /* to be backfilled at the end */
- ADJUST_ARGS();

if (op->opnum == OP_ILLEGAL)
goto status;
@@ -3756,11 +3718,9 @@ nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
RESERVE_SPACE(8);
WRITE32(op->opnum);
*p++ = rp->rp_status; /* already xdr'ed */
- ADJUST_ARGS();

RESERVE_SPACE(rp->rp_buflen);
WRITEMEM(rp->rp_buf, rp->rp_buflen);
- ADJUST_ARGS();
}

int
--
1.7.9.5


2014-05-12 08:18:01

by Christoph Hellwig

Subject: Re: [PATCH 06/43] nfsd4: fix encoding of out-of-space replies

On Sun, May 11, 2014 at 04:52:11PM -0400, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> If nfsd4_check_resp_size() returns an error then we should really be
> truncating the reply here, otherwise we may leave extra garbage at the
> end of the rpc reply.
>
> Also add a warning to catch any cases where our reply-size estimates may
> be wrong in the case of a non-idempotent operation.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/nfsd/nfs4xdr.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 6cdd660..fb40dd1 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3633,6 +3633,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> {
> struct nfs4_stateowner *so = resp->cstate.replay_owner;
> __be32 *statp;
> + nfsd4_enc encoder;
> __be32 *p;
>
> RESERVE_SPACE(8);
> @@ -3644,10 +3645,29 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> goto status;
> BUG_ON(op->opnum < 0 || op->opnum >= ARRAY_SIZE(nfsd4_enc_ops) ||
> !nfsd4_enc_ops[op->opnum]);
> - op->status = nfsd4_enc_ops[op->opnum](resp, op->status, &op->u);
> + encoder = nfsd4_enc_ops[op->opnum];
> + op->status = encoder(resp, op->status, &op->u);

What is the point of the encoder variable? It gets set and is then used
only once, a line later.


2014-05-11 20:52:54

by J. Bruce Fields

Subject: [PATCH 09/43] nfsd4: use xdr_stream throughout compound encoding

From: "J. Bruce Fields" <[email protected]>

Note this makes ADJUST_ARGS useless; we'll remove it in the following
patch.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
fs/nfsd/nfs4xdr.c | 23 +++++++++++++++--------
2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index b0f01df..7c45172 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1233,7 +1233,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
svcxdr_init_encode(rqstp, resp);
resp->tagp = resp->xdr.p;
/* reserve space for: taglen, tag, and opcnt */
- resp->xdr.p += 2 + XDR_QUADLEN(args->taglen);
+ xdr_reserve_space(&resp->xdr, 8 + args->taglen);
resp->taglen = args->taglen;
resp->tag = args->tag;
resp->opcnt = 0;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index e337321..53de2db 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1747,10 +1747,10 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
}

#define RESERVE_SPACE(nbytes) do { \
- p = resp->xdr.p; \
- BUG_ON(p + XDR_QUADLEN(nbytes) > resp->xdr.end); \
+ p = xdr_reserve_space(&resp->xdr, nbytes); \
+ BUG_ON(!p); \
} while (0)
-#define ADJUST_ARGS() resp->xdr.p = p
+#define ADJUST_ARGS() WARN_ON_ONCE(p != resp->xdr.p) \

/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
@@ -3041,8 +3041,11 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
&maxcount);

- if (nfserr)
+ if (nfserr) {
+ xdr->p -= 2;
+ xdr->iov->iov_len -= 8;
return nfserr;
+ }
eof = (read->rd_offset + maxcount >=
read->rd_fhp->fh_dentry->d_inode->i_size);

@@ -3095,9 +3098,12 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
*/
nfserr = nfsd_readlink(readlink->rl_rqstp, readlink->rl_fhp, page, &maxcount);
if (nfserr == nfserr_isdir)
- return nfserr_inval;
- if (nfserr)
+ nfserr = nfserr_inval;
+ if (nfserr) {
+ xdr->p--;
+ xdr->iov->iov_len -= 4;
return nfserr;
+ }

WRITE32(maxcount);
ADJUST_ARGS();
@@ -3197,8 +3203,9 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

return 0;
err_no_verf:
- p = savep;
- ADJUST_ARGS();
+ xdr->p = savep;
+ xdr->iov->iov_len = ((char *)resp->xdr.p)
+ - (char *)resp->xdr.buf->head[0].iov_base;
return nfserr;
}

--
1.7.9.5
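
For readers following the thread, here is a minimal userspace sketch of the behaviour the patch above relies on. It is not the kernel implementation: the toy_ names and struct layouts are hypothetical stand-ins, and the sketch only assumes what the diff itself shows, namely that a byte-count reservation is rounded up to whole 4-byte words, that it returns NULL when the request does not fit, and that it advances the stream pointer so a caller's "p" ends up equal to the stream's "p" (which is why ADJUST_ARGS has nothing left to do).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures; NOT the real definitions. */
struct toy_kvec {
	void *iov_base;
	size_t iov_len;
};

struct toy_stream {
	uint32_t *p;          /* next free word */
	uint32_t *end;        /* one past the last usable word */
	struct toy_kvec *iov; /* segment whose length is kept in sync */
};

/* Bytes -> 4-byte words, rounding up (same arithmetic as XDR_QUADLEN). */
#define TOY_QUADLEN(l) (((l) + 3) >> 2)

/* Reserve nbytes, rounded up to a word; NULL if it does not fit. */
static uint32_t *toy_reserve_space(struct toy_stream *xdr, size_t nbytes)
{
	uint32_t *p = xdr->p;
	size_t nwords = TOY_QUADLEN(nbytes);

	if (nwords > (size_t)(xdr->end - p))
		return NULL;          /* stream left untouched */
	xdr->p = p + nwords;
	xdr->iov->iov_len += nwords * 4;
	return p;
}

/* Walk through the three cases; returns the number of words consumed. */
static int toy_reserve_demo(void)
{
	uint32_t buf[16];
	struct toy_kvec head = { buf, 0 };
	struct toy_stream xdr = { buf, buf + 16, &head };
	size_t taglen = 5;
	uint32_t *p;

	/* taglen + tag + opcnt: 8 + 5 bytes rounds up to 2 + 2 words. */
	p = toy_reserve_space(&xdr, 8 + taglen);
	assert(p == buf);
	assert(xdr.p == buf + 2 + TOY_QUADLEN(taglen));
	assert(head.iov_len == 16);

	/* The caller fills the words it reserved ... */
	p = toy_reserve_space(&xdr, 8);
	*p++ = 1;
	*p++ = 2;
	/* ... and lands exactly where the stream already points. */
	assert(p == xdr.p);

	/* An oversized request fails and changes nothing. */
	assert(toy_reserve_space(&xdr, 64) == NULL);
	assert(head.iov_len == 24);

	return (int)(xdr.p - buf);
}
```

Calling toy_reserve_demo() from a main() exercises all three cases. The second case is the interesting one for this patch: once the reservation itself moves the stream pointer, a trailing "resp->xdr.p = p" assignment can only restate what is already true.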


2014-05-11 20:52:57

by J. Bruce Fields

Subject: [PATCH 08/43] nfsd4: use xdr_reserve_space in attribute encoding

From: "J. Bruce Fields" <[email protected]>

This is a cosmetic change for now; no change in behavior.

Note that we're depending on xdr_reserve_space only to do the bounds
checking for us; we're not really depending on its adjustment of iovec
or xdr_buf lengths yet, as those are fixed up as necessary after the
fact by read-like operations and by nfs4svc_encode_compoundres. However,
we do have to update xdr->iov on read-like operations to prevent
xdr_reserve_space from messing with the already-fixed-up length of the
head.

When the attribute encoding fails partway through we have to undo the
length adjustments made so far. We do it manually for now, but later
patches will add an xdr_truncate_encode() helper to handle cases like
this.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/acl.h | 2 +-
fs/nfsd/idmap.h | 4 +-
fs/nfsd/nfs4acl.c | 11 +--
fs/nfsd/nfs4idmap.c | 38 ++++---
fs/nfsd/nfs4proc.c | 1 +
fs/nfsd/nfs4xdr.c | 274 ++++++++++++++++++++++++++++++---------------------
6 files changed, 186 insertions(+), 144 deletions(-)

diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h
index b481e1f..a986ceb 100644
--- a/fs/nfsd/acl.h
+++ b/fs/nfsd/acl.h
@@ -49,7 +49,7 @@ struct svc_rqst;

struct nfs4_acl *nfs4_acl_new(int);
int nfs4_acl_get_whotype(char *, u32);
-__be32 nfs4_acl_write_who(int who, __be32 **p, int *len);
+__be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);

int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
struct nfs4_acl **acl);
diff --git a/fs/nfsd/idmap.h b/fs/nfsd/idmap.h
index 66e58db..a3f3490 100644
--- a/fs/nfsd/idmap.h
+++ b/fs/nfsd/idmap.h
@@ -56,7 +56,7 @@ static inline void nfsd_idmap_shutdown(struct net *net)

__be32 nfsd_map_name_to_uid(struct svc_rqst *, const char *, size_t, kuid_t *);
__be32 nfsd_map_name_to_gid(struct svc_rqst *, const char *, size_t, kgid_t *);
-__be32 nfsd4_encode_user(struct svc_rqst *, kuid_t, __be32 **, int *);
-__be32 nfsd4_encode_group(struct svc_rqst *, kgid_t, __be32 **, int *);
+__be32 nfsd4_encode_user(struct xdr_stream *, struct svc_rqst *, kuid_t);
+__be32 nfsd4_encode_group(struct xdr_stream *, struct svc_rqst *, kgid_t);

#endif /* LINUX_NFSD_IDMAP_H */
diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
index 05c9b2f..56956df 100644
--- a/fs/nfsd/nfs4acl.c
+++ b/fs/nfsd/nfs4acl.c
@@ -919,20 +919,19 @@ nfs4_acl_get_whotype(char *p, u32 len)
return NFS4_ACL_WHO_NAMED;
}

-__be32 nfs4_acl_write_who(int who, __be32 **p, int *len)
+__be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who)
{
+ __be32 *p;
int i;
- int bytes;

for (i = 0; i < ARRAY_SIZE(s2t_map); i++) {
if (s2t_map[i].type != who)
continue;
- bytes = 4 + (XDR_QUADLEN(s2t_map[i].stringlen) << 2);
- if (bytes > *len)
+ p = xdr_reserve_space(xdr, s2t_map[i].stringlen + 4);
+ if (!p)
return nfserr_resource;
- *p = xdr_encode_opaque(*p, s2t_map[i].string,
+ p = xdr_encode_opaque(p, s2t_map[i].string,
s2t_map[i].stringlen);
- *len -= bytes;
return 0;
}
WARN_ON_ONCE(1);
diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c
index c0dfde6..43afc90 100644
--- a/fs/nfsd/nfs4idmap.c
+++ b/fs/nfsd/nfs4idmap.c
@@ -551,44 +551,42 @@ idmap_name_to_id(struct svc_rqst *rqstp, int type, const char *name, u32 namelen
return 0;
}

-static __be32 encode_ascii_id(u32 id, __be32 **p, int *buflen)
+static __be32 encode_ascii_id(struct xdr_stream *xdr, u32 id)
{
char buf[11];
int len;
- int bytes;
+ __be32 *p;

len = sprintf(buf, "%u", id);
- bytes = 4 + (XDR_QUADLEN(len) << 2);
- if (bytes > *buflen)
+ p = xdr_reserve_space(xdr, len + 4);
+ if (!p)
return nfserr_resource;
- *p = xdr_encode_opaque(*p, buf, len);
- *buflen -= bytes;
+ p = xdr_encode_opaque(p, buf, len);
return 0;
}

-static __be32 idmap_id_to_name(struct svc_rqst *rqstp, int type, u32 id, __be32 **p, int *buflen)
+static __be32 idmap_id_to_name(struct xdr_stream *xdr, struct svc_rqst *rqstp, int type, u32 id)
{
struct ent *item, key = {
.id = id,
.type = type,
};
+ __be32 *p;
int ret;
- int bytes;
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);

strlcpy(key.authname, rqst_authname(rqstp), sizeof(key.authname));
ret = idmap_lookup(rqstp, idtoname_lookup, &key, nn->idtoname_cache, &item);
if (ret == -ENOENT)
- return encode_ascii_id(id, p, buflen);
+ return encode_ascii_id(xdr, id);
if (ret)
return nfserrno(ret);
ret = strlen(item->name);
WARN_ON_ONCE(ret > IDMAP_NAMESZ);
- bytes = 4 + (XDR_QUADLEN(ret) << 2);
- if (bytes > *buflen)
+ p = xdr_reserve_space(xdr, ret + 4);
+ if (!p)
return nfserr_resource;
- *p = xdr_encode_opaque(*p, item->name, ret);
- *buflen -= bytes;
+ p = xdr_encode_opaque(p, item->name, ret);
cache_put(&item->h, nn->idtoname_cache);
return 0;
}
@@ -622,11 +620,11 @@ do_name_to_id(struct svc_rqst *rqstp, int type, const char *name, u32 namelen, u
return idmap_name_to_id(rqstp, type, name, namelen, id);
}

-static __be32 encode_name_from_id(struct svc_rqst *rqstp, int type, u32 id, __be32 **p, int *buflen)
+static __be32 encode_name_from_id(struct xdr_stream *xdr, struct svc_rqst *rqstp, int type, u32 id)
{
if (nfs4_disable_idmapping && rqstp->rq_cred.cr_flavor < RPC_AUTH_GSS)
- return encode_ascii_id(id, p, buflen);
- return idmap_id_to_name(rqstp, type, id, p, buflen);
+ return encode_ascii_id(xdr, id);
+ return idmap_id_to_name(xdr, rqstp, type, id);
}

__be32
@@ -655,14 +653,14 @@ nfsd_map_name_to_gid(struct svc_rqst *rqstp, const char *name, size_t namelen,
return status;
}

-__be32 nfsd4_encode_user(struct svc_rqst *rqstp, kuid_t uid, __be32 **p, int *buflen)
+__be32 nfsd4_encode_user(struct xdr_stream *xdr, struct svc_rqst *rqstp, kuid_t uid)
{
u32 id = from_kuid(&init_user_ns, uid);
- return encode_name_from_id(rqstp, IDMAP_TYPE_USER, id, p, buflen);
+ return encode_name_from_id(xdr, rqstp, IDMAP_TYPE_USER, id);
}

-__be32 nfsd4_encode_group(struct svc_rqst *rqstp, kgid_t gid, __be32 **p, int *buflen)
+__be32 nfsd4_encode_group(struct xdr_stream *xdr, struct svc_rqst *rqstp, kgid_t gid)
{
u32 id = from_kgid(&init_user_ns, gid);
- return encode_name_from_id(rqstp, IDMAP_TYPE_GROUP, id, p, buflen);
+ return encode_name_from_id(xdr, rqstp, IDMAP_TYPE_GROUP, id);
}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 99aa348..b0f01df 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1208,6 +1208,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp, struct nfsd4_compoundres
struct kvec *head = buf->head;

xdr->buf = buf;
+ xdr->iov = head;
xdr->p = head->iov_base + head->iov_len;
xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
}
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index e915bb1..e337321 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1755,18 +1755,18 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
*/
-static __be32 nfsd4_encode_components_esc(char sep, char *components,
- __be32 **pp, int *buflen,
- char esc_enter, char esc_exit)
+static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep, char *components, char esc_enter, char esc_exit)
{
- __be32 *p = *pp;
- __be32 *countp = p;
+ __be32 *p;
+ __be32 *countp;
int strlen, count=0;
char *str, *end, *next;

dprintk("nfsd4_encode_components(%s)\n", components);
- if ((*buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
return nfserr_resource;
+ countp = p;
WRITE32(0); /* We will fill this in with @count later */
end = str = components;
while (*end) {
@@ -1789,7 +1789,8 @@ static __be32 nfsd4_encode_components_esc(char sep, char *components,

strlen = end - str;
if (strlen) {
- if ((*buflen -= ((XDR_QUADLEN(strlen) << 2) + 4)) < 0)
+ p = xdr_reserve_space(xdr, strlen + 4);
+ if (!p)
return nfserr_resource;
WRITE32(strlen);
WRITEMEM(str, strlen);
@@ -1799,7 +1800,6 @@ static __be32 nfsd4_encode_components_esc(char sep, char *components,
end++;
str = end;
}
- *pp = p;
p = countp;
WRITE32(count);
return 0;
@@ -1808,40 +1808,35 @@ static __be32 nfsd4_encode_components_esc(char sep, char *components,
/* Encode as an array of strings the string given with components
* separated @sep.
*/
-static __be32 nfsd4_encode_components(char sep, char *components,
- __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_components(struct xdr_stream *xdr, char sep, char *components)
{
- return nfsd4_encode_components_esc(sep, components, pp, buflen, 0, 0);
+ return nfsd4_encode_components_esc(xdr, sep, components, 0, 0);
}

/*
* encode a location element of a fs_locations structure
*/
-static __be32 nfsd4_encode_fs_location4(struct nfsd4_fs_location *location,
- __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_fs_location4(struct xdr_stream *xdr, struct nfsd4_fs_location *location)
{
__be32 status;
- __be32 *p = *pp;

- status = nfsd4_encode_components_esc(':', location->hosts, &p, buflen,
+ status = nfsd4_encode_components_esc(xdr, ':', location->hosts,
'[', ']');
if (status)
return status;
- status = nfsd4_encode_components('/', location->path, &p, buflen);
+ status = nfsd4_encode_components(xdr, '/', location->path);
if (status)
return status;
- *pp = p;
return 0;
}

/*
* Encode a path in RFC3530 'pathname4' format
*/
-static __be32 nfsd4_encode_path(const struct path *root,
- const struct path *path, __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_path(struct xdr_stream *xdr, const struct path *root, const struct path *path)
{
struct path cur = *path;
- __be32 *p = *pp;
+ __be32 *p;
struct dentry **components = NULL;
unsigned int ncomponents = 0;
__be32 err = nfserr_jukebox;
@@ -1872,9 +1867,9 @@ static __be32 nfsd4_encode_path(const struct path *root,
components[ncomponents++] = cur.dentry;
cur.dentry = dget_parent(cur.dentry);
}
-
- *buflen -= 4;
- if (*buflen < 0)
+ err = nfserr_resource;
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_free;
WRITE32(ncomponents);

@@ -1884,8 +1879,8 @@ static __be32 nfsd4_encode_path(const struct path *root,

spin_lock(&dentry->d_lock);
len = dentry->d_name.len;
- *buflen -= 4 + (XDR_QUADLEN(len) << 2);
- if (*buflen < 0) {
+ p = xdr_reserve_space(xdr, len + 4);
+ if (!p) {
spin_unlock(&dentry->d_lock);
goto out_free;
}
@@ -1897,7 +1892,6 @@ static __be32 nfsd4_encode_path(const struct path *root,
ncomponents--;
}

- *pp = p;
err = 0;
out_free:
dprintk(")\n");
@@ -1908,8 +1902,7 @@ out_free:
return err;
}

-static __be32 nfsd4_encode_fsloc_fsroot(struct svc_rqst *rqstp,
- const struct path *path, __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_fsloc_fsroot(struct xdr_stream *xdr, struct svc_rqst *rqstp, const struct path *path)
{
struct svc_export *exp_ps;
__be32 res;
@@ -1917,7 +1910,7 @@ static __be32 nfsd4_encode_fsloc_fsroot(struct svc_rqst *rqstp,
exp_ps = rqst_find_fsidzero_export(rqstp);
if (IS_ERR(exp_ps))
return nfserrno(PTR_ERR(exp_ps));
- res = nfsd4_encode_path(&exp_ps->ex_path, path, pp, buflen);
+ res = nfsd4_encode_path(xdr, &exp_ps->ex_path, path);
exp_put(exp_ps);
return res;
}
@@ -1925,28 +1918,25 @@ static __be32 nfsd4_encode_fsloc_fsroot(struct svc_rqst *rqstp,
/*
* encode a fs_locations structure
*/
-static __be32 nfsd4_encode_fs_locations(struct svc_rqst *rqstp,
- struct svc_export *exp,
- __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_fs_locations(struct xdr_stream *xdr, struct svc_rqst *rqstp, struct svc_export *exp)
{
__be32 status;
int i;
- __be32 *p = *pp;
+ __be32 *p;
struct nfsd4_fs_locations *fslocs = &exp->ex_fslocs;

- status = nfsd4_encode_fsloc_fsroot(rqstp, &exp->ex_path, &p, buflen);
+ status = nfsd4_encode_fsloc_fsroot(xdr, rqstp, &exp->ex_path);
if (status)
return status;
- if ((*buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
return nfserr_resource;
WRITE32(fslocs->locations_count);
for (i=0; i<fslocs->locations_count; i++) {
- status = nfsd4_encode_fs_location4(&fslocs->locations[i],
- &p, buflen);
+ status = nfsd4_encode_fs_location4(xdr, &fslocs->locations[i]);
if (status)
return status;
}
- *pp = p;
return 0;
}

@@ -1965,15 +1955,14 @@ static u32 nfs4_file_type(umode_t mode)
}

static inline __be32
-nfsd4_encode_aclname(struct svc_rqst *rqstp, struct nfs4_ace *ace,
- __be32 **p, int *buflen)
+nfsd4_encode_aclname(struct xdr_stream *xdr, struct svc_rqst *rqstp, struct nfs4_ace *ace)
{
if (ace->whotype != NFS4_ACL_WHO_NAMED)
- return nfs4_acl_write_who(ace->whotype, p, buflen);
+ return nfs4_acl_write_who(xdr, ace->whotype);
else if (ace->flag & NFS4_ACE_IDENTIFIER_GROUP)
- return nfsd4_encode_group(rqstp, ace->who_gid, p, buflen);
+ return nfsd4_encode_group(xdr, rqstp, ace->who_gid);
else
- return nfsd4_encode_user(rqstp, ace->who_uid, p, buflen);
+ return nfsd4_encode_user(xdr, rqstp, ace->who_uid);
}

#define WORD0_ABSENT_FS_ATTRS (FATTR4_WORD0_FS_LOCATIONS | FATTR4_WORD0_FSID | \
@@ -1982,31 +1971,26 @@ nfsd4_encode_aclname(struct svc_rqst *rqstp, struct nfs4_ace *ace,

#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
static inline __be32
-nfsd4_encode_security_label(struct svc_rqst *rqstp, void *context, int len, __be32 **pp, int *buflen)
+nfsd4_encode_security_label(struct xdr_stream *xdr, struct svc_rqst *rqstp, void *context, int len)
{
__be32 *p = *pp;

- if (*buflen < ((XDR_QUADLEN(len) << 2) + 4 + 4 + 4))
+ p = xdr_reserve_space(xdr, len + 4 + 4 + 4);
+ if (!p)
return nfserr_resource;

/*
* For now we use a 0 here to indicate the null translation; in
* the future we may place a call to translation code here.
*/
- if ((*buflen -= 8) < 0)
- return nfserr_resource;
-
WRITE32(0); /* lfs */
WRITE32(0); /* pi */
p = xdr_encode_opaque(p, context, len);
- *buflen -= (XDR_QUADLEN(len) << 2) + 4;
-
- *pp = p;
return 0;
}
#else
static inline __be32
-nfsd4_encode_security_label(struct svc_rqst *rqstp, void *context, int len, __be32 **pp, int *buflen)
+nfsd4_encode_security_label(struct xdr_stream *xdr, struct svc_rqst *rqstp, void *context, int len)
{ return 0; }
#endif

@@ -2057,8 +2041,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
struct kstat stat;
struct svc_fh *tempfh = NULL;
struct kstatfs statfs;
- __be32 *p = xdr->p;
- int buflen = xdr->buf->buflen;
+ __be32 *p;
+ __be32 *start = xdr->p;
__be32 *attrlenp;
u32 dummy;
u64 dummy64;
@@ -2143,24 +2127,30 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
#endif /* CONFIG_NFSD_V4_SECURITY_LABEL */

if (bmval2) {
- if ((buflen -= 16) < 0)
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
goto out_resource;
WRITE32(3);
WRITE32(bmval0);
WRITE32(bmval1);
WRITE32(bmval2);
} else if (bmval1) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE32(2);
WRITE32(bmval0);
WRITE32(bmval1);
} else {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE32(1);
WRITE32(bmval0);
}
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ goto out_resource;
attrlenp = p++; /* to be backfilled later */

if (bmval0 & FATTR4_WORD0_SUPPORTED_ATTRS) {
@@ -2173,13 +2163,15 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
if (!contextsupport)
word2 &= ~FATTR4_WORD2_SECURITY_LABEL;
if (!word2) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE32(2);
WRITE32(word0);
WRITE32(word1);
} else {
- if ((buflen -= 16) < 0)
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
goto out_resource;
WRITE32(3);
WRITE32(word0);
@@ -2188,7 +2180,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
}
}
if (bmval0 & FATTR4_WORD0_TYPE) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
dummy = nfs4_file_type(stat.mode);
if (dummy == NF4BAD) {
@@ -2198,7 +2191,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
WRITE32(dummy);
}
if (bmval0 & FATTR4_WORD0_FH_EXPIRE_TYPE) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
if (exp->ex_flags & NFSEXP_NOSUBTREECHECK)
WRITE32(NFS4_FH_PERSISTENT);
@@ -2206,32 +2200,38 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
WRITE32(NFS4_FH_PERSISTENT|NFS4_FH_VOL_RENAME);
}
if (bmval0 & FATTR4_WORD0_CHANGE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
write_change(&p, &stat, dentry->d_inode);
}
if (bmval0 & FATTR4_WORD0_SIZE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64(stat.size);
}
if (bmval0 & FATTR4_WORD0_LINK_SUPPORT) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_SYMLINK_SUPPORT) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_NAMED_ATTR) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(0);
}
if (bmval0 & FATTR4_WORD0_FSID) {
- if ((buflen -= 16) < 0)
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
goto out_resource;
if (exp->ex_fslocs.migrated) {
WRITE64(NFS4_REFERRAL_FSID_MAJOR);
@@ -2253,17 +2253,20 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
}
}
if (bmval0 & FATTR4_WORD0_UNIQUE_HANDLES) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(0);
}
if (bmval0 & FATTR4_WORD0_LEASE_TIME) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(nn->nfsd4_lease);
}
if (bmval0 & FATTR4_WORD0_RDATTR_ERROR) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(rdattr_err);
}
@@ -2271,198 +2274,229 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
struct nfs4_ace *ace;

if (acl == NULL) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;

WRITE32(0);
goto out_acl;
}
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(acl->naces);

for (ace = acl->aces; ace < acl->aces + acl->naces; ace++) {
- if ((buflen -= 4*3) < 0)
+ p = xdr_reserve_space(xdr, 4*3);
+ if (!p)
goto out_resource;
WRITE32(ace->type);
WRITE32(ace->flag);
WRITE32(ace->access_mask & NFS4_ACE_MASK_ALL);
- status = nfsd4_encode_aclname(rqstp, ace, &p, &buflen);
+ status = nfsd4_encode_aclname(xdr, rqstp, ace);
if (status)
goto out;
}
}
out_acl:
if (bmval0 & FATTR4_WORD0_ACLSUPPORT) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(aclsupport ?
ACL4_SUPPORT_ALLOW_ACL|ACL4_SUPPORT_DENY_ACL : 0);
}
if (bmval0 & FATTR4_WORD0_CANSETTIME) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_CASE_INSENSITIVE) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(0);
}
if (bmval0 & FATTR4_WORD0_CASE_PRESERVING) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_CHOWN_RESTRICTED) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_FILEHANDLE) {
- buflen -= (XDR_QUADLEN(fhp->fh_handle.fh_size) << 2) + 4;
- if (buflen < 0)
+ p = xdr_reserve_space(xdr, fhp->fh_handle.fh_size + 4);
+ if (!p)
goto out_resource;
WRITE32(fhp->fh_handle.fh_size);
WRITEMEM(&fhp->fh_handle.fh_base, fhp->fh_handle.fh_size);
}
if (bmval0 & FATTR4_WORD0_FILEID) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64(stat.ino);
}
if (bmval0 & FATTR4_WORD0_FILES_AVAIL) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) statfs.f_ffree);
}
if (bmval0 & FATTR4_WORD0_FILES_FREE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) statfs.f_ffree);
}
if (bmval0 & FATTR4_WORD0_FILES_TOTAL) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) statfs.f_files);
}
if (bmval0 & FATTR4_WORD0_FS_LOCATIONS) {
- status = nfsd4_encode_fs_locations(rqstp, exp, &p, &buflen);
+ status = nfsd4_encode_fs_locations(xdr, rqstp, exp);
if (status)
goto out;
}
if (bmval0 & FATTR4_WORD0_HOMOGENEOUS) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_MAXFILESIZE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64(exp->ex_path.mnt->mnt_sb->s_maxbytes);
}
if (bmval0 & FATTR4_WORD0_MAXLINK) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(255);
}
if (bmval0 & FATTR4_WORD0_MAXNAME) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(statfs.f_namelen);
}
if (bmval0 & FATTR4_WORD0_MAXREAD) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) svc_max_payload(rqstp));
}
if (bmval0 & FATTR4_WORD0_MAXWRITE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) svc_max_payload(rqstp));
}
if (bmval1 & FATTR4_WORD1_MODE) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(stat.mode & S_IALLUGO);
}
if (bmval1 & FATTR4_WORD1_NO_TRUNC) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval1 & FATTR4_WORD1_NUMLINKS) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(stat.nlink);
}
if (bmval1 & FATTR4_WORD1_OWNER) {
- status = nfsd4_encode_user(rqstp, stat.uid, &p, &buflen);
+ status = nfsd4_encode_user(xdr, rqstp, stat.uid);
if (status)
goto out;
}
if (bmval1 & FATTR4_WORD1_OWNER_GROUP) {
- status = nfsd4_encode_group(rqstp, stat.gid, &p, &buflen);
+ status = nfsd4_encode_group(xdr, rqstp, stat.gid);
if (status)
goto out;
}
if (bmval1 & FATTR4_WORD1_RAWDEV) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE32((u32) MAJOR(stat.rdev));
WRITE32((u32) MINOR(stat.rdev));
}
if (bmval1 & FATTR4_WORD1_SPACE_AVAIL) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_bavail * (u64)statfs.f_bsize;
WRITE64(dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_FREE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_bfree * (u64)statfs.f_bsize;
WRITE64(dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_TOTAL) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_blocks * (u64)statfs.f_bsize;
WRITE64(dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_USED) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
dummy64 = (u64)stat.blocks << 9;
WRITE64(dummy64);
}
if (bmval1 & FATTR4_WORD1_TIME_ACCESS) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE64((s64)stat.atime.tv_sec);
WRITE32(stat.atime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_DELTA) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE32(0);
WRITE32(1);
WRITE32(0);
}
if (bmval1 & FATTR4_WORD1_TIME_METADATA) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE64((s64)stat.ctime.tv_sec);
WRITE32(stat.ctime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_MODIFY) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE64((s64)stat.mtime.tv_sec);
WRITE32(stat.mtime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
/*
* Get parent's attributes if not ignoring crossmount
@@ -2474,13 +2508,13 @@ out_acl:
WRITE64(stat.ino);
}
if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) {
- status = nfsd4_encode_security_label(rqstp, context,
- contextlen, &p, &buflen);
+ status = nfsd4_encode_security_label(xdr, rqstp, context, contextlen);
if (status)
goto out;
}
if (bmval2 & FATTR4_WORD2_SUPPATTR_EXCLCREAT) {
- if ((buflen -= 16) < 0)
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
goto out_resource;
WRITE32(3);
WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD0);
@@ -2488,8 +2522,7 @@ out_acl:
WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD2);
}

- *attrlenp = htonl((char *)p - (char *)attrlenp - 4);
- xdr->p = p;
+ *attrlenp = htonl((char *)xdr->p - (char *)attrlenp - 4);
status = nfs_ok;

out:
@@ -2502,6 +2535,13 @@ out:
fh_put(tempfh);
kfree(tempfh);
}
+ if (status) {
+ int nbytes = (char *)xdr->p - (char *)start;
+ /* open code what *should* be xdr_truncate(xdr, len); */
+ xdr->iov->iov_len -= nbytes;
+ xdr->buf->len -= nbytes;
+ xdr->p = start;
+ }
return status;
out_nfserr:
status = nfserrno(err);
@@ -2765,13 +2805,10 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
{
struct svc_fh *fhp = getattr->ga_fhp;
struct xdr_stream *xdr = &resp->xdr;
- struct xdr_buf *buf = resp->xdr.buf;

if (nfserr)
return nfserr;

- buf->buflen = (void *)resp->xdr.end - (void *)resp->xdr.p
- - COMPOUND_ERR_SLACK_SPACE;
nfserr = nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry,
getattr->ga_bmval,
resp->rqstp, 0);
@@ -2968,6 +3005,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
int v;
struct page *page;
unsigned long maxcount;
+ struct xdr_stream *xdr = &resp->xdr;
long len;
__be32 *p;

@@ -3014,6 +3052,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
resp->xdr.buf->head[0].iov_len = (char*)p
- (char*)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
+ xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
resp->xdr.buf->tail[0].iov_base = p;
@@ -3032,6 +3071,7 @@ static __be32
nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readlink *readlink)
{
int maxcount;
+ struct xdr_stream *xdr = &resp->xdr;
char *page;
__be32 *p;

@@ -3064,6 +3104,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
resp->xdr.buf->head[0].iov_len = (char*)p
- (char*)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
+ xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
resp->xdr.buf->tail[0].iov_base = p;
@@ -3083,6 +3124,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
{
int maxcount;
loff_t offset;
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *page, *savep, *tailbase;
__be32 *p;

@@ -3145,6 +3187,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
resp->xdr.buf->page_len = ((char*)p) -
(char*)page_address(*(resp->rqstp->rq_next_page-1));

+ xdr->iov = xdr->buf->tail;
+
/* Use rest of head for padding and remaining ops: */
resp->xdr.buf->tail[0].iov_base = tailbase;
resp->xdr.buf->tail[0].iov_len = 0;
--
1.7.9.5


2014-05-13 21:18:45

by J. Bruce Fields

Subject: Re: nfsd4 xdr encoding fixes v2

On Tue, May 13, 2014 at 10:48:26AM -0400, J. Bruce Fields wrote:
> On Tue, May 13, 2014 at 04:09:45AM -0700, Christoph Hellwig wrote:
> > On Mon, May 12, 2014 at 09:11:28AM -0700, Christoph Hellwig wrote:
> > > On Mon, May 12, 2014 at 12:07:41PM -0400, J. Bruce Fields wrote:
> > > > On Mon, May 12, 2014 at 01:20:59AM -0700, Christoph Hellwig wrote:
> > > > > This series seems to cause hangs during xfstests against a server on the
> > > > > same VM. The trace is fairly similar every time the hang happens, but the
> > > > > point at which it happens differs:
> > > >
> > > > Ouch, OK, and you're sure it starts with this series?
> > > >
> > > > I guess I should try to replicate it here. Might take a couple days.
> >
> > Seems like "nfsd4: allow exotic read compounds" is the culprit.
>
> OK, it makes sense that the problem would be there. Looking....

I got xfstests set up and can reproduce some problems (a hang, the
nfs4svc_encode_compoundres WARN, and some allocation failures), though
not exactly what you reported.

I also notice that commit ("nfsd4: allow exotic read compounds") has a
lot of extraneous cleanup that I should split out.

Anyway, thanks for the report, I'll investigate some more tomorrow....

--b.

2014-05-13 05:05:53

by Christoph Hellwig

Subject: Re: [PATCH 04/43] nfsd4: reserve head space for krb5 integ/priv info

On Mon, May 12, 2014 at 05:45:45PM -0400, J. Bruce Fields wrote:
> Yes. At the end of this series we have RPC_MAX_AUTH_SIZE scattered
> around in a few different places. Rather than have each place have some
> flavor-specific logic I think I'd like the auth code to set an
> rq_auth_slack field in the struct svc_rqst for code like this to use.

That sounds pretty reasonable to me.

2014-05-11 20:53:03

by J. Bruce Fields

Subject: [PATCH 29/43] nfsd4: use session limits to release send buffer reservation

From: "J. Bruce Fields" <[email protected]>

Once we know the limits the session places on the size of the rpc, we
can also use that information to release any unnecessary reserved reply
buffer space.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 620d240..2526426 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2286,6 +2286,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
nfserr_rep_too_big;
if (xdr_restrict_buflen(xdr, buflen - 2 * RPC_MAX_AUTH_SIZE))
goto out_put_session;
+ svc_reserve(rqstp, buflen);

status = nfs_ok;
/* Success! bump slot seqid */
--
1.7.9.5


2014-05-11 20:53:06

by J. Bruce Fields

Subject: [PATCH 36/43] nfsd4: nfsd_vfs_read doesn't use file handle parameter

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/vfs.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 16f0673..cfd83f6 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -821,7 +821,7 @@ static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
}

static __be32
-nfsd_vfs_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
+nfsd_vfs_read(struct svc_rqst *rqstp, struct file *file,
loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
{
mm_segment_t oldfs;
@@ -987,7 +987,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (ra && ra->p_set)
file->f_ra = ra->p_ra;

- err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
+ err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);

/* Write back readahead params */
if (ra) {
@@ -1016,7 +1016,7 @@ nfsd_read_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
NFSD_MAY_READ|NFSD_MAY_OWNER_OVERRIDE);
if (err)
goto out;
- err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
+ err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
} else /* Note file may still be NULL in NFSv4 special stateid case: */
err = nfsd_read(rqstp, fhp, offset, vec, vlen, count);
out:
--
1.7.9.5


2014-05-11 20:53:04

by J. Bruce Fields

Subject: [PATCH 33/43] nfsd4: better estimate of getattr response size

From: "J. Bruce Fields" <[email protected]>

We plan to use this estimate to decide whether or not to allow zero-copy
reads. Currently we're assuming all getattr's are a page, which can be
both too small (ACLs e.g. may be arbitrarily long) and too large (after
an upcoming read patch this will unnecessarily prevent zero copy reads
in any read compound also containing a getattr).

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 43 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 0ab65ae..edd2eb1 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1414,6 +1414,48 @@ static inline u32 nfsd4_create_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op
+ nfs4_fattr_bitmap_maxsz) * sizeof(__be32);
}

+/*
+ * Note since this is an idempotent operation we won't insist on failing
+ * the op prematurely if the estimate is too large. We may turn off splice
+ * reads unnecessarily.
+ */
+static inline u32 nfsd4_getattr_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
+{
+ u32 *bmap = op->u.getattr.ga_bmval;
+ u32 bmap0 = bmap[0], bmap1 = bmap[1], bmap2 = bmap[2];
+ u32 ret = 0;
+
+ if (bmap0 & FATTR4_WORD0_ACL)
+ return svc_max_payload(rqstp);
+ if (bmap0 & FATTR4_WORD0_FS_LOCATIONS)
+ return svc_max_payload(rqstp);
+
+ if (bmap1 & FATTR4_WORD1_OWNER) {
+ ret += IDMAP_NAMESZ + 4;
+ bmap1 &= ~FATTR4_WORD1_OWNER;
+ }
+ if (bmap1 & FATTR4_WORD1_OWNER_GROUP) {
+ ret += IDMAP_NAMESZ + 4;
+ bmap1 &= ~FATTR4_WORD1_OWNER_GROUP;
+ }
+ if (bmap0 & FATTR4_WORD0_FILEHANDLE) {
+ ret += NFS4_FHSIZE + 4;
+ bmap0 &= ~FATTR4_WORD0_FILEHANDLE;
+ }
+ if (bmap2 & FATTR4_WORD2_SECURITY_LABEL) {
+ ret += NFSD4_MAX_SEC_LABEL_LEN + 12;
+ bmap2 &= ~FATTR4_WORD2_SECURITY_LABEL;
+ }
+ /*
+ * Largest of remaining attributes are 16 bytes (e.g.,
+ * supported_attributes)
+ */
+ ret += 16 * (hweight32(bmap0) + hweight32(bmap1) + hweight32(bmap2));
+ /* bitmask, length */
+ ret += 20;
+ return ret;
+}
+
static inline u32 nfsd4_link_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
return (op_encode_hdr_size + op_encode_change_info_maxsz)
@@ -1548,6 +1590,7 @@ static struct nfsd4_operation nfsd4_ops[] = {
[OP_GETATTR] = {
.op_func = (nfsd4op_func)nfsd4_getattr,
.op_flags = ALLOWED_ON_ABSENT_FS,
+ .op_rsize_bop = nfsd4_getattr_rsize,
.op_name = "OP_GETATTR",
},
[OP_GETFH] = {
--
1.7.9.5
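
The estimate in this patch can be tried out in userspace. The following is a minimal sketch, assuming simplified stand-ins for the kernel definitions: the DEMO_* constants, popcount32() and demo_getattr_rsize() are hypothetical names for illustration, and only the ACL, owner and filehandle cases are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for kernel constants; names and values are assumptions. */
#define DEMO_W0_ACL        (1u << 12)
#define DEMO_W0_FILEHANDLE (1u << 19)
#define DEMO_W1_OWNER      (1u << 4)
#define DEMO_NAMESZ        128
#define DEMO_FHSIZE        128

/* Portable population count, standing in for hweight32(). */
static unsigned popcount32(uint32_t w)
{
	unsigned n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Upper-bound estimate of an encoded getattr reply, in bytes. */
static uint32_t demo_getattr_rsize(const uint32_t bmap[3], uint32_t max_payload)
{
	uint32_t w0, w1, w2, ret = 0;

	assert(bmap != NULL);
	w0 = bmap[0];
	w1 = bmap[1];
	w2 = bmap[2];

	/* Unbounded attributes: fall back to the largest possible reply. */
	if (w0 & DEMO_W0_ACL)
		return max_payload;

	/* Variable-length attributes get their own bound, then drop out. */
	if (w1 & DEMO_W1_OWNER) {
		ret += DEMO_NAMESZ + 4;
		w1 &= ~DEMO_W1_OWNER;
	}
	if (w0 & DEMO_W0_FILEHANDLE) {
		ret += DEMO_FHSIZE + 4;
		w0 &= ~DEMO_W0_FILEHANDLE;
	}
	/* Every remaining attribute is assumed to need at most 16 bytes. */
	ret += 16 * (popcount32(w0) + popcount32(w1) + popcount32(w2));
	/* bitmask, length */
	ret += 20;
	return ret;
}
```

Clearing each specially handled bit before the popcount keeps that attribute from being counted twice.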


2014-05-11 20:52:55

by J. Bruce Fields

Subject: [PATCH 07/43] nfsd4: allow space for final error return

From: "J. Bruce Fields" <[email protected]>

This post-encoding check should be taking into account the need to
encode at least an out-of-space error to the following op (if any).

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index fb40dd1..e915bb1 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3632,6 +3632,7 @@ void
nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
{
struct nfs4_stateowner *so = resp->cstate.replay_owner;
+ struct svc_rqst *rqstp = resp->rqstp;
__be32 *statp;
nfsd4_enc encoder;
__be32 *p;
@@ -3648,8 +3649,12 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
encoder = nfsd4_enc_ops[op->opnum];
op->status = encoder(resp, op->status, &op->u);
/* nfsd4_check_resp_size guarantees enough room for error status */
- if (!op->status)
- op->status = nfsd4_check_resp_size(resp, 0);
+ if (!op->status) {
+ int space_needed = 0;
+ if (!nfsd4_last_compound_op(rqstp))
+ space_needed = COMPOUND_ERR_SLACK_SPACE;
+ op->status = nfsd4_check_resp_size(resp, space_needed);
+ }
if (op->status == nfserr_resource ||
op->status == nfserr_rep_too_big ||
op->status == nfserr_rep_too_big_to_cache) {
--
1.7.9.5
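
The effect of the extra slack can be seen in a small userspace model. This is only a sketch under stated assumptions: struct demo_reply, DEMO_ERR_SLACK and both functions are hypothetical stand-ins, not the kernel's definitions, and the slack value is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for COMPOUND_ERR_SLACK_SPACE; the value is an assumption. */
#define DEMO_ERR_SLACK 16

/* Hypothetical reply state: bytes encoded so far and total bytes allowed. */
struct demo_reply {
	size_t len;
	size_t buflen;
};

/* Returns 0 if `extra` more bytes would still fit, -1 otherwise. */
static int demo_check_resp_size(const struct demo_reply *r, size_t extra)
{
	assert(r != NULL);
	return (r->len + extra > r->buflen) ? -1 : 0;
}

/*
 * Post-encode check: unless this was the last op in the compound, keep
 * enough room to encode an error status for the op that follows.
 */
static int demo_post_encode_check(const struct demo_reply *r, bool last_op)
{
	size_t space_needed = last_op ? 0 : DEMO_ERR_SLACK;

	return demo_check_resp_size(r, space_needed);
}
```

A reply that exactly fills the buffer passes when it ends the compound and fails when another op follows, since there is then no room left for that op's status.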


2014-05-12 16:11:29

by Christoph Hellwig

Subject: Re: nfsd4 xdr encoding fixes v2

On Mon, May 12, 2014 at 12:07:41PM -0400, J. Bruce Fields wrote:
> On Mon, May 12, 2014 at 01:20:59AM -0700, Christoph Hellwig wrote:
> > This series seems to cause hangs during xfstests against a server on the
> > same VM. The trace is fairly similar every time the hang happens, but the
> > point at which it happens differs:
>
> Ouch, OK, and you're sure it starts with this series?
>
> I guess I should try to replicate it here. Might take a couple days.

Taking your branch and resetting it to
9fa1959e976f7a6ae84f616ca669359028070c61 didn't reproduce it after
multiple runs. I tried a bisect, but it instead produces a different
warning on the first iteration, which then continues to be around, but
is gone with the whole series applied:


[ 171.070621] ------------[ cut here ]------------
[ 171.071611] WARNING: CPU: 1 PID: 3784 at /work/hch/linux/fs/nfsd/nfs4xdr.c:3907 nfs4svc_encode_compoundres+0x16d/0x180()
[ 171.074170] Modules linked in:
[ 171.074959] CPU: 1 PID: 3784 Comm: nfsd Not tainted 3.15.0-rc1+ #23
[ 171.076582] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 171.078117] 0000000000000009 ffff880000167cc8 ffffffff81cdfd4a 0000000000000000
[ 171.080377] 0000000000000000 ffff880000167d08 ffffffff810ab6e7 ffff88007ac611a8
[ 171.082631] ffff88007ac61000
[ 171.083271] ------------[ cut here ]------------


2014-05-11 20:52:55

by J. Bruce Fields

Subject: [PATCH 12/43] nfsd4: keep xdr buf length updated

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 ++
fs/nfsd/nfs4xdr.c | 12 ++++++++++--
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 7c45172..ad0bc5f 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1211,6 +1211,8 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp, struct nfsd4_compoundres
xdr->iov = head;
xdr->p = head->iov_base + head->iov_len;
xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
+ /* Tail and page_len should be zero at this point: */
+ buf->len = buf->head[0].iov_len;
}

/*
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 38a78a5..2f16a80 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3038,9 +3038,10 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

WRITE32(eof);
WRITE32(maxcount);
- resp->xdr.buf->head[0].iov_len = (char*)p
- - (char*)resp->xdr.buf->head[0].iov_base;
+ WARN_ON_ONCE(resp->xdr.buf->head[0].iov_len != (char *)p
+ - (char *)resp->xdr.buf->head[0].iov_base);
resp->xdr.buf->page_len = maxcount;
+ xdr->buf->len += maxcount;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
@@ -3051,6 +3052,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
+ xdr->buf->len -= (maxcount&3);
}
return 0;
}
@@ -3094,6 +3096,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
resp->xdr.buf->head[0].iov_len = (char*)p
- (char*)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
+ xdr->buf->len += maxcount;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
@@ -3174,6 +3177,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
*p++ = htonl(readdir->common.err == nfserr_eof);
resp->xdr.buf->page_len = ((char*)p) -
(char*)page_address(*(resp->rqstp->rq_next_page-1));
+ xdr->buf->len += xdr->buf->page_len;

xdr->iov = xdr->buf->tail;

@@ -3777,6 +3781,10 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
* All that remains is to write the tag and operation count...
*/
struct nfsd4_compound_state *cs = &resp->cstate;
+ struct xdr_buf *buf = resp->xdr.buf;
+
+ WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len +
+ buf->tail[0].iov_len);

p = resp->tagp;
*p++ = htonl(resp->taglen);
--
1.7.9.5
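
The invariant checked by the new WARN_ON_ONCE, that the total length equals head plus pages plus tail, can be modelled in userspace. This is a simplified sketch: struct demo_buf and both functions are hypothetical, and the padding is added directly to the tail rather than reserved and then trimmed as in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, flattened model of the length fields of struct xdr_buf. */
struct demo_buf {
	size_t head_len;	/* bytes encoded in the head */
	size_t page_len;	/* bytes of page data */
	size_t tail_len;	/* bytes encoded in the tail */
	size_t len;		/* running total, updated as we go */
};

/* The invariant: the running total matches the sum of the three parts. */
static int demo_buf_consistent(const struct demo_buf *b)
{
	assert(b != NULL);
	return b->len == b->head_len + b->page_len + b->tail_len;
}

/* Attach `count` bytes of read data, padding to a 4-byte XDR boundary. */
static void demo_attach_read_pages(struct demo_buf *b, size_t count)
{
	assert(b != NULL);
	b->page_len = count;
	b->len += count;
	if (count & 3) {
		size_t pad = 4 - (count & 3);

		/* The padding bytes live in the tail. */
		b->tail_len = pad;
		b->len += pad;
	}
}
```

Updating the total at every step where one of the parts changes is what lets later code read it directly instead of recomputing it from pointers.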


2014-05-11 20:52:55

by J. Bruce Fields

Subject: [PATCH 05/43] nfsd4: move nfsd4_operation to xdr4.h

From: "J. Bruce Fields" <[email protected]>

We want to share some of these definitions.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 58 ++++------------------------------------------------
fs/nfsd/xdr4.h | 53 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 54 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 3d4b044..99aa348 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1123,58 +1123,8 @@ static inline void nfsd4_increment_op_stats(u32 opnum)
nfsdstats.nfs4_opcount[opnum]++;
}

-typedef __be32(*nfsd4op_func)(struct svc_rqst *, struct nfsd4_compound_state *,
- void *);
-typedef u32(*nfsd4op_rsize)(struct svc_rqst *, struct nfsd4_op *op);
-typedef void(*stateid_setter)(struct nfsd4_compound_state *, void *);
-typedef void(*stateid_getter)(struct nfsd4_compound_state *, void *);
-
-enum nfsd4_op_flags {
- ALLOWED_WITHOUT_FH = 1 << 0, /* No current filehandle required */
- ALLOWED_ON_ABSENT_FS = 1 << 1, /* ops processed on absent fs */
- ALLOWED_AS_FIRST_OP = 1 << 2, /* ops reqired first in compound */
- /* For rfc 5661 section 2.6.3.1.1: */
- OP_HANDLES_WRONGSEC = 1 << 3,
- OP_IS_PUTFH_LIKE = 1 << 4,
- /*
- * These are the ops whose result size we estimate before
- * encoding, to avoid performing an op then not being able to
- * respond or cache a response. This includes writes and setattrs
- * as well as the operations usually called "nonidempotent":
- */
- OP_MODIFIES_SOMETHING = 1 << 5,
- /*
- * Cache compounds containing these ops in the xid-based drc:
- * We use the DRC for compounds containing non-idempotent
- * operations, *except* those that are 4.1-specific (since
- * sessions provide their own EOS), and except for stateful
- * operations other than setclientid and setclientid_confirm
- * (since sequence numbers provide EOS for open, lock, etc in
- * the v4.0 case).
- */
- OP_CACHEME = 1 << 6,
- /*
- * These are ops which clear current state id.
- */
- OP_CLEAR_STATEID = 1 << 7,
-};
-
-struct nfsd4_operation {
- nfsd4op_func op_func;
- u32 op_flags;
- char *op_name;
- /* Try to get response size before operation */
- nfsd4op_rsize op_rsize_bop;
- stateid_getter op_get_currentstateid;
- stateid_setter op_set_currentstateid;
-};
-
static struct nfsd4_operation nfsd4_ops[];

-#ifdef NFSD_DEBUG
-static const char *nfsd4_op_name(unsigned opnum);
-#endif
-
/*
* Enforce NFSv4.1 COMPOUND ordering rules:
*
@@ -1208,7 +1158,7 @@ static __be32 nfs41_check_op_ordering(struct nfsd4_compoundargs *args)
return nfs_ok;
}

-static inline struct nfsd4_operation *OPDESC(struct nfsd4_op *op)
+struct nfsd4_operation *OPDESC(struct nfsd4_op *op)
{
return &nfsd4_ops[op->opnum];
}
@@ -1856,14 +1806,14 @@ static struct nfsd4_operation nfsd4_ops[] = {
},
};

-#ifdef NFSD_DEBUG
-static const char *nfsd4_op_name(unsigned opnum)
+const char *nfsd4_op_name(unsigned opnum)
{
+#ifdef NFSD_DEBUG
if (opnum < ARRAY_SIZE(nfsd4_ops))
return nfsd4_ops[opnum].op_name;
+#endif
return "unknown_operation";
}
-#endif

#define nfsd4_voidres nfsd4_voidargs
struct nfsd4_voidargs { int dummy; };
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index f62a055..fa3a589 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -536,6 +536,59 @@ static inline bool nfsd4_last_compound_op(struct svc_rqst *rqstp)
return argp->opcnt == resp->opcnt;
}

+
+
+typedef __be32(*nfsd4op_func)(struct svc_rqst *, struct nfsd4_compound_state *,
+ void *);
+typedef u32(*nfsd4op_rsize)(struct svc_rqst *, struct nfsd4_op *op);
+typedef void(*stateid_setter)(struct nfsd4_compound_state *, void *);
+typedef void(*stateid_getter)(struct nfsd4_compound_state *, void *);
+
+enum nfsd4_op_flags {
+ ALLOWED_WITHOUT_FH = 1 << 0, /* No current filehandle required */
+ ALLOWED_ON_ABSENT_FS = 1 << 1, /* ops processed on absent fs */
+ ALLOWED_AS_FIRST_OP = 1 << 2, /* ops reqired first in compound */
+ /* For rfc 5661 section 2.6.3.1.1: */
+ OP_HANDLES_WRONGSEC = 1 << 3,
+ OP_IS_PUTFH_LIKE = 1 << 4,
+ /*
+ * These are the ops whose result size we estimate before
+ * encoding, to avoid performing an op then not being able to
+ * respond or cache a response. This includes writes and setattrs
+ * as well as the operations usually called "nonidempotent":
+ */
+ OP_MODIFIES_SOMETHING = 1 << 5,
+ /*
+ * Cache compounds containing these ops in the xid-based drc:
+ * We use the DRC for compounds containing non-idempotent
+ * operations, *except* those that are 4.1-specific (since
+ * sessions provide their own EOS), and except for stateful
+ * operations other than setclientid and setclientid_confirm
+ * (since sequence numbers provide EOS for open, lock, etc in
+ * the v4.0 case).
+ */
+ OP_CACHEME = 1 << 6,
+ /*
+ * These are ops which clear current state id.
+ */
+ OP_CLEAR_STATEID = 1 << 7,
+};
+
+struct nfsd4_operation {
+ nfsd4op_func op_func;
+ u32 op_flags;
+ char *op_name;
+ /* Try to get response size before
+ * operation */
+ nfsd4op_rsize op_rsize_bop;
+ stateid_getter op_get_currentstateid;
+ stateid_setter op_set_currentstateid;
+};
+
+struct nfsd4_operation *OPDESC(struct nfsd4_op *op);
+
+const char *nfsd4_op_name(unsigned opnum);
+
#define NFS4_SVC_XDRSIZE sizeof(struct nfsd4_compoundargs)

static inline void
--
1.7.9.5


2014-05-11 20:53:02

by J. Bruce Fields

Subject: [PATCH 26/43] nfsd4: nfsd4_check_resp_size should check against whole buffer

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index eb1694d..4751fd4 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3760,7 +3760,6 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize)
{
struct xdr_buf *buf = &resp->rqstp->rq_res;
struct nfsd4_session *session = resp->cstate.session;
- int slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;

if (nfsd4_has_session(&resp->cstate)) {
struct nfsd4_slot *slot = resp->cstate.slot;
@@ -3773,7 +3772,7 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize)
return nfserr_rep_too_big_to_cache;
}

- if (respsize > slack_bytes) {
+ if (buf->len + respsize > buf->buflen) {
WARN_ON_ONCE(nfsd4_has_session(&resp->cstate));
return nfserr_resource;
}
--
1.7.9.5


2014-05-11 20:53:05

by J. Bruce Fields

Subject: [PATCH 35/43] nfsd4: turn off zero-copy-read in exotic cases

From: "J. Bruce Fields" <[email protected]>

We currently allow only one read per compound, with operations before
and after whose responses will require no more than about a page to
encode.

While we don't expect clients to violate those limits any time soon,
this limitation isn't really condoned by the spec, so to future proof
the server we should lift the limitation.

At the same time we'd like to continue to support zero-copy reads.

Supporting multiple zero-copy-reads per compound would require a new
data structure to replace struct xdr_buf, which can represent only one
set of included pages.

So for now we plan to modify encode_read() to support either zero-copy
or non-zero-copy reads, and use some heuristics at the start of the
compound processing to decide whether a zero-copy read will work.

This will allow us to support more exotic compounds without introducing
a performance regression in the normal case.

Later patches handle those "exotic compounds"; this one just makes sure
zero-copy is turned off in those cases.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 4dba311..f69906d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1629,6 +1629,8 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
struct nfsd4_op *op;
bool cachethis = false;
int max_reply = 2 * RPC_MAX_AUTH_SIZE; /* uh, kind of a guess */
+ int readcount = 0;
+ int readbytes = 0;
int i;

READ_BUF(4);
@@ -1679,14 +1681,21 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
*/
cachethis |= nfsd4_cache_this_op(op);

- max_reply += nfsd4_max_reply(argp->rqstp, op);
+ if (op->opnum == OP_READ) {
+ readcount++;
+ readbytes += nfsd4_max_reply(argp->rqstp, op);
+ } else
+ max_reply += nfsd4_max_reply(argp->rqstp, op);
}
/* Sessions make the DRC unnecessary: */
if (argp->minorversion)
cachethis = false;
- svc_reserve(argp->rqstp, max_reply);
+ svc_reserve(argp->rqstp, max_reply + readbytes);
argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE;

+ if (readcount > 1 || max_reply > PAGE_SIZE - 2*RPC_MAX_AUTH_SIZE)
+ argp->rqstp->rq_splice_ok = false;
+
DECODE_TAIL;
}

--
1.7.9.5
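
The decode-time heuristic can be sketched as a standalone function. Everything named demo_* or DEMO_* here is a hypothetical stand-in, the constant values are assumptions, and unlike the patch (which only ever clears rq_splice_ok) the model starts from an assumed default of "zero-copy allowed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values; the real PAGE_SIZE and RPC_MAX_AUTH_SIZE may differ. */
#define DEMO_PAGE_SIZE     4096
#define DEMO_MAX_AUTH_SIZE 400

/* One decoded op: whether it is a read, and its estimated reply size. */
struct demo_op {
	bool is_read;
	int max_reply;
};

/* Outcome of scanning a compound: bytes to reserve, and the splice decision. */
struct demo_plan {
	int reserve;
	bool splice_ok;
};

static struct demo_plan demo_plan_compound(const struct demo_op *ops, int nops)
{
	struct demo_plan plan;
	int max_reply = 2 * DEMO_MAX_AUTH_SIZE;	/* auth slack, a rough guess */
	int readcount = 0, readbytes = 0;
	int i;

	assert(nops == 0 || ops != NULL);
	for (i = 0; i < nops; i++) {
		if (ops[i].is_read) {
			/* Read data is tracked apart from the other ops. */
			readcount++;
			readbytes += ops[i].max_reply;
		} else {
			max_reply += ops[i].max_reply;
		}
	}
	plan.reserve = max_reply + readbytes;
	/*
	 * Zero-copy only works with a single read whose surrounding ops
	 * fit in what is left of one page after the auth slack.
	 */
	plan.splice_ok = !(readcount > 1 ||
			   max_reply > DEMO_PAGE_SIZE - 2 * DEMO_MAX_AUTH_SIZE);
	return plan;
}
```

Keeping the read bytes out of max_reply matters because spliced read data lands in separate pages, so it should not count against the one-page budget for everything else.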


2014-05-12 08:18:46

by Christoph Hellwig

Subject: Re: [PATCH 07/43] nfsd4: allow space for final error return

On Sun, May 11, 2014 at 04:52:12PM -0400, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> This post-encoding check should be taking into account the need to
> encode at least an out-of-space error to the following op (if any).
>
> Signed-off-by: J. Bruce Fields <[email protected]>

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2014-05-11 20:53:02

by J. Bruce Fields

Subject: [PATCH 27/43] rpc: define xdr_restrict_buflen

From: "J. Bruce Fields" <[email protected]>

With this xdr_reserve_space can help us enforce various limits.

Signed-off-by: J. Bruce Fields <[email protected]>
---
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/xdr.c | 29 +++++++++++++++++++++++++++++
2 files changed, 30 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index b23d69f..70c6b92 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -217,6 +217,7 @@ extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32
extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
extern void xdr_commit_encode(struct xdr_stream *xdr);
extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
+extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen);
extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
unsigned int base, unsigned int len);
extern unsigned int xdr_stream_pos(const struct xdr_stream *xdr);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index e65d6b6..f97e3df 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -650,6 +650,35 @@ void xdr_truncate_encode(struct xdr_stream *xdr, size_t len)
EXPORT_SYMBOL(xdr_truncate_encode);

/**
+ * xdr_restrict_buflen - decrease available buffer space
+ * @xdr: pointer to xdr_stream
+ * @newbuflen: new maximum number of bytes available
+ *
+ * Adjust our idea of how much space is available in the buffer.
+ * If we've already used too much space in the buffer, returns -1.
+ * If the available space is already smaller than newbuflen, returns 0
+ * and does nothing. Otherwise, adjusts xdr->buf->buflen to newbuflen
+ * and ensures xdr->end is set at most offset newbuflen from the start
+ * of the buffer.
+ */
+int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen)
+{
+ struct xdr_buf *buf = xdr->buf;
+ int left_in_this_buf = (void *)xdr->end - (void *)xdr->p;
+ int end_offset = buf->len + left_in_this_buf;
+
+ if (newbuflen < 0 || newbuflen < buf->len)
+ return -1;
+ if (newbuflen > buf->buflen)
+ return 0;
+ if (newbuflen < end_offset)
+ xdr->end = (void *)xdr->end + newbuflen - end_offset;
+ buf->buflen = newbuflen;
+ return 0;
+}
+EXPORT_SYMBOL(xdr_restrict_buflen);
+
+/**
* xdr_write_pages - Insert a list of pages into an XDR buffer for sending
* @xdr: pointer to xdr_stream
* @pages: list of pages
--
1.7.9.5
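
The three outcomes described in the kernel-doc comment can be checked against a userspace model. This is a sketch only: struct demo_stream replaces the pointer pair xdr->p / xdr->end with a count of bytes left in the current segment, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: offsets instead of the kernel's pointers. */
struct demo_stream {
	int len;	/* bytes already encoded (buf->len) */
	int left;	/* bytes left in the current segment (end - p) */
	int buflen;	/* total bytes the buffer may hold (buf->buflen) */
};

/*
 * Shrink the space available for encoding. Returns -1 if more than
 * newbuflen bytes are already in use, 0 otherwise. Never grows the buffer.
 */
static int demo_restrict_buflen(struct demo_stream *s, int newbuflen)
{
	int end_offset;

	assert(s != NULL);
	end_offset = s->len + s->left;

	if (newbuflen < 0 || newbuflen < s->len)
		return -1;
	if (newbuflen > s->buflen)
		return 0;	/* already tighter than requested */
	if (newbuflen < end_offset)
		s->left += newbuflen - end_offset;	/* pull the end in */
	s->buflen = newbuflen;
	return 0;
}
```

The current segment's end only moves when the new limit falls inside that segment; otherwise only the overall limit changes.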


2014-05-11 20:52:57

by J. Bruce Fields

Subject: [PATCH 11/43] nfsd4: no need for encode_compoundres to adjust lengths

From: "J. Bruce Fields" <[email protected]>

xdr_reserve_space should now be calculating the length correctly as we
go, so there's no longer any need to fix it up here.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 3 +++
fs/nfsd/nfs4xdr.c | 8 +-------
2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 05cc3eb..d4c9683 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1614,6 +1614,7 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
struct nfsd4_sequence *seq)
{
struct nfsd4_slot *slot = resp->cstate.slot;
+ struct kvec *head = resp->xdr.iov;
__be32 status;

dprintk("--> %s slot %p\n", __func__, slot);
@@ -1627,6 +1628,8 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,

resp->opcnt = slot->sl_opcnt;
resp->xdr.p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
+ head->iov_len = (void *)resp->xdr.p - head->iov_base;
+ resp->xdr.buf->len = head->iov_len;
status = slot->sl_status;

return status;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 53708ce..38a78a5 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3777,19 +3777,13 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
* All that remains is to write the tag and operation count...
*/
struct nfsd4_compound_state *cs = &resp->cstate;
- struct kvec *iov;
+
p = resp->tagp;
*p++ = htonl(resp->taglen);
memcpy(p, resp->tag, resp->taglen);
p += XDR_QUADLEN(resp->taglen);
*p++ = htonl(resp->opcnt);

- if (rqstp->rq_res.page_len)
- iov = &rqstp->rq_res.tail[0];
- else
- iov = &rqstp->rq_res.head[0];
- iov->iov_len = ((char*)resp->xdr.p) - (char*)iov->iov_base;
- BUG_ON(iov->iov_len > PAGE_SIZE);
if (nfsd4_has_session(cs)) {
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
struct nfs4_client *clp = cs->session->se_client;
--
1.7.9.5


2014-05-11 20:53:00

by J. Bruce Fields

Subject: [PATCH 18/43] nfsd4: nfsd4_check_resp_size needn't recalculate length

From: "J. Bruce Fields" <[email protected]>

We're keeping the length updated as we go now, so there's no need for
the extra calculation here.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 18 +++---------------
1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 846d241..4c036eb 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3735,32 +3735,20 @@ static nfsd4_enc nfsd4_enc_ops[] = {
*/
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
{
- struct xdr_buf *xb = &resp->rqstp->rq_res;
+ struct xdr_buf *buf = &resp->rqstp->rq_res;
struct nfsd4_session *session = NULL;
struct nfsd4_slot *slot = resp->cstate.slot;
- u32 length, tlen = 0;

if (!nfsd4_has_session(&resp->cstate))
return 0;

session = resp->cstate.session;

- if (xb->page_len == 0) {
- length = (char *)resp->xdr.p - (char *)xb->head[0].iov_base + pad;
- } else {
- if (xb->tail[0].iov_base && xb->tail[0].iov_len > 0)
- tlen = (char *)resp->xdr.p - (char *)xb->tail[0].iov_base;
-
- length = xb->head[0].iov_len + xb->page_len + tlen + pad;
- }
- dprintk("%s length %u, xb->page_len %u tlen %u pad %u\n", __func__,
- length, xb->page_len, tlen, pad);
-
- if (length > session->se_fchannel.maxresp_sz)
+ if (buf->len + pad > session->se_fchannel.maxresp_sz)
return nfserr_rep_too_big;

if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
- length > session->se_fchannel.maxresp_cached)
+ buf->len + pad > session->se_fchannel.maxresp_cached)
return nfserr_rep_too_big_to_cache;

return 0;
--
1.7.9.5


2014-05-11 20:53:02

by J. Bruce Fields

Subject: [PATCH 24/43] nfsd4: more precise nfsd4_max_reply

From: "J. Bruce Fields" <[email protected]>

---
fs/nfsd/nfs4xdr.c | 27 ++++++++-------------------
1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d418d7e..19071d7 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1614,22 +1614,12 @@ nfsd4_opnum_in_range(struct nfsd4_compoundargs *argp, struct nfsd4_op *op)
* use pages beyond the first one, so the maximum possible length is the
* maximum over these values, not the sum.
*/
-static int nfsd4_max_reply(u32 opnum)
+static int nfsd4_max_reply(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
- switch (opnum) {
- case OP_READLINK:
- case OP_READDIR:
- /*
- * Both of these ops take a single page for data and put
- * the head and tail in another page:
- */
- return 2 * PAGE_SIZE;
- case OP_GETATTR:
- case OP_READ:
- return INT_MAX;
- default:
- return PAGE_SIZE;
- }
+ struct nfsd4_operation *opdesc = OPDESC(op);
+ nfsd4op_rsize estimator = opdesc->op_rsize_bop;
+
+ return estimator ? estimator(rqstp, op) : PAGE_SIZE;
}

static __be32
@@ -1638,7 +1628,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_HEAD;
struct nfsd4_op *op;
bool cachethis = false;
- int max_reply = PAGE_SIZE;
+ int max_reply = 2 * RPC_MAX_AUTH_SIZE; /* uh, kind of a guess */
int i;

READ_BUF(4);
@@ -1689,13 +1679,12 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
*/
cachethis |= nfsd4_cache_this_op(op);

- max_reply = max(max_reply, nfsd4_max_reply(op->opnum));
+ max_reply += nfsd4_max_reply(argp->rqstp, op);
}
/* Sessions make the DRC unnecessary: */
if (argp->minorversion)
cachethis = false;
- if (max_reply != INT_MAX)
- svc_reserve(argp->rqstp, max_reply);
+ svc_reserve(argp->rqstp, max_reply);
argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE;

DECODE_TAIL;
--
1.7.9.5
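
The dispatch pattern used by the new nfsd4_max_reply(), a per-op estimator with a one-page fallback, can be sketched in isolation. The types and names below (demo_opdesc, demo_read_rsize, DEMO_PAGE_SIZE) are hypothetical, and the read estimate is an invented formula used only to give the table a non-trivial entry.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096	/* illustrative; the real PAGE_SIZE may differ */

/* Optional per-op estimator: maps a request argument to a reply-size bound. */
typedef unsigned (*demo_rsize_fn)(unsigned arg);

/* Hypothetical op descriptor, loosely modelled on struct nfsd4_operation. */
struct demo_opdesc {
	const char *name;
	demo_rsize_fn rsize;	/* NULL when no estimator is provided */
};

/* Invented estimate for a read: fixed overhead plus the requested count. */
static unsigned demo_read_rsize(unsigned count)
{
	return 16 + count;
}

/* Use the op's estimator if it has one; otherwise assume a page. */
static unsigned demo_max_reply(const struct demo_opdesc *d, unsigned arg)
{
	assert(d != NULL);
	return d->rsize ? d->rsize(arg) : DEMO_PAGE_SIZE;
}
```

With estimates like these the caller can sum the per-op values over the compound, as the patch does, rather than taking the maximum.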


2014-05-11 20:53:08

by J. Bruce Fields

Subject: [PATCH 41/43] nfsd4: kill WRITE64

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 52 ++++++++++++++++++++++++----------------------------
1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index db4c53c..a2e34f5 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1699,10 +1699,6 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_TAIL;
}

-#define WRITE64(n) do { \
- *p++ = htonl((u32)((n) >> 32)); \
- *p++ = htonl((u32)(n)); \
-} while (0)
#define WRITEMEM(ptr,nbytes) do { if (nbytes > 0) { \
*(p + XDR_QUADLEN(nbytes) -1) = 0; \
memcpy(p, ptr, nbytes); \
@@ -2207,7 +2203,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64(stat.size);
+ p = xdr_encode_hyper(p, stat.size);
}
if (bmval0 & FATTR4_WORD0_LINK_SUPPORT) {
p = xdr_reserve_space(xdr, 4);
@@ -2232,12 +2228,12 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
if (!p)
goto out_resource;
if (exp->ex_fslocs.migrated) {
- WRITE64(NFS4_REFERRAL_FSID_MAJOR);
- WRITE64(NFS4_REFERRAL_FSID_MINOR);
+ p = xdr_encode_hyper(p, NFS4_REFERRAL_FSID_MAJOR);
+ p = xdr_encode_hyper(p, NFS4_REFERRAL_FSID_MINOR);
} else switch(fsid_source(fhp)) {
case FSIDSOURCE_FSID:
- WRITE64((u64)exp->ex_fsid);
- WRITE64((u64)0);
+ p = xdr_encode_hyper(p, (u64)exp->ex_fsid);
+ p = xdr_encode_hyper(p, (u64)0);
break;
case FSIDSOURCE_DEV:
*p++ = cpu_to_be32(0);
@@ -2339,25 +2335,25 @@ out_acl:
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64(stat.ino);
+ p = xdr_encode_hyper(p, stat.ino);
}
if (bmval0 & FATTR4_WORD0_FILES_AVAIL) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) statfs.f_ffree);
+ p = xdr_encode_hyper(p, (u64) statfs.f_ffree);
}
if (bmval0 & FATTR4_WORD0_FILES_FREE) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) statfs.f_ffree);
+ p = xdr_encode_hyper(p, (u64) statfs.f_ffree);
}
if (bmval0 & FATTR4_WORD0_FILES_TOTAL) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) statfs.f_files);
+ p = xdr_encode_hyper(p, (u64) statfs.f_files);
}
if (bmval0 & FATTR4_WORD0_FS_LOCATIONS) {
status = nfsd4_encode_fs_locations(xdr, rqstp, exp);
@@ -2374,7 +2370,7 @@ out_acl:
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64(exp->ex_path.mnt->mnt_sb->s_maxbytes);
+ p = xdr_encode_hyper(p, exp->ex_path.mnt->mnt_sb->s_maxbytes);
}
if (bmval0 & FATTR4_WORD0_MAXLINK) {
p = xdr_reserve_space(xdr, 4);
@@ -2392,13 +2388,13 @@ out_acl:
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) svc_max_payload(rqstp));
+ p = xdr_encode_hyper(p, (u64) svc_max_payload(rqstp));
}
if (bmval0 & FATTR4_WORD0_MAXWRITE) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) svc_max_payload(rqstp));
+ p = xdr_encode_hyper(p, (u64) svc_max_payload(rqstp));
}
if (bmval1 & FATTR4_WORD1_MODE) {
p = xdr_reserve_space(xdr, 4);
@@ -2440,34 +2436,34 @@ out_acl:
if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_bavail * (u64)statfs.f_bsize;
- WRITE64(dummy64);
+ p = xdr_encode_hyper(p, dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_FREE) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_bfree * (u64)statfs.f_bsize;
- WRITE64(dummy64);
+ p = xdr_encode_hyper(p, dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_TOTAL) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_blocks * (u64)statfs.f_bsize;
- WRITE64(dummy64);
+ p = xdr_encode_hyper(p, dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_USED) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
dummy64 = (u64)stat.blocks << 9;
- WRITE64(dummy64);
+ p = xdr_encode_hyper(p, dummy64);
}
if (bmval1 & FATTR4_WORD1_TIME_ACCESS) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE64((s64)stat.atime.tv_sec);
+ p = xdr_encode_hyper(p, (s64)stat.atime.tv_sec);
*p++ = cpu_to_be32(stat.atime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_DELTA) {
@@ -2482,14 +2478,14 @@ out_acl:
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE64((s64)stat.ctime.tv_sec);
+ p = xdr_encode_hyper(p, (s64)stat.ctime.tv_sec);
*p++ = cpu_to_be32(stat.ctime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_MODIFY) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE64((s64)stat.mtime.tv_sec);
+ p = xdr_encode_hyper(p, (s64)stat.mtime.tv_sec);
*p++ = cpu_to_be32(stat.mtime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
@@ -2503,7 +2499,7 @@ out_acl:
if (ignore_crossmnt == 0 &&
dentry == exp->ex_path.mnt->mnt_root)
get_parent_attributes(exp, &stat);
- WRITE64(stat.ino);
+ p = xdr_encode_hyper(p, stat.ino);
}
if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) {
status = nfsd4_encode_security_label(xdr, rqstp, context, contextlen);
@@ -2890,15 +2886,15 @@ again:
}
return nfserr_resource;
}
- WRITE64(ld->ld_start);
- WRITE64(ld->ld_length);
+ p = xdr_encode_hyper(p, ld->ld_start);
+ p = xdr_encode_hyper(p, ld->ld_length);
*p++ = cpu_to_be32(ld->ld_type);
if (conf->len) {
WRITEMEM(&ld->ld_clientid, 8);
*p++ = cpu_to_be32(conf->len);
WRITEMEM(conf->data, conf->len);
} else { /* non - nfsv4 lock in conflict, no clientid nor owner */
- WRITE64((u64)0); /* clientid */
+ p = xdr_encode_hyper(p, (u64)0); /* clientid */
*p++ = cpu_to_be32(0); /* length of owner name */
}
return nfserr_denied;
@@ -3646,7 +3642,7 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr_resource;

/* The server_owner struct */
- WRITE64(minor_id); /* Minor id */
+ p = xdr_encode_hyper(p, minor_id); /* Minor id */
/* major id */
*p++ = cpu_to_be32(major_id_sz);
WRITEMEM(major_id, major_id_sz);
--
1.7.9.5


2014-05-11 20:53:08

by J. Bruce Fields

Subject: [PATCH 39/43] nfsd4: really fix nfs4err_resource in 4.1 case

From: "J. Bruce Fields" <[email protected]>

encode_getattr, for example, can return nfserr_resource to indicate it
ran out of buffer space. That's not a legal error in the 4.1 case.
And in the 4.1 case, if we ran out of buffer space, we should have
exceeded a session limit too.

(Note in 1bc49d83c37cfaf46be357757e592711e67f9809 "nfsd4: fix
nfs4err_resource in 4.1 case" we originally tried fixing this error
return before fixing the problem that we could error out while we still
had lots of available space. The result was to trade one illegal error
for another in those cases. We decided that was unhelpful, so reverted
the change in fc208d026be0c7d60db9118583fc62f6ca97743d, and are only
reinstating it now that we've eliminated almost all of those cases.)
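
The mapping described above can be sketched in isolation as follows.
This is only an illustration: the enum values and sketch_fix_resource()
are hypothetical stand-ins for the kernel's nfserr_* codes and for the
check added to nfsd4_encode_operation(), and the two booleans stand in
for nfsd4_has_session() and the slot's NFSD4_SLOT_CACHETHIS flag.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's nfserr_* status codes. */
enum sketch_status {
	SKETCH_OK,
	SKETCH_RESOURCE,
	SKETCH_REP_TOO_BIG,
	SKETCH_REP_TOO_BIG_TO_CACHE,
};

/*
 * Translate an out-of-space status into one that is legal when the
 * compound runs inside a session (the 4.1 case).  Without a session
 * (the 4.0 case) the status is passed through unchanged.
 */
static enum sketch_status
sketch_fix_resource(enum sketch_status status, bool has_session,
		    bool cachethis)
{
	if (status != SKETCH_RESOURCE || !has_session)
		return status;
	return cachethis ? SKETCH_REP_TOO_BIG_TO_CACHE : SKETCH_REP_TOO_BIG;
}
```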

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 91a50a0..5ff7bea 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3900,6 +3900,13 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
space_needed = COMPOUND_ERR_SLACK_SPACE;
op->status = nfsd4_check_resp_size(resp, space_needed);
}
+ if (op->status == nfserr_resource && nfsd4_has_session(&resp->cstate)) {
+ struct nfsd4_slot *slot = resp->cstate.slot;
+
+ if (slot->sl_flags & NFSD4_SLOT_CACHETHIS)
+ op->status = nfserr_rep_too_big_to_cache;
+ else
+ op->status = nfserr_rep_too_big;
if (op->status == nfserr_resource ||
op->status == nfserr_rep_too_big ||
op->status == nfserr_rep_too_big_to_cache) {
--
1.7.9.5


2014-05-11 20:53:08

by J. Bruce Fields

Subject: [PATCH 40/43] nfsd4: kill WRITE32

From: "J. Bruce Fields" <[email protected]>

These macros just obscure what's going on. Adopt the convention of the
client-side code.
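
As a rough userspace illustration of the point, the sketch below
encodes the same two words both ways. cpu_to_be32() is a kernel helper;
here it is stood in for by htonl(), which is an assumption of this
example, and encode_pair_macro(), encode_pair_open() and
encodings_match() are hypothetical names that do not appear in the
patch.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Userspace stand-in for the kernel's cpu_to_be32() (assumption). */
#define cpu_to_be32(x) htonl((uint32_t)(x))

/* Old style: the macro reads and advances a local that must be named
 * p, which is invisible at the call site. */
#define WRITE32(n) *p++ = htonl(n)

static uint32_t *encode_pair_macro(uint32_t *p, uint32_t a, uint32_t b)
{
	WRITE32(a);
	WRITE32(b);
	return p;
}

/* New style: the store and the pointer increment are spelled out. */
static uint32_t *encode_pair_open(uint32_t *p, uint32_t a, uint32_t b)
{
	*p++ = cpu_to_be32(a);
	*p++ = cpu_to_be32(b);
	return p;
}

/* Returns 1 if both styles advance by two words and emit the same bytes. */
static int encodings_match(uint32_t a, uint32_t b)
{
	uint32_t x[2], y[2];
	uint32_t *ex = encode_pair_macro(x, a, b);
	uint32_t *ey = encode_pair_open(y, a, b);

	return ex - x == 2 && ey - y == 2 && memcmp(x, y, sizeof(x)) == 0;
}
```

The output is byte-for-byte the same; the change is purely about
readability.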

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 303 ++++++++++++++++++++++++++---------------------------
1 file changed, 151 insertions(+), 152 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 5ff7bea..db4c53c 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1699,7 +1699,6 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_TAIL;
}

-#define WRITE32(n) *p++ = htonl(n)
#define WRITE64(n) do { \
*p++ = htonl((u32)((n) >> 32)); \
*p++ = htonl((u32)(n)); \
@@ -1788,7 +1787,7 @@ static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep, char
p = xdr_reserve_space(xdr, strlen + 4);
if (!p)
return nfserr_resource;
- WRITE32(strlen);
+ *p++ = cpu_to_be32(strlen);
WRITEMEM(str, strlen);
count++;
}
@@ -1867,7 +1866,7 @@ static __be32 nfsd4_encode_path(struct xdr_stream *xdr, const struct path *root,
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_free;
- WRITE32(ncomponents);
+ *p++ = cpu_to_be32(ncomponents);

while (ncomponents) {
struct dentry *dentry = components[ncomponents - 1];
@@ -1880,7 +1879,7 @@ static __be32 nfsd4_encode_path(struct xdr_stream *xdr, const struct path *root,
spin_unlock(&dentry->d_lock);
goto out_free;
}
- WRITE32(len);
+ *p++ = cpu_to_be32(len);
WRITEMEM(dentry->d_name.name, len);
dprintk("/%s", dentry->d_name.name);
spin_unlock(&dentry->d_lock);
@@ -1927,7 +1926,7 @@ static __be32 nfsd4_encode_fs_locations(struct xdr_stream *xdr, struct svc_rqst
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- WRITE32(fslocs->locations_count);
+ *p++ = cpu_to_be32(fslocs->locations_count);
for (i=0; i<fslocs->locations_count; i++) {
status = nfsd4_encode_fs_location4(xdr, &fslocs->locations[i]);
if (status)
@@ -1979,8 +1978,8 @@ nfsd4_encode_security_label(struct xdr_stream *xdr, struct svc_rqst *rqstp, void
* For now we use a 0 here to indicate the null translation; in
* the future we may place a call to translation code here.
*/
- WRITE32(0); /* lfs */
- WRITE32(0); /* pi */
+ *p++ = cpu_to_be32(0); /* lfs */
+ *p++ = cpu_to_be32(0); /* pi */
p = xdr_encode_opaque(p, context, len);
return 0;
}
@@ -2127,23 +2126,23 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
p = xdr_reserve_space(xdr, 16);
if (!p)
goto out_resource;
- WRITE32(3);
- WRITE32(bmval0);
- WRITE32(bmval1);
- WRITE32(bmval2);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(bmval0);
+ *p++ = cpu_to_be32(bmval1);
+ *p++ = cpu_to_be32(bmval2);
} else if (bmval1) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE32(2);
- WRITE32(bmval0);
- WRITE32(bmval1);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(bmval0);
+ *p++ = cpu_to_be32(bmval1);
} else {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE32(1);
- WRITE32(bmval0);
+ *p++ = cpu_to_be32(1);
+ *p++ = cpu_to_be32(bmval0);
}

attrlen_offset = xdr->buf->len;
@@ -2165,17 +2164,17 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE32(2);
- WRITE32(word0);
- WRITE32(word1);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(word0);
+ *p++ = cpu_to_be32(word1);
} else {
p = xdr_reserve_space(xdr, 16);
if (!p)
goto out_resource;
- WRITE32(3);
- WRITE32(word0);
- WRITE32(word1);
- WRITE32(word2);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(word0);
+ *p++ = cpu_to_be32(word1);
+ *p++ = cpu_to_be32(word2);
}
}
if (bmval0 & FATTR4_WORD0_TYPE) {
@@ -2187,16 +2186,16 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
status = nfserr_serverfault;
goto out;
}
- WRITE32(dummy);
+ *p++ = cpu_to_be32(dummy);
}
if (bmval0 & FATTR4_WORD0_FH_EXPIRE_TYPE) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
if (exp->ex_flags & NFSEXP_NOSUBTREECHECK)
- WRITE32(NFS4_FH_PERSISTENT);
+ *p++ = cpu_to_be32(NFS4_FH_PERSISTENT);
else
- WRITE32(NFS4_FH_PERSISTENT|NFS4_FH_VOL_RENAME);
+ *p++ = cpu_to_be32(NFS4_FH_PERSISTENT|NFS4_FH_VOL_RENAME);
}
if (bmval0 & FATTR4_WORD0_CHANGE) {
p = xdr_reserve_space(xdr, 8);
@@ -2214,19 +2213,19 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_SYMLINK_SUPPORT) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_NAMED_ATTR) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
}
if (bmval0 & FATTR4_WORD0_FSID) {
p = xdr_reserve_space(xdr, 16);
@@ -2241,10 +2240,10 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
WRITE64((u64)0);
break;
case FSIDSOURCE_DEV:
- WRITE32(0);
- WRITE32(MAJOR(stat.dev));
- WRITE32(0);
- WRITE32(MINOR(stat.dev));
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(MAJOR(stat.dev));
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(MINOR(stat.dev));
break;
case FSIDSOURCE_UUID:
WRITEMEM(exp->ex_uuid, 16);
@@ -2255,19 +2254,19 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
}
if (bmval0 & FATTR4_WORD0_LEASE_TIME) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(nn->nfsd4_lease);
+ *p++ = cpu_to_be32(nn->nfsd4_lease);
}
if (bmval0 & FATTR4_WORD0_RDATTR_ERROR) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(rdattr_err);
+ *p++ = cpu_to_be32(rdattr_err);
}
if (bmval0 & FATTR4_WORD0_ACL) {
struct nfs4_ace *ace;
@@ -2277,21 +2276,21 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export
if (!p)
goto out_resource;

- WRITE32(0);
+ *p++ = cpu_to_be32(0);
goto out_acl;
}
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(acl->naces);
+ *p++ = cpu_to_be32(acl->naces);

for (ace = acl->aces; ace < acl->aces + acl->naces; ace++) {
p = xdr_reserve_space(xdr, 4*3);
if (!p)
goto out_resource;
- WRITE32(ace->type);
- WRITE32(ace->flag);
- WRITE32(ace->access_mask & NFS4_ACE_MASK_ALL);
+ *p++ = cpu_to_be32(ace->type);
+ *p++ = cpu_to_be32(ace->flag);
+ *p++ = cpu_to_be32(ace->access_mask & NFS4_ACE_MASK_ALL);
status = nfsd4_encode_aclname(xdr, rqstp, ace);
if (status)
goto out;
@@ -2302,38 +2301,38 @@ out_acl:
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(aclsupport ?
+ *p++ = cpu_to_be32(aclsupport ?
ACL4_SUPPORT_ALLOW_ACL|ACL4_SUPPORT_DENY_ACL : 0);
}
if (bmval0 & FATTR4_WORD0_CANSETTIME) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_CASE_INSENSITIVE) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
}
if (bmval0 & FATTR4_WORD0_CASE_PRESERVING) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_CHOWN_RESTRICTED) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_FILEHANDLE) {
p = xdr_reserve_space(xdr, fhp->fh_handle.fh_size + 4);
if (!p)
goto out_resource;
- WRITE32(fhp->fh_handle.fh_size);
+ *p++ = cpu_to_be32(fhp->fh_handle.fh_size);
WRITEMEM(&fhp->fh_handle.fh_base, fhp->fh_handle.fh_size);
}
if (bmval0 & FATTR4_WORD0_FILEID) {
@@ -2369,7 +2368,7 @@ out_acl:
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_MAXFILESIZE) {
p = xdr_reserve_space(xdr, 8);
@@ -2381,13 +2380,13 @@ out_acl:
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(255);
+ *p++ = cpu_to_be32(255);
}
if (bmval0 & FATTR4_WORD0_MAXNAME) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(statfs.f_namelen);
+ *p++ = cpu_to_be32(statfs.f_namelen);
}
if (bmval0 & FATTR4_WORD0_MAXREAD) {
p = xdr_reserve_space(xdr, 8);
@@ -2405,19 +2404,19 @@ out_acl:
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(stat.mode & S_IALLUGO);
+ *p++ = cpu_to_be32(stat.mode & S_IALLUGO);
}
if (bmval1 & FATTR4_WORD1_NO_TRUNC) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval1 & FATTR4_WORD1_NUMLINKS) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(stat.nlink);
+ *p++ = cpu_to_be32(stat.nlink);
}
if (bmval1 & FATTR4_WORD1_OWNER) {
status = nfsd4_encode_user(xdr, rqstp, stat.uid);
@@ -2433,8 +2432,8 @@ out_acl:
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE32((u32) MAJOR(stat.rdev));
- WRITE32((u32) MINOR(stat.rdev));
+ *p++ = cpu_to_be32((u32) MAJOR(stat.rdev));
+ *p++ = cpu_to_be32((u32) MINOR(stat.rdev));
}
if (bmval1 & FATTR4_WORD1_SPACE_AVAIL) {
p = xdr_reserve_space(xdr, 8);
@@ -2469,29 +2468,29 @@ out_acl:
if (!p)
goto out_resource;
WRITE64((s64)stat.atime.tv_sec);
- WRITE32(stat.atime.tv_nsec);
+ *p++ = cpu_to_be32(stat.atime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_DELTA) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE32(0);
- WRITE32(1);
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(1);
+ *p++ = cpu_to_be32(0);
}
if (bmval1 & FATTR4_WORD1_TIME_METADATA) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
WRITE64((s64)stat.ctime.tv_sec);
- WRITE32(stat.ctime.tv_nsec);
+ *p++ = cpu_to_be32(stat.ctime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_MODIFY) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
WRITE64((s64)stat.mtime.tv_sec);
- WRITE32(stat.mtime.tv_nsec);
+ *p++ = cpu_to_be32(stat.mtime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
p = xdr_reserve_space(xdr, 8);
@@ -2515,10 +2514,10 @@ out_acl:
p = xdr_reserve_space(xdr, 16);
if (!p)
goto out_resource;
- WRITE32(3);
- WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD0);
- WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD1);
- WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD2);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(NFSD_SUPPATTR_EXCLCREAT_WORD0);
+ *p++ = cpu_to_be32(NFSD_SUPPATTR_EXCLCREAT_WORD1);
+ *p++ = cpu_to_be32(NFSD_SUPPATTR_EXCLCREAT_WORD2);
}

attrlen = htonl(xdr->buf->len - attrlen_offset - 4);
@@ -2750,7 +2749,7 @@ nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid)
p = xdr_reserve_space(xdr, sizeof(stateid_t));
if (!p)
return nfserr_resource;
- WRITE32(sid->si_generation);
+ *p++ = cpu_to_be32(sid->si_generation);
WRITEMEM(&sid->si_opaque, sizeof(stateid_opaque_t));
return 0;
}
@@ -2765,8 +2764,8 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, 8);
if (!p)
return nfserr_resource;
- WRITE32(access->ac_supported);
- WRITE32(access->ac_resp_access);
+ *p++ = cpu_to_be32(access->ac_supported);
+ *p++ = cpu_to_be32(access->ac_resp_access);
}
return nfserr;
}
@@ -2781,9 +2780,9 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
if (!p)
return nfserr_resource;
WRITEMEM(bcts->sessionid.data, NFS4_MAX_SESSIONID_LEN);
- WRITE32(bcts->dir);
+ *p++ = cpu_to_be32(bcts->dir);
/* Sorry, we do not yet support RDMA over 4.1: */
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
}
return nfserr;
}
@@ -2826,9 +2825,9 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
if (!p)
return nfserr_resource;
write_cinfo(&p, &create->cr_cinfo);
- WRITE32(2);
- WRITE32(create->cr_bmval[0]);
- WRITE32(create->cr_bmval[1]);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(create->cr_bmval[0]);
+ *p++ = cpu_to_be32(create->cr_bmval[1]);
}
return nfserr;
}
@@ -2861,7 +2860,7 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh
p = xdr_reserve_space(xdr, len + 4);
if (!p)
return nfserr_resource;
- WRITE32(len);
+ *p++ = cpu_to_be32(len);
WRITEMEM(&fhp->fh_handle.fh_base, len);
}
return nfserr;
@@ -2893,14 +2892,14 @@ again:
}
WRITE64(ld->ld_start);
WRITE64(ld->ld_length);
- WRITE32(ld->ld_type);
+ *p++ = cpu_to_be32(ld->ld_type);
if (conf->len) {
WRITEMEM(&ld->ld_clientid, 8);
- WRITE32(conf->len);
+ *p++ = cpu_to_be32(conf->len);
WRITEMEM(conf->data, conf->len);
} else { /* non - nfsv4 lock in conflict, no clientid nor owner */
WRITE64((u64)0); /* clientid */
- WRITE32(0); /* length of owner name */
+ *p++ = cpu_to_be32(0); /* length of owner name */
}
return nfserr_denied;
}
@@ -2972,11 +2971,11 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
if (!p)
return nfserr_resource;
write_cinfo(&p, &open->op_cinfo);
- WRITE32(open->op_rflags);
- WRITE32(2);
- WRITE32(open->op_bmval[0]);
- WRITE32(open->op_bmval[1]);
- WRITE32(open->op_delegate_type);
+ *p++ = cpu_to_be32(open->op_rflags);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(open->op_bmval[0]);
+ *p++ = cpu_to_be32(open->op_bmval[1]);
+ *p++ = cpu_to_be32(open->op_delegate_type);

switch (open->op_delegate_type) {
case NFS4_OPEN_DELEGATE_NONE:
@@ -2988,15 +2987,15 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
p = xdr_reserve_space(xdr, 20);
if (!p)
return nfserr_resource;
- WRITE32(open->op_recall);
+ *p++ = cpu_to_be32(open->op_recall);

/*
* TODO: ACE's in delegations
*/
- WRITE32(NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE);
- WRITE32(0);
- WRITE32(0);
- WRITE32(0); /* XXX: is NULL principal ok? */
+ *p++ = cpu_to_be32(NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0); /* XXX: is NULL principal ok? */
break;
case NFS4_OPEN_DELEGATE_WRITE:
nfserr = nfsd4_encode_stateid(xdr, &open->op_delegate_stateid);
@@ -3005,22 +3004,22 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
p = xdr_reserve_space(xdr, 32);
if (!p)
return nfserr_resource;
- WRITE32(0);
+ *p++ = cpu_to_be32(0);

/*
* TODO: space_limit's in delegations
*/
- WRITE32(NFS4_LIMIT_SIZE);
- WRITE32(~(u32)0);
- WRITE32(~(u32)0);
+ *p++ = cpu_to_be32(NFS4_LIMIT_SIZE);
+ *p++ = cpu_to_be32(~(u32)0);
+ *p++ = cpu_to_be32(~(u32)0);

/*
* TODO: ACE's in delegations
*/
- WRITE32(NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE);
- WRITE32(0);
- WRITE32(0);
- WRITE32(0); /* XXX: is NULL principal ok? */
+ *p++ = cpu_to_be32(NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0); /* XXX: is NULL principal ok? */
break;
case NFS4_OPEN_DELEGATE_NONE_EXT: /* 4.1 */
switch (open->op_why_no_deleg) {
@@ -3029,14 +3028,14 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
p = xdr_reserve_space(xdr, 8);
if (!p)
return nfserr_resource;
- WRITE32(open->op_why_no_deleg);
- WRITE32(0); /* deleg signaling not supported yet */
+ *p++ = cpu_to_be32(open->op_why_no_deleg);
+ *p++ = cpu_to_be32(0); /* deleg signaling not supported yet */
break;
default:
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- WRITE32(open->op_why_no_deleg);
+ *p++ = cpu_to_be32(open->op_why_no_deleg);
}
break;
default:
@@ -3310,8 +3309,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
return nfserr_resource;

/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
- WRITE32(0);
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
resp->xdr.buf->head[0].iov_len = ((char*)resp->xdr.p)
- (char*)resp->xdr.buf->head[0].iov_base;

@@ -3459,17 +3458,17 @@ nfsd4_do_encode_secinfo(struct xdr_stream *xdr,
p = xdr_reserve_space(xdr, 4 + 4 + XDR_LEN(info.oid.len) + 4 + 4);
if (!p)
goto out;
- WRITE32(RPC_AUTH_GSS);
- WRITE32(info.oid.len);
+ *p++ = cpu_to_be32(RPC_AUTH_GSS);
+ *p++ = cpu_to_be32(info.oid.len);
WRITEMEM(info.oid.data, info.oid.len);
- WRITE32(info.qop);
- WRITE32(info.service);
+ *p++ = cpu_to_be32(info.qop);
+ *p++ = cpu_to_be32(info.service);
} else if (pf < RPC_AUTH_MAXFLAVOR) {
supported++;
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out;
- WRITE32(pf);
+ *p++ = cpu_to_be32(pf);
} else {
if (report)
pr_warn("NFS: SECINFO: security flavor %u "
@@ -3519,16 +3518,16 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
if (!p)
return nfserr_resource;
if (nfserr) {
- WRITE32(3);
- WRITE32(0);
- WRITE32(0);
- WRITE32(0);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
}
else {
- WRITE32(3);
- WRITE32(setattr->sa_bmval[0]);
- WRITE32(setattr->sa_bmval[1]);
- WRITE32(setattr->sa_bmval[2]);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(setattr->sa_bmval[0]);
+ *p++ = cpu_to_be32(setattr->sa_bmval[1]);
+ *p++ = cpu_to_be32(setattr->sa_bmval[2]);
}
return nfserr;
}
@@ -3550,8 +3549,8 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n
p = xdr_reserve_space(xdr, 8);
if (!p)
return nfserr_resource;
- WRITE32(0);
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
}
return nfserr;
}
@@ -3566,8 +3565,8 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_w
p = xdr_reserve_space(xdr, 16);
if (!p)
return nfserr_resource;
- WRITE32(write->wr_bytes_written);
- WRITE32(write->wr_how_written);
+ *p++ = cpu_to_be32(write->wr_bytes_written);
+ *p++ = cpu_to_be32(write->wr_how_written);
WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
}
return nfserr;
@@ -3610,10 +3609,10 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr_resource;

WRITEMEM(&exid->clientid, 8);
- WRITE32(exid->seqid);
- WRITE32(exid->flags);
+ *p++ = cpu_to_be32(exid->seqid);
+ *p++ = cpu_to_be32(exid->flags);

- WRITE32(exid->spa_how);
+ *p++ = cpu_to_be32(exid->spa_how);

switch (exid->spa_how) {
case SP4_NONE:
@@ -3625,11 +3624,11 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr_resource;

/* spo_must_enforce bitmap: */
- WRITE32(2);
- WRITE32(nfs4_minimal_spo_must_enforce[0]);
- WRITE32(nfs4_minimal_spo_must_enforce[1]);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(nfs4_minimal_spo_must_enforce[0]);
+ *p++ = cpu_to_be32(nfs4_minimal_spo_must_enforce[1]);
/* empty spo_must_allow bitmap: */
- WRITE32(0);
+ *p++ = cpu_to_be32(0);

break;
default:
@@ -3649,15 +3648,15 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
/* The server_owner struct */
WRITE64(minor_id); /* Minor id */
/* major id */
- WRITE32(major_id_sz);
+ *p++ = cpu_to_be32(major_id_sz);
WRITEMEM(major_id, major_id_sz);

/* Server scope */
- WRITE32(server_scope_sz);
+ *p++ = cpu_to_be32(server_scope_sz);
WRITEMEM(server_scope, server_scope_sz);

/* Implementation id */
- WRITE32(0); /* zero length nfs_impl_id4 array */
+ *p++ = cpu_to_be32(0); /* zero length nfs_impl_id4 array */
return 0;
}

@@ -3675,43 +3674,43 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
if (!p)
return nfserr_resource;
WRITEMEM(sess->sessionid.data, NFS4_MAX_SESSIONID_LEN);
- WRITE32(sess->seqid);
- WRITE32(sess->flags);
+ *p++ = cpu_to_be32(sess->seqid);
+ *p++ = cpu_to_be32(sess->flags);

p = xdr_reserve_space(xdr, 28);
if (!p)
return nfserr_resource;
- WRITE32(0); /* headerpadsz */
- WRITE32(sess->fore_channel.maxreq_sz);
- WRITE32(sess->fore_channel.maxresp_sz);
- WRITE32(sess->fore_channel.maxresp_cached);
- WRITE32(sess->fore_channel.maxops);
- WRITE32(sess->fore_channel.maxreqs);
- WRITE32(sess->fore_channel.nr_rdma_attrs);
+ *p++ = cpu_to_be32(0); /* headerpadsz */
+ *p++ = cpu_to_be32(sess->fore_channel.maxreq_sz);
+ *p++ = cpu_to_be32(sess->fore_channel.maxresp_sz);
+ *p++ = cpu_to_be32(sess->fore_channel.maxresp_cached);
+ *p++ = cpu_to_be32(sess->fore_channel.maxops);
+ *p++ = cpu_to_be32(sess->fore_channel.maxreqs);
+ *p++ = cpu_to_be32(sess->fore_channel.nr_rdma_attrs);

if (sess->fore_channel.nr_rdma_attrs) {
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- WRITE32(sess->fore_channel.rdma_attrs);
+ *p++ = cpu_to_be32(sess->fore_channel.rdma_attrs);
}

p = xdr_reserve_space(xdr, 28);
if (!p)
return nfserr_resource;
- WRITE32(0); /* headerpadsz */
- WRITE32(sess->back_channel.maxreq_sz);
- WRITE32(sess->back_channel.maxresp_sz);
- WRITE32(sess->back_channel.maxresp_cached);
- WRITE32(sess->back_channel.maxops);
- WRITE32(sess->back_channel.maxreqs);
- WRITE32(sess->back_channel.nr_rdma_attrs);
+ *p++ = cpu_to_be32(0); /* headerpadsz */
+ *p++ = cpu_to_be32(sess->back_channel.maxreq_sz);
+ *p++ = cpu_to_be32(sess->back_channel.maxresp_sz);
+ *p++ = cpu_to_be32(sess->back_channel.maxresp_cached);
+ *p++ = cpu_to_be32(sess->back_channel.maxops);
+ *p++ = cpu_to_be32(sess->back_channel.maxreqs);
+ *p++ = cpu_to_be32(sess->back_channel.nr_rdma_attrs);

if (sess->back_channel.nr_rdma_attrs) {
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- WRITE32(sess->back_channel.rdma_attrs);
+ *p++ = cpu_to_be32(sess->back_channel.rdma_attrs);
}
return 0;
}
@@ -3730,12 +3729,12 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
if (!p)
return nfserr_resource;
WRITEMEM(seq->sessionid.data, NFS4_MAX_SESSIONID_LEN);
- WRITE32(seq->seqid);
- WRITE32(seq->slotid);
+ *p++ = cpu_to_be32(seq->seqid);
+ *p++ = cpu_to_be32(seq->slotid);
/* Note slotid's are numbered from zero: */
- WRITE32(seq->maxslots - 1); /* sr_highest_slotid */
- WRITE32(seq->maxslots - 1); /* sr_target_highest_slotid */
- WRITE32(seq->status_flags);
+ *p++ = cpu_to_be32(seq->maxslots - 1); /* sr_highest_slotid */
+ *p++ = cpu_to_be32(seq->maxslots - 1); /* sr_target_highest_slotid */
+ *p++ = cpu_to_be32(seq->status_flags);

resp->cstate.data_offset = xdr->buf->len; /* DRC cache data pointer */
return 0;
@@ -3882,7 +3881,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
WARN_ON_ONCE(1);
return;
}
- WRITE32(op->opnum);
+ *p++ = cpu_to_be32(op->opnum);
post_err_offset = xdr->buf->len;

if (op->opnum == OP_ILLEGAL)
@@ -3959,7 +3958,7 @@ nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op)
WARN_ON_ONCE(1);
return;
}
- WRITE32(op->opnum);
+ *p++ = cpu_to_be32(op->opnum);
*p++ = rp->rp_status; /* already xdr'ed */

WRITEMEM(rp->rp_buf, rp->rp_buflen);
--
1.7.9.5


2014-05-11 20:52:55

by J. Bruce Fields

Subject: [PATCH 02/43] nfsd4: tweak nfsd4_encode_getattr to take xdr_stream

From: "J. Bruce Fields" <[email protected]>

Just change the nfsd4_encode_getattr API. Not changing any code or
adding any new functionality yet.
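
The shape of the change (a stream-based encoder plus a thin wrapper for
callers that still hold a raw pointer and a word count) can be sketched
in userspace as below. This is a simplified illustration, not the
kernel code: struct sketch_stream is a hypothetical stand-in for struct
xdr_stream, sketch_encode_pair() stands in for the stream-taking
encoder, and sketch_encode_pair_to_buf() plays the role that
nfsd4_encode_fattr_to_buf() plays in the diff.

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical, much-reduced stand-in for struct xdr_stream. */
struct sketch_stream {
	uint32_t *p;	/* next free word */
	uint32_t *end;	/* one past the last usable word */
};

/* Stream-based encoder: returns 0 on success, -1 if out of space. */
static int sketch_encode_pair(struct sketch_stream *s, uint32_t a, uint32_t b)
{
	if (s->end - s->p < 2)
		return -1;
	*s->p++ = htonl(a);
	*s->p++ = htonl(b);
	return 0;
}

/*
 * Compatibility wrapper for callers that have only a buffer pointer
 * and a size in words: build a temporary stream, encode, then hand the
 * advanced pointer back to the caller.
 */
static int sketch_encode_pair_to_buf(uint32_t **p, int words,
				     uint32_t a, uint32_t b)
{
	struct sketch_stream s = { .p = *p, .end = *p + words };
	int ret = sketch_encode_pair(&s, a, b);

	*p = s.p;
	return ret;
}
```

Keeping the wrapper lets existing callers stay untouched while the core
encoder moves to the stream interface.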

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 6 +++---
fs/nfsd/nfs4xdr.c | 44 ++++++++++++++++++++++++++++++++------------
fs/nfsd/xdr4.h | 7 ++++---
3 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 46370f5..1499aa4 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1061,10 +1061,10 @@ _nfsd4_verify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfserr_jukebox;

p = buf;
- status = nfsd4_encode_fattr(&cstate->current_fh,
+ status = nfsd4_encode_fattr_to_buf(&p, count, &cstate->current_fh,
cstate->current_fh.fh_export,
- cstate->current_fh.fh_dentry, &p,
- count, verify->ve_bmval,
+ cstate->current_fh.fh_dentry,
+ verify->ve_bmval,
rqstp, 0);
/*
* If nfsd4_encode_fattr() ran out of space, assume that's because
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index ef65ffc..6cdd660 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2045,12 +2045,10 @@ static int get_parent_attributes(struct svc_export *exp, struct kstat *stat)
/*
* Note: @fhp can be NULL; in this case, we might have to compose the filehandle
* ourselves.
- *
- * countp is the buffer size in _words_
*/
__be32
-nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
- struct dentry *dentry, __be32 **buffer, int count, u32 *bmval,
+nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp, struct svc_export *exp,
+ struct dentry *dentry, u32 *bmval,
struct svc_rqst *rqstp, int ignore_crossmnt)
{
u32 bmval0 = bmval[0];
@@ -2059,12 +2057,12 @@ nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
struct kstat stat;
struct svc_fh *tempfh = NULL;
struct kstatfs statfs;
- int buflen = count << 2;
+ __be32 *p = xdr->p;
+ int buflen = xdr->buf->buflen;
__be32 *attrlenp;
u32 dummy;
u64 dummy64;
u32 rdattr_err = 0;
- __be32 *p = *buffer;
__be32 status;
int err;
int aclsupport = 0;
@@ -2491,7 +2489,7 @@ out_acl:
}

*attrlenp = htonl((char *)p - (char *)attrlenp - 4);
- *buffer = p;
+ xdr->p = p;
status = nfs_ok;

out:
@@ -2513,6 +2511,26 @@ out_resource:
goto out;
}

+__be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
+ struct svc_fh *fhp, struct svc_export *exp,
+ struct dentry *dentry, u32 *bmval,
+ struct svc_rqst *rqstp, int ignore_crossmnt)
+{
+ struct xdr_buf dummy = {
+ .head[0] = {
+ .iov_base = *p,
+ },
+ .buflen = words << 2,
+ };
+ struct xdr_stream xdr;
+ __be32 ret;
+
+ xdr_init_encode(&xdr, &dummy, NULL);
+ ret = nfsd4_encode_fattr(&xdr, fhp, exp, dentry, bmval, rqstp, ignore_crossmnt);
+ *p = xdr.p;
+ return ret;
+}
+
static inline int attributes_need_mount(u32 *bmval)
{
if (bmval[0] & ~(FATTR4_WORD0_RDATTR_ERROR | FATTR4_WORD0_LEASE_TIME))
@@ -2576,7 +2594,7 @@ nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd,

}
out_encode:
- nfserr = nfsd4_encode_fattr(NULL, exp, dentry, p, buflen, cd->rd_bmval,
+ nfserr = nfsd4_encode_fattr_to_buf(p, buflen, NULL, exp, dentry, cd->rd_bmval,
cd->rd_rqstp, ignore_crossmnt);
out_put:
dput(dentry);
@@ -2746,14 +2764,16 @@ static __be32
nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getattr *getattr)
{
struct svc_fh *fhp = getattr->ga_fhp;
- int buflen;
+ struct xdr_stream *xdr = &resp->xdr;
+ struct xdr_buf *buf = resp->xdr.buf;

if (nfserr)
return nfserr;

- buflen = resp->xdr.end - resp->xdr.p - (COMPOUND_ERR_SLACK_SPACE >> 2);
- nfserr = nfsd4_encode_fattr(fhp, fhp->fh_export, fhp->fh_dentry,
- &resp->xdr.p, buflen, getattr->ga_bmval,
+ buf->buflen = (void *)resp->xdr.end - (void *)resp->xdr.p
+ - COMPOUND_ERR_SLACK_SPACE;
+ nfserr = nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry,
+ getattr->ga_bmval,
resp->rqstp, 0);
return nfserr;
}
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 6884d70..f62a055 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -562,9 +562,10 @@ int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *,
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32);
void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *);
void nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op);
-__be32 nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
- struct dentry *dentry, __be32 **buffer, int countp,
- u32 *bmval, struct svc_rqst *, int ignore_crossmnt);
+__be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
+ struct svc_fh *fhp, struct svc_export *exp,
+ struct dentry *dentry,
+ u32 *bmval, struct svc_rqst *, int ignore_crossmnt);
extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp,
struct nfsd4_compound_state *,
struct nfsd4_setclientid *setclid);
--
1.7.9.5
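
The nfsd4_encode_fattr_to_buf() wrapper above keeps the old "raw pointer
plus word count" calling convention alive on top of a stream-based
encoder. A minimal userspace sketch of that adapter pattern follows. It
is not the kernel code: struct toy_stream and both toy_* functions are
hypothetical stand-ins, and only the shape of the wrapper is meant to
match.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much simplified stand-in for struct xdr_stream. */
struct toy_stream {
	uint32_t *p;	/* next free word */
	uint32_t *end;	/* one past the last usable word */
};

/* Stream-based encoder: copies n words, or fails without writing. */
int toy_encode_words(struct toy_stream *xdr, const uint32_t *src, size_t n)
{
	size_t i;

	if ((size_t)(xdr->end - xdr->p) < n)
		return -1;
	for (i = 0; i < n; i++)
		*xdr->p++ = src[i];
	return 0;
}

/*
 * Adapter for callers that still hold a raw pointer and a word count:
 * build a temporary stream over the caller's buffer, encode through it,
 * then hand the advanced pointer back.
 */
int toy_encode_words_to_buf(uint32_t **p, size_t words,
			    const uint32_t *src, size_t n)
{
	struct toy_stream xdr = { .p = *p, .end = *p + words };
	int ret = toy_encode_words(&xdr, src, n);

	*p = xdr.p;
	return ret;
}
```

As in the patch, the caller's pointer is updated from the stream whether
or not the encode succeeded; in this sketch a failed encode leaves it
where it was.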


2014-05-11 20:53:00

by J. Bruce Fields

Subject: [PATCH 19/43] nfsd4: remove redundant encode buffer size checking

From: "J. Bruce Fields" <[email protected]>

Now that all op encoders can handle running out of space, we no longer
need to check the remaining size for every operation; only nonidempotent
operations need that check, and that can be done by
nfsd4_check_resp_size.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 14 --------------
fs/nfsd/nfs4xdr.c | 22 +++++++++++++---------
2 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index d9fe000..d50fc14 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1228,7 +1228,6 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate = &resp->cstate;
struct svc_fh *current_fh = &cstate->current_fh;
struct svc_fh *save_fh = &cstate->save_fh;
- int slack_bytes;
u32 plen = 0;
__be32 status;

@@ -1282,19 +1281,6 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
goto encode_op;
}

- /* We must be able to encode a successful response to
- * this operation, with enough room left over to encode a
- * failed response to the next operation. If we don't
- * have enough room, fail with ERR_RESOURCE.
- */
- slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;
- if (slack_bytes < COMPOUND_SLACK_SPACE
- + COMPOUND_ERR_SLACK_SPACE) {
- BUG_ON(slack_bytes < COMPOUND_ERR_SLACK_SPACE);
- op->status = nfserr_resource;
- goto encode_op;
- }
-
opdesc = OPDESC(op);

if (!current_fh->fh_dentry) {
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 4c036eb..a330dd7 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3736,20 +3736,24 @@ static nfsd4_enc nfsd4_enc_ops[] = {
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
{
struct xdr_buf *buf = &resp->rqstp->rq_res;
- struct nfsd4_session *session = NULL;
+ struct nfsd4_session *session = resp->cstate.session;
struct nfsd4_slot *slot = resp->cstate.slot;
+ int slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;

- if (!nfsd4_has_session(&resp->cstate))
- return 0;
+ if (nfsd4_has_session(&resp->cstate)) {

- session = resp->cstate.session;
+ if (buf->len + pad > session->se_fchannel.maxresp_sz)
+ return nfserr_rep_too_big;

- if (buf->len + pad > session->se_fchannel.maxresp_sz)
- return nfserr_rep_too_big;
+ if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
+ buf->len + pad > session->se_fchannel.maxresp_cached)
+ return nfserr_rep_too_big_to_cache;
+ }

- if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
- buf->len + pad > session->se_fchannel.maxresp_cached)
- return nfserr_rep_too_big_to_cache;
+ if (pad > slack_bytes) {
+ WARN_ON_ONCE(nfsd4_has_session(&resp->cstate));
+ return nfserr_resource;
+ }

return 0;
}
--
1.7.9.5


2014-05-11 20:53:03

by J. Bruce Fields

Subject: [PATCH 28/43] nfsd4: adjust buflen to session channel limit

From: "J. Bruce Fields" <[email protected]>

We can simplify session limit enforcement by restricting the xdr buflen
to the session size.

Also fix a preexisting bug: we should really have been taking into
account the auth-required space when comparing against session limits,
which are limits on the size of the entire rpc reply, including any krb5
overhead.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 11 +++++++++++
fs/nfsd/nfs4xdr.c | 24 ++++++++----------------
2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index dd7efe9..620d240 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2204,11 +2204,13 @@ nfsd4_sequence(struct svc_rqst *rqstp,
struct nfsd4_sequence *seq)
{
struct nfsd4_compoundres *resp = rqstp->rq_resp;
+ struct xdr_stream *xdr = &resp->xdr;
struct nfsd4_session *session;
struct nfs4_client *clp;
struct nfsd4_slot *slot;
struct nfsd4_conn *conn;
__be32 status;
+ int buflen;
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);

if (resp->opcnt != 1)
@@ -2277,6 +2279,15 @@ nfsd4_sequence(struct svc_rqst *rqstp,
if (status)
goto out_put_session;

+ buflen = (seq->cachethis) ?
+ session->se_fchannel.maxresp_cached :
+ session->se_fchannel.maxresp_sz;
+ status = (seq->cachethis) ? nfserr_rep_too_big_to_cache :
+ nfserr_rep_too_big;
+ if (xdr_restrict_buflen(xdr, buflen - 2 * RPC_MAX_AUTH_SIZE))
+ goto out_put_session;
+
+ status = nfs_ok;
/* Success! bump slot seqid */
slot->sl_seqid = seq->seqid;
slot->sl_flags |= NFSD4_SLOT_INUSE;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 4751fd4..0d8a18d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3759,25 +3759,17 @@ static nfsd4_enc nfsd4_enc_ops[] = {
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize)
{
struct xdr_buf *buf = &resp->rqstp->rq_res;
- struct nfsd4_session *session = resp->cstate.session;
+ struct nfsd4_slot *slot = resp->cstate.slot;

- if (nfsd4_has_session(&resp->cstate)) {
- struct nfsd4_slot *slot = resp->cstate.slot;
-
- if (buf->len + respsize > session->se_fchannel.maxresp_sz)
- return nfserr_rep_too_big;
-
- if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
- buf->len + respsize > session->se_fchannel.maxresp_cached)
- return nfserr_rep_too_big_to_cache;
- }
-
- if (buf->len + respsize > buf->buflen) {
- WARN_ON_ONCE(nfsd4_has_session(&resp->cstate));
+ if (buf->len + respsize <= buf->buflen)
+ return nfs_ok;
+ if (!nfsd4_has_session(&resp->cstate))
return nfserr_resource;
+ if (slot->sl_flags & NFSD4_SLOT_CACHETHIS) {
+ WARN_ON_ONCE(1);
+ return nfserr_rep_too_big_to_cache;
}
-
- return 0;
+ return nfserr_rep_too_big;
}

void
--
1.7.9.5


2014-05-22 15:13:26

by J. Bruce Fields

Subject: Re: [PATCH 01/43] nfsd4: embed xdr_stream in nfsd4_compoundres

On Fri, May 16, 2014 at 05:58:21PM +0800, Kinglong Mee wrote:
> On 5/12/2014 04:52, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > This is a mechanical transformation with no change in behavior.
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> > ---
> > fs/nfsd/nfs4proc.c | 12 +++++-----
> > fs/nfsd/nfs4state.c | 8 +++----
> > fs/nfsd/nfs4xdr.c | 65 ++++++++++++++++++++++++++-------------------------
> > fs/nfsd/xdr4.h | 4 +---
> > 4 files changed, 44 insertions(+), 45 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index 2c1ee70..46370f5 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -1268,13 +1268,13 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
> > u32 plen = 0;
> > __be32 status;
> >
> > - resp->xbuf = &rqstp->rq_res;
> > - resp->p = rqstp->rq_res.head[0].iov_base +
> > + resp->xdr.buf = &rqstp->rq_res;
> > + resp->xdr.p = rqstp->rq_res.head[0].iov_base +
> > rqstp->rq_res.head[0].iov_len;
>
> I'd prefer to use a temp pointer here, e.g. (struct xdr_stream *resp_xdr = &resp->xdr;).

Agreed. I move this code into a helper function a couple patches later
and define the temp pointer then.

--b.

>
> thanks,
> Kinglong Mee
>
> > - resp->tagp = resp->p;
> > + resp->tagp = resp->xdr.p;
> > /* reserve space for: taglen, tag, and opcnt */
> > - resp->p += 2 + XDR_QUADLEN(args->taglen);
> > - resp->end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
> > + resp->xdr.p += 2 + XDR_QUADLEN(args->taglen);
> > + resp->xdr.end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
> > resp->taglen = args->taglen;
> > resp->tag = args->tag;
> > resp->opcnt = 0;
> > @@ -1326,7 +1326,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
> > * failed response to the next operation. If we don't
> > * have enough room, fail with ERR_RESOURCE.
> > */
> > - slack_bytes = (char *)resp->end - (char *)resp->p;
> > + slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;
> > if (slack_bytes < COMPOUND_SLACK_SPACE
> > + COMPOUND_ERR_SLACK_SPACE) {
> > BUG_ON(slack_bytes < COMPOUND_ERR_SLACK_SPACE);
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index fac2683..05cc3eb 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -1569,10 +1569,10 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
> > slot->sl_datalen = 0;
> > return;
> > }
> > - slot->sl_datalen = (char *)resp->p - (char *)resp->cstate.datap;
> > + slot->sl_datalen = (char *)resp->xdr.p - (char *)resp->cstate.datap;
> > base = (char *)resp->cstate.datap -
> > - (char *)resp->xbuf->head[0].iov_base;
> > - if (read_bytes_from_xdr_buf(resp->xbuf, base, slot->sl_data,
> > + (char *)resp->xdr.buf->head[0].iov_base;
> > + if (read_bytes_from_xdr_buf(resp->xdr.buf, base, slot->sl_data,
> > slot->sl_datalen))
> > WARN("%s: sessions DRC could not cache compound\n", __func__);
> > return;
> > @@ -1626,7 +1626,7 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
> > memcpy(resp->cstate.datap, slot->sl_data, slot->sl_datalen);
> >
> > resp->opcnt = slot->sl_opcnt;
> > - resp->p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
> > + resp->xdr.p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
> > status = slot->sl_status;
> >
> > return status;
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 18881f3..ef65ffc 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -1747,10 +1747,10 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
> > }
> >
> > #define RESERVE_SPACE(nbytes) do { \
> > - p = resp->p; \
> > - BUG_ON(p + XDR_QUADLEN(nbytes) > resp->end); \
> > + p = resp->xdr.p; \
> > + BUG_ON(p + XDR_QUADLEN(nbytes) > resp->xdr.end); \
> > } while (0)
> > -#define ADJUST_ARGS() resp->p = p
> > +#define ADJUST_ARGS() resp->xdr.p = p
> >
> > /* Encode as an array of strings the string given with components
> > * separated @sep, escaped with esc_enter and esc_exit.
> > @@ -2751,9 +2751,9 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
> > if (nfserr)
> > return nfserr;
> >
> > - buflen = resp->end - resp->p - (COMPOUND_ERR_SLACK_SPACE >> 2);
> > + buflen = resp->xdr.end - resp->xdr.p - (COMPOUND_ERR_SLACK_SPACE >> 2);
> > nfserr = nfsd4_encode_fattr(fhp, fhp->fh_export, fhp->fh_dentry,
> > - &resp->p, buflen, getattr->ga_bmval,
> > + &resp->xdr.p, buflen, getattr->ga_bmval,
> > resp->rqstp, 0);
> > return nfserr;
> > }
> > @@ -2953,7 +2953,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
> >
> > if (nfserr)
> > return nfserr;
> > - if (resp->xbuf->page_len)
> > + if (resp->xdr.buf->page_len)
> > return nfserr_resource;
> >
> > RESERVE_SPACE(8); /* eof flag and byte count */
> > @@ -2991,18 +2991,18 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
> > WRITE32(eof);
> > WRITE32(maxcount);
> > ADJUST_ARGS();
> > - resp->xbuf->head[0].iov_len = (char*)p
> > - - (char*)resp->xbuf->head[0].iov_base;
> > - resp->xbuf->page_len = maxcount;
> > + resp->xdr.buf->head[0].iov_len = (char*)p
> > + - (char*)resp->xdr.buf->head[0].iov_base;
> > + resp->xdr.buf->page_len = maxcount;
> >
> > /* Use rest of head for padding and remaining ops: */
> > - resp->xbuf->tail[0].iov_base = p;
> > - resp->xbuf->tail[0].iov_len = 0;
> > + resp->xdr.buf->tail[0].iov_base = p;
> > + resp->xdr.buf->tail[0].iov_len = 0;
> > if (maxcount&3) {
> > RESERVE_SPACE(4);
> > WRITE32(0);
> > - resp->xbuf->tail[0].iov_base += maxcount&3;
> > - resp->xbuf->tail[0].iov_len = 4 - (maxcount&3);
> > + resp->xdr.buf->tail[0].iov_base += maxcount&3;
> > + resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
> > ADJUST_ARGS();
> > }
> > return 0;
> > @@ -3017,7 +3017,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
> >
> > if (nfserr)
> > return nfserr;
> > - if (resp->xbuf->page_len)
> > + if (resp->xdr.buf->page_len)
> > return nfserr_resource;
> > if (!*resp->rqstp->rq_next_page)
> > return nfserr_resource;
> > @@ -3041,18 +3041,18 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
> >
> > WRITE32(maxcount);
> > ADJUST_ARGS();
> > - resp->xbuf->head[0].iov_len = (char*)p
> > - - (char*)resp->xbuf->head[0].iov_base;
> > - resp->xbuf->page_len = maxcount;
> > + resp->xdr.buf->head[0].iov_len = (char*)p
> > + - (char*)resp->xdr.buf->head[0].iov_base;
> > + resp->xdr.buf->page_len = maxcount;
> >
> > /* Use rest of head for padding and remaining ops: */
> > - resp->xbuf->tail[0].iov_base = p;
> > - resp->xbuf->tail[0].iov_len = 0;
> > + resp->xdr.buf->tail[0].iov_base = p;
> > + resp->xdr.buf->tail[0].iov_len = 0;
> > if (maxcount&3) {
> > RESERVE_SPACE(4);
> > WRITE32(0);
> > - resp->xbuf->tail[0].iov_base += maxcount&3;
> > - resp->xbuf->tail[0].iov_len = 4 - (maxcount&3);
> > + resp->xdr.buf->tail[0].iov_base += maxcount&3;
> > + resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
> > ADJUST_ARGS();
> > }
> > return 0;
> > @@ -3068,7 +3068,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
> >
> > if (nfserr)
> > return nfserr;
> > - if (resp->xbuf->page_len)
> > + if (resp->xdr.buf->page_len)
> > return nfserr_resource;
> > if (!*resp->rqstp->rq_next_page)
> > return nfserr_resource;
> > @@ -3080,7 +3080,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
> > WRITE32(0);
> > WRITE32(0);
> > ADJUST_ARGS();
> > - resp->xbuf->head[0].iov_len = ((char*)resp->p) - (char*)resp->xbuf->head[0].iov_base;
> > + resp->xdr.buf->head[0].iov_len = ((char*)resp->xdr.p)
> > + - (char*)resp->xdr.buf->head[0].iov_base;
> > tailbase = p;
> >
> > maxcount = PAGE_SIZE;
> > @@ -3121,14 +3122,14 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
> > p = readdir->buffer;
> > *p++ = 0; /* no more entries */
> > *p++ = htonl(readdir->common.err == nfserr_eof);
> > - resp->xbuf->page_len = ((char*)p) -
> > + resp->xdr.buf->page_len = ((char*)p) -
> > (char*)page_address(*(resp->rqstp->rq_next_page-1));
> >
> > /* Use rest of head for padding and remaining ops: */
> > - resp->xbuf->tail[0].iov_base = tailbase;
> > - resp->xbuf->tail[0].iov_len = 0;
> > - resp->p = resp->xbuf->tail[0].iov_base;
> > - resp->end = resp->p + (PAGE_SIZE - resp->xbuf->head[0].iov_len)/4;
> > + resp->xdr.buf->tail[0].iov_base = tailbase;
> > + resp->xdr.buf->tail[0].iov_len = 0;
> > + resp->xdr.p = resp->xdr.buf->tail[0].iov_base;
> > + resp->xdr.end = resp->xdr.p + (PAGE_SIZE - resp->xdr.buf->head[0].iov_len)/4;
> >
> > return 0;
> > err_no_verf:
> > @@ -3587,10 +3588,10 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
> > session = resp->cstate.session;
> >
> > if (xb->page_len == 0) {
> > - length = (char *)resp->p - (char *)xb->head[0].iov_base + pad;
> > + length = (char *)resp->xdr.p - (char *)xb->head[0].iov_base + pad;
> > } else {
> > if (xb->tail[0].iov_base && xb->tail[0].iov_len > 0)
> > - tlen = (char *)resp->p - (char *)xb->tail[0].iov_base;
> > + tlen = (char *)resp->xdr.p - (char *)xb->tail[0].iov_base;
> >
> > length = xb->head[0].iov_len + xb->page_len + tlen + pad;
> > }
> > @@ -3629,7 +3630,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> > op->status = nfsd4_check_resp_size(resp, 0);
> > if (so) {
> > so->so_replay.rp_status = op->status;
> > - so->so_replay.rp_buflen = (char *)resp->p - (char *)(statp+1);
> > + so->so_replay.rp_buflen = (char *)resp->xdr.p - (char *)(statp+1);
> > memcpy(so->so_replay.rp_buf, statp+1, so->so_replay.rp_buflen);
> > }
> > status:
> > @@ -3731,7 +3732,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
> > iov = &rqstp->rq_res.tail[0];
> > else
> > iov = &rqstp->rq_res.head[0];
> > - iov->iov_len = ((char*)resp->p) - (char*)iov->iov_base;
> > + iov->iov_len = ((char*)resp->xdr.p) - (char*)iov->iov_base;
> > BUG_ON(iov->iov_len > PAGE_SIZE);
> > if (nfsd4_has_session(cs)) {
> > struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> > diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> > index 5ea7df3..6884d70 100644
> > --- a/fs/nfsd/xdr4.h
> > +++ b/fs/nfsd/xdr4.h
> > @@ -506,9 +506,7 @@ struct nfsd4_compoundargs {
> >
> > struct nfsd4_compoundres {
> > /* scratch variables for XDR encode */
> > - __be32 * p;
> > - __be32 * end;
> > - struct xdr_buf * xbuf;
> > + struct xdr_stream xdr;
> > struct svc_rqst * rqstp;
> >
> > u32 taglen;
> >

2014-05-12 21:48:07

by J. Bruce Fields

Subject: Re: [PATCH 06/43] nfsd4: fix encoding of out-of-space replies

On Mon, May 12, 2014 at 01:18:01AM -0700, Christoph Hellwig wrote:
> On Sun, May 11, 2014 at 04:52:11PM -0400, J. Bruce Fields wrote:
...
> > - op->status = nfsd4_enc_ops[op->opnum](resp, op->status, &op->u);
> > + encoder = nfsd4_enc_ops[op->opnum];
> > + op->status = encoder(resp, op->status, &op->u);
>
> What is the point of the encoder variable that gets set and used a line
> later the only time?

I find the two new lines above a little easier to read than the single
original line. That's all. But I don't feel strongly about it.

--b.

2014-05-11 20:53:02

by J. Bruce Fields

Subject: [PATCH 25/43] nfsd4: minor encode_read cleanup

From: "J. Bruce Fields" <[email protected]>

---
fs/nfsd/nfs4xdr.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 19071d7..eb1694d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3076,18 +3076,20 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

len = maxcount;
v = 0;
- while (len > 0) {
+ while (len) {
+ int thislen;
+
page = *(resp->rqstp->rq_next_page);
if (!page) { /* ran out of pages */
maxcount -= len;
break;
}
+ thislen = min_t(long, len, PAGE_SIZE);
resp->rqstp->rq_vec[v].iov_base = page_address(page);
- resp->rqstp->rq_vec[v].iov_len =
- len < PAGE_SIZE ? len : PAGE_SIZE;
+ resp->rqstp->rq_vec[v].iov_len = thislen;
resp->rqstp->rq_next_page++;
v++;
- len -= PAGE_SIZE;
+ len -= thislen;
}
read->rd_vlen = v;

--
1.7.9.5
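
The cleanup above computes the per-page length once and subtracts
exactly that amount, so the remaining length reaches zero instead of
going negative on the last, partial page. A standalone sketch of the
same loop shape, assuming a 4096-byte page and a hypothetical helper
name, with the page allocation replaced by a plain array of lengths:

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed page size, for illustration */

/*
 * Split len bytes into page-sized chunks, recording each chunk length.
 * Returns the number of chunks used; stops early if max_chunks runs out.
 */
size_t toy_split_into_pages(unsigned long len, unsigned long *chunk_len,
			    size_t max_chunks)
{
	size_t v = 0;

	while (len && v < max_chunks) {
		unsigned long thislen =
			len < TOY_PAGE_SIZE ? len : TOY_PAGE_SIZE;

		chunk_len[v++] = thislen;
		len -= thislen;	/* never underflows: thislen <= len */
	}
	return v;
}
```

With len = 10000 this records chunks of 4096, 4096 and 1808 bytes.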


2014-05-12 05:41:15

by Christoph Hellwig

Subject: Re: [PATCH 05/43] nfsd4: move nfsd4_operation to xdr4.h

> +struct nfsd4_operation *OPDESC(struct nfsd4_op *op)

I don't think OPDESC is a good name for a non-static function.


But looking at the whole tree we only need the exposed information
in two simple places in nfs4xdr.c, so I'd suggest to export something
higher level instead, e.g. move nfsd4_max_reply into nfs4proc.c
and have a helper to check that the op doesn't modify anything and
warn if it does.


2014-05-12 16:08:02

by J. Bruce Fields

Subject: Re: nfsd4 xdr encoding fixes v2

On Mon, May 12, 2014 at 01:20:59AM -0700, Christoph Hellwig wrote:
> This series seems to cause hangs during xfstests against a server on the
> same VM. The trace is fairly similar every time the hang happens, but the
> point at which it happens differs:

Ouch, OK, and you're sure it starts with this series?

I guess I should try to replicate it here. Might take a couple days.

--b.

>
> [ 3120.186527] INFO: task fill:26222 blocked for more than 120 seconds.
> [ 3120.187607] Not tainted 3.15.0-rc1+ #22
> [ 3120.188424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3120.189765] fill D ffff88007a5b3c20 0 26222 26130 0x00000002
> [ 3120.191158] ffff88007a5b3b78 0000000000000046 ffff880079284f10 0000000000013dc0
> [ 3120.192666] ffff88007a5b3fd8 0000000000013dc0 ffff88007350cf10 ffff880079284f10
> [ 3120.195303] 0000000000000000 0000000000000002 0000000000000001 0000000000000000
> [ 3120.197980] Call Trace:
> [ 3120.198849] [<ffffffff8112ff2d>] ? __delayacct_blkio_start+0x1d/0x20
> [ 3120.200791] [<ffffffff810ead35>] ? prepare_to_wait+0x25/0x90
> [ 3120.202438] [<ffffffff811114f5>] ? ktime_get_ts+0x145/0x180
> [ 3120.204033] [<ffffffff8115ef50>] ? __lock_page+0x70/0x70
> [ 3120.205598] [<ffffffff8107c83f>] ? kvm_clock_read+0x1f/0x30
> [ 3120.207236] [<ffffffff8107c859>] ? kvm_clock_get_cycles+0x9/0x10
> [ 3120.209006] [<ffffffff81111464>] ? ktime_get_ts+0xb4/0x180
> [ 3120.210828] [<ffffffff8112ff2d>] ? __delayacct_blkio_start+0x1d/0x20
> [ 3120.212645] [<ffffffff8115ef50>] ? __lock_page+0x70/0x70
> [ 3120.214290] [<ffffffff81ce5294>] schedule+0x24/0x70
> [ 3120.216915] [<ffffffff81ce536a>] io_schedule+0x8a/0xd0
> [ 3120.218484] [<ffffffff8115ef59>] sleep_on_page+0x9/0x10
> [ 3120.219979] [<ffffffff81ce5a8a>] __wait_on_bit+0x5a/0x90
> [ 3120.221543] [<ffffffff8115e9cf>] ? find_get_pages_tag+0x1f/0x190
> [ 3120.223310] [<ffffffff8115f438>] wait_on_page_bit+0x78/0x80
> [ 3120.224934] [<ffffffff810eb240>] ? wake_atomic_t_function+0x30/0x30
> [ 3120.226755] [<ffffffff8115f5a2>] filemap_fdatawait_range+0x102/0x190
> [ 3120.228615] [<ffffffff8116033a>] filemap_write_and_wait_range+0x4a/0x80
> [ 3120.230640] [<ffffffff8135c00f>] nfs4_file_fsync+0x5f/0xb0
> [ 3120.232230] [<ffffffff811d70c1>] vfs_fsync+0x21/0x30
> [ 3120.233716] [<ffffffff8132a1fe>] nfs_file_flush+0x6e/0x90
> [ 3120.235261] [<ffffffff811a4ac5>] filp_close+0x35/0x80
> [ 3120.236758] [<ffffffff811c4844>] put_files_struct+0x94/0xe0
> [ 3120.238361] [<ffffffff811c494d>] exit_files+0x4d/0x60
> [ 3120.239863] [<ffffffff810ad947>] do_exit+0x297/0xa00
> [ 3120.241336] [<ffffffff811a91b8>] ? __sb_end_write+0x78/0x80
> [ 3120.242925] [<ffffffff81cea158>] ? retint_swapgs+0x13/0x1b
> [ 3120.244541] [<ffffffff810ae1d7>] do_group_exit+0x47/0xc0
> [ 3120.246129] [<ffffffff810ae262>] SyS_exit_group+0x12/0x20
> [ 3120.247960] [<ffffffff81cf24f9>] system_call_fastpath+0x16/0x1b
> [ 3120.249226] no locks held by fill/26222.
>

2014-05-11 20:52:54

by J. Bruce Fields

Subject: [PATCH 04/43] nfsd4: reserve head space for krb5 integ/priv info

From: "J. Bruce Fields" <[email protected]>

Currently if the nfs-level part of a reply would be too large, we'll
return an error to the client. But if the nfs-level part fits and
leaves no room for krb5p or krb5i stuff, then we just drop the request
entirely.

That's no good. Instead, reserve some slack space at the end of the
buffer and make sure we fail outright if we'd come close.

The slack space here is a massive overestimate of what's required; we
should probably try for a tighter limit at some point.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 6c049c4..3d4b044 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1259,7 +1259,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp, struct nfsd4_compoundres

xdr->buf = buf;
xdr->p = head->iov_base + head->iov_len;
- xdr->end = head->iov_base + PAGE_SIZE;
+ xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
}

/*
--
1.7.9.5
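
To make the size of that reservation concrete, the arithmetic can be
written out as a small helper. This is only a sketch: the 4096-byte page
and the value 400 for RPC_MAX_AUTH_SIZE are assumptions made for
illustration (check the tree you are building against), and
toy_usable_head_bytes() is a hypothetical name.

```c
#include <stddef.h>

/* Assumed values, for illustration only. */
#define TOY_PAGE_SIZE		4096
#define TOY_RPC_MAX_AUTH_SIZE	400

/*
 * Bytes still available for nfs-level encoding in the head page once
 * 2 * RPC_MAX_AUTH_SIZE is held back for krb5i/krb5p overhead.
 */
size_t toy_usable_head_bytes(size_t already_used)
{
	size_t limit = TOY_PAGE_SIZE - 2 * TOY_RPC_MAX_AUTH_SIZE;

	return already_used >= limit ? 0 : limit - already_used;
}
```

Under those assumptions the encoder's limit drops from 4096 to 3296
bytes of the head page, which is the "massive overestimate" the
changelog mentions.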


2014-05-12 05:34:01

by Christoph Hellwig

Subject: Re: [PATCH 01/43] nfsd4: embed xdr_stream in nfsd4_compoundres

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2014-05-12 14:19:02

by J. Bruce Fields

Subject: Re: [PATCH 39/43] nfsd4: really fix nfs4err_resource in 4.1 case

On Sun, May 11, 2014 at 10:33:17PM -0700, Christoph Hellwig wrote:
> > + if (op->status == nfserr_resource && nfsd4_has_session(&resp->cstate)) {
> > + struct nfsd4_slot *slot = resp->cstate.slot;
> > +
> > + if (slot->sl_flags & NFSD4_SLOT_CACHETHIS)
> > + op->status = nfserr_rep_too_big_to_cache;
> > + else
> > + op->status = nfserr_rep_too_big;
>
> There is a closing brace missing here, which breaks the compile for me.

Aie, sorry, I guess I didn't test anything after moving that patch back
in the series. I normally do a quick test-compile of whatever a patch
touched while rebasing, then run a set of regression tests before
posting. But in this case I think I finished the rebase in a hurry
Friday, then posted Sunday without remembering what exactly I'd last
done....

Anyway, fixed, new tests are running--thanks for the review.

--b.

2014-05-11 20:53:05

by J. Bruce Fields

Subject: [PATCH 34/43] nfsd4: estimate sequence response size

From: "J. Bruce Fields" <[email protected]>

Otherwise a following patch would turn off all 4.1 zero-copy reads.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index edd2eb1..84a850c 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1509,6 +1509,11 @@ static inline u32 nfsd4_rename_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op
+ op_encode_change_info_maxsz) * sizeof(__be32);
}

+static inline u32 nfsd4_sequence_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
+{
+ return NFS4_MAX_SESSIONID_LEN + 20;
+}
+
static inline u32 nfsd4_setattr_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
return (op_encode_hdr_size + nfs4_fattr_bitmap_maxsz) * sizeof(__be32);
--
1.7.9.5


2014-05-11 20:53:00

by J. Bruce Fields

Subject: [PATCH 22/43] nfsd4: convert 4.1 replay encoding

From: "J. Bruce Fields" <[email protected]>

Limits on maxresp_sz mean that we only ever need to replay rpc's that
are contained entirely in the head.

The one exception is very small zero-copy reads. That's an odd corner
case as clients wouldn't normally ask those to be cached.

In any case, this seems a little more robust.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 28 ++++++++++++++--------------
fs/nfsd/nfs4xdr.c | 2 +-
fs/nfsd/xdr4.h | 2 +-
3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d4c9683..dd7efe9 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1556,6 +1556,7 @@ out_err:
void
nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
{
+ struct xdr_buf *buf = resp->xdr.buf;
struct nfsd4_slot *slot = resp->cstate.slot;
unsigned int base;

@@ -1569,11 +1570,9 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
slot->sl_datalen = 0;
return;
}
- slot->sl_datalen = (char *)resp->xdr.p - (char *)resp->cstate.datap;
- base = (char *)resp->cstate.datap -
- (char *)resp->xdr.buf->head[0].iov_base;
- if (read_bytes_from_xdr_buf(resp->xdr.buf, base, slot->sl_data,
- slot->sl_datalen))
+ base = resp->cstate.data_offset;
+ slot->sl_datalen = buf->len - base;
+ if (read_bytes_from_xdr_buf(buf, base, slot->sl_data, slot->sl_datalen))
WARN("%s: sessions DRC could not cache compound\n", __func__);
return;
}
@@ -1614,7 +1613,8 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
struct nfsd4_sequence *seq)
{
struct nfsd4_slot *slot = resp->cstate.slot;
- struct kvec *head = resp->xdr.iov;
+ struct xdr_stream *xdr = &resp->xdr;
+ __be32 *p;
__be32 status;

dprintk("--> %s slot %p\n", __func__, slot);
@@ -1623,16 +1623,16 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
if (status)
return status;

- /* The sequence operation has been encoded, cstate->datap set. */
- memcpy(resp->cstate.datap, slot->sl_data, slot->sl_datalen);
+ p = xdr_reserve_space(xdr, slot->sl_datalen);
+ if (!p) {
+ WARN_ON_ONCE(1);
+ return nfserr_serverfault;
+ }
+ xdr_encode_opaque_fixed(p, slot->sl_data, slot->sl_datalen);
+ xdr_commit_encode(xdr);

resp->opcnt = slot->sl_opcnt;
- resp->xdr.p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
- head->iov_len = (void *)resp->xdr.p - head->iov_base;
- resp->xdr.buf->len = head->iov_len;
- status = slot->sl_status;
-
- return status;
+ return slot->sl_status;
}

/*
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index aedf19a..6c3ac43 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3641,7 +3641,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(seq->maxslots - 1); /* sr_target_highest_slotid */
WRITE32(seq->status_flags);

- resp->cstate.datap = p; /* DRC cache data pointer */
+ resp->cstate.data_offset = xdr->buf->len; /* DRC cache data pointer */
return 0;
}

diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 19bf3fc..d1c6e21 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -58,7 +58,7 @@ struct nfsd4_compound_state {
/* For sessions DRC */
struct nfsd4_session *session;
struct nfsd4_slot *slot;
- __be32 *datap;
+ int data_offset;
size_t iovlen;
u32 minorversion;
__be32 status;
--
1.7.9.5


2014-05-11 20:53:02

by J. Bruce Fields

Subject: [PATCH 23/43] nfsd4: don't try to encode conflicting owner if low on space

From: "J. Bruce Fields" <[email protected]>

I ran into this corner case in testing: in theory clients can provide
state owners up to 1024 bytes long. In the sessions case there might be
a risk of this pushing us over the DRC slot size.

The conflicting owner isn't really that important, so let's humor a
client that provides a small maxresponsesize_cached by allowing ourselves
to return without the conflicting owner instead of outright failing the
operation.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 3 ++-
fs/nfsd/nfs4xdr.c | 16 +++++++++++++---
2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 787aa9f..be638c1 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1381,7 +1381,8 @@ out:
#define op_encode_change_info_maxsz (5)
#define nfs4_fattr_bitmap_maxsz (4)

-#define op_encode_lockowner_maxsz (1 + XDR_QUADLEN(IDMAP_NAMESZ))
+/* We'll fall back on returning no lockowner if run out of space: */
+#define op_encode_lockowner_maxsz (0)
#define op_encode_lock_denied_maxsz (8 + op_encode_lockowner_maxsz)

#define nfs4_owner_maxsz (1 + XDR_QUADLEN(IDMAP_NAMESZ))
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 6c3ac43..d418d7e 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2862,9 +2862,20 @@ nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld)
struct xdr_netobj *conf = &ld->ld_owner;
__be32 *p;

+again:
p = xdr_reserve_space(xdr, 32 + XDR_LEN(conf->len));
- if (!p)
+ if (!p) {
+ /*
+ * Don't fail to return the result just because we can't
+ * return the conflicting open:
+ */
+ if (conf->len) {
+ conf->len = 0;
+ conf->data = NULL;
+ goto again;
+ }
return nfserr_resource;
+ }
WRITE64(ld->ld_start);
WRITE64(ld->ld_length);
WRITE32(ld->ld_type);
@@ -2872,7 +2883,6 @@ nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld)
WRITEMEM(&ld->ld_clientid, 8);
WRITE32(conf->len);
WRITEMEM(conf->data, conf->len);
- kfree(conf->data);
} else { /* non - nfsv4 lock in conflict, no clientid nor owner */
WRITE64((u64)0); /* clientid */
WRITE32(0); /* length of owner name */
@@ -2889,7 +2899,7 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lo
nfserr = nfsd4_encode_stateid(xdr, &lock->lk_resp_stateid);
else if (nfserr == nfserr_denied)
nfserr = nfsd4_encode_lock_denied(xdr, &lock->lk_denied);
-
+ kfree(lock->lk_denied.ld_owner.data);
return nfserr;
}

--
1.7.9.5
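
The fallback in nfsd4_encode_lock_denied() above can be illustrated outside the kernel. The following is only a minimal userspace sketch of the same retry idea, not the nfsd code: toy_stream, toy_netobj, toy_reserve() and toy_encode_lock_denied() are hypothetical stand-ins for struct xdr_stream, struct xdr_netobj, xdr_reserve_space() and the real encoder, and the 32-byte header is zero-filled apart from the owner length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's xdr_stream and xdr_netobj. */
struct toy_stream {
	uint8_t *p;	/* next free byte */
	uint8_t *end;	/* one past the last usable byte */
};

struct toy_netobj {
	uint32_t len;
	const uint8_t *data;
};

/* Round a byte count up to a multiple of 4, as XDR requires. */
static size_t toy_xdr_pad(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/* Reserve nbytes in the stream, or return NULL if they do not fit. */
static uint8_t *toy_reserve(struct toy_stream *xdr, size_t nbytes)
{
	uint8_t *p = xdr->p;

	assert(xdr->p <= xdr->end);
	if ((size_t)(xdr->end - xdr->p) < nbytes)
		return NULL;
	xdr->p += nbytes;
	return p;
}

/*
 * Encode a 32-byte fixed header followed by the padded owner.
 * If the owner does not fit, drop it and retry with an empty owner.
 * Returns 0 on success, -1 if not even the fixed header fits.
 */
static int toy_encode_lock_denied(struct toy_stream *xdr,
				  struct toy_netobj *conf)
{
	size_t total;
	uint8_t *p;

again:
	total = 32 + toy_xdr_pad(conf->len);
	p = toy_reserve(xdr, total);
	if (!p) {
		if (conf->len) {
			/* Retry once without the conflicting owner. */
			conf->len = 0;
			conf->data = NULL;
			goto again;
		}
		return -1;
	}
	memset(p, 0, total);		/* header placeholder and padding */
	p[28] = (uint8_t)(conf->len >> 24);	/* owner length, big-endian */
	p[29] = (uint8_t)(conf->len >> 16);
	p[30] = (uint8_t)(conf->len >> 8);
	p[31] = (uint8_t)conf->len;
	if (conf->len)
		memcpy(p + 32, conf->data, conf->len);
	return 0;
}
```

With a 40-byte buffer and a 16-byte owner, the first reservation (48 bytes) fails, the owner is cleared, and the second reservation (32 bytes) succeeds, so the caller still gets a reply that simply carries no owner.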


2014-05-16 09:58:34

by Kinglong Mee

Subject: Re: [PATCH 01/43] nfsd4: embed xdr_stream in nfsd4_compoundres

On 5/12/2014 04:52, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> This is a mechanical transformation with no change in behavior.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 12 +++++-----
> fs/nfsd/nfs4state.c | 8 +++----
> fs/nfsd/nfs4xdr.c | 65 ++++++++++++++++++++++++++-------------------------
> fs/nfsd/xdr4.h | 4 +---
> 4 files changed, 44 insertions(+), 45 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 2c1ee70..46370f5 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1268,13 +1268,13 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
> u32 plen = 0;
> __be32 status;
>
> - resp->xbuf = &rqstp->rq_res;
> - resp->p = rqstp->rq_res.head[0].iov_base +
> + resp->xdr.buf = &rqstp->rq_res;
> + resp->xdr.p = rqstp->rq_res.head[0].iov_base +
> rqstp->rq_res.head[0].iov_len;

I'd prefer using a temporary pointer here, e.g. (struct xdr_stream *resp_xdr = &resp->xdr;).

thanks,
Kinglong Mee

> - resp->tagp = resp->p;
> + resp->tagp = resp->xdr.p;
> /* reserve space for: taglen, tag, and opcnt */
> - resp->p += 2 + XDR_QUADLEN(args->taglen);
> - resp->end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
> + resp->xdr.p += 2 + XDR_QUADLEN(args->taglen);
> + resp->xdr.end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
> resp->taglen = args->taglen;
> resp->tag = args->tag;
> resp->opcnt = 0;
> @@ -1326,7 +1326,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
> * failed response to the next operation. If we don't
> * have enough room, fail with ERR_RESOURCE.
> */
> - slack_bytes = (char *)resp->end - (char *)resp->p;
> + slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;
> if (slack_bytes < COMPOUND_SLACK_SPACE
> + COMPOUND_ERR_SLACK_SPACE) {
> BUG_ON(slack_bytes < COMPOUND_ERR_SLACK_SPACE);
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index fac2683..05cc3eb 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1569,10 +1569,10 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
> slot->sl_datalen = 0;
> return;
> }
> - slot->sl_datalen = (char *)resp->p - (char *)resp->cstate.datap;
> + slot->sl_datalen = (char *)resp->xdr.p - (char *)resp->cstate.datap;
> base = (char *)resp->cstate.datap -
> - (char *)resp->xbuf->head[0].iov_base;
> - if (read_bytes_from_xdr_buf(resp->xbuf, base, slot->sl_data,
> + (char *)resp->xdr.buf->head[0].iov_base;
> + if (read_bytes_from_xdr_buf(resp->xdr.buf, base, slot->sl_data,
> slot->sl_datalen))
> WARN("%s: sessions DRC could not cache compound\n", __func__);
> return;
> @@ -1626,7 +1626,7 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
> memcpy(resp->cstate.datap, slot->sl_data, slot->sl_datalen);
>
> resp->opcnt = slot->sl_opcnt;
> - resp->p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
> + resp->xdr.p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
> status = slot->sl_status;
>
> return status;
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 18881f3..ef65ffc 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1747,10 +1747,10 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
> }
>
> #define RESERVE_SPACE(nbytes) do { \
> - p = resp->p; \
> - BUG_ON(p + XDR_QUADLEN(nbytes) > resp->end); \
> + p = resp->xdr.p; \
> + BUG_ON(p + XDR_QUADLEN(nbytes) > resp->xdr.end); \
> } while (0)
> -#define ADJUST_ARGS() resp->p = p
> +#define ADJUST_ARGS() resp->xdr.p = p
>
> /* Encode as an array of strings the string given with components
> * separated @sep, escaped with esc_enter and esc_exit.
> @@ -2751,9 +2751,9 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
> if (nfserr)
> return nfserr;
>
> - buflen = resp->end - resp->p - (COMPOUND_ERR_SLACK_SPACE >> 2);
> + buflen = resp->xdr.end - resp->xdr.p - (COMPOUND_ERR_SLACK_SPACE >> 2);
> nfserr = nfsd4_encode_fattr(fhp, fhp->fh_export, fhp->fh_dentry,
> - &resp->p, buflen, getattr->ga_bmval,
> + &resp->xdr.p, buflen, getattr->ga_bmval,
> resp->rqstp, 0);
> return nfserr;
> }
> @@ -2953,7 +2953,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
>
> if (nfserr)
> return nfserr;
> - if (resp->xbuf->page_len)
> + if (resp->xdr.buf->page_len)
> return nfserr_resource;
>
> RESERVE_SPACE(8); /* eof flag and byte count */
> @@ -2991,18 +2991,18 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
> WRITE32(eof);
> WRITE32(maxcount);
> ADJUST_ARGS();
> - resp->xbuf->head[0].iov_len = (char*)p
> - - (char*)resp->xbuf->head[0].iov_base;
> - resp->xbuf->page_len = maxcount;
> + resp->xdr.buf->head[0].iov_len = (char*)p
> + - (char*)resp->xdr.buf->head[0].iov_base;
> + resp->xdr.buf->page_len = maxcount;
>
> /* Use rest of head for padding and remaining ops: */
> - resp->xbuf->tail[0].iov_base = p;
> - resp->xbuf->tail[0].iov_len = 0;
> + resp->xdr.buf->tail[0].iov_base = p;
> + resp->xdr.buf->tail[0].iov_len = 0;
> if (maxcount&3) {
> RESERVE_SPACE(4);
> WRITE32(0);
> - resp->xbuf->tail[0].iov_base += maxcount&3;
> - resp->xbuf->tail[0].iov_len = 4 - (maxcount&3);
> + resp->xdr.buf->tail[0].iov_base += maxcount&3;
> + resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
> ADJUST_ARGS();
> }
> return 0;
> @@ -3017,7 +3017,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
>
> if (nfserr)
> return nfserr;
> - if (resp->xbuf->page_len)
> + if (resp->xdr.buf->page_len)
> return nfserr_resource;
> if (!*resp->rqstp->rq_next_page)
> return nfserr_resource;
> @@ -3041,18 +3041,18 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
>
> WRITE32(maxcount);
> ADJUST_ARGS();
> - resp->xbuf->head[0].iov_len = (char*)p
> - - (char*)resp->xbuf->head[0].iov_base;
> - resp->xbuf->page_len = maxcount;
> + resp->xdr.buf->head[0].iov_len = (char*)p
> + - (char*)resp->xdr.buf->head[0].iov_base;
> + resp->xdr.buf->page_len = maxcount;
>
> /* Use rest of head for padding and remaining ops: */
> - resp->xbuf->tail[0].iov_base = p;
> - resp->xbuf->tail[0].iov_len = 0;
> + resp->xdr.buf->tail[0].iov_base = p;
> + resp->xdr.buf->tail[0].iov_len = 0;
> if (maxcount&3) {
> RESERVE_SPACE(4);
> WRITE32(0);
> - resp->xbuf->tail[0].iov_base += maxcount&3;
> - resp->xbuf->tail[0].iov_len = 4 - (maxcount&3);
> + resp->xdr.buf->tail[0].iov_base += maxcount&3;
> + resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
> ADJUST_ARGS();
> }
> return 0;
> @@ -3068,7 +3068,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
>
> if (nfserr)
> return nfserr;
> - if (resp->xbuf->page_len)
> + if (resp->xdr.buf->page_len)
> return nfserr_resource;
> if (!*resp->rqstp->rq_next_page)
> return nfserr_resource;
> @@ -3080,7 +3080,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
> WRITE32(0);
> WRITE32(0);
> ADJUST_ARGS();
> - resp->xbuf->head[0].iov_len = ((char*)resp->p) - (char*)resp->xbuf->head[0].iov_base;
> + resp->xdr.buf->head[0].iov_len = ((char*)resp->xdr.p)
> + - (char*)resp->xdr.buf->head[0].iov_base;
> tailbase = p;
>
> maxcount = PAGE_SIZE;
> @@ -3121,14 +3122,14 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
> p = readdir->buffer;
> *p++ = 0; /* no more entries */
> *p++ = htonl(readdir->common.err == nfserr_eof);
> - resp->xbuf->page_len = ((char*)p) -
> + resp->xdr.buf->page_len = ((char*)p) -
> (char*)page_address(*(resp->rqstp->rq_next_page-1));
>
> /* Use rest of head for padding and remaining ops: */
> - resp->xbuf->tail[0].iov_base = tailbase;
> - resp->xbuf->tail[0].iov_len = 0;
> - resp->p = resp->xbuf->tail[0].iov_base;
> - resp->end = resp->p + (PAGE_SIZE - resp->xbuf->head[0].iov_len)/4;
> + resp->xdr.buf->tail[0].iov_base = tailbase;
> + resp->xdr.buf->tail[0].iov_len = 0;
> + resp->xdr.p = resp->xdr.buf->tail[0].iov_base;
> + resp->xdr.end = resp->xdr.p + (PAGE_SIZE - resp->xdr.buf->head[0].iov_len)/4;
>
> return 0;
> err_no_verf:
> @@ -3587,10 +3588,10 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
> session = resp->cstate.session;
>
> if (xb->page_len == 0) {
> - length = (char *)resp->p - (char *)xb->head[0].iov_base + pad;
> + length = (char *)resp->xdr.p - (char *)xb->head[0].iov_base + pad;
> } else {
> if (xb->tail[0].iov_base && xb->tail[0].iov_len > 0)
> - tlen = (char *)resp->p - (char *)xb->tail[0].iov_base;
> + tlen = (char *)resp->xdr.p - (char *)xb->tail[0].iov_base;
>
> length = xb->head[0].iov_len + xb->page_len + tlen + pad;
> }
> @@ -3629,7 +3630,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> op->status = nfsd4_check_resp_size(resp, 0);
> if (so) {
> so->so_replay.rp_status = op->status;
> - so->so_replay.rp_buflen = (char *)resp->p - (char *)(statp+1);
> + so->so_replay.rp_buflen = (char *)resp->xdr.p - (char *)(statp+1);
> memcpy(so->so_replay.rp_buf, statp+1, so->so_replay.rp_buflen);
> }
> status:
> @@ -3731,7 +3732,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
> iov = &rqstp->rq_res.tail[0];
> else
> iov = &rqstp->rq_res.head[0];
> - iov->iov_len = ((char*)resp->p) - (char*)iov->iov_base;
> + iov->iov_len = ((char*)resp->xdr.p) - (char*)iov->iov_base;
> BUG_ON(iov->iov_len > PAGE_SIZE);
> if (nfsd4_has_session(cs)) {
> struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index 5ea7df3..6884d70 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -506,9 +506,7 @@ struct nfsd4_compoundargs {
>
> struct nfsd4_compoundres {
> /* scratch variables for XDR encode */
> - __be32 * p;
> - __be32 * end;
> - struct xdr_buf * xbuf;
> + struct xdr_stream xdr;
> struct svc_rqst * rqstp;
>
> u32 taglen;
>
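
For illustration, the temporary-pointer style suggested in the reply above might look roughly like the sketch below. This is a userspace approximation, not the posted patch: toy_xdr_stream, toy_compoundres and toy_init_response() are hypothetical names, the page size is assumed to be 4096, and the head is modelled as an array of 32-bit words rather than a kvec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* Hypothetical, cut-down versions of xdr_stream and nfsd4_compoundres. */
struct toy_xdr_stream {
	uint32_t *p;	/* next free 32-bit word */
	uint32_t *end;	/* one past the last usable word */
};

struct toy_compoundres {
	struct toy_xdr_stream xdr;
	uint32_t *tagp;
	uint32_t taglen;
};

/* Number of 32-bit words needed to hold n bytes. */
static size_t toy_quadlen(size_t n)
{
	return (n + 3) >> 2;
}

/*
 * Set up the encode stream at the end of the already-used head space,
 * then skip over the words reserved for taglen, tag, and opcnt.
 */
static void toy_init_response(struct toy_compoundres *resp,
			      uint32_t *head_base, size_t head_len,
			      uint32_t taglen)
{
	/* Local alias, so each access avoids spelling out resp->xdr. */
	struct toy_xdr_stream *xdr = &resp->xdr;

	assert(head_len % 4 == 0);
	xdr->p = head_base + head_len / 4;
	xdr->end = head_base + TOY_PAGE_SIZE / 4;
	resp->tagp = xdr->p;
	/* reserve space for: taglen, tag, and opcnt */
	xdr->p += 2 + toy_quadlen(taglen);
	resp->taglen = taglen;
}
```

The alias changes nothing about behavior; it only shortens the repeated resp->xdr.p and resp->xdr.end expressions that the mechanical rename introduces.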

2014-05-13 14:47:47

by J. Bruce Fields

Subject: Re: [PATCH 04/43] nfsd4: reserve head space for krb5 integ/priv info

On Mon, May 12, 2014 at 10:05:53PM -0700, Christoph Hellwig wrote:
> On Mon, May 12, 2014 at 05:45:45PM -0400, J. Bruce Fields wrote:
> > Yes. At the end of this series we have RPC_MAX_AUTH_SIZE scattered
> > around in a few different places. Rather than have each place have some
> > flavor-specific logic I think I'd like the auth code to set an
> > rq_auth_slack field in the struct svc_rqst for code like this to use.
>
> That sounds pretty reasonable to me.

Here's an attempt.

(The limit still could be tightened a lot. RPC_MAX_AUTH_SIZE really has
nothing to do with the amount of extra space required for krb5i/p; it's
just a random constant that we happen to know is plenty large enough.)

--b.

commit a5f2429b2756a66c35aab463a2784f334718719f
Author: J. Bruce Fields <[email protected]>
Date: Mon May 12 18:10:58 2014 -0400

nfsd4: better reservation of head space for krb5

RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it
once in the auth code, where this kind of estimate should be made. And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 3ab29ba..d1885a1 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1211,13 +1211,13 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
xdr->buf = buf;
xdr->iov = head;
xdr->p = head->iov_base + head->iov_len;
- xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
+ xdr->end = head->iov_base + PAGE_SIZE - rqstp->rq_auth_slack;
/* Tail and page_len should be zero at this point: */
buf->len = buf->head[0].iov_len;
xdr->scratch.iov_len = 0;
xdr->page_ptr = buf->pages;
buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages)
- - 2 * RPC_MAX_AUTH_SIZE;
+ - rqstp->rq_auth_slack;
}

/*
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2526426..7336462 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2284,7 +2284,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
session->se_fchannel.maxresp_sz;
status = (seq->cachethis) ? nfserr_rep_too_big_to_cache :
nfserr_rep_too_big;
- if (xdr_restrict_buflen(xdr, buflen - 2 * RPC_MAX_AUTH_SIZE))
+ if (xdr_restrict_buflen(xdr, buflen - rqstp->rq_auth_slack))
goto out_put_session;
svc_reserve(rqstp, buflen);

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 53731b7..7a8259c 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1628,7 +1628,8 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_HEAD;
struct nfsd4_op *op;
bool cachethis = false;
- int max_reply = 2 * RPC_MAX_AUTH_SIZE; /* uh, kind of a guess */
+ int auth_slack = argp->rqstp->rq_auth_slack;
+ int max_reply = auth_slack;
int readcount = 0;
int readbytes = 0;
int i;
@@ -1693,7 +1694,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
svc_reserve(argp->rqstp, max_reply + readbytes);
argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE;

- if (readcount > 1 || max_reply > PAGE_SIZE - 2*RPC_MAX_AUTH_SIZE)
+ if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack)
argp->rqstp->rq_splice_ok = false;

DECODE_TAIL;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 39c50e1..b987bd9 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -259,7 +259,10 @@ struct svc_rqst {
void * rq_argp; /* decoded arguments */
void * rq_resp; /* xdr'd results */
void * rq_auth_data; /* flavor-specific data */
-
+ int rq_auth_slack; /* extra space xdr code
+ * should leave in head
+ * for krb5i, krb5p.
+ */
int rq_reserved; /* space on socket outq
* reserved for this request
*/
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 0f73f45..4ce5ecce 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -1503,6 +1503,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp)
if (unwrap_integ_data(rqstp, &rqstp->rq_arg,
gc->gc_seq, rsci->mechctx))
goto garbage_args;
+ rqstp->rq_auth_slack = RPC_MAX_AUTH_SIZE;
break;
case RPC_GSS_SVC_PRIVACY:
/* placeholders for length and seq. number: */
@@ -1511,6 +1512,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp)
if (unwrap_priv_data(rqstp, &rqstp->rq_arg,
gc->gc_seq, rsci->mechctx))
goto garbage_args;
+ rqstp->rq_auth_slack = RPC_MAX_AUTH_SIZE * 2;
break;
default:
goto auth_err;
diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c
index 2af7b0c..79c0f34 100644
--- a/net/sunrpc/svcauth.c
+++ b/net/sunrpc/svcauth.c
@@ -54,6 +54,8 @@ svc_authenticate(struct svc_rqst *rqstp, __be32 *authp)
}
spin_unlock(&authtab_lock);

+ rqstp->rq_auth_slack = 0;
+
rqstp->rq_authop = aops;
return aops->accept(rqstp, authp);
}
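
The effect of the patch above can be summarized in a small userspace sketch: the auth code picks the slack once per request, and the encoder subtracts it from the page. The names toy_gss_service, toy_auth_slack() and toy_head_space() are hypothetical, and the values 4096 and 400 for the page size and RPC_MAX_AUTH_SIZE are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE		4096	/* assumed page size */
#define TOY_MAX_AUTH_SIZE	400	/* assumed value of RPC_MAX_AUTH_SIZE */

/* Hypothetical mirror of the RPCSEC_GSS service levels. */
enum toy_gss_service {
	TOY_SVC_NONE,
	TOY_SVC_INTEGRITY,	/* krb5i */
	TOY_SVC_PRIVACY		/* krb5p */
};

/*
 * Slack the encoder should leave in the head: zero by default,
 * one auth-size unit for integrity, two for privacy, matching the
 * assignments to rq_auth_slack in the patch.
 */
static int toy_auth_slack(enum toy_gss_service svc)
{
	switch (svc) {
	case TOY_SVC_INTEGRITY:
		return TOY_MAX_AUTH_SIZE;
	case TOY_SVC_PRIVACY:
		return 2 * TOY_MAX_AUTH_SIZE;
	default:
		return 0;
	}
}

/* Bytes still available in the head page after reserving the slack. */
static size_t toy_head_space(size_t head_used, int auth_slack)
{
	size_t limit;

	assert(auth_slack >= 0 && auth_slack <= TOY_PAGE_SIZE);
	limit = TOY_PAGE_SIZE - (size_t)auth_slack;
	return head_used < limit ? limit - head_used : 0;
}
```

Under these assumed values, a request without krb5i/krb5p keeps the full page, while a krb5p request with 100 bytes already in the head has 4096 - 800 - 100 = 3196 bytes left for encoding.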