2014-05-22 19:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: xdr encoding fixes (v3)

Changes since the previous posting include:

- fix for a bug in "allow exotic read compounds" that appears to
have been the cause for xfstests hangs
- tighten the estimate of space required for
checksumming/encryption (in particular it can be zero in the
non-krb5i/krb5p case).
- fix some sloppy reply size estimates uncovered by the previous
change.

I'll soon start merging at least the first patches for 3.16.

Original introduction follows:

This is a collection of fixes for the NFS server's encoding of NFSv4
compounds, along with a few tangentially related cleanups and bugfixes I
noticed along the way.

The basic problem is that we've always assumed an rpc reply is either

- "small" (much less than a page), or
- looks like a read (a bunch of data with a little bit at the
beginning and the end).

That assumption has allowed us to cover the most important cases without
having to deal with some annoying details like how to encode arbitrary
data across a page boundary, but:

- The inability to encode attributes of more than a page annoys
some people who would like to get and set extraordinarily long
ACLs.

- The inability to encode attributes that cross page boundaries
also means we can't return more than a page of readdir data at
a time, limiting readdir performance on large directories.

- The NFSv4 protocol doesn't really allow us to place these
sorts of arbitrary limits on the types of compounds we handle.
(Well, 4.0 is a bit fuzzy on this point, but 4.1 I think
definitely considers it a bug if a server won't handle, e.g.,
multiple read ops in a compound.) This hasn't been an issue
because most of these exotic compounds aren't really useful to
clients. But maybe future clients will find a use for some of
them--in which case we'd prefer not to make the work around a
server that doesn't meet the spec.

So, the main goal is to fix those limitations. We also get to share a
little more code with the client.

Further work may include:

- writing more pynfs tests for exotic compounds and odd corner
cases,
- auditing the annoying nfsd4_*_rsize() functions, which we now
depend on for more things,
- improving our (currently very sloppy) estimate of how much
space we need for krb5i/krb5p to checksum/encrypt the result.
- sharing some of this with the v2/v3 code (especially in the
read and readdir cases),
- allow rpc's whose call and reply are both very large (our one
remaining dubious limit on compounds, though again something
clients seem unlikely to notice for now),
- on the decode side, eliminating the existing macros and
sharing more helpers with the client.

--b.


2014-05-22 19:32:43

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 34/52] nfsd4: adjust buflen to session channel limit

From: "J. Bruce Fields" <[email protected]>

We can simplify session limit enforcement by restricting the xdr buflen
to the session size.

Also fix a preexisting bug: we should really have been taking into
account the auth-required space when comparing against session limits,
which are limits on the size of the entire rpc reply, including any krb5
overhead.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 11 +++++++++++
fs/nfsd/nfs4xdr.c | 24 ++++++++----------------
2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index e936d9f..09be2e0 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2208,11 +2208,13 @@ nfsd4_sequence(struct svc_rqst *rqstp,
struct nfsd4_sequence *seq)
{
struct nfsd4_compoundres *resp = rqstp->rq_resp;
+ struct xdr_stream *xdr = &resp->xdr;
struct nfsd4_session *session;
struct nfs4_client *clp;
struct nfsd4_slot *slot;
struct nfsd4_conn *conn;
__be32 status;
+ int buflen;
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);

if (resp->opcnt != 1)
@@ -2281,6 +2283,15 @@ nfsd4_sequence(struct svc_rqst *rqstp,
if (status)
goto out_put_session;

+ buflen = (seq->cachethis) ?
+ session->se_fchannel.maxresp_cached :
+ session->se_fchannel.maxresp_sz;
+ status = (seq->cachethis) ? nfserr_rep_too_big_to_cache :
+ nfserr_rep_too_big;
+ if (xdr_restrict_buflen(xdr, buflen - 2 * RPC_MAX_AUTH_SIZE))
+ goto out_put_session;
+
+ status = nfs_ok;
/* Success! bump slot seqid */
slot->sl_seqid = seq->seqid;
slot->sl_flags |= NFSD4_SLOT_INUSE;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index ad594f2..311e281 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3767,25 +3767,17 @@ static nfsd4_enc nfsd4_enc_ops[] = {
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize)
{
struct xdr_buf *buf = &resp->rqstp->rq_res;
- struct nfsd4_session *session = resp->cstate.session;
+ struct nfsd4_slot *slot = resp->cstate.slot;

- if (nfsd4_has_session(&resp->cstate)) {
- struct nfsd4_slot *slot = resp->cstate.slot;
-
- if (buf->len + respsize > session->se_fchannel.maxresp_sz)
- return nfserr_rep_too_big;
-
- if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
- buf->len + respsize > session->se_fchannel.maxresp_cached)
- return nfserr_rep_too_big_to_cache;
- }
-
- if (buf->len + respsize > buf->buflen) {
- WARN_ON_ONCE(nfsd4_has_session(&resp->cstate));
+ if (buf->len + respsize <= buf->buflen)
+ return nfs_ok;
+ if (!nfsd4_has_session(&resp->cstate))
return nfserr_resource;
+ if (slot->sl_flags & NFSD4_SLOT_CACHETHIS) {
+ WARN_ON_ONCE(1);
+ return nfserr_rep_too_big_to_cache;
}
-
- return 0;
+ return nfserr_rep_too_big;
}

void
--
1.9.0


2014-05-22 19:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 05/52] nfsd4: read size estimate should include padding

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 26995a5..44cd477 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1491,7 +1491,7 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
if (rlen > maxcount)
rlen = maxcount;

- return (op_encode_hdr_size + 2) * sizeof(__be32) + rlen;
+ return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
}

static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
--
1.9.0


2014-05-22 19:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 14/52] nfsd4: use xdr_stream throughout compound encoding

From: "J. Bruce Fields" <[email protected]>

Note this makes ADJUST_ARGS useless; we'll remove it in the following
patch.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
fs/nfsd/nfs4xdr.c | 23 +++++++++++++++--------
2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 1433854..969a7b3 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1286,7 +1286,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
svcxdr_init_encode(rqstp, resp);
resp->tagp = resp->xdr.p;
/* reserve space for: taglen, tag, and opcnt */
- resp->xdr.p += 2 + XDR_QUADLEN(args->taglen);
+ xdr_reserve_space(&resp->xdr, 8 + args->taglen);
resp->taglen = args->taglen;
resp->tag = args->tag;
resp->opcnt = 0;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 0ab7aae..bc8c9e0 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1747,10 +1747,10 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
}

#define RESERVE_SPACE(nbytes) do { \
- p = resp->xdr.p; \
- BUG_ON(p + XDR_QUADLEN(nbytes) > resp->xdr.end); \
+ p = xdr_reserve_space(&resp->xdr, nbytes); \
+ BUG_ON(!p); \
} while (0)
-#define ADJUST_ARGS() resp->xdr.p = p
+#define ADJUST_ARGS() WARN_ON_ONCE(p != resp->xdr.p) \

/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
@@ -3056,8 +3056,11 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
&maxcount);

- if (nfserr)
+ if (nfserr) {
+ xdr->p -= 2;
+ xdr->iov->iov_len -= 8;
return nfserr;
+ }
eof = (read->rd_offset + maxcount >=
read->rd_fhp->fh_dentry->d_inode->i_size);

@@ -3110,9 +3113,12 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
*/
nfserr = nfsd_readlink(readlink->rl_rqstp, readlink->rl_fhp, page, &maxcount);
if (nfserr == nfserr_isdir)
- return nfserr_inval;
- if (nfserr)
+ nfserr = nfserr_inval;
+ if (nfserr) {
+ xdr->p--;
+ xdr->iov->iov_len -= 4;
return nfserr;
+ }

WRITE32(maxcount);
ADJUST_ARGS();
@@ -3213,8 +3219,9 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

return 0;
err_no_verf:
- p = savep;
- ADJUST_ARGS();
+ xdr->p = savep;
+ xdr->iov->iov_len = ((char *)resp->xdr.p)
+ - (char *)resp->xdr.buf->head[0].iov_base;
return nfserr;
}

--
1.9.0


2014-05-22 19:32:40

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 26/52] nfsd4: allow encoding across page boundaries

From: "J. Bruce Fields" <[email protected]>

After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 4 +++
fs/nfsd/nfs4xdr.c | 60 +++++++++++++++++++++++++----------
include/linux/sunrpc/svc.h | 1 +
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/svc_xprt.c | 1 +
net/sunrpc/xdr.c | 78 ++++++++++++++++++++++++++++++++++++++++++++--
6 files changed, 126 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index f8e7472..2517e0b 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1266,6 +1266,10 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
/* Tail and page_len should be zero at this point: */
buf->len = buf->head[0].iov_len;
+ xdr->scratch.iov_len = 0;
+ xdr->page_ptr = buf->pages;
+ buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages)
+ - 2 * RPC_MAX_AUTH_SIZE;
}

/*
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index f1a9716..1bac000 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1624,6 +1624,7 @@ static int nfsd4_max_reply(u32 opnum)
* the head and tail in another page:
*/
return 2 * PAGE_SIZE;
+ case OP_GETATTR:
case OP_READ:
return INT_MAX;
default:
@@ -2560,21 +2561,31 @@ out_resource:
goto out;
}

+static void svcxdr_init_encode_from_buffer(struct xdr_stream *xdr,
+ struct xdr_buf *buf, __be32 *p, int bytes)
+{
+ xdr->scratch.iov_len = 0;
+ memset(buf, 0, sizeof(struct xdr_buf));
+ buf->head[0].iov_base = p;
+ buf->head[0].iov_len = 0;
+ buf->len = 0;
+ xdr->buf = buf;
+ xdr->iov = buf->head;
+ xdr->p = p;
+ xdr->end = (void *)p + bytes;
+ buf->buflen = bytes;
+}
+
__be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
struct svc_fh *fhp, struct svc_export *exp,
struct dentry *dentry, u32 *bmval,
struct svc_rqst *rqstp, int ignore_crossmnt)
{
- struct xdr_buf dummy = {
- .head[0] = {
- .iov_base = *p,
- },
- .buflen = words << 2,
- };
+ struct xdr_buf dummy;
struct xdr_stream xdr;
__be32 ret;

- xdr_init_encode(&xdr, &dummy, NULL);
+ svcxdr_init_encode_from_buffer(&xdr, &dummy, *p, words << 2);
ret = nfsd4_encode_fattr(&xdr, fhp, exp, dentry, bmval, rqstp,
ignore_crossmnt);
*p = xdr.p;
@@ -3064,8 +3075,6 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

if (nfserr)
return nfserr;
- if (resp->xdr.buf->page_len)
- return nfserr_resource;

p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
if (!p)
@@ -3075,6 +3084,9 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
if (xdr->end - xdr->p < 1)
return nfserr_resource;

+ if (resp->xdr.buf->page_len)
+ return nfserr_resource;
+
maxcount = svc_max_payload(resp->rqstp);
if (maxcount > read->rd_length)
maxcount = read->rd_length;
@@ -3119,6 +3131,8 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
- (char *)resp->xdr.buf->head[0].iov_base);
resp->xdr.buf->page_len = maxcount;
xdr->buf->len += maxcount;
+ xdr->page_ptr += v;
+ xdr->buf->buflen = maxcount + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
@@ -3145,6 +3159,11 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd

if (nfserr)
return nfserr;
+
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
+
if (resp->xdr.buf->page_len)
return nfserr_resource;
if (!*resp->rqstp->rq_next_page)
@@ -3154,10 +3173,6 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd

maxcount = PAGE_SIZE;

- p = xdr_reserve_space(xdr, 4);
- if (!p)
- return nfserr_resource;
-
if (xdr->end - xdr->p < 1)
return nfserr_resource;

@@ -3180,6 +3195,8 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
- (char *)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
xdr->buf->len += maxcount;
+ xdr->page_ptr += 1;
+ xdr->buf->buflen -= PAGE_SIZE;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
@@ -3206,15 +3223,16 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

if (nfserr)
return nfserr;
- if (resp->xdr.buf->page_len)
- return nfserr_resource;
- if (!*resp->rqstp->rq_next_page)
- return nfserr_resource;

p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
if (!p)
return nfserr_resource;

+ if (resp->xdr.buf->page_len)
+ return nfserr_resource;
+ if (!*resp->rqstp->rq_next_page)
+ return nfserr_resource;
+
/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
WRITE32(0);
@@ -3266,6 +3284,10 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

xdr->iov = xdr->buf->tail;

+ xdr->page_ptr++;
+ xdr->buf->buflen -= PAGE_SIZE;
+ xdr->iov = xdr->buf->tail;
+
/* Use rest of head for padding and remaining ops: */
resp->xdr.buf->tail[0].iov_base = tailbase;
resp->xdr.buf->tail[0].iov_len = 0;
@@ -3800,6 +3822,8 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
!nfsd4_enc_ops[op->opnum]);
encoder = nfsd4_enc_ops[op->opnum];
op->status = encoder(resp, op->status, &op->u);
+ xdr_commit_encode(xdr);
+
/* nfsd4_check_resp_size guarantees enough room for error status */
if (!op->status) {
int space_needed = 0;
@@ -3919,6 +3943,8 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len +
buf->tail[0].iov_len);

+ rqstp->rq_next_page = resp->xdr.page_ptr + 1;
+
p = resp->tagp;
*p++ = htonl(resp->taglen);
memcpy(p, resp->tag, resp->taglen);
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 04e7632..39c50e1 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -244,6 +244,7 @@ struct svc_rqst {
struct page * rq_pages[RPCSVC_MAXPAGES];
struct page * *rq_respages; /* points into rq_pages */
struct page * *rq_next_page; /* next reply page to use */
+ struct page * *rq_page_end; /* one past the last page */

struct kvec rq_vec[RPCSVC_MAXPAGES]; /* generally useful.. */

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index e7bb2e3..b23d69f 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -215,6 +215,7 @@ typedef int (*kxdrdproc_t)(void *rqstp, struct xdr_stream *xdr, void *obj);

extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
+extern void xdr_commit_encode(struct xdr_stream *xdr);
extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
unsigned int base, unsigned int len);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 06c6ff0..baec792 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -597,6 +597,7 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
}
rqstp->rq_pages[i] = p;
}
+ rqstp->rq_page_end = &rqstp->rq_pages[i];
rqstp->rq_pages[i++] = NULL; /* this might be seen in nfs_read_actor */

/* Make arg->head point to first page and arg->pages point to rest */
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 8ae8ee7..40198c5 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -462,6 +462,7 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p)
struct kvec *iov = buf->head;
int scratch_len = buf->buflen - buf->page_len - buf->tail[0].iov_len;

+ xdr_set_scratch_buffer(xdr, NULL, 0);
BUG_ON(scratch_len < 0);
xdr->buf = buf;
xdr->iov = iov;
@@ -482,6 +483,74 @@ void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p)
EXPORT_SYMBOL_GPL(xdr_init_encode);

/**
+ * xdr_commit_encode - Ensure all data is written to buffer
+ * @xdr: pointer to xdr_stream
+ *
+ * We handle encoding across page boundaries by giving the caller a
+ * temporary location to write to, then later copying the data into
+ * place; xdr_commit_encode does that copying.
+ *
+ * Normally the caller doesn't need to call this directly, as the
+ * following xdr_reserve_space will do it. But an explicit call may be
+ * required at the end of encoding, or any other time when the xdr_buf
+ * data might be read.
+ */
+void xdr_commit_encode(struct xdr_stream *xdr)
+{
+ int shift = xdr->scratch.iov_len;
+ void *page;
+
+ if (shift == 0)
+ return;
+ page = page_address(*xdr->page_ptr);
+ memcpy(xdr->scratch.iov_base, page, shift);
+ memmove(page, page + shift, (void *)xdr->p - page);
+ xdr->scratch.iov_len = 0;
+}
+EXPORT_SYMBOL_GPL(xdr_commit_encode);
+
+__be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, size_t nbytes)
+{
+ static __be32 *p;
+ int space_left;
+ int frag1bytes, frag2bytes;
+
+ if (nbytes > PAGE_SIZE)
+ return NULL; /* Bigger buffers require special handling */
+ if (xdr->buf->len + nbytes > xdr->buf->buflen)
+ return NULL; /* Sorry, we're totally out of space */
+ frag1bytes = (xdr->end - xdr->p) << 2;
+ frag2bytes = nbytes - frag1bytes;
+ if (xdr->iov)
+ xdr->iov->iov_len += frag1bytes;
+ else {
+ xdr->buf->page_len += frag1bytes;
+ xdr->page_ptr++;
+ }
+ xdr->iov = NULL;
+ /*
+ * If the last encode didn't end exactly on a page boundary, the
+ * next one will straddle boundaries. Encode into the next
+ * page, then copy it back later in xdr_commit_encode. We use
+ * the "scratch" iov to track any temporarily unused fragment of
+ * space at the end of the previous buffer:
+ */
+ xdr->scratch.iov_base = xdr->p;
+ xdr->scratch.iov_len = frag1bytes;
+ p = page_address(*xdr->page_ptr);
+ /*
+ * Note this is where the next encode will start after we've
+ * shifted this one back:
+ */
+ xdr->p = (void *)p + frag2bytes;
+ space_left = xdr->buf->buflen - xdr->buf->len;
+ xdr->end = (void *)p + min_t(int, space_left, PAGE_SIZE);
+ xdr->buf->page_len += frag2bytes;
+ xdr->buf->len += nbytes;
+ return p;
+}
+
+/**
* xdr_reserve_space - Reserve buffer space for sending
* @xdr: pointer to xdr_stream
* @nbytes: number of bytes to reserve
@@ -495,14 +564,18 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
__be32 *p = xdr->p;
__be32 *q;

+ xdr_commit_encode(xdr);
/* align nbytes on the next 32-bit boundary */
nbytes += 3;
nbytes &= ~3;
q = p + (nbytes >> 2);
if (unlikely(q > xdr->end || q < p))
- return NULL;
+ return xdr_get_next_encode_buffer(xdr, nbytes);
xdr->p = q;
- xdr->iov->iov_len += nbytes;
+ if (xdr->iov)
+ xdr->iov->iov_len += nbytes;
+ else
+ xdr->buf->page_len += nbytes;
xdr->buf->len += nbytes;
return p;
}
@@ -539,6 +612,7 @@ void xdr_truncate_encode(struct xdr_stream *xdr, size_t len)
WARN_ON_ONCE(1);
return;
}
+ xdr_commit_encode(xdr);

fraglen = min_t(int, buf->len - len, tail->iov_len);
tail->iov_len -= fraglen;
--
1.9.0


2014-05-28 14:13:12

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On 05/28/2014 10:01 AM, J. Bruce Fields wrote:
> On Wed, May 28, 2014 at 01:09:45AM -0700, Christoph Hellwig wrote:
>> On Thu, May 22, 2014 at 03:32:16PM -0400, J. Bruce Fields wrote:
>>> Later patches handle those "exotic compounds", this one just makes sure
>>> zero-copy is turned off in those cases.
>> How did you test these exotic compounds?
> I have is a pynfs test that sends a compound with multiple reads in it.
>
> I don't think that's pushed out to my regular pynfs tree, I'll try to do
> that today.
>
> I could really use more of those. Maybe I'm wrong, but I'm worried less
> about this case than the more finicky out-of-reply-space cases, where I
> do have patches puporting to fix problems that I haven't really
> verified.

I'll eventually be using this case for READ_PLUS. I'll be sure to send problems your way! :)

Anna
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2014-05-22 19:32:37

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 22/52] nfsd4: reserve space before inlining 0-copy pages

From: "J. Bruce Fields" <[email protected]>

Once we've included page-cache pages in the encoding it's difficult to
remove them and restart encoding. (xdr_truncate_encode doesn't handle
that case.) So, make sure we'll have adequate space to finish the
operation first.

For now COMPOUND_SLACK_SPACE checks should prevent this case happening,
but we want to remove those checks.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 9c532eb..9fb1b1a 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3071,6 +3071,10 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
if (!p)
return nfserr_resource;

+ /* Make sure there will be room for padding if needed: */
+ if (xdr->end - xdr->p < 1)
+ return nfserr_resource;
+
maxcount = svc_max_payload(resp->rqstp);
if (maxcount > read->rd_length)
maxcount = read->rd_length;
@@ -3122,8 +3126,6 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
p = xdr_reserve_space(xdr, 4);
- if (!p)
- return nfserr_resource;
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
@@ -3156,6 +3158,9 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
if (!p)
return nfserr_resource;

+ if (xdr->end - xdr->p < 1)
+ return nfserr_resource;
+
/*
* XXX: By default, the ->readlink() VFS op will truncate symlinks
* if they would overflow the buffer. Is this kosher in NFSv4? If
@@ -3182,8 +3187,6 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
p = xdr_reserve_space(xdr, 4);
- if (!p)
- return nfserr_resource;
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
--
1.9.0


2014-05-28 08:09:45

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On Thu, May 22, 2014 at 03:32:16PM -0400, J. Bruce Fields wrote:
> Later patches handle those "exotic compounds", this one just makes sure
> zero-copy is turned off in those cases.

How did you test these exotic compounds?


2014-05-22 19:32:32

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 18/52] rpc: xdr_truncate_encode

From: "J. Bruce Fields" <[email protected]>

This will be used in the server side in a few cases:
- when certain operations (read, readdir, readlink) fail after
encoding a partial response.
- when we run out of space after encoding a partial response.
- in readlink, where we initially reserve PAGE_SIZE bytes for
data, then truncate to the actual size.

Signed-off-by: J. Bruce Fields <[email protected]>
---
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/xdr.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 68 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 15f9204..e7bb2e3 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -215,6 +215,7 @@ typedef int (*kxdrdproc_t)(void *rqstp, struct xdr_stream *xdr, void *obj);

extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p);
extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
+extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
unsigned int base, unsigned int len);
extern unsigned int xdr_stream_pos(const struct xdr_stream *xdr);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index dd97ba3..8ae8ee7 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -509,6 +509,73 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
EXPORT_SYMBOL_GPL(xdr_reserve_space);

/**
+ * xdr_truncate_encode - truncate an encode buffer
+ * @xdr: pointer to xdr_stream
+ * @len: new length of buffer
+ *
+ * Truncates the xdr stream, so that xdr->buf->len == len,
+ * and xdr->p points at offset len from the start of the buffer, and
+ * head, tail, and page lengths are adjusted to correspond.
+ *
+ * If this means moving xdr->p to a different buffer, we assume that
+ * that the end pointer should be set to the end of the current page,
+ * except in the case of the head buffer when we assume the head
+ * buffer's current length represents the end of the available buffer.
+ *
+ * This is *not* safe to use on a buffer that already has inlined page
+ * cache pages (as in a zero-copy server read reply), except for the
+ * simple case of truncating from one position in the tail to another.
+ *
+ */
+void xdr_truncate_encode(struct xdr_stream *xdr, size_t len)
+{
+ struct xdr_buf *buf = xdr->buf;
+ struct kvec *head = buf->head;
+ struct kvec *tail = buf->tail;
+ int fraglen;
+ int new, old;
+
+ if (len > buf->len) {
+ WARN_ON_ONCE(1);
+ return;
+ }
+
+ fraglen = min_t(int, buf->len - len, tail->iov_len);
+ tail->iov_len -= fraglen;
+ buf->len -= fraglen;
+ if (tail->iov_len && buf->len == len) {
+ xdr->p = tail->iov_base + tail->iov_len;
+ /* xdr->end, xdr->iov should be set already */
+ return;
+ }
+ WARN_ON_ONCE(fraglen);
+ fraglen = min_t(int, buf->len - len, buf->page_len);
+ buf->page_len -= fraglen;
+ buf->len -= fraglen;
+
+ new = buf->page_base + buf->page_len;
+ old = new + fraglen;
+ xdr->page_ptr -= (old >> PAGE_SHIFT) - (new >> PAGE_SHIFT);
+
+ if (buf->page_len && buf->len == len) {
+ xdr->p = page_address(*xdr->page_ptr);
+ xdr->end = (void *)xdr->p + PAGE_SIZE;
+ xdr->p = (void *)xdr->p + (new % PAGE_SIZE);
+ /* xdr->iov should already be NULL */
+ return;
+ }
+ if (fraglen)
+ xdr->end = head->iov_base + head->iov_len;
+ /* (otherwise assume xdr->end is already set) */
+ head->iov_len = len;
+ buf->len = len;
+ xdr->p = head->iov_base + head->iov_len;
+ xdr->iov = buf->head;
+ xdr->page_ptr -= 1;
+}
+EXPORT_SYMBOL(xdr_truncate_encode);
+
+/**
* xdr_write_pages - Insert a list of pages into an XDR buffer for sending
* @xdr: pointer to xdr_stream
* @pages: list of pages
--
1.9.0


2014-05-22 19:32:43

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 32/52] nfsd4: fix buflen calculation after read encoding

From: "J. Bruce Fields" <[email protected]>

We don't necessarily want to assume that the buflen is the same
as the number of bytes available in the pages. We may have some reason
to set it to something less (for example, later patches will use a
smaller buflen to enforce session limits).

So, calculate the buflen relative to the previous buflen instead of
recalculating it from scratch.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 91d0b9a..ad594f2 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3053,6 +3053,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
unsigned long maxcount;
struct xdr_stream *xdr = &resp->xdr;
int starting_len = xdr->buf->len;
+ int space_left;
long len;
__be32 *p;

@@ -3117,7 +3118,6 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
resp->xdr.buf->page_len = maxcount;
xdr->buf->len += maxcount;
xdr->page_ptr += v;
- xdr->buf->buflen = maxcount + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
@@ -3130,6 +3130,12 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
xdr->buf->len -= (maxcount&3);
}
+
+ space_left = min_t(int, (void *)xdr->end - (void *)xdr->p,
+ xdr->buf->buflen - xdr->buf->len);
+ xdr->buf->buflen = xdr->buf->len + space_left;
+ xdr->end = (__be32 *)((void *)xdr->end + space_left);
+
return 0;
}

--
1.9.0


2014-05-22 19:32:34

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 13/52] nfsd4: use xdr_reserve_space in attribute encoding

From: "J. Bruce Fields" <[email protected]>

This is a cosmetic change for now; no change in behavior.

Note we're just depending on xdr_reserve_space to do the bounds checking
for us, we're not really depending on its adjustment of iovec or xdr_buf
lengths yet, as those are fixed up by as necessary after the fact by
read-link operations and by nfs4svc_encode_compoundres. However we do
have to update xdr->iov on read-like operations to prevent
xdr_reserve_space from messing with the already-fixed-up length of the
the head.

When the attribute encoding fails partway through we have to undo the
length adjustments made so far. We do it manually for now, but later
patches will add an xdr_truncate_encode() helper to handle cases like
this.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/acl.h | 2 +-
fs/nfsd/idmap.h | 4 +-
fs/nfsd/nfs4acl.c | 11 +-
fs/nfsd/nfs4idmap.c | 42 ++++----
fs/nfsd/nfs4proc.c | 1 +
fs/nfsd/nfs4xdr.c | 286 +++++++++++++++++++++++++++++++---------------------
6 files changed, 202 insertions(+), 144 deletions(-)

diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h
index b481e1f..a986ceb 100644
--- a/fs/nfsd/acl.h
+++ b/fs/nfsd/acl.h
@@ -49,7 +49,7 @@ struct svc_rqst;

struct nfs4_acl *nfs4_acl_new(int);
int nfs4_acl_get_whotype(char *, u32);
-__be32 nfs4_acl_write_who(int who, __be32 **p, int *len);
+__be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);

int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
struct nfs4_acl **acl);
diff --git a/fs/nfsd/idmap.h b/fs/nfsd/idmap.h
index 66e58db..a3f3490 100644
--- a/fs/nfsd/idmap.h
+++ b/fs/nfsd/idmap.h
@@ -56,7 +56,7 @@ static inline void nfsd_idmap_shutdown(struct net *net)

__be32 nfsd_map_name_to_uid(struct svc_rqst *, const char *, size_t, kuid_t *);
__be32 nfsd_map_name_to_gid(struct svc_rqst *, const char *, size_t, kgid_t *);
-__be32 nfsd4_encode_user(struct svc_rqst *, kuid_t, __be32 **, int *);
-__be32 nfsd4_encode_group(struct svc_rqst *, kgid_t, __be32 **, int *);
+__be32 nfsd4_encode_user(struct xdr_stream *, struct svc_rqst *, kuid_t);
+__be32 nfsd4_encode_group(struct xdr_stream *, struct svc_rqst *, kgid_t);

#endif /* LINUX_NFSD_IDMAP_H */
diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
index 7c7c025..d714156 100644
--- a/fs/nfsd/nfs4acl.c
+++ b/fs/nfsd/nfs4acl.c
@@ -919,20 +919,19 @@ nfs4_acl_get_whotype(char *p, u32 len)
return NFS4_ACL_WHO_NAMED;
}

-__be32 nfs4_acl_write_who(int who, __be32 **p, int *len)
+__be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who)
{
+ __be32 *p;
int i;
- int bytes;

for (i = 0; i < ARRAY_SIZE(s2t_map); i++) {
if (s2t_map[i].type != who)
continue;
- bytes = 4 + (XDR_QUADLEN(s2t_map[i].stringlen) << 2);
- if (bytes > *len)
+ p = xdr_reserve_space(xdr, s2t_map[i].stringlen + 4);
+ if (!p)
return nfserr_resource;
- *p = xdr_encode_opaque(*p, s2t_map[i].string,
+ p = xdr_encode_opaque(p, s2t_map[i].string,
s2t_map[i].stringlen);
- *len -= bytes;
return 0;
}
WARN_ON_ONCE(1);
diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c
index c0dfde6..a0ab0a8 100644
--- a/fs/nfsd/nfs4idmap.c
+++ b/fs/nfsd/nfs4idmap.c
@@ -551,44 +551,43 @@ idmap_name_to_id(struct svc_rqst *rqstp, int type, const char *name, u32 namelen
return 0;
}

-static __be32 encode_ascii_id(u32 id, __be32 **p, int *buflen)
+static __be32 encode_ascii_id(struct xdr_stream *xdr, u32 id)
{
char buf[11];
int len;
- int bytes;
+ __be32 *p;

len = sprintf(buf, "%u", id);
- bytes = 4 + (XDR_QUADLEN(len) << 2);
- if (bytes > *buflen)
+ p = xdr_reserve_space(xdr, len + 4);
+ if (!p)
return nfserr_resource;
- *p = xdr_encode_opaque(*p, buf, len);
- *buflen -= bytes;
+ p = xdr_encode_opaque(p, buf, len);
return 0;
}

-static __be32 idmap_id_to_name(struct svc_rqst *rqstp, int type, u32 id, __be32 **p, int *buflen)
+static __be32 idmap_id_to_name(struct xdr_stream *xdr,
+ struct svc_rqst *rqstp, int type, u32 id)
{
struct ent *item, key = {
.id = id,
.type = type,
};
+ __be32 *p;
int ret;
- int bytes;
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);

strlcpy(key.authname, rqst_authname(rqstp), sizeof(key.authname));
ret = idmap_lookup(rqstp, idtoname_lookup, &key, nn->idtoname_cache, &item);
if (ret == -ENOENT)
- return encode_ascii_id(id, p, buflen);
+ return encode_ascii_id(xdr, id);
if (ret)
return nfserrno(ret);
ret = strlen(item->name);
WARN_ON_ONCE(ret > IDMAP_NAMESZ);
- bytes = 4 + (XDR_QUADLEN(ret) << 2);
- if (bytes > *buflen)
+ p = xdr_reserve_space(xdr, ret + 4);
+ if (!p)
return nfserr_resource;
- *p = xdr_encode_opaque(*p, item->name, ret);
- *buflen -= bytes;
+ p = xdr_encode_opaque(p, item->name, ret);
cache_put(&item->h, nn->idtoname_cache);
return 0;
}
@@ -622,11 +621,12 @@ do_name_to_id(struct svc_rqst *rqstp, int type, const char *name, u32 namelen, u
return idmap_name_to_id(rqstp, type, name, namelen, id);
}

-static __be32 encode_name_from_id(struct svc_rqst *rqstp, int type, u32 id, __be32 **p, int *buflen)
+static __be32 encode_name_from_id(struct xdr_stream *xdr,
+ struct svc_rqst *rqstp, int type, u32 id)
{
if (nfs4_disable_idmapping && rqstp->rq_cred.cr_flavor < RPC_AUTH_GSS)
- return encode_ascii_id(id, p, buflen);
- return idmap_id_to_name(rqstp, type, id, p, buflen);
+ return encode_ascii_id(xdr, id);
+ return idmap_id_to_name(xdr, rqstp, type, id);
}

__be32
@@ -655,14 +655,16 @@ nfsd_map_name_to_gid(struct svc_rqst *rqstp, const char *name, size_t namelen,
return status;
}

-__be32 nfsd4_encode_user(struct svc_rqst *rqstp, kuid_t uid, __be32 **p, int *buflen)
+__be32 nfsd4_encode_user(struct xdr_stream *xdr, struct svc_rqst *rqstp,
+ kuid_t uid)
{
u32 id = from_kuid(&init_user_ns, uid);
- return encode_name_from_id(rqstp, IDMAP_TYPE_USER, id, p, buflen);
+ return encode_name_from_id(xdr, rqstp, IDMAP_TYPE_USER, id);
}

-__be32 nfsd4_encode_group(struct svc_rqst *rqstp, kgid_t gid, __be32 **p, int *buflen)
+__be32 nfsd4_encode_group(struct xdr_stream *xdr, struct svc_rqst *rqstp,
+ kgid_t gid)
{
u32 id = from_kgid(&init_user_ns, gid);
- return encode_name_from_id(rqstp, IDMAP_TYPE_GROUP, id, p, buflen);
+ return encode_name_from_id(xdr, rqstp, IDMAP_TYPE_GROUP, id);
}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index db14965..1433854 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1261,6 +1261,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
struct kvec *head = buf->head;

xdr->buf = buf;
+ xdr->iov = head;
xdr->p = head->iov_base + head->iov_len;
xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
}
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 2ed8036..0ab7aae 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1755,18 +1755,20 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
*/
-static __be32 nfsd4_encode_components_esc(char sep, char *components,
- __be32 **pp, int *buflen,
- char esc_enter, char esc_exit)
+static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep,
+ char *components, char esc_enter,
+ char esc_exit)
{
- __be32 *p = *pp;
- __be32 *countp = p;
+ __be32 *p;
+ __be32 *countp;
int strlen, count=0;
char *str, *end, *next;

dprintk("nfsd4_encode_components(%s)\n", components);
- if ((*buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
return nfserr_resource;
+ countp = p;
WRITE32(0); /* We will fill this in with @count later */
end = str = components;
while (*end) {
@@ -1789,7 +1791,8 @@ static __be32 nfsd4_encode_components_esc(char sep, char *components,

strlen = end - str;
if (strlen) {
- if ((*buflen -= ((XDR_QUADLEN(strlen) << 2) + 4)) < 0)
+ p = xdr_reserve_space(xdr, strlen + 4);
+ if (!p)
return nfserr_resource;
WRITE32(strlen);
WRITEMEM(str, strlen);
@@ -1799,7 +1802,6 @@ static __be32 nfsd4_encode_components_esc(char sep, char *components,
end++;
str = end;
}
- *pp = p;
p = countp;
WRITE32(count);
return 0;
@@ -1808,40 +1810,39 @@ static __be32 nfsd4_encode_components_esc(char sep, char *components,
/* Encode as an array of strings the string given with components
* separated @sep.
*/
-static __be32 nfsd4_encode_components(char sep, char *components,
- __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_components(struct xdr_stream *xdr, char sep,
+ char *components)
{
- return nfsd4_encode_components_esc(sep, components, pp, buflen, 0, 0);
+ return nfsd4_encode_components_esc(xdr, sep, components, 0, 0);
}

/*
* encode a location element of a fs_locations structure
*/
-static __be32 nfsd4_encode_fs_location4(struct nfsd4_fs_location *location,
- __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_fs_location4(struct xdr_stream *xdr,
+ struct nfsd4_fs_location *location)
{
__be32 status;
- __be32 *p = *pp;

- status = nfsd4_encode_components_esc(':', location->hosts, &p, buflen,
+ status = nfsd4_encode_components_esc(xdr, ':', location->hosts,
'[', ']');
if (status)
return status;
- status = nfsd4_encode_components('/', location->path, &p, buflen);
+ status = nfsd4_encode_components(xdr, '/', location->path);
if (status)
return status;
- *pp = p;
return 0;
}

/*
* Encode a path in RFC3530 'pathname4' format
*/
-static __be32 nfsd4_encode_path(const struct path *root,
- const struct path *path, __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_path(struct xdr_stream *xdr,
+ const struct path *root,
+ const struct path *path)
{
struct path cur = *path;
- __be32 *p = *pp;
+ __be32 *p;
struct dentry **components = NULL;
unsigned int ncomponents = 0;
__be32 err = nfserr_jukebox;
@@ -1872,9 +1873,9 @@ static __be32 nfsd4_encode_path(const struct path *root,
components[ncomponents++] = cur.dentry;
cur.dentry = dget_parent(cur.dentry);
}
-
- *buflen -= 4;
- if (*buflen < 0)
+ err = nfserr_resource;
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_free;
WRITE32(ncomponents);

@@ -1884,8 +1885,8 @@ static __be32 nfsd4_encode_path(const struct path *root,

spin_lock(&dentry->d_lock);
len = dentry->d_name.len;
- *buflen -= 4 + (XDR_QUADLEN(len) << 2);
- if (*buflen < 0) {
+ p = xdr_reserve_space(xdr, len + 4);
+ if (!p) {
spin_unlock(&dentry->d_lock);
goto out_free;
}
@@ -1897,7 +1898,6 @@ static __be32 nfsd4_encode_path(const struct path *root,
ncomponents--;
}

- *pp = p;
err = 0;
out_free:
dprintk(")\n");
@@ -1908,8 +1908,8 @@ out_free:
return err;
}

-static __be32 nfsd4_encode_fsloc_fsroot(struct svc_rqst *rqstp,
- const struct path *path, __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_fsloc_fsroot(struct xdr_stream *xdr,
+ struct svc_rqst *rqstp, const struct path *path)
{
struct svc_export *exp_ps;
__be32 res;
@@ -1917,7 +1917,7 @@ static __be32 nfsd4_encode_fsloc_fsroot(struct svc_rqst *rqstp,
exp_ps = rqst_find_fsidzero_export(rqstp);
if (IS_ERR(exp_ps))
return nfserrno(PTR_ERR(exp_ps));
- res = nfsd4_encode_path(&exp_ps->ex_path, path, pp, buflen);
+ res = nfsd4_encode_path(xdr, &exp_ps->ex_path, path);
exp_put(exp_ps);
return res;
}
@@ -1925,28 +1925,26 @@ static __be32 nfsd4_encode_fsloc_fsroot(struct svc_rqst *rqstp,
/*
* encode a fs_locations structure
*/
-static __be32 nfsd4_encode_fs_locations(struct svc_rqst *rqstp,
- struct svc_export *exp,
- __be32 **pp, int *buflen)
+static __be32 nfsd4_encode_fs_locations(struct xdr_stream *xdr,
+ struct svc_rqst *rqstp, struct svc_export *exp)
{
__be32 status;
int i;
- __be32 *p = *pp;
+ __be32 *p;
struct nfsd4_fs_locations *fslocs = &exp->ex_fslocs;

- status = nfsd4_encode_fsloc_fsroot(rqstp, &exp->ex_path, &p, buflen);
+ status = nfsd4_encode_fsloc_fsroot(xdr, rqstp, &exp->ex_path);
if (status)
return status;
- if ((*buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
return nfserr_resource;
WRITE32(fslocs->locations_count);
for (i=0; i<fslocs->locations_count; i++) {
- status = nfsd4_encode_fs_location4(&fslocs->locations[i],
- &p, buflen);
+ status = nfsd4_encode_fs_location4(xdr, &fslocs->locations[i]);
if (status)
return status;
}
- *pp = p;
return 0;
}

@@ -1965,15 +1963,15 @@ static u32 nfs4_file_type(umode_t mode)
}

static inline __be32
-nfsd4_encode_aclname(struct svc_rqst *rqstp, struct nfs4_ace *ace,
- __be32 **p, int *buflen)
+nfsd4_encode_aclname(struct xdr_stream *xdr, struct svc_rqst *rqstp,
+ struct nfs4_ace *ace)
{
if (ace->whotype != NFS4_ACL_WHO_NAMED)
- return nfs4_acl_write_who(ace->whotype, p, buflen);
+ return nfs4_acl_write_who(xdr, ace->whotype);
else if (ace->flag & NFS4_ACE_IDENTIFIER_GROUP)
- return nfsd4_encode_group(rqstp, ace->who_gid, p, buflen);
+ return nfsd4_encode_group(xdr, rqstp, ace->who_gid);
else
- return nfsd4_encode_user(rqstp, ace->who_uid, p, buflen);
+ return nfsd4_encode_user(xdr, rqstp, ace->who_uid);
}

#define WORD0_ABSENT_FS_ATTRS (FATTR4_WORD0_FS_LOCATIONS | FATTR4_WORD0_FSID | \
@@ -1982,31 +1980,28 @@ nfsd4_encode_aclname(struct svc_rqst *rqstp, struct nfs4_ace *ace,

#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
static inline __be32
-nfsd4_encode_security_label(struct svc_rqst *rqstp, void *context, int len, __be32 **pp, int *buflen)
+nfsd4_encode_security_label(struct xdr_stream *xdr, struct svc_rqst *rqstp,
+ void *context, int len)
{
__be32 *p = *pp;

- if (*buflen < ((XDR_QUADLEN(len) << 2) + 4 + 4 + 4))
+ p = xdr_reserve_space(xdr, len + 4 + 4 + 4);
+ if (!p)
return nfserr_resource;

/*
* For now we use a 0 here to indicate the null translation; in
* the future we may place a call to translation code here.
*/
- if ((*buflen -= 8) < 0)
- return nfserr_resource;
-
WRITE32(0); /* lfs */
WRITE32(0); /* pi */
p = xdr_encode_opaque(p, context, len);
- *buflen -= (XDR_QUADLEN(len) << 2) + 4;
-
- *pp = p;
return 0;
}
#else
static inline __be32
-nfsd4_encode_security_label(struct svc_rqst *rqstp, void *context, int len, __be32 **pp, int *buflen)
+nfsd4_encode_security_label(struct xdr_stream *xdr, struct svc_rqst *rqstp,
+ void *context, int len)
{ return 0; }
#endif

@@ -2058,8 +2053,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
struct kstat stat;
struct svc_fh *tempfh = NULL;
struct kstatfs statfs;
- __be32 *p = xdr->p;
- int buflen = xdr->buf->buflen;
+ __be32 *p;
+ __be32 *start = xdr->p;
__be32 *attrlenp;
u32 dummy;
u64 dummy64;
@@ -2144,24 +2139,30 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
#endif /* CONFIG_NFSD_V4_SECURITY_LABEL */

if (bmval2) {
- if ((buflen -= 16) < 0)
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
goto out_resource;
WRITE32(3);
WRITE32(bmval0);
WRITE32(bmval1);
WRITE32(bmval2);
} else if (bmval1) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE32(2);
WRITE32(bmval0);
WRITE32(bmval1);
} else {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE32(1);
WRITE32(bmval0);
}
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ goto out_resource;
attrlenp = p++; /* to be backfilled later */

if (bmval0 & FATTR4_WORD0_SUPPORTED_ATTRS) {
@@ -2174,13 +2175,15 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
if (!contextsupport)
word2 &= ~FATTR4_WORD2_SECURITY_LABEL;
if (!word2) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE32(2);
WRITE32(word0);
WRITE32(word1);
} else {
- if ((buflen -= 16) < 0)
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
goto out_resource;
WRITE32(3);
WRITE32(word0);
@@ -2189,7 +2192,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
}
}
if (bmval0 & FATTR4_WORD0_TYPE) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
dummy = nfs4_file_type(stat.mode);
if (dummy == NF4BAD) {
@@ -2199,7 +2203,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
WRITE32(dummy);
}
if (bmval0 & FATTR4_WORD0_FH_EXPIRE_TYPE) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
if (exp->ex_flags & NFSEXP_NOSUBTREECHECK)
WRITE32(NFS4_FH_PERSISTENT);
@@ -2207,32 +2212,38 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
WRITE32(NFS4_FH_PERSISTENT|NFS4_FH_VOL_RENAME);
}
if (bmval0 & FATTR4_WORD0_CHANGE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
write_change(&p, &stat, dentry->d_inode);
}
if (bmval0 & FATTR4_WORD0_SIZE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64(stat.size);
}
if (bmval0 & FATTR4_WORD0_LINK_SUPPORT) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_SYMLINK_SUPPORT) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_NAMED_ATTR) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(0);
}
if (bmval0 & FATTR4_WORD0_FSID) {
- if ((buflen -= 16) < 0)
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
goto out_resource;
if (exp->ex_fslocs.migrated) {
WRITE64(NFS4_REFERRAL_FSID_MAJOR);
@@ -2254,17 +2265,20 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
}
}
if (bmval0 & FATTR4_WORD0_UNIQUE_HANDLES) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(0);
}
if (bmval0 & FATTR4_WORD0_LEASE_TIME) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(nn->nfsd4_lease);
}
if (bmval0 & FATTR4_WORD0_RDATTR_ERROR) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(rdattr_err);
}
@@ -2272,198 +2286,229 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
struct nfs4_ace *ace;

if (acl == NULL) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;

WRITE32(0);
goto out_acl;
}
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(acl->naces);

for (ace = acl->aces; ace < acl->aces + acl->naces; ace++) {
- if ((buflen -= 4*3) < 0)
+ p = xdr_reserve_space(xdr, 4*3);
+ if (!p)
goto out_resource;
WRITE32(ace->type);
WRITE32(ace->flag);
WRITE32(ace->access_mask & NFS4_ACE_MASK_ALL);
- status = nfsd4_encode_aclname(rqstp, ace, &p, &buflen);
+ status = nfsd4_encode_aclname(xdr, rqstp, ace);
if (status)
goto out;
}
}
out_acl:
if (bmval0 & FATTR4_WORD0_ACLSUPPORT) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(aclsupport ?
ACL4_SUPPORT_ALLOW_ACL|ACL4_SUPPORT_DENY_ACL : 0);
}
if (bmval0 & FATTR4_WORD0_CANSETTIME) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_CASE_INSENSITIVE) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(0);
}
if (bmval0 & FATTR4_WORD0_CASE_PRESERVING) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_CHOWN_RESTRICTED) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_FILEHANDLE) {
- buflen -= (XDR_QUADLEN(fhp->fh_handle.fh_size) << 2) + 4;
- if (buflen < 0)
+ p = xdr_reserve_space(xdr, fhp->fh_handle.fh_size + 4);
+ if (!p)
goto out_resource;
WRITE32(fhp->fh_handle.fh_size);
WRITEMEM(&fhp->fh_handle.fh_base, fhp->fh_handle.fh_size);
}
if (bmval0 & FATTR4_WORD0_FILEID) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64(stat.ino);
}
if (bmval0 & FATTR4_WORD0_FILES_AVAIL) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) statfs.f_ffree);
}
if (bmval0 & FATTR4_WORD0_FILES_FREE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) statfs.f_ffree);
}
if (bmval0 & FATTR4_WORD0_FILES_TOTAL) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) statfs.f_files);
}
if (bmval0 & FATTR4_WORD0_FS_LOCATIONS) {
- status = nfsd4_encode_fs_locations(rqstp, exp, &p, &buflen);
+ status = nfsd4_encode_fs_locations(xdr, rqstp, exp);
if (status)
goto out;
}
if (bmval0 & FATTR4_WORD0_HOMOGENEOUS) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval0 & FATTR4_WORD0_MAXFILESIZE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64(exp->ex_path.mnt->mnt_sb->s_maxbytes);
}
if (bmval0 & FATTR4_WORD0_MAXLINK) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(255);
}
if (bmval0 & FATTR4_WORD0_MAXNAME) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(statfs.f_namelen);
}
if (bmval0 & FATTR4_WORD0_MAXREAD) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) svc_max_payload(rqstp));
}
if (bmval0 & FATTR4_WORD0_MAXWRITE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE64((u64) svc_max_payload(rqstp));
}
if (bmval1 & FATTR4_WORD1_MODE) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(stat.mode & S_IALLUGO);
}
if (bmval1 & FATTR4_WORD1_NO_TRUNC) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(1);
}
if (bmval1 & FATTR4_WORD1_NUMLINKS) {
- if ((buflen -= 4) < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto out_resource;
WRITE32(stat.nlink);
}
if (bmval1 & FATTR4_WORD1_OWNER) {
- status = nfsd4_encode_user(rqstp, stat.uid, &p, &buflen);
+ status = nfsd4_encode_user(xdr, rqstp, stat.uid);
if (status)
goto out;
}
if (bmval1 & FATTR4_WORD1_OWNER_GROUP) {
- status = nfsd4_encode_group(rqstp, stat.gid, &p, &buflen);
+ status = nfsd4_encode_group(xdr, rqstp, stat.gid);
if (status)
goto out;
}
if (bmval1 & FATTR4_WORD1_RAWDEV) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
WRITE32((u32) MAJOR(stat.rdev));
WRITE32((u32) MINOR(stat.rdev));
}
if (bmval1 & FATTR4_WORD1_SPACE_AVAIL) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_bavail * (u64)statfs.f_bsize;
WRITE64(dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_FREE) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_bfree * (u64)statfs.f_bsize;
WRITE64(dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_TOTAL) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_blocks * (u64)statfs.f_bsize;
WRITE64(dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_USED) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
dummy64 = (u64)stat.blocks << 9;
WRITE64(dummy64);
}
if (bmval1 & FATTR4_WORD1_TIME_ACCESS) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE64((s64)stat.atime.tv_sec);
WRITE32(stat.atime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_DELTA) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE32(0);
WRITE32(1);
WRITE32(0);
}
if (bmval1 & FATTR4_WORD1_TIME_METADATA) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE64((s64)stat.ctime.tv_sec);
WRITE32(stat.ctime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_MODIFY) {
- if ((buflen -= 12) < 0)
+ p = xdr_reserve_space(xdr, 12);
+ if (!p)
goto out_resource;
WRITE64((s64)stat.mtime.tv_sec);
WRITE32(stat.mtime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
- if ((buflen -= 8) < 0)
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
goto out_resource;
/*
* Get parent's attributes if not ignoring crossmount
@@ -2475,13 +2520,14 @@ out_acl:
WRITE64(stat.ino);
}
if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) {
- status = nfsd4_encode_security_label(rqstp, context,
- contextlen, &p, &buflen);
+ status = nfsd4_encode_security_label(xdr, rqstp, context,
+ contextlen);
if (status)
goto out;
}
if (bmval2 & FATTR4_WORD2_SUPPATTR_EXCLCREAT) {
- if ((buflen -= 16) < 0)
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
goto out_resource;
WRITE32(3);
WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD0);
@@ -2489,8 +2535,7 @@ out_acl:
WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD2);
}

- *attrlenp = htonl((char *)p - (char *)attrlenp - 4);
- xdr->p = p;
+ *attrlenp = htonl((char *)xdr->p - (char *)attrlenp - 4);
status = nfs_ok;

out:
@@ -2503,6 +2548,13 @@ out:
fh_put(tempfh);
kfree(tempfh);
}
+ if (status) {
+ int nbytes = (char *)xdr->p - (char *)start;
+ /* open code what *should* be xdr_truncate(xdr, len); */
+ xdr->iov->iov_len -= nbytes;
+ xdr->buf->len -= nbytes;
+ xdr->p = start;
+ }
return status;
out_nfserr:
status = nfserrno(err);
@@ -2768,13 +2820,10 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
{
struct svc_fh *fhp = getattr->ga_fhp;
struct xdr_stream *xdr = &resp->xdr;
- struct xdr_buf *buf = resp->xdr.buf;

if (nfserr)
return nfserr;

- buf->buflen = (void *)resp->xdr.end - (void *)resp->xdr.p
- - COMPOUND_ERR_SLACK_SPACE;
nfserr = nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry,
getattr->ga_bmval,
resp->rqstp, 0);
@@ -2971,6 +3020,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
int v;
struct page *page;
unsigned long maxcount;
+ struct xdr_stream *xdr = &resp->xdr;
long len;
__be32 *p;

@@ -3017,6 +3067,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
resp->xdr.buf->head[0].iov_len = (char *)p
- (char *)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
+ xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
resp->xdr.buf->tail[0].iov_base = p;
@@ -3035,6 +3086,7 @@ static __be32
nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readlink *readlink)
{
int maxcount;
+ struct xdr_stream *xdr = &resp->xdr;
char *page;
__be32 *p;

@@ -3067,6 +3119,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
resp->xdr.buf->head[0].iov_len = (char *)p
- (char *)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
+ xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
resp->xdr.buf->tail[0].iov_base = p;
@@ -3086,6 +3139,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
{
int maxcount;
loff_t offset;
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *page, *savep, *tailbase;
__be32 *p;

@@ -3148,6 +3202,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
resp->xdr.buf->page_len = ((char *)p) -
(char*)page_address(*(resp->rqstp->rq_next_page-1));

+ xdr->iov = xdr->buf->tail;
+
/* Use rest of head for padding and remaining ops: */
resp->xdr.buf->tail[0].iov_base = tailbase;
resp->xdr.buf->tail[0].iov_len = 0;
--
1.9.0


2014-05-22 19:32:32

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 04/52] nfsd4: decoding errors can still be cached and require space

From: "J. Bruce Fields" <[email protected]>

Currently a non-idempotent op reply may be cached if it fails in the
proc code but not if it fails at xdr decoding. I doubt there are any
xdr-decoding-time errors that would make this a problem in practice, so
this probably isn't a serious bug.

The space estimates should also take into account space required for
encoding of error returns. Again, not a practical problem, though it
would become one after future patches which will tighten the space
estimates.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 ++
fs/nfsd/nfs4xdr.c | 10 +++++-----
2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 2c1ee70..26995a5 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1215,6 +1215,8 @@ static inline struct nfsd4_operation *OPDESC(struct nfsd4_op *op)

bool nfsd4_cache_this_op(struct nfsd4_op *op)
{
+ if (op->opnum == OP_ILLEGAL)
+ return false;
return OPDESC(op)->op_flags & OP_CACHEME;
}

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 18881f3..e866a06 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1677,11 +1677,6 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
op->opnum = OP_ILLEGAL;
op->status = nfserr_op_illegal;
}
-
- if (op->status) {
- argp->opcnt = i+1;
- break;
- }
/*
* We'll try to cache the result in the DRC if any one
* op in the compound wants to be cached:
@@ -1689,6 +1684,11 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
cachethis |= nfsd4_cache_this_op(op);

max_reply = max(max_reply, nfsd4_max_reply(op->opnum));
+
+ if (op->status) {
+ argp->opcnt = i+1;
+ break;
+ }
}
/* Sessions make the DRC unnecessary: */
if (argp->minorversion)
--
1.9.0


2014-05-22 19:32:40

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 27/52] nfsd4: convert 4.1 replay encoding

From: "J. Bruce Fields" <[email protected]>

Limits on maxresp_sz mean that we only ever need to replay rpc's that
are contained entirely in the head.

The one exception is very small zero-copy reads. That's an odd corner
case as clients wouldn't normally ask those to be cached.

in any case, this seems a little more robust.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 28 ++++++++++++++--------------
fs/nfsd/nfs4xdr.c | 2 +-
fs/nfsd/xdr4.h | 2 +-
3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 63bb9eb..e936d9f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1560,6 +1560,7 @@ out_err:
void
nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
{
+ struct xdr_buf *buf = resp->xdr.buf;
struct nfsd4_slot *slot = resp->cstate.slot;
unsigned int base;

@@ -1573,11 +1574,9 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
slot->sl_datalen = 0;
return;
}
- slot->sl_datalen = (char *)resp->xdr.p - (char *)resp->cstate.datap;
- base = (char *)resp->cstate.datap -
- (char *)resp->xdr.buf->head[0].iov_base;
- if (read_bytes_from_xdr_buf(resp->xdr.buf, base, slot->sl_data,
- slot->sl_datalen))
+ base = resp->cstate.data_offset;
+ slot->sl_datalen = buf->len - base;
+ if (read_bytes_from_xdr_buf(buf, base, slot->sl_data, slot->sl_datalen))
WARN("%s: sessions DRC could not cache compound\n", __func__);
return;
}
@@ -1618,7 +1617,8 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
struct nfsd4_sequence *seq)
{
struct nfsd4_slot *slot = resp->cstate.slot;
- struct kvec *head = resp->xdr.iov;
+ struct xdr_stream *xdr = &resp->xdr;
+ __be32 *p;
__be32 status;

dprintk("--> %s slot %p\n", __func__, slot);
@@ -1627,16 +1627,16 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
if (status)
return status;

- /* The sequence operation has been encoded, cstate->datap set. */
- memcpy(resp->cstate.datap, slot->sl_data, slot->sl_datalen);
+ p = xdr_reserve_space(xdr, slot->sl_datalen);
+ if (!p) {
+ WARN_ON_ONCE(1);
+ return nfserr_serverfault;
+ }
+ xdr_encode_opaque_fixed(p, slot->sl_data, slot->sl_datalen);
+ xdr_commit_encode(xdr);

resp->opcnt = slot->sl_opcnt;
- resp->xdr.p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
- head->iov_len = (void *)resp->xdr.p - head->iov_base;
- resp->xdr.buf->len = head->iov_len;
- status = slot->sl_status;
-
- return status;
+ return slot->sl_status;
}

/*
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1bac000..19df1c2 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3659,7 +3659,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(seq->maxslots - 1); /* sr_target_highest_slotid */
WRITE32(seq->status_flags);

- resp->cstate.datap = p; /* DRC cache data pointer */
+ resp->cstate.data_offset = xdr->buf->len; /* DRC cache data pointer */
return 0;
}

diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index ea5ad5d..ee9ffdc 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -58,7 +58,7 @@ struct nfsd4_compound_state {
/* For sessions DRC */
struct nfsd4_session *session;
struct nfsd4_slot *slot;
- __be32 *datap;
+ int data_offset;
size_t iovlen;
u32 minorversion;
__be32 status;
--
1.9.0


2014-05-22 19:32:34

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 21/52] nfsd4: teach encoders to handle reserve_space failures

From: "J. Bruce Fields" <[email protected]>

We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
and special checking in those operations (getattr) whose result can vary
enormously.

However:
- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
more protocol.
- BUG_ON or page faulting on failure seems overly fragile.
- Especially in the 4.1 case, we prefer not to fail compounds
just because the returned result came *close* to session
limits. (Though perfect enforcement here may be difficult.)
- I'd prefer encoding to be uniform for all encoders instead of
having special exceptions for encoders containing, for
example, attributes.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
fs/nfsd/nfs4xdr.c | 247 +++++++++++++++++++++++++++++++++++++++--------------
fs/nfsd/xdr4.h | 2 +-
3 files changed, 185 insertions(+), 66 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index d2453bf..694e1a5 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1402,7 +1402,7 @@ encode_op:
}
if (op->status == nfserr_replay_me) {
op->replay = &cstate->replay_owner->so_replay;
- nfsd4_encode_replay(resp, op);
+ nfsd4_encode_replay(&resp->xdr, op);
status = op->status = op->replay->rp_status;
} else {
nfsd4_encode_operation(resp, op);
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 14efc5d..9c532eb 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1746,11 +1746,6 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
}
}

-#define RESERVE_SPACE(nbytes) do { \
- p = xdr_reserve_space(&resp->xdr, nbytes); \
- BUG_ON(!p); \
-} while (0)
-
/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
*/
@@ -2737,23 +2732,29 @@ fail:
return -EINVAL;
}

-static void
-nfsd4_encode_stateid(struct nfsd4_compoundres *resp, stateid_t *sid)
+static __be32
+nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid)
{
__be32 *p;

- RESERVE_SPACE(sizeof(stateid_t));
+ p = xdr_reserve_space(xdr, sizeof(stateid_t));
+ if (!p)
+ return nfserr_resource;
WRITE32(sid->si_generation);
WRITEMEM(&sid->si_opaque, sizeof(stateid_opaque_t));
+ return 0;
}

static __be32
nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_access *access)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
+ return nfserr_resource;
WRITE32(access->ac_supported);
WRITE32(access->ac_resp_access);
}
@@ -2762,10 +2763,13 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_

static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_bind_conn_to_session *bcts)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(NFS4_MAX_SESSIONID_LEN + 8);
+ p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 8);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(bcts->sessionid.data, NFS4_MAX_SESSIONID_LEN);
WRITE32(bcts->dir);
/* Sorry, we do not yet support RDMA over 4.1: */
@@ -2777,8 +2781,10 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
static __be32
nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_close *close)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &close->cl_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &close->cl_stateid);

return nfserr;
}
@@ -2787,10 +2793,13 @@ nfsd4_encode_close(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_c
static __be32
nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_commit *commit)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(NFS4_VERIFIER_SIZE);
+ p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(commit->co_verf.data, NFS4_VERIFIER_SIZE);
}
return nfserr;
@@ -2799,10 +2808,13 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
static __be32
nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_create *create)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(32);
+ p = xdr_reserve_space(xdr, 32);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &create->cr_cinfo);
WRITE32(2);
WRITE32(create->cr_bmval[0]);
@@ -2829,13 +2841,16 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
static __be32
nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh **fhpp)
{
+ struct xdr_stream *xdr = &resp->xdr;
struct svc_fh *fhp = *fhpp;
unsigned int len;
__be32 *p;

if (!nfserr) {
len = fhp->fh_handle.fh_size;
- RESERVE_SPACE(len + 4);
+ p = xdr_reserve_space(xdr, len + 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(len);
WRITEMEM(&fhp->fh_handle.fh_base, len);
}
@@ -2846,13 +2861,15 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh
* Including all fields other than the name, a LOCK4denied structure requires
* 8(clientid) + 4(namelen) + 8(offset) + 8(length) + 4(type) = 32 bytes.
*/
-static void
-nfsd4_encode_lock_denied(struct nfsd4_compoundres *resp, struct nfsd4_lock_denied *ld)
+static __be32
+nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld)
{
struct xdr_netobj *conf = &ld->ld_owner;
__be32 *p;

- RESERVE_SPACE(32 + XDR_LEN(conf->len));
+ p = xdr_reserve_space(xdr, 32 + XDR_LEN(conf->len));
+ if (!p)
+ return nfserr_resource;
WRITE64(ld->ld_start);
WRITE64(ld->ld_length);
WRITE32(ld->ld_type);
@@ -2865,15 +2882,18 @@ nfsd4_encode_lock_denied(struct nfsd4_compoundres *resp, struct nfsd4_lock_denie
WRITE64((u64)0); /* clientid */
WRITE32(0); /* length of owner name */
}
+ return nfserr_denied;
}

static __be32
nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lock *lock)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &lock->lk_resp_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &lock->lk_resp_stateid);
else if (nfserr == nfserr_denied)
- nfsd4_encode_lock_denied(resp, &lock->lk_denied);
+ nfserr = nfsd4_encode_lock_denied(xdr, &lock->lk_denied);

return nfserr;
}
@@ -2881,16 +2901,20 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lo
static __be32
nfsd4_encode_lockt(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lockt *lockt)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (nfserr == nfserr_denied)
- nfsd4_encode_lock_denied(resp, &lockt->lt_denied);
+ nfsd4_encode_lock_denied(xdr, &lockt->lt_denied);
return nfserr;
}

static __be32
nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_locku *locku)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &locku->lu_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &locku->lu_stateid);

return nfserr;
}
@@ -2899,10 +2923,13 @@ nfsd4_encode_locku(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_l
static __be32
nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_link *link)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(20);
+ p = xdr_reserve_space(xdr, 20);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &link->li_cinfo);
}
return nfserr;
@@ -2912,13 +2939,18 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li
static __be32
nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open *open)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (nfserr)
goto out;

- nfsd4_encode_stateid(resp, &open->op_stateid);
- RESERVE_SPACE(40);
+ nfserr = nfsd4_encode_stateid(xdr, &open->op_stateid);
+ if (nfserr)
+ goto out;
+ p = xdr_reserve_space(xdr, 40);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &open->op_cinfo);
WRITE32(open->op_rflags);
WRITE32(2);
@@ -2930,8 +2962,12 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
case NFS4_OPEN_DELEGATE_NONE:
break;
case NFS4_OPEN_DELEGATE_READ:
- nfsd4_encode_stateid(resp, &open->op_delegate_stateid);
- RESERVE_SPACE(20);
+ nfserr = nfsd4_encode_stateid(xdr, &open->op_delegate_stateid);
+ if (nfserr)
+ return nfserr;
+ p = xdr_reserve_space(xdr, 20);
+ if (!p)
+ return nfserr_resource;
WRITE32(open->op_recall);

/*
@@ -2943,8 +2979,12 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
WRITE32(0); /* XXX: is NULL principal ok? */
break;
case NFS4_OPEN_DELEGATE_WRITE:
- nfsd4_encode_stateid(resp, &open->op_delegate_stateid);
- RESERVE_SPACE(32);
+ nfserr = nfsd4_encode_stateid(xdr, &open->op_delegate_stateid);
+ if (nfserr)
+ return nfserr;
+ p = xdr_reserve_space(xdr, 32);
+ if (!p)
+ return nfserr_resource;
WRITE32(0);

/*
@@ -2966,12 +3006,16 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
switch (open->op_why_no_deleg) {
case WND4_CONTENTION:
case WND4_RESOURCE:
- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
+ return nfserr_resource;
WRITE32(open->op_why_no_deleg);
WRITE32(0); /* deleg signaling not supported yet */
break;
default:
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(open->op_why_no_deleg);
}
break;
@@ -2986,8 +3030,10 @@ out:
static __be32
nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_confirm *oc)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &oc->oc_resp_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &oc->oc_resp_stateid);

return nfserr;
}
@@ -2995,8 +3041,10 @@ nfsd4_encode_open_confirm(struct nfsd4_compoundres *resp, __be32 nfserr, struct
static __be32
nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_open_downgrade *od)
{
+ struct xdr_stream *xdr = &resp->xdr;
+
if (!nfserr)
- nfsd4_encode_stateid(resp, &od->od_stateid);
+ nfserr = nfsd4_encode_stateid(xdr, &od->od_stateid);

return nfserr;
}
@@ -3019,7 +3067,9 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
if (resp->xdr.buf->page_len)
return nfserr_resource;

- RESERVE_SPACE(8); /* eof flag and byte count */
+ p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
+ if (!p)
+ return nfserr_resource;

maxcount = svc_max_payload(resp->rqstp);
if (maxcount > read->rd_length)
@@ -3071,7 +3121,9 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
resp->xdr.buf->tail[0].iov_base = p;
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
@@ -3099,7 +3151,10 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
page = page_address(*(resp->rqstp->rq_next_page++));

maxcount = PAGE_SIZE;
- RESERVE_SPACE(4);
+
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;

/*
* XXX: By default, the ->readlink() VFS op will truncate symlinks
@@ -3126,7 +3181,9 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
resp->xdr.buf->tail[0].iov_base = p;
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
@@ -3151,7 +3208,9 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
if (!*resp->rqstp->rq_next_page)
return nfserr_resource;

- RESERVE_SPACE(NFS4_VERIFIER_SIZE);
+ p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
+ if (!p)
+ return nfserr_resource;

/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
@@ -3220,10 +3279,13 @@ err_no_verf:
static __be32
nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_remove *remove)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(20);
+ p = xdr_reserve_space(xdr, 20);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &remove->rm_cinfo);
}
return nfserr;
@@ -3232,10 +3294,13 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
static __be32
nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_rename *rename)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(40);
+ p = xdr_reserve_space(xdr, 40);
+ if (!p)
+ return nfserr_resource;
write_cinfo(&p, &rename->rn_sinfo);
write_cinfo(&p, &rename->rn_tinfo);
}
@@ -3243,7 +3308,7 @@ nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
}

static __be32
-nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
+nfsd4_do_encode_secinfo(struct xdr_stream *xdr,
__be32 nfserr, struct svc_export *exp)
{
u32 i, nflavs, supported;
@@ -3254,6 +3319,7 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,

if (nfserr)
goto out;
+ nfserr = nfserr_resource;
if (exp->ex_nflavors) {
flavs = exp->ex_flavors;
nflavs = exp->ex_nflavors;
@@ -3275,7 +3341,9 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
}

supported = 0;
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ goto out;
flavorsp = p++; /* to be backfilled later */

for (i = 0; i < nflavs; i++) {
@@ -3284,7 +3352,10 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,

if (rpcauth_get_gssinfo(pf, &info) == 0) {
supported++;
- RESERVE_SPACE(4 + 4 + XDR_LEN(info.oid.len) + 4 + 4);
+ p = xdr_reserve_space(xdr, 4 + 4 +
+ XDR_LEN(info.oid.len) + 4 + 4);
+ if (!p)
+ goto out;
WRITE32(RPC_AUTH_GSS);
WRITE32(info.oid.len);
WRITEMEM(info.oid.data, info.oid.len);
@@ -3292,7 +3363,9 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
WRITE32(info.service);
} else if (pf < RPC_AUTH_MAXFLAVOR) {
supported++;
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ goto out;
WRITE32(pf);
} else {
if (report)
@@ -3304,7 +3377,7 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
if (nflavs != supported)
report = false;
*flavorsp = htonl(supported);
-
+ nfserr = 0;
out:
if (exp)
exp_put(exp);
@@ -3315,14 +3388,18 @@ static __be32
nfsd4_encode_secinfo(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_secinfo *secinfo)
{
- return nfsd4_do_encode_secinfo(resp, nfserr, secinfo->si_exp);
+ struct xdr_stream *xdr = &resp->xdr;
+
+ return nfsd4_do_encode_secinfo(xdr, nfserr, secinfo->si_exp);
}

static __be32
nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_secinfo_no_name *secinfo)
{
- return nfsd4_do_encode_secinfo(resp, nfserr, secinfo->sin_exp);
+ struct xdr_stream *xdr = &resp->xdr;
+
+ return nfsd4_do_encode_secinfo(xdr, nfserr, secinfo->sin_exp);
}

/*
@@ -3332,9 +3409,12 @@ nfsd4_encode_secinfo_no_name(struct nfsd4_compoundres *resp, __be32 nfserr,
static __be32
nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setattr *setattr)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

- RESERVE_SPACE(16);
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
+ return nfserr_resource;
if (nfserr) {
WRITE32(3);
WRITE32(0);
@@ -3353,15 +3433,20 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
static __be32
nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_setclientid *scd)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(8 + NFS4_VERIFIER_SIZE);
+ p = xdr_reserve_space(xdr, 8 + NFS4_VERIFIER_SIZE);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(&scd->se_clientid, 8);
WRITEMEM(&scd->se_confirm, NFS4_VERIFIER_SIZE);
}
else if (nfserr == nfserr_clid_inuse) {
- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
+ return nfserr_resource;
WRITE32(0);
WRITE32(0);
}
@@ -3371,10 +3456,13 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n
static __be32
nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_write *write)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (!nfserr) {
- RESERVE_SPACE(16);
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
+ return nfserr_resource;
WRITE32(write->wr_bytes_written);
WRITE32(write->wr_how_written);
WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
@@ -3394,6 +3482,7 @@ static __be32
nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_exchange_id *exid)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;
char *major_id;
char *server_scope;
@@ -3409,11 +3498,13 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
server_scope = utsname()->nodename;
server_scope_sz = strlen(server_scope);

- RESERVE_SPACE(
+ p = xdr_reserve_space(xdr,
8 /* eir_clientid */ +
4 /* eir_sequenceid */ +
4 /* eir_flags */ +
4 /* spr_how */);
+ if (!p)
+ return nfserr_resource;

WRITEMEM(&exid->clientid, 8);
WRITE32(exid->seqid);
@@ -3426,7 +3517,9 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
break;
case SP4_MACH_CRED:
/* spo_must_enforce, spo_must_allow */
- RESERVE_SPACE(16);
+ p = xdr_reserve_space(xdr, 16);
+ if (!p)
+ return nfserr_resource;

/* spo_must_enforce bitmap: */
WRITE32(2);
@@ -3440,13 +3533,15 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
WARN_ON_ONCE(1);
}

- RESERVE_SPACE(
+ p = xdr_reserve_space(xdr,
8 /* so_minor_id */ +
4 /* so_major_id.len */ +
(XDR_QUADLEN(major_id_sz) * 4) +
4 /* eir_server_scope.len */ +
(XDR_QUADLEN(server_scope_sz) * 4) +
4 /* eir_server_impl_id.count (0) */);
+ if (!p)
+ return nfserr_resource;

/* The server_owner struct */
WRITE64(minor_id); /* Minor id */
@@ -3467,17 +3562,22 @@ static __be32
nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_create_session *sess)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (nfserr)
return nfserr;

- RESERVE_SPACE(24);
+ p = xdr_reserve_space(xdr, 24);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(sess->sessionid.data, NFS4_MAX_SESSIONID_LEN);
WRITE32(sess->seqid);
WRITE32(sess->flags);

- RESERVE_SPACE(28);
+ p = xdr_reserve_space(xdr, 28);
+ if (!p)
+ return nfserr_resource;
WRITE32(0); /* headerpadsz */
WRITE32(sess->fore_channel.maxreq_sz);
WRITE32(sess->fore_channel.maxresp_sz);
@@ -3487,11 +3587,15 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(sess->fore_channel.nr_rdma_attrs);

if (sess->fore_channel.nr_rdma_attrs) {
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(sess->fore_channel.rdma_attrs);
}

- RESERVE_SPACE(28);
+ p = xdr_reserve_space(xdr, 28);
+ if (!p)
+ return nfserr_resource;
WRITE32(0); /* headerpadsz */
WRITE32(sess->back_channel.maxreq_sz);
WRITE32(sess->back_channel.maxresp_sz);
@@ -3501,7 +3605,9 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(sess->back_channel.nr_rdma_attrs);

if (sess->back_channel.nr_rdma_attrs) {
- RESERVE_SPACE(4);
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
+ return nfserr_resource;
WRITE32(sess->back_channel.rdma_attrs);
}
return 0;
@@ -3511,12 +3617,15 @@ static __be32
nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_sequence *seq)
{
+ struct xdr_stream *xdr = &resp->xdr;
__be32 *p;

if (nfserr)
return nfserr;

- RESERVE_SPACE(NFS4_MAX_SESSIONID_LEN + 20);
+ p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 20);
+ if (!p)
+ return nfserr_resource;
WRITEMEM(seq->sessionid.data, NFS4_MAX_SESSIONID_LEN);
WRITE32(seq->seqid);
WRITE32(seq->slotid);
@@ -3533,13 +3642,16 @@ static __be32
nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
struct nfsd4_test_stateid *test_stateid)
{
+ struct xdr_stream *xdr = &resp->xdr;
struct nfsd4_test_stateid_id *stateid, *next;
__be32 *p;

if (nfserr)
return nfserr;

- RESERVE_SPACE(4 + (4 * test_stateid->ts_num_ids));
+ p = xdr_reserve_space(xdr, 4 + (4 * test_stateid->ts_num_ids));
+ if (!p)
+ return nfserr_resource;
*p++ = htonl(test_stateid->ts_num_ids);

list_for_each_entry_safe(stateid, next, &test_stateid->ts_stateid_list, ts_id_list) {
@@ -3678,7 +3790,11 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
nfsd4_enc encoder;
__be32 *p;

- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8);
+ if (!p) {
+ WARN_ON_ONCE(1);
+ return;
+ }
WRITE32(op->opnum);
post_err_offset = xdr->buf->len;

@@ -3730,18 +3846,21 @@ status:
* called with nfs4_lock_state() held
*/
void
-nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
+nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op)
{
__be32 *p;
struct nfs4_replay *rp = op->replay;

BUG_ON(!rp);

- RESERVE_SPACE(8);
+ p = xdr_reserve_space(xdr, 8 + rp->rp_buflen);
+ if (!p) {
+ WARN_ON_ONCE(1);
+ return;
+ }
WRITE32(op->opnum);
*p++ = rp->rp_status; /* already xdr'ed */

- RESERVE_SPACE(rp->rp_buflen);
WRITEMEM(rp->rp_buf, rp->rp_buflen);
}

diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 15ca477..ea5ad5d 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -563,7 +563,7 @@ int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *,
struct nfsd4_compoundres *);
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32);
void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *);
-void nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op);
+void nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op);
__be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
struct svc_fh *fhp, struct svc_export *exp,
struct dentry *dentry,
--
1.9.0


2014-05-22 19:32:42

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 31/52] nfsd4: nfsd4_check_resp_size should check against whole buffer

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 741aab2..91d0b9a 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3762,7 +3762,6 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize)
{
struct xdr_buf *buf = &resp->rqstp->rq_res;
struct nfsd4_session *session = resp->cstate.session;
- int slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;

if (nfsd4_has_session(&resp->cstate)) {
struct nfsd4_slot *slot = resp->cstate.slot;
@@ -3775,7 +3774,7 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize)
return nfserr_rep_too_big_to_cache;
}

- if (respsize > slack_bytes) {
+ if (buf->len + respsize > buf->buflen) {
WARN_ON_ONCE(nfsd4_has_session(&resp->cstate));
return nfserr_resource;
}
--
1.9.0


2014-05-22 19:32:51

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 51/52] nfsd4: kill write32, write64

From: "J. Bruce Fields" <[email protected]>

And switch a couple other functions from the encode(&p,...) convention
to the p = encode(p,...) convention mostly used elsewhere.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 51 +++++++++++++++++++++------------------------------
1 file changed, 21 insertions(+), 30 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 2a5bd29..6813870 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1683,39 +1683,30 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_TAIL;
}

-static void write32(__be32 **p, u32 n)
-{
- *(*p)++ = htonl(n);
-}
-
-static void write64(__be32 **p, u64 n)
-{
- write32(p, (n >> 32));
- write32(p, (u32)n);
-}
-
-static void write_change(__be32 **p, struct kstat *stat, struct inode *inode)
+static __be32 *encode_change(__be32 *p, struct kstat *stat, struct inode *inode)
{
if (IS_I_VERSION(inode)) {
- write64(p, inode->i_version);
+ p = xdr_encode_hyper(p, inode->i_version);
} else {
- write32(p, stat->ctime.tv_sec);
- write32(p, stat->ctime.tv_nsec);
+ *p++ = cpu_to_be32(stat->ctime.tv_sec);
+ *p++ = cpu_to_be32(stat->ctime.tv_nsec);
}
+ return p;
}

-static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
+static __be32 *encode_cinfo(__be32 *p, struct nfsd4_change_info *c)
{
- write32(p, c->atomic);
+ *p++ = cpu_to_be32(c->atomic);
if (c->change_supported) {
- write64(p, c->before_change);
- write64(p, c->after_change);
+ p = xdr_encode_hyper(p, c->before_change);
+ p = xdr_encode_hyper(p, c->after_change);
} else {
- write32(p, c->before_ctime_sec);
- write32(p, c->before_ctime_nsec);
- write32(p, c->after_ctime_sec);
- write32(p, c->after_ctime_nsec);
+ *p++ = cpu_to_be32(c->before_ctime_sec);
+ *p++ = cpu_to_be32(c->before_ctime_nsec);
+ *p++ = cpu_to_be32(c->after_ctime_sec);
+ *p++ = cpu_to_be32(c->after_ctime_nsec);
}
+ return p;
}

/* Encode as an array of strings the string given with components
@@ -2186,7 +2177,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- write_change(&p, &stat, dentry->d_inode);
+ p = encode_change(p, &stat, dentry->d_inode);
}
if (bmval0 & FATTR4_WORD0_SIZE) {
p = xdr_reserve_space(xdr, 8);
@@ -2817,7 +2808,7 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, 32);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &create->cr_cinfo);
+ p = encode_cinfo(p, &create->cr_cinfo);
*p++ = cpu_to_be32(2);
*p++ = cpu_to_be32(create->cr_bmval[0]);
*p++ = cpu_to_be32(create->cr_bmval[1]);
@@ -2940,7 +2931,7 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li
p = xdr_reserve_space(xdr, 20);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &link->li_cinfo);
+ p = encode_cinfo(p, &link->li_cinfo);
}
return nfserr;
}
@@ -2961,7 +2952,7 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
p = xdr_reserve_space(xdr, 40);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &open->op_cinfo);
+ p = encode_cinfo(p, &open->op_cinfo);
*p++ = cpu_to_be32(open->op_rflags);
*p++ = cpu_to_be32(2);
*p++ = cpu_to_be32(open->op_bmval[0]);
@@ -3383,7 +3374,7 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, 20);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &remove->rm_cinfo);
+ p = encode_cinfo(p, &remove->rm_cinfo);
}
return nfserr;
}
@@ -3398,8 +3389,8 @@ nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, 40);
if (!p)
return nfserr_resource;
- write_cinfo(&p, &rename->rn_sinfo);
- write_cinfo(&p, &rename->rn_tinfo);
+ p = encode_cinfo(p, &rename->rn_sinfo);
+ p = encode_cinfo(p, &rename->rn_tinfo);
}
return nfserr;
}
--
1.9.0


2014-05-22 19:32:34

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 10/52] nfsd4: reserve head space for krb5 integ/priv info

From: "J. Bruce Fields" <[email protected]>

Currently if the nfs-level part of a reply would be too large, we'll
return an error to the client. But if the nfs-level part fits and
leaves no room for krb5p or krb5i stuff, then we just drop the request
entirely.

That's no good. Instead, reserve some slack space at the end of the
buffer and make sure we fail outright if we'd come close.

The slack space here is a massive overstimate of what's required, we
should probably try for a tighter limit at some point.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index f809aa8..747d6a8 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1262,7 +1262,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,

xdr->buf = buf;
xdr->p = head->iov_base + head->iov_len;
- xdr->end = head->iov_base + PAGE_SIZE;
+ xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
}

/*
--
1.9.0


2014-05-22 19:32:47

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 46/52] nfsd4: allow exotic read compounds

From: "J. Bruce Fields" <[email protected]>

I'm not sure why a client would want to stuff multiple reads in a
single compound rpc, but it's legal for them to do it, and we should
really support it.

Signed-off-by: J. Bruce Fields <[email protected]>
---
Documentation/filesystems/nfs/nfs41-server.txt | 2 -
fs/nfsd/nfs4xdr.c | 55 +++++++++++---------------
2 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt
index b930ad0..c49cd7e 100644
--- a/Documentation/filesystems/nfs/nfs41-server.txt
+++ b/Documentation/filesystems/nfs/nfs41-server.txt
@@ -176,7 +176,5 @@ Nonstandard compound limitations:
ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
fail to live up to the promise we made in CREATE_SESSION fore channel
negotiation.
-* No more than one read-like operation allowed per compound; encoding
- replies that cross page boundaries (except for read data) not handled.

See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues.
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index fec9e13..f3eb7fa 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3139,28 +3139,34 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
struct xdr_stream *xdr = &resp->xdr;
u32 eof;
int v;
- struct page *page;
int starting_len = xdr->buf->len - 8;
- int space_left;
long len;
+ int thislen;
__be32 nfserr;
__be32 tmp;
__be32 *p;
+ u32 zzz = 0;
+ int pad;

len = maxcount;
v = 0;
- while (len) {
- int thislen;

- page = *(resp->rqstp->rq_next_page);
- if (!page) { /* ran out of pages */
- maxcount -= len;
- break;
- }
+ thislen = (void *)xdr->end - (void *)xdr->p;
+ if (len < thislen)
+ thislen = len;
+ p = xdr_reserve_space(xdr, (thislen+3)&~3);
+ WARN_ON_ONCE(!p);
+ resp->rqstp->rq_vec[v].iov_base = p;
+ resp->rqstp->rq_vec[v].iov_len = thislen;
+ v++;
+ len -= thislen;
+
+ while (len) {
thislen = min_t(long, len, PAGE_SIZE);
- resp->rqstp->rq_vec[v].iov_base = page_address(page);
+ p = xdr_reserve_space(xdr, (thislen+3)&~3);
+ WARN_ON_ONCE(!p);
+ resp->rqstp->rq_vec[v].iov_base = p;
resp->rqstp->rq_vec[v].iov_len = thislen;
- resp->rqstp->rq_next_page++;
v++;
len -= thislen;
}
@@ -3170,6 +3176,7 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
read->rd_vlen, &maxcount);
if (nfserr)
return nfserr;
+ xdr_truncate_encode(xdr, starting_len + 8 + ((maxcount+3)&~3));

eof = (read->rd_offset + maxcount >=
read->rd_fhp->fh_dentry->d_inode->i_size);
@@ -3179,27 +3186,9 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
tmp = htonl(maxcount);
write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);

- resp->xdr.buf->page_len = maxcount;
- xdr->buf->len += maxcount;
- xdr->page_ptr += v;
- xdr->iov = xdr->buf->tail;
-
- /* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = xdr->p;
- resp->xdr.buf->tail[0].iov_len = 0;
- if (maxcount&3) {
- p = xdr_reserve_space(xdr, 4);
- WRITE32(0);
- resp->xdr.buf->tail[0].iov_base += maxcount&3;
- resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- xdr->buf->len -= (maxcount&3);
- }
-
- space_left = min_t(int, (void *)xdr->end - (void *)xdr->p,
- xdr->buf->buflen - xdr->buf->len);
- xdr->buf->buflen = xdr->buf->len + space_left;
- xdr->end = (__be32 *)((void *)xdr->end + space_left);
-
+ pad = (maxcount&3) ? 4 - (maxcount&3) : 0;
+ write_bytes_to_xdr_buf(xdr->buf, starting_len + 8 + maxcount,
+ &zzz, pad);
return 0;

}
@@ -3233,6 +3222,8 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
xdr_commit_encode(xdr);

maxcount = svc_max_payload(resp->rqstp);
+ if (maxcount > xdr->buf->buflen - xdr->buf->len)
+ maxcount = xdr->buf->buflen - xdr->buf->len;
if (maxcount > read->rd_length)
maxcount = read->rd_length;

--
1.9.0


2014-05-22 19:32:43

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 33/52] rpc: define xdr_restrict_buflen

From: "J. Bruce Fields" <[email protected]>

With this xdr_reserve_space can help us enforce various limits.

Signed-off-by: J. Bruce Fields <[email protected]>
---
include/linux/sunrpc/xdr.h | 1 +
net/sunrpc/xdr.c | 29 +++++++++++++++++++++++++++++
2 files changed, 30 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index b23d69f..70c6b92 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -217,6 +217,7 @@ extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32
extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
extern void xdr_commit_encode(struct xdr_stream *xdr);
extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
+extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen);
extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages,
unsigned int base, unsigned int len);
extern unsigned int xdr_stream_pos(const struct xdr_stream *xdr);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 40198c5..aa4eb70 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -650,6 +650,35 @@ void xdr_truncate_encode(struct xdr_stream *xdr, size_t len)
EXPORT_SYMBOL(xdr_truncate_encode);

/**
+ * xdr_restrict_buflen - decrease available buffer space
+ * @xdr: pointer to xdr_stream
+ * @newbuflen: new maximum number of bytes available
+ *
+ * Adjust our idea of how much space is available in the buffer.
+ * If we've already used too much space in the buffer, returns -1.
+ * If the available space is already smaller than newbuflen, returns 0
+ * and does nothing. Otherwise, adjusts xdr->buf->buflen to newbuflen
+ * and ensures xdr->end is set at most offset newbuflen from the start
+ * of the buffer.
+ */
+int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen)
+{
+ struct xdr_buf *buf = xdr->buf;
+ int left_in_this_buf = (void *)xdr->end - (void *)xdr->p;
+ int end_offset = buf->len + left_in_this_buf;
+
+ if (newbuflen < 0 || newbuflen < buf->len)
+ return -1;
+ if (newbuflen > buf->buflen)
+ return 0;
+ if (newbuflen < end_offset)
+ xdr->end = (void *)xdr->end + newbuflen - end_offset;
+ buf->buflen = newbuflen;
+ return 0;
+}
+EXPORT_SYMBOL(xdr_restrict_buflen);
+
+/**
* xdr_write_pages - Insert a list of pages into an XDR buffer for sending
* @xdr: pointer to xdr_stream
* @pages: list of pages
--
1.9.0


2014-05-22 19:32:34

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 11/52] nfsd4: fix encoding of out-of-space replies

From: "J. Bruce Fields" <[email protected]>

If nfsd4_check_resp_size() returns an error then we should really be
truncating the reply here, otherwise we may leave extra garbage at the
end of the rpc reply.

Also add a warning to catch any cases where our reply-size estimates may
be wrong in the case of a non-idempotent operation.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 9 +++++++++
fs/nfsd/nfs4xdr.c | 17 ++++++++++++++++-
fs/nfsd/xdr4.h | 2 ++
3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 747d6a8..db14965 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1859,6 +1859,15 @@ static struct nfsd4_operation nfsd4_ops[] = {
},
};

+void warn_on_nonidempotent_op(struct nfsd4_op *op)
+{
+ if (OPDESC(op)->op_flags & OP_MODIFIES_SOMETHING) {
+ pr_err("unable to encode reply to nonidempotent op %d (%s)\n",
+ op->opnum, nfsd4_op_name(op->opnum));
+ WARN_ON_ONCE(1);
+ }
+}
+
#ifdef NFSD_DEBUG
static const char *nfsd4_op_name(unsigned opnum)
{
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1879250..24ba652 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3637,6 +3637,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
{
struct nfs4_stateowner *so = resp->cstate.replay_owner;
__be32 *statp;
+ nfsd4_enc encoder;
__be32 *p;

RESERVE_SPACE(8);
@@ -3648,10 +3649,24 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
goto status;
BUG_ON(op->opnum < 0 || op->opnum >= ARRAY_SIZE(nfsd4_enc_ops) ||
!nfsd4_enc_ops[op->opnum]);
- op->status = nfsd4_enc_ops[op->opnum](resp, op->status, &op->u);
+ encoder = nfsd4_enc_ops[op->opnum];
+ op->status = encoder(resp, op->status, &op->u);
/* nfsd4_check_resp_size guarantees enough room for error status */
if (!op->status)
op->status = nfsd4_check_resp_size(resp, 0);
+ if (op->status == nfserr_resource ||
+ op->status == nfserr_rep_too_big ||
+ op->status == nfserr_rep_too_big_to_cache) {
+ /*
+ * The operation may have already been encoded or
+ * partially encoded. No op returns anything additional
+ * in the case of one of these three errors, so we can
+ * just truncate back to after the status. But it's a
+ * bug if we had to do this on a non-idempotent op:
+ */
+ warn_on_nonidempotent_op(op);
+ resp->xdr.p = statp + 1;
+ }
if (so) {
so->so_replay.rp_status = op->status;
so->so_replay.rp_buflen = (char *)resp->xdr.p
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index f62a055..15ca477 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -536,6 +536,8 @@ static inline bool nfsd4_last_compound_op(struct svc_rqst *rqstp)
return argp->opcnt == resp->opcnt;
}

+void warn_on_nonidempotent_op(struct nfsd4_op *op);
+
#define NFS4_SVC_XDRSIZE sizeof(struct nfsd4_compoundargs)

static inline void
--
1.9.0


2014-05-22 19:32:49

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 49/52] nfsd4: kill WRITE64

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 52 ++++++++++++++++++++++++----------------------------
1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 71071ae..46fc861 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1683,10 +1683,6 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_TAIL;
}

-#define WRITE64(n) do { \
- *p++ = htonl((u32)((n) >> 32)); \
- *p++ = htonl((u32)(n)); \
-} while (0)
#define WRITEMEM(ptr,nbytes) do { if (nbytes > 0) { \
*(p + XDR_QUADLEN(nbytes) -1) = 0; \
memcpy(p, ptr, nbytes); \
@@ -2204,7 +2200,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64(stat.size);
+ p = xdr_encode_hyper(p, stat.size);
}
if (bmval0 & FATTR4_WORD0_LINK_SUPPORT) {
p = xdr_reserve_space(xdr, 4);
@@ -2229,12 +2225,12 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
if (!p)
goto out_resource;
if (exp->ex_fslocs.migrated) {
- WRITE64(NFS4_REFERRAL_FSID_MAJOR);
- WRITE64(NFS4_REFERRAL_FSID_MINOR);
+ p = xdr_encode_hyper(p, NFS4_REFERRAL_FSID_MAJOR);
+ p = xdr_encode_hyper(p, NFS4_REFERRAL_FSID_MINOR);
} else switch(fsid_source(fhp)) {
case FSIDSOURCE_FSID:
- WRITE64((u64)exp->ex_fsid);
- WRITE64((u64)0);
+ p = xdr_encode_hyper(p, (u64)exp->ex_fsid);
+ p = xdr_encode_hyper(p, (u64)0);
break;
case FSIDSOURCE_DEV:
*p++ = cpu_to_be32(0);
@@ -2337,25 +2333,25 @@ out_acl:
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64(stat.ino);
+ p = xdr_encode_hyper(p, stat.ino);
}
if (bmval0 & FATTR4_WORD0_FILES_AVAIL) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) statfs.f_ffree);
+ p = xdr_encode_hyper(p, (u64) statfs.f_ffree);
}
if (bmval0 & FATTR4_WORD0_FILES_FREE) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) statfs.f_ffree);
+ p = xdr_encode_hyper(p, (u64) statfs.f_ffree);
}
if (bmval0 & FATTR4_WORD0_FILES_TOTAL) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) statfs.f_files);
+ p = xdr_encode_hyper(p, (u64) statfs.f_files);
}
if (bmval0 & FATTR4_WORD0_FS_LOCATIONS) {
status = nfsd4_encode_fs_locations(xdr, rqstp, exp);
@@ -2372,7 +2368,7 @@ out_acl:
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64(exp->ex_path.mnt->mnt_sb->s_maxbytes);
+ p = xdr_encode_hyper(p, exp->ex_path.mnt->mnt_sb->s_maxbytes);
}
if (bmval0 & FATTR4_WORD0_MAXLINK) {
p = xdr_reserve_space(xdr, 4);
@@ -2390,13 +2386,13 @@ out_acl:
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) svc_max_payload(rqstp));
+ p = xdr_encode_hyper(p, (u64) svc_max_payload(rqstp));
}
if (bmval0 & FATTR4_WORD0_MAXWRITE) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE64((u64) svc_max_payload(rqstp));
+ p = xdr_encode_hyper(p, (u64) svc_max_payload(rqstp));
}
if (bmval1 & FATTR4_WORD1_MODE) {
p = xdr_reserve_space(xdr, 4);
@@ -2438,34 +2434,34 @@ out_acl:
if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_bavail * (u64)statfs.f_bsize;
- WRITE64(dummy64);
+ p = xdr_encode_hyper(p, dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_FREE) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_bfree * (u64)statfs.f_bsize;
- WRITE64(dummy64);
+ p = xdr_encode_hyper(p, dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_TOTAL) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
dummy64 = (u64)statfs.f_blocks * (u64)statfs.f_bsize;
- WRITE64(dummy64);
+ p = xdr_encode_hyper(p, dummy64);
}
if (bmval1 & FATTR4_WORD1_SPACE_USED) {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
dummy64 = (u64)stat.blocks << 9;
- WRITE64(dummy64);
+ p = xdr_encode_hyper(p, dummy64);
}
if (bmval1 & FATTR4_WORD1_TIME_ACCESS) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE64((s64)stat.atime.tv_sec);
+ p = xdr_encode_hyper(p, (s64)stat.atime.tv_sec);
*p++ = cpu_to_be32(stat.atime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_DELTA) {
@@ -2480,14 +2476,14 @@ out_acl:
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE64((s64)stat.ctime.tv_sec);
+ p = xdr_encode_hyper(p, (s64)stat.ctime.tv_sec);
*p++ = cpu_to_be32(stat.ctime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_MODIFY) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE64((s64)stat.mtime.tv_sec);
+ p = xdr_encode_hyper(p, (s64)stat.mtime.tv_sec);
*p++ = cpu_to_be32(stat.mtime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
@@ -2501,7 +2497,7 @@ out_acl:
if (ignore_crossmnt == 0 &&
dentry == exp->ex_path.mnt->mnt_root)
get_parent_attributes(exp, &stat);
- WRITE64(stat.ino);
+ p = xdr_encode_hyper(p, stat.ino);
}
if (bmval2 & FATTR4_WORD2_SECURITY_LABEL) {
status = nfsd4_encode_security_label(xdr, rqstp, context,
@@ -2892,15 +2888,15 @@ again:
}
return nfserr_resource;
}
- WRITE64(ld->ld_start);
- WRITE64(ld->ld_length);
+ p = xdr_encode_hyper(p, ld->ld_start);
+ p = xdr_encode_hyper(p, ld->ld_length);
*p++ = cpu_to_be32(ld->ld_type);
if (conf->len) {
WRITEMEM(&ld->ld_clientid, 8);
*p++ = cpu_to_be32(conf->len);
WRITEMEM(conf->data, conf->len);
} else { /* non - nfsv4 lock in conflict, no clientid nor owner */
- WRITE64((u64)0); /* clientid */
+ p = xdr_encode_hyper(p, (u64)0); /* clientid */
*p++ = cpu_to_be32(0); /* length of owner name */
}
return nfserr_denied;
@@ -3652,7 +3648,7 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr_resource;

/* The server_owner struct */
- WRITE64(minor_id); /* Minor id */
+ p = xdr_encode_hyper(p, minor_id); /* Minor id */
/* major id */
*p++ = cpu_to_be32(major_id_sz);
WRITEMEM(major_id, major_id_sz);
--
1.9.0


2014-05-28 14:02:16

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On Wed, May 28, 2014 at 01:09:45AM -0700, Christoph Hellwig wrote:
> On Thu, May 22, 2014 at 03:32:16PM -0400, J. Bruce Fields wrote:
> > Later patches handle those "exotic compounds", this one just makes sure
> > zero-copy is turned off in those cases.
>
> How did you test these exotic compounds?

I have is a pynfs test that sends a compound with multiple reads in it.

I don't think that's pushed out to my regular pynfs tree, I'll try to do
that today.

I could really use more of those. Maybe I'm wrong, but I'm worried less
about this case than the more finicky out-of-reply-space cases, where I
do have patches puporting to fix problems that I haven't really
verified.

--b.

2014-05-23 13:15:39

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 01/52] nfsd4: READ, READDIR, etc., are idempotent

On Fri, May 23, 2014 at 09:21:38AM +0800, Kinglong Mee wrote:
> On 5/23/2014 03:31, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > OP_MODIFIES_SOMETHING flags operations that we should be careful not to
> > initiate without being sure we have the buffer space to encode a reply.
> >
> > None of these ops fall into that category.
> >
> > We could probably remove a few more, but this isn't a very important
> > problem at least for ops whose reply size is easy to estimate.
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> > ---
> > fs/nfsd/nfs4proc.c | 13 +++----------
> > 1 file changed, 3 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index ac83778..4c4cd96 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -1665,38 +1665,31 @@ static struct nfsd4_operation nfsd4_ops[] = {
> > [OP_PUTFH] = {
> > .op_func = (nfsd4op_func)nfsd4_putfh,
> > .op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS
> > - | OP_IS_PUTFH_LIKE | OP_MODIFIES_SOMETHING
> > - | OP_CLEAR_STATEID,
> > + | OP_IS_PUTFH_LIKE | OP_CLEAR_STATEID,
> > .op_name = "OP_PUTFH",
> > .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
> > },
> > [OP_PUTPUBFH] = {
> > .op_func = (nfsd4op_func)nfsd4_putrootfh,
> > .op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS
> > - | OP_IS_PUTFH_LIKE | OP_MODIFIES_SOMETHING
> > - | OP_CLEAR_STATEID,
> > + | OP_IS_PUTFH_LIKE | OP_CLEAR_STATEID,
> > .op_name = "OP_PUTPUBFH",
> > .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
> > },
> > [OP_PUTROOTFH] = {
> > .op_func = (nfsd4op_func)nfsd4_putrootfh,
> > .op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS
> > - | OP_IS_PUTFH_LIKE | OP_MODIFIES_SOMETHING
> > - | OP_CLEAR_STATEID,
> > + | OP_IS_PUTFH_LIKE | OP_CLEAR_STATEID,
> > .op_name = "OP_PUTROOTFH",
> > .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
> > },
> > [OP_READ] = {
> > .op_func = (nfsd4op_func)nfsd4_read,
> > - .op_flags = OP_MODIFIES_SOMETHING,
> > - .op_name = "OP_READ",
>
> Should not delete op_name.
>
> > .op_rsize_bop = (nfsd4op_rsize)nfsd4_read_rsize,
> > .op_get_currentstateid = (stateid_getter)nfsd4_get_readstateid,
> > },
> > [OP_READDIR] = {
> > .op_func = (nfsd4op_func)nfsd4_readdir,
> > - .op_flags = OP_MODIFIES_SOMETHING,
> > - .op_name = "OP_READDIR",
>
> Same as above.
>
> Because "[PATCH 03/52] nfsd4: fill in some missing op_name's" re-adds them.

Good grief--I should have checked when those op_name's diseappeared
before making the new patch. Thanks.

--b.

2014-05-22 19:32:48

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 47/52] nfsd4: really fix nfs4err_resource in 4.1 case

From: "J. Bruce Fields" <[email protected]>

encode_getattr, for example, can return nfserr_resource to indicate it
ran out of buffer space. That's not a legal error in the 4.1 case.
And in the 4.1 case, if we ran out of buffer space, we should have
exceeded a session limit too.

(Note in 1bc49d83c37cfaf46be357757e592711e67f9809 "nfsd4: fix
nfs4err_resource in 4.1 case" we originally tried fixing this error
return before fixing the problem that we could error out while we still
had lots of available space. The result was to trade one illegal error
for another in those cases. We decided that was helpful, so reverted
the change in fc208d026be0c7d60db9118583fc62f6ca97743d, and are only
reinstating it now that we've elimited almost all of those cases.)

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index f3eb7fa..5fc3ba0 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3903,6 +3903,14 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
space_needed = COMPOUND_ERR_SLACK_SPACE;
op->status = nfsd4_check_resp_size(resp, space_needed);
}
+ if (op->status == nfserr_resource && nfsd4_has_session(&resp->cstate)) {
+ struct nfsd4_slot *slot = resp->cstate.slot;
+
+ if (slot->sl_flags & NFSD4_SLOT_CACHETHIS)
+ op->status = nfserr_rep_too_big_to_cache;
+ else
+ op->status = nfserr_rep_too_big;
+ }
if (op->status == nfserr_resource ||
op->status == nfserr_rep_too_big ||
op->status == nfserr_rep_too_big_to_cache) {
--
1.9.0


2014-05-28 14:23:46

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On Wed, May 28, 2014 at 10:13:09AM -0400, Anna Schumaker wrote:
> On 05/28/2014 10:01 AM, J. Bruce Fields wrote:
> > On Wed, May 28, 2014 at 01:09:45AM -0700, Christoph Hellwig wrote:
> >> On Thu, May 22, 2014 at 03:32:16PM -0400, J. Bruce Fields wrote:
> >>> Later patches handle those "exotic compounds", this one just makes
> >>> sure zero-copy is turned off in those cases.
> >> How did you test these exotic compounds?
> > I have is a pynfs test that sends a compound with multiple reads in
> > it.
> >
> > I don't think that's pushed out to my regular pynfs tree, I'll try
> > to do that today.
> >
> > I could really use more of those. Maybe I'm wrong, but I'm worried
> > less about this case than the more finicky out-of-reply-space cases,
> > where I do have patches puporting to fix problems that I haven't
> > really verified.
>
> I'll eventually be using this case for READ_PLUS. I'll be sure to
> send problems your way! :)

Great, thanks.

Actually my main question there is how to handle 4.2 in pynfs.

4.1 and 4.0 are entirely separate codebases. We definitely don't want
to do that again.

There's not much point re-running all the 4.1 tests over 4.2. Maybe all
we may need is to say "use minor version 2 on this compound" in tests of
the new features.

But I haven't even tried to figure out how to tell pynfs about the new
.x files.

--b.

2014-05-28 14:27:15

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On 05/28/2014 10:23 AM, J. Bruce Fields wrote:
> On Wed, May 28, 2014 at 10:13:09AM -0400, Anna Schumaker wrote:
>> On 05/28/2014 10:01 AM, J. Bruce Fields wrote:
>>> On Wed, May 28, 2014 at 01:09:45AM -0700, Christoph Hellwig wrote:
>>>> On Thu, May 22, 2014 at 03:32:16PM -0400, J. Bruce Fields wrote:
>>>>> Later patches handle those "exotic compounds", this one just makes
>>>>> sure zero-copy is turned off in those cases.
>>>> How did you test these exotic compounds?
>>> I have is a pynfs test that sends a compound with multiple reads in
>>> it.
>>>
>>> I don't think that's pushed out to my regular pynfs tree, I'll try
>>> to do that today.
>>>
>>> I could really use more of those. Maybe I'm wrong, but I'm worried
>>> less about this case than the more finicky out-of-reply-space cases,
>>> where I do have patches puporting to fix problems that I haven't
>>> really verified.
>> I'll eventually be using this case for READ_PLUS. I'll be sure to
>> send problems your way! :)
> Great, thanks.
>
> Actually my main question there is how to handle 4.2 in pynfs.
>
> 4.1 and 4.0 are entirely separate codebases. We definitely don't want
> to do that again.

Can you make v4.2 a new file that uses the v4.1 functions? That way all the v4.2 stuff is kept together. I want to try doing something similar on the client for functions in nfs4proc.c and nfs4xdr.c.

Anna
>
> There's not much point re-running all the 4.1 tests over 4.2. Maybe all
> we may need is to say "use minor version 2 on this compound" in tests of
> the new features.
>
> But I haven't even tried to figure out how to tell pynfs about the new
> .x files.
>
> --b.


2014-05-22 19:32:40

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 25/52] nfsd4: size-checking cleanup

From: "J. Bruce Fields" <[email protected]>

Better variable name, some comments, etc.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 9 ++++++---
fs/nfsd/nfs4xdr.c | 29 +++++++++++++++--------------
2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 9b57894..f8e7472 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1281,7 +1281,6 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate = &resp->cstate;
struct svc_fh *current_fh = &cstate->current_fh;
struct svc_fh *save_fh = &cstate->save_fh;
- u32 plen = 0;
__be32 status;

svcxdr_init_encode(rqstp, resp);
@@ -1351,9 +1350,13 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,

/* If op is non-idempotent */
if (opdesc->op_flags & OP_MODIFIES_SOMETHING) {
- plen = opdesc->op_rsize_bop(rqstp, op);
/*
- * If there's still another operation, make sure
+ * Don't execute this op if we couldn't encode a
+ * succesful reply:
+ */
+ u32 plen = opdesc->op_rsize_bop(rqstp, op);
+ /*
+ * Plus if there's another operation, make sure
* we'll have space to at least encode an error:
*/
if (resp->opcnt < args->opcnt)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 86caa98..f1a9716 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3739,35 +3739,36 @@ static nfsd4_enc nfsd4_enc_ops[] = {
};

/*
- * Calculate the total amount of memory that the compound response has taken
- * after encoding the current operation with pad.
+ * Calculate whether we still have space to encode repsize bytes.
+ * There are two considerations:
+ * - For NFS versions >=4.1, the size of the reply must stay within
+ * session limits
+ * - For all NFS versions, we must stay within limited preallocated
+ * buffer space.
*
- * pad: if operation is non-idempotent, pad was calculate by op_rsize_bop()
- * which was specified at nfsd4_operation, else pad is zero.
- *
- * Compare this length to the session se_fmaxresp_sz and se_fmaxresp_cached.
- *
- * Our se_fmaxresp_cached will always be a multiple of PAGE_SIZE, and so
- * will be at least a page and will therefore hold the xdr_buf head.
+ * This is called before the operation is processed, so can only provide
+ * an upper estimate. For some nonidempotent operations (such as
+ * getattr), it's not necessarily a problem if that estimate is wrong,
+ * as we can fail it after processing without significant side effects.
*/
-__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
+__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 respsize)
{
struct xdr_buf *buf = &resp->rqstp->rq_res;
struct nfsd4_session *session = resp->cstate.session;
- struct nfsd4_slot *slot = resp->cstate.slot;
int slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;

if (nfsd4_has_session(&resp->cstate)) {
+ struct nfsd4_slot *slot = resp->cstate.slot;

- if (buf->len + pad > session->se_fchannel.maxresp_sz)
+ if (buf->len + respsize > session->se_fchannel.maxresp_sz)
return nfserr_rep_too_big;

if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
- buf->len + pad > session->se_fchannel.maxresp_cached)
+ buf->len + respsize > session->se_fchannel.maxresp_cached)
return nfserr_rep_too_big_to_cache;
}

- if (pad > slack_bytes) {
+ if (respsize > slack_bytes) {
WARN_ON_ONCE(nfsd4_has_session(&resp->cstate));
return nfserr_resource;
}
--
1.9.0


2014-05-22 19:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 15/52] nfsd4: remove ADJUST_ARGS

From: "J. Bruce Fields" <[email protected]>

It's just uninteresting debugging code at this point.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 40 ----------------------------------------
1 file changed, 40 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index bc8c9e0..865a2c2 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1750,7 +1750,6 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
p = xdr_reserve_space(&resp->xdr, nbytes); \
BUG_ON(!p); \
} while (0)
-#define ADJUST_ARGS() WARN_ON_ONCE(p != resp->xdr.p) \

/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
@@ -2744,7 +2743,6 @@ nfsd4_encode_stateid(struct nfsd4_compoundres *resp, stateid_t *sid)
RESERVE_SPACE(sizeof(stateid_t));
WRITE32(sid->si_generation);
WRITEMEM(&sid->si_opaque, sizeof(stateid_opaque_t));
- ADJUST_ARGS();
}

static __be32
@@ -2756,7 +2754,6 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
RESERVE_SPACE(8);
WRITE32(access->ac_supported);
WRITE32(access->ac_resp_access);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2771,7 +2768,6 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
WRITE32(bcts->dir);
/* Sorry, we do not yet support RDMA over 4.1: */
WRITE32(0);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2794,7 +2790,6 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
if (!nfserr) {
RESERVE_SPACE(NFS4_VERIFIER_SIZE);
WRITEMEM(commit->co_verf.data, NFS4_VERIFIER_SIZE);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2810,7 +2805,6 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
WRITE32(2);
WRITE32(create->cr_bmval[0]);
WRITE32(create->cr_bmval[1]);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2842,7 +2836,6 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh
RESERVE_SPACE(len + 4);
WRITE32(len);
WRITEMEM(&fhp->fh_handle.fh_base, len);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2870,7 +2863,6 @@ nfsd4_encode_lock_denied(struct nfsd4_compoundres *resp, struct nfsd4_lock_denie
WRITE64((u64)0); /* clientid */
WRITE32(0); /* length of owner name */
}
- ADJUST_ARGS();
}

static __be32
@@ -2910,7 +2902,6 @@ nfsd4_encode_link(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_li
if (!nfserr) {
RESERVE_SPACE(20);
write_cinfo(&p, &link->li_cinfo);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -2932,7 +2923,6 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
WRITE32(open->op_bmval[0]);
WRITE32(open->op_bmval[1]);
WRITE32(open->op_delegate_type);
- ADJUST_ARGS();

switch (open->op_delegate_type) {
case NFS4_OPEN_DELEGATE_NONE:
@@ -2949,7 +2939,6 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
WRITE32(0);
WRITE32(0);
WRITE32(0); /* XXX: is NULL principal ok? */
- ADJUST_ARGS();
break;
case NFS4_OPEN_DELEGATE_WRITE:
nfsd4_encode_stateid(resp, &open->op_delegate_stateid);
@@ -2970,7 +2959,6 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
WRITE32(0);
WRITE32(0);
WRITE32(0); /* XXX: is NULL principal ok? */
- ADJUST_ARGS();
break;
case NFS4_OPEN_DELEGATE_NONE_EXT: /* 4.1 */
switch (open->op_why_no_deleg) {
@@ -2984,7 +2972,6 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
RESERVE_SPACE(4);
WRITE32(open->op_why_no_deleg);
}
- ADJUST_ARGS();
break;
default:
BUG();
@@ -3066,7 +3053,6 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

WRITE32(eof);
WRITE32(maxcount);
- ADJUST_ARGS();
resp->xdr.buf->head[0].iov_len = (char *)p
- (char *)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
@@ -3080,7 +3066,6 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- ADJUST_ARGS();
}
return 0;
}
@@ -3121,7 +3106,6 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
}

WRITE32(maxcount);
- ADJUST_ARGS();
resp->xdr.buf->head[0].iov_len = (char *)p
- (char *)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
@@ -3135,7 +3119,6 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- ADJUST_ARGS();
}
return 0;
}
@@ -3162,7 +3145,6 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
WRITE32(0);
- ADJUST_ARGS();
resp->xdr.buf->head[0].iov_len = ((char *)resp->xdr.p)
- (char *)resp->xdr.buf->head[0].iov_base;
tailbase = p;
@@ -3233,7 +3215,6 @@ nfsd4_encode_remove(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
if (!nfserr) {
RESERVE_SPACE(20);
write_cinfo(&p, &remove->rm_cinfo);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -3247,7 +3228,6 @@ nfsd4_encode_rename(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
RESERVE_SPACE(40);
write_cinfo(&p, &rename->rn_sinfo);
write_cinfo(&p, &rename->rn_tinfo);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -3287,7 +3267,6 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
supported = 0;
RESERVE_SPACE(4);
flavorsp = p++; /* to be backfilled later */
- ADJUST_ARGS();

for (i = 0; i < nflavs; i++) {
rpc_authflavor_t pf = flavs[i].pseudoflavor;
@@ -3301,12 +3280,10 @@ nfsd4_do_encode_secinfo(struct nfsd4_compoundres *resp,
WRITEMEM(info.oid.data, info.oid.len);
WRITE32(info.qop);
WRITE32(info.service);
- ADJUST_ARGS();
} else if (pf < RPC_AUTH_MAXFLAVOR) {
supported++;
RESERVE_SPACE(4);
WRITE32(pf);
- ADJUST_ARGS();
} else {
if (report)
pr_warn("NFS: SECINFO: security flavor %u "
@@ -3360,7 +3337,6 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
WRITE32(setattr->sa_bmval[1]);
WRITE32(setattr->sa_bmval[2]);
}
- ADJUST_ARGS();
return nfserr;
}

@@ -3373,13 +3349,11 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n
RESERVE_SPACE(8 + NFS4_VERIFIER_SIZE);
WRITEMEM(&scd->se_clientid, 8);
WRITEMEM(&scd->se_confirm, NFS4_VERIFIER_SIZE);
- ADJUST_ARGS();
}
else if (nfserr == nfserr_clid_inuse) {
RESERVE_SPACE(8);
WRITE32(0);
WRITE32(0);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -3394,7 +3368,6 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_w
WRITE32(write->wr_bytes_written);
WRITE32(write->wr_how_written);
WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
- ADJUST_ARGS();
}
return nfserr;
}
@@ -3437,7 +3410,6 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(exid->flags);

WRITE32(exid->spa_how);
- ADJUST_ARGS();

switch (exid->spa_how) {
case SP4_NONE:
@@ -3453,7 +3425,6 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
/* empty spo_must_allow bitmap: */
WRITE32(0);

- ADJUST_ARGS();
break;
default:
WARN_ON_ONCE(1);
@@ -3479,7 +3450,6 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,

/* Implementation id */
WRITE32(0); /* zero length nfs_impl_id4 array */
- ADJUST_ARGS();
return 0;
}

@@ -3496,7 +3466,6 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITEMEM(sess->sessionid.data, NFS4_MAX_SESSIONID_LEN);
WRITE32(sess->seqid);
WRITE32(sess->flags);
- ADJUST_ARGS();

RESERVE_SPACE(28);
WRITE32(0); /* headerpadsz */
@@ -3506,12 +3475,10 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(sess->fore_channel.maxops);
WRITE32(sess->fore_channel.maxreqs);
WRITE32(sess->fore_channel.nr_rdma_attrs);
- ADJUST_ARGS();

if (sess->fore_channel.nr_rdma_attrs) {
RESERVE_SPACE(4);
WRITE32(sess->fore_channel.rdma_attrs);
- ADJUST_ARGS();
}

RESERVE_SPACE(28);
@@ -3522,12 +3489,10 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(sess->back_channel.maxops);
WRITE32(sess->back_channel.maxreqs);
WRITE32(sess->back_channel.nr_rdma_attrs);
- ADJUST_ARGS();

if (sess->back_channel.nr_rdma_attrs) {
RESERVE_SPACE(4);
WRITE32(sess->back_channel.rdma_attrs);
- ADJUST_ARGS();
}
return 0;
}
@@ -3550,7 +3515,6 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(seq->maxslots - 1); /* sr_target_highest_slotid */
WRITE32(seq->status_flags);

- ADJUST_ARGS();
resp->cstate.datap = p; /* DRC cache data pointer */
return 0;
}
@@ -3572,7 +3536,6 @@ nfsd4_encode_test_stateid(struct nfsd4_compoundres *resp, __be32 nfserr,
*p++ = stateid->ts_id_status;
}

- ADJUST_ARGS();
return nfserr;
}

@@ -3707,7 +3670,6 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
RESERVE_SPACE(8);
WRITE32(op->opnum);
statp = p++; /* to be backfilled at the end */
- ADJUST_ARGS();

if (op->opnum == OP_ILLEGAL)
goto status;
@@ -3768,11 +3730,9 @@ nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
RESERVE_SPACE(8);
WRITE32(op->opnum);
*p++ = rp->rp_status; /* already xdr'ed */
- ADJUST_ARGS();

RESERVE_SPACE(rp->rp_buflen);
WRITEMEM(rp->rp_buf, rp->rp_buflen);
- ADJUST_ARGS();
}

int
--
1.9.0


2014-05-22 19:32:46

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 43/52] nfsd4: separate splice and readv cases

From: "J. Bruce Fields" <[email protected]>

The splice and readv cases are actually quite different--for example the
former case ignores the array of vectors we build up for the latter.

It is probably clearer to separate the two cases entirely.

There's some code duplication between the split out encoders, but this
is only temporary and will be fixed by a later patch.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 173 +++++++++++++++++++++++++++++++++++++++++-------------
fs/nfsd/vfs.c | 121 +++++++++++++++++++++++---------------
fs/nfsd/vfs.h | 8 +++
3 files changed, 214 insertions(+), 88 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1ca547d..bf853fd 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3069,41 +3069,84 @@ nfsd4_encode_open_downgrade(struct nfsd4_compoundres *resp, __be32 nfserr, struc
return nfserr;
}

-static __be32
-nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
- struct nfsd4_read *read)
+static __be32 nfsd4_encode_splice_read(
+ struct nfsd4_compoundres *resp,
+ struct nfsd4_read *read,
+ struct file *file, unsigned long maxcount)
{
- u32 eof;
- int v;
- struct page *page;
- unsigned long maxcount;
struct xdr_stream *xdr = &resp->xdr;
- int starting_len = xdr->buf->len;
+ u32 eof;
+ int starting_len = xdr->buf->len - 8;
int space_left;
- long len;
+ __be32 nfserr;
+ __be32 tmp;
__be32 *p;

- if (nfserr)
- return nfserr;
-
- p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
- if (!p) {
- WARN_ON_ONCE(resp->rqstp->rq_splice_ok);
+ /*
+ * Don't inline pages unless we know there's room for eof,
+ * count, and possible padding:
+ */
+ if (xdr->end - xdr->p < 3)
return nfserr_resource;
+
+ nfserr = nfsd_splice_read(read->rd_rqstp, file,
+ read->rd_offset, &maxcount);
+ if (nfserr) {
+ /*
+ * nfsd_splice_actor may have already messed with the
+ * page length; reset it so as not to confuse
+ * xdr_truncate_encode:
+ */
+ xdr->buf->page_len = 0;
+ return nfserr;
}

- /* Make sure there will be room for padding if needed: */
- if (xdr->end - xdr->p < 1)
- return nfserr_resource;
+ eof = (read->rd_offset + maxcount >=
+ read->rd_fhp->fh_dentry->d_inode->i_size);

- if (resp->xdr.buf->page_len) {
- WARN_ON_ONCE(resp->rqstp->rq_splice_ok);
- return nfserr_resource;
+ tmp = htonl(eof);
+ write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4);
+ tmp = htonl(maxcount);
+ write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+
+ resp->xdr.buf->page_len = maxcount;
+ xdr->buf->len += maxcount;
+ xdr->page_ptr += (maxcount + PAGE_SIZE - 1) / PAGE_SIZE;
+ xdr->iov = xdr->buf->tail;
+
+ /* Use rest of head for padding and remaining ops: */
+ resp->xdr.buf->tail[0].iov_base = xdr->p;
+ resp->xdr.buf->tail[0].iov_len = 0;
+ if (maxcount&3) {
+ p = xdr_reserve_space(xdr, 4);
+ WRITE32(0);
+ resp->xdr.buf->tail[0].iov_base += maxcount&3;
+ resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
+ xdr->buf->len -= (maxcount&3);
}

- maxcount = svc_max_payload(resp->rqstp);
- if (maxcount > read->rd_length)
- maxcount = read->rd_length;
+ space_left = min_t(int, (void *)xdr->end - (void *)xdr->p,
+ xdr->buf->buflen - xdr->buf->len);
+ xdr->buf->buflen = xdr->buf->len + space_left;
+ xdr->end = (__be32 *)((void *)xdr->end + space_left);
+
+ return 0;
+}
+
+static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
+ struct nfsd4_read *read,
+ struct file *file, unsigned long maxcount)
+{
+ struct xdr_stream *xdr = &resp->xdr;
+ u32 eof;
+ int v;
+ struct page *page;
+ int starting_len = xdr->buf->len - 8;
+ int space_left;
+ long len;
+ __be32 nfserr;
+ __be32 tmp;
+ __be32 *p;

len = maxcount;
v = 0;
@@ -3124,34 +3167,26 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
}
read->rd_vlen = v;

- nfserr = nfsd_read_file(read->rd_rqstp, read->rd_fhp, read->rd_filp,
- read->rd_offset, resp->rqstp->rq_vec, read->rd_vlen,
- &maxcount);
-
- if (nfserr) {
- /*
- * nfsd_splice_actor may have already messed with the
- * page length; reset it so as not to confuse
- * xdr_truncate_encode:
- */
- xdr->buf->page_len = 0;
- xdr_truncate_encode(xdr, starting_len);
+ nfserr = nfsd_readv(file, read->rd_offset, resp->rqstp->rq_vec,
+ read->rd_vlen, &maxcount);
+ if (nfserr)
return nfserr;
- }
+
eof = (read->rd_offset + maxcount >=
read->rd_fhp->fh_dentry->d_inode->i_size);

- WRITE32(eof);
- WRITE32(maxcount);
- WARN_ON_ONCE(resp->xdr.buf->head[0].iov_len != (char *)p
- - (char *)resp->xdr.buf->head[0].iov_base);
+ tmp = htonl(eof);
+ write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4);
+ tmp = htonl(maxcount);
+ write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+
resp->xdr.buf->page_len = maxcount;
xdr->buf->len += maxcount;
xdr->page_ptr += v;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = p;
+ resp->xdr.buf->tail[0].iov_base = xdr->p;
resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
p = xdr_reserve_space(xdr, 4);
@@ -3167,6 +3202,60 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
xdr->end = (__be32 *)((void *)xdr->end + space_left);

return 0;
+
+}
+
+static __be32
+nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
+ struct nfsd4_read *read)
+{
+ unsigned long maxcount;
+ struct xdr_stream *xdr = &resp->xdr;
+ struct file *file = read->rd_filp;
+ int starting_len = xdr->buf->len;
+ struct raparms *ra;
+ __be32 *p;
+ __be32 err;
+
+ if (nfserr)
+ return nfserr;
+
+ p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
+ if (!p) {
+ WARN_ON_ONCE(resp->rqstp->rq_splice_ok);
+ return nfserr_resource;
+ }
+
+ if (resp->xdr.buf->page_len) {
+ WARN_ON_ONCE(resp->rqstp->rq_splice_ok);
+ return nfserr_resource;
+ }
+
+ xdr_commit_encode(xdr);
+
+ maxcount = svc_max_payload(resp->rqstp);
+ if (maxcount > read->rd_length)
+ maxcount = read->rd_length;
+
+ if (!read->rd_filp) {
+ err = nfsd_get_tmp_read_open(resp->rqstp, read->rd_fhp,
+ &file, &ra);
+ if (err)
+ goto err_truncate;
+ }
+
+ if (file->f_op->splice_read && resp->rqstp->rq_splice_ok)
+ err = nfsd4_encode_splice_read(resp, read, file, maxcount);
+ else
+ err = nfsd4_encode_readv(resp, read, file, maxcount);
+
+ if (!read->rd_filp)
+ nfsd_put_tmp_read_open(file, ra);
+
+err_truncate:
+ if (err)
+ xdr_truncate_encode(xdr, starting_len);
+ return err;
}

static __be32
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ce781da..c0fbe24 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -820,41 +820,54 @@ static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
return __splice_from_pipe(pipe, sd, nfsd_splice_actor);
}

-static __be32
-nfsd_vfs_read(struct svc_rqst *rqstp, struct file *file,
- loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+__be32 nfsd_finish_read(struct file *file, unsigned long *count, int host_err)
{
- mm_segment_t oldfs;
- __be32 err;
- int host_err;
-
- err = nfserr_perm;
-
- if (file->f_op->splice_read && rqstp->rq_splice_ok) {
- struct splice_desc sd = {
- .len = 0,
- .total_len = *count,
- .pos = offset,
- .u.data = rqstp,
- };
-
- rqstp->rq_next_page = rqstp->rq_respages + 1;
- host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
- } else {
- oldfs = get_fs();
- set_fs(KERNEL_DS);
- host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
- set_fs(oldfs);
- }
-
if (host_err >= 0) {
nfsdstats.io_read += host_err;
*count = host_err;
- err = 0;
fsnotify_access(file);
+ return 0;
} else
- err = nfserrno(host_err);
- return err;
+ return nfserrno(host_err);
+}
+
+int nfsd_splice_read(struct svc_rqst *rqstp,
+ struct file *file, loff_t offset, unsigned long *count)
+{
+ struct splice_desc sd = {
+ .len = 0,
+ .total_len = *count,
+ .pos = offset,
+ .u.data = rqstp,
+ };
+ int host_err;
+
+ rqstp->rq_next_page = rqstp->rq_respages + 1;
+ host_err = splice_direct_to_actor(file, &sd, nfsd_direct_splice_actor);
+ return nfsd_finish_read(file, count, host_err);
+}
+
+int nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
+ unsigned long *count)
+{
+ mm_segment_t oldfs;
+ int host_err;
+
+ oldfs = get_fs();
+ set_fs(KERNEL_DS);
+ host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
+ set_fs(oldfs);
+ return nfsd_finish_read(file, count, host_err);
+}
+
+static __be32
+nfsd_vfs_read(struct svc_rqst *rqstp, struct file *file,
+ loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+{
+ if (file->f_op->splice_read && rqstp->rq_splice_ok)
+ return nfsd_splice_read(rqstp, file, offset, count);
+ else
+ return nfsd_readv(file, offset, vec, vlen, count);
}

/*
@@ -944,33 +957,28 @@ out_nfserr:
return err;
}

-/*
- * Read data from a file. count must contain the requested read count
- * on entry. On return, *count contains the number of bytes actually read.
- * N.B. After this call fhp needs an fh_put
- */
-__be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
- loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+__be32 nfsd_get_tmp_read_open(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ struct file **file, struct raparms **ra)
{
- struct file *file;
struct inode *inode;
- struct raparms *ra;
__be32 err;

- err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, &file);
+ err = nfsd_open(rqstp, fhp, S_IFREG, NFSD_MAY_READ, file);
if (err)
return err;

- inode = file_inode(file);
+ inode = file_inode(*file);

/* Get readahead parameters */
- ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino);
-
- if (ra && ra->p_set)
- file->f_ra = ra->p_ra;
+ *ra = nfsd_get_raparms(inode->i_sb->s_dev, inode->i_ino);

- err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
+ if (*ra && (*ra)->p_set)
+ (*file)->f_ra = (*ra)->p_ra;
+ return nfs_ok;
+}

+void nfsd_put_tmp_read_open(struct file *file, struct raparms *ra)
+{
/* Write back readahead params */
if (ra) {
struct raparm_hbucket *rab = &raparm_hash[ra->p_hindex];
@@ -980,8 +988,29 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
ra->p_count--;
spin_unlock(&rab->pb_lock);
}
-
nfsd_close(file);
+}
+
+/*
+ * Read data from a file. count must contain the requested read count
+ * on entry. On return, *count contains the number of bytes actually read.
+ * N.B. After this call fhp needs an fh_put
+ */
+__be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
+{
+ struct file *file;
+ struct raparms *ra;
+ __be32 err;
+
+ err = nfsd_get_tmp_read_open(rqstp, fhp, &file, &ra);
+ if (err)
+ return err;
+
+ err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
+
+ nfsd_put_tmp_read_open(file, ra);
+
return err;
}

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index fbe90bd..7441e96 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -70,6 +70,14 @@ __be32 nfsd_commit(struct svc_rqst *, struct svc_fh *,
__be32 nfsd_open(struct svc_rqst *, struct svc_fh *, umode_t,
int, struct file **);
void nfsd_close(struct file *);
+struct raparms;
+__be32 nfsd_get_tmp_read_open(struct svc_rqst *, struct svc_fh *,
+ struct file **, struct raparms **);
+void nfsd_put_tmp_read_open(struct file *, struct raparms *);
+int nfsd_splice_read(struct svc_rqst *,
+ struct file *, loff_t, unsigned long *);
+int nfsd_readv(struct file *, loff_t, struct kvec *, int,
+ unsigned long *);
__be32 nfsd_read(struct svc_rqst *, struct svc_fh *,
loff_t, struct kvec *, int, unsigned long *);
__be32 nfsd_read_file(struct svc_rqst *, struct svc_fh *, struct file *,
--
1.9.0


2014-05-22 19:32:44

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 36/52] nfsd4: allow large readdirs

From: "J. Bruce Fields" <[email protected]>

Currently we limit readdir results to a single page. This can result in
a performance regression compared to NFSv3 when reading large
directories.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 9 ++--
fs/nfsd/nfs4xdr.c | 137 +++++++++++++++++++++++++++++------------------------
fs/nfsd/xdr4.h | 5 +-
3 files changed, 82 insertions(+), 69 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index e1196ed..f56f5e3 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1502,13 +1502,14 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)

static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
+ u32 maxcount = svc_max_payload(rqstp);
u32 rlen = op->u.readdir.rd_maxcount;

- if (rlen > PAGE_SIZE)
- rlen = PAGE_SIZE;
+ if (rlen > maxcount)
+ rlen = maxcount;

- return (op_encode_hdr_size + op_encode_verifier_maxsz)
- * sizeof(__be32) + rlen;
+ return (op_encode_hdr_size + op_encode_verifier_maxsz +
+ XDR_QUADLEN(rlen)) * sizeof(__be32);
}

static inline u32 nfsd4_remove_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 311e281..a2524b3 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2575,8 +2575,8 @@ static inline int attributes_need_mount(u32 *bmval)
}

static __be32
-nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd,
- const char *name, int namlen, __be32 **p, int buflen)
+nfsd4_encode_dirent_fattr(struct xdr_stream *xdr, struct nfsd4_readdir *cd,
+ const char *name, int namlen)
{
struct svc_export *exp = cd->rd_fhp->fh_export;
struct dentry *dentry;
@@ -2628,8 +2628,7 @@ nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd,

}
out_encode:
- nfserr = nfsd4_encode_fattr_to_buf(p, buflen, NULL, exp, dentry,
- cd->rd_bmval,
+ nfserr = nfsd4_encode_fattr(xdr, NULL, exp, dentry, cd->rd_bmval,
cd->rd_rqstp, ignore_crossmnt);
out_put:
dput(dentry);
@@ -2638,9 +2637,12 @@ out_put:
}

static __be32 *
-nfsd4_encode_rdattr_error(__be32 *p, int buflen, __be32 nfserr)
+nfsd4_encode_rdattr_error(struct xdr_stream *xdr, __be32 nfserr)
{
- if (buflen < 6)
+ __be32 *p;
+
+ p = xdr_reserve_space(xdr, 6);
+ if (!p)
return NULL;
*p++ = htonl(2);
*p++ = htonl(FATTR4_WORD0_RDATTR_ERROR); /* bmval0 */
@@ -2657,10 +2659,13 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
{
struct readdir_cd *ccd = ccdv;
struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
- int buflen;
- __be32 *p = cd->buffer;
- __be32 *cookiep;
+ struct xdr_stream *xdr = cd->xdr;
+ int start_offset = xdr->buf->len;
+ int cookie_offset;
+ int entry_bytes;
__be32 nfserr = nfserr_toosmall;
+ __be64 wire_offset;
+ __be32 *p;

/* In nfsv4, "." and ".." never make it onto the wire.. */
if (name && isdotent(name, namlen)) {
@@ -2668,19 +2673,24 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
return 0;
}

- if (cd->offset)
- xdr_encode_hyper(cd->offset, (u64) offset);
+ if (cd->cookie_offset) {
+ wire_offset = cpu_to_be64(offset);
+ write_bytes_to_xdr_buf(xdr->buf, cd->cookie_offset,
+ &wire_offset, 8);
+ }

- buflen = cd->buflen - 4 - XDR_QUADLEN(namlen);
- if (buflen < 0)
+ p = xdr_reserve_space(xdr, 4);
+ if (!p)
goto fail;
-
*p++ = xdr_one; /* mark entry present */
- cookiep = p;
+ cookie_offset = xdr->buf->len;
+ p = xdr_reserve_space(xdr, 3*4 + namlen);
+ if (!p)
+ goto fail;
p = xdr_encode_hyper(p, NFS_OFFSET_MAX); /* offset of next entry */
p = xdr_encode_array(p, name, namlen); /* name length & name */

- nfserr = nfsd4_encode_dirent_fattr(cd, name, namlen, &p, buflen);
+ nfserr = nfsd4_encode_dirent_fattr(xdr, cd, name, namlen);
switch (nfserr) {
case nfs_ok:
break;
@@ -2699,19 +2709,23 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
*/
if (!(cd->rd_bmval[0] & FATTR4_WORD0_RDATTR_ERROR))
goto fail;
- p = nfsd4_encode_rdattr_error(p, buflen, nfserr);
+ p = nfsd4_encode_rdattr_error(xdr, nfserr);
if (p == NULL) {
nfserr = nfserr_toosmall;
goto fail;
}
}
- cd->buflen -= (p - cd->buffer);
- cd->buffer = p;
- cd->offset = cookiep;
+ nfserr = nfserr_toosmall;
+ entry_bytes = xdr->buf->len - start_offset;
+ if (entry_bytes > cd->rd_maxcount)
+ goto fail;
+ cd->rd_maxcount -= entry_bytes;
+ cd->cookie_offset = cookie_offset;
skip_entry:
cd->common.err = nfs_ok;
return 0;
fail:
+ xdr_truncate_encode(xdr, start_offset);
cd->common.err = nfserr;
return -EINVAL;
}
@@ -3206,10 +3220,11 @@ static __be32
nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readdir *readdir)
{
int maxcount;
+ int bytes_left;
loff_t offset;
+ __be64 wire_offset;
struct xdr_stream *xdr = &resp->xdr;
int starting_len = xdr->buf->len;
- __be32 *page, *tailbase;
__be32 *p;

if (nfserr)
@@ -3219,38 +3234,38 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
if (!p)
return nfserr_resource;

- if (resp->xdr.buf->page_len)
- return nfserr_resource;
- if (!*resp->rqstp->rq_next_page)
- return nfserr_resource;
-
/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
WRITE32(0);
resp->xdr.buf->head[0].iov_len = ((char *)resp->xdr.p)
- (char *)resp->xdr.buf->head[0].iov_base;
- tailbase = p;
-
- maxcount = PAGE_SIZE;
- if (maxcount > readdir->rd_maxcount)
- maxcount = readdir->rd_maxcount;

/*
- * Convert from bytes to words, account for the two words already
- * written, make sure to leave two words at the end for the next
- * pointer and eof field.
+ * Number of bytes left for directory entries allowing for the
+ * final 8 bytes of the readdir and a following failed op:
+ */
+ bytes_left = xdr->buf->buflen - xdr->buf->len
+ - COMPOUND_ERR_SLACK_SPACE - 8;
+ if (bytes_left < 0) {
+ nfserr = nfserr_resource;
+ goto err_no_verf;
+ }
+ maxcount = min_t(u32, readdir->rd_maxcount, INT_MAX);
+ /*
+ * Note the rfc defines rd_maxcount as the size of the
+ * READDIR4resok structure, which includes the verifier above
+ * and the 8 bytes encoded at the end of this function:
*/
- maxcount = (maxcount >> 2) - 4;
- if (maxcount < 0) {
- nfserr = nfserr_toosmall;
+ if (maxcount < 16) {
+ nfserr = nfserr_toosmall;
goto err_no_verf;
}
+ maxcount = min_t(int, maxcount-16, bytes_left);

- page = page_address(*(resp->rqstp->rq_next_page++));
+ readdir->xdr = xdr;
+ readdir->rd_maxcount = maxcount;
readdir->common.err = 0;
- readdir->buflen = maxcount;
- readdir->buffer = page;
- readdir->offset = NULL;
+ readdir->cookie_offset = 0;

offset = readdir->rd_cookie;
nfserr = nfsd_readdir(readdir->rd_rqstp, readdir->rd_fhp,
@@ -3258,33 +3273,31 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
&readdir->common, nfsd4_encode_dirent);
if (nfserr == nfs_ok &&
readdir->common.err == nfserr_toosmall &&
- readdir->buffer == page)
- nfserr = nfserr_toosmall;
+ xdr->buf->len == starting_len + 8) {
+ /* nothing encoded; which limit did we hit?: */
+ if (maxcount - 16 < bytes_left)
+ /* It was the fault of rd_maxcount: */
+ nfserr = nfserr_toosmall;
+ else
+ /* We ran out of buffer space: */
+ nfserr = nfserr_resource;
+ }
if (nfserr)
goto err_no_verf;

- if (readdir->offset)
- xdr_encode_hyper(readdir->offset, offset);
+ if (readdir->cookie_offset) {
+ wire_offset = cpu_to_be64(offset);
+ write_bytes_to_xdr_buf(xdr->buf, readdir->cookie_offset,
+ &wire_offset, 8);
+ }

- p = readdir->buffer;
+ p = xdr_reserve_space(xdr, 8);
+ if (!p) {
+ WARN_ON_ONCE(1);
+ goto err_no_verf;
+ }
*p++ = 0; /* no more entries */
*p++ = htonl(readdir->common.err == nfserr_eof);
- resp->xdr.buf->page_len = ((char *)p) -
- (char*)page_address(*(resp->rqstp->rq_next_page-1));
- xdr->buf->len += xdr->buf->page_len;
-
- xdr->iov = xdr->buf->tail;
-
- xdr->page_ptr++;
- xdr->buf->buflen -= PAGE_SIZE;
- xdr->iov = xdr->buf->tail;
-
- /* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = tailbase;
- resp->xdr.buf->tail[0].iov_len = 0;
- resp->xdr.p = resp->xdr.buf->tail[0].iov_base;
- resp->xdr.end = resp->xdr.p +
- (PAGE_SIZE - resp->xdr.buf->head[0].iov_len)/4;

return 0;
err_no_verf:
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 41e5229..18cbb6d 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -287,9 +287,8 @@ struct nfsd4_readdir {
struct svc_fh * rd_fhp; /* response */

struct readdir_cd common;
- __be32 * buffer;
- int buflen;
- __be32 * offset;
+ struct xdr_stream *xdr;
+ int cookie_offset;
};

struct nfsd4_release_lockowner {
--
1.9.0


2014-05-22 19:32:43

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 35/52] nfsd4: use session limits to release send buffer reservation

From: "J. Bruce Fields" <[email protected]>

Once we know the limits the session places on the size of the rpc, we
can also use that information to release any unnecessary reserved reply
buffer space.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 09be2e0..7a06246 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2290,6 +2290,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
nfserr_rep_too_big;
if (xdr_restrict_buflen(xdr, buflen - 2 * RPC_MAX_AUTH_SIZE))
goto out_put_session;
+ svc_reserve(rqstp, buflen);

status = nfs_ok;
/* Success! bump slot seqid */
--
1.9.0


2014-05-22 19:32:32

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 07/52] nfsd4: embed xdr_stream in nfsd4_compoundres

From: "J. Bruce Fields" <[email protected]>

This is a mechanical transformation with no change in behavior.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 12 +++++-----
fs/nfsd/nfs4state.c | 8 +++----
fs/nfsd/nfs4xdr.c | 67 ++++++++++++++++++++++++++++-------------------------
fs/nfsd/xdr4.h | 4 +---
4 files changed, 46 insertions(+), 45 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index b0e075c..6d94adf 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1270,13 +1270,13 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
u32 plen = 0;
__be32 status;

- resp->xbuf = &rqstp->rq_res;
- resp->p = rqstp->rq_res.head[0].iov_base +
+ resp->xdr.buf = &rqstp->rq_res;
+ resp->xdr.p = rqstp->rq_res.head[0].iov_base +
rqstp->rq_res.head[0].iov_len;
- resp->tagp = resp->p;
+ resp->tagp = resp->xdr.p;
/* reserve space for: taglen, tag, and opcnt */
- resp->p += 2 + XDR_QUADLEN(args->taglen);
- resp->end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
+ resp->xdr.p += 2 + XDR_QUADLEN(args->taglen);
+ resp->xdr.end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
resp->taglen = args->taglen;
resp->tag = args->tag;
resp->opcnt = 0;
@@ -1328,7 +1328,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
* failed response to the next operation. If we don't
* have enough room, fail with ERR_RESOURCE.
*/
- slack_bytes = (char *)resp->end - (char *)resp->p;
+ slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;
if (slack_bytes < COMPOUND_SLACK_SPACE
+ COMPOUND_ERR_SLACK_SPACE) {
BUG_ON(slack_bytes < COMPOUND_ERR_SLACK_SPACE);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a037627..56e8992 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1573,10 +1573,10 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
slot->sl_datalen = 0;
return;
}
- slot->sl_datalen = (char *)resp->p - (char *)resp->cstate.datap;
+ slot->sl_datalen = (char *)resp->xdr.p - (char *)resp->cstate.datap;
base = (char *)resp->cstate.datap -
- (char *)resp->xbuf->head[0].iov_base;
- if (read_bytes_from_xdr_buf(resp->xbuf, base, slot->sl_data,
+ (char *)resp->xdr.buf->head[0].iov_base;
+ if (read_bytes_from_xdr_buf(resp->xdr.buf, base, slot->sl_data,
slot->sl_datalen))
WARN("%s: sessions DRC could not cache compound\n", __func__);
return;
@@ -1630,7 +1630,7 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
memcpy(resp->cstate.datap, slot->sl_data, slot->sl_datalen);

resp->opcnt = slot->sl_opcnt;
- resp->p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
+ resp->xdr.p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
status = slot->sl_status;

return status;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index e866a06..9113725 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1747,10 +1747,10 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
}

#define RESERVE_SPACE(nbytes) do { \
- p = resp->p; \
- BUG_ON(p + XDR_QUADLEN(nbytes) > resp->end); \
+ p = resp->xdr.p; \
+ BUG_ON(p + XDR_QUADLEN(nbytes) > resp->xdr.end); \
} while (0)
-#define ADJUST_ARGS() resp->p = p
+#define ADJUST_ARGS() resp->xdr.p = p

/* Encode as an array of strings the string given with components
* separated @sep, escaped with esc_enter and esc_exit.
@@ -2751,9 +2751,9 @@ nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
if (nfserr)
return nfserr;

- buflen = resp->end - resp->p - (COMPOUND_ERR_SLACK_SPACE >> 2);
+ buflen = resp->xdr.end - resp->xdr.p - (COMPOUND_ERR_SLACK_SPACE >> 2);
nfserr = nfsd4_encode_fattr(fhp, fhp->fh_export, fhp->fh_dentry,
- &resp->p, buflen, getattr->ga_bmval,
+ &resp->xdr.p, buflen, getattr->ga_bmval,
resp->rqstp, 0);
return nfserr;
}
@@ -2953,7 +2953,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

if (nfserr)
return nfserr;
- if (resp->xbuf->page_len)
+ if (resp->xdr.buf->page_len)
return nfserr_resource;

RESERVE_SPACE(8); /* eof flag and byte count */
@@ -2991,18 +2991,18 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(eof);
WRITE32(maxcount);
ADJUST_ARGS();
- resp->xbuf->head[0].iov_len = (char*)p
- - (char*)resp->xbuf->head[0].iov_base;
- resp->xbuf->page_len = maxcount;
+ resp->xdr.buf->head[0].iov_len = (char *)p
+ - (char *)resp->xdr.buf->head[0].iov_base;
+ resp->xdr.buf->page_len = maxcount;

/* Use rest of head for padding and remaining ops: */
- resp->xbuf->tail[0].iov_base = p;
- resp->xbuf->tail[0].iov_len = 0;
+ resp->xdr.buf->tail[0].iov_base = p;
+ resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
RESERVE_SPACE(4);
WRITE32(0);
- resp->xbuf->tail[0].iov_base += maxcount&3;
- resp->xbuf->tail[0].iov_len = 4 - (maxcount&3);
+ resp->xdr.buf->tail[0].iov_base += maxcount&3;
+ resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
ADJUST_ARGS();
}
return 0;
@@ -3017,7 +3017,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd

if (nfserr)
return nfserr;
- if (resp->xbuf->page_len)
+ if (resp->xdr.buf->page_len)
return nfserr_resource;
if (!*resp->rqstp->rq_next_page)
return nfserr_resource;
@@ -3041,18 +3041,18 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd

WRITE32(maxcount);
ADJUST_ARGS();
- resp->xbuf->head[0].iov_len = (char*)p
- - (char*)resp->xbuf->head[0].iov_base;
- resp->xbuf->page_len = maxcount;
+ resp->xdr.buf->head[0].iov_len = (char *)p
+ - (char *)resp->xdr.buf->head[0].iov_base;
+ resp->xdr.buf->page_len = maxcount;

/* Use rest of head for padding and remaining ops: */
- resp->xbuf->tail[0].iov_base = p;
- resp->xbuf->tail[0].iov_len = 0;
+ resp->xdr.buf->tail[0].iov_base = p;
+ resp->xdr.buf->tail[0].iov_len = 0;
if (maxcount&3) {
RESERVE_SPACE(4);
WRITE32(0);
- resp->xbuf->tail[0].iov_base += maxcount&3;
- resp->xbuf->tail[0].iov_len = 4 - (maxcount&3);
+ resp->xdr.buf->tail[0].iov_base += maxcount&3;
+ resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
ADJUST_ARGS();
}
return 0;
@@ -3068,7 +3068,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

if (nfserr)
return nfserr;
- if (resp->xbuf->page_len)
+ if (resp->xdr.buf->page_len)
return nfserr_resource;
if (!*resp->rqstp->rq_next_page)
return nfserr_resource;
@@ -3080,7 +3080,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
WRITE32(0);
WRITE32(0);
ADJUST_ARGS();
- resp->xbuf->head[0].iov_len = ((char*)resp->p) - (char*)resp->xbuf->head[0].iov_base;
+ resp->xdr.buf->head[0].iov_len = ((char *)resp->xdr.p)
+ - (char *)resp->xdr.buf->head[0].iov_base;
tailbase = p;

maxcount = PAGE_SIZE;
@@ -3121,14 +3122,15 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
p = readdir->buffer;
*p++ = 0; /* no more entries */
*p++ = htonl(readdir->common.err == nfserr_eof);
- resp->xbuf->page_len = ((char*)p) -
+ resp->xdr.buf->page_len = ((char *)p) -
(char*)page_address(*(resp->rqstp->rq_next_page-1));

/* Use rest of head for padding and remaining ops: */
- resp->xbuf->tail[0].iov_base = tailbase;
- resp->xbuf->tail[0].iov_len = 0;
- resp->p = resp->xbuf->tail[0].iov_base;
- resp->end = resp->p + (PAGE_SIZE - resp->xbuf->head[0].iov_len)/4;
+ resp->xdr.buf->tail[0].iov_base = tailbase;
+ resp->xdr.buf->tail[0].iov_len = 0;
+ resp->xdr.p = resp->xdr.buf->tail[0].iov_base;
+ resp->xdr.end = resp->xdr.p +
+ (PAGE_SIZE - resp->xdr.buf->head[0].iov_len)/4;

return 0;
err_no_verf:
@@ -3587,10 +3589,10 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
session = resp->cstate.session;

if (xb->page_len == 0) {
- length = (char *)resp->p - (char *)xb->head[0].iov_base + pad;
+ length = (char *)resp->xdr.p - (char *)xb->head[0].iov_base + pad;
} else {
if (xb->tail[0].iov_base && xb->tail[0].iov_len > 0)
- tlen = (char *)resp->p - (char *)xb->tail[0].iov_base;
+ tlen = (char *)resp->xdr.p - (char *)xb->tail[0].iov_base;

length = xb->head[0].iov_len + xb->page_len + tlen + pad;
}
@@ -3629,7 +3631,8 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
op->status = nfsd4_check_resp_size(resp, 0);
if (so) {
so->so_replay.rp_status = op->status;
- so->so_replay.rp_buflen = (char *)resp->p - (char *)(statp+1);
+ so->so_replay.rp_buflen = (char *)resp->xdr.p
+ - (char *)(statp+1);
memcpy(so->so_replay.rp_buf, statp+1, so->so_replay.rp_buflen);
}
status:
@@ -3731,7 +3734,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
iov = &rqstp->rq_res.tail[0];
else
iov = &rqstp->rq_res.head[0];
- iov->iov_len = ((char*)resp->p) - (char*)iov->iov_base;
+ iov->iov_len = ((char *)resp->xdr.p) - (char *)iov->iov_base;
BUG_ON(iov->iov_len > PAGE_SIZE);
if (nfsd4_has_session(cs)) {
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 5ea7df3..6884d70 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -506,9 +506,7 @@ struct nfsd4_compoundargs {

struct nfsd4_compoundres {
/* scratch variables for XDR encode */
- __be32 * p;
- __be32 * end;
- struct xdr_buf * xbuf;
+ struct xdr_stream xdr;
struct svc_rqst * rqstp;

u32 taglen;
--
1.9.0


2014-05-22 19:32:45

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 40/52] nfsd4: estimate sequence response size

From: "J. Bruce Fields" <[email protected]>

Otherwise a following patch would turn off all 4.1 zero-copy reads.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 7c3df68..f5f608d 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1567,6 +1567,12 @@ static inline u32 nfsd4_rename_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op
+ op_encode_change_info_maxsz) * sizeof(__be32);
}

+static inline u32 nfsd4_sequence_rsize(struct svc_rqst *rqstp,
+ struct nfsd4_op *op)
+{
+ return NFS4_MAX_SESSIONID_LEN + 20;
+}
+
static inline u32 nfsd4_setattr_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
return (op_encode_hdr_size + nfs4_fattr_bitmap_maxsz) * sizeof(__be32);
--
1.9.0


2014-05-22 19:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 19/52] nfsd4: use xdr_truncate_encode

From: "J. Bruce Fields" <[email protected]>

Now that lengths are reliable, we can use xdr_truncate instead of
open-coding it everywhere.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index f9d26d8..c8435ba 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2053,7 +2053,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
struct svc_fh *tempfh = NULL;
struct kstatfs statfs;
__be32 *p;
- __be32 *start = xdr->p;
+ int starting_len = xdr->buf->len;
__be32 *attrlenp;
u32 dummy;
u64 dummy64;
@@ -2547,13 +2547,8 @@ out:
fh_put(tempfh);
kfree(tempfh);
}
- if (status) {
- int nbytes = (char *)xdr->p - (char *)start;
- /* open code what *should* be xdr_truncate(xdr, len); */
- xdr->iov->iov_len -= nbytes;
- xdr->buf->len -= nbytes;
- xdr->p = start;
- }
+ if (status)
+ xdr_truncate_encode(xdr, starting_len);
return status;
out_nfserr:
status = nfserrno(err);
@@ -3008,6 +3003,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
struct page *page;
unsigned long maxcount;
struct xdr_stream *xdr = &resp->xdr;
+ int starting_len = xdr->buf->len;
long len;
__be32 *p;

@@ -3044,8 +3040,13 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
&maxcount);

if (nfserr) {
- xdr->p -= 2;
- xdr->iov->iov_len -= 8;
+ /*
+ * nfsd_splice_actor may have already messed with the
+ * page length; reset it so as not to confuse
+ * xdr_truncate_encode:
+ */
+ xdr->buf->page_len = 0;
+ xdr_truncate_encode(xdr, starting_len);
return nfserr;
}
eof = (read->rd_offset + maxcount >=
@@ -3078,6 +3079,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
int maxcount;
struct xdr_stream *xdr = &resp->xdr;
char *page;
+ int length_offset = xdr->buf->len;
__be32 *p;

if (nfserr)
@@ -3102,8 +3104,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
if (nfserr == nfserr_isdir)
nfserr = nfserr_inval;
if (nfserr) {
- xdr->p--;
- xdr->iov->iov_len -= 4;
+ xdr_truncate_encode(xdr, length_offset);
return nfserr;
}

@@ -3132,7 +3133,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
int maxcount;
loff_t offset;
struct xdr_stream *xdr = &resp->xdr;
- __be32 *page, *savep, *tailbase;
+ int starting_len = xdr->buf->len;
+ __be32 *page, *tailbase;
__be32 *p;

if (nfserr)
@@ -3143,7 +3145,6 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
return nfserr_resource;

RESERVE_SPACE(NFS4_VERIFIER_SIZE);
- savep = p;

/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
WRITE32(0);
@@ -3205,9 +3206,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4

return 0;
err_no_verf:
- xdr->p = savep;
- xdr->iov->iov_len = ((char *)resp->xdr.p)
- - (char *)resp->xdr.buf->head[0].iov_base;
+ xdr_truncate_encode(xdr, starting_len);
return nfserr;
}

--
1.9.0


2014-05-22 19:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 16/52] nfsd4: no need for encode_compoundres to adjust lengths

From: "J. Bruce Fields" <[email protected]>

xdr_reserve_space should now be calculating the length correctly as we
go, so there's no longer any need to fix it up here.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 3 +++
fs/nfsd/nfs4xdr.c | 8 +-------
2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 56e8992..63bb9eb 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1618,6 +1618,7 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
struct nfsd4_sequence *seq)
{
struct nfsd4_slot *slot = resp->cstate.slot;
+ struct kvec *head = resp->xdr.iov;
__be32 status;

dprintk("--> %s slot %p\n", __func__, slot);
@@ -1631,6 +1632,8 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,

resp->opcnt = slot->sl_opcnt;
resp->xdr.p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
+ head->iov_len = (void *)resp->xdr.p - head->iov_base;
+ resp->xdr.buf->len = head->iov_len;
status = slot->sl_status;

return status;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 865a2c2..2fc4f9c 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3789,19 +3789,13 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
* All that remains is to write the tag and operation count...
*/
struct nfsd4_compound_state *cs = &resp->cstate;
- struct kvec *iov;
+
p = resp->tagp;
*p++ = htonl(resp->taglen);
memcpy(p, resp->tag, resp->taglen);
p += XDR_QUADLEN(resp->taglen);
*p++ = htonl(resp->opcnt);

- if (rqstp->rq_res.page_len)
- iov = &rqstp->rq_res.tail[0];
- else
- iov = &rqstp->rq_res.head[0];
- iov->iov_len = ((char *)resp->xdr.p) - (char *)iov->iov_base;
- BUG_ON(iov->iov_len > PAGE_SIZE);
if (nfsd4_has_session(cs)) {
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
struct nfs4_client *clp = cs->session->se_client;
--
1.9.0


2014-05-22 19:32:38

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 23/52] nfsd4: nfsd4_check_resp_size needn't recalculate length

From: "J. Bruce Fields" <[email protected]>

We're keeping the length updated as we go now, so there's no need for
the extra calculation here.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 18 +++---------------
1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 9fb1b1a..be5793f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3752,32 +3752,20 @@ static nfsd4_enc nfsd4_enc_ops[] = {
*/
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
{
- struct xdr_buf *xb = &resp->rqstp->rq_res;
+ struct xdr_buf *buf = &resp->rqstp->rq_res;
struct nfsd4_session *session = NULL;
struct nfsd4_slot *slot = resp->cstate.slot;
- u32 length, tlen = 0;

if (!nfsd4_has_session(&resp->cstate))
return 0;

session = resp->cstate.session;

- if (xb->page_len == 0) {
- length = (char *)resp->xdr.p - (char *)xb->head[0].iov_base + pad;
- } else {
- if (xb->tail[0].iov_base && xb->tail[0].iov_len > 0)
- tlen = (char *)resp->xdr.p - (char *)xb->tail[0].iov_base;
-
- length = xb->head[0].iov_len + xb->page_len + tlen + pad;
- }
- dprintk("%s length %u, xb->page_len %u tlen %u pad %u\n", __func__,
- length, xb->page_len, tlen, pad);
-
- if (length > session->se_fchannel.maxresp_sz)
+ if (buf->len + pad > session->se_fchannel.maxresp_sz)
return nfserr_rep_too_big;

if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
- length > session->se_fchannel.maxresp_cached)
+ buf->len + pad > session->se_fchannel.maxresp_cached)
return nfserr_rep_too_big_to_cache;

return 0;
--
1.9.0


2014-05-22 19:32:49

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 50/52] nfsd4: kill WRITEMEM

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 62 +++++++++++++++++++++++++------------------------------
1 file changed, 28 insertions(+), 34 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 46fc861..2a5bd29 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1683,12 +1683,6 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_TAIL;
}

-#define WRITEMEM(ptr,nbytes) do { if (nbytes > 0) { \
- *(p + XDR_QUADLEN(nbytes) -1) = 0; \
- memcpy(p, ptr, nbytes); \
- p += XDR_QUADLEN(nbytes); \
-}} while (0)
-
static void write32(__be32 **p, u32 n)
{
*(*p)++ = htonl(n);
@@ -1769,8 +1763,7 @@ static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep,
p = xdr_reserve_space(xdr, strlen + 4);
if (!p)
return nfserr_resource;
- *p++ = cpu_to_be32(strlen);
- WRITEMEM(str, strlen);
+ p = xdr_encode_opaque(p, str, strlen);
count++;
}
else
@@ -1865,8 +1858,7 @@ static __be32 nfsd4_encode_path(struct xdr_stream *xdr,
spin_unlock(&dentry->d_lock);
goto out_free;
}
- *p++ = cpu_to_be32(len);
- WRITEMEM(dentry->d_name.name, len);
+ p = xdr_encode_opaque(p, dentry->d_name.name, len);
dprintk("/%s", dentry->d_name.name);
spin_unlock(&dentry->d_lock);
dput(dentry);
@@ -2239,7 +2231,7 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
*p++ = cpu_to_be32(MINOR(stat.dev));
break;
case FSIDSOURCE_UUID:
- WRITEMEM(exp->ex_uuid, 16);
+ p = xdr_encode_opaque_fixed(p, exp->ex_uuid, 16);
break;
}
}
@@ -2326,8 +2318,8 @@ out_acl:
p = xdr_reserve_space(xdr, fhp->fh_handle.fh_size + 4);
if (!p)
goto out_resource;
- *p++ = cpu_to_be32(fhp->fh_handle.fh_size);
- WRITEMEM(&fhp->fh_handle.fh_base, fhp->fh_handle.fh_size);
+ p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base,
+ fhp->fh_handle.fh_size);
}
if (bmval0 & FATTR4_WORD0_FILEID) {
p = xdr_reserve_space(xdr, 8);
@@ -2748,7 +2740,8 @@ nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid)
if (!p)
return nfserr_resource;
*p++ = cpu_to_be32(sid->si_generation);
- WRITEMEM(&sid->si_opaque, sizeof(stateid_opaque_t));
+ p = xdr_encode_opaque_fixed(p, &sid->si_opaque,
+ sizeof(stateid_opaque_t));
return 0;
}

@@ -2777,7 +2770,8 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 8);
if (!p)
return nfserr_resource;
- WRITEMEM(bcts->sessionid.data, NFS4_MAX_SESSIONID_LEN);
+ p = xdr_encode_opaque_fixed(p, bcts->sessionid.data,
+ NFS4_MAX_SESSIONID_LEN);
*p++ = cpu_to_be32(bcts->dir);
/* Sorry, we do not yet support RDMA over 4.1: */
*p++ = cpu_to_be32(0);
@@ -2807,7 +2801,8 @@ nfsd4_encode_commit(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, NFS4_VERIFIER_SIZE);
if (!p)
return nfserr_resource;
- WRITEMEM(commit->co_verf.data, NFS4_VERIFIER_SIZE);
+ p = xdr_encode_opaque_fixed(p, commit->co_verf.data,
+ NFS4_VERIFIER_SIZE);
}
return nfserr;
}
@@ -2858,8 +2853,7 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh
p = xdr_reserve_space(xdr, len + 4);
if (!p)
return nfserr_resource;
- *p++ = cpu_to_be32(len);
- WRITEMEM(&fhp->fh_handle.fh_base, len);
+ p = xdr_encode_opaque(p, &fhp->fh_handle.fh_base, len);
}
return nfserr;
}
@@ -2892,9 +2886,8 @@ again:
p = xdr_encode_hyper(p, ld->ld_length);
*p++ = cpu_to_be32(ld->ld_type);
if (conf->len) {
- WRITEMEM(&ld->ld_clientid, 8);
- *p++ = cpu_to_be32(conf->len);
- WRITEMEM(conf->data, conf->len);
+ p = xdr_encode_opaque_fixed(p, &ld->ld_clientid, 8);
+ p = xdr_encode_opaque(p, conf->data, conf->len);
} else { /* non - nfsv4 lock in conflict, no clientid nor owner */
p = xdr_encode_hyper(p, (u64)0); /* clientid */
*p++ = cpu_to_be32(0); /* length of owner name */
@@ -3461,8 +3454,7 @@ nfsd4_do_encode_secinfo(struct xdr_stream *xdr,
if (!p)
goto out;
*p++ = cpu_to_be32(RPC_AUTH_GSS);
- *p++ = cpu_to_be32(info.oid.len);
- WRITEMEM(info.oid.data, info.oid.len);
+ p = xdr_encode_opaque(p, info.oid.data, info.oid.len);
*p++ = cpu_to_be32(info.qop);
*p++ = cpu_to_be32(info.service);
} else if (pf < RPC_AUTH_MAXFLAVOR) {
@@ -3544,8 +3536,9 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n
p = xdr_reserve_space(xdr, 8 + NFS4_VERIFIER_SIZE);
if (!p)
return nfserr_resource;
- WRITEMEM(&scd->se_clientid, 8);
- WRITEMEM(&scd->se_confirm, NFS4_VERIFIER_SIZE);
+ p = xdr_encode_opaque_fixed(p, &scd->se_clientid, 8);
+ p = xdr_encode_opaque_fixed(p, &scd->se_confirm,
+ NFS4_VERIFIER_SIZE);
}
else if (nfserr == nfserr_clid_inuse) {
p = xdr_reserve_space(xdr, 8);
@@ -3569,7 +3562,8 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_w
return nfserr_resource;
*p++ = cpu_to_be32(write->wr_bytes_written);
*p++ = cpu_to_be32(write->wr_how_written);
- WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
+ p = xdr_encode_opaque_fixed(p, write->wr_verifier.data,
+ NFS4_VERIFIER_SIZE);
}
return nfserr;
}
@@ -3610,7 +3604,7 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
if (!p)
return nfserr_resource;

- WRITEMEM(&exid->clientid, 8);
+ p = xdr_encode_opaque_fixed(p, &exid->clientid, 8);
*p++ = cpu_to_be32(exid->seqid);
*p++ = cpu_to_be32(exid->flags);

@@ -3650,12 +3644,10 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
/* The server_owner struct */
p = xdr_encode_hyper(p, minor_id); /* Minor id */
/* major id */
- *p++ = cpu_to_be32(major_id_sz);
- WRITEMEM(major_id, major_id_sz);
+ p = xdr_encode_opaque(p, major_id, major_id_sz);

/* Server scope */
- *p++ = cpu_to_be32(server_scope_sz);
- WRITEMEM(server_scope, server_scope_sz);
+ p = xdr_encode_opaque(p, server_scope, server_scope_sz);

/* Implementation id */
*p++ = cpu_to_be32(0); /* zero length nfs_impl_id4 array */
@@ -3675,7 +3667,8 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
p = xdr_reserve_space(xdr, 24);
if (!p)
return nfserr_resource;
- WRITEMEM(sess->sessionid.data, NFS4_MAX_SESSIONID_LEN);
+ p = xdr_encode_opaque_fixed(p, sess->sessionid.data,
+ NFS4_MAX_SESSIONID_LEN);
*p++ = cpu_to_be32(sess->seqid);
*p++ = cpu_to_be32(sess->flags);

@@ -3730,7 +3723,8 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
p = xdr_reserve_space(xdr, NFS4_MAX_SESSIONID_LEN + 20);
if (!p)
return nfserr_resource;
- WRITEMEM(seq->sessionid.data, NFS4_MAX_SESSIONID_LEN);
+ p = xdr_encode_opaque_fixed(p, seq->sessionid.data,
+ NFS4_MAX_SESSIONID_LEN);
*p++ = cpu_to_be32(seq->seqid);
*p++ = cpu_to_be32(seq->slotid);
/* Note slotid's are numbered from zero: */
@@ -3959,7 +3953,7 @@ nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op)
*p++ = cpu_to_be32(op->opnum);
*p++ = rp->rp_status; /* already xdr'ed */

- WRITEMEM(rp->rp_buf, rp->rp_buflen);
+ p = xdr_encode_opaque_fixed(p, rp->rp_buf, rp->rp_buflen);
}

int
--
1.9.0


2014-05-22 19:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 17/52] nfsd4: keep xdr buf length updated

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 ++
fs/nfsd/nfs4xdr.c | 12 ++++++++++--
2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 969a7b3..d2453bf 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1264,6 +1264,8 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
xdr->iov = head;
xdr->p = head->iov_base + head->iov_len;
xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
+ /* Tail and page_len should be zero at this point: */
+ buf->len = buf->head[0].iov_len;
}

/*
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 2fc4f9c..f9d26d8 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3053,9 +3053,10 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

WRITE32(eof);
WRITE32(maxcount);
- resp->xdr.buf->head[0].iov_len = (char *)p
- - (char *)resp->xdr.buf->head[0].iov_base;
+ WARN_ON_ONCE(resp->xdr.buf->head[0].iov_len != (char *)p
+ - (char *)resp->xdr.buf->head[0].iov_base);
resp->xdr.buf->page_len = maxcount;
+ xdr->buf->len += maxcount;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
@@ -3066,6 +3067,7 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
WRITE32(0);
resp->xdr.buf->tail[0].iov_base += maxcount&3;
resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
+ xdr->buf->len -= (maxcount&3);
}
return 0;
}
@@ -3109,6 +3111,7 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
resp->xdr.buf->head[0].iov_len = (char *)p
- (char *)resp->xdr.buf->head[0].iov_base;
resp->xdr.buf->page_len = maxcount;
+ xdr->buf->len += maxcount;
xdr->iov = xdr->buf->tail;

/* Use rest of head for padding and remaining ops: */
@@ -3189,6 +3192,7 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
*p++ = htonl(readdir->common.err == nfserr_eof);
resp->xdr.buf->page_len = ((char *)p) -
(char*)page_address(*(resp->rqstp->rq_next_page-1));
+ xdr->buf->len += xdr->buf->page_len;

xdr->iov = xdr->buf->tail;

@@ -3789,6 +3793,10 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
* All that remains is to write the tag and operation count...
*/
struct nfsd4_compound_state *cs = &resp->cstate;
+ struct xdr_buf *buf = resp->xdr.buf;
+
+ WARN_ON_ONCE(buf->len != buf->head[0].iov_len + buf->page_len +
+ buf->tail[0].iov_len);

p = resp->tagp;
*p++ = htonl(resp->taglen);
--
1.9.0


2014-05-22 19:32:41

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 29/52] nfsd4: more precise nfsd4_max_reply

From: "J. Bruce Fields" <[email protected]>

It will turn out to be useful to have a more accurate estimate of reply
size; so, piggyback on the existing op reply-size estimators.

Also move nfsd4_max_reply to nfs4proc.c to get easier access to struct
nfsd4_operation and friends. (Thanks to Christoph Hellwig for pointing
out that simplification.)

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 12 ++++++++++++
fs/nfsd/nfs4xdr.c | 35 ++++-------------------------------
fs/nfsd/xdr4.h | 1 +
3 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 1326a0b..e1196ed 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1856,6 +1856,18 @@ static struct nfsd4_operation nfsd4_ops[] = {
},
};

+int nfsd4_max_reply(struct svc_rqst *rqstp, struct nfsd4_op *op)
+{
+ struct nfsd4_operation *opdesc;
+ nfsd4op_rsize estimator;
+
+ if (op->opnum == OP_ILLEGAL)
+ return op_encode_hdr_size;
+ opdesc = OPDESC(op);
+ estimator = opdesc->op_rsize_bop;
+ return estimator ? estimator(rqstp, op) : PAGE_SIZE;
+}
+
void warn_on_nonidempotent_op(struct nfsd4_op *op)
{
if (OPDESC(op)->op_flags & OP_MODIFIES_SOMETHING) {
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 4c2f866..4877abb 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1605,40 +1605,13 @@ nfsd4_opnum_in_range(struct nfsd4_compoundargs *argp, struct nfsd4_op *op)
return true;
}

-/*
- * Return a rough estimate of the maximum possible reply size. Note the
- * estimate includes rpc headers so is meant to be passed to
- * svc_reserve, not svc_reserve_auth.
- *
- * Also note the current compound encoding permits only one operation to
- * use pages beyond the first one, so the maximum possible length is the
- * maximum over these values, not the sum.
- */
-static int nfsd4_max_reply(u32 opnum)
-{
- switch (opnum) {
- case OP_READLINK:
- case OP_READDIR:
- /*
- * Both of these ops take a single page for data and put
- * the head and tail in another page:
- */
- return 2 * PAGE_SIZE;
- case OP_GETATTR:
- case OP_READ:
- return INT_MAX;
- default:
- return PAGE_SIZE;
- }
-}
-
static __be32
nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
{
DECODE_HEAD;
struct nfsd4_op *op;
bool cachethis = false;
- int max_reply = PAGE_SIZE;
+ int max_reply = 2 * RPC_MAX_AUTH_SIZE + 8; /* opcnt, status */
int i;

READ_BUF(4);
@@ -1647,6 +1620,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
SAVEMEM(argp->tag, argp->taglen);
READ32(argp->minorversion);
READ32(argp->opcnt);
+ max_reply += 4 + (XDR_QUADLEN(argp->taglen) << 2);

if (argp->taglen > NFSD4_MAX_TAGLEN)
goto xdr_error;
@@ -1684,7 +1658,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
*/
cachethis |= nfsd4_cache_this_op(op);

- max_reply = max(max_reply, nfsd4_max_reply(op->opnum));
+ max_reply += nfsd4_max_reply(argp->rqstp, op);

if (op->status) {
argp->opcnt = i+1;
@@ -1694,8 +1668,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
/* Sessions make the DRC unnecessary: */
if (argp->minorversion)
cachethis = false;
- if (max_reply != INT_MAX)
- svc_reserve(argp->rqstp, max_reply);
+ svc_reserve(argp->rqstp, max_reply);
argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE;

DECODE_TAIL;
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index ee9ffdc..41e5229 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -536,6 +536,7 @@ static inline bool nfsd4_last_compound_op(struct svc_rqst *rqstp)
return argp->opcnt == resp->opcnt;
}

+int nfsd4_max_reply(struct svc_rqst *rqstp, struct nfsd4_op *op);
void warn_on_nonidempotent_op(struct nfsd4_op *op);

#define NFS4_SVC_XDRSIZE sizeof(struct nfsd4_compoundargs)
--
1.9.0


2014-05-22 19:32:34

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 06/52] nfsd4: fix write reply size estimate

From: "J. Bruce Fields" <[email protected]>

The write reply also includes count and stable_how.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 44cd477..b0e075c 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1530,7 +1530,7 @@ static inline u32 nfsd4_setclientid_rsize(struct svc_rqst *rqstp, struct nfsd4_o

static inline u32 nfsd4_write_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
- return (op_encode_hdr_size + op_encode_verifier_maxsz) * sizeof(__be32);
+ return (op_encode_hdr_size + 2 + op_encode_verifier_maxsz) * sizeof(__be32);
}

static inline u32 nfsd4_exchange_id_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
--
1.9.0


2014-05-22 19:32:32

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 09/52] nfsd4: move proc_compound xdr encode init to helper

From: "J. Bruce Fields" <[email protected]>

Mechanical transformation with no change of behavior.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 94ef528..f809aa8 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1253,6 +1253,18 @@ static bool need_wrongsec_check(struct svc_rqst *rqstp)
return !(nextd->op_flags & OP_HANDLES_WRONGSEC);
}

+static void svcxdr_init_encode(struct svc_rqst *rqstp,
+ struct nfsd4_compoundres *resp)
+{
+ struct xdr_stream *xdr = &resp->xdr;
+ struct xdr_buf *buf = &rqstp->rq_res;
+ struct kvec *head = buf->head;
+
+ xdr->buf = buf;
+ xdr->p = head->iov_base + head->iov_len;
+ xdr->end = head->iov_base + PAGE_SIZE;
+}
+
/*
* COMPOUND call.
*/
@@ -1270,13 +1282,10 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
u32 plen = 0;
__be32 status;

- resp->xdr.buf = &rqstp->rq_res;
- resp->xdr.p = rqstp->rq_res.head[0].iov_base +
- rqstp->rq_res.head[0].iov_len;
+ svcxdr_init_encode(rqstp, resp);
resp->tagp = resp->xdr.p;
/* reserve space for: taglen, tag, and opcnt */
resp->xdr.p += 2 + XDR_QUADLEN(args->taglen);
- resp->xdr.end = rqstp->rq_res.head[0].iov_base + PAGE_SIZE;
resp->taglen = args->taglen;
resp->tag = args->tag;
resp->opcnt = 0;
--
1.9.0


2014-05-22 19:32:49

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 48/52] nfsd4: kill WRITE32

From: "J. Bruce Fields" <[email protected]>

These macros just obscure what's going on. Adopt the convention of the
client-side code.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 306 +++++++++++++++++++++++++++---------------------------
1 file changed, 154 insertions(+), 152 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 5fc3ba0..71071ae 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1683,7 +1683,6 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_TAIL;
}

-#define WRITE32(n) *p++ = htonl(n)
#define WRITE64(n) do { \
*p++ = htonl((u32)((n) >> 32)); \
*p++ = htonl((u32)(n)); \
@@ -1774,7 +1773,7 @@ static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep,
p = xdr_reserve_space(xdr, strlen + 4);
if (!p)
return nfserr_resource;
- WRITE32(strlen);
+ *p++ = cpu_to_be32(strlen);
WRITEMEM(str, strlen);
count++;
}
@@ -1857,7 +1856,7 @@ static __be32 nfsd4_encode_path(struct xdr_stream *xdr,
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_free;
- WRITE32(ncomponents);
+ *p++ = cpu_to_be32(ncomponents);

while (ncomponents) {
struct dentry *dentry = components[ncomponents - 1];
@@ -1870,7 +1869,7 @@ static __be32 nfsd4_encode_path(struct xdr_stream *xdr,
spin_unlock(&dentry->d_lock);
goto out_free;
}
- WRITE32(len);
+ *p++ = cpu_to_be32(len);
WRITEMEM(dentry->d_name.name, len);
dprintk("/%s", dentry->d_name.name);
spin_unlock(&dentry->d_lock);
@@ -1919,7 +1918,7 @@ static __be32 nfsd4_encode_fs_locations(struct xdr_stream *xdr,
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- WRITE32(fslocs->locations_count);
+ *p++ = cpu_to_be32(fslocs->locations_count);
for (i=0; i<fslocs->locations_count; i++) {
status = nfsd4_encode_fs_location4(xdr, &fslocs->locations[i]);
if (status)
@@ -1973,8 +1972,8 @@ nfsd4_encode_security_label(struct xdr_stream *xdr, struct svc_rqst *rqstp,
* For now we use a 0 here to indicate the null translation; in
* the future we may place a call to translation code here.
*/
- WRITE32(0); /* lfs */
- WRITE32(0); /* pi */
+ *p++ = cpu_to_be32(0); /* lfs */
+ *p++ = cpu_to_be32(0); /* pi */
p = xdr_encode_opaque(p, context, len);
return 0;
}
@@ -2123,23 +2122,23 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
p = xdr_reserve_space(xdr, 16);
if (!p)
goto out_resource;
- WRITE32(3);
- WRITE32(bmval0);
- WRITE32(bmval1);
- WRITE32(bmval2);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(bmval0);
+ *p++ = cpu_to_be32(bmval1);
+ *p++ = cpu_to_be32(bmval2);
} else if (bmval1) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE32(2);
- WRITE32(bmval0);
- WRITE32(bmval1);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(bmval0);
+ *p++ = cpu_to_be32(bmval1);
} else {
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE32(1);
- WRITE32(bmval0);
+ *p++ = cpu_to_be32(1);
+ *p++ = cpu_to_be32(bmval0);
}

attrlen_offset = xdr->buf->len;
@@ -2161,17 +2160,17 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE32(2);
- WRITE32(word0);
- WRITE32(word1);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(word0);
+ *p++ = cpu_to_be32(word1);
} else {
p = xdr_reserve_space(xdr, 16);
if (!p)
goto out_resource;
- WRITE32(3);
- WRITE32(word0);
- WRITE32(word1);
- WRITE32(word2);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(word0);
+ *p++ = cpu_to_be32(word1);
+ *p++ = cpu_to_be32(word2);
}
}
if (bmval0 & FATTR4_WORD0_TYPE) {
@@ -2183,16 +2182,17 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
status = nfserr_serverfault;
goto out;
}
- WRITE32(dummy);
+ *p++ = cpu_to_be32(dummy);
}
if (bmval0 & FATTR4_WORD0_FH_EXPIRE_TYPE) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
if (exp->ex_flags & NFSEXP_NOSUBTREECHECK)
- WRITE32(NFS4_FH_PERSISTENT);
+ *p++ = cpu_to_be32(NFS4_FH_PERSISTENT);
else
- WRITE32(NFS4_FH_PERSISTENT|NFS4_FH_VOL_RENAME);
+ *p++ = cpu_to_be32(NFS4_FH_PERSISTENT|
+ NFS4_FH_VOL_RENAME);
}
if (bmval0 & FATTR4_WORD0_CHANGE) {
p = xdr_reserve_space(xdr, 8);
@@ -2210,19 +2210,19 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_SYMLINK_SUPPORT) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_NAMED_ATTR) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
}
if (bmval0 & FATTR4_WORD0_FSID) {
p = xdr_reserve_space(xdr, 16);
@@ -2237,10 +2237,10 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
WRITE64((u64)0);
break;
case FSIDSOURCE_DEV:
- WRITE32(0);
- WRITE32(MAJOR(stat.dev));
- WRITE32(0);
- WRITE32(MINOR(stat.dev));
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(MAJOR(stat.dev));
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(MINOR(stat.dev));
break;
case FSIDSOURCE_UUID:
WRITEMEM(exp->ex_uuid, 16);
@@ -2251,19 +2251,19 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
}
if (bmval0 & FATTR4_WORD0_LEASE_TIME) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(nn->nfsd4_lease);
+ *p++ = cpu_to_be32(nn->nfsd4_lease);
}
if (bmval0 & FATTR4_WORD0_RDATTR_ERROR) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(rdattr_err);
+ *p++ = cpu_to_be32(rdattr_err);
}
if (bmval0 & FATTR4_WORD0_ACL) {
struct nfs4_ace *ace;
@@ -2273,21 +2273,22 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
if (!p)
goto out_resource;

- WRITE32(0);
+ *p++ = cpu_to_be32(0);
goto out_acl;
}
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(acl->naces);
+ *p++ = cpu_to_be32(acl->naces);

for (ace = acl->aces; ace < acl->aces + acl->naces; ace++) {
p = xdr_reserve_space(xdr, 4*3);
if (!p)
goto out_resource;
- WRITE32(ace->type);
- WRITE32(ace->flag);
- WRITE32(ace->access_mask & NFS4_ACE_MASK_ALL);
+ *p++ = cpu_to_be32(ace->type);
+ *p++ = cpu_to_be32(ace->flag);
+ *p++ = cpu_to_be32(ace->access_mask &
+ NFS4_ACE_MASK_ALL);
status = nfsd4_encode_aclname(xdr, rqstp, ace);
if (status)
goto out;
@@ -2298,38 +2299,38 @@ out_acl:
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(aclsupport ?
+ *p++ = cpu_to_be32(aclsupport ?
ACL4_SUPPORT_ALLOW_ACL|ACL4_SUPPORT_DENY_ACL : 0);
}
if (bmval0 & FATTR4_WORD0_CANSETTIME) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_CASE_INSENSITIVE) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
}
if (bmval0 & FATTR4_WORD0_CASE_PRESERVING) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_CHOWN_RESTRICTED) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_FILEHANDLE) {
p = xdr_reserve_space(xdr, fhp->fh_handle.fh_size + 4);
if (!p)
goto out_resource;
- WRITE32(fhp->fh_handle.fh_size);
+ *p++ = cpu_to_be32(fhp->fh_handle.fh_size);
WRITEMEM(&fhp->fh_handle.fh_base, fhp->fh_handle.fh_size);
}
if (bmval0 & FATTR4_WORD0_FILEID) {
@@ -2365,7 +2366,7 @@ out_acl:
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval0 & FATTR4_WORD0_MAXFILESIZE) {
p = xdr_reserve_space(xdr, 8);
@@ -2377,13 +2378,13 @@ out_acl:
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(255);
+ *p++ = cpu_to_be32(255);
}
if (bmval0 & FATTR4_WORD0_MAXNAME) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(statfs.f_namelen);
+ *p++ = cpu_to_be32(statfs.f_namelen);
}
if (bmval0 & FATTR4_WORD0_MAXREAD) {
p = xdr_reserve_space(xdr, 8);
@@ -2401,19 +2402,19 @@ out_acl:
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(stat.mode & S_IALLUGO);
+ *p++ = cpu_to_be32(stat.mode & S_IALLUGO);
}
if (bmval1 & FATTR4_WORD1_NO_TRUNC) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(1);
+ *p++ = cpu_to_be32(1);
}
if (bmval1 & FATTR4_WORD1_NUMLINKS) {
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- WRITE32(stat.nlink);
+ *p++ = cpu_to_be32(stat.nlink);
}
if (bmval1 & FATTR4_WORD1_OWNER) {
status = nfsd4_encode_user(xdr, rqstp, stat.uid);
@@ -2429,8 +2430,8 @@ out_acl:
p = xdr_reserve_space(xdr, 8);
if (!p)
goto out_resource;
- WRITE32((u32) MAJOR(stat.rdev));
- WRITE32((u32) MINOR(stat.rdev));
+ *p++ = cpu_to_be32((u32) MAJOR(stat.rdev));
+ *p++ = cpu_to_be32((u32) MINOR(stat.rdev));
}
if (bmval1 & FATTR4_WORD1_SPACE_AVAIL) {
p = xdr_reserve_space(xdr, 8);
@@ -2465,29 +2466,29 @@ out_acl:
if (!p)
goto out_resource;
WRITE64((s64)stat.atime.tv_sec);
- WRITE32(stat.atime.tv_nsec);
+ *p++ = cpu_to_be32(stat.atime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_DELTA) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
- WRITE32(0);
- WRITE32(1);
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(1);
+ *p++ = cpu_to_be32(0);
}
if (bmval1 & FATTR4_WORD1_TIME_METADATA) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
WRITE64((s64)stat.ctime.tv_sec);
- WRITE32(stat.ctime.tv_nsec);
+ *p++ = cpu_to_be32(stat.ctime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_TIME_MODIFY) {
p = xdr_reserve_space(xdr, 12);
if (!p)
goto out_resource;
WRITE64((s64)stat.mtime.tv_sec);
- WRITE32(stat.mtime.tv_nsec);
+ *p++ = cpu_to_be32(stat.mtime.tv_nsec);
}
if (bmval1 & FATTR4_WORD1_MOUNTED_ON_FILEID) {
p = xdr_reserve_space(xdr, 8);
@@ -2512,10 +2513,10 @@ out_acl:
p = xdr_reserve_space(xdr, 16);
if (!p)
goto out_resource;
- WRITE32(3);
- WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD0);
- WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD1);
- WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD2);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(NFSD_SUPPATTR_EXCLCREAT_WORD0);
+ *p++ = cpu_to_be32(NFSD_SUPPATTR_EXCLCREAT_WORD1);
+ *p++ = cpu_to_be32(NFSD_SUPPATTR_EXCLCREAT_WORD2);
}

attrlen = htonl(xdr->buf->len - attrlen_offset - 4);
@@ -2750,7 +2751,7 @@ nfsd4_encode_stateid(struct xdr_stream *xdr, stateid_t *sid)
p = xdr_reserve_space(xdr, sizeof(stateid_t));
if (!p)
return nfserr_resource;
- WRITE32(sid->si_generation);
+ *p++ = cpu_to_be32(sid->si_generation);
WRITEMEM(&sid->si_opaque, sizeof(stateid_opaque_t));
return 0;
}
@@ -2765,8 +2766,8 @@ nfsd4_encode_access(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
p = xdr_reserve_space(xdr, 8);
if (!p)
return nfserr_resource;
- WRITE32(access->ac_supported);
- WRITE32(access->ac_resp_access);
+ *p++ = cpu_to_be32(access->ac_supported);
+ *p++ = cpu_to_be32(access->ac_resp_access);
}
return nfserr;
}
@@ -2781,9 +2782,9 @@ static __be32 nfsd4_encode_bind_conn_to_session(struct nfsd4_compoundres *resp,
if (!p)
return nfserr_resource;
WRITEMEM(bcts->sessionid.data, NFS4_MAX_SESSIONID_LEN);
- WRITE32(bcts->dir);
+ *p++ = cpu_to_be32(bcts->dir);
/* Sorry, we do not yet support RDMA over 4.1: */
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
}
return nfserr;
}
@@ -2826,9 +2827,9 @@ nfsd4_encode_create(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_
if (!p)
return nfserr_resource;
write_cinfo(&p, &create->cr_cinfo);
- WRITE32(2);
- WRITE32(create->cr_bmval[0]);
- WRITE32(create->cr_bmval[1]);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(create->cr_bmval[0]);
+ *p++ = cpu_to_be32(create->cr_bmval[1]);
}
return nfserr;
}
@@ -2861,7 +2862,7 @@ nfsd4_encode_getfh(struct nfsd4_compoundres *resp, __be32 nfserr, struct svc_fh
p = xdr_reserve_space(xdr, len + 4);
if (!p)
return nfserr_resource;
- WRITE32(len);
+ *p++ = cpu_to_be32(len);
WRITEMEM(&fhp->fh_handle.fh_base, len);
}
return nfserr;
@@ -2893,14 +2894,14 @@ again:
}
WRITE64(ld->ld_start);
WRITE64(ld->ld_length);
- WRITE32(ld->ld_type);
+ *p++ = cpu_to_be32(ld->ld_type);
if (conf->len) {
WRITEMEM(&ld->ld_clientid, 8);
- WRITE32(conf->len);
+ *p++ = cpu_to_be32(conf->len);
WRITEMEM(conf->data, conf->len);
} else { /* non - nfsv4 lock in conflict, no clientid nor owner */
WRITE64((u64)0); /* clientid */
- WRITE32(0); /* length of owner name */
+ *p++ = cpu_to_be32(0); /* length of owner name */
}
return nfserr_denied;
}
@@ -2972,11 +2973,11 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
if (!p)
return nfserr_resource;
write_cinfo(&p, &open->op_cinfo);
- WRITE32(open->op_rflags);
- WRITE32(2);
- WRITE32(open->op_bmval[0]);
- WRITE32(open->op_bmval[1]);
- WRITE32(open->op_delegate_type);
+ *p++ = cpu_to_be32(open->op_rflags);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(open->op_bmval[0]);
+ *p++ = cpu_to_be32(open->op_bmval[1]);
+ *p++ = cpu_to_be32(open->op_delegate_type);

switch (open->op_delegate_type) {
case NFS4_OPEN_DELEGATE_NONE:
@@ -2988,15 +2989,15 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
p = xdr_reserve_space(xdr, 20);
if (!p)
return nfserr_resource;
- WRITE32(open->op_recall);
+ *p++ = cpu_to_be32(open->op_recall);

/*
* TODO: ACE's in delegations
*/
- WRITE32(NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE);
- WRITE32(0);
- WRITE32(0);
- WRITE32(0); /* XXX: is NULL principal ok? */
+ *p++ = cpu_to_be32(NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0); /* XXX: is NULL principal ok? */
break;
case NFS4_OPEN_DELEGATE_WRITE:
nfserr = nfsd4_encode_stateid(xdr, &open->op_delegate_stateid);
@@ -3005,22 +3006,22 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
p = xdr_reserve_space(xdr, 32);
if (!p)
return nfserr_resource;
- WRITE32(0);
+ *p++ = cpu_to_be32(0);

/*
* TODO: space_limit's in delegations
*/
- WRITE32(NFS4_LIMIT_SIZE);
- WRITE32(~(u32)0);
- WRITE32(~(u32)0);
+ *p++ = cpu_to_be32(NFS4_LIMIT_SIZE);
+ *p++ = cpu_to_be32(~(u32)0);
+ *p++ = cpu_to_be32(~(u32)0);

/*
* TODO: ACE's in delegations
*/
- WRITE32(NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE);
- WRITE32(0);
- WRITE32(0);
- WRITE32(0); /* XXX: is NULL principal ok? */
+ *p++ = cpu_to_be32(NFS4_ACE_ACCESS_ALLOWED_ACE_TYPE);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0); /* XXX: is NULL principal ok? */
break;
case NFS4_OPEN_DELEGATE_NONE_EXT: /* 4.1 */
switch (open->op_why_no_deleg) {
@@ -3029,14 +3030,15 @@ nfsd4_encode_open(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_op
p = xdr_reserve_space(xdr, 8);
if (!p)
return nfserr_resource;
- WRITE32(open->op_why_no_deleg);
- WRITE32(0); /* deleg signaling not supported yet */
+ *p++ = cpu_to_be32(open->op_why_no_deleg);
+ /* deleg signaling not supported yet: */
+ *p++ = cpu_to_be32(0);
break;
default:
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- WRITE32(open->op_why_no_deleg);
+ *p++ = cpu_to_be32(open->op_why_no_deleg);
}
break;
default:
@@ -3312,8 +3314,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
return nfserr_resource;

/* XXX: Following NFSv3, we ignore the READDIR verifier for now. */
- WRITE32(0);
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
resp->xdr.buf->head[0].iov_len = ((char *)resp->xdr.p)
- (char *)resp->xdr.buf->head[0].iov_base;

@@ -3462,17 +3464,17 @@ nfsd4_do_encode_secinfo(struct xdr_stream *xdr,
XDR_LEN(info.oid.len) + 4 + 4);
if (!p)
goto out;
- WRITE32(RPC_AUTH_GSS);
- WRITE32(info.oid.len);
+ *p++ = cpu_to_be32(RPC_AUTH_GSS);
+ *p++ = cpu_to_be32(info.oid.len);
WRITEMEM(info.oid.data, info.oid.len);
- WRITE32(info.qop);
- WRITE32(info.service);
+ *p++ = cpu_to_be32(info.qop);
+ *p++ = cpu_to_be32(info.service);
} else if (pf < RPC_AUTH_MAXFLAVOR) {
supported++;
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out;
- WRITE32(pf);
+ *p++ = cpu_to_be32(pf);
} else {
if (report)
pr_warn("NFS: SECINFO: security flavor %u "
@@ -3522,16 +3524,16 @@ nfsd4_encode_setattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
if (!p)
return nfserr_resource;
if (nfserr) {
- WRITE32(3);
- WRITE32(0);
- WRITE32(0);
- WRITE32(0);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
}
else {
- WRITE32(3);
- WRITE32(setattr->sa_bmval[0]);
- WRITE32(setattr->sa_bmval[1]);
- WRITE32(setattr->sa_bmval[2]);
+ *p++ = cpu_to_be32(3);
+ *p++ = cpu_to_be32(setattr->sa_bmval[0]);
+ *p++ = cpu_to_be32(setattr->sa_bmval[1]);
+ *p++ = cpu_to_be32(setattr->sa_bmval[2]);
}
return nfserr;
}
@@ -3553,8 +3555,8 @@ nfsd4_encode_setclientid(struct nfsd4_compoundres *resp, __be32 nfserr, struct n
p = xdr_reserve_space(xdr, 8);
if (!p)
return nfserr_resource;
- WRITE32(0);
- WRITE32(0);
+ *p++ = cpu_to_be32(0);
+ *p++ = cpu_to_be32(0);
}
return nfserr;
}
@@ -3569,8 +3571,8 @@ nfsd4_encode_write(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_w
p = xdr_reserve_space(xdr, 16);
if (!p)
return nfserr_resource;
- WRITE32(write->wr_bytes_written);
- WRITE32(write->wr_how_written);
+ *p++ = cpu_to_be32(write->wr_bytes_written);
+ *p++ = cpu_to_be32(write->wr_how_written);
WRITEMEM(write->wr_verifier.data, NFS4_VERIFIER_SIZE);
}
return nfserr;
@@ -3613,10 +3615,10 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr_resource;

WRITEMEM(&exid->clientid, 8);
- WRITE32(exid->seqid);
- WRITE32(exid->flags);
+ *p++ = cpu_to_be32(exid->seqid);
+ *p++ = cpu_to_be32(exid->flags);

- WRITE32(exid->spa_how);
+ *p++ = cpu_to_be32(exid->spa_how);

switch (exid->spa_how) {
case SP4_NONE:
@@ -3628,11 +3630,11 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr_resource;

/* spo_must_enforce bitmap: */
- WRITE32(2);
- WRITE32(nfs4_minimal_spo_must_enforce[0]);
- WRITE32(nfs4_minimal_spo_must_enforce[1]);
+ *p++ = cpu_to_be32(2);
+ *p++ = cpu_to_be32(nfs4_minimal_spo_must_enforce[0]);
+ *p++ = cpu_to_be32(nfs4_minimal_spo_must_enforce[1]);
/* empty spo_must_allow bitmap: */
- WRITE32(0);
+ *p++ = cpu_to_be32(0);

break;
default:
@@ -3652,15 +3654,15 @@ nfsd4_encode_exchange_id(struct nfsd4_compoundres *resp, __be32 nfserr,
/* The server_owner struct */
WRITE64(minor_id); /* Minor id */
/* major id */
- WRITE32(major_id_sz);
+ *p++ = cpu_to_be32(major_id_sz);
WRITEMEM(major_id, major_id_sz);

/* Server scope */
- WRITE32(server_scope_sz);
+ *p++ = cpu_to_be32(server_scope_sz);
WRITEMEM(server_scope, server_scope_sz);

/* Implementation id */
- WRITE32(0); /* zero length nfs_impl_id4 array */
+ *p++ = cpu_to_be32(0); /* zero length nfs_impl_id4 array */
return 0;
}

@@ -3678,43 +3680,43 @@ nfsd4_encode_create_session(struct nfsd4_compoundres *resp, __be32 nfserr,
if (!p)
return nfserr_resource;
WRITEMEM(sess->sessionid.data, NFS4_MAX_SESSIONID_LEN);
- WRITE32(sess->seqid);
- WRITE32(sess->flags);
+ *p++ = cpu_to_be32(sess->seqid);
+ *p++ = cpu_to_be32(sess->flags);

p = xdr_reserve_space(xdr, 28);
if (!p)
return nfserr_resource;
- WRITE32(0); /* headerpadsz */
- WRITE32(sess->fore_channel.maxreq_sz);
- WRITE32(sess->fore_channel.maxresp_sz);
- WRITE32(sess->fore_channel.maxresp_cached);
- WRITE32(sess->fore_channel.maxops);
- WRITE32(sess->fore_channel.maxreqs);
- WRITE32(sess->fore_channel.nr_rdma_attrs);
+ *p++ = cpu_to_be32(0); /* headerpadsz */
+ *p++ = cpu_to_be32(sess->fore_channel.maxreq_sz);
+ *p++ = cpu_to_be32(sess->fore_channel.maxresp_sz);
+ *p++ = cpu_to_be32(sess->fore_channel.maxresp_cached);
+ *p++ = cpu_to_be32(sess->fore_channel.maxops);
+ *p++ = cpu_to_be32(sess->fore_channel.maxreqs);
+ *p++ = cpu_to_be32(sess->fore_channel.nr_rdma_attrs);

if (sess->fore_channel.nr_rdma_attrs) {
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- WRITE32(sess->fore_channel.rdma_attrs);
+ *p++ = cpu_to_be32(sess->fore_channel.rdma_attrs);
}

p = xdr_reserve_space(xdr, 28);
if (!p)
return nfserr_resource;
- WRITE32(0); /* headerpadsz */
- WRITE32(sess->back_channel.maxreq_sz);
- WRITE32(sess->back_channel.maxresp_sz);
- WRITE32(sess->back_channel.maxresp_cached);
- WRITE32(sess->back_channel.maxops);
- WRITE32(sess->back_channel.maxreqs);
- WRITE32(sess->back_channel.nr_rdma_attrs);
+ *p++ = cpu_to_be32(0); /* headerpadsz */
+ *p++ = cpu_to_be32(sess->back_channel.maxreq_sz);
+ *p++ = cpu_to_be32(sess->back_channel.maxresp_sz);
+ *p++ = cpu_to_be32(sess->back_channel.maxresp_cached);
+ *p++ = cpu_to_be32(sess->back_channel.maxops);
+ *p++ = cpu_to_be32(sess->back_channel.maxreqs);
+ *p++ = cpu_to_be32(sess->back_channel.nr_rdma_attrs);

if (sess->back_channel.nr_rdma_attrs) {
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- WRITE32(sess->back_channel.rdma_attrs);
+ *p++ = cpu_to_be32(sess->back_channel.rdma_attrs);
}
return 0;
}
@@ -3733,12 +3735,12 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, __be32 nfserr,
if (!p)
return nfserr_resource;
WRITEMEM(seq->sessionid.data, NFS4_MAX_SESSIONID_LEN);
- WRITE32(seq->seqid);
- WRITE32(seq->slotid);
+ *p++ = cpu_to_be32(seq->seqid);
+ *p++ = cpu_to_be32(seq->slotid);
/* Note slotid's are numbered from zero: */
- WRITE32(seq->maxslots - 1); /* sr_highest_slotid */
- WRITE32(seq->maxslots - 1); /* sr_target_highest_slotid */
- WRITE32(seq->status_flags);
+ *p++ = cpu_to_be32(seq->maxslots - 1); /* sr_highest_slotid */
+ *p++ = cpu_to_be32(seq->maxslots - 1); /* sr_target_highest_slotid */
+ *p++ = cpu_to_be32(seq->status_flags);

resp->cstate.data_offset = xdr->buf->len; /* DRC cache data pointer */
return 0;
@@ -3885,7 +3887,7 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
WARN_ON_ONCE(1);
return;
}
- WRITE32(op->opnum);
+ *p++ = cpu_to_be32(op->opnum);
post_err_offset = xdr->buf->len;

if (op->opnum == OP_ILLEGAL)
@@ -3958,7 +3960,7 @@ nfsd4_encode_replay(struct xdr_stream *xdr, struct nfsd4_op *op)
WARN_ON_ONCE(1);
return;
}
- WRITE32(op->opnum);
+ *p++ = cpu_to_be32(op->opnum);
*p++ = rp->rp_status; /* already xdr'ed */

WRITEMEM(rp->rp_buf, rp->rp_buflen);
--
1.9.0


2014-05-22 19:32:34

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 12/52] nfsd4: allow space for final error return

From: "J. Bruce Fields" <[email protected]>

This post-encoding check should be taking into account the need to
encode at least an out-of-space error to the following op (if any).

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 24ba652..2ed8036 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3636,6 +3636,7 @@ void
nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
{
struct nfs4_stateowner *so = resp->cstate.replay_owner;
+ struct svc_rqst *rqstp = resp->rqstp;
__be32 *statp;
nfsd4_enc encoder;
__be32 *p;
@@ -3652,8 +3653,12 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
encoder = nfsd4_enc_ops[op->opnum];
op->status = encoder(resp, op->status, &op->u);
/* nfsd4_check_resp_size guarantees enough room for error status */
- if (!op->status)
- op->status = nfsd4_check_resp_size(resp, 0);
+ if (!op->status) {
+ int space_needed = 0;
+ if (!nfsd4_last_compound_op(rqstp))
+ space_needed = COMPOUND_ERR_SLACK_SPACE;
+ op->status = nfsd4_check_resp_size(resp, space_needed);
+ }
if (op->status == nfserr_resource ||
op->status == nfserr_rep_too_big ||
op->status == nfserr_rep_too_big_to_cache) {
--
1.9.0


2014-05-22 19:32:45

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 39/52] nfsd4: better estimate of getattr response size

From: "J. Bruce Fields" <[email protected]>

We plan to use this estimate to decide whether or not to allow zero-copy
reads. Currently we're assuming all getattr's are a page, which can be
both too small (ACLs e.g. may be arbitrarily long) and too large (after
an upcoming read patch this will unnecessarily prevent zero copy reads
in any read compound also containing a getattr).

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index f56f5e3..7c3df68 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1467,6 +1467,49 @@ static inline u32 nfsd4_create_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op
+ nfs4_fattr_bitmap_maxsz) * sizeof(__be32);
}

+/*
+ * Note since this is an idempotent operation we won't insist on failing
+ * the op prematurely if the estimate is too large. We may turn off splice
+ * reads unnecessarily.
+ */
+static inline u32 nfsd4_getattr_rsize(struct svc_rqst *rqstp,
+ struct nfsd4_op *op)
+{
+ u32 *bmap = op->u.getattr.ga_bmval;
+ u32 bmap0 = bmap[0], bmap1 = bmap[1], bmap2 = bmap[2];
+ u32 ret = 0;
+
+ if (bmap0 & FATTR4_WORD0_ACL)
+ return svc_max_payload(rqstp);
+ if (bmap0 & FATTR4_WORD0_FS_LOCATIONS)
+ return svc_max_payload(rqstp);
+
+ if (bmap1 & FATTR4_WORD1_OWNER) {
+ ret += IDMAP_NAMESZ + 4;
+ bmap1 &= ~FATTR4_WORD1_OWNER;
+ }
+ if (bmap1 & FATTR4_WORD1_OWNER_GROUP) {
+ ret += IDMAP_NAMESZ + 4;
+ bmap1 &= ~FATTR4_WORD1_OWNER_GROUP;
+ }
+ if (bmap0 & FATTR4_WORD0_FILEHANDLE) {
+ ret += NFS4_FHSIZE + 4;
+ bmap0 &= ~FATTR4_WORD0_FILEHANDLE;
+ }
+ if (bmap2 & FATTR4_WORD2_SECURITY_LABEL) {
+ ret += NFSD4_MAX_SEC_LABEL_LEN + 12;
+ bmap2 &= ~FATTR4_WORD2_SECURITY_LABEL;
+ }
+ /*
+ * Largest of remaining attributes are 16 bytes (e.g.,
+ * supported_attributes)
+ */
+ ret += 16 * (hweight32(bmap0) + hweight32(bmap1) + hweight32(bmap2));
+ /* bitmask, length */
+ ret += 20;
+ return ret;
+}
+
static inline u32 nfsd4_link_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
{
return (op_encode_hdr_size + op_encode_change_info_maxsz)
@@ -1605,6 +1648,7 @@ static struct nfsd4_operation nfsd4_ops[] = {
[OP_GETATTR] = {
.op_func = (nfsd4op_func)nfsd4_getattr,
.op_flags = ALLOWED_ON_ABSENT_FS,
+ .op_rsize_bop = nfsd4_getattr_rsize,
.op_name = "OP_GETATTR",
},
[OP_GETFH] = {
--
1.9.0


2014-05-22 19:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 03/52] nfsd4: fill in some missing op_name's

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 4c4cd96..2c1ee70 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1685,11 +1685,13 @@ static struct nfsd4_operation nfsd4_ops[] = {
},
[OP_READ] = {
.op_func = (nfsd4op_func)nfsd4_read,
+ .op_name = "OP_READ",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_read_rsize,
.op_get_currentstateid = (stateid_getter)nfsd4_get_readstateid,
},
[OP_READDIR] = {
.op_func = (nfsd4op_func)nfsd4_readdir,
+ .op_name = "OP_READDIR",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_readdir_rsize,
},
[OP_READLINK] = {
--
1.9.0


2014-05-22 19:32:40

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 24/52] nfsd4: remove redundant encode buffer size checking

From: "J. Bruce Fields" <[email protected]>

Now that all op encoders can handle running out of space, we no longer
need to check the remaining size for every operation; only nonidempotent
operations need that check, and that can be done by
nfsd4_check_resp_size.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 14 --------------
fs/nfsd/nfs4xdr.c | 22 +++++++++++++---------
2 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 694e1a5..9b57894 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1281,7 +1281,6 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
struct nfsd4_compound_state *cstate = &resp->cstate;
struct svc_fh *current_fh = &cstate->current_fh;
struct svc_fh *save_fh = &cstate->save_fh;
- int slack_bytes;
u32 plen = 0;
__be32 status;

@@ -1335,19 +1334,6 @@ nfsd4_proc_compound(struct svc_rqst *rqstp,
goto encode_op;
}

- /* We must be able to encode a successful response to
- * this operation, with enough room left over to encode a
- * failed response to the next operation. If we don't
- * have enough room, fail with ERR_RESOURCE.
- */
- slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;
- if (slack_bytes < COMPOUND_SLACK_SPACE
- + COMPOUND_ERR_SLACK_SPACE) {
- BUG_ON(slack_bytes < COMPOUND_ERR_SLACK_SPACE);
- op->status = nfserr_resource;
- goto encode_op;
- }
-
opdesc = OPDESC(op);

if (!current_fh->fh_dentry) {
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index be5793f..86caa98 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3753,20 +3753,24 @@ static nfsd4_enc nfsd4_enc_ops[] = {
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
{
struct xdr_buf *buf = &resp->rqstp->rq_res;
- struct nfsd4_session *session = NULL;
+ struct nfsd4_session *session = resp->cstate.session;
struct nfsd4_slot *slot = resp->cstate.slot;
+ int slack_bytes = (char *)resp->xdr.end - (char *)resp->xdr.p;

- if (!nfsd4_has_session(&resp->cstate))
- return 0;
+ if (nfsd4_has_session(&resp->cstate)) {

- session = resp->cstate.session;
+ if (buf->len + pad > session->se_fchannel.maxresp_sz)
+ return nfserr_rep_too_big;

- if (buf->len + pad > session->se_fchannel.maxresp_sz)
- return nfserr_rep_too_big;
+ if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
+ buf->len + pad > session->se_fchannel.maxresp_cached)
+ return nfserr_rep_too_big_to_cache;
+ }

- if ((slot->sl_flags & NFSD4_SLOT_CACHETHIS) &&
- buf->len + pad > session->se_fchannel.maxresp_cached)
- return nfserr_rep_too_big_to_cache;
+ if (pad > slack_bytes) {
+ WARN_ON_ONCE(nfsd4_has_session(&resp->cstate));
+ return nfserr_resource;
+ }

return 0;
}
--
1.9.0


2014-05-22 19:32:46

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 42/52] nfsd4: nfsd_vfs_read doesn't use file handle parameter

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/vfs.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6aaa305..ce781da 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -821,7 +821,7 @@ static int nfsd_direct_splice_actor(struct pipe_inode_info *pipe,
}

static __be32
-nfsd_vfs_read(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
+nfsd_vfs_read(struct svc_rqst *rqstp, struct file *file,
loff_t offset, struct kvec *vec, int vlen, unsigned long *count)
{
mm_segment_t oldfs;
@@ -969,7 +969,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (ra && ra->p_set)
file->f_ra = ra->p_ra;

- err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
+ err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);

/* Write back readahead params */
if (ra) {
@@ -998,7 +998,7 @@ nfsd_read_file(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
NFSD_MAY_READ|NFSD_MAY_OWNER_OVERRIDE);
if (err)
goto out;
- err = nfsd_vfs_read(rqstp, fhp, file, offset, vec, vlen, count);
+ err = nfsd_vfs_read(rqstp, file, offset, vec, vlen, count);
} else /* Note file may still be NULL in NFSv4 special stateid case: */
err = nfsd_read(rqstp, fhp, offset, vec, vlen, count);
out:
--
1.9.0


2014-05-22 19:32:34

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 01/52] nfsd4: READ, READDIR, etc., are idempotent

From: "J. Bruce Fields" <[email protected]>

OP_MODIFIES_SOMETHING flags operations that we should be careful not to
initiate without being sure we have the buffer space to encode a reply.

None of these ops fall into that category.

We could probably remove a few more, but this isn't a very important
problem at least for ops whose reply size is easy to estimate.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index ac83778..4c4cd96 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1665,38 +1665,31 @@ static struct nfsd4_operation nfsd4_ops[] = {
[OP_PUTFH] = {
.op_func = (nfsd4op_func)nfsd4_putfh,
.op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS
- | OP_IS_PUTFH_LIKE | OP_MODIFIES_SOMETHING
- | OP_CLEAR_STATEID,
+ | OP_IS_PUTFH_LIKE | OP_CLEAR_STATEID,
.op_name = "OP_PUTFH",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
},
[OP_PUTPUBFH] = {
.op_func = (nfsd4op_func)nfsd4_putrootfh,
.op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS
- | OP_IS_PUTFH_LIKE | OP_MODIFIES_SOMETHING
- | OP_CLEAR_STATEID,
+ | OP_IS_PUTFH_LIKE | OP_CLEAR_STATEID,
.op_name = "OP_PUTPUBFH",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
},
[OP_PUTROOTFH] = {
.op_func = (nfsd4op_func)nfsd4_putrootfh,
.op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS
- | OP_IS_PUTFH_LIKE | OP_MODIFIES_SOMETHING
- | OP_CLEAR_STATEID,
+ | OP_IS_PUTFH_LIKE | OP_CLEAR_STATEID,
.op_name = "OP_PUTROOTFH",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
},
[OP_READ] = {
.op_func = (nfsd4op_func)nfsd4_read,
- .op_flags = OP_MODIFIES_SOMETHING,
- .op_name = "OP_READ",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_read_rsize,
.op_get_currentstateid = (stateid_getter)nfsd4_get_readstateid,
},
[OP_READDIR] = {
.op_func = (nfsd4op_func)nfsd4_readdir,
- .op_flags = OP_MODIFIES_SOMETHING,
- .op_name = "OP_READDIR",
.op_rsize_bop = (nfsd4op_rsize)nfsd4_readdir_rsize,
},
[OP_READLINK] = {
--
1.9.0


2014-05-22 19:32:41

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 30/52] nfsd4: minor encode_read cleanup

From: "J. Bruce Fields" <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 4877abb..741aab2 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3076,18 +3076,20 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,

len = maxcount;
v = 0;
- while (len > 0) {
+ while (len) {
+ int thislen;
+
page = *(resp->rqstp->rq_next_page);
if (!page) { /* ran out of pages */
maxcount -= len;
break;
}
+ thislen = min_t(long, len, PAGE_SIZE);
resp->rqstp->rq_vec[v].iov_base = page_address(page);
- resp->rqstp->rq_vec[v].iov_len =
- len < PAGE_SIZE ? len : PAGE_SIZE;
+ resp->rqstp->rq_vec[v].iov_len = thislen;
resp->rqstp->rq_next_page++;
v++;
- len -= PAGE_SIZE;
+ len -= thislen;
}
read->rd_vlen = v;

--
1.9.0


2014-05-22 19:32:44

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 37/52] nfsd4: enforce rd_dircount

From: "J. Bruce Fields" <[email protected]>

As long as we're here, let's enforce the protocol's limit on the number
of directory entries to return in a readdir.

I don't think anyone's ever noticed our lack of enforcement, but maybe
there's more of a chance they will now that we allow larger readdirs.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index a2524b3..97a25a7 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1033,7 +1033,7 @@ nfsd4_decode_readdir(struct nfsd4_compoundargs *argp, struct nfsd4_readdir *read
READ_BUF(24);
READ64(readdir->rd_cookie);
COPYMEM(readdir->rd_verf.data, sizeof(readdir->rd_verf.data));
- READ32(readdir->rd_dircount); /* just in case you needed a useless field... */
+ READ32(readdir->rd_dircount);
READ32(readdir->rd_maxcount);
if ((status = nfsd4_decode_bitmap(argp, readdir->rd_bmval)))
goto out;
@@ -2720,6 +2720,9 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
if (entry_bytes > cd->rd_maxcount)
goto fail;
cd->rd_maxcount -= entry_bytes;
+ if (!cd->rd_dircount)
+ goto fail;
+ cd->rd_dircount--;
cd->cookie_offset = cookie_offset;
skip_entry:
cd->common.err = nfs_ok;
--
1.9.0


2014-05-22 19:32:33

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 02/52] nfsd4: allow larger 4.1 session drc slots

From: "J. Bruce Fields" <[email protected]>

The client is actually asking for 2532 bytes. I suspect that's a
mistake. But maybe we can allow some more. In theory lock needs more
if it might return a maximum-length lockowner in the denied case.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/state.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index fda9ce2..374c662 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -122,7 +122,7 @@ static inline struct nfs4_delegation *delegstateid(struct nfs4_stid *s)
/* Maximum number of operations per session compound */
#define NFSD_MAX_OPS_PER_COMPOUND 16
/* Maximum session per slot cache size */
-#define NFSD_SLOT_CACHE_SIZE 1024
+#define NFSD_SLOT_CACHE_SIZE 2048
/* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
#define NFSD_CACHE_SIZE_SLOTS_PER_SESSION 32
#define NFSD_MAX_MEM_PER_SESSION \
--
1.9.0


2014-05-22 19:32:45

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 38/52] nfsd4: don't treat readlink like a zero-copy operation

From: "J. Bruce Fields" <[email protected]>

There's no advantage to this zero-copy-style readlink encoding, and it
unnecessarily limits the kinds of compounds we can handle. (In practice
I can't see why a client would want e.g. multiple readlink calls in a
comound, but it's probably a spec violation for us not to handle it.)

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 42 ++++++++++++------------------------------
1 file changed, 12 insertions(+), 30 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 97a25a7..a3cfb89 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3160,8 +3160,9 @@ static __be32
nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_readlink *readlink)
{
int maxcount;
+ __be32 wire_count;
+ int zero = 0;
struct xdr_stream *xdr = &resp->xdr;
- char *page;
int length_offset = xdr->buf->len;
__be32 *p;

@@ -3171,26 +3172,19 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
-
- if (resp->xdr.buf->page_len)
- return nfserr_resource;
- if (!*resp->rqstp->rq_next_page)
- return nfserr_resource;
-
- page = page_address(*(resp->rqstp->rq_next_page++));
-
maxcount = PAGE_SIZE;

- if (xdr->end - xdr->p < 1)
+ p = xdr_reserve_space(xdr, maxcount);
+ if (!p)
return nfserr_resource;
-
/*
* XXX: By default, the ->readlink() VFS op will truncate symlinks
* if they would overflow the buffer. Is this kosher in NFSv4? If
* not, one easy fix is: if ->readlink() precisely fills the buffer,
* assume that truncation occurred, and return NFS4ERR_RESOURCE.
*/
- nfserr = nfsd_readlink(readlink->rl_rqstp, readlink->rl_fhp, page, &maxcount);
+ nfserr = nfsd_readlink(readlink->rl_rqstp, readlink->rl_fhp,
+ (char *)p, &maxcount);
if (nfserr == nfserr_isdir)
nfserr = nfserr_inval;
if (nfserr) {
@@ -3198,24 +3192,12 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
return nfserr;
}

- WRITE32(maxcount);
- resp->xdr.buf->head[0].iov_len = (char *)p
- - (char *)resp->xdr.buf->head[0].iov_base;
- resp->xdr.buf->page_len = maxcount;
- xdr->buf->len += maxcount;
- xdr->page_ptr += 1;
- xdr->buf->buflen -= PAGE_SIZE;
- xdr->iov = xdr->buf->tail;
-
- /* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = p;
- resp->xdr.buf->tail[0].iov_len = 0;
- if (maxcount&3) {
- p = xdr_reserve_space(xdr, 4);
- WRITE32(0);
- resp->xdr.buf->tail[0].iov_base += maxcount&3;
- resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- }
+ wire_count = htonl(maxcount);
+ write_bytes_to_xdr_buf(xdr->buf, length_offset, &wire_count, 4);
+ xdr_truncate_encode(xdr, length_offset + 4 + maxcount);
+ if (maxcount & 3)
+ write_bytes_to_xdr_buf(xdr->buf, length_offset + 4 + maxcount,
+ &zero, 4 - (maxcount&3));
return 0;
}

--
1.9.0


2014-05-22 19:32:47

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 45/52] nfsd4: more read encoding cleanup

From: "J. Bruce Fields" <[email protected]>

More cleanup, no change in functionality.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index b8be187..fec9e13 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3077,11 +3077,9 @@ static __be32 nfsd4_encode_splice_read(
struct xdr_stream *xdr = &resp->xdr;
struct xdr_buf *buf = xdr->buf;
u32 eof;
- int starting_len = xdr->buf->len - 8;
int space_left;
__be32 nfserr;
- __be32 tmp;
- __be32 *p;
+ __be32 *p = xdr->p - 2;

/*
* Don't inline pages unless we know there's room for eof,
@@ -3105,25 +3103,25 @@ static __be32 nfsd4_encode_splice_read(
eof = (read->rd_offset + maxcount >=
read->rd_fhp->fh_dentry->d_inode->i_size);

- tmp = htonl(eof);
- write_bytes_to_xdr_buf(buf, starting_len , &tmp, 4);
- tmp = htonl(maxcount);
- write_bytes_to_xdr_buf(buf, starting_len + 4, &tmp, 4);
+ *(p++) = htonl(eof);
+ *(p++) = htonl(maxcount);

buf->page_len = maxcount;
buf->len += maxcount;
xdr->page_ptr += (maxcount + PAGE_SIZE - 1) / PAGE_SIZE;
- xdr->iov = buf->tail;

/* Use rest of head for padding and remaining ops: */
buf->tail[0].iov_base = xdr->p;
buf->tail[0].iov_len = 0;
+ xdr->iov = buf->tail;
if (maxcount&3) {
- p = xdr_reserve_space(xdr, 4);
- WRITE32(0);
+ int pad = 4 - (maxcount&3);
+
+ *(xdr->p++) = 0;
+
buf->tail[0].iov_base += maxcount&3;
- buf->tail[0].iov_len = 4 - (maxcount&3);
- buf->len -= (maxcount&3);
+ buf->tail[0].iov_len = pad;
+ buf->len += pad;
}

space_left = min_t(int, (void *)xdr->end - (void *)xdr->p,
--
1.9.0


2014-05-22 19:32:34

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 20/52] nfsd4: "backfill" using write_bytes_to_xdr_buf

From: "J. Bruce Fields" <[email protected]>

Normally xdr encoding proceeds in a single pass from start of a buffer
to end, but sometimes we have to write a few bytes to an earlier
position.

Use write_bytes_to_xdr_buf for these cases rather than saving a pointer
to write to. We plan to rewrite xdr_reserve_space to handle encoding
across page boundaries using a scratch buffer, and don't want to risk
writing to a pointer that was contained in a scratch buffer.

Also it will no longer be safe to calculate lengths by subtracting two
pointers, so use xdr_buf offsets instead.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 45 ++++++++++++++++++++++++++-------------------
1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index c8435ba..14efc5d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1759,16 +1759,19 @@ static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep,
char esc_exit)
{
__be32 *p;
- __be32 *countp;
+ __be32 pathlen;
+ int pathlen_offset;
int strlen, count=0;
char *str, *end, *next;

dprintk("nfsd4_encode_components(%s)\n", components);
+
+ pathlen_offset = xdr->buf->len;
p = xdr_reserve_space(xdr, 4);
if (!p)
return nfserr_resource;
- countp = p;
- WRITE32(0); /* We will fill this in with @count later */
+ p++; /* We will fill this in with @count later */
+
end = str = components;
while (*end) {
bool found_esc = false;
@@ -1801,8 +1804,8 @@ static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep,
end++;
str = end;
}
- p = countp;
- WRITE32(count);
+ pathlen = htonl(xdr->buf->len - pathlen_offset);
+ write_bytes_to_xdr_buf(xdr->buf, pathlen_offset, &pathlen, 4);
return 0;
}

@@ -2054,7 +2057,8 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
struct kstatfs statfs;
__be32 *p;
int starting_len = xdr->buf->len;
- __be32 *attrlenp;
+ int attrlen_offset;
+ __be32 attrlen;
u32 dummy;
u64 dummy64;
u32 rdattr_err = 0;
@@ -2159,10 +2163,12 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
WRITE32(1);
WRITE32(bmval0);
}
+
+ attrlen_offset = xdr->buf->len;
p = xdr_reserve_space(xdr, 4);
if (!p)
goto out_resource;
- attrlenp = p++; /* to be backfilled later */
+ p++; /* to be backfilled later */

if (bmval0 & FATTR4_WORD0_SUPPORTED_ATTRS) {
u32 word0 = nfsd_suppattrs0(minorversion);
@@ -2534,7 +2540,8 @@ out_acl:
WRITE32(NFSD_SUPPATTR_EXCLCREAT_WORD2);
}

- *attrlenp = htonl((char *)xdr->p - (char *)attrlenp - 4);
+ attrlen = htonl(xdr->buf->len - attrlen_offset - 4);
+ write_bytes_to_xdr_buf(xdr->buf, attrlen_offset, &attrlen, 4);
status = nfs_ok;

out:
@@ -3664,15 +3671,16 @@ __be32 nfsd4_check_resp_size(struct nfsd4_compoundres *resp, u32 pad)
void
nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
{
+ struct xdr_stream *xdr = &resp->xdr;
struct nfs4_stateowner *so = resp->cstate.replay_owner;
struct svc_rqst *rqstp = resp->rqstp;
- __be32 *statp;
+ int post_err_offset;
nfsd4_enc encoder;
__be32 *p;

RESERVE_SPACE(8);
WRITE32(op->opnum);
- statp = p++; /* to be backfilled at the end */
+ post_err_offset = xdr->buf->len;

if (op->opnum == OP_ILLEGAL)
goto status;
@@ -3698,20 +3706,19 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
* bug if we had to do this on a non-idempotent op:
*/
warn_on_nonidempotent_op(op);
- resp->xdr.p = statp + 1;
+ xdr_truncate_encode(xdr, post_err_offset);
}
if (so) {
+ int len = xdr->buf->len - post_err_offset;
+
so->so_replay.rp_status = op->status;
- so->so_replay.rp_buflen = (char *)resp->xdr.p
- - (char *)(statp+1);
- memcpy(so->so_replay.rp_buf, statp+1, so->so_replay.rp_buflen);
+ so->so_replay.rp_buflen = len;
+ read_bytes_from_xdr_buf(xdr->buf, post_err_offset,
+ so->so_replay.rp_buf, len);
}
status:
- /*
- * Note: We write the status directly, instead of using WRITE32(),
- * since it is already in network byte order.
- */
- *statp = op->status;
+ /* Note that op->status is already in network byte order: */
+ write_bytes_to_xdr_buf(xdr->buf, post_err_offset - 4, &op->status, 4);
}

/*
--
1.9.0


2014-05-22 19:32:40

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 28/52] nfsd4: don't try to encode conflicting owner if low on space

From: "J. Bruce Fields" <[email protected]>

I ran into this corner case in testing: in theory clients can provide
state owners up to 1024 bytes long. In the sessions case there might be
a risk of this pushing us over the DRC slot size.

The conflicting owner isn't really that important, so let's humor a
client that provides a small maxresponsize_cached by allowing ourselves
to return without the conflicting owner instead of outright failing the
operation.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 3 ++-
fs/nfsd/nfs4xdr.c | 16 +++++++++++++---
2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 2517e0b..1326a0b 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1434,7 +1434,8 @@ out:
#define op_encode_change_info_maxsz (5)
#define nfs4_fattr_bitmap_maxsz (4)

-#define op_encode_lockowner_maxsz (1 + XDR_QUADLEN(IDMAP_NAMESZ))
+/* We'll fall back on returning no lockowner if run out of space: */
+#define op_encode_lockowner_maxsz (0)
#define op_encode_lock_denied_maxsz (8 + op_encode_lockowner_maxsz)

#define nfs4_owner_maxsz (1 + XDR_QUADLEN(IDMAP_NAMESZ))
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 19df1c2..4c2f866 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2878,9 +2878,20 @@ nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld)
struct xdr_netobj *conf = &ld->ld_owner;
__be32 *p;

+again:
p = xdr_reserve_space(xdr, 32 + XDR_LEN(conf->len));
- if (!p)
+ if (!p) {
+ /*
+ * Don't fail to return the result just because we can't
+ * return the conflicting open:
+ */
+ if (conf->len) {
+ conf->len = 0;
+ conf->data = NULL;
+ goto again;
+ }
return nfserr_resource;
+ }
WRITE64(ld->ld_start);
WRITE64(ld->ld_length);
WRITE32(ld->ld_type);
@@ -2888,7 +2899,6 @@ nfsd4_encode_lock_denied(struct xdr_stream *xdr, struct nfsd4_lock_denied *ld)
WRITEMEM(&ld->ld_clientid, 8);
WRITE32(conf->len);
WRITEMEM(conf->data, conf->len);
- kfree(conf->data);
} else { /* non - nfsv4 lock in conflict, no clientid nor owner */
WRITE64((u64)0); /* clientid */
WRITE32(0); /* length of owner name */
@@ -2905,7 +2915,7 @@ nfsd4_encode_lock(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_lo
nfserr = nfsd4_encode_stateid(xdr, &lock->lk_resp_stateid);
else if (nfserr == nfserr_denied)
nfserr = nfsd4_encode_lock_denied(xdr, &lock->lk_denied);
-
+ kfree(lock->lk_denied.ld_owner.data);
return nfserr;
}

--
1.9.0


2014-05-22 19:32:46

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 44/52] nfsd4: read encoding cleanup

From: "J. Bruce Fields" <[email protected]>

Trivial cleanup, no change in functionality.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index bf853fd..b8be187 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3075,6 +3075,7 @@ static __be32 nfsd4_encode_splice_read(
struct file *file, unsigned long maxcount)
{
struct xdr_stream *xdr = &resp->xdr;
+ struct xdr_buf *buf = xdr->buf;
u32 eof;
int starting_len = xdr->buf->len - 8;
int space_left;
@@ -3097,7 +3098,7 @@ static __be32 nfsd4_encode_splice_read(
* page length; reset it so as not to confuse
* xdr_truncate_encode:
*/
- xdr->buf->page_len = 0;
+ buf->page_len = 0;
return nfserr;
}

@@ -3105,29 +3106,29 @@ static __be32 nfsd4_encode_splice_read(
read->rd_fhp->fh_dentry->d_inode->i_size);

tmp = htonl(eof);
- write_bytes_to_xdr_buf(xdr->buf, starting_len , &tmp, 4);
+ write_bytes_to_xdr_buf(buf, starting_len , &tmp, 4);
tmp = htonl(maxcount);
- write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+ write_bytes_to_xdr_buf(buf, starting_len + 4, &tmp, 4);

- resp->xdr.buf->page_len = maxcount;
- xdr->buf->len += maxcount;
+ buf->page_len = maxcount;
+ buf->len += maxcount;
xdr->page_ptr += (maxcount + PAGE_SIZE - 1) / PAGE_SIZE;
- xdr->iov = xdr->buf->tail;
+ xdr->iov = buf->tail;

/* Use rest of head for padding and remaining ops: */
- resp->xdr.buf->tail[0].iov_base = xdr->p;
- resp->xdr.buf->tail[0].iov_len = 0;
+ buf->tail[0].iov_base = xdr->p;
+ buf->tail[0].iov_len = 0;
if (maxcount&3) {
p = xdr_reserve_space(xdr, 4);
WRITE32(0);
- resp->xdr.buf->tail[0].iov_base += maxcount&3;
- resp->xdr.buf->tail[0].iov_len = 4 - (maxcount&3);
- xdr->buf->len -= (maxcount&3);
+ buf->tail[0].iov_base += maxcount&3;
+ buf->tail[0].iov_len = 4 - (maxcount&3);
+ buf->len -= (maxcount&3);
}

space_left = min_t(int, (void *)xdr->end - (void *)xdr->p,
- xdr->buf->buflen - xdr->buf->len);
- xdr->buf->buflen = xdr->buf->len + space_left;
+ buf->buflen - buf->len);
+ buf->buflen = buf->len + space_left;
xdr->end = (__be32 *)((void *)xdr->end + space_left);

return 0;
--
1.9.0


2014-05-25 08:46:55

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH 13/52] nfsd4: use xdr_reserve_space in attribute encoding

On 5/23/2014 03:31, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> This is a cosmetic change for now; no change in behavior.
>
> Note we're just depending on xdr_reserve_space to do the bounds checking
> for us, we're not really depending on its adjustment of iovec or xdr_buf
> lengths yet, as those are fixed up by as necessary after the fact by
> read-link operations and by nfs4svc_encode_compoundres. However we do
> have to update xdr->iov on read-like operations to prevent
> xdr_reserve_space from messing with the already-fixed-up length of the
> the head.
>
> When the attribute encoding fails partway through we have to undo the
> length adjustments made so far. We do it manually for now, but later
> patches will add an xdr_truncate_encode() helper to handle cases like
> this.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/nfsd/acl.h | 2 +-
> fs/nfsd/idmap.h | 4 +-
> fs/nfsd/nfs4acl.c | 11 +-
> fs/nfsd/nfs4idmap.c | 42 ++++----
> fs/nfsd/nfs4proc.c | 1 +
> fs/nfsd/nfs4xdr.c | 286 +++++++++++++++++++++++++++++++---------------------
> 6 files changed, 202 insertions(+), 144 deletions(-)
>
> diff --git a/fs/nfsd/acl.h b/fs/nfsd/acl.h
> index b481e1f..a986ceb 100644
> --- a/fs/nfsd/acl.h
> +++ b/fs/nfsd/acl.h
> @@ -49,7 +49,7 @@ struct svc_rqst;
>
> struct nfs4_acl *nfs4_acl_new(int);
> int nfs4_acl_get_whotype(char *, u32);
> -__be32 nfs4_acl_write_who(int who, __be32 **p, int *len);
> +__be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who);
>
> int nfsd4_get_nfs4_acl(struct svc_rqst *rqstp, struct dentry *dentry,
> struct nfs4_acl **acl);
> diff --git a/fs/nfsd/idmap.h b/fs/nfsd/idmap.h
> index 66e58db..a3f3490 100644
> --- a/fs/nfsd/idmap.h
> +++ b/fs/nfsd/idmap.h
> @@ -56,7 +56,7 @@ static inline void nfsd_idmap_shutdown(struct net *net)
>
> __be32 nfsd_map_name_to_uid(struct svc_rqst *, const char *, size_t, kuid_t *);
> __be32 nfsd_map_name_to_gid(struct svc_rqst *, const char *, size_t, kgid_t *);
> -__be32 nfsd4_encode_user(struct svc_rqst *, kuid_t, __be32 **, int *);
> -__be32 nfsd4_encode_group(struct svc_rqst *, kgid_t, __be32 **, int *);
> +__be32 nfsd4_encode_user(struct xdr_stream *, struct svc_rqst *, kuid_t);
> +__be32 nfsd4_encode_group(struct xdr_stream *, struct svc_rqst *, kgid_t);
>
> #endif /* LINUX_NFSD_IDMAP_H */
> diff --git a/fs/nfsd/nfs4acl.c b/fs/nfsd/nfs4acl.c
> index 7c7c025..d714156 100644
> --- a/fs/nfsd/nfs4acl.c
> +++ b/fs/nfsd/nfs4acl.c
> @@ -919,20 +919,19 @@ nfs4_acl_get_whotype(char *p, u32 len)
> return NFS4_ACL_WHO_NAMED;
> }
>
> -__be32 nfs4_acl_write_who(int who, __be32 **p, int *len)
> +__be32 nfs4_acl_write_who(struct xdr_stream *xdr, int who)
> {
> + __be32 *p;
> int i;
> - int bytes;
>
> for (i = 0; i < ARRAY_SIZE(s2t_map); i++) {
> if (s2t_map[i].type != who)
> continue;
> - bytes = 4 + (XDR_QUADLEN(s2t_map[i].stringlen) << 2);
> - if (bytes > *len)
> + p = xdr_reserve_space(xdr, s2t_map[i].stringlen + 4);
> + if (!p)
> return nfserr_resource;
> - *p = xdr_encode_opaque(*p, s2t_map[i].string,
> + p = xdr_encode_opaque(p, s2t_map[i].string,
> s2t_map[i].stringlen);
> - *len -= bytes;
> return 0;
> }
> WARN_ON_ONCE(1);
> diff --git a/fs/nfsd/nfs4idmap.c b/fs/nfsd/nfs4idmap.c
> index c0dfde6..a0ab0a8 100644
> --- a/fs/nfsd/nfs4idmap.c
> +++ b/fs/nfsd/nfs4idmap.c
> @@ -551,44 +551,43 @@ idmap_name_to_id(struct svc_rqst *rqstp, int type, const char *name, u32 namelen
> return 0;
> }
>
> -static __be32 encode_ascii_id(u32 id, __be32 **p, int *buflen)
> +static __be32 encode_ascii_id(struct xdr_stream *xdr, u32 id)
> {
> char buf[11];
> int len;
> - int bytes;
> + __be32 *p;
>
> len = sprintf(buf, "%u", id);
> - bytes = 4 + (XDR_QUADLEN(len) << 2);
> - if (bytes > *buflen)
> + p = xdr_reserve_space(xdr, len + 4);
> + if (!p)
> return nfserr_resource;
> - *p = xdr_encode_opaque(*p, buf, len);
> - *buflen -= bytes;
> + p = xdr_encode_opaque(p, buf, len);
> return 0;
> }
>
> -static __be32 idmap_id_to_name(struct svc_rqst *rqstp, int type, u32 id, __be32 **p, int *buflen)
> +static __be32 idmap_id_to_name(struct xdr_stream *xdr,
> + struct svc_rqst *rqstp, int type, u32 id)
> {
> struct ent *item, key = {
> .id = id,
> .type = type,
> };
> + __be32 *p;
> int ret;
> - int bytes;
> struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
>
> strlcpy(key.authname, rqst_authname(rqstp), sizeof(key.authname));
> ret = idmap_lookup(rqstp, idtoname_lookup, &key, nn->idtoname_cache, &item);
> if (ret == -ENOENT)
> - return encode_ascii_id(id, p, buflen);
> + return encode_ascii_id(xdr, id);
> if (ret)
> return nfserrno(ret);
> ret = strlen(item->name);
> WARN_ON_ONCE(ret > IDMAP_NAMESZ);
> - bytes = 4 + (XDR_QUADLEN(ret) << 2);
> - if (bytes > *buflen)
> + p = xdr_reserve_space(xdr, ret + 4);
> + if (!p)
> return nfserr_resource;
> - *p = xdr_encode_opaque(*p, item->name, ret);
> - *buflen -= bytes;
> + p = xdr_encode_opaque(p, item->name, ret);
> cache_put(&item->h, nn->idtoname_cache);
> return 0;
> }
> @@ -622,11 +621,12 @@ do_name_to_id(struct svc_rqst *rqstp, int type, const char *name, u32 namelen, u
> return idmap_name_to_id(rqstp, type, name, namelen, id);
> }
>
> -static __be32 encode_name_from_id(struct svc_rqst *rqstp, int type, u32 id, __be32 **p, int *buflen)
> +static __be32 encode_name_from_id(struct xdr_stream *xdr,
> + struct svc_rqst *rqstp, int type, u32 id)
> {
> if (nfs4_disable_idmapping && rqstp->rq_cred.cr_flavor < RPC_AUTH_GSS)
> - return encode_ascii_id(id, p, buflen);
> - return idmap_id_to_name(rqstp, type, id, p, buflen);
> + return encode_ascii_id(xdr, id);
> + return idmap_id_to_name(xdr, rqstp, type, id);
> }
>
> __be32
> @@ -655,14 +655,16 @@ nfsd_map_name_to_gid(struct svc_rqst *rqstp, const char *name, size_t namelen,
> return status;
> }
>
> -__be32 nfsd4_encode_user(struct svc_rqst *rqstp, kuid_t uid, __be32 **p, int *buflen)
> +__be32 nfsd4_encode_user(struct xdr_stream *xdr, struct svc_rqst *rqstp,
> + kuid_t uid)
> {
> u32 id = from_kuid(&init_user_ns, uid);
> - return encode_name_from_id(rqstp, IDMAP_TYPE_USER, id, p, buflen);
> + return encode_name_from_id(xdr, rqstp, IDMAP_TYPE_USER, id);
> }
>
> -__be32 nfsd4_encode_group(struct svc_rqst *rqstp, kgid_t gid, __be32 **p, int *buflen)
> +__be32 nfsd4_encode_group(struct xdr_stream *xdr, struct svc_rqst *rqstp,
> + kgid_t gid)
> {
> u32 id = from_kgid(&init_user_ns, gid);
> - return encode_name_from_id(rqstp, IDMAP_TYPE_GROUP, id, p, buflen);
> + return encode_name_from_id(xdr, rqstp, IDMAP_TYPE_GROUP, id);
> }
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index db14965..1433854 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1261,6 +1261,7 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
> struct kvec *head = buf->head;
>
> xdr->buf = buf;
> + xdr->iov = head;
> xdr->p = head->iov_base + head->iov_len;
> xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
> }
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 2ed8036..0ab7aae 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1755,18 +1755,20 @@ static void write_cinfo(__be32 **p, struct nfsd4_change_info *c)
> /* Encode as an array of strings the string given with components
> * separated @sep, escaped with esc_enter and esc_exit.
> */
> -static __be32 nfsd4_encode_components_esc(char sep, char *components,
> - __be32 **pp, int *buflen,
> - char esc_enter, char esc_exit)
> +static __be32 nfsd4_encode_components_esc(struct xdr_stream *xdr, char sep,
> + char *components, char esc_enter,
> + char esc_exit)
> {
> - __be32 *p = *pp;
> - __be32 *countp = p;
> + __be32 *p;
> + __be32 *countp;
> int strlen, count=0;
> char *str, *end, *next;
>
> dprintk("nfsd4_encode_components(%s)\n", components);
> - if ((*buflen -= 4) < 0)
> + p = xdr_reserve_space(xdr, 4);
> + if (!p)
> return nfserr_resource;
> + countp = p;
> WRITE32(0); /* We will fill this in with @count later */
> end = str = components;
> while (*end) {
> @@ -1789,7 +1791,8 @@ static __be32 nfsd4_encode_components_esc(char sep, char *components,
>
> strlen = end - str;
> if (strlen) {
> - if ((*buflen -= ((XDR_QUADLEN(strlen) << 2) + 4)) < 0)
> + p = xdr_reserve_space(xdr, strlen + 4);
> + if (!p)
> return nfserr_resource;
> WRITE32(strlen);
> WRITEMEM(str, strlen);
> @@ -1799,7 +1802,6 @@ static __be32 nfsd4_encode_components_esc(char sep, char *components,
> end++;
> str = end;
> }
> - *pp = p;
> p = countp;
> WRITE32(count);
> return 0;
> @@ -1808,40 +1810,39 @@ static __be32 nfsd4_encode_components_esc(char sep, char *components,
> /* Encode as an array of strings the string given with components
> * separated @sep.
> */
> -static __be32 nfsd4_encode_components(char sep, char *components,
> - __be32 **pp, int *buflen)
> +static __be32 nfsd4_encode_components(struct xdr_stream *xdr, char sep,
> + char *components)
> {
> - return nfsd4_encode_components_esc(sep, components, pp, buflen, 0, 0);
> + return nfsd4_encode_components_esc(xdr, sep, components, 0, 0);
> }
>
> /*
> * encode a location element of a fs_locations structure
> */
> -static __be32 nfsd4_encode_fs_location4(struct nfsd4_fs_location *location,
> - __be32 **pp, int *buflen)
> +static __be32 nfsd4_encode_fs_location4(struct xdr_stream *xdr,
> + struct nfsd4_fs_location *location)
> {
> __be32 status;
> - __be32 *p = *pp;
>
> - status = nfsd4_encode_components_esc(':', location->hosts, &p, buflen,
> + status = nfsd4_encode_components_esc(xdr, ':', location->hosts,
> '[', ']');
> if (status)
> return status;
> - status = nfsd4_encode_components('/', location->path, &p, buflen);
> + status = nfsd4_encode_components(xdr, '/', location->path);
> if (status)
> return status;
> - *pp = p;
> return 0;
> }
>
> /*
> * Encode a path in RFC3530 'pathname4' format
> */
> -static __be32 nfsd4_encode_path(const struct path *root,
> - const struct path *path, __be32 **pp, int *buflen)
> +static __be32 nfsd4_encode_path(struct xdr_stream *xdr,
> + const struct path *root,
> + const struct path *path)
> {
> struct path cur = *path;
> - __be32 *p = *pp;
> + __be32 *p;
> struct dentry **components = NULL;
> unsigned int ncomponents = 0;
> __be32 err = nfserr_jukebox;
> @@ -1872,9 +1873,9 @@ static __be32 nfsd4_encode_path(const struct path *root,
> components[ncomponents++] = cur.dentry;
> cur.dentry = dget_parent(cur.dentry);
> }
> -
> - *buflen -= 4;
> - if (*buflen < 0)
> + err = nfserr_resource;
> + p = xdr_reserve_space(xdr, 4);
> + if (!p)
> goto out_free;
> WRITE32(ncomponents);
>
> @@ -1884,8 +1885,8 @@ static __be32 nfsd4_encode_path(const struct path *root,
>
> spin_lock(&dentry->d_lock);
> len = dentry->d_name.len;
> - *buflen -= 4 + (XDR_QUADLEN(len) << 2);
> - if (*buflen < 0) {
> + p = xdr_reserve_space(xdr, len + 4);
> + if (!p) {
> spin_unlock(&dentry->d_lock);
> goto out_free;
> }
> @@ -1897,7 +1898,6 @@ static __be32 nfsd4_encode_path(const struct path *root,
> ncomponents--;
> }
>
> - *pp = p;
> err = 0;
> out_free:
> dprintk(")\n");
> @@ -1908,8 +1908,8 @@ out_free:
> return err;
> }
>
> -static __be32 nfsd4_encode_fsloc_fsroot(struct svc_rqst *rqstp,
> - const struct path *path, __be32 **pp, int *buflen)
> +static __be32 nfsd4_encode_fsloc_fsroot(struct xdr_stream *xdr,
> + struct svc_rqst *rqstp, const struct path *path)
> {
> struct svc_export *exp_ps;
> __be32 res;
> @@ -1917,7 +1917,7 @@ static __be32 nfsd4_encode_fsloc_fsroot(struct svc_rqst *rqstp,
> exp_ps = rqst_find_fsidzero_export(rqstp);
> if (IS_ERR(exp_ps))
> return nfserrno(PTR_ERR(exp_ps));
> - res = nfsd4_encode_path(&exp_ps->ex_path, path, pp, buflen);
> + res = nfsd4_encode_path(xdr, &exp_ps->ex_path, path);
> exp_put(exp_ps);
> return res;
> }
> @@ -1925,28 +1925,26 @@ static __be32 nfsd4_encode_fsloc_fsroot(struct svc_rqst *rqstp,
> /*
> * encode a fs_locations structure
> */
> -static __be32 nfsd4_encode_fs_locations(struct svc_rqst *rqstp,
> - struct svc_export *exp,
> - __be32 **pp, int *buflen)
> +static __be32 nfsd4_encode_fs_locations(struct xdr_stream *xdr,
> + struct svc_rqst *rqstp, struct svc_export *exp)
> {
> __be32 status;
> int i;
> - __be32 *p = *pp;
> + __be32 *p;
> struct nfsd4_fs_locations *fslocs = &exp->ex_fslocs;
>
> - status = nfsd4_encode_fsloc_fsroot(rqstp, &exp->ex_path, &p, buflen);
> + status = nfsd4_encode_fsloc_fsroot(xdr, rqstp, &exp->ex_path);
> if (status)
> return status;
> - if ((*buflen -= 4) < 0)
> + p = xdr_reserve_space(xdr, 4);
> + if (!p)
> return nfserr_resource;
> WRITE32(fslocs->locations_count);
> for (i=0; i<fslocs->locations_count; i++) {
> - status = nfsd4_encode_fs_location4(&fslocs->locations[i],
> - &p, buflen);
> + status = nfsd4_encode_fs_location4(xdr, &fslocs->locations[i]);
> if (status)
> return status;
> }
> - *pp = p;
> return 0;
> }
>
> @@ -1965,15 +1963,15 @@ static u32 nfs4_file_type(umode_t mode)
> }
>
> static inline __be32
> -nfsd4_encode_aclname(struct svc_rqst *rqstp, struct nfs4_ace *ace,
> - __be32 **p, int *buflen)
> +nfsd4_encode_aclname(struct xdr_stream *xdr, struct svc_rqst *rqstp,
> + struct nfs4_ace *ace)
> {
> if (ace->whotype != NFS4_ACL_WHO_NAMED)
> - return nfs4_acl_write_who(ace->whotype, p, buflen);
> + return nfs4_acl_write_who(xdr, ace->whotype);
> else if (ace->flag & NFS4_ACE_IDENTIFIER_GROUP)
> - return nfsd4_encode_group(rqstp, ace->who_gid, p, buflen);
> + return nfsd4_encode_group(xdr, rqstp, ace->who_gid);
> else
> - return nfsd4_encode_user(rqstp, ace->who_uid, p, buflen);
> + return nfsd4_encode_user(xdr, rqstp, ace->who_uid);
> }
>
> #define WORD0_ABSENT_FS_ATTRS (FATTR4_WORD0_FS_LOCATIONS | FATTR4_WORD0_FSID | \
> @@ -1982,31 +1980,28 @@ nfsd4_encode_aclname(struct svc_rqst *rqstp, struct nfs4_ace *ace,
>
> #ifdef CONFIG_NFSD_V4_SECURITY_LABEL
> static inline __be32
> -nfsd4_encode_security_label(struct svc_rqst *rqstp, void *context, int len, __be32 **pp, int *buflen)
> +nfsd4_encode_security_label(struct xdr_stream *xdr, struct svc_rqst *rqstp,
> + void *context, int len)
> {
> __be32 *p = *pp;

CC [M] fs/nfsd/nfs4xdr.o
fs/nfsd/nfs4xdr.c: In function ?nfsd4_encode_security_label?:
fs/nfsd/nfs4xdr.c:1984:15: error: ?pp? undeclared (first use in this function)
__be32 *p = *pp;
^
fs/nfsd/nfs4xdr.c:1984:15: note: each undeclared identifier is reported only once for each function it appears in
make[1]: *** [fs/nfsd/nfs4xdr.o] Error 1
make: *** [fs/nfsd/] Error 2

thanks,
Kinglong Mee

2014-05-22 19:32:34

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 08/52] nfsd4: tweak nfsd4_encode_getattr to take xdr_stream

From: "J. Bruce Fields" <[email protected]>

Just change the nfsd4_encode_getattr api. Not changing any code or
adding any new functionality yet.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 6 +++---
fs/nfsd/nfs4xdr.c | 47 +++++++++++++++++++++++++++++++++++------------
fs/nfsd/xdr4.h | 7 ++++---
3 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 6d94adf..94ef528 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1061,10 +1061,10 @@ _nfsd4_verify(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
return nfserr_jukebox;

p = buf;
- status = nfsd4_encode_fattr(&cstate->current_fh,
+ status = nfsd4_encode_fattr_to_buf(&p, count, &cstate->current_fh,
cstate->current_fh.fh_export,
- cstate->current_fh.fh_dentry, &p,
- count, verify->ve_bmval,
+ cstate->current_fh.fh_dentry,
+ verify->ve_bmval,
rqstp, 0);
/*
* If nfsd4_encode_fattr() ran out of space, assume that's because
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 9113725..1879250 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2045,12 +2045,11 @@ static int get_parent_attributes(struct svc_export *exp, struct kstat *stat)
/*
* Note: @fhp can be NULL; in this case, we might have to compose the filehandle
* ourselves.
- *
- * countp is the buffer size in _words_
*/
__be32
-nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
- struct dentry *dentry, __be32 **buffer, int count, u32 *bmval,
+nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
+ struct svc_export *exp,
+ struct dentry *dentry, u32 *bmval,
struct svc_rqst *rqstp, int ignore_crossmnt)
{
u32 bmval0 = bmval[0];
@@ -2059,12 +2058,12 @@ nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
struct kstat stat;
struct svc_fh *tempfh = NULL;
struct kstatfs statfs;
- int buflen = count << 2;
+ __be32 *p = xdr->p;
+ int buflen = xdr->buf->buflen;
__be32 *attrlenp;
u32 dummy;
u64 dummy64;
u32 rdattr_err = 0;
- __be32 *p = *buffer;
__be32 status;
int err;
int aclsupport = 0;
@@ -2491,7 +2490,7 @@ out_acl:
}

*attrlenp = htonl((char *)p - (char *)attrlenp - 4);
- *buffer = p;
+ xdr->p = p;
status = nfs_ok;

out:
@@ -2513,6 +2512,27 @@ out_resource:
goto out;
}

+__be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
+ struct svc_fh *fhp, struct svc_export *exp,
+ struct dentry *dentry, u32 *bmval,
+ struct svc_rqst *rqstp, int ignore_crossmnt)
+{
+ struct xdr_buf dummy = {
+ .head[0] = {
+ .iov_base = *p,
+ },
+ .buflen = words << 2,
+ };
+ struct xdr_stream xdr;
+ __be32 ret;
+
+ xdr_init_encode(&xdr, &dummy, NULL);
+ ret = nfsd4_encode_fattr(&xdr, fhp, exp, dentry, bmval, rqstp,
+ ignore_crossmnt);
+ *p = xdr.p;
+ return ret;
+}
+
static inline int attributes_need_mount(u32 *bmval)
{
if (bmval[0] & ~(FATTR4_WORD0_RDATTR_ERROR | FATTR4_WORD0_LEASE_TIME))
@@ -2576,7 +2596,8 @@ nfsd4_encode_dirent_fattr(struct nfsd4_readdir *cd,

}
out_encode:
- nfserr = nfsd4_encode_fattr(NULL, exp, dentry, p, buflen, cd->rd_bmval,
+ nfserr = nfsd4_encode_fattr_to_buf(p, buflen, NULL, exp, dentry,
+ cd->rd_bmval,
cd->rd_rqstp, ignore_crossmnt);
out_put:
dput(dentry);
@@ -2746,14 +2767,16 @@ static __be32
nfsd4_encode_getattr(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4_getattr *getattr)
{
struct svc_fh *fhp = getattr->ga_fhp;
- int buflen;
+ struct xdr_stream *xdr = &resp->xdr;
+ struct xdr_buf *buf = resp->xdr.buf;

if (nfserr)
return nfserr;

- buflen = resp->xdr.end - resp->xdr.p - (COMPOUND_ERR_SLACK_SPACE >> 2);
- nfserr = nfsd4_encode_fattr(fhp, fhp->fh_export, fhp->fh_dentry,
- &resp->xdr.p, buflen, getattr->ga_bmval,
+ buf->buflen = (void *)resp->xdr.end - (void *)resp->xdr.p
+ - COMPOUND_ERR_SLACK_SPACE;
+ nfserr = nfsd4_encode_fattr(xdr, fhp, fhp->fh_export, fhp->fh_dentry,
+ getattr->ga_bmval,
resp->rqstp, 0);
return nfserr;
}
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 6884d70..f62a055 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -562,9 +562,10 @@ int nfs4svc_encode_compoundres(struct svc_rqst *, __be32 *,
__be32 nfsd4_check_resp_size(struct nfsd4_compoundres *, u32);
void nfsd4_encode_operation(struct nfsd4_compoundres *, struct nfsd4_op *);
void nfsd4_encode_replay(struct nfsd4_compoundres *resp, struct nfsd4_op *op);
-__be32 nfsd4_encode_fattr(struct svc_fh *fhp, struct svc_export *exp,
- struct dentry *dentry, __be32 **buffer, int countp,
- u32 *bmval, struct svc_rqst *, int ignore_crossmnt);
+__be32 nfsd4_encode_fattr_to_buf(__be32 **p, int words,
+ struct svc_fh *fhp, struct svc_export *exp,
+ struct dentry *dentry,
+ u32 *bmval, struct svc_rqst *, int ignore_crossmnt);
extern __be32 nfsd4_setclientid(struct svc_rqst *rqstp,
struct nfsd4_compound_state *,
struct nfsd4_setclientid *setclid);
--
1.9.0


2014-05-22 19:32:51

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 52/52] nfsd4: better reservation of head space for krb5

From: "J. Bruce Fields" <[email protected]>

RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it
once in the auth code, where this kind of estimate should be made. And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4proc.c | 4 ++--
fs/nfsd/nfs4state.c | 2 +-
fs/nfsd/nfs4xdr.c | 5 +++--
include/linux/sunrpc/svc.h | 11 +++++------
net/sunrpc/auth_gss/svcauth_gss.c | 2 ++
net/sunrpc/svcauth.c | 2 ++
6 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index f5f608d..71fc494 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1263,13 +1263,13 @@ static void svcxdr_init_encode(struct svc_rqst *rqstp,
xdr->buf = buf;
xdr->iov = head;
xdr->p = head->iov_base + head->iov_len;
- xdr->end = head->iov_base + PAGE_SIZE - 2 * RPC_MAX_AUTH_SIZE;
+ xdr->end = head->iov_base + PAGE_SIZE - rqstp->rq_auth_slack;
/* Tail and page_len should be zero at this point: */
buf->len = buf->head[0].iov_len;
xdr->scratch.iov_len = 0;
xdr->page_ptr = buf->pages;
buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages)
- - 2 * RPC_MAX_AUTH_SIZE;
+ - rqstp->rq_auth_slack;
}

/*
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 7a06246..052dc45 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2288,7 +2288,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
session->se_fchannel.maxresp_sz;
status = (seq->cachethis) ? nfserr_rep_too_big_to_cache :
nfserr_rep_too_big;
- if (xdr_restrict_buflen(xdr, buflen - 2 * RPC_MAX_AUTH_SIZE))
+ if (xdr_restrict_buflen(xdr, buflen - rqstp->rq_auth_slack))
goto out_put_session;
svc_reserve(rqstp, buflen);

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 6813870..fd39c49 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1611,7 +1611,8 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
DECODE_HEAD;
struct nfsd4_op *op;
bool cachethis = false;
- int max_reply = 2 * RPC_MAX_AUTH_SIZE + 8; /* opcnt, status */
+ int auth_slack= argp->rqstp->rq_auth_slack;
+ int max_reply = auth_slack + 8; /* opcnt, status */
int readcount = 0;
int readbytes = 0;
int i;
@@ -1677,7 +1678,7 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
svc_reserve(argp->rqstp, max_reply + readbytes);
argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE;

- if (readcount > 1 || max_reply > PAGE_SIZE - 2*RPC_MAX_AUTH_SIZE)
+ if (readcount > 1 || max_reply > PAGE_SIZE - auth_slack)
argp->rqstp->rq_splice_ok = false;

DECODE_TAIL;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 39c50e1..0e6b39c 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -259,7 +259,10 @@ struct svc_rqst {
void * rq_argp; /* decoded arguments */
void * rq_resp; /* xdr'd results */
void * rq_auth_data; /* flavor-specific data */
-
+ int rq_auth_slack; /* extra space xdr code
+ * should leave in head
+ * for krb5i, krb5p.
+ */
int rq_reserved; /* space on socket outq
* reserved for this request
*/
@@ -455,11 +458,7 @@ char * svc_print_addr(struct svc_rqst *, char *, size_t);
*/
static inline void svc_reserve_auth(struct svc_rqst *rqstp, int space)
{
- int added_space = 0;
-
- if (rqstp->rq_authop->flavour)
- added_space = RPC_MAX_AUTH_SIZE;
- svc_reserve(rqstp, space + added_space);
+ svc_reserve(rqstp, space + rqstp->rq_auth_slack);
}

#endif /* SUNRPC_SVC_H */
diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 0f73f45..4ce5ecce 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -1503,6 +1503,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp)
if (unwrap_integ_data(rqstp, &rqstp->rq_arg,
gc->gc_seq, rsci->mechctx))
goto garbage_args;
+ rqstp->rq_auth_slack = RPC_MAX_AUTH_SIZE;
break;
case RPC_GSS_SVC_PRIVACY:
/* placeholders for length and seq. number: */
@@ -1511,6 +1512,7 @@ svcauth_gss_accept(struct svc_rqst *rqstp, __be32 *authp)
if (unwrap_priv_data(rqstp, &rqstp->rq_arg,
gc->gc_seq, rsci->mechctx))
goto garbage_args;
+ rqstp->rq_auth_slack = RPC_MAX_AUTH_SIZE * 2;
break;
default:
goto auth_err;
diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c
index 2af7b0c..79c0f34 100644
--- a/net/sunrpc/svcauth.c
+++ b/net/sunrpc/svcauth.c
@@ -54,6 +54,8 @@ svc_authenticate(struct svc_rqst *rqstp, __be32 *authp)
}
spin_unlock(&authtab_lock);

+ rqstp->rq_auth_slack = 0;
+
rqstp->rq_authop = aops;
return aops->accept(rqstp, authp);
}
--
1.9.0


2014-05-22 19:32:45

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

From: "J. Bruce Fields" <[email protected]>

We currently allow only one read per compound, with operations before
and after whose responses will require no more than about a page to
encode.

While we don't expect clients to violate those limits any time soon,
this limitation isn't really condoned by the spec, so to future proof
the server we should lift the limitation.

At the same time we'd like to continue to support zero-copy reads.

Supporting multiple zero-copy-reads per compound would require a new
data structure to replace struct xdr_buf, which can represent only one
set of included pages.

So for now we plan to modify encode_read() to support either zero-copy
or non-zero-copy reads, and use some heuristics at the start of the
compound processing to decide whether a zero-copy read will work.

This will allow us to support more exotic compounds without introducing
a performance regression in the normal case.

Later patches handle those "exotic compounds", this one just makes sure
zero-copy is turned off in those cases.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4xdr.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index a3cfb89..1ca547d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1612,6 +1612,8 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
struct nfsd4_op *op;
bool cachethis = false;
int max_reply = 2 * RPC_MAX_AUTH_SIZE + 8; /* opcnt, status */
+ int readcount = 0;
+ int readbytes = 0;
int i;

READ_BUF(4);
@@ -1658,7 +1660,11 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
*/
cachethis |= nfsd4_cache_this_op(op);

- max_reply += nfsd4_max_reply(argp->rqstp, op);
+ if (op->opnum == OP_READ) {
+ readcount++;
+ readbytes += nfsd4_max_reply(argp->rqstp, op);
+ } else
+ max_reply += nfsd4_max_reply(argp->rqstp, op);

if (op->status) {
argp->opcnt = i+1;
@@ -1668,9 +1674,12 @@ nfsd4_decode_compound(struct nfsd4_compoundargs *argp)
/* Sessions make the DRC unnecessary: */
if (argp->minorversion)
cachethis = false;
- svc_reserve(argp->rqstp, max_reply);
+ svc_reserve(argp->rqstp, max_reply + readbytes);
argp->rqstp->rq_cachetype = cachethis ? RC_REPLBUFF : RC_NOCACHE;

+ if (readcount > 1 || max_reply > PAGE_SIZE - 2*RPC_MAX_AUTH_SIZE)
+ argp->rqstp->rq_splice_ok = false;
+
DECODE_TAIL;
}

@@ -3078,15 +3087,19 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
return nfserr;

p = xdr_reserve_space(xdr, 8); /* eof flag and byte count */
- if (!p)
+ if (!p) {
+ WARN_ON_ONCE(resp->rqstp->rq_splice_ok);
return nfserr_resource;
+ }

/* Make sure there will be room for padding if needed: */
if (xdr->end - xdr->p < 1)
return nfserr_resource;

- if (resp->xdr.buf->page_len)
+ if (resp->xdr.buf->page_len) {
+ WARN_ON_ONCE(resp->rqstp->rq_splice_ok);
return nfserr_resource;
+ }

maxcount = svc_max_payload(resp->rqstp);
if (maxcount > read->rd_length)
--
1.9.0


2014-05-23 01:21:57

by Kinglong Mee

[permalink] [raw]
Subject: Re: [PATCH 01/52] nfsd4: READ, READDIR, etc., are idempotent

On 5/23/2014 03:31, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> OP_MODIFIES_SOMETHING flags operations that we should be careful not to
> initiate without being sure we have the buffer space to encode a reply.
>
> None of these ops fall into that category.
>
> We could probably remove a few more, but this isn't a very important
> problem at least for ops whose reply size is easy to estimate.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index ac83778..4c4cd96 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1665,38 +1665,31 @@ static struct nfsd4_operation nfsd4_ops[] = {
> [OP_PUTFH] = {
> .op_func = (nfsd4op_func)nfsd4_putfh,
> .op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS
> - | OP_IS_PUTFH_LIKE | OP_MODIFIES_SOMETHING
> - | OP_CLEAR_STATEID,
> + | OP_IS_PUTFH_LIKE | OP_CLEAR_STATEID,
> .op_name = "OP_PUTFH",
> .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
> },
> [OP_PUTPUBFH] = {
> .op_func = (nfsd4op_func)nfsd4_putrootfh,
> .op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS
> - | OP_IS_PUTFH_LIKE | OP_MODIFIES_SOMETHING
> - | OP_CLEAR_STATEID,
> + | OP_IS_PUTFH_LIKE | OP_CLEAR_STATEID,
> .op_name = "OP_PUTPUBFH",
> .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
> },
> [OP_PUTROOTFH] = {
> .op_func = (nfsd4op_func)nfsd4_putrootfh,
> .op_flags = ALLOWED_WITHOUT_FH | ALLOWED_ON_ABSENT_FS
> - | OP_IS_PUTFH_LIKE | OP_MODIFIES_SOMETHING
> - | OP_CLEAR_STATEID,
> + | OP_IS_PUTFH_LIKE | OP_CLEAR_STATEID,
> .op_name = "OP_PUTROOTFH",
> .op_rsize_bop = (nfsd4op_rsize)nfsd4_only_status_rsize,
> },
> [OP_READ] = {
> .op_func = (nfsd4op_func)nfsd4_read,
> - .op_flags = OP_MODIFIES_SOMETHING,
> - .op_name = "OP_READ",

Should not delete op_name.

> .op_rsize_bop = (nfsd4op_rsize)nfsd4_read_rsize,
> .op_get_currentstateid = (stateid_getter)nfsd4_get_readstateid,
> },
> [OP_READDIR] = {
> .op_func = (nfsd4op_func)nfsd4_readdir,
> - .op_flags = OP_MODIFIES_SOMETHING,
> - .op_name = "OP_READDIR",

Same as above.

Because "[PATCH 03/52] nfsd4: fill in some missing op_name's" re-adds them.

thanks,
Kinglong Mee

2014-05-28 21:09:05

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On Wed, May 28, 2014 at 10:01:52AM -0400, J. Bruce Fields wrote:
> On Wed, May 28, 2014 at 01:09:45AM -0700, Christoph Hellwig wrote:
> > On Thu, May 22, 2014 at 03:32:16PM -0400, J. Bruce Fields wrote:
> > > Later patches handle those "exotic compounds", this one just makes sure
> > > zero-copy is turned off in those cases.
> >
> > How did you test these exotic compounds?
>
> I have is a pynfs test that sends a compound with multiple reads in it.

But, look, it wasn't turned on in my regular tests so, surprise, I
regressed recently without noticing; I intend to fold in the below after
some more testing.

> I don't think that's pushed out to my regular pynfs tree, I'll try to do
> that today.

Still working on that.

--b.

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 3f8bfb9..3976dc6 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3197,12 +3197,10 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
WARN_ON_ONCE(resp->rqstp->rq_splice_ok);
return nfserr_resource;
}
-
- if (resp->xdr.buf->page_len) {
- WARN_ON_ONCE(resp->rqstp->rq_splice_ok);
+ if (resp->xdr.buf->page_len && resp->rqstp->rq_splice_ok) {
+ WARN_ON_ONCE(1);
return nfserr_resource;
}
-
xdr_commit_encode(xdr);

maxcount = svc_max_payload(resp->rqstp);

2014-06-02 22:12:15

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On Wed, May 28, 2014 at 05:08:59PM -0400, bfields wrote:
> On Wed, May 28, 2014 at 10:01:52AM -0400, J. Bruce Fields wrote:
> > On Wed, May 28, 2014 at 01:09:45AM -0700, Christoph Hellwig wrote:
> > > On Thu, May 22, 2014 at 03:32:16PM -0400, J. Bruce Fields wrote:
> > > > Later patches handle those "exotic compounds", this one just makes sure
> > > > zero-copy is turned off in those cases.
> > >
> > > How did you test these exotic compounds?
> >
> > I have is a pynfs test that sends a compound with multiple reads in it.
>
> But, look, it wasn't turned on in my regular tests so, surprise, I
> regressed recently without noticing; I intend to fold in the below after
> some more testing.
>
> > I don't think that's pushed out to my regular pynfs tree, I'll try to do
> > that today.
>
> Still working on that.

I've pushed out some minimal tests of the new xdr code to

git://linux-nfs.org/~bfields/pynfs.git master

Frank, some of those might not really be appropriate for any server, I'm
not sure.

--b.

2014-06-03 14:24:18

by Weston Andros Adamson

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On Jun 3, 2014, at 10:10 AM, J. Bruce Fields <[email protected]> wrote:

> On Tue, Jun 03, 2014 at 12:18:34AM -0400, Weston Andros Adamson wrote:
>> On May 28, 2014, at 10:23 AM, J. Bruce Fields <[email protected]> wrote:
>>> There's not much point re-running all the 4.1 tests over 4.2. Maybe all
>>> we may need is to say "use minor version 2 on this compound" in tests of
>>> the new features.
>>>
>>> But I haven't even tried to figure out how to tell pynfs about the new
>>> .x files.
>>
>> I?ve made a lot of pynfs cleanups recently - mostly on the client test side
>> (the nfs4.1 server), but they may make this a bit easier. I?ll clean them up
>> and share asap.
>>
>> I was just adding new xdr to pynfs and AFAIK it's xdrgen has no include
>> support, so we?re stuck tacking things on the end of nfs4.x. Would it be
>> worthwhile for me to add some type of include support? Minor versions and
>> layouts could stay in separate .x files - that sounds much cleaner to me.
>
> That does sound cleaner, but is that how the ietf's .x files work?
>
> It looks like
>
> http://www.ietf.org/id/draft-ietf-nfsv4-minorversion2-dot-x-26.txt
>
> just copies all the 4.1 definitions.
>
> I may not understand what you're proposing.

Oh, yeah. I was thinking more about layouts - like the flex file draft - that don?t
repeat definitions and instead reference other xdr definitions.

I don?t know if there is any restriction in the IETF that all types within a .x
have to be defined by that .x. If not, does it make sense to change the 4.2
spec to only have new types, or to prune out the differences for pynfs?

-dros


2014-06-03 14:10:52

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On Tue, Jun 03, 2014 at 12:18:34AM -0400, Weston Andros Adamson wrote:
> On May 28, 2014, at 10:23 AM, J. Bruce Fields <[email protected]> wrote:
> > There's not much point re-running all the 4.1 tests over 4.2. Maybe all
> > we may need is to say "use minor version 2 on this compound" in tests of
> > the new features.
> >
> > But I haven't even tried to figure out how to tell pynfs about the new
> > .x files.
>
> I’ve made a lot of pynfs cleanups recently - mostly on the client test side
> (the nfs4.1 server), but they may make this a bit easier. I’ll clean them up
> and share asap.
>
> I was just adding new xdr to pynfs and AFAIK it's xdrgen has no include
> support, so we’re stuck tacking things on the end of nfs4.x. Would it be
> worthwhile for me to add some type of include support? Minor versions and
> layouts could stay in separate .x files - that sounds much cleaner to me.

That does sound cleaner, but is that how the ietf's .x files work?

It looks like

http://www.ietf.org/id/draft-ietf-nfsv4-minorversion2-dot-x-26.txt

just copies all the 4.1 definitions.

I may not understand what you're proposing.

--b.

2014-06-03 04:18:38

by Weston Andros Adamson

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On May 28, 2014, at 10:23 AM, J. Bruce Fields <[email protected]> wrote:

> On Wed, May 28, 2014 at 10:13:09AM -0400, Anna Schumaker wrote:
>> On 05/28/2014 10:01 AM, J. Bruce Fields wrote:
>>> On Wed, May 28, 2014 at 01:09:45AM -0700, Christoph Hellwig wrote:
>>>> On Thu, May 22, 2014 at 03:32:16PM -0400, J. Bruce Fields wrote:
>>>>> Later patches handle those "exotic compounds", this one just makes
>>>>> sure zero-copy is turned off in those cases.
>>>> How did you test these exotic compounds?
>>> I have is a pynfs test that sends a compound with multiple reads in
>>> it.
>>>
>>> I don't think that's pushed out to my regular pynfs tree, I'll try
>>> to do that today.
>>>
>>> I could really use more of those. Maybe I'm wrong, but I'm worried
>>> less about this case than the more finicky out-of-reply-space cases,
>>> where I do have patches puporting to fix problems that I haven't
>>> really verified.
>>
>> I'll eventually be using this case for READ_PLUS. I'll be sure to
>> send problems your way! :)
>
> Great, thanks.
>
> Actually my main question there is how to handle 4.2 in pynfs.
>
> 4.1 and 4.0 are entirely separate codebases. We definitely don't want
> to do that again.

I strongly agree!

>
> There's not much point re-running all the 4.1 tests over 4.2. Maybe all
> we may need is to say "use minor version 2 on this compound" in tests of
> the new features.
>
> But I haven't even tried to figure out how to tell pynfs about the new
> .x files.

I?ve made a lot of pynfs cleanups recently - mostly on the client test side
(the nfs4.1 server), but they may make this a bit easier. I?ll clean them up
and share asap.

I was just adding new xdr to pynfs and AFAIK it's xdrgen has no include
support, so we?re stuck tacking things on the end of nfs4.x. Would it be
worthwhile for me to add some type of include support? Minor versions and
layouts could stay in separate .x files - that sounds much cleaner to me.

-dros


2014-06-03 14:36:14

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 41/52] nfsd4: turn off zero-copy-read in exotic cases

On Tue, Jun 03, 2014 at 10:24:13AM -0400, Weston Andros Adamson wrote:
> On Jun 3, 2014, at 10:10 AM, J. Bruce Fields <[email protected]> wrote:
>
> > On Tue, Jun 03, 2014 at 12:18:34AM -0400, Weston Andros Adamson wrote:
> >> On May 28, 2014, at 10:23 AM, J. Bruce Fields <[email protected]> wrote:
> >>> There's not much point re-running all the 4.1 tests over 4.2. Maybe all
> >>> we may need is to say "use minor version 2 on this compound" in tests of
> >>> the new features.
> >>>
> >>> But I haven't even tried to figure out how to tell pynfs about the new
> >>> .x files.
> >>
> >> I’ve made a lot of pynfs cleanups recently - mostly on the client test side
> >> (the nfs4.1 server), but they may make this a bit easier. I’ll clean them up
> >> and share asap.
> >>
> >> I was just adding new xdr to pynfs and AFAIK it's xdrgen has no include
> >> support, so we’re stuck tacking things on the end of nfs4.x. Would it be
> >> worthwhile for me to add some type of include support? Minor versions and
> >> layouts could stay in separate .x files - that sounds much cleaner to me.
> >
> > That does sound cleaner, but is that how the ietf's .x files work?
> >
> > It looks like
> >
> > http://www.ietf.org/id/draft-ietf-nfsv4-minorversion2-dot-x-26.txt
> >
> > just copies all the 4.1 definitions.
> >
> > I may not understand what you're proposing.
>
> Oh, yeah. I was thinking more about layouts - like the flex file draft - that don’t
> repeat definitions and instead reference other xdr definitions.

OK, I don't really care about those yet--however you think it makes
sense is fine to me.... If we can use the ietf's files verbatim then
that will simplify keeping up with drafts and verifying we have the
protocol right, so we should support includes if they use them.

> I don’t know if there is any restriction in the IETF that all types within a .x
> have to be defined by that .x. If not, does it make sense to change the 4.2
> spec to only have new types, or to prune out the differences for pynfs?

I guess that's a question for Tom and/or the ietf list.

--b.