2009-08-27 16:07:25

by Andy Adamson

Subject: [PATCH 0/5] NFSv4.1 DRC rewrite version 6


NFSv4.1 DRC Rewrite Version 6

These patches apply against git://linux-nfs.org/~bfields/linux: for-2.6.32
branch and continue to rewrite the NFSv4.1 Sessions DRC. Besides some
bug fixes:

1) The bounds on the fore channel per-session DRC size are rewritten.
Instead of just using a maximum number of slots to bound the size,
the server gives the client the number of ca_maxresponsesize_cached slots it
requests, bounded by NFSD_SLOT_CACHE_SIZE, NFSD_MAX_MEM_PER_SESSION, and
nfsd_drc_max_mem. No more than NFSD_MAX_SLOTS_PER_SESSION slots are allowed.

This allows clients to tailor a session to usage.
For example, an I/O session (READ/WRITE/COMMIT only) can have a much smaller
ca_maxresponsesize_cached (for only WRITE/COMMIT compound responses) and a much
larger ca_maxresponses to service a large in-flight data window.
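The clipping described above can be sketched in userspace C. This is a hypothetical stand-in for the server-side logic, not the patch itself: the constants are the values quoted in this series, while bound_forechannel_slots and drc_avail are illustrative names (the kernel code additionally has to share nfsd_drc_max_mem across all sessions):

```c
#include <assert.h>

/* Constants as defined in this patch series. */
#define NFSD_MAX_SLOTS_PER_SESSION 160
#define NFSD_SLOT_CACHE_SIZE 1024
#define NFSD_CACHE_SIZE_SLOTS_PER_SESSION 32
#define NFSD_MAX_MEM_PER_SESSION \
	(NFSD_CACHE_SIZE_SLOTS_PER_SESSION * NFSD_SLOT_CACHE_SIZE)

/* Hypothetical sketch: grant the requested slot count, clipped by the
 * hard slot limit, the per-session memory cap, and whatever server-wide
 * DRC memory (drc_avail) is still unclaimed. */
static unsigned int bound_forechannel_slots(unsigned int requested,
					    unsigned long drc_avail)
{
	unsigned int np = requested;

	if (np > NFSD_MAX_SLOTS_PER_SESSION)
		np = NFSD_MAX_SLOTS_PER_SESSION;
	if (np > NFSD_MAX_MEM_PER_SESSION / NFSD_SLOT_CACHE_SIZE)
		np = NFSD_MAX_MEM_PER_SESSION / NFSD_SLOT_CACHE_SIZE;
	if (np > drc_avail / NFSD_SLOT_CACHE_SIZE)
		np = drc_avail / NFSD_SLOT_CACHE_SIZE;
	return np;
}
```

With these values a greedy request for 1000 slots is clipped to 32 by the per-session memory cap, while a modest request passes through unchanged.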

2) The page-based DRC is replaced with a buffer-based DRC, with each
slot table entry (struct nfsd4_slot + cache) allocated separately.
This allocation prepares us for slot size re-negotiation via the SEQUENCE
operation target and high slot id arguments.
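The per-slot layout in point 2 can be illustrated with a trimmed userspace sketch (field names follow the patch, but the struct is cut down to two fixed fields and calloc stands in for kzalloc):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed stand-in for struct nfsd4_slot: the slot header plus the
 * reply cache buffer live in one allocation, via a flexible array
 * member at the end of the struct. */
struct slot {
	unsigned int sl_seqid;
	unsigned int sl_datalen;
	char sl_data[];		/* cached reply bytes follow the header */
};

/* One allocation per slot, sized for this session's negotiated cache;
 * separate per-slot allocations are what make later slot size
 * re-negotiation possible. */
static struct slot *alloc_slot(size_t cachesize)
{
	return calloc(1, sizeof(struct slot) + cachesize);
}
```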

Testing:

NFSv4.1 mount: pynfs tests - including the SEQUENCE replay cache tests.
connectathon tests.

NFSv4.0 mount: connectathon tests.

-->Andy

0001-nfsd41-expand-solo-sequence-check.patch
0002-nfsd41-bound-forechannel-drc-size-by-memory-usage.patch
0003-nfsd41-use-session-maxreqs-for-sequence-target-and.patch
0004-nfsd41-replace-nfserr_resource-in-pure-nfs41-respon.patch
0005-nfsd41-replace-page-based-DRC-with-buffer-based-DRC.patch



2009-08-28 22:56:06

by J. Bruce Fields

Subject: Re: [pnfs] [PATCH 0/5] NFSv4.1 DRC rewrite version 6

On Fri, Aug 28, 2009 at 06:42:48PM -0400, J. Bruce Fields wrote:
> On Thu, Aug 27, 2009 at 12:07:39PM -0400, [email protected] wrote:
> >
> > NFSv4.1 DRC Rewrite Version 6
> >
> > These patches apply against git://linux-nfs.org/~bfields/linux: for-2.6.32
> > branch and continue to rewrite the NFSv4.1 Sessions DRC. Besides some
> > bug fixes:
> >
> > 1) The bound on the fore channel per-session DRC size are rewritten.
> > Instead of just using a maximum number of slots to bound the size,
> > The server gives the client the number of ca_maxresponsesize_cached slots it
> > requests bounded by NFSD_SLOT_CACHE_SIZE, NFSD_MAX_MEM_PER_SESSION and by
> > nfsd_drc_max_mem. Do not allow more than NFSD_MAX_SLOTS_PER_SESSION.
> >
> > This allows clients to tailor a session to usage.
> > For example, an I/O session (READ/WRITE/COMMIT only) can have a much smaller
> > ca_maxresponsesize_cached (for only WRITE/COMMIt compound responses) and a lot
> > larger ca_maxresponses to service a large in-flight data window.
> >
> > 2) the page-based DRC is replaced with a buffer based DRC with each
> > slot table entry (struct nfsd4_slot + cache) allocated separately.
> > This allocation prepares us for slot size re-negotiation via the SEQUENCE
> > operation target and high slot id arguments.
> >
> > Testing:
> >
> > NFSv4.1 mount: pynfs tests - including the SEQUENCE replay cache tests.
> > connectathon tests.
>
> By the way, any hints on getting the 4.1 pynfs stuff running?

OK, solution seemed to be

cd gssapi
python setup.py build_ext --inplace

But now almost everything fails with SERVERFAULT. Bah.

--b.

2009-08-28 23:07:25

by J. Bruce Fields

Subject: Re: [pnfs] [PATCH 0/5] NFSv4.1 DRC rewrite version 6

On Fri, Aug 28, 2009 at 06:56:08PM -0400, bfields wrote:
> On Fri, Aug 28, 2009 at 06:42:48PM -0400, J. Bruce Fields wrote:
> > On Thu, Aug 27, 2009 at 12:07:39PM -0400, [email protected] wrote:
> > >
> > > NFSv4.1 DRC Rewrite Version 6
> > >
> > > These patches apply against git://linux-nfs.org/~bfields/linux: for-2.6.32
> > > branch and continue to rewrite the NFSv4.1 Sessions DRC. Besides some
> > > bug fixes:
> > >
> > > 1) The bound on the fore channel per-session DRC size are rewritten.
> > > Instead of just using a maximum number of slots to bound the size,
> > > The server gives the client the number of ca_maxresponsesize_cached slots it
> > > requests bounded by NFSD_SLOT_CACHE_SIZE, NFSD_MAX_MEM_PER_SESSION and by
> > > nfsd_drc_max_mem. Do not allow more than NFSD_MAX_SLOTS_PER_SESSION.
> > >
> > > This allows clients to tailor a session to usage.
> > > For example, an I/O session (READ/WRITE/COMMIT only) can have a much smaller
> > > ca_maxresponsesize_cached (for only WRITE/COMMIt compound responses) and a lot
> > > larger ca_maxresponses to service a large in-flight data window.
> > >
> > > 2) the page-based DRC is replaced with a buffer based DRC with each
> > > slot table entry (struct nfsd4_slot + cache) allocated separately.
> > > This allocation prepares us for slot size re-negotiation via the SEQUENCE
> > > operation target and high slot id arguments.
> > >
> > > Testing:
> > >
> > > NFSv4.1 mount: pynfs tests - including the SEQUENCE replay cache tests.
> > > connectathon tests.
> >
> > By the way, any hints on getting the 4.1 pynfs stuff running?
>
> OK, solution seemed to be
>
> cd gssapi
> python setup.py build_ext --inplace
>
> But now almost everything fails.with SERVERFAULT. Bah.

Does anyone have a list of which tests are expected to pass at this point? If
I ask for "all" tests, I get "4 skipped, 103 failed, 0 warned, 26 passed" --
full list available if anyone wants. I seem to be passing connectathon tests
OK over 4.1.

--b.

2009-08-28 21:33:22

by J. Bruce Fields

Subject: Re: [PATCH 5/5] nfsd41: replace page based DRC with buffer based DRC

On Thu, Aug 27, 2009 at 12:07:44PM -0400, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Use NFSD_SLOT_CACHE_SIZE size buffers for sessions DRC instead of holding nfsd
> pages in cache.
>
> Connectathon testing has shown that 1024 bytes for encoded compound operation
> responses past the sequence operation is sufficient, 512 bytes is a little too
> small. Set NFSD_SLOT_CACHE_SIZE to 1024.
>
> Allocate memory for the session DRC in the CREATE_SESSION operation
> to guarantee that the memory resource is available for caching responses.
> Allocate each slot individually in preparation for slot table size negotiation.
>
> Remove struct nfsd4_cache_entry and helper functions for the old page-based
> DRC.
>
> The iov_len calculation in nfs4svc_encode_compoundres is now always
> correct, clean up the nfs4svc_encode_compoundres session logic.
>
> The nfsd4_compound_state statp pointer is also not used.
> Remove nfsd4_set_statp().
>
> Move useful nfsd4_cache_entry fields into nfsd4_slot.
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 207 ++++++++++++--------------------------------
> fs/nfsd/nfs4xdr.c | 13 ++--
> fs/nfsd/nfssvc.c | 4 -
> include/linux/nfsd/state.h | 27 ++----
> include/linux/nfsd/xdr4.h | 5 +-
> 5 files changed, 74 insertions(+), 182 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 4695cec..2d72d5c 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -510,12 +510,22 @@ static int init_forechannel_attrs(struct svc_rqst *rqstp,
> return status;
> }
>
> +static void
> +free_session_slots(struct nfsd4_session *ses)
> +{
> + int i;
> +
> + for (i = 0; i < ses->se_fchannel.maxreqs; i++)
> + kfree(ses->se_slots[i]);
> +}
> +
> static int
> alloc_init_session(struct svc_rqst *rqstp, struct nfs4_client *clp,
> struct nfsd4_create_session *cses)
> {
> struct nfsd4_session *new, tmp;
> - int idx, status = nfserr_serverfault, slotsize;
> + struct nfsd4_slot *sp;
> + int idx, status = nfserr_serverfault, slotsize, cachesize, i;

Just as a style thing: that list's getting a little long. Could you
keep at least "status" on a separate line?

>
> memset(&tmp, 0, sizeof(tmp));
>
> @@ -526,14 +536,23 @@ alloc_init_session(struct svc_rqst *rqstp, struct nfs4_client *clp,
> if (status)
> goto out;
>
> - /* allocate struct nfsd4_session and slot table in one piece */
> - slotsize = tmp.se_fchannel.maxreqs * sizeof(struct nfsd4_slot);
> + /* allocate struct nfsd4_session and slot table pointers in one piece */
> + slotsize = tmp.se_fchannel.maxreqs * sizeof(struct nfsd4_slot *);
> new = kzalloc(sizeof(*new) + slotsize, GFP_KERNEL);

I think this is OK for now, but maybe stick something like:

BUILD_BUG_ON(NFSD_MAX_SLOTS_PER_SESSION * sizeof(struct nfsd4_slot)
+ sizeof(struct nfsd4_session) > PAGE_SIZE);

in state.h just to warn anyone who wants to blindly bump up
NFSD_MAX_SLOTS_PER_SESSION. (It's not really forbidden to kmalloc more
than a page, but it's also not reliable, and if it becomes necessary
then we'd rather find some way to code around it.)

> if (!new)
> goto out;
>
> memcpy(new, &tmp, sizeof(*new));
>
> + /* allocate each struct nfsd4_slot and data cache in one piece */
> + cachesize = new->se_fchannel.maxresp_cached - NFSD_MIN_HDR_SEQ_SZ;
> + for (i = 0; i < new->se_fchannel.maxreqs; i++) {
> + sp = kzalloc(sizeof(*sp) + cachesize, GFP_KERNEL);
> + if (!sp)
> + goto out_free;
> + new->se_slots[i] = sp;
> + }
> +
> new->se_client = clp;
> gen_sessionid(new);
> idx = hash_sessionid(&new->se_sessionid);
> @@ -550,6 +569,10 @@ alloc_init_session(struct svc_rqst *rqstp, struct nfs4_client *clp,
> status = nfs_ok;
> out:
> return status;
> +out_free:
> + free_session_slots(new);
> + kfree(new);
> + goto out;
> }
>
> /* caller must hold sessionid_lock */
> @@ -592,22 +615,16 @@ release_session(struct nfsd4_session *ses)
> nfsd4_put_session(ses);
> }
>
> -static void nfsd4_release_respages(struct page **respages, short resused);
> -
> void
> free_session(struct kref *kref)
> {
> struct nfsd4_session *ses;
> - int i;
>
> ses = container_of(kref, struct nfsd4_session, se_ref);
> - for (i = 0; i < ses->se_fchannel.maxreqs; i++) {
> - struct nfsd4_cache_entry *e = &ses->se_slots[i].sl_cache_entry;
> - nfsd4_release_respages(e->ce_respages, e->ce_resused);
> - }
> spin_lock(&nfsd_drc_lock);
> nfsd_drc_mem_used -= ses->se_fchannel.maxreqs * NFSD_SLOT_CACHE_SIZE;
> spin_unlock(&nfsd_drc_lock);
> + free_session_slots(ses);
> kfree(ses);
> }
>
> @@ -964,116 +981,32 @@ out_err:
> return;
> }
>
> -void
> -nfsd4_set_statp(struct svc_rqst *rqstp, __be32 *statp)
> -{
> - struct nfsd4_compoundres *resp = rqstp->rq_resp;
> -
> - resp->cstate.statp = statp;
> -}
> -
> -/*
> - * Dereference the result pages.
> - */
> -static void
> -nfsd4_release_respages(struct page **respages, short resused)
> -{
> - int i;
> -
> - dprintk("--> %s\n", __func__);
> - for (i = 0; i < resused; i++) {
> - if (!respages[i])
> - continue;
> - put_page(respages[i]);
> - respages[i] = NULL;
> - }
> -}
> -
> -static void
> -nfsd4_copy_pages(struct page **topages, struct page **frompages, short count)
> -{
> - int i;
> -
> - for (i = 0; i < count; i++) {
> - topages[i] = frompages[i];
> - if (!topages[i])
> - continue;
> - get_page(topages[i]);
> - }
> -}
> -
> /*
> - * Cache the reply pages up to NFSD_PAGES_PER_SLOT + 1, clearing the previous
> - * pages. We add a page to NFSD_PAGES_PER_SLOT for the case where the total
> - * length of the XDR response is less than se_fmaxresp_cached
> - * (NFSD_PAGES_PER_SLOT * PAGE_SIZE) but the xdr_buf pages is used for a
> - * of the reply (e.g. readdir).
> - *
> - * Store the base and length of the rq_req.head[0] page
> - * of the NFSv4.1 data, just past the rpc header.
> + * Cache a reply. nfsd4_check_drc_limit() has bounded the cache size.
> */
> void
> nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
> {
> - struct nfsd4_cache_entry *entry = &resp->cstate.slot->sl_cache_entry;
> - struct svc_rqst *rqstp = resp->rqstp;
> - struct kvec *resv = &rqstp->rq_res.head[0];
> -
> - dprintk("--> %s entry %p\n", __func__, entry);
> + struct nfsd4_slot *slot = resp->cstate.slot;
> + unsigned int base;
>
> - nfsd4_release_respages(entry->ce_respages, entry->ce_resused);
> - entry->ce_opcnt = resp->opcnt;
> - entry->ce_status = resp->cstate.status;
> + dprintk("--> %s slot %p\n", __func__, slot);
>
> - /*
> - * Don't need a page to cache just the sequence operation - the slot
> - * does this for us!
> - */
> + slot->sl_opcnt = resp->opcnt;
> + slot->sl_status = resp->cstate.status;
>
> if (nfsd4_not_cached(resp)) {
> - entry->ce_resused = 0;
> - entry->ce_rpchdrlen = 0;
> - dprintk("%s Just cache SEQUENCE. ce_cachethis %d\n", __func__,
> - resp->cstate.slot->sl_cache_entry.ce_cachethis);
> + slot->sl_datalen = 0;
> return;
> }
> - entry->ce_resused = rqstp->rq_resused;
> - if (entry->ce_resused > NFSD_PAGES_PER_SLOT + 1)
> - entry->ce_resused = NFSD_PAGES_PER_SLOT + 1;
> - nfsd4_copy_pages(entry->ce_respages, rqstp->rq_respages,
> - entry->ce_resused);
> - entry->ce_datav.iov_base = resp->cstate.statp;
> - entry->ce_datav.iov_len = resv->iov_len - ((char *)resp->cstate.statp -
> - (char *)page_address(rqstp->rq_respages[0]));
> - /* Current request rpc header length*/
> - entry->ce_rpchdrlen = (char *)resp->cstate.statp -
> - (char *)page_address(rqstp->rq_respages[0]);
> -}
> -
> -/*
> - * We keep the rpc header, but take the nfs reply from the replycache.
> - */
> -static int
> -nfsd41_copy_replay_data(struct nfsd4_compoundres *resp,
> - struct nfsd4_cache_entry *entry)
> -{
> - struct svc_rqst *rqstp = resp->rqstp;
> - struct kvec *resv = &resp->rqstp->rq_res.head[0];
> - int len;
> -
> - /* Current request rpc header length*/
> - len = (char *)resp->cstate.statp -
> - (char *)page_address(rqstp->rq_respages[0]);
> - if (entry->ce_datav.iov_len + len > PAGE_SIZE) {
> - dprintk("%s v41 cached reply too large (%Zd).\n", __func__,
> - entry->ce_datav.iov_len);
> - return 0;
> - }
> - /* copy the cached reply nfsd data past the current rpc header */
> - memcpy((char *)resv->iov_base + len, entry->ce_datav.iov_base,
> - entry->ce_datav.iov_len);
> - resv->iov_len = len + entry->ce_datav.iov_len;
> - return 1;
> + slot->sl_datalen = (char *)resp->p - (char *)resp->cstate.datap;
> + base = (char *)resp->cstate.datap -
> + (char *)resp->xbuf->head[0].iov_base;
> + if (read_bytes_from_xdr_buf(resp->xbuf, base, slot->sl_data,
> + slot->sl_datalen))
> + printk(KERN_WARNING
> + "nfsd: sessions DRC could not cache compound\n");

I'd make this WARN("nfsd:...") just to make it completely clear it's a
kernel bug. (This case should be caught by nfsd4_check_drc_limit unless
we've messed something up, right?)

> + return;
> }
>
> /*
> @@ -1091,14 +1024,14 @@ nfsd4_enc_sequence_replay(struct nfsd4_compoundargs *args,
> struct nfsd4_slot *slot = resp->cstate.slot;
>
> dprintk("--> %s resp->opcnt %d cachethis %u \n", __func__,
> - resp->opcnt, resp->cstate.slot->sl_cache_entry.ce_cachethis);
> + resp->opcnt, resp->cstate.slot->sl_cachethis);
>
> /* Encode the replayed sequence operation */
> op = &args->ops[resp->opcnt - 1];
> nfsd4_encode_operation(resp, op);
>
> /* Return nfserr_retry_uncached_rep in next operation. */
> - if (args->opcnt > 1 && slot->sl_cache_entry.ce_cachethis == 0) {
> + if (args->opcnt > 1 && slot->sl_cachethis == 0) {
> op = &args->ops[resp->opcnt++];
> op->status = nfserr_retry_uncached_rep;
> nfsd4_encode_operation(resp, op);
> @@ -1107,57 +1040,29 @@ nfsd4_enc_sequence_replay(struct nfsd4_compoundargs *args,
> }
>
> /*
> - * Keep the first page of the replay. Copy the NFSv4.1 data from the first
> - * cached page. Replace any futher replay pages from the cache.
> + * The sequence operation is not cached because we can use the slot and
> + * session values.
> */
> __be32
> nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
> struct nfsd4_sequence *seq)
> {
> - struct nfsd4_cache_entry *entry = &resp->cstate.slot->sl_cache_entry;
> + struct nfsd4_slot *slot = resp->cstate.slot;
> __be32 status;
>
> - dprintk("--> %s entry %p\n", __func__, entry);
> -
> - /*
> - * If this is just the sequence operation, we did not keep
> - * a page in the cache entry because we can just use the
> - * slot info stored in struct nfsd4_sequence that was checked
> - * against the slot in nfsd4_sequence().
> - *
> - * This occurs when seq->cachethis is FALSE, or when the client
> - * session inactivity timer fires and a solo sequence operation
> - * is sent (lease renewal).
> - */
> + dprintk("--> %s slot %p\n", __func__, slot);
>
> /* Either returns 0 or nfserr_retry_uncached */
> status = nfsd4_enc_sequence_replay(resp->rqstp->rq_argp, resp);
> if (status == nfserr_retry_uncached_rep)
> return status;
>
> - if (!nfsd41_copy_replay_data(resp, entry)) {
> - /*
> - * Not enough room to use the replay rpc header, send the
> - * cached header. Release all the allocated result pages.
> - */
> - svc_free_res_pages(resp->rqstp);
> - nfsd4_copy_pages(resp->rqstp->rq_respages, entry->ce_respages,
> - entry->ce_resused);
> - } else {
> - /* Release all but the first allocated result page */
> -
> - resp->rqstp->rq_resused--;
> - svc_free_res_pages(resp->rqstp);
> -
> - nfsd4_copy_pages(&resp->rqstp->rq_respages[1],
> - &entry->ce_respages[1],
> - entry->ce_resused - 1);
> - }
> + /* The sequence operation has been encoded, cstate->datap set. */
> + memcpy(resp->cstate.datap, slot->sl_data, slot->sl_datalen);
>
> - resp->rqstp->rq_resused = entry->ce_resused;
> - resp->opcnt = entry->ce_opcnt;
> - resp->cstate.iovlen = entry->ce_datav.iov_len + entry->ce_rpchdrlen;
> - status = entry->ce_status;
> + resp->opcnt = slot->sl_opcnt;
> + resp->p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
> + status = slot->sl_status;
>
> return status;
> }
> @@ -1489,7 +1394,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
> if (seq->slotid >= session->se_fchannel.maxreqs)
> goto out;
>
> - slot = &session->se_slots[seq->slotid];
> + slot = session->se_slots[seq->slotid];
> dprintk("%s: slotid %d\n", __func__, seq->slotid);
>
> /* We do not negotiate the number of slots yet, so set the
> @@ -1502,7 +1407,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
> cstate->slot = slot;
> cstate->session = session;
> /* Return the cached reply status and set cstate->status
> - * for nfsd4_svc_encode_compoundres processing */
> + * for nfsd4_proc_compound processing */
> status = nfsd4_replay_cache_entry(resp, seq);
> cstate->status = nfserr_replay_cache;
> goto replay_cache;
> @@ -1513,7 +1418,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
> /* Success! bump slot seqid */
> slot->sl_inuse = true;
> slot->sl_seqid = seq->seqid;
> - slot->sl_cache_entry.ce_cachethis = seq->cachethis;
> + slot->sl_cachethis = seq->cachethis;
>
> cstate->slot = slot;
> cstate->session = session;
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index fdf632b..49824ea 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3064,6 +3064,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, int nfserr,
> WRITE32(0);
>
> ADJUST_ARGS();
> + resp->cstate.datap = p; /* DRC cache data pointer */
> return 0;
> }
>
> @@ -3166,7 +3167,7 @@ static int nfsd4_check_drc_limit(struct nfsd4_compoundres *resp)
> return status;
>
> session = resp->cstate.session;
> - if (session == NULL || slot->sl_cache_entry.ce_cachethis == 0)
> + if (session == NULL || slot->sl_cachethis == 0)
> return status;
>
> if (resp->opcnt >= args->opcnt)
> @@ -3291,6 +3292,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
> /*
> * All that remains is to write the tag and operation count...
> */
> + struct nfsd4_compound_state *cs = &resp->cstate;
> struct kvec *iov;
> p = resp->tagp;
> *p++ = htonl(resp->taglen);
> @@ -3304,14 +3306,11 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
> iov = &rqstp->rq_res.head[0];
> iov->iov_len = ((char*)resp->p) - (char*)iov->iov_base;
> BUG_ON(iov->iov_len > PAGE_SIZE);
> - if (nfsd4_has_session(&resp->cstate)) {
> - if (resp->cstate.status == nfserr_replay_cache &&
> - !nfsd4_not_cached(resp)) {
> - iov->iov_len = resp->cstate.iovlen;
> - } else {
> + if (nfsd4_has_session(cs)) {
> + if (cs->status != nfserr_replay_cache) {
> nfsd4_store_cache_entry(resp);
> dprintk("%s: SET SLOT STATE TO AVAILABLE\n", __func__);
> - resp->cstate.slot->sl_inuse = 0;
> + resp->cstate.slot->sl_inuse = false;
> }
> nfsd4_put_session(resp->cstate.session);
> }
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index d68cd05..944ef01 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -576,10 +576,6 @@ nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp)
> + rqstp->rq_res.head[0].iov_len;
> rqstp->rq_res.head[0].iov_len += sizeof(__be32);
>
> - /* NFSv4.1 DRC requires statp */
> - if (rqstp->rq_vers == 4)
> - nfsd4_set_statp(rqstp, statp);
> -
> /* Now call the procedure handler, and encode NFS status. */
> nfserr = proc->pc_func(rqstp, rqstp->rq_argp, rqstp->rq_resp);
> nfserr = map_new_errors(rqstp->rq_vers, nfserr);
> diff --git a/include/linux/nfsd/state.h b/include/linux/nfsd/state.h
> index ff0b771..e745100 100644
> --- a/include/linux/nfsd/state.h
> +++ b/include/linux/nfsd/state.h
> @@ -94,30 +94,23 @@ struct nfs4_cb_conn {
>
> /* Maximum number of slots per session. 160 is useful for long haul TCP */
> #define NFSD_MAX_SLOTS_PER_SESSION 160
> -/* Maximum number of pages per slot cache entry */
> -#define NFSD_PAGES_PER_SLOT 1
> -#define NFSD_SLOT_CACHE_SIZE PAGE_SIZE
> /* Maximum number of operations per session compound */
> #define NFSD_MAX_OPS_PER_COMPOUND 16
> +/* Maximum session per slot cache size */
> +#define NFSD_SLOT_CACHE_SIZE 1024
> /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
> #define NFSD_CACHE_SIZE_SLOTS_PER_SESSION 32
> #define NFSD_MAX_MEM_PER_SESSION \
> (NFSD_CACHE_SIZE_SLOTS_PER_SESSION * NFSD_SLOT_CACHE_SIZE)
>
> -struct nfsd4_cache_entry {
> - __be32 ce_status;
> - struct kvec ce_datav; /* encoded NFSv4.1 data in rq_res.head[0] */
> - struct page *ce_respages[NFSD_PAGES_PER_SLOT + 1];
> - int ce_cachethis;
> - short ce_resused;
> - int ce_opcnt;
> - int ce_rpchdrlen;
> -};
> -
> struct nfsd4_slot {
> - bool sl_inuse;
> - u32 sl_seqid;
> - struct nfsd4_cache_entry sl_cache_entry;
> + bool sl_inuse;
> + u32 sl_seqid;
> + int sl_cachethis;
> + int sl_opcnt;
> + __be32 sl_status;
> + u32 sl_datalen;
> + char sl_data[];

Could you just move sl_inuse to the end? It'll save a few bytes in the
structure (because the compiler will probably stick 3 bytes after it to
align sl_seqid.)

--b.
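A userspace illustration of the padding point, with stand-in structs trimmed to two u32 fields (exact offsets are ABI-dependent; on common ABIs sizeof is unchanged, but the flexible array can start three bytes earlier when the bool comes last):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* bool first: the compiler inserts 3 bytes of padding after sl_inuse
 * to align sl_seqid, so sl_data only starts after the padded header. */
struct slot_bool_first {
	bool sl_inuse;
	unsigned int sl_seqid;
	unsigned int sl_datalen;
	char sl_data[];
};

/* bool last: sl_data (alignment 1) can start right after sl_inuse,
 * in bytes that would otherwise be tail padding. */
struct slot_bool_last {
	unsigned int sl_seqid;
	unsigned int sl_datalen;
	bool sl_inuse;
	char sl_data[];
};
```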

> };
>
> struct nfsd4_channel_attrs {
> @@ -159,7 +152,7 @@ struct nfsd4_session {
> struct nfs4_sessionid se_sessionid;
> struct nfsd4_channel_attrs se_fchannel;
> struct nfsd4_channel_attrs se_bchannel;
> - struct nfsd4_slot se_slots[]; /* forward channel slots */
> + struct nfsd4_slot *se_slots[]; /* forward channel slots */
> };
>
> static inline void
> diff --git a/include/linux/nfsd/xdr4.h b/include/linux/nfsd/xdr4.h
> index 3f71660..73164c2 100644
> --- a/include/linux/nfsd/xdr4.h
> +++ b/include/linux/nfsd/xdr4.h
> @@ -51,7 +51,7 @@ struct nfsd4_compound_state {
> /* For sessions DRC */
> struct nfsd4_session *session;
> struct nfsd4_slot *slot;
> - __be32 *statp;
> + __be32 *datap;
> size_t iovlen;
> u32 minorversion;
> u32 status;
> @@ -472,8 +472,7 @@ static inline bool nfsd4_is_solo_sequence(struct nfsd4_compoundres *resp)
>
> static inline bool nfsd4_not_cached(struct nfsd4_compoundres *resp)
> {
> - return !resp->cstate.slot->sl_cache_entry.ce_cachethis ||
> - nfsd4_is_solo_sequence(resp);
> + return !resp->cstate.slot->sl_cachethis || nfsd4_is_solo_sequence(resp);
> }
>
> #define NFS4_SVC_XDRSIZE sizeof(struct nfsd4_compoundargs)
> --
> 1.6.2.5
>

2009-08-28 21:34:09

by J. Bruce Fields

Subject: Re: [PATCH 0/5] NFSv4.1 DRC rewrite version 6

On Thu, Aug 27, 2009 at 12:07:39PM -0400, [email protected] wrote:
>
> NFSv4.1 DRC Rewrite Version 6
>
> These patches apply against git://linux-nfs.org/~bfields/linux: for-2.6.32
> branch and continue to rewrite the NFSv4.1 Sessions DRC. Besides some
> bug fixes:

Thanks for following up on this. It looks fine except for a few nits
noted on replies to individual patches. Could you fix those up and
resend? And then I think #7 should be it....

--b.

>
> 1) The bound on the fore channel per-session DRC size are rewritten.
> Instead of just using a maximum number of slots to bound the size,
> The server gives the client the number of ca_maxresponsesize_cached slots it
> requests bounded by NFSD_SLOT_CACHE_SIZE, NFSD_MAX_MEM_PER_SESSION and by
> nfsd_drc_max_mem. Do not allow more than NFSD_MAX_SLOTS_PER_SESSION.
>
> This allows clients to tailor a session to usage.
> For example, an I/O session (READ/WRITE/COMMIT only) can have a much smaller
> ca_maxresponsesize_cached (for only WRITE/COMMIt compound responses) and a lot
> larger ca_maxresponses to service a large in-flight data window.
>
> 2) the page-based DRC is replaced with a buffer based DRC with each
> slot table entry (struct nfsd4_slot + cache) allocated separately.
> This allocation prepares us for slot size re-negotiation via the SEQUENCE
> operation target and high slot id arguments.
>
> Testing:
>
> NFSv4.1 mount: pynfs tests - including the SEQUENCE replay cache tests.
> connectathon tests.
>
> NFSv4.0 mount: connectathon tests.
>
> -->Andy
>
> 0001-nfsd41-expand-solo-sequence-check.patch
> 0002-nfsd41-bound-forechannel-drc-size-by-memory-usage.patch
> 0003-nfsd41-use-session-maxreqs-for-sequence-target-and.patch
> 0004-nfsd41-replace-nfserr_resource-in-pure-nfs41-respon.patch
> 0005-nfsd41-replace-page-based-DRC-with-buffer-based-DRC.patch
>

2009-08-28 22:42:46

by J. Bruce Fields

Subject: Re: [PATCH 0/5] NFSv4.1 DRC rewrite version 6

On Thu, Aug 27, 2009 at 12:07:39PM -0400, [email protected] wrote:
>
> NFSv4.1 DRC Rewrite Version 6
>
> These patches apply against git://linux-nfs.org/~bfields/linux: for-2.6.32
> branch and continue to rewrite the NFSv4.1 Sessions DRC. Besides some
> bug fixes:
>
> 1) The bound on the fore channel per-session DRC size are rewritten.
> Instead of just using a maximum number of slots to bound the size,
> The server gives the client the number of ca_maxresponsesize_cached slots it
> requests bounded by NFSD_SLOT_CACHE_SIZE, NFSD_MAX_MEM_PER_SESSION and by
> nfsd_drc_max_mem. Do not allow more than NFSD_MAX_SLOTS_PER_SESSION.
>
> This allows clients to tailor a session to usage.
> For example, an I/O session (READ/WRITE/COMMIT only) can have a much smaller
> ca_maxresponsesize_cached (for only WRITE/COMMIt compound responses) and a lot
> larger ca_maxresponses to service a large in-flight data window.
>
> 2) the page-based DRC is replaced with a buffer based DRC with each
> slot table entry (struct nfsd4_slot + cache) allocated separately.
> This allocation prepares us for slot size re-negotiation via the SEQUENCE
> operation target and high slot id arguments.
>
> Testing:
>
> NFSv4.1 mount: pynfs tests - including the SEQUENCE replay cache tests.
> connectathon tests.

By the way, any hints on getting the 4.1 pynfs stuff running?

./nfs4.1/testserver.py pearlet1:/pynfs41test-root --rundeps --maketree
Traceback (most recent call last):
  File "./nfs4.1/testserver.py", line 38, in <module>
    import server41tests.environment as environment
  File "/root/pynfs41/nfs4.1/server41tests/environment.py", line 15, in <module>
    import rpc
  File "/root/pynfs41/rpc/__init__.py", line 1, in <module>
    from rpc import *
  File "/root/pynfs41/rpc/rpc.py", line 14, in <module>
    import security
  File "/root/pynfs41/rpc/security.py", line 11, in <module>
    import gssapi
ImportError: No module named gssapi

With the 4.0 pynfs I've always run

./setup.py build_ext --inplace

before. That gives me:

This is currently broken...everything should already be set up.
Just go into nfs4 and play.

The setup.py in nfs4.1/ complains about not finding xdrgen.

--b.

2009-08-30 06:38:35

by Benny Halevy

Subject: Re: [pnfs] [PATCH 2/5] nfsd41: bound forechannel drc size by memory usage

On Aug. 28, 2009, 23:41 +0300, "J. Bruce Fields" <[email protected]> wrote:
> Remind me why serverfault and not resource here?
>
> --b.
>

NFS4ERR_RESOURCE is not a valid nfsv4.1 error.

Benny

2009-08-30 23:10:56

by Benny Halevy

Subject: Re: [pnfs] [PATCH 5/5] nfsd41: replace page based DRC with buffer based DRC

On Aug. 29, 2009, 0:33 +0300, "J. Bruce Fields" <[email protected]> wrote:
> I think this is OK for now, but maybe stick something like:
>
> BUILD_BUG_ON(NFSD_MAX_SLOTS_PER_SESSION * sizeof(struct nfsd4_slot)
> + sizeof(struct nfsd4_session) > PAGE_SIZE);
>
> in state.h just to warn anyone who wants to blindly bump up

I think that the BUILD_BUG_ON should be placed in a function so
it gets to compile, so it'd better be defined here and not in state.h
(where it also is in the right context)

Benny

> NFSD_MAX_SLOTS_PER_SESSION. (It's not really forbidden to kmalloc more
> than a page, but it's also not reliable, and if it becomes necessary
> then we'd rather find some way to code around it.)
>
>
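A userspace stand-in makes the constraint concrete. The macro below mimics the kernel's negative-array trick, and the struct names and limits are illustrative stubs rather than the real nfsd definitions: the expansion is an expression statement, so it is only valid inside a function body, which is Benny's point.

```c
#include <assert.h>

/* Userspace mimic of the kernel's BUILD_BUG_ON(): when cond is true,
 * the array size 1 - 2*!!(cond) goes negative and compilation fails. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Illustrative stub types and limits, sized so the check passes. */
struct stub_slot { unsigned int sl_seqid; unsigned int sl_datalen; };
struct stub_session { unsigned int se_flags; };
#define STUB_MAX_SLOTS 160
#define STUB_PAGE_SIZE 4096

/* The check has to live in a function like this one; bare at file
 * scope (e.g. dropped into a header) the statement would not compile. */
static int session_header_fits_in_a_page(void)
{
	BUILD_BUG_ON(STUB_MAX_SLOTS * sizeof(struct stub_slot *)
		     + sizeof(struct stub_session) > STUB_PAGE_SIZE);
	return 1;
}
```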

2009-08-31 13:08:23

by J. Bruce Fields

Subject: Re: [pnfs] [PATCH 5/5] nfsd41: replace page based DRC with buffer based DRC

On Mon, Aug 31, 2009 at 02:10:55AM +0300, Benny Halevy wrote:
> On Aug. 29, 2009, 0:33 +0300, "J. Bruce Fields" <[email protected]> wrote:
> > I think this is OK for now, but maybe stick something like:
> >
> > BUILD_BUG_ON(NFSD_MAX_SLOTS_PER_SESSION * sizeof(struct nfsd4_slot)
> > + sizeof(struct nfsd4_session) > PAGE_SIZE);
> >
> > in state.h just to warn anyone who wants to blindly bump up
>
> I think that the BUILD_BUG_ON should be placed in a function so
> it gets to compile,

Whoops, thanks--I hadn't noticed that.

> so it'd better be defined here and not in state.h
> (where it also is in the right context)

Sounds fine.--b.

2009-08-31 13:43:51

by Andy Adamson

Subject: Re: [pnfs] [PATCH 5/5] nfsd41: replace page based DRC with buffer based DRC

OK - Thanks for the review. New patches to follow..

-->Andy

On Mon, Aug 31, 2009 at 9:08 AM, J. Bruce Fields <[email protected]> wrote:
> On Mon, Aug 31, 2009 at 02:10:55AM +0300, Benny Halevy wrote:
>> On Aug. 29, 2009, 0:33 +0300, "J. Bruce Fields" <[email protected]> wrote:
>> > I think this is OK for now, but maybe stick something like:
>> >
>> >     BUILD_BUG_ON(NFSD_MAX_SLOTS_PER_SESSION * sizeof(struct nfsd4_slot)
>> >                  + sizeof(struct nfsd4_session) > PAGE_SIZE);
>> >
>> > in state.h just to warn anyone who wants to blindly bump up
>>
>> I think that the BUILD_BUG_ON should be placed in a function so
>> it gets to compile,
>
> Whoops, thanks--I hadn't noticed that.
>
>> so it'd better be defined here and not in state.h
>> (where it also is in the right context)
>
> Sounds fine.--b.
> _______________________________________________
> pNFS mailing list
> [email protected]
> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
>

2009-08-27 16:07:26

by Andy Adamson

Subject: [PATCH 1/5] nfsd41: expand solo sequence check

From: Andy Adamson <[email protected]>

Failed sequence == solo sequence.
Remove redundant sequence operation cache checks.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfsd/nfs4state.c | 9 ---------
include/linux/nfsd/xdr4.h | 2 +-
2 files changed, 1 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d2a0524..ddfd36f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -991,16 +991,10 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
{
struct nfsd4_cache_entry *entry = &resp->cstate.slot->sl_cache_entry;
struct svc_rqst *rqstp = resp->rqstp;
- struct nfsd4_compoundargs *args = rqstp->rq_argp;
- struct nfsd4_op *op = &args->ops[resp->opcnt];
struct kvec *resv = &rqstp->rq_res.head[0];

dprintk("--> %s entry %p\n", __func__, entry);

- /* Don't cache a failed OP_SEQUENCE. */
- if (resp->opcnt == 1 && op->opnum == OP_SEQUENCE && resp->cstate.status)
- return;
-
nfsd4_release_respages(entry->ce_respages, entry->ce_resused);
entry->ce_opcnt = resp->opcnt;
entry->ce_status = resp->cstate.status;
@@ -1490,9 +1484,6 @@ nfsd4_sequence(struct svc_rqst *rqstp,
slot->sl_inuse = true;
slot->sl_seqid = seq->seqid;
slot->sl_cache_entry.ce_cachethis = seq->cachethis;
- /* Always set the cache entry cachethis for solo sequence */
- if (nfsd4_is_solo_sequence(resp))
- slot->sl_cache_entry.ce_cachethis = 1;

cstate->slot = slot;
cstate->session = session;
diff --git a/include/linux/nfsd/xdr4.h b/include/linux/nfsd/xdr4.h
index 5e4beb0..3f71660 100644
--- a/include/linux/nfsd/xdr4.h
+++ b/include/linux/nfsd/xdr4.h
@@ -467,7 +467,7 @@ struct nfsd4_compoundres {
static inline bool nfsd4_is_solo_sequence(struct nfsd4_compoundres *resp)
{
struct nfsd4_compoundargs *args = resp->rqstp->rq_argp;
- return args->opcnt == 1;
+ return resp->opcnt == 1 && args->ops[0].opnum == OP_SEQUENCE;
}

static inline bool nfsd4_not_cached(struct nfsd4_compoundres *resp)
--
1.6.2.5


2009-08-27 16:07:27

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 2/5] nfsd41: bound forechannel drc size by memory usage

From: Andy Adamson <[email protected]>

By using the requested ca_maxresponsesize_cached * ca_maxresponses to bound
a forechannel drc request size, clients can tailor a session to usage.

For example, an I/O session (READ/WRITE only) can have a much smaller
ca_maxresponsesize_cached (for only WRITE compound responses) and a lot larger
ca_maxresponses to service a large in-flight data window.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfsd/nfs4state.c | 60 +++++++++++++++++++++++++++++++------------
include/linux/nfsd/state.h | 8 ++++-
2 files changed, 49 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ddfd36f..a691139 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -414,34 +414,60 @@ gen_sessionid(struct nfsd4_session *ses)
}

/*
- * Give the client the number of slots it requests bound by
- * NFSD_MAX_SLOTS_PER_SESSION and by nfsd_drc_max_mem.
+ * 32 bytes of RPC header and 44 bytes of sequence operation response
+ * not included in NFSD_SLOT_CACHE_SIZE
+ * */
+#define NFSD_MIN_HDR_SEQ_SZ (32 + 44)
+
+/*
+ * Give the client the number of ca_maxresponsesize_cached slots it requests
+ * bounded by NFSD_SLOT_CACHE_SIZE, NFSD_MAX_MEM_PER_SESSION and by
+ * nfsd_drc_max_mem. Do not allow more than NFSD_MAX_SLOTS_PER_SESSION.
+ *
+ * The ca_maxresponssize_cached definition includes the size
+ * of the rpc header with the variable length security flavor credential
+ * plus verifier (32 bytes with AUTH_SYS and NULL verifier)
+ * as well as the encoded SEQUENCE operation response (44 bytes)
+ * which are not included in NFSD_SLOT_CACHE_SIZE.
+ * We err on the side of being a bit small when AUTH_SYS/NULL verifier
+ * is not used.
*
* If we run out of reserved DRC memory we should (up to a point) re-negotiate
* active sessions and reduce their slot usage to make rooom for new
* connections. For now we just fail the create session.
*/
-static int set_forechannel_maxreqs(struct nfsd4_channel_attrs *fchan)
+static int set_forechannel_drc_size(struct nfsd4_channel_attrs *fchan)
{
- int mem;
+ int mem, size = fchan->maxresp_cached - NFSD_MIN_HDR_SEQ_SZ;

- if (fchan->maxreqs < 1)
+ if (fchan->maxreqs < 1 || size <= 0)
return nfserr_inval;
- else if (fchan->maxreqs > NFSD_MAX_SLOTS_PER_SESSION)
- fchan->maxreqs = NFSD_MAX_SLOTS_PER_SESSION;

- mem = fchan->maxreqs * NFSD_SLOT_CACHE_SIZE;
+ if (size > NFSD_SLOT_CACHE_SIZE)
+ size = NFSD_SLOT_CACHE_SIZE;
+
+ /* bound the maxreqs by NFSD_MAX_MEM_PER_SESSION */
+ mem = fchan->maxreqs * size;
+ if (mem > NFSD_MAX_MEM_PER_SESSION) {
+ fchan->maxreqs = NFSD_MAX_MEM_PER_SESSION / size;
+ if (fchan->maxreqs > NFSD_MAX_SLOTS_PER_SESSION)
+ fchan->maxreqs = NFSD_MAX_SLOTS_PER_SESSION;
+ mem = fchan->maxreqs * size;
+ }

spin_lock(&nfsd_drc_lock);
- if (mem + nfsd_drc_mem_used > nfsd_drc_max_mem)
- mem = ((nfsd_drc_max_mem - nfsd_drc_mem_used) /
- NFSD_SLOT_CACHE_SIZE) * NFSD_SLOT_CACHE_SIZE;
+ /* bound the total session drc memory ussage */
+ if (mem + nfsd_drc_mem_used > nfsd_drc_max_mem) {
+ fchan->maxreqs = (nfsd_drc_max_mem - nfsd_drc_mem_used) / size;
+ mem = fchan->maxreqs * size;
+ }
nfsd_drc_mem_used += mem;
spin_unlock(&nfsd_drc_lock);

- fchan->maxreqs = mem / NFSD_SLOT_CACHE_SIZE;
if (fchan->maxreqs == 0)
- return nfserr_resource;
+ return nfserr_serverfault;
+
+ fchan->maxresp_cached = size + NFSD_MIN_HDR_SEQ_SZ;
return 0;
}

@@ -466,9 +492,6 @@ static int init_forechannel_attrs(struct svc_rqst *rqstp,
fchan->maxresp_sz = maxcount;
session_fchan->maxresp_sz = fchan->maxresp_sz;

- session_fchan->maxresp_cached = NFSD_SLOT_CACHE_SIZE;
- fchan->maxresp_cached = session_fchan->maxresp_cached;
-
/* Use the client's maxops if possible */
if (fchan->maxops > NFSD_MAX_OPS_PER_COMPOUND)
fchan->maxops = NFSD_MAX_OPS_PER_COMPOUND;
@@ -478,9 +501,12 @@ static int init_forechannel_attrs(struct svc_rqst *rqstp,
* recover pages from existing sessions. For now fail session
* creation.
*/
- status = set_forechannel_maxreqs(fchan);
+ status = set_forechannel_drc_size(fchan);

+ session_fchan->maxresp_cached = fchan->maxresp_cached;
session_fchan->maxreqs = fchan->maxreqs;
+
+ dprintk("%s status %d\n", __func__, status);
return status;
}

diff --git a/include/linux/nfsd/state.h b/include/linux/nfsd/state.h
index fb0c404..ff0b771 100644
--- a/include/linux/nfsd/state.h
+++ b/include/linux/nfsd/state.h
@@ -92,13 +92,17 @@ struct nfs4_cb_conn {
struct rpc_cred * cb_cred;
};

-/* Maximum number of slots per session. 128 is useful for long haul TCP */
-#define NFSD_MAX_SLOTS_PER_SESSION 128
+/* Maximum number of slots per session. 160 is useful for long haul TCP */
+#define NFSD_MAX_SLOTS_PER_SESSION 160
/* Maximum number of pages per slot cache entry */
#define NFSD_PAGES_PER_SLOT 1
#define NFSD_SLOT_CACHE_SIZE PAGE_SIZE
/* Maximum number of operations per session compound */
#define NFSD_MAX_OPS_PER_COMPOUND 16
+/* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
+#define NFSD_CACHE_SIZE_SLOTS_PER_SESSION 32
+#define NFSD_MAX_MEM_PER_SESSION \
+ (NFSD_CACHE_SIZE_SLOTS_PER_SESSION * NFSD_SLOT_CACHE_SIZE)

struct nfsd4_cache_entry {
__be32 ce_status;
--
1.6.2.5


2009-08-27 16:07:28

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 3/5] nfsd41: use session maxreqs for sequence target and highest slotid

From: Andy Adamson <[email protected]>

This fixes a bug in the sequence operation reply.

The sequence operation returns the highest slotid it will accept in the future
in sr_highest_slotid, and the highest slotid it prefers the client to use in
sr_target_highest_slotid.
Since we do not re-negotiate the session slot table yet, these should both
always be set to the session ca_maxrequests.

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfsd/nfs4state.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a691139..ada10ad 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1129,7 +1129,6 @@ nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
* session inactivity timer fires and a solo sequence operation
* is sent (lease renewal).
*/
- seq->maxslots = resp->cstate.session->se_fchannel.maxreqs;

/* Either returns 0 or nfserr_retry_uncached */
status = nfsd4_enc_sequence_replay(resp->rqstp->rq_argp, resp);
@@ -1493,6 +1492,11 @@ nfsd4_sequence(struct svc_rqst *rqstp,
slot = &session->se_slots[seq->slotid];
dprintk("%s: slotid %d\n", __func__, seq->slotid);

+ /* We do not negotiate the number of slots yet, so set the
+ * maxslots to the session maxreqs which is used to encode
+ * sr_highest_slotid and the sr_target_slot id to maxslots */
+ seq->maxslots = session->se_fchannel.maxreqs;
+
status = check_slot_seqid(seq->seqid, slot->sl_seqid, slot->sl_inuse);
if (status == nfserr_replay_cache) {
cstate->slot = slot;
--
1.6.2.5


2009-08-27 16:07:29

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 4/5] nfsd41: replace nfserr_resource in pure nfs41 responses

From: Andy Adamson <[email protected]>

nfserr_resource is not a legal error for NFSv4.1. Replace it with
nfserr_serverfault for EXCHANGE_ID and CREATE_SESSION processing.

We will also need to map nfserr_resource to other errors in routines shared
by NFSv4.0 and NFSv4.1

Signed-off-by: Andy Adamson <[email protected]>
---
fs/nfsd/nfs4state.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ada10ad..4695cec 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -515,7 +515,7 @@ alloc_init_session(struct svc_rqst *rqstp, struct nfs4_client *clp,
struct nfsd4_create_session *cses)
{
struct nfsd4_session *new, tmp;
- int idx, status = nfserr_resource, slotsize;
+ int idx, status = nfserr_serverfault, slotsize;

memset(&tmp, 0, sizeof(tmp));

@@ -1278,7 +1278,7 @@ out_new:
/* Normal case */
new = create_client(exid->clname, dname);
if (new == NULL) {
- status = nfserr_resource;
+ status = nfserr_serverfault;
goto out;
}

--
1.6.2.5


2009-08-27 16:07:30

by Andy Adamson

[permalink] [raw]
Subject: [PATCH 5/5] nfsd41: replace page based DRC with buffer based DRC

From: Andy Adamson <[email protected]>

Use NFSD_SLOT_CACHE_SIZE size buffers for sessions DRC instead of holding nfsd
pages in cache.

Connectathon testing has shown that 1024 bytes for encoded compound operation
responses past the sequence operation is sufficient; 512 bytes is a little too
small. Set NFSD_SLOT_CACHE_SIZE to 1024.

Allocate memory for the session DRC in the CREATE_SESSION operation
to guarantee that the memory resource is available for caching responses.
Allocate each slot individually in preparation for slot table size negotiation.

Remove struct nfsd4_cache_entry and helper functions for the old page-based
DRC.

The iov_len calculation in nfs4svc_encode_compoundres is now always
correct; clean up the nfs4svc_encode_compoundres session logic.

The nfsd4_compound_state statp pointer is also not used.
Remove nfsd4_set_statp().

Move useful nfsd4_cache_entry fields into nfsd4_slot.

Signed-off-by: Andy Adamson <[email protected]
---
fs/nfsd/nfs4state.c | 207 ++++++++++++--------------------------------
fs/nfsd/nfs4xdr.c | 13 ++--
fs/nfsd/nfssvc.c | 4 -
include/linux/nfsd/state.h | 27 ++----
include/linux/nfsd/xdr4.h | 5 +-
5 files changed, 74 insertions(+), 182 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4695cec..2d72d5c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -510,12 +510,22 @@ static int init_forechannel_attrs(struct svc_rqst *rqstp,
return status;
}

+static void
+free_session_slots(struct nfsd4_session *ses)
+{
+ int i;
+
+ for (i = 0; i < ses->se_fchannel.maxreqs; i++)
+ kfree(ses->se_slots[i]);
+}
+
static int
alloc_init_session(struct svc_rqst *rqstp, struct nfs4_client *clp,
struct nfsd4_create_session *cses)
{
struct nfsd4_session *new, tmp;
- int idx, status = nfserr_serverfault, slotsize;
+ struct nfsd4_slot *sp;
+ int idx, status = nfserr_serverfault, slotsize, cachesize, i;

memset(&tmp, 0, sizeof(tmp));

@@ -526,14 +536,23 @@ alloc_init_session(struct svc_rqst *rqstp, struct nfs4_client *clp,
if (status)
goto out;

- /* allocate struct nfsd4_session and slot table in one piece */
- slotsize = tmp.se_fchannel.maxreqs * sizeof(struct nfsd4_slot);
+ /* allocate struct nfsd4_session and slot table pointers in one piece */
+ slotsize = tmp.se_fchannel.maxreqs * sizeof(struct nfsd4_slot *);
new = kzalloc(sizeof(*new) + slotsize, GFP_KERNEL);
if (!new)
goto out;

memcpy(new, &tmp, sizeof(*new));

+ /* allocate each struct nfsd4_slot and data cache in one piece */
+ cachesize = new->se_fchannel.maxresp_cached - NFSD_MIN_HDR_SEQ_SZ;
+ for (i = 0; i < new->se_fchannel.maxreqs; i++) {
+ sp = kzalloc(sizeof(*sp) + cachesize, GFP_KERNEL);
+ if (!sp)
+ goto out_free;
+ new->se_slots[i] = sp;
+ }
+
new->se_client = clp;
gen_sessionid(new);
idx = hash_sessionid(&new->se_sessionid);
@@ -550,6 +569,10 @@ alloc_init_session(struct svc_rqst *rqstp, struct nfs4_client *clp,
status = nfs_ok;
out:
return status;
+out_free:
+ free_session_slots(new);
+ kfree(new);
+ goto out;
}

/* caller must hold sessionid_lock */
@@ -592,22 +615,16 @@ release_session(struct nfsd4_session *ses)
nfsd4_put_session(ses);
}

-static void nfsd4_release_respages(struct page **respages, short resused);
-
void
free_session(struct kref *kref)
{
struct nfsd4_session *ses;
- int i;

ses = container_of(kref, struct nfsd4_session, se_ref);
- for (i = 0; i < ses->se_fchannel.maxreqs; i++) {
- struct nfsd4_cache_entry *e = &ses->se_slots[i].sl_cache_entry;
- nfsd4_release_respages(e->ce_respages, e->ce_resused);
- }
spin_lock(&nfsd_drc_lock);
nfsd_drc_mem_used -= ses->se_fchannel.maxreqs * NFSD_SLOT_CACHE_SIZE;
spin_unlock(&nfsd_drc_lock);
+ free_session_slots(ses);
kfree(ses);
}

@@ -964,116 +981,32 @@ out_err:
return;
}

-void
-nfsd4_set_statp(struct svc_rqst *rqstp, __be32 *statp)
-{
- struct nfsd4_compoundres *resp = rqstp->rq_resp;
-
- resp->cstate.statp = statp;
-}
-
-/*
- * Dereference the result pages.
- */
-static void
-nfsd4_release_respages(struct page **respages, short resused)
-{
- int i;
-
- dprintk("--> %s\n", __func__);
- for (i = 0; i < resused; i++) {
- if (!respages[i])
- continue;
- put_page(respages[i]);
- respages[i] = NULL;
- }
-}
-
-static void
-nfsd4_copy_pages(struct page **topages, struct page **frompages, short count)
-{
- int i;
-
- for (i = 0; i < count; i++) {
- topages[i] = frompages[i];
- if (!topages[i])
- continue;
- get_page(topages[i]);
- }
-}
-
/*
- * Cache the reply pages up to NFSD_PAGES_PER_SLOT + 1, clearing the previous
- * pages. We add a page to NFSD_PAGES_PER_SLOT for the case where the total
- * length of the XDR response is less than se_fmaxresp_cached
- * (NFSD_PAGES_PER_SLOT * PAGE_SIZE) but the xdr_buf pages is used for a
- * of the reply (e.g. readdir).
- *
- * Store the base and length of the rq_req.head[0] page
- * of the NFSv4.1 data, just past the rpc header.
+ * Cache a reply. nfsd4_check_drc_limit() has bounded the cache size.
*/
void
nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
{
- struct nfsd4_cache_entry *entry = &resp->cstate.slot->sl_cache_entry;
- struct svc_rqst *rqstp = resp->rqstp;
- struct kvec *resv = &rqstp->rq_res.head[0];
-
- dprintk("--> %s entry %p\n", __func__, entry);
+ struct nfsd4_slot *slot = resp->cstate.slot;
+ unsigned int base;

- nfsd4_release_respages(entry->ce_respages, entry->ce_resused);
- entry->ce_opcnt = resp->opcnt;
- entry->ce_status = resp->cstate.status;
+ dprintk("--> %s slot %p\n", __func__, slot);

- /*
- * Don't need a page to cache just the sequence operation - the slot
- * does this for us!
- */
+ slot->sl_opcnt = resp->opcnt;
+ slot->sl_status = resp->cstate.status;

if (nfsd4_not_cached(resp)) {
- entry->ce_resused = 0;
- entry->ce_rpchdrlen = 0;
- dprintk("%s Just cache SEQUENCE. ce_cachethis %d\n", __func__,
- resp->cstate.slot->sl_cache_entry.ce_cachethis);
+ slot->sl_datalen = 0;
return;
}
- entry->ce_resused = rqstp->rq_resused;
- if (entry->ce_resused > NFSD_PAGES_PER_SLOT + 1)
- entry->ce_resused = NFSD_PAGES_PER_SLOT + 1;
- nfsd4_copy_pages(entry->ce_respages, rqstp->rq_respages,
- entry->ce_resused);
- entry->ce_datav.iov_base = resp->cstate.statp;
- entry->ce_datav.iov_len = resv->iov_len - ((char *)resp->cstate.statp -
- (char *)page_address(rqstp->rq_respages[0]));
- /* Current request rpc header length*/
- entry->ce_rpchdrlen = (char *)resp->cstate.statp -
- (char *)page_address(rqstp->rq_respages[0]);
-}
-
-/*
- * We keep the rpc header, but take the nfs reply from the replycache.
- */
-static int
-nfsd41_copy_replay_data(struct nfsd4_compoundres *resp,
- struct nfsd4_cache_entry *entry)
-{
- struct svc_rqst *rqstp = resp->rqstp;
- struct kvec *resv = &resp->rqstp->rq_res.head[0];
- int len;
-
- /* Current request rpc header length*/
- len = (char *)resp->cstate.statp -
- (char *)page_address(rqstp->rq_respages[0]);
- if (entry->ce_datav.iov_len + len > PAGE_SIZE) {
- dprintk("%s v41 cached reply too large (%Zd).\n", __func__,
- entry->ce_datav.iov_len);
- return 0;
- }
- /* copy the cached reply nfsd data past the current rpc header */
- memcpy((char *)resv->iov_base + len, entry->ce_datav.iov_base,
- entry->ce_datav.iov_len);
- resv->iov_len = len + entry->ce_datav.iov_len;
- return 1;
+ slot->sl_datalen = (char *)resp->p - (char *)resp->cstate.datap;
+ base = (char *)resp->cstate.datap -
+ (char *)resp->xbuf->head[0].iov_base;
+ if (read_bytes_from_xdr_buf(resp->xbuf, base, slot->sl_data,
+ slot->sl_datalen))
+ printk(KERN_WARNING
+ "nfsd: sessions DRC could not cache compound\n");
+ return;
}

/*
@@ -1091,14 +1024,14 @@ nfsd4_enc_sequence_replay(struct nfsd4_compoundargs *args,
struct nfsd4_slot *slot = resp->cstate.slot;

dprintk("--> %s resp->opcnt %d cachethis %u \n", __func__,
- resp->opcnt, resp->cstate.slot->sl_cache_entry.ce_cachethis);
+ resp->opcnt, resp->cstate.slot->sl_cachethis);

/* Encode the replayed sequence operation */
op = &args->ops[resp->opcnt - 1];
nfsd4_encode_operation(resp, op);

/* Return nfserr_retry_uncached_rep in next operation. */
- if (args->opcnt > 1 && slot->sl_cache_entry.ce_cachethis == 0) {
+ if (args->opcnt > 1 && slot->sl_cachethis == 0) {
op = &args->ops[resp->opcnt++];
op->status = nfserr_retry_uncached_rep;
nfsd4_encode_operation(resp, op);
@@ -1107,57 +1040,29 @@ nfsd4_enc_sequence_replay(struct nfsd4_compoundargs *args,
}

/*
- * Keep the first page of the replay. Copy the NFSv4.1 data from the first
- * cached page. Replace any futher replay pages from the cache.
+ * The sequence operation is not cached because we can use the slot and
+ * session values.
*/
__be32
nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
struct nfsd4_sequence *seq)
{
- struct nfsd4_cache_entry *entry = &resp->cstate.slot->sl_cache_entry;
+ struct nfsd4_slot *slot = resp->cstate.slot;
__be32 status;

- dprintk("--> %s entry %p\n", __func__, entry);
-
- /*
- * If this is just the sequence operation, we did not keep
- * a page in the cache entry because we can just use the
- * slot info stored in struct nfsd4_sequence that was checked
- * against the slot in nfsd4_sequence().
- *
- * This occurs when seq->cachethis is FALSE, or when the client
- * session inactivity timer fires and a solo sequence operation
- * is sent (lease renewal).
- */
+ dprintk("--> %s slot %p\n", __func__, slot);

/* Either returns 0 or nfserr_retry_uncached */
status = nfsd4_enc_sequence_replay(resp->rqstp->rq_argp, resp);
if (status == nfserr_retry_uncached_rep)
return status;

- if (!nfsd41_copy_replay_data(resp, entry)) {
- /*
- * Not enough room to use the replay rpc header, send the
- * cached header. Release all the allocated result pages.
- */
- svc_free_res_pages(resp->rqstp);
- nfsd4_copy_pages(resp->rqstp->rq_respages, entry->ce_respages,
- entry->ce_resused);
- } else {
- /* Release all but the first allocated result page */
-
- resp->rqstp->rq_resused--;
- svc_free_res_pages(resp->rqstp);
-
- nfsd4_copy_pages(&resp->rqstp->rq_respages[1],
- &entry->ce_respages[1],
- entry->ce_resused - 1);
- }
+ /* The sequence operation has been encoded, cstate->datap set. */
+ memcpy(resp->cstate.datap, slot->sl_data, slot->sl_datalen);

- resp->rqstp->rq_resused = entry->ce_resused;
- resp->opcnt = entry->ce_opcnt;
- resp->cstate.iovlen = entry->ce_datav.iov_len + entry->ce_rpchdrlen;
- status = entry->ce_status;
+ resp->opcnt = slot->sl_opcnt;
+ resp->p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
+ status = slot->sl_status;

return status;
}
@@ -1489,7 +1394,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
if (seq->slotid >= session->se_fchannel.maxreqs)
goto out;

- slot = &session->se_slots[seq->slotid];
+ slot = session->se_slots[seq->slotid];
dprintk("%s: slotid %d\n", __func__, seq->slotid);

/* We do not negotiate the number of slots yet, so set the
@@ -1502,7 +1407,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
cstate->slot = slot;
cstate->session = session;
/* Return the cached reply status and set cstate->status
- * for nfsd4_svc_encode_compoundres processing */
+ * for nfsd4_proc_compound processing */
status = nfsd4_replay_cache_entry(resp, seq);
cstate->status = nfserr_replay_cache;
goto replay_cache;
@@ -1513,7 +1418,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
/* Success! bump slot seqid */
slot->sl_inuse = true;
slot->sl_seqid = seq->seqid;
- slot->sl_cache_entry.ce_cachethis = seq->cachethis;
+ slot->sl_cachethis = seq->cachethis;

cstate->slot = slot;
cstate->session = session;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index fdf632b..49824ea 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3064,6 +3064,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, int nfserr,
WRITE32(0);

ADJUST_ARGS();
+ resp->cstate.datap = p; /* DRC cache data pointer */
return 0;
}

@@ -3166,7 +3167,7 @@ static int nfsd4_check_drc_limit(struct nfsd4_compoundres *resp)
return status;

session = resp->cstate.session;
- if (session == NULL || slot->sl_cache_entry.ce_cachethis == 0)
+ if (session == NULL || slot->sl_cachethis == 0)
return status;

if (resp->opcnt >= args->opcnt)
@@ -3291,6 +3292,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
/*
* All that remains is to write the tag and operation count...
*/
+ struct nfsd4_compound_state *cs = &resp->cstate;
struct kvec *iov;
p = resp->tagp;
*p++ = htonl(resp->taglen);
@@ -3304,14 +3306,11 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
iov = &rqstp->rq_res.head[0];
iov->iov_len = ((char*)resp->p) - (char*)iov->iov_base;
BUG_ON(iov->iov_len > PAGE_SIZE);
- if (nfsd4_has_session(&resp->cstate)) {
- if (resp->cstate.status == nfserr_replay_cache &&
- !nfsd4_not_cached(resp)) {
- iov->iov_len = resp->cstate.iovlen;
- } else {
+ if (nfsd4_has_session(cs)) {
+ if (cs->status != nfserr_replay_cache) {
nfsd4_store_cache_entry(resp);
dprintk("%s: SET SLOT STATE TO AVAILABLE\n", __func__);
- resp->cstate.slot->sl_inuse = 0;
+ resp->cstate.slot->sl_inuse = false;
}
nfsd4_put_session(resp->cstate.session);
}
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index d68cd05..944ef01 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -576,10 +576,6 @@ nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp)
+ rqstp->rq_res.head[0].iov_len;
rqstp->rq_res.head[0].iov_len += sizeof(__be32);

- /* NFSv4.1 DRC requires statp */
- if (rqstp->rq_vers == 4)
- nfsd4_set_statp(rqstp, statp);
-
/* Now call the procedure handler, and encode NFS status. */
nfserr = proc->pc_func(rqstp, rqstp->rq_argp, rqstp->rq_resp);
nfserr = map_new_errors(rqstp->rq_vers, nfserr);
diff --git a/include/linux/nfsd/state.h b/include/linux/nfsd/state.h
index ff0b771..e745100 100644
--- a/include/linux/nfsd/state.h
+++ b/include/linux/nfsd/state.h
@@ -94,30 +94,23 @@ struct nfs4_cb_conn {

/* Maximum number of slots per session. 160 is useful for long haul TCP */
#define NFSD_MAX_SLOTS_PER_SESSION 160
-/* Maximum number of pages per slot cache entry */
-#define NFSD_PAGES_PER_SLOT 1
-#define NFSD_SLOT_CACHE_SIZE PAGE_SIZE
/* Maximum number of operations per session compound */
#define NFSD_MAX_OPS_PER_COMPOUND 16
+/* Maximum session per slot cache size */
+#define NFSD_SLOT_CACHE_SIZE 1024
/* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
#define NFSD_CACHE_SIZE_SLOTS_PER_SESSION 32
#define NFSD_MAX_MEM_PER_SESSION \
(NFSD_CACHE_SIZE_SLOTS_PER_SESSION * NFSD_SLOT_CACHE_SIZE)

-struct nfsd4_cache_entry {
- __be32 ce_status;
- struct kvec ce_datav; /* encoded NFSv4.1 data in rq_res.head[0] */
- struct page *ce_respages[NFSD_PAGES_PER_SLOT + 1];
- int ce_cachethis;
- short ce_resused;
- int ce_opcnt;
- int ce_rpchdrlen;
-};
-
struct nfsd4_slot {
- bool sl_inuse;
- u32 sl_seqid;
- struct nfsd4_cache_entry sl_cache_entry;
+ bool sl_inuse;
+ u32 sl_seqid;
+ int sl_cachethis;
+ int sl_opcnt;
+ __be32 sl_status;
+ u32 sl_datalen;
+ char sl_data[];
};

struct nfsd4_channel_attrs {
@@ -159,7 +152,7 @@ struct nfsd4_session {
struct nfs4_sessionid se_sessionid;
struct nfsd4_channel_attrs se_fchannel;
struct nfsd4_channel_attrs se_bchannel;
- struct nfsd4_slot se_slots[]; /* forward channel slots */
+ struct nfsd4_slot *se_slots[]; /* forward channel slots */
};

static inline void
diff --git a/include/linux/nfsd/xdr4.h b/include/linux/nfsd/xdr4.h
index 3f71660..73164c2 100644
--- a/include/linux/nfsd/xdr4.h
+++ b/include/linux/nfsd/xdr4.h
@@ -51,7 +51,7 @@ struct nfsd4_compound_state {
/* For sessions DRC */
struct nfsd4_session *session;
struct nfsd4_slot *slot;
- __be32 *statp;
+ __be32 *datap;
size_t iovlen;
u32 minorversion;
u32 status;
@@ -472,8 +472,7 @@ static inline bool nfsd4_is_solo_sequence(struct nfsd4_compoundres *resp)

static inline bool nfsd4_not_cached(struct nfsd4_compoundres *resp)
{
- return !resp->cstate.slot->sl_cache_entry.ce_cachethis ||
- nfsd4_is_solo_sequence(resp);
+ return !resp->cstate.slot->sl_cachethis || nfsd4_is_solo_sequence(resp);
}

#define NFS4_SVC_XDRSIZE sizeof(struct nfsd4_compoundargs)
--
1.6.2.5


2009-08-27 17:12:14

by Andy Adamson

[permalink] [raw]
Subject: Re: [PATCH 2/5] nfsd41: bound forechannel drc size by memory usage

On Thu, Aug 27, 2009 at 12:07 PM, <[email protected]> wrote:
> From: Andy Adamson <[email protected]>
>
> By using the requested ca_maxresponsesize_cached * ca_maxresponses to bound
> a forechannel drc request size, clients can tailor a session to usage.
>
> For example, an I/O session (READ/WRITE only) can have a much smaller
> ca_maxresponsesize_cached (for only WRITE compound responses) and a lot larger
> ca_maxresponses to service a large in-flight data window.
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
>  fs/nfsd/nfs4state.c        |   60 +++++++++++++++++++++++++++++------------
>  include/linux/nfsd/state.h |    8 ++++-
>  2 files changed, 49 insertions(+), 19 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index ddfd36f..a691139 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -414,34 +414,60 @@ gen_sessionid(struct nfsd4_session *ses)
>  }
>
>  /*
> - * Give the client the number of slots it requests bound by
> - * NFSD_MAX_SLOTS_PER_SESSION and by nfsd_drc_max_mem.
> + * 32 bytes of RPC header and 44 bytes of sequence operation response
> + * not included in NFSD_SLOT_CACHE_SIZE
> + * */
> +#define NFSD_MIN_HDR_SEQ_SZ  (32 + 44)
> +
> +/*
> + * Give the client the number of ca_maxresponsesize_cached slots it requests
> + * bounded by NFSD_SLOT_CACHE_SIZE, NFSD_MAX_MEM_PER_SESSION and by
> + * nfsd_drc_max_mem. Do not allow more than NFSD_MAX_SLOTS_PER_SESSION.
> + *
> + * The ca_maxresponssize_cached definition includes the size
> + * of the rpc header with the variable length security flavor credential
> + * plus verifier (32 bytes with AUTH_SYS and NULL verifier)
> + * as well as the encoded SEQUENCE operation response (44 bytes)
> + * which are not included in NFSD_SLOT_CACHE_SIZE.
> + * We err on the side of being a bit small when AUTH_SYS/NULL verifier
> + * is not used.
>  *
>  * If we run out of reserved DRC memory we should (up to a point) re-negotiate
>  * active sessions and reduce their slot usage to make rooom for new
>  * connections. For now we just fail the create session.
>  */
> -static int set_forechannel_maxreqs(struct nfsd4_channel_attrs *fchan)
> +static int set_forechannel_drc_size(struct nfsd4_channel_attrs *fchan)
>  {
> -       int mem;
> +       int mem, size = fchan->maxresp_cached - NFSD_MIN_HDR_SEQ_SZ;
>
> -       if (fchan->maxreqs < 1)
> +       if (fchan->maxreqs < 1 || size <= 0)
>                return nfserr_inval;
> -       else if (fchan->maxreqs > NFSD_MAX_SLOTS_PER_SESSION)
> -               fchan->maxreqs = NFSD_MAX_SLOTS_PER_SESSION;
>
> -       mem = fchan->maxreqs * NFSD_SLOT_CACHE_SIZE;
> +       if (size > NFSD_SLOT_CACHE_SIZE)
> +               size = NFSD_SLOT_CACHE_SIZE;
> +
> +       /* bound the maxreqs by NFSD_MAX_MEM_PER_SESSION */
> +       mem = fchan->maxreqs * size;
> +       if (mem > NFSD_MAX_MEM_PER_SESSION) {
> +               fchan->maxreqs = NFSD_MAX_MEM_PER_SESSION / size;
> +               if (fchan->maxreqs > NFSD_MAX_SLOTS_PER_SESSION)
> +                       fchan->maxreqs = NFSD_MAX_SLOTS_PER_SESSION;
> +               mem = fchan->maxreqs * size;
> +       }
>
>        spin_lock(&nfsd_drc_lock);
> -       if (mem + nfsd_drc_mem_used > nfsd_drc_max_mem)
> -               mem = ((nfsd_drc_max_mem - nfsd_drc_mem_used) /
> -                               NFSD_SLOT_CACHE_SIZE) * NFSD_SLOT_CACHE_SIZE;
> +       /* bound the total session drc memory ussage */
> +       if (mem + nfsd_drc_mem_used > nfsd_drc_max_mem) {
> +               fchan->maxreqs = (nfsd_drc_max_mem - nfsd_drc_mem_used) / size;
> +               mem = fchan->maxreqs * size;
> +       }
>        nfsd_drc_mem_used += mem;
>        spin_unlock(&nfsd_drc_lock);
>
> -       fchan->maxreqs = mem / NFSD_SLOT_CACHE_SIZE;
>        if (fchan->maxreqs == 0)
> -               return nfserr_resource;
> +               return nfserr_serverfault;
^^^^^^^^^^^^^^^^^^^
This change belongs in patch [PATCH 4/5] nfsd41: replace
nfserr_resource in pure nfs41 responses

-->Andy

> +
> +       fchan->maxresp_cached = size + NFSD_MIN_HDR_SEQ_SZ;
>        return 0;
>  }
>
> @@ -466,9 +492,6 @@ static int init_forechannel_attrs(struct svc_rqst *rqstp,
>                fchan->maxresp_sz = maxcount;
>        session_fchan->maxresp_sz = fchan->maxresp_sz;
>
> -       session_fchan->maxresp_cached = NFSD_SLOT_CACHE_SIZE;
> -       fchan->maxresp_cached = session_fchan->maxresp_cached;
> -
>        /* Use the client's maxops if possible */
>        if (fchan->maxops > NFSD_MAX_OPS_PER_COMPOUND)
>                fchan->maxops = NFSD_MAX_OPS_PER_COMPOUND;
> @@ -478,9 +501,12 @@ static int init_forechannel_attrs(struct svc_rqst *rqstp,
>         * recover pages from existing sessions. For now fail session
>         * creation.
>         */
> -       status = set_forechannel_maxreqs(fchan);
> +       status = set_forechannel_drc_size(fchan);
>
> +       session_fchan->maxresp_cached = fchan->maxresp_cached;
>        session_fchan->maxreqs = fchan->maxreqs;
> +
> +       dprintk("%s status %d\n", __func__, status);
>        return status;
>  }
>
> diff --git a/include/linux/nfsd/state.h b/include/linux/nfsd/state.h
> index fb0c404..ff0b771 100644
> --- a/include/linux/nfsd/state.h
> +++ b/include/linux/nfsd/state.h
> @@ -92,13 +92,17 @@ struct nfs4_cb_conn {
>        struct rpc_cred *       cb_cred;
>  };
>
> -/* Maximum number of slots per session. 128 is useful for long haul TCP */
> -#define NFSD_MAX_SLOTS_PER_SESSION     128
> +/* Maximum number of slots per session. 160 is useful for long haul TCP */
> +#define NFSD_MAX_SLOTS_PER_SESSION     160
>  /* Maximum number of pages per slot cache entry */
>  #define NFSD_PAGES_PER_SLOT    1
>  #define NFSD_SLOT_CACHE_SIZE           PAGE_SIZE
>  /* Maximum number of operations per session compound */
>  #define NFSD_MAX_OPS_PER_COMPOUND      16
> +/* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
> +#define NFSD_CACHE_SIZE_SLOTS_PER_SESSION      32
> +#define NFSD_MAX_MEM_PER_SESSION  \
> +               (NFSD_CACHE_SIZE_SLOTS_PER_SESSION * NFSD_SLOT_CACHE_SIZE)
>
>  struct nfsd4_cache_entry {
>        __be32          ce_status;
> --
> 1.6.2.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

2009-08-27 21:42:43

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/5] nfsd41: expand solo sequence check

On Thu, Aug 27, 2009 at 12:07:40PM -0400, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> Failed sequence == solo sequence.
> Remove redundant sequence operation cache checks.

Applied, thanks.

--b.

>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 9 ---------
> include/linux/nfsd/xdr4.h | 2 +-
> 2 files changed, 1 insertions(+), 10 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index d2a0524..ddfd36f 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -991,16 +991,10 @@ nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
> {
> struct nfsd4_cache_entry *entry = &resp->cstate.slot->sl_cache_entry;
> struct svc_rqst *rqstp = resp->rqstp;
> - struct nfsd4_compoundargs *args = rqstp->rq_argp;
> - struct nfsd4_op *op = &args->ops[resp->opcnt];
> struct kvec *resv = &rqstp->rq_res.head[0];
>
> dprintk("--> %s entry %p\n", __func__, entry);
>
> - /* Don't cache a failed OP_SEQUENCE. */
> - if (resp->opcnt == 1 && op->opnum == OP_SEQUENCE && resp->cstate.status)
> - return;
> -
> nfsd4_release_respages(entry->ce_respages, entry->ce_resused);
> entry->ce_opcnt = resp->opcnt;
> entry->ce_status = resp->cstate.status;
> @@ -1490,9 +1484,6 @@ nfsd4_sequence(struct svc_rqst *rqstp,
> slot->sl_inuse = true;
> slot->sl_seqid = seq->seqid;
> slot->sl_cache_entry.ce_cachethis = seq->cachethis;
> - /* Always set the cache entry cachethis for solo sequence */
> - if (nfsd4_is_solo_sequence(resp))
> - slot->sl_cache_entry.ce_cachethis = 1;
>
> cstate->slot = slot;
> cstate->session = session;
> diff --git a/include/linux/nfsd/xdr4.h b/include/linux/nfsd/xdr4.h
> index 5e4beb0..3f71660 100644
> --- a/include/linux/nfsd/xdr4.h
> +++ b/include/linux/nfsd/xdr4.h
> @@ -467,7 +467,7 @@ struct nfsd4_compoundres {
> static inline bool nfsd4_is_solo_sequence(struct nfsd4_compoundres *resp)
> {
> struct nfsd4_compoundargs *args = resp->rqstp->rq_argp;
> - return args->opcnt == 1;
> + return resp->opcnt == 1 && args->ops[0].opnum == OP_SEQUENCE;
> }
>
> static inline bool nfsd4_not_cached(struct nfsd4_compoundres *resp)
> --
> 1.6.2.5
>

2009-08-28 20:41:14

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/5] nfsd41: bound forechannel drc size by memory usage

Looks basically fine, but there are a few nitpicky problems:

On Thu, Aug 27, 2009 at 12:07:41PM -0400, [email protected] wrote:
> From: Andy Adamson <[email protected]>
>
> By using the requested ca_maxresponsesize_cached * ca_maxresponses to bound
> a forechannel drc request size, clients can tailor a session to usage.
>
> For example, an I/O session (READ/WRITE only) can have a much smaller
> ca_maxresponsesize_cached (for only WRITE compound responses) and a lot larger
> ca_maxresponses to service a large in-flight data window.
>
> Signed-off-by: Andy Adamson <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 60 +++++++++++++++++++++++++++++++------------
> include/linux/nfsd/state.h | 8 ++++-
> 2 files changed, 49 insertions(+), 19 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index ddfd36f..a691139 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -414,34 +414,60 @@ gen_sessionid(struct nfsd4_session *ses)
> }
>
> /*
> - * Give the client the number of slots it requests bound by
> - * NFSD_MAX_SLOTS_PER_SESSION and by nfsd_drc_max_mem.
> + * 32 bytes of RPC header and 44 bytes of sequence operation response
> + * not included in NFSD_SLOT_CACHE_SIZE
> + * */
> +#define NFSD_MIN_HDR_SEQ_SZ (32 + 44)

I took a look at a trace and got 24 for the rpc header (xid through
accept state), 12 for compound header (status, empty tag, opcount), and
44 for the sequence response (opcode through status flags), 80
together--could you double check this?

> +
> +/*
> + * Give the client the number of ca_maxresponsesize_cached slots it requests
> + * bounded by NFSD_SLOT_CACHE_SIZE, NFSD_MAX_MEM_PER_SESSION and by
> + * nfsd_drc_max_mem. Do not allow more than NFSD_MAX_SLOTS_PER_SESSION.
> + *
> + * The ca_maxresponssize_cached definition includes the size
> + * of the rpc header with the variable length security flavor credential
> + * plus verifier (32 bytes with AUTH_SYS and NULL verifier)

Note there's no credential in the response, just a verifier. My
attempt at these two comments:

/*
* The protocol defines ca_maxresponssize_cached to include the size of
* the rpc header, but all we need to cache is the data starting after
* the end of the initial SEQUENCE operation--the rest we regenerate
* each time. Therefore we can advertise a ca_maxresponssize_cached
* value that is the number of bytes in our cache plus a few additional
* bytes. In order to stay on the safe side, and not promise more than
* we can cache, those additional bytes must be the minimum possible: 24
* bytes of rpc header (xid through accept state, with AUTH_NULL
* verifier), 12 for the compound header (with zero-length tag), and 44
* for the SEQUENCE op response:
*/
#define NFSD_MIN_HDR_SEQ_SZ (24 + 12 + 44)

/*
* Give the client the number of ca_maxresponsesize_cached slots it
* requests, of size bounded by NFSD_SLOT_CACHE_SIZE,
* NFSD_MAX_MEM_PER_SESSION, and nfsd_drc_max_mem. Do not allow more
* than NFSD_MAX_SLOTS_PER_SESSION.
*
* If we run out of reserved DRC memory we should (up to a point)
* re-negotiate active sessions and reduce their slot usage to make
* rooom for new connections. For now we just fail the create session.
*/

Does that look right?

> + * as well as the encoded SEQUENCE operation response (44 bytes)
> + * which are not included in NFSD_SLOT_CACHE_SIZE.
> + * We err on the side of being a bit small when AUTH_SYS/NULL verifier
> + * is not used.
> *
> * If we run out of reserved DRC memory we should (up to a point) re-negotiate
> * active sessions and reduce their slot usage to make rooom for new
> * connections. For now we just fail the create session.
> */
> -static int set_forechannel_maxreqs(struct nfsd4_channel_attrs *fchan)
> +static int set_forechannel_drc_size(struct nfsd4_channel_attrs *fchan)
> {
> - int mem;
> + int mem, size = fchan->maxresp_cached - NFSD_MIN_HDR_SEQ_SZ;
>
> - if (fchan->maxreqs < 1)
> + if (fchan->maxreqs < 1 || size <= 0)
> return nfserr_inval;

Is it really illegal for the client to request a maxresp_cached less than
NFSD_MIN_HDR_SEQ_SIZE? I think this should just be rounded up to zero.

Also watch out for overflow: if the client gives an extremely large
value for maxresp_cached then we should round it down rather than doing
arithmetic that might result in overflow.

Simplest might be to first round maxresp_cached up to
NFSD_MIN_HDR_SEQ_SIZE, if it's less. *Then* subtract
NFSD_MIN_HDR_SEQ_SIZE. Then round down to NFSD_SLOT_CACHE_SIZE, if it's
too large. And keep all of this signed.

> - else if (fchan->maxreqs > NFSD_MAX_SLOTS_PER_SESSION)
> - fchan->maxreqs = NFSD_MAX_SLOTS_PER_SESSION;
>
> - mem = fchan->maxreqs * NFSD_SLOT_CACHE_SIZE;
> + if (size > NFSD_SLOT_CACHE_SIZE)
> + size = NFSD_SLOT_CACHE_SIZE;
> +
> + /* bound the maxreqs by NFSD_MAX_MEM_PER_SESSION */
> + mem = fchan->maxreqs * size;
> + if (mem > NFSD_MAX_MEM_PER_SESSION) {
> + fchan->maxreqs = NFSD_MAX_MEM_PER_SESSION / size;
> + if (fchan->maxreqs > NFSD_MAX_SLOTS_PER_SESSION)
> + fchan->maxreqs = NFSD_MAX_SLOTS_PER_SESSION;
> + mem = fchan->maxreqs * size;
> + }
>
> spin_lock(&nfsd_drc_lock);
> - if (mem + nfsd_drc_mem_used > nfsd_drc_max_mem)
> - mem = ((nfsd_drc_max_mem - nfsd_drc_mem_used) /
> - NFSD_SLOT_CACHE_SIZE) * NFSD_SLOT_CACHE_SIZE;
> + /* bound the total session drc memory ussage */
> + if (mem + nfsd_drc_mem_used > nfsd_drc_max_mem) {
> + fchan->maxreqs = (nfsd_drc_max_mem - nfsd_drc_mem_used) / size;
> + mem = fchan->maxreqs * size;
> + }
> nfsd_drc_mem_used += mem;
> spin_unlock(&nfsd_drc_lock);
>
> - fchan->maxreqs = mem / NFSD_SLOT_CACHE_SIZE;
> if (fchan->maxreqs == 0)
> - return nfserr_resource;
> + return nfserr_serverfault;

Remind me why serverfault and not resource here?

--b.

> +
> + fchan->maxresp_cached = size + NFSD_MIN_HDR_SEQ_SZ;
> return 0;
> }
>
> @@ -466,9 +492,6 @@ static int init_forechannel_attrs(struct svc_rqst *rqstp,
> fchan->maxresp_sz = maxcount;
> session_fchan->maxresp_sz = fchan->maxresp_sz;
>
> - session_fchan->maxresp_cached = NFSD_SLOT_CACHE_SIZE;
> - fchan->maxresp_cached = session_fchan->maxresp_cached;
> -
> /* Use the client's maxops if possible */
> if (fchan->maxops > NFSD_MAX_OPS_PER_COMPOUND)
> fchan->maxops = NFSD_MAX_OPS_PER_COMPOUND;
> @@ -478,9 +501,12 @@ static int init_forechannel_attrs(struct svc_rqst *rqstp,
> * recover pages from existing sessions. For now fail session
> * creation.
> */
> - status = set_forechannel_maxreqs(fchan);
> + status = set_forechannel_drc_size(fchan);
>
> + session_fchan->maxresp_cached = fchan->maxresp_cached;
> session_fchan->maxreqs = fchan->maxreqs;
> +
> + dprintk("%s status %d\n", __func__, status);
> return status;
> }
>
> diff --git a/include/linux/nfsd/state.h b/include/linux/nfsd/state.h
> index fb0c404..ff0b771 100644
> --- a/include/linux/nfsd/state.h
> +++ b/include/linux/nfsd/state.h
> @@ -92,13 +92,17 @@ struct nfs4_cb_conn {
> struct rpc_cred * cb_cred;
> };
>
> -/* Maximum number of slots per session. 128 is useful for long haul TCP */
> -#define NFSD_MAX_SLOTS_PER_SESSION 128
> +/* Maximum number of slots per session. 160 is useful for long haul TCP */
> +#define NFSD_MAX_SLOTS_PER_SESSION 160
> /* Maximum number of pages per slot cache entry */
> #define NFSD_PAGES_PER_SLOT 1
> #define NFSD_SLOT_CACHE_SIZE PAGE_SIZE
> /* Maximum number of operations per session compound */
> #define NFSD_MAX_OPS_PER_COMPOUND 16
> +/* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
> +#define NFSD_CACHE_SIZE_SLOTS_PER_SESSION 32
> +#define NFSD_MAX_MEM_PER_SESSION \
> + (NFSD_CACHE_SIZE_SLOTS_PER_SESSION * NFSD_SLOT_CACHE_SIZE)
>
> struct nfsd4_cache_entry {
> __be32 ce_status;
> --
> 1.6.2.5
>

2009-08-28 20:56:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/5] nfsd41: bound forechannel drc size by memory usage

On Fri, Aug 28, 2009 at 04:41:16PM -0400, bfields wrote:
> Looks basically fine, but there are a few nitpicky problems:
>
> On Thu, Aug 27, 2009 at 12:07:41PM -0400, [email protected] wrote:
> > From: Andy Adamson <[email protected]>
> >
> > By using the requested ca_maxresponsesize_cached * ca_maxresponses to bound
> > a forechannel drc request size, clients can tailor a session to usage.
> >
> > For example, an I/O session (READ/WRITE only) can have a much smaller
> > ca_maxresponsesize_cached (for only WRITE compound responses) and a lot larger
> > ca_maxresponses to service a large in-flight data window.
> >
> > Signed-off-by: Andy Adamson <[email protected]>
> > ---
> > fs/nfsd/nfs4state.c | 60 +++++++++++++++++++++++++++++++------------
> > include/linux/nfsd/state.h | 8 ++++-
> > 2 files changed, 49 insertions(+), 19 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index ddfd36f..a691139 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -414,34 +414,60 @@ gen_sessionid(struct nfsd4_session *ses)
> > }
> >
> > /*
> > - * Give the client the number of slots it requests bound by
> > - * NFSD_MAX_SLOTS_PER_SESSION and by nfsd_drc_max_mem.
> > + * 32 bytes of RPC header and 44 bytes of sequence operation response
> > + * not included in NFSD_SLOT_CACHE_SIZE
> > + * */
> > +#define NFSD_MIN_HDR_SEQ_SZ (32 + 44)
>
> I took a look at a trace and got 24 for the rpc header (xid through
> accept state), 12 for compound header (status, empty tag, opcount), and
> 44 for the sequence response (opcode through status flags), 80
> together--could you double check this?
>
> > +
> > +/*
> > + * Give the client the number of ca_maxresponsesize_cached slots it requests
> > + * bounded by NFSD_SLOT_CACHE_SIZE, NFSD_MAX_MEM_PER_SESSION and by
> > + * nfsd_drc_max_mem. Do not allow more than NFSD_MAX_SLOTS_PER_SESSION.
> > + *
> > + * The ca_maxresponssize_cached definition includes the size
> > + * of the rpc header with the variable length security flavor credential
> > + * plus verifier (32 bytes with AUTH_SYS and NULL verifier)
>
> Note there's no credential in the response, just a verifier. My
> attempt at these two comments:
>
> /*
> * The protocol defines ca_maxresponssize_cached to include the size of
> * the rpc header, but all we need to cache is the data starting after
> * the end of the initial SEQUENCE operation--the rest we regenerate
> * each time. Therefore we can advertise a ca_maxresponssize_cached
> * value that is the number of bytes in our cache plus a few additional
> * bytes. In order to stay on the safe side, and not promise more than
> * we can cache, those additional bytes must be the minimum possible: 24
> * bytes of rpc header (xid through accept state, with AUTH_NULL
> * verifier), 12 for the compound header (with zero-length tag), and 44
> * for the SEQUENCE op response:
> */
> #define NFSD_MIN_HDR_SEQ_SZ (24 + 12 + 44)
>
> /*
> * Give the client the number of ca_maxresponsesize_cached slots it
> * requests, of size bounded by NFSD_SLOT_CACHE_SIZE,
> * NFSD_MAX_MEM_PER_SESSION, and nfsd_drc_max_mem. Do not allow more
> * than NFSD_MAX_SLOTS_PER_SESSION.
> *
> * If we run out of reserved DRC memory we should (up to a point)
> * re-negotiate active sessions and reduce their slot usage to make
> * rooom for new connections. For now we just fail the create session.
> */
>
> Does that look right?
>
> > + * as well as the encoded SEQUENCE operation response (44 bytes)
> > + * which are not included in NFSD_SLOT_CACHE_SIZE.
> > + * We err on the side of being a bit small when AUTH_SYS/NULL verifier
> > + * is not used.
> > *
> > * If we run out of reserved DRC memory we should (up to a point) re-negotiate
> > * active sessions and reduce their slot usage to make rooom for new
> > * connections. For now we just fail the create session.
> > */
> > -static int set_forechannel_maxreqs(struct nfsd4_channel_attrs *fchan)
> > +static int set_forechannel_drc_size(struct nfsd4_channel_attrs *fchan)
> > {
> > - int mem;
> > + int mem, size = fchan->maxresp_cached - NFSD_MIN_HDR_SEQ_SZ;
> >
> > - if (fchan->maxreqs < 1)
> > + if (fchan->maxreqs < 1 || size <= 0)
> > return nfserr_inval;
>
> Is it really illegal for the client to request a maxresp_cached less than
> NFSD_MIN_HDR_SEQ_SIZE? I think this should just be rounded up to zero.
>
> Also watch out for overflow: if the client gives an extremely large
> value for maxresp_cached then we should round it down rather than doing
> arithmetic that might result in overflow.
>
> Simplest might be to first round maxresp_cached up to
> NFSD_MIN_HDR_SEQ_SIZE, if it's less. *Then* subtract
> NFSD_MIN_HDR_SEQ_SIZE. Then round down to NFSD_SLOT_CACHE_SIZE, if it's
> too large. And keep all of this signed.
>
> > - else if (fchan->maxreqs > NFSD_MAX_SLOTS_PER_SESSION)
> > - fchan->maxreqs = NFSD_MAX_SLOTS_PER_SESSION;
> >
> > - mem = fchan->maxreqs * NFSD_SLOT_CACHE_SIZE;
> > + if (size > NFSD_SLOT_CACHE_SIZE)
> > + size = NFSD_SLOT_CACHE_SIZE;
> > +
> > + /* bound the maxreqs by NFSD_MAX_MEM_PER_SESSION */
> > + mem = fchan->maxreqs * size;
> > + if (mem > NFSD_MAX_MEM_PER_SESSION) {
> > + fchan->maxreqs = NFSD_MAX_MEM_PER_SESSION / size;
> > + if (fchan->maxreqs > NFSD_MAX_SLOTS_PER_SESSION)
> > + fchan->maxreqs = NFSD_MAX_SLOTS_PER_SESSION;
> > + mem = fchan->maxreqs * size;
> > + }
> >
> > spin_lock(&nfsd_drc_lock);
> > - if (mem + nfsd_drc_mem_used > nfsd_drc_max_mem)
> > - mem = ((nfsd_drc_max_mem - nfsd_drc_mem_used) /
> > - NFSD_SLOT_CACHE_SIZE) * NFSD_SLOT_CACHE_SIZE;
> > + /* bound the total session drc memory ussage */
> > + if (mem + nfsd_drc_mem_used > nfsd_drc_max_mem) {
> > + fchan->maxreqs = (nfsd_drc_max_mem - nfsd_drc_mem_used) / size;
> > + mem = fchan->maxreqs * size;
> > + }
> > nfsd_drc_mem_used += mem;
> > spin_unlock(&nfsd_drc_lock);
> >
> > - fchan->maxreqs = mem / NFSD_SLOT_CACHE_SIZE;
> > if (fchan->maxreqs == 0)
> > - return nfserr_resource;
> > + return nfserr_serverfault;
>
> Remind me why serverfault and not resource here?

Whoops, OK, I see that one's answered later--it's not a legal error any
more.

--b.

2009-09-08 14:43:06

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [pnfs] [PATCH 0/5] NFSv4.1 DRC rewrite version 6

On Fri, Aug 28, 2009 at 07:07:27PM -0400, bfields wrote:
> On Fri, Aug 28, 2009 at 06:56:08PM -0400, bfields wrote:
> > On Fri, Aug 28, 2009 at 06:42:48PM -0400, J. Bruce Fields wrote:
> > > By the way, any hints on getting the 4.1 pynfs stuff running?
> >
> > OK, solution seemed to be
> >
> > cd gssapi
> > python setup.py build_ext --inplace
> >
> > But now almost everything fails with SERVERFAULT. Bah.
>
> Does anyone have a list of which tests are expected to pass at this point? If
> I ask for "all" tests, I get "4 skipped, 103 failed, 0 warned, 26 passed"--full
> list available if anyone wants. I seem to be passing connectathon tests
> OK over 4.1.

If someone could just send a copy of what they consider successful (for
now) pynfs output, that would be useful.

--b.

2009-09-01 13:48:39

by Andy Adamson

[permalink] [raw]
Subject: Re: [pnfs] [PATCH 5/5] nfsd41: replace page based DRC with buffer based DRC

On Fri, Aug 28, 2009 at 5:33 PM, J. Bruce Fields<[email protected]> =
wrote:
> On Thu, Aug 27, 2009 at 12:07:44PM -0400, [email protected] wrote:
>> From: Andy Adamson <[email protected]>
>>
>> Use NFSD_SLOT_CACHE_SIZE size buffers for sessions DRC instead of ho=
lding nfsd
>> pages in cache.
>>
>> Connectathon testing has shown that 1024 bytes for encoded compound =
operation
>> responses past the sequence operation is sufficient, 512 bytes is a =
little too
>> small. Set NFSD_SLOT_CACHE_SIZE to 1024.
>>
>> Allocate memory for the session DRC in the CREATE_SESSION operation
>> to guarantee that the memory resource is available for caching respo=
nses.
>> Allocate each slot individually in preparation for slot table size n=
egotiation.
>>
>> Remove struct nfsd4_cache_entry and helper functions for the old pag=
e-based
>> DRC.
>>
>> The iov_len calculation in nfs4svc_encode_compoundres is now always
>> correct, clean up the nfs4svc_encode_compoundres session logic.
>>
>> The nfsd4_compound_state statp pointer is also not used.
>> Remove nfsd4_set_statp().
>>
>> Move useful nfsd4_cache_entry fields into nfsd4_slot.
>>
>> Signed-off-by: Andy Adamson <[email protected]
>> ---
>> =A0fs/nfsd/nfs4state.c =A0 =A0 =A0 =A0| =A0207 ++++++++++++---------=
-----------------------
>> =A0fs/nfsd/nfs4xdr.c =A0 =A0 =A0 =A0 =A0| =A0 13 ++--
>> =A0fs/nfsd/nfssvc.c =A0 =A0 =A0 =A0 =A0 | =A0 =A04 -
>> =A0include/linux/nfsd/state.h | =A0 27 ++----
>> =A0include/linux/nfsd/xdr4.h =A0| =A0 =A05 +-
>> =A05 files changed, 74 insertions(+), 182 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 4695cec..2d72d5c 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -510,12 +510,22 @@ static int init_forechannel_attrs(struct svc_r=
qst *rqstp,
>> =A0 =A0 =A0 return status;
>> =A0}
>>
>> +static void
>> +free_session_slots(struct nfsd4_session *ses)
>> +{
>> + =A0 =A0 int i;
>> +
>> + =A0 =A0 for (i =3D 0; i < ses->se_fchannel.maxreqs; i++)
>> + =A0 =A0 =A0 =A0 =A0 =A0 kfree(ses->se_slots[i]);
>> +}
>> +
>> =A0static int
>> =A0alloc_init_session(struct svc_rqst *rqstp, struct nfs4_client *cl=
p,
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfsd4_create_session *cses=
)
>> =A0{
>> =A0 =A0 =A0 struct nfsd4_session *new, tmp;
>> - =A0 =A0 int idx, status =3D nfserr_serverfault, slotsize;
>> + =A0 =A0 struct nfsd4_slot *sp;
>> + =A0 =A0 int idx, status =3D nfserr_serverfault, slotsize, cachesiz=
e, i;
>
> Just as a style thing: that list's getting a little long. =A0Could yo=
u
> keep at least "status" on a separate line?
>
>>
>> =A0 =A0 =A0 memset(&tmp, 0, sizeof(tmp));
>>
>> @@ -526,14 +536,23 @@ alloc_init_session(struct svc_rqst *rqstp, str=
uct nfs4_client *clp,
>> =A0 =A0 =A0 if (status)
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out;
>>
>> - =A0 =A0 /* allocate struct nfsd4_session and slot table in one pie=
ce */
>> - =A0 =A0 slotsize =3D tmp.se_fchannel.maxreqs * sizeof(struct nfsd4=
_slot);
>> + =A0 =A0 /* allocate struct nfsd4_session and slot table pointers i=
n one piece */
>> + =A0 =A0 slotsize =3D tmp.se_fchannel.maxreqs * sizeof(struct nfsd4=
_slot *);
>> =A0 =A0 =A0 new =3D kzalloc(sizeof(*new) + slotsize, GFP_KERNEL);
>
> I think this is OK for now, but maybe stick something like:
>
> =A0 =A0 =A0 =A0BUILD_BUG_ON(NFSD_MAX_SLOTS_PER_SESSION * sizeof(struc=
t nfsd4_slot)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0+ sizeof(struct nfsd4_=
session) > PAGE_SIZE);
>
> in state.h just to warn anyone who wants to blindly bump up
> NFSD_MAX_SLOTS_PER_SESSION. =A0(It's not really forbidden to kmalloc =
more
> than a page, but it's also not reliable, and if it becomes necessary
> then we'd rather find some way to code around it.)
>
>> =A0 =A0 =A0 if (!new)
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out;
>>
>> =A0 =A0 =A0 memcpy(new, &tmp, sizeof(*new));
>>
>> + =A0 =A0 /* allocate each struct nfsd4_slot and data cache in one p=
iece */
>> + =A0 =A0 cachesize =3D new->se_fchannel.maxresp_cached - NFSD_MIN_H=
DR_SEQ_SZ;
>> + =A0 =A0 for (i =3D 0; i < new->se_fchannel.maxreqs; i++) {
>> + =A0 =A0 =A0 =A0 =A0 =A0 sp =3D kzalloc(sizeof(*sp) + cachesize, GF=
P_KERNEL);
>> + =A0 =A0 =A0 =A0 =A0 =A0 if (!sp)
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out_free;
>> + =A0 =A0 =A0 =A0 =A0 =A0 new->se_slots[i] =3D sp;
>> + =A0 =A0 }
>> +
>> =A0 =A0 =A0 new->se_client =3D clp;
>> =A0 =A0 =A0 gen_sessionid(new);
>> =A0 =A0 =A0 idx =3D hash_sessionid(&new->se_sessionid);
>> @@ -550,6 +569,10 @@ alloc_init_session(struct svc_rqst *rqstp, stru=
ct nfs4_client *clp,
>> =A0 =A0 =A0 status =3D nfs_ok;
>> =A0out:
>> =A0 =A0 =A0 return status;
>> +out_free:
>> + =A0 =A0 free_session_slots(new);
>> + =A0 =A0 kfree(new);
>> + =A0 =A0 goto out;
>> =A0}
>>
>> =A0/* caller must hold sessionid_lock */
>> @@ -592,22 +615,16 @@ release_session(struct nfsd4_session *ses)
>> =A0 =A0 =A0 nfsd4_put_session(ses);
>> =A0}
>>
>> -static void nfsd4_release_respages(struct page **respages, short re=
sused);
>> -
>> =A0void
>> =A0free_session(struct kref *kref)
>> =A0{
>> =A0 =A0 =A0 struct nfsd4_session *ses;
>> - =A0 =A0 int i;
>>
>> =A0 =A0 =A0 ses =3D container_of(kref, struct nfsd4_session, se_ref)=
;
>> - =A0 =A0 for (i =3D 0; i < ses->se_fchannel.maxreqs; i++) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 struct nfsd4_cache_entry *e =3D &ses->se_s=
lots[i].sl_cache_entry;
>> - =A0 =A0 =A0 =A0 =A0 =A0 nfsd4_release_respages(e->ce_respages, e->=
ce_resused);
>> - =A0 =A0 }
>> =A0 =A0 =A0 spin_lock(&nfsd_drc_lock);
>> =A0 =A0 =A0 nfsd_drc_mem_used -=3D ses->se_fchannel.maxreqs * NFSD_S=
LOT_CACHE_SIZE;
>> =A0 =A0 =A0 spin_unlock(&nfsd_drc_lock);
>> + =A0 =A0 free_session_slots(ses);
>> =A0 =A0 =A0 kfree(ses);
>> =A0}
>>
>> @@ -964,116 +981,32 @@ out_err:
>> =A0 =A0 =A0 return;
>> =A0}
>>
>> -void
>> -nfsd4_set_statp(struct svc_rqst *rqstp, __be32 *statp)
>> -{
>> - =A0 =A0 struct nfsd4_compoundres *resp =3D rqstp->rq_resp;
>> -
>> - =A0 =A0 resp->cstate.statp =3D statp;
>> -}
>> -
>> -/*
>> - * Dereference the result pages.
>> - */
>> -static void
>> -nfsd4_release_respages(struct page **respages, short resused)
>> -{
>> - =A0 =A0 int i;
>> -
>> - =A0 =A0 dprintk("--> %s\n", __func__);
>> - =A0 =A0 for (i =3D 0; i < resused; i++) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 if (!respages[i])
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 continue;
>> - =A0 =A0 =A0 =A0 =A0 =A0 put_page(respages[i]);
>> - =A0 =A0 =A0 =A0 =A0 =A0 respages[i] =3D NULL;
>> - =A0 =A0 }
>> -}
>> -
>> -static void
>> -nfsd4_copy_pages(struct page **topages, struct page **frompages, sh=
ort count)
>> -{
>> - =A0 =A0 int i;
>> -
>> - =A0 =A0 for (i =3D 0; i < count; i++) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 topages[i] =3D frompages[i];
>> - =A0 =A0 =A0 =A0 =A0 =A0 if (!topages[i])
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 continue;
>> - =A0 =A0 =A0 =A0 =A0 =A0 get_page(topages[i]);
>> - =A0 =A0 }
>> -}
>> -
>> =A0/*
>> - * Cache the reply pages up to NFSD_PAGES_PER_SLOT + 1, clearing th=
e previous
>> - * pages. We add a page to NFSD_PAGES_PER_SLOT for the case where t=
he total
>> - * length of the XDR response is less than se_fmaxresp_cached
>> - * (NFSD_PAGES_PER_SLOT * PAGE_SIZE) but the xdr_buf pages is used =
for a
>> - * of the reply (e.g. readdir).
>> - *
>> - * Store the base and length of the rq_req.head[0] page
>> - * of the NFSv4.1 data, just past the rpc header.
>> + * Cache a reply. nfsd4_check_drc_limit() has bounded the cache siz=
e.
>> =A0 */
>> =A0void
>> =A0nfsd4_store_cache_entry(struct nfsd4_compoundres *resp)
>> =A0{
>> - =A0 =A0 struct nfsd4_cache_entry *entry =3D &resp->cstate.slot->sl=
_cache_entry;
>> - =A0 =A0 struct svc_rqst *rqstp =3D resp->rqstp;
>> - =A0 =A0 struct kvec *resv =3D &rqstp->rq_res.head[0];
>> -
>> - =A0 =A0 dprintk("--> %s entry %p\n", __func__, entry);
>> + =A0 =A0 struct nfsd4_slot *slot =3D resp->cstate.slot;
>> + =A0 =A0 unsigned int base;
>>
>> - =A0 =A0 nfsd4_release_respages(entry->ce_respages, entry->ce_resus=
ed);
>> - =A0 =A0 entry->ce_opcnt =3D resp->opcnt;
>> - =A0 =A0 entry->ce_status =3D resp->cstate.status;
>> + =A0 =A0 dprintk("--> %s slot %p\n", __func__, slot);
>>
>> - =A0 =A0 /*
>> - =A0 =A0 =A0* Don't need a page to cache just the sequence operatio=
n - the slot
>> - =A0 =A0 =A0* does this for us!
>> - =A0 =A0 =A0*/
>> + =A0 =A0 slot->sl_opcnt =3D resp->opcnt;
>> + =A0 =A0 slot->sl_status =3D resp->cstate.status;
>>
>> =A0 =A0 =A0 if (nfsd4_not_cached(resp)) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 entry->ce_resused =3D 0;
>> - =A0 =A0 =A0 =A0 =A0 =A0 entry->ce_rpchdrlen =3D 0;
>> - =A0 =A0 =A0 =A0 =A0 =A0 dprintk("%s Just cache SEQUENCE. ce_cachet=
his %d\n", __func__,
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 resp->cstate.slot->sl_cach=
e_entry.ce_cachethis);
>> + =A0 =A0 =A0 =A0 =A0 =A0 slot->sl_datalen =3D 0;
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;
>> =A0 =A0 =A0 }
>> - =A0 =A0 entry->ce_resused =3D rqstp->rq_resused;
>> - =A0 =A0 if (entry->ce_resused > NFSD_PAGES_PER_SLOT + 1)
>> - =A0 =A0 =A0 =A0 =A0 =A0 entry->ce_resused =3D NFSD_PAGES_PER_SLOT =
+ 1;
>> - =A0 =A0 nfsd4_copy_pages(entry->ce_respages, rqstp->rq_respages,
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0entry->ce_resused);
>> - =A0 =A0 entry->ce_datav.iov_base =3D resp->cstate.statp;
>> - =A0 =A0 entry->ce_datav.iov_len =3D resv->iov_len - ((char *)resp-=
>cstate.statp -
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (char *)pa=
ge_address(rqstp->rq_respages[0]));
>> - =A0 =A0 /* Current request rpc header length*/
>> - =A0 =A0 entry->ce_rpchdrlen =3D (char *)resp->cstate.statp -
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (char *)pa=
ge_address(rqstp->rq_respages[0]);
>> -}
>> -
>> -/*
>> - * We keep the rpc header, but take the nfs reply from the replycac=
he.
>> - */
>> -static int
>> -nfsd41_copy_replay_data(struct nfsd4_compoundres *resp,
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct nfsd4_cache_entry *=
entry)
>> -{
>> - =A0 =A0 struct svc_rqst *rqstp =3D resp->rqstp;
>> - =A0 =A0 struct kvec *resv =3D &resp->rqstp->rq_res.head[0];
>> - =A0 =A0 int len;
>> -
>> - =A0 =A0 /* Current request rpc header length*/
>> - =A0 =A0 len =3D (char *)resp->cstate.statp -
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (char *)page_address(rqstp=
->rq_respages[0]);
>> - =A0 =A0 if (entry->ce_datav.iov_len + len > PAGE_SIZE) {
>> - =A0 =A0 =A0 =A0 =A0 =A0 dprintk("%s v41 cached reply too large (%Z=
d).\n", __func__,
>> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 entry->ce_datav.iov_len);
>> - =A0 =A0 =A0 =A0 =A0 =A0 return 0;
>> - =A0 =A0 }
>> - =A0 =A0 /* copy the cached reply nfsd data past the current rpc he=
ader */
>> - =A0 =A0 memcpy((char *)resv->iov_base + len, entry->ce_datav.iov_b=
ase,
>> - =A0 =A0 =A0 =A0 =A0 =A0 entry->ce_datav.iov_len);
>> - =A0 =A0 resv->iov_len =3D len + entry->ce_datav.iov_len;
>> - =A0 =A0 return 1;
>> + =A0 =A0 slot->sl_datalen =3D (char *)resp->p - (char *)resp->cstat=
e.datap;
>> + =A0 =A0 base =3D (char *)resp->cstate.datap -
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0 (char *)resp->xbuf->head[0].iov_base;
>> + =A0 =A0 if (read_bytes_from_xdr_buf(resp->xbuf, base, slot->sl_dat=
a,
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 sl=
ot->sl_datalen))
>> + =A0 =A0 =A0 =A0 =A0 =A0 printk(KERN_WARNING
>> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 "nfsd: sessions DRC could =
not cache compound\n");
>
> I'd make this WARN("nfsd:...") just to make it completely clear it's =
a
> kernel bug. =A0(This case should be caught by nfsd4_check_drc_limit u=
nless
> we've messed something up, right?)
>
>> + =A0 =A0 return;
>> =A0}
>>
>> =A0/*
>> @@ -1091,14 +1024,14 @@ nfsd4_enc_sequence_replay(struct nfsd4_compo=
undargs *args,
>> =A0 =A0 =A0 struct nfsd4_slot *slot =3D resp->cstate.slot;
>>
>> =A0 =A0 =A0 dprintk("--> %s resp->opcnt %d cachethis %u \n", __func_=
_,
>> - =A0 =A0 =A0 =A0 =A0 =A0 resp->opcnt, resp->cstate.slot->sl_cache_e=
ntry.ce_cachethis);
>> + =A0 =A0 =A0 =A0 =A0 =A0 resp->opcnt, resp->cstate.slot->sl_cacheth=
is);
>>
>> =A0 =A0 =A0 /* Encode the replayed sequence operation */
>> =A0 =A0 =A0 op =3D &args->ops[resp->opcnt - 1];
>> =A0 =A0 =A0 nfsd4_encode_operation(resp, op);
>>
>> =A0 =A0 =A0 /* Return nfserr_retry_uncached_rep in next operation. *=
/
>> - =A0 =A0 if (args->opcnt > 1 && slot->sl_cache_entry.ce_cachethis =3D=
=3D 0) {
>> + =A0 =A0 if (args->opcnt > 1 && slot->sl_cachethis =3D=3D 0) {
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 op =3D &args->ops[resp->opcnt++];
>>               op->status = nfserr_retry_uncached_rep;
>>               nfsd4_encode_operation(resp, op);
>> @@ -1107,57 +1040,29 @@ nfsd4_enc_sequence_replay(struct nfsd4_compoundargs *args,
>>  }
>>
>>  /*
>> - * Keep the first page of the replay. Copy the NFSv4.1 data from the first
>> - * cached page.  Replace any futher replay pages from the cache.
>> + * The sequence operation is not cached because we can use the slot and
>> + * session values.
>>   */
>>  __be32
>>  nfsd4_replay_cache_entry(struct nfsd4_compoundres *resp,
>>                          struct nfsd4_sequence *seq)
>>  {
>> -     struct nfsd4_cache_entry *entry = &resp->cstate.slot->sl_cache_entry;
>> +     struct nfsd4_slot *slot = resp->cstate.slot;
>>       __be32 status;
>>
>> -     dprintk("--> %s entry %p\n", __func__, entry);
>> -
>> -     /*
>> -      * If this is just the sequence operation, we did not keep
>> -      * a page in the cache entry because we can just use the
>> -      * slot info stored in struct nfsd4_sequence that was checked
>> -      * against the slot in nfsd4_sequence().
>> -      *
>> -      * This occurs when seq->cachethis is FALSE, or when the client
>> -      * session inactivity timer fires and a solo sequence operation
>> -      * is sent (lease renewal).
>> -      */
>> +     dprintk("--> %s slot %p\n", __func__, slot);
>>
>>       /* Either returns 0 or nfserr_retry_uncached */
>>       status = nfsd4_enc_sequence_replay(resp->rqstp->rq_argp, resp);
>>       if (status == nfserr_retry_uncached_rep)
>>               return status;
>>
>> -     if (!nfsd41_copy_replay_data(resp, entry)) {
>> -             /*
>> -              * Not enough room to use the replay rpc header, send the
>> -              * cached header. Release all the allocated result pages.
>> -              */
>> -             svc_free_res_pages(resp->rqstp);
>> -             nfsd4_copy_pages(resp->rqstp->rq_respages, entry->ce_respages,
>> -                              entry->ce_resused);
>> -     } else {
>> -             /* Release all but the first allocated result page */
>> -
>> -             resp->rqstp->rq_resused--;
>> -             svc_free_res_pages(resp->rqstp);
>> -
>> -             nfsd4_copy_pages(&resp->rqstp->rq_respages[1],
>> -                              &entry->ce_respages[1],
>> -                              entry->ce_resused - 1);
>> -     }
>> +     /* The sequence operation has been encoded, cstate->datap set. */
>> +     memcpy(resp->cstate.datap, slot->sl_data, slot->sl_datalen);
>>
>> -     resp->rqstp->rq_resused = entry->ce_resused;
>> -     resp->opcnt = entry->ce_opcnt;
>> -     resp->cstate.iovlen = entry->ce_datav.iov_len + entry->ce_rpchdrlen;
>> -     status = entry->ce_status;
>> +     resp->opcnt = slot->sl_opcnt;
>> +     resp->p = resp->cstate.datap + XDR_QUADLEN(slot->sl_datalen);
>> +     status = slot->sl_status;
>>
>>       return status;
>>  }
>> @@ -1489,7 +1394,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
>>       if (seq->slotid >= session->se_fchannel.maxreqs)
>>               goto out;
>>
>> -     slot = &session->se_slots[seq->slotid];
>> +     slot = session->se_slots[seq->slotid];
>>       dprintk("%s: slotid %d\n", __func__, seq->slotid);
>>
>>       /* We do not negotiate the number of slots yet, so set the
>> @@ -1502,7 +1407,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
>>               cstate->slot = slot;
>>               cstate->session = session;
>>               /* Return the cached reply status and set cstate->status
>> -              * for nfsd4_svc_encode_compoundres processing */
>> +              * for nfsd4_proc_compound processing */
>>               status = nfsd4_replay_cache_entry(resp, seq);
>>               cstate->status = nfserr_replay_cache;
>>               goto replay_cache;
>> @@ -1513,7 +1418,7 @@ nfsd4_sequence(struct svc_rqst *rqstp,
>>       /* Success! bump slot seqid */
>>       slot->sl_inuse = true;
>>       slot->sl_seqid = seq->seqid;
>> -     slot->sl_cache_entry.ce_cachethis = seq->cachethis;
>> +     slot->sl_cachethis = seq->cachethis;
>>
>>       cstate->slot = slot;
>>       cstate->session = session;
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index fdf632b..49824ea 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -3064,6 +3064,7 @@ nfsd4_encode_sequence(struct nfsd4_compoundres *resp, int nfserr,
>>       WRITE32(0);
>>
>>       ADJUST_ARGS();
>> +     resp->cstate.datap = p; /* DRC cache data pointer */
>>       return 0;
>>  }
>>
>> @@ -3166,7 +3167,7 @@ static int nfsd4_check_drc_limit(struct nfsd4_compoundres *resp)
>>               return status;
>>
>>       session = resp->cstate.session;
>> -     if (session == NULL || slot->sl_cache_entry.ce_cachethis == 0)
>> +     if (session == NULL || slot->sl_cachethis == 0)
>>               return status;
>>
>>       if (resp->opcnt >= args->opcnt)
>> @@ -3291,6 +3292,7 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
>>       /*
>>        * All that remains is to write the tag and operation count...
>>        */
>> +     struct nfsd4_compound_state *cs = &resp->cstate;
>>       struct kvec *iov;
>>       p = resp->tagp;
>>       *p++ = htonl(resp->taglen);
>> @@ -3304,14 +3306,11 @@ nfs4svc_encode_compoundres(struct svc_rqst *rqstp, __be32 *p, struct nfsd4_compo
>>               iov = &rqstp->rq_res.head[0];
>>       iov->iov_len = ((char*)resp->p) - (char*)iov->iov_base;
>>       BUG_ON(iov->iov_len > PAGE_SIZE);
>> -     if (nfsd4_has_session(&resp->cstate)) {
>> -             if (resp->cstate.status == nfserr_replay_cache &&
>> -                             !nfsd4_not_cached(resp)) {
>> -                     iov->iov_len = resp->cstate.iovlen;
>> -             } else {
>> +     if (nfsd4_has_session(cs)) {
>> +             if (cs->status != nfserr_replay_cache) {
>>                       nfsd4_store_cache_entry(resp);
>>                       dprintk("%s: SET SLOT STATE TO AVAILABLE\n", __func__);
>> -                     resp->cstate.slot->sl_inuse = 0;
>> +                     resp->cstate.slot->sl_inuse = false;
>>               }
>>               nfsd4_put_session(resp->cstate.session);
>>       }
>> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
>> index d68cd05..944ef01 100644
>> --- a/fs/nfsd/nfssvc.c
>> +++ b/fs/nfsd/nfssvc.c
>> @@ -576,10 +576,6 @@ nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp)
>>               + rqstp->rq_res.head[0].iov_len;
>>       rqstp->rq_res.head[0].iov_len += sizeof(__be32);
>>
>> -     /* NFSv4.1 DRC requires statp */
>> -     if (rqstp->rq_vers == 4)
>> -             nfsd4_set_statp(rqstp, statp);
>> -
>>       /* Now call the procedure handler, and encode NFS status. */
>>       nfserr = proc->pc_func(rqstp, rqstp->rq_argp, rqstp->rq_resp);
>>       nfserr = map_new_errors(rqstp->rq_vers, nfserr);
>> diff --git a/include/linux/nfsd/state.h b/include/linux/nfsd/state.h
>> index ff0b771..e745100 100644
>> --- a/include/linux/nfsd/state.h
>> +++ b/include/linux/nfsd/state.h
>> @@ -94,30 +94,23 @@ struct nfs4_cb_conn {
>>
>>  /* Maximum number of slots per session. 160 is useful for long haul TCP */
>>  #define NFSD_MAX_SLOTS_PER_SESSION     160
>> -/* Maximum number of pages per slot cache entry */
>> -#define NFSD_PAGES_PER_SLOT  1
>> -#define NFSD_SLOT_CACHE_SIZE         PAGE_SIZE
>>  /* Maximum number of operations per session compound */
>>  #define NFSD_MAX_OPS_PER_COMPOUND    16
>> +/* Maximum session per slot cache size */
>> +#define NFSD_SLOT_CACHE_SIZE         1024
>>  /* Maximum number of NFSD_SLOT_CACHE_SIZE slots per session */
>>  #define NFSD_CACHE_SIZE_SLOTS_PER_SESSION    32
>>  #define NFSD_MAX_MEM_PER_SESSION  \
>>               (NFSD_CACHE_SIZE_SLOTS_PER_SESSION * NFSD_SLOT_CACHE_SIZE)
>>
>> -struct nfsd4_cache_entry {
>> -     __be32          ce_status;
>> -     struct kvec     ce_datav; /* encoded NFSv4.1 data in rq_res.head[0] */
>> -     struct page     *ce_respages[NFSD_PAGES_PER_SLOT + 1];
>> -     int             ce_cachethis;
>> -     short           ce_resused;
>> -     int             ce_opcnt;
>> -     int             ce_rpchdrlen;
>> -};
>> -
>>  struct nfsd4_slot {
>> -     bool                            sl_inuse;
>> -     u32                             sl_seqid;
>> -     struct nfsd4_cache_entry        sl_cache_entry;
>> +     bool    sl_inuse;
>> +     u32     sl_seqid;
>> +     int     sl_cachethis;
>> +     int     sl_opcnt;
>> +     __be32  sl_status;
>> +     u32     sl_datalen;
>> +     char    sl_data[];
> Could you just move sl_inuse to the end?  It'll save a few bytes in the
> structure (because the compiler will probably stick 3 bytes after it to
> align sl_seqid.)

How about this?

struct nfsd4_slot {
- bool sl_inuse;
- u32 sl_seqid;
- struct nfsd4_cache_entry sl_cache_entry;
+ bool sl_inuse;
+ bool sl_cachethis;
+ u16 sl_opcnt;
+ u32 sl_seqid;
+ __be32 sl_status;
+ u32 sl_datalen;
+ char sl_data[];
};

-->Andy

> --b.
>
>>  };
>>
>>  struct nfsd4_channel_attrs {
>> @@ -159,7 +152,7 @@ struct nfsd4_session {
>>       struct nfs4_sessionid   se_sessionid;
>>       struct nfsd4_channel_attrs se_fchannel;
>>       struct nfsd4_channel_attrs se_bchannel;
>> -     struct nfsd4_slot       se_slots[];     /* forward channel slots */
>> +     struct nfsd4_slot       *se_slots[];    /* forward channel slots */
>>  };
>>
>>  static inline void
>> diff --git a/include/linux/nfsd/xdr4.h b/include/linux/nfsd/xdr4.h
>> index 3f71660..73164c2 100644
>> --- a/include/linux/nfsd/xdr4.h
>> +++ b/include/linux/nfsd/xdr4.h
>> @@ -51,7 +51,7 @@ struct nfsd4_compound_state {
>>       /* For sessions DRC */
>>       struct nfsd4_session    *session;
>>       struct nfsd4_slot       *slot;
>> -     __be32                  *statp;
>> +     __be32                  *datap;
>>       size_t                  iovlen;
>>       u32                     minorversion;
>>       u32                     status;
>> @@ -472,8 +472,7 @@ static inline bool nfsd4_is_solo_sequence(struct nfsd4_compoundres *resp)
>>
>>  static inline bool nfsd4_not_cached(struct nfsd4_compoundres *resp)
>>  {
>> -     return !resp->cstate.slot->sl_cache_entry.ce_cachethis ||
>> -                     nfsd4_is_solo_sequence(resp);
>> +     return !resp->cstate.slot->sl_cachethis || nfsd4_is_solo_sequence(resp);
>>  }
>>
>>  #define NFS4_SVC_XDRSIZE             sizeof(struct nfsd4_compoundargs)
>> --
>> 1.6.2.5
>>
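The overall shape of the buffer-based DRC this patch introduces (one separately allocated slot with a trailing flexible array, store the encoded reply on completion, memcpy it back on replay) can be sketched in userspace roughly as follows. The names and helpers here are illustrative only, not the kernel API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLOT_CACHE_SIZE 1024	/* mirrors NFSD_SLOT_CACHE_SIZE */

/* illustrative stand-in for struct nfsd4_slot plus its trailing cache */
struct slot {
	uint32_t sl_seqid;
	uint32_t sl_datalen;
	char     sl_data[];	/* encoded reply, up to SLOT_CACHE_SIZE */
};

/* one allocation per slot, as in the patch's per-slot slot table */
static struct slot *slot_alloc(void)
{
	return calloc(1, sizeof(struct slot) + SLOT_CACHE_SIZE);
}

/* store: cache the encoded reply in the slot's buffer */
static int slot_store(struct slot *s, const void *resp, uint32_t len)
{
	if (len > SLOT_CACHE_SIZE)
		return -1;	/* bounded by ca_maxresponsesize_cached */
	memcpy(s->sl_data, resp, len);
	s->sl_datalen = len;
	return 0;
}

/* replay: copy the cached reply back into the response buffer */
static uint32_t slot_replay(const struct slot *s, void *resp)
{
	memcpy(resp, s->sl_data, s->sl_datalen);
	return s->sl_datalen;
}
```

Because each slot is its own allocation rather than one page-backed array, individual slots can later be freed or resized when slot counts are renegotiated via the SEQUENCE target/highest slot id arguments.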