2010-05-17 17:56:51

by Alexandros Batsakis

[permalink] [raw]
Subject: [PATCH 0/8] pnfs-submit: Forgetful cleint and some layout cleanups

This set of patches includes a first attempt to implement the forgetful client model for the pNFS client. The model is explained is patch 7.
It also includes some minor cleanups in the layout management code that help to improve the maintanability of the current code.

Passed cthon tests against the pyNFS server, and against a modified version of the pyNFS server that randomly issues recalls after opens.

Alexandros Batsakis (8):
pnfs-submit: clean struct nfs_inode
pnfs-submit: clean locking infrastructure
pnfs-submit: remove lgetcount, lretcount (outstanding
LAYOUTGETs/LAYOUTRETUNs)
pnfs-submit: change stateid to be a union
pnfs-submit: request whole file layouts only
pnfs-submit: change layouts list to be similar to the other state
list management
pnfs-submit: forgetful client model
pnfs-submit: support for cb_recall_any (layouts)

fs/nfs/callback.h | 7 +
fs/nfs/callback_proc.c | 210 +++++++++++++++++++++++-------
fs/nfs/callback_xdr.c | 2 +-
fs/nfs/client.c | 2 +-
fs/nfs/delegation.c | 19 ++-
fs/nfs/inode.c | 17 ++-
fs/nfs/nfs4_fs.h | 1 +
fs/nfs/nfs4proc.c | 50 ++++---
fs/nfs/nfs4state.c | 8 +-
fs/nfs/nfs4xdr.c | 38 +++---
fs/nfs/pnfs.c | 310 ++++++++++++++++++++++-----------------------
fs/nfs/pnfs.h | 11 +-
fs/nfs/write.c | 3 +-
fs/nfsd/nfs4callback.c | 1 -
include/linux/nfs4.h | 18 +++-
include/linux/nfs_fs.h | 30 ++---
include/linux/nfs_fs_sb.h | 2 +-
17 files changed, 432 insertions(+), 297 deletions(-)



2010-05-17 20:38:55

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 7/8] pnfs-submit: forgetful client model

On Mon, May 17, 2010 at 10:56:43AM -0700, Alexandros Batsakis wrote:
> Forgetful client model:
>
> If we receive a CB_LAYOUTRECALL
> - we spawn a thread to handle the recall
> (xxx: now only one recall can be active at a time, else NFS4ERR_DELAY)
> - we check the stateid seqid
> if it does not match we return NFS4ERR_DELAY
> - we check for pending I/O
> if there is we return NFS4ERR_DELAY
> Else we return NO_MATCHING_LAYOUT.
> Note that for whole file layouts there is no need to serialize LAYOUTGETs/LAYOUTRETURNs
> For bulk layouts, if there is a layout active, we return NFS4_OK and we start
> cleaning the layouts asynchronously. At the end we send a bulk LAYOUTRETURN.
> Note that there is no need to prevent any new LAYOUTGETs explicitly as the server should
> reject them.

There a lot of discussion over the decision to adopt the forgetful
client, and also some over when (and whether) to return NFS4ERR_DELAY.
Would it be possible to also include a brief summary of the
justification for the choices made?

(And ideally to include it either in a block comment somewhere or in
Documentation/, not just in a changelog comment?)

--b.

>
> Signed-off-by: Alexandros Batsakis <[email protected]>
> ---
> fs/nfs/callback_proc.c | 131 ++++++++++++++++++++++++++++++++++--------------
> fs/nfs/inode.c | 2 +-
> fs/nfs/nfs4_fs.h | 1 +
> fs/nfs/nfs4proc.c | 4 +-
> fs/nfs/nfs4state.c | 2 +-
> fs/nfs/pnfs.c | 78 +++++++++++++---------------
> fs/nfs/pnfs.h | 8 ++-
> 7 files changed, 139 insertions(+), 87 deletions(-)
>
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index 24e2571..419fee7 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -129,6 +129,38 @@ int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation, const nf
>
> #if defined(CONFIG_NFS_V4_1)
>
> +static int
> +pnfs_is_next_layout_stateid(const struct pnfs_layout_type *lo,
> + const nfs4_stateid stateid)
> +{
> + int seq;
> + int res;
> + u32 oldseqid, newseqid;
> +
> + do {
> + seq = read_seqbegin(&lo->seqlock);
> + oldseqid = ntohl(lo->stateid.u.stateid.seqid);
> + newseqid = ntohl(stateid.u.stateid.seqid);
> + res = !memcmp(lo->stateid.u.stateid.other,
> + stateid.u.stateid.other,
> + NFS4_STATEID_OTHER_SIZE);
> + if (res) { /* comparing layout stateids */
> + if (oldseqid == ~0)
> + res = (newseqid == 1);
> + else
> + res = (newseqid == oldseqid + 1);
> + } else { /* open stateid */
> + res = !memcmp(lo->stateid.u.data,
> + &zero_stateid,
> + NFS4_STATEID_SIZE);
> + if (res)
> + res = (newseqid == 1);
> + }
> + } while (read_seqretry(&lo->seqlock, seq));
> +
> + return res;
> +}
> +
> /*
> * Retrieve an inode based on layout recall parameters
> *
> @@ -190,10 +222,11 @@ static int pnfs_recall_layout(void *data)
> {
> struct inode *inode, *ino;
> struct nfs_client *clp;
> + struct nfs4_pnfs_layoutreturn *lrp;
> struct cb_pnfs_layoutrecallargs rl;
> struct recall_layout_threadargs *args =
> (struct recall_layout_threadargs *)data;
> - int status;
> + int status = 0;
>
> daemonize("nfsv4-layoutreturn");
>
> @@ -204,46 +237,57 @@ static int pnfs_recall_layout(void *data)
> clp = args->clp;
> inode = args->inode;
> rl = *args->rl;
> - args->result = 0;
> - complete(&args->started);
> - args = NULL;
> - /* Note: args must not be used after this point!!! */
> -
> -/* FIXME: need barrier here:
> - pause I/O to data servers
> - pause layoutgets
> - drain all outstanding writes to storage devices
> - wait for any outstanding layoutreturns and layoutgets mentioned in
> - cb_sequence.
> - then return layouts, resume after layoutreturns complete
> - */
>
> /* support whole file layouts only */
> rl.cbl_seg.offset = 0;
> rl.cbl_seg.length = NFS4_MAX_UINT64;
>
> if (rl.cbl_recall_type == RETURN_FILE) {
> - status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
> - RETURN_FILE);
> + if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout, rl.cbl_stateid))
> + status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
> + RETURN_FILE, 0);
> + else
> + status = htonl(NFS4ERR_DELAY);
> + /* forgetful client */
> if (status)
> - dprintk("%s RETURN_FILE error: %d\n", __func__, status);
> + dprintk("%s RETURN_FILE : %d\n", __func__, status);
> + else
> + status = htonl(NFS4ERR_NOMATCHING_LAYOUT);
> + args->result = status;
> + complete(&args->started);
> goto out;
> }
>
> - /* FIXME: This loop is inefficient, running in O(|s_inodes|^2) */
> + status = htonl(NFS4_OK);
> + args->result = status;
> + complete(&args->started);
> + args = NULL;
> +
> while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) {
> /* XXX need to check status on pnfs_return_layout */
> - pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE);
> + pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE, 0);
> iput(ino);
> }
>
> - /* send final layoutreturn */
> - status = pnfs_return_layout(inode, &rl.cbl_seg, NULL, rl.cbl_recall_type);
> - if (status)
> - printk(KERN_INFO "%s: ignoring pnfs_return_layout status=%d\n",
> - __func__, status);
> + lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
> + if (!lrp) {
> + dprintk("%s: allocation failed. Cannot send last LAYOUTRETURN\n",
> + __func__);
> + goto out;
> + }
> +
> + /* send final layoutreturn (bulk) */
> + lrp->args.reclaim = 0;
> + lrp->args.layout_type = rl.cbl_layout_type;
> + lrp->args.return_type = rl.cbl_recall_type;
> + lrp->args.lseg = rl.cbl_seg;
> + lrp->args.inode = inode;
> + lrp->lo = NULL;
> +
> + pnfs4_proc_layoutreturn(lrp);
> out:
> - iput(inode);
> + clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
> + nfs_put_client(clp);
> module_put_and_exit(0);
> dprintk("%s: exit status %d\n", __func__, 0);
> return 0;
> @@ -261,15 +305,18 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
> .rl = rl,
> };
> struct task_struct *t;
> - int status;
> -
> - /* should have returned NFS4ERR_NOMATCHING_LAYOUT... */
> - BUG_ON(inode == NULL);
> + int status = htonl(NFS4ERR_DELAY);
>
> dprintk("%s: -->\n", __func__);
>
> + /* XXX: do not allow two concurrent layout recalls */
> + if (test_and_set_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state))
> + return status;
> +
> init_completion(&data.started);
> __module_get(THIS_MODULE);
> + if (!atomic_inc_not_zero(&clp->cl_count))
> + goto out_put_no_client;
>
> t = kthread_run(pnfs_recall_layout, &data, "%s", "pnfs_recall_layout");
> if (IS_ERR(t)) {
> @@ -283,6 +330,9 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
> wait_for_completion(&data.started);
> return data.result;
> out_module_put:
> + nfs_put_client(clp);
> +out_put_no_client:
> + clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
> module_put(THIS_MODULE);
> return status;
> }
> @@ -309,19 +359,24 @@ __be32 pnfs_cb_layoutrecall(struct cb_pnfs_layoutrecallargs *args,
> do {
> struct nfs_client *prev = clp;
> num_client++;
> - inode = nfs_layoutrecall_find_inode(clp, args);
> - if (inode != NULL) {
> - if (PNFS_LD(&NFS_I(inode)->layout)->id ==
> - args->cbl_layout_type) {
> - /* Set up a helper thread to actually
> - * return the delegation */
> + /* the callback must come from the MDS personality */
> + if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_MDS))
> + goto loop;
> + if (args->cbl_recall_type == RETURN_FILE) {
> + inode = nfs_layoutrecall_find_inode(clp, args);
> + if (inode != NULL) {
> res = pnfs_async_return_layout(clp, inode, args);
> - if (res != 0)
> - res = htonl(NFS4ERR_RESOURCE);
> - break;
> + iput(inode);
> }
> + } else {
> + /* we need the inode to get the nfs_server struct */
> + inode = nfs_layoutrecall_find_inode(clp, args);
> + if (!inode)
> + goto loop;
> + res = pnfs_async_return_layout(clp, inode, args);
> iput(inode);
> }
> +loop:
> clp = nfs_find_client_next(prev);
> nfs_put_client(prev);
> } while (clp != NULL);
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index adf3280..584c9b8 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1321,7 +1321,7 @@ void nfs4_clear_inode(struct inode *inode)
> /* First call standard NFS clear_inode() code */
> nfs_clear_inode(inode);
> #ifdef CONFIG_NFS_V4_1
> - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
> + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
> #endif /* CONFIG_NFS_V4_1 */
> }
> #endif
> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> index 40ebe1d..90c57b3 100644
> --- a/fs/nfs/nfs4_fs.h
> +++ b/fs/nfs/nfs4_fs.h
> @@ -47,6 +47,7 @@ enum nfs4_client_state {
> NFS4CLNT_SESSION_RESET,
> NFS4CLNT_SESSION_DRAINING,
> NFS4CLNT_RECALL_SLOT,
> + NFS4CLNT_LAYOUT_RECALL,
> };
>
> /*
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6739d15..14e90e1 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1085,7 +1085,7 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
> /* FIXME: send gratuitous layout commits and return with the reclaim
> * flag during grace period
> */
> - pnfs_return_layout(state->inode, NULL, &state->open_stateid, RETURN_FILE);
> + pnfs_return_layout(state->inode, NULL, NULL, RETURN_FILE, 1);
> pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
> #endif /* CONFIG_NFS_V4_1 */
> }
> @@ -2382,7 +2382,7 @@ pnfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
>
> if (pnfs_enabled_sb(server) && has_layout(nfsi) &&
> pnfs_ld_layoutret_on_setattr(server->pnfs_curr_ld))
> - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
> + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
> return nfs4_proc_setattr(dentry, fattr, sattr);
> }
> #endif /* CONFIG_NFS_V4_1 */
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 2d52300..cd2ae83 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -598,7 +598,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
> range.offset = 0;
> range.length = NFS4_MAX_UINT64;
> pnfs_return_layout(state->inode, &range, NULL,
> - RETURN_FILE);
> + RETURN_FILE, 1);
> }
> #endif /* CONFIG_NFS_V4_1 */
> nfs4_do_close(path, state, wait);
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 3b4dea0..2c96723 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -610,7 +610,7 @@ get_layout(struct inode *ino,
> */
> static inline int
> should_free_lseg(struct pnfs_layout_segment *lseg,
> - struct nfs4_pnfs_layout_segment *range)
> + struct nfs4_pnfs_layout_segment *range)
> {
> return (range->iomode == IOMODE_ANY ||
> lseg->range.iomode == range->iomode) &&
> @@ -703,6 +703,8 @@ return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
>
> dprintk("--> %s\n", __func__);
>
> + BUG_ON(type != RETURN_FILE);
> +
> lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
> if (lrp == NULL) {
> if (lo && (type == RETURN_FILE))
> @@ -729,7 +731,8 @@ out:
> int
> _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
> const nfs4_stateid *stateid, /* optional */
> - enum pnfs_layoutreturn_type type)
> + enum pnfs_layoutreturn_type type,
> + int sendreturn)
> {
> struct pnfs_layout_type *lo = NULL;
> struct nfs_inode *nfsi = NFS_I(ino);
> @@ -739,14 +742,19 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
> dprintk("--> %s type %d\n", __func__, type);
>
> if (range)
> - arg = *range;
> - else {
> + arg.iomode = range->iomode;
> + else
> arg.iomode = IOMODE_ANY;
> - arg.offset = 0;
> - arg.length = ~0;
> - }
> + arg.offset = 0;
> + arg.length = ~0;
> +
> if (type == RETURN_FILE) {
> if (nfsi->layout.layoutcommit_ctx) {
> + if (stateid && !sendreturn) { /* callback */
> + dprintk("%s: layoutcommit pending\n", __func__);
> + status = htonl(NFS4ERR_DELAY);
> + goto out;
> + }
> status = pnfs_layoutcommit_inode(ino, 1);
> if (status) {
> dprintk("%s: layoutcommit failed, status=%d. "
> @@ -763,11 +771,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
> }
> if (!lo) {
> dprintk("%s: no layout segments to return\n", __func__);
> - /* must send the LAYOUTRETURN in response to recall */
> - if (stateid)
> - goto send_return;
> - else
> - goto out;
> + goto out;
> }
>
> /* unlock w/o put rebalanced by eventual call to
> @@ -776,13 +780,22 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
> unlock_current_layout(nfsi);
>
> if (pnfs_return_layout_barrier(nfsi, &arg)) {
> + if (stateid) { /* callback */
> + status = htonl(NFS4ERR_DELAY);
> + lock_current_layout(nfsi);
> + put_unlock_current_layout(lo);
> + goto out;
> + }
> dprintk("%s: waiting\n", __func__);
> wait_event(nfsi->lo_waitq,
> - !pnfs_return_layout_barrier(nfsi, &arg));
> + !pnfs_return_layout_barrier(nfsi, &arg));
> }
> +
> + if (sendreturn)
> + status = return_layout(ino, &arg, stateid, type, lo);
> + else
> + pnfs_layout_release(lo, &arg);
> }
> -send_return:
> - status = return_layout(ino, &arg, stateid, type, lo);
> out:
> dprintk("<-- %s status: %d\n", __func__, status);
> return status;
> @@ -1080,31 +1093,12 @@ pnfs_update_layout(struct inode *ino,
> /* Check to see if the layout for the given range already exists */
> lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
> if (lseg && !lseg->valid) {
> - unlock_current_layout(nfsi);
> if (take_ref)
> put_lseg(lseg);
> - for (;;) {
> - prepare_to_wait(&nfsi->lo_waitq, &__wait,
> - TASK_KILLABLE);
> - lock_current_layout(nfsi);
> - lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
> - if (!lseg || lseg->valid)
> - break;
> - dprintk("%s: invalid lseg %p ref %d\n", __func__,
> - lseg, atomic_read(&lseg->kref.refcount)-1);
> - if (take_ref)
> - put_lseg(lseg);
> - if (signal_pending(current)) {
> - lseg = NULL;
> - result = -ERESTARTSYS;
> - break;
> - }
> - unlock_current_layout(nfsi);
> - schedule();
> - }
> - finish_wait(&nfsi->lo_waitq, &__wait);
> - if (result)
> - goto out_put;
> +
> + /* someone is cleaning the layout */
> + result = -EAGAIN;
> + goto out_put;
> }
>
> if (lseg) {
> @@ -1434,7 +1428,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
>
> if (count > 0 && !below_threshold(inode, count, 0)) {
> status = pnfs_update_layout(inode, ctx, count,
> - loff, IOMODE_READ, NULL);
> + loff, IOMODE_READ, NULL);
> dprintk("%s *rsize %Zd virt update returned %d\n",
> __func__, *rsize, status);
> if (status != 0)
> @@ -1673,7 +1667,7 @@ pnfs_writeback_done(struct nfs_write_data *data)
> .length = data->args.count,
> };
> dprintk("%s: retrying\n", __func__);
> - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
> + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);
> pnfs_initiate_write(data, NFS_CLIENT(data->inode),
> pdata->call_ops, pdata->how);
> }
> @@ -1804,7 +1798,7 @@ pnfs_read_done(struct nfs_read_data *data)
> .length = data->args.count,
> };
> dprintk("%s: retrying\n", __func__);
> - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
> + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);
> pnfs_initiate_read(data, NFS_CLIENT(data->inode),
> pdata->call_ops);
> }
> @@ -2030,7 +2024,7 @@ pnfs_commit_done(struct nfs_write_data *data)
> .length = data->args.count,
> };
> dprintk("%s: retrying\n", __func__);
> - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
> + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);
> pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
> pdata->call_ops, pdata->how);
> }
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 9e43973..794a2d3 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -40,7 +40,8 @@ int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>
> int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *,
> const nfs4_stateid *stateid, /* optional */
> - enum pnfs_layoutreturn_type);
> + enum pnfs_layoutreturn_type,
> + int sendreturn);
> void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *mntfh, u32 id);
> void unmount_pnfs_layoutdriver(struct nfs_server *);
> int pnfs_use_read(struct inode *inode, ssize_t count);
> @@ -236,14 +237,15 @@ static inline void pnfs_modify_new_request(struct nfs_page *req,
> static inline int pnfs_return_layout(struct inode *ino,
> struct nfs4_pnfs_layout_segment *lseg,
> const nfs4_stateid *stateid, /* optional */
> - enum pnfs_layoutreturn_type type)
> + enum pnfs_layoutreturn_type type,
> + int sendreturn)
> {
> struct nfs_inode *nfsi = NFS_I(ino);
> struct nfs_server *nfss = NFS_SERVER(ino);
>
> if (pnfs_enabled_sb(nfss) &&
> (type != RETURN_FILE || has_layout(nfsi)))
> - return _pnfs_return_layout(ino, lseg, stateid, type);
> + return _pnfs_return_layout(ino, lseg, stateid, type, sendreturn);
>
> return 0;
> }
> --
> 1.6.2.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-05-18 00:06:28

by Alexandros Batsakis

[permalink] [raw]
Subject: Re: [PATCH 7/8] pnfs-submit: forgetful client model


On May 17, 2010, at 1:38 PM, J. Bruce Fields wrote:

> On Mon, May 17, 2010 at 10:56:43AM -0700, Alexandros Batsakis wrote:
>> Forgetful client model:
>>
>> If we receive a CB_LAYOUTRECALL
>> - we spawn a thread to handle the recall
>> (xxx: now only one recall can be active at a time, else NFS4ERR_DELAY)
>> - we check the stateid seqid
>> if it does not match we return NFS4ERR_DELAY
>> - we check for pending I/O
>> if there is we return NFS4ERR_DELAY
>> Else we return NO_MATCHING_LAYOUT.
>> Note that for whole file layouts there is no need to serialize LAYOUTGETs/LAYOUTRETURNs
>> For bulk layouts, if there is a layout active, we return NFS4_OK and we start
>> cleaning the layouts asynchronously. At the end we send a bulk LAYOUTRETURN.
>> Note that there is no need to prevent any new LAYOUTGETs explicitly as the server should
>> reject them.
>
> There a lot of discussion over the decision to adopt the forgetful
> client, and also some over when (and whether) to return NFS4ERR_DELAY.
> Would it be possible to also include a brief summary of the
> justification for the choices made?
>
> (And ideally to include it either in a block comment somewhere or in
> Documentation/, not just in a changelog comment?)
>


Sure, I ll add the stuff below to the wiki and to the Documention folder.

Forgetful client
-----------------------

In order to simplify the design and facilitate the maintainability of the Linux pNFS client, we follow a forgetful layout management model.

In the forgetful model the client "forgets" or pretends to forget layouts without letting the server know. This has the side-effect that server issued layout recalls result in a NOMATCHING_LAYOUT response instead of a LAYOUTRETURN sequence.

The decision to follow this model is that it's much simpler than the alternative of serializing all layout returns with layout gets and recalls, and then remembering and replying to the original server recall request. The forgetful scheme requires a more minimalistic design and it improves the manageability/maintainability of the code.

Performance-wise it improves performance in the case of recalls to layouts that are not currently in use (as it saves a unnecessary LAYOUTRETURN by replying immediately with NOMATHCING_LAYOUT), while it relies on the server to do the coordination when the latter tries to recall layouts with pending I/O.

It is important to note that both the arguments about the simplicity and about the performance are based on the assumption that recalls are relatively rare, and hence, we want to avoid to complicate the code or to optimize for an "anomaly".
If a blocks or objects person believes that this is assumption is not valid, we can re-examine this issue when the corresponding code gets submitted to the mainline.

- How does the client respond to CB_LAYOUT_RECALL:

a) if stateid.seqid is not the expected one return NFS4ERR_DELAY.
b) we set the specific segments as not valid (no more I/O to them)
c) if there is pending I/O we return NFS4ERR_DELAY.
d) we return NFS4ERR_NOMATCHING_LAYOUT

Here you can find a simple flowchart that highlights the description above.
https://docs.google.com/leaf?id=0B9egH40ld0WsMjg0ZmYzMTQtNDY5NC00ZjRjLTkwNzktNjFhZDFhYjJjYmFj&hl=en

- When does the client issue LAYOUTRETURNs:

a) when it receives a CB_RECALL_ANY
b) when it receives a bulk layout recall
c) when it retries an I/O
d) when it receives a LAYOUTGET with the return_on_close field set
d) when the layout has not been used for at least one lease period (TODO)

Hope this helps,

-alexandros



> --b.
>
>>
>> Signed-off-by: Alexandros Batsakis <[email protected]>
>> ---
>> fs/nfs/callback_proc.c | 131 ++++++++++++++++++++++++++++++++++--------------
>> fs/nfs/inode.c | 2 +-
>> fs/nfs/nfs4_fs.h | 1 +
>> fs/nfs/nfs4proc.c | 4 +-
>> fs/nfs/nfs4state.c | 2 +-
>> fs/nfs/pnfs.c | 78 +++++++++++++---------------
>> fs/nfs/pnfs.h | 8 ++-
>> 7 files changed, 139 insertions(+), 87 deletions(-)
>>
>> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
>> index 24e2571..419fee7 100644
>> --- a/fs/nfs/callback_proc.c
>> +++ b/fs/nfs/callback_proc.c
>> @@ -129,6 +129,38 @@ int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation, const nf
>>
>> #if defined(CONFIG_NFS_V4_1)
>>
>> +static int
>> +pnfs_is_next_layout_stateid(const struct pnfs_layout_type *lo,
>> + const nfs4_stateid stateid)
>> +{
>> + int seq;
>> + int res;
>> + u32 oldseqid, newseqid;
>> +
>> + do {
>> + seq = read_seqbegin(&lo->seqlock);
>> + oldseqid = ntohl(lo->stateid.u.stateid.seqid);
>> + newseqid = ntohl(stateid.u.stateid.seqid);
>> + res = !memcmp(lo->stateid.u.stateid.other,
>> + stateid.u.stateid.other,
>> + NFS4_STATEID_OTHER_SIZE);
>> + if (res) { /* comparing layout stateids */
>> + if (oldseqid == ~0)
>> + res = (newseqid == 1);
>> + else
>> + res = (newseqid == oldseqid + 1);
>> + } else { /* open stateid */
>> + res = !memcmp(lo->stateid.u.data,
>> + &zero_stateid,
>> + NFS4_STATEID_SIZE);
>> + if (res)
>> + res = (newseqid == 1);
>> + }
>> + } while (read_seqretry(&lo->seqlock, seq));
>> +
>> + return res;
>> +}
>> +
>> /*
>> * Retrieve an inode based on layout recall parameters
>> *
>> @@ -190,10 +222,11 @@ static int pnfs_recall_layout(void *data)
>> {
>> struct inode *inode, *ino;
>> struct nfs_client *clp;
>> + struct nfs4_pnfs_layoutreturn *lrp;
>> struct cb_pnfs_layoutrecallargs rl;
>> struct recall_layout_threadargs *args =
>> (struct recall_layout_threadargs *)data;
>> - int status;
>> + int status = 0;
>>
>> daemonize("nfsv4-layoutreturn");
>>
>> @@ -204,46 +237,57 @@ static int pnfs_recall_layout(void *data)
>> clp = args->clp;
>> inode = args->inode;
>> rl = *args->rl;
>> - args->result = 0;
>> - complete(&args->started);
>> - args = NULL;
>> - /* Note: args must not be used after this point!!! */
>> -
>> -/* FIXME: need barrier here:
>> - pause I/O to data servers
>> - pause layoutgets
>> - drain all outstanding writes to storage devices
>> - wait for any outstanding layoutreturns and layoutgets mentioned in
>> - cb_sequence.
>> - then return layouts, resume after layoutreturns complete
>> - */
>>
>> /* support whole file layouts only */
>> rl.cbl_seg.offset = 0;
>> rl.cbl_seg.length = NFS4_MAX_UINT64;
>>
>> if (rl.cbl_recall_type == RETURN_FILE) {
>> - status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
>> - RETURN_FILE);
>> + if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout, rl.cbl_stateid))
>> + status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
>> + RETURN_FILE, 0);
>> + else
>> + status = htonl(NFS4ERR_DELAY);
>> + /* forgetful client */
>> if (status)
>> - dprintk("%s RETURN_FILE error: %d\n", __func__, status);
>> + dprintk("%s RETURN_FILE : %d\n", __func__, status);
>> + else
>> + status = htonl(NFS4ERR_NOMATCHING_LAYOUT);
>> + args->result = status;
>> + complete(&args->started);
>> goto out;
>> }
>>
>> - /* FIXME: This loop is inefficient, running in O(|s_inodes|^2) */
>> + status = htonl(NFS4_OK);
>> + args->result = status;
>> + complete(&args->started);
>> + args = NULL;
>> +
>> while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) {
>> /* XXX need to check status on pnfs_return_layout */
>> - pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE);
>> + pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE, 0);
>> iput(ino);
>> }
>>
>> - /* send final layoutreturn */
>> - status = pnfs_return_layout(inode, &rl.cbl_seg, NULL, rl.cbl_recall_type);
>> - if (status)
>> - printk(KERN_INFO "%s: ignoring pnfs_return_layout status=%d\n",
>> - __func__, status);
>> + lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
>> + if (!lrp) {
>> + dprintk("%s: allocation failed. Cannot send last LAYOUTRETURN\n",
>> + __func__);
>> + goto out;
>> + }
>> +
>> + /* send final layoutreturn (bulk) */
>> + lrp->args.reclaim = 0;
>> + lrp->args.layout_type = rl.cbl_layout_type;
>> + lrp->args.return_type = rl.cbl_recall_type;
>> + lrp->args.lseg = rl.cbl_seg;
>> + lrp->args.inode = inode;
>> + lrp->lo = NULL;
>> +
>> + pnfs4_proc_layoutreturn(lrp);
>> out:
>> - iput(inode);
>> + clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
>> + nfs_put_client(clp);
>> module_put_and_exit(0);
>> dprintk("%s: exit status %d\n", __func__, 0);
>> return 0;
>> @@ -261,15 +305,18 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
>> .rl = rl,
>> };
>> struct task_struct *t;
>> - int status;
>> -
>> - /* should have returned NFS4ERR_NOMATCHING_LAYOUT... */
>> - BUG_ON(inode == NULL);
>> + int status = htonl(NFS4ERR_DELAY);
>>
>> dprintk("%s: -->\n", __func__);
>>
>> + /* XXX: do not allow two concurrent layout recalls */
>> + if (test_and_set_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state))
>> + return status;
>> +
>> init_completion(&data.started);
>> __module_get(THIS_MODULE);
>> + if (!atomic_inc_not_zero(&clp->cl_count))
>> + goto out_put_no_client;
>>
>> t = kthread_run(pnfs_recall_layout, &data, "%s", "pnfs_recall_layout");
>> if (IS_ERR(t)) {
>> @@ -283,6 +330,9 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
>> wait_for_completion(&data.started);
>> return data.result;
>> out_module_put:
>> + nfs_put_client(clp);
>> +out_put_no_client:
>> + clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
>> module_put(THIS_MODULE);
>> return status;
>> }
>> @@ -309,19 +359,24 @@ __be32 pnfs_cb_layoutrecall(struct cb_pnfs_layoutrecallargs *args,
>> do {
>> struct nfs_client *prev = clp;
>> num_client++;
>> - inode = nfs_layoutrecall_find_inode(clp, args);
>> - if (inode != NULL) {
>> - if (PNFS_LD(&NFS_I(inode)->layout)->id ==
>> - args->cbl_layout_type) {
>> - /* Set up a helper thread to actually
>> - * return the delegation */
>> + /* the callback must come from the MDS personality */
>> + if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_MDS))
>> + goto loop;
>> + if (args->cbl_recall_type == RETURN_FILE) {
>> + inode = nfs_layoutrecall_find_inode(clp, args);
>> + if (inode != NULL) {
>> res = pnfs_async_return_layout(clp, inode, args);
>> - if (res != 0)
>> - res = htonl(NFS4ERR_RESOURCE);
>> - break;
>> + iput(inode);
>> }
>> + } else {
>> + /* we need the inode to get the nfs_server struct */
>> + inode = nfs_layoutrecall_find_inode(clp, args);
>> + if (!inode)
>> + goto loop;
>> + res = pnfs_async_return_layout(clp, inode, args);
>> iput(inode);
>> }
>> +loop:
>> clp = nfs_find_client_next(prev);
>> nfs_put_client(prev);
>> } while (clp != NULL);
>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>> index adf3280..584c9b8 100644
>> --- a/fs/nfs/inode.c
>> +++ b/fs/nfs/inode.c
>> @@ -1321,7 +1321,7 @@ void nfs4_clear_inode(struct inode *inode)
>> /* First call standard NFS clear_inode() code */
>> nfs_clear_inode(inode);
>> #ifdef CONFIG_NFS_V4_1
>> - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
>> + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
>> #endif /* CONFIG_NFS_V4_1 */
>> }
>> #endif
>> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
>> index 40ebe1d..90c57b3 100644
>> --- a/fs/nfs/nfs4_fs.h
>> +++ b/fs/nfs/nfs4_fs.h
>> @@ -47,6 +47,7 @@ enum nfs4_client_state {
>> NFS4CLNT_SESSION_RESET,
>> NFS4CLNT_SESSION_DRAINING,
>> NFS4CLNT_RECALL_SLOT,
>> + NFS4CLNT_LAYOUT_RECALL,
>> };
>>
>> /*
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 6739d15..14e90e1 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -1085,7 +1085,7 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
>> /* FIXME: send gratuitous layout commits and return with the reclaim
>> * flag during grace period
>> */
>> - pnfs_return_layout(state->inode, NULL, &state->open_stateid, RETURN_FILE);
>> + pnfs_return_layout(state->inode, NULL, NULL, RETURN_FILE, 1);
>> pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
>> #endif /* CONFIG_NFS_V4_1 */
>> }
>> @@ -2382,7 +2382,7 @@ pnfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
>>
>> if (pnfs_enabled_sb(server) && has_layout(nfsi) &&
>> pnfs_ld_layoutret_on_setattr(server->pnfs_curr_ld))
>> - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
>> + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
>> return nfs4_proc_setattr(dentry, fattr, sattr);
>> }
>> #endif /* CONFIG_NFS_V4_1 */
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index 2d52300..cd2ae83 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -598,7 +598,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
>> range.offset = 0;
>> range.length = NFS4_MAX_UINT64;
>> pnfs_return_layout(state->inode, &range, NULL,
>> - RETURN_FILE);
>> + RETURN_FILE, 1);
>> }
>> #endif /* CONFIG_NFS_V4_1 */
>> nfs4_do_close(path, state, wait);
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 3b4dea0..2c96723 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -610,7 +610,7 @@ get_layout(struct inode *ino,
>> */
>> static inline int
>> should_free_lseg(struct pnfs_layout_segment *lseg,
>> - struct nfs4_pnfs_layout_segment *range)
>> + struct nfs4_pnfs_layout_segment *range)
>> {
>> return (range->iomode == IOMODE_ANY ||
>> lseg->range.iomode == range->iomode) &&
>> @@ -703,6 +703,8 @@ return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
>>
>> dprintk("--> %s\n", __func__);
>>
>> + BUG_ON(type != RETURN_FILE);
>> +
>> lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
>> if (lrp == NULL) {
>> if (lo && (type == RETURN_FILE))
>> @@ -729,7 +731,8 @@ out:
>> int
>> _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
>> const nfs4_stateid *stateid, /* optional */
>> - enum pnfs_layoutreturn_type type)
>> + enum pnfs_layoutreturn_type type,
>> + int sendreturn)
>> {
>> struct pnfs_layout_type *lo = NULL;
>> struct nfs_inode *nfsi = NFS_I(ino);
>> @@ -739,14 +742,19 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
>> dprintk("--> %s type %d\n", __func__, type);
>>
>> if (range)
>> - arg = *range;
>> - else {
>> + arg.iomode = range->iomode;
>> + else
>> arg.iomode = IOMODE_ANY;
>> - arg.offset = 0;
>> - arg.length = ~0;
>> - }
>> + arg.offset = 0;
>> + arg.length = ~0;
>> +
>> if (type == RETURN_FILE) {
>> if (nfsi->layout.layoutcommit_ctx) {
>> + if (stateid && !sendreturn) { /* callback */
>> + dprintk("%s: layoutcommit pending\n", __func__);
>> + status = htonl(NFS4ERR_DELAY);
>> + goto out;
>> + }
>> status = pnfs_layoutcommit_inode(ino, 1);
>> if (status) {
>> dprintk("%s: layoutcommit failed, status=%d. "
>> @@ -763,11 +771,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
>> }
>> if (!lo) {
>> dprintk("%s: no layout segments to return\n", __func__);
>> - /* must send the LAYOUTRETURN in response to recall */
>> - if (stateid)
>> - goto send_return;
>> - else
>> - goto out;
>> + goto out;
>> }
>>
>> /* unlock w/o put rebalanced by eventual call to
>> @@ -776,13 +780,22 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
>> unlock_current_layout(nfsi);
>>
>> if (pnfs_return_layout_barrier(nfsi, &arg)) {
>> + if (stateid) { /* callback */
>> + status = htonl(NFS4ERR_DELAY);
>> + lock_current_layout(nfsi);
>> + put_unlock_current_layout(lo);
>> + goto out;
>> + }
>> dprintk("%s: waiting\n", __func__);
>> wait_event(nfsi->lo_waitq,
>> - !pnfs_return_layout_barrier(nfsi, &arg));
>> + !pnfs_return_layout_barrier(nfsi, &arg));
>> }
>> +
>> + if (sendreturn)
>> + status = return_layout(ino, &arg, stateid, type, lo);
>> + else
>> + pnfs_layout_release(lo, &arg);
>> }
>> -send_return:
>> - status = return_layout(ino, &arg, stateid, type, lo);
>> out:
>> dprintk("<-- %s status: %d\n", __func__, status);
>> return status;
>> @@ -1080,31 +1093,12 @@ pnfs_update_layout(struct inode *ino,
>> /* Check to see if the layout for the given range already exists */
>> lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
>> if (lseg && !lseg->valid) {
>> - unlock_current_layout(nfsi);
>> if (take_ref)
>> put_lseg(lseg);
>> - for (;;) {
>> - prepare_to_wait(&nfsi->lo_waitq, &__wait,
>> - TASK_KILLABLE);
>> - lock_current_layout(nfsi);
>> - lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
>> - if (!lseg || lseg->valid)
>> - break;
>> - dprintk("%s: invalid lseg %p ref %d\n", __func__,
>> - lseg, atomic_read(&lseg->kref.refcount)-1);
>> - if (take_ref)
>> - put_lseg(lseg);
>> - if (signal_pending(current)) {
>> - lseg = NULL;
>> - result = -ERESTARTSYS;
>> - break;
>> - }
>> - unlock_current_layout(nfsi);
>> - schedule();
>> - }
>> - finish_wait(&nfsi->lo_waitq, &__wait);
>> - if (result)
>> - goto out_put;
>> +
>> + /* someone is cleaning the layout */
>> + result = -EAGAIN;
>> + goto out_put;
>> }
>>
>> if (lseg) {
>> @@ -1434,7 +1428,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
>>
>> if (count > 0 && !below_threshold(inode, count, 0)) {
>> status = pnfs_update_layout(inode, ctx, count,
>> - loff, IOMODE_READ, NULL);
>> + loff, IOMODE_READ, NULL);
>> dprintk("%s *rsize %Zd virt update returned %d\n",
>> __func__, *rsize, status);
>> if (status != 0)
>> @@ -1673,7 +1667,7 @@ pnfs_writeback_done(struct nfs_write_data *data)
>> .length = data->args.count,
>> };
>> dprintk("%s: retrying\n", __func__);
>> - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
>> + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);
>> pnfs_initiate_write(data, NFS_CLIENT(data->inode),
>> pdata->call_ops, pdata->how);
>> }
>> @@ -1804,7 +1798,7 @@ pnfs_read_done(struct nfs_read_data *data)
>> .length = data->args.count,
>> };
>> dprintk("%s: retrying\n", __func__);
>> - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
>> + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);
>> pnfs_initiate_read(data, NFS_CLIENT(data->inode),
>> pdata->call_ops);
>> }
>> @@ -2030,7 +2024,7 @@ pnfs_commit_done(struct nfs_write_data *data)
>> .length = data->args.count,
>> };
>> dprintk("%s: retrying\n", __func__);
>> - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
>> + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);
>> pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
>> pdata->call_ops, pdata->how);
>> }
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 9e43973..794a2d3 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -40,7 +40,8 @@ int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>>
>> int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *,
>> const nfs4_stateid *stateid, /* optional */
>> - enum pnfs_layoutreturn_type);
>> + enum pnfs_layoutreturn_type,
>> + int sendreturn);
>> void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *mntfh, u32 id);
>> void unmount_pnfs_layoutdriver(struct nfs_server *);
>> int pnfs_use_read(struct inode *inode, ssize_t count);
>> @@ -236,14 +237,15 @@ static inline void pnfs_modify_new_request(struct nfs_page *req,
>> static inline int pnfs_return_layout(struct inode *ino,
>> struct nfs4_pnfs_layout_segment *lseg,
>> const nfs4_stateid *stateid, /* optional */
>> - enum pnfs_layoutreturn_type type)
>> + enum pnfs_layoutreturn_type type,
>> + int sendreturn)
>> {
>> struct nfs_inode *nfsi = NFS_I(ino);
>> struct nfs_server *nfss = NFS_SERVER(ino);
>>
>> if (pnfs_enabled_sb(nfss) &&
>> (type != RETURN_FILE || has_layout(nfsi)))
>> - return _pnfs_return_layout(ino, lseg, stateid, type);
>> + return _pnfs_return_layout(ino, lseg, stateid, type, sendreturn);
>>
>> return 0;
>> }
>> --
>> 1.6.2.5
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2010-05-18 17:33:34

by Alexandros Batsakis

[permalink] [raw]
Subject: Re: [PATCH 7/8] pnfs-submit: forgetful client model


On May 18, 2010, at 7:16 AM, J. Bruce Fields wrote:

> On Mon, May 17, 2010 at 05:06:09PM -0700, Alexandros Batsakis wrote:
>> Sure, I ll add the stuff below to the wiki and to the Documention folder.
>
> Thanks! Looks good.
>
>> a) if stateid.seqid is not the expected one return NFS4ERR_DELAY.
>> b) we set the specific segments as not valid (no more I/O to them)
>> c) if there is pending I/O we return NFS4ERR_DELAY.
>
> Up to you whether it's worth documenting, but: I remember there being
> some disagreement about this, but don't remember the resolution.
>
>> d) we return NFS4ERR_NOMATCHING_LAYOUT
>>
>> Here you can find a simple flowchart that highlights the description above.
>> https://docs.google.com/leaf?id=0B9egH40ld0WsMjg0ZmYzMTQtNDY5NC00ZjRjLTkwNzktNjFhZDFhYjJjYmFj&hl=en
>
> That URL doesn't work for me.
>

Apologies
This should work (it's a .pdf file)
https://docs.google.com/fileview?id=0B9egH40ld0WsNWQzM2RhMTQtM2E5ZS00YTk3LThmNDItMDU0YWMzZDczNDli&hl=en

-a

> --b.


2010-05-27 18:38:37

by Batsakis, Alexandros

[permalink] [raw]
Subject: RE: [PATCH 7/8] pnfs-submit: forgetful client model

Hi Benny,

Thanks for the review. I ll resend a new version shortly.

-alexandros

> -----Original Message-----
> From: Benny Halevy [mailto:[email protected]]
> Sent: Wednesday, May 26, 2010 2:21 AM
> To: Batsakis, Alexandros
> Cc: [email protected]
> Subject: Re: [PATCH 7/8] pnfs-submit: forgetful client model
>
> On 2010-05-17 20:56, Alexandros Batsakis wrote:
> > Forgetful client model:
> >
> > If we receive a CB_LAYOUTRECALL
> > - we spawn a thread to handle the recall
> > (xxx: now only one recall can be active at a time, else
> NFS4ERR_DELAY)
> > - we check the stateid seqid
> > if it does not match we return NFS4ERR_DELAY
> > - we check for pending I/O
> > if there is we return NFS4ERR_DELAY
> > Else we return NO_MATCHING_LAYOUT.
> > Note that for whole file layouts there is no need to serialize
> LAYOUTGETs/LAYOUTRETURNs
> > For bulk layouts, if there is a layout active, we return NFS4_OK and
> we start
> > cleaning the layouts asynchronously. At the end we send a bulk
> LAYOUTRETURN.
> > Note that there is no need to prevent any new LAYOUTGETs explicitly
> as the server should
> > reject them.
> >
> > Signed-off-by: Alexandros Batsakis <[email protected]>
> > ---
> > fs/nfs/callback_proc.c | 131
++++++++++++++++++++++++++++++++++----
> ----------
> > fs/nfs/inode.c | 2 +-
> > fs/nfs/nfs4_fs.h | 1 +
> > fs/nfs/nfs4proc.c | 4 +-
> > fs/nfs/nfs4state.c | 2 +-
> > fs/nfs/pnfs.c | 78 +++++++++++++---------------
> > fs/nfs/pnfs.h | 8 ++-
> > 7 files changed, 139 insertions(+), 87 deletions(-)
> >
> > diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> > index 24e2571..419fee7 100644
> > --- a/fs/nfs/callback_proc.c
> > +++ b/fs/nfs/callback_proc.c
> > @@ -129,6 +129,38 @@ int nfs4_validate_delegation_stateid(struct
> nfs_delegation *delegation, const nf
> >
> > #if defined(CONFIG_NFS_V4_1)
> >
> > +static int
>
> can return bool
>
> > +pnfs_is_next_layout_stateid(const struct pnfs_layout_type *lo,
> > + const nfs4_stateid stateid)
> > +{
> > + int seq;
>
> how about calling it seqlock so it will be less confusing
> with regards to seqid.
>
> > + int res;
>
> and res is "bool" too.
>
> > + u32 oldseqid, newseqid;
> > +
> > + do {
> > + seq = read_seqbegin(&lo->seqlock);
> > + oldseqid = ntohl(lo->stateid.u.stateid.seqid);
> > + newseqid = ntohl(stateid.u.stateid.seqid);
>
> let's use the more modern be32_to_cpu form...
>
> > + res = !memcmp(lo->stateid.u.stateid.other,
> > + stateid.u.stateid.other,
> > + NFS4_STATEID_OTHER_SIZE);
> > + if (res) { /* comparing layout stateids */
> > + if (oldseqid == ~0)
> > + res = (newseqid == 1);
> > + else
> > + res = (newseqid == oldseqid + 1);
> > + } else { /* open stateid */
> > + res = !memcmp(lo->stateid.u.data,
> > + &zero_stateid,
> > + NFS4_STATEID_SIZE);
> > + if (res)
> > + res = (newseqid == 1);
> > + }
> > + } while (read_seqretry(&lo->seqlock, seq));
> > +
> > + return res;
> > +}
> > +
> > /*
> > * Retrieve an inode based on layout recall parameters
> > *
> > @@ -190,10 +222,11 @@ static int pnfs_recall_layout(void *data)
> > {
> > struct inode *inode, *ino;
> > struct nfs_client *clp;
> > + struct nfs4_pnfs_layoutreturn *lrp;
> > struct cb_pnfs_layoutrecallargs rl;
> > struct recall_layout_threadargs *args =
> > (struct recall_layout_threadargs *)data;
> > - int status;
> > + int status = 0;
> >
> > daemonize("nfsv4-layoutreturn");
> >
> > @@ -204,46 +237,57 @@ static int pnfs_recall_layout(void *data)
> > clp = args->clp;
> > inode = args->inode;
> > rl = *args->rl;
> > - args->result = 0;
> > - complete(&args->started);
> > - args = NULL;
> > - /* Note: args must not be used after this point!!! */
> > -
> > -/* FIXME: need barrier here:
> > - pause I/O to data servers
> > - pause layoutgets
> > - drain all outstanding writes to storage devices
> > - wait for any outstanding layoutreturns and layoutgets mentioned
> in
> > - cb_sequence.
> > - then return layouts, resume after layoutreturns complete
> > - */
> >
> > /* support whole file layouts only */
> > rl.cbl_seg.offset = 0;
> > rl.cbl_seg.length = NFS4_MAX_UINT64;
> >
> > if (rl.cbl_recall_type == RETURN_FILE) {
> > - status = pnfs_return_layout(inode, &rl.cbl_seg,
> &rl.cbl_stateid,
> > - RETURN_FILE);
> > + if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout,
> rl.cbl_stateid))
> > + status = pnfs_return_layout(inode, &rl.cbl_seg,
> &rl.cbl_stateid,
> > + RETURN_FILE, 0);
> > + else
> > + status = htonl(NFS4ERR_DELAY);
>
> cpu_to_be32?
>
> > + /* forgetful client */
> > if (status)
> > - dprintk("%s RETURN_FILE error: %d\n", __func__,
> status);
> > + dprintk("%s RETURN_FILE : %d\n", __func__,
status);
> > + else
> > + status = htonl(NFS4ERR_NOMATCHING_LAYOUT);
>
> ditto
>
> > + args->result = status;
> > + complete(&args->started);
> > goto out;
> > }
> >
> > - /* FIXME: This loop is inefficient, running in O(|s_inodes|^2)
*/
>
> Hmm, I would like this to be fixed, eventually.
> Maybe FIXME is too harsh and IMPROVEME would be better :)
>
> > + status = htonl(NFS4_OK);
>
> ditto
>
> > + args->result = status;
> > + complete(&args->started);
> > + args = NULL;
> > +
> > while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) {
> > /* XXX need to check status on pnfs_return_layout */
>
> s/XXX/FIXME/
>
> > - pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE);
> > + pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE,
0);
> > iput(ino);
> > }
> >
> > - /* send final layoutreturn */
> > - status = pnfs_return_layout(inode, &rl.cbl_seg, NULL,
> rl.cbl_recall_type);
> > - if (status)
> > - printk(KERN_INFO "%s: ignoring pnfs_return_layout
> status=%d\n",
> > - __func__, status);
> > + lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
> > + if (!lrp) {
> > + dprintk("%s: allocation failed. Cannot send last
> LAYOUTRETURN\n",
> > + __func__);
> > + goto out;
> > + }
> > +
> > + /* send final layoutreturn (bulk) */
> > + lrp->args.reclaim = 0;
> > + lrp->args.layout_type = rl.cbl_layout_type;
> > + lrp->args.return_type = rl.cbl_recall_type;
> > + lrp->args.lseg = rl.cbl_seg;
> > + lrp->args.inode = inode;
> > + lrp->lo = NULL;
> > +
> > + pnfs4_proc_layoutreturn(lrp);
> > out:
> > - iput(inode);
> > + clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
> > + nfs_put_client(clp);
> > module_put_and_exit(0);
> > dprintk("%s: exit status %d\n", __func__, 0);
> > return 0;
> > @@ -261,15 +305,18 @@ static int pnfs_async_return_layout(struct
> nfs_client *clp, struct inode *inode,
> > .rl = rl,
> > };
> > struct task_struct *t;
> > - int status;
> > -
> > - /* should have returned NFS4ERR_NOMATCHING_LAYOUT... */
> > - BUG_ON(inode == NULL);
> > + int status = htonl(NFS4ERR_DELAY);
> >
> > dprintk("%s: -->\n", __func__);
> >
> > + /* XXX: do not allow two concurrent layout recalls */
> > + if (test_and_set_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state))
> > + return status;
> > +
> > init_completion(&data.started);
> > __module_get(THIS_MODULE);
> > + if (!atomic_inc_not_zero(&clp->cl_count))
> > + goto out_put_no_client;
> >
> > t = kthread_run(pnfs_recall_layout, &data, "%s",
> "pnfs_recall_layout");
> > if (IS_ERR(t)) {
> > @@ -283,6 +330,9 @@ static int pnfs_async_return_layout(struct
> nfs_client *clp, struct inode *inode,
> > wait_for_completion(&data.started);
> > return data.result;
> > out_module_put:
> > + nfs_put_client(clp);
> > +out_put_no_client:
> > + clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
> > module_put(THIS_MODULE);
> > return status;
> > }
> > @@ -309,19 +359,24 @@ __be32 pnfs_cb_layoutrecall(struct
> cb_pnfs_layoutrecallargs *args,
> > do {
> > struct nfs_client *prev = clp;
> > num_client++;
> > - inode = nfs_layoutrecall_find_inode(clp, args);
> > - if (inode != NULL) {
> > - if (PNFS_LD(&NFS_I(inode)->layout)->id ==
> > - args->cbl_layout_type) {
> > - /* Set up a helper thread to actually
> > - * return the delegation */
> > + /* the callback must come from the MDS personality */
> > + if (!(clp->cl_exchange_flags &
EXCHGID4_FLAG_USE_PNFS_MDS))
> > + goto loop;
> > + if (args->cbl_recall_type == RETURN_FILE) {
> > + inode = nfs_layoutrecall_find_inode(clp, args);
> > + if (inode != NULL) {
> > res = pnfs_async_return_layout(clp,
inode,
> args);
> > - if (res != 0)
> > - res = htonl(NFS4ERR_RESOURCE);
> > - break;
> > + iput(inode);
> > }
> > + } else {
> > + /* we need the inode to get the nfs_server
struct */
> > + inode = nfs_layoutrecall_find_inode(clp, args);
> > + if (!inode)
> > + goto loop;
> > + res = pnfs_async_return_layout(clp, inode,
args);
> > iput(inode);
> > }
> > +loop:
> > clp = nfs_find_client_next(prev);
> > nfs_put_client(prev);
> > } while (clp != NULL);
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > index adf3280..584c9b8 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -1321,7 +1321,7 @@ void nfs4_clear_inode(struct inode *inode)
> > /* First call standard NFS clear_inode() code */
> > nfs_clear_inode(inode);
> > #ifdef CONFIG_NFS_V4_1
> > - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
> > + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
> > #endif /* CONFIG_NFS_V4_1 */
> > }
> > #endif
> > diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> > index 40ebe1d..90c57b3 100644
> > --- a/fs/nfs/nfs4_fs.h
> > +++ b/fs/nfs/nfs4_fs.h
> > @@ -47,6 +47,7 @@ enum nfs4_client_state {
> > NFS4CLNT_SESSION_RESET,
> > NFS4CLNT_SESSION_DRAINING,
> > NFS4CLNT_RECALL_SLOT,
> > + NFS4CLNT_LAYOUT_RECALL,
> > };
> >
> > /*
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index 6739d15..14e90e1 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -1085,7 +1085,7 @@ static void pnfs4_layout_reclaim(struct
> nfs4_state *state)
> > /* FIXME: send gratuitous layout commits and return with the
> reclaim
> > * flag during grace period
> > */
> > - pnfs_return_layout(state->inode, NULL, &state->open_stateid,
> RETURN_FILE);
> > + pnfs_return_layout(state->inode, NULL, NULL, RETURN_FILE, 1);
> > pnfs_set_layout_stateid(&NFS_I(state->inode)->layout,
> &zero_stateid);
> > #endif /* CONFIG_NFS_V4_1 */
> > }
> > @@ -2382,7 +2382,7 @@ pnfs4_proc_setattr(struct dentry *dentry,
> struct nfs_fattr *fattr,
> >
> > if (pnfs_enabled_sb(server) && has_layout(nfsi) &&
> > pnfs_ld_layoutret_on_setattr(server->pnfs_curr_ld))
> > - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
> > + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
> > return nfs4_proc_setattr(dentry, fattr, sattr);
> > }
> > #endif /* CONFIG_NFS_V4_1 */
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 2d52300..cd2ae83 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -598,7 +598,7 @@ static void __nfs4_close(struct path *path,
> struct nfs4_state *state, fmode_t fm
> > range.offset = 0;
> > range.length = NFS4_MAX_UINT64;
> > pnfs_return_layout(state->inode, &range, NULL,
> > - RETURN_FILE);
> > + RETURN_FILE, 1);
> > }
> > #endif /* CONFIG_NFS_V4_1 */
> > nfs4_do_close(path, state, wait);
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index 3b4dea0..2c96723 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -610,7 +610,7 @@ get_layout(struct inode *ino,
> > */
> > static inline int
> > should_free_lseg(struct pnfs_layout_segment *lseg,
> > - struct nfs4_pnfs_layout_segment *range)
> > + struct nfs4_pnfs_layout_segment *range)
> > {
> > return (range->iomode == IOMODE_ANY ||
> > lseg->range.iomode == range->iomode) &&
> > @@ -703,6 +703,8 @@ return_layout(struct inode *ino, struct
> nfs4_pnfs_layout_segment *range,
> >
> > dprintk("--> %s\n", __func__);
> >
> > + BUG_ON(type != RETURN_FILE);
> > +
> > lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
> > if (lrp == NULL) {
> > if (lo && (type == RETURN_FILE))
> > @@ -729,7 +731,8 @@ out:
> > int
> > _pnfs_return_layout(struct inode *ino, struct
> nfs4_pnfs_layout_segment *range,
> > const nfs4_stateid *stateid, /* optional */
> > - enum pnfs_layoutreturn_type type)
> > + enum pnfs_layoutreturn_type type,
> > + int sendreturn)
>
> s/int/bool/
> and we should use true/false rather than 1/0 in call sites.
>
> > {
> > struct pnfs_layout_type *lo = NULL;
> > struct nfs_inode *nfsi = NFS_I(ino);
> > @@ -739,14 +742,19 @@ _pnfs_return_layout(struct inode *ino, struct
> nfs4_pnfs_layout_segment *range,
> > dprintk("--> %s type %d\n", __func__, type);
> >
> > if (range)
> > - arg = *range;
> > - else {
> > + arg.iomode = range->iomode;
> > + else
> > arg.iomode = IOMODE_ANY;
>
> how about
> arg.iomode = range ? range->iomode : IOMODE_ANY;
>
> > - arg.offset = 0;
> > - arg.length = ~0;
> > - }
> > + arg.offset = 0;
> > + arg.length = ~0;
>
> NFS4_MAX_UINT64
>
> > +
> > if (type == RETURN_FILE) {
> > if (nfsi->layout.layoutcommit_ctx) {
> > + if (stateid && !sendreturn) { /* callback */
> > + dprintk("%s: layoutcommit pending\n",
> __func__);
> > + status = htonl(NFS4ERR_DELAY);
>
> this function currently returns system errors, not nfs'p.
>
> > + goto out;
> > + }
> > status = pnfs_layoutcommit_inode(ino, 1);
> > if (status) {
> > dprintk("%s: layoutcommit failed,
status=%d. "
> > @@ -763,11 +771,7 @@ _pnfs_return_layout(struct inode *ino, struct
> nfs4_pnfs_layout_segment *range,
> > }
> > if (!lo) {
> > dprintk("%s: no layout segments to return\n",
> __func__);
> > - /* must send the LAYOUTRETURN in response to
recall
> */
> > - if (stateid)
> > - goto send_return;
> > - else
> > - goto out;
> > + goto out;
>
> what about the sendreturn case?
>
> > }
> >
> > /* unlock w/o put rebalanced by eventual call to
> > @@ -776,13 +780,22 @@ _pnfs_return_layout(struct inode *ino, struct
> nfs4_pnfs_layout_segment *range,
> > unlock_current_layout(nfsi);
> >
> > if (pnfs_return_layout_barrier(nfsi, &arg)) {
> > + if (stateid) { /* callback */
> > + status = htonl(NFS4ERR_DELAY);
>
> ditto
>
> > + lock_current_layout(nfsi);
> > + put_unlock_current_layout(lo);
> > + goto out;
> > + }
> > dprintk("%s: waiting\n", __func__);
> > wait_event(nfsi->lo_waitq,
> > - !pnfs_return_layout_barrier(nfsi,
&arg));
> > + !pnfs_return_layout_barrier(nfsi,
&arg));
> > }
> > +
> > + if (sendreturn)
> > + status = return_layout(ino, &arg, stateid, type,
lo);
> > + else
> > + pnfs_layout_release(lo, &arg);
> > }
> > -send_return:
> > - status = return_layout(ino, &arg, stateid, type, lo);
> > out:
> > dprintk("<-- %s status: %d\n", __func__, status);
> > return status;
> > @@ -1080,31 +1093,12 @@ pnfs_update_layout(struct inode *ino,
> > /* Check to see if the layout for the given range already exists
> */
> > lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
> > if (lseg && !lseg->valid) {
> > - unlock_current_layout(nfsi);
> > if (take_ref)
> > put_lseg(lseg);
> > - for (;;) {
> > - prepare_to_wait(&nfsi->lo_waitq, &__wait,
> > - TASK_KILLABLE);
> > - lock_current_layout(nfsi);
> > - lseg = pnfs_has_layout(lo, &arg, take_ref,
> !take_ref);
> > - if (!lseg || lseg->valid)
> > - break;
> > - dprintk("%s: invalid lseg %p ref %d\n",
__func__,
> > - lseg,
atomic_read(&lseg->kref.refcount)-1);
> > - if (take_ref)
> > - put_lseg(lseg);
> > - if (signal_pending(current)) {
> > - lseg = NULL;
> > - result = -ERESTARTSYS;
> > - break;
> > - }
> > - unlock_current_layout(nfsi);
> > - schedule();
> > - }
> > - finish_wait(&nfsi->lo_waitq, &__wait);
> > - if (result)
> > - goto out_put;
> > +
> > + /* someone is cleaning the layout */
> > + result = -EAGAIN;
>
> Ah well, I'm not sure if this works, did you test that?
> Anyhow, will work properly with the long awaited state machine.
>
> > + goto out_put;
> > }
> >
> > if (lseg) {
> > @@ -1434,7 +1428,7 @@ pnfs_pageio_init_read(struct
> nfs_pageio_descriptor *pgio,
> >
> > if (count > 0 && !below_threshold(inode, count, 0)) {
> > status = pnfs_update_layout(inode, ctx, count,
> > - loff, IOMODE_READ,
NULL);
> > + loff, IOMODE_READ, NULL);
> > dprintk("%s *rsize %Zd virt update returned %d\n",
> > __func__, *rsize, status);
> > if (status != 0)
> > @@ -1673,7 +1667,7 @@ pnfs_writeback_done(struct nfs_write_data
> *data)
> > .length = data->args.count,
> > };
> > dprintk("%s: retrying\n", __func__);
> > - _pnfs_return_layout(data->inode, &range, NULL,
> RETURN_FILE);
> > + _pnfs_return_layout(data->inode, &range, NULL,
RETURN_FILE,
> 1);
>
> s/1/true/
>
> > pnfs_initiate_write(data, NFS_CLIENT(data->inode),
> > pdata->call_ops, pdata->how);
> > }
> > @@ -1804,7 +1798,7 @@ pnfs_read_done(struct nfs_read_data *data)
> > .length = data->args.count,
> > };
> > dprintk("%s: retrying\n", __func__);
> > - _pnfs_return_layout(data->inode, &range, NULL,
> RETURN_FILE);
> > + _pnfs_return_layout(data->inode, &range, NULL,
RETURN_FILE,
> 1);
>
> ditto
>
> > pnfs_initiate_read(data, NFS_CLIENT(data->inode),
> > pdata->call_ops);
> > }
> > @@ -2030,7 +2024,7 @@ pnfs_commit_done(struct nfs_write_data *data)
> > .length = data->args.count,
> > };
> > dprintk("%s: retrying\n", __func__);
> > - _pnfs_return_layout(data->inode, &range, NULL,
> RETURN_FILE);
> > + _pnfs_return_layout(data->inode, &range, NULL,
RETURN_FILE,
> 1);
>
> ditto
>
> > pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
> > pdata->call_ops, pdata->how);
> > }
> > diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> > index 9e43973..794a2d3 100644
> > --- a/fs/nfs/pnfs.h
> > +++ b/fs/nfs/pnfs.h
> > @@ -40,7 +40,8 @@ int pnfs_update_layout(struct inode *ino, struct
> nfs_open_context *ctx,
> >
> > int _pnfs_return_layout(struct inode *, struct
> nfs4_pnfs_layout_segment *,
> > const nfs4_stateid *stateid, /* optional */
> > - enum pnfs_layoutreturn_type);
> > + enum pnfs_layoutreturn_type,
> > + int sendreturn);
>
> bool
>
> > void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh
> *mntfh, u32 id);
> > void unmount_pnfs_layoutdriver(struct nfs_server *);
> > int pnfs_use_read(struct inode *inode, ssize_t count);
> > @@ -236,14 +237,15 @@ static inline void
> pnfs_modify_new_request(struct nfs_page *req,
> > static inline int pnfs_return_layout(struct inode *ino,
> > struct nfs4_pnfs_layout_segment
*lseg,
> > const nfs4_stateid *stateid, /*
optional
> */
> > - enum pnfs_layoutreturn_type type)
> > + enum pnfs_layoutreturn_type type,
> > + int sendreturn)
>
> bool
>
> Thanks!
>
> Benny
>
> > {
> > struct nfs_inode *nfsi = NFS_I(ino);
> > struct nfs_server *nfss = NFS_SERVER(ino);
> >
> > if (pnfs_enabled_sb(nfss) &&
> > (type != RETURN_FILE || has_layout(nfsi)))
> > - return _pnfs_return_layout(ino, lseg, stateid, type);
> > + return _pnfs_return_layout(ino, lseg, stateid, type,
> sendreturn);
> >
> > return 0;
> > }


2010-05-28 17:27:19

by Fred Isaman

[permalink] [raw]
Subject: Re: [PATCH 2/8] pnfs-submit: clean locking infrastructure

On Mon, May 17, 2010 at 1:56 PM, Alexandros Batsakis
<[email protected]> wrote:
> (also minor cleanup of pnfs_free_layout())
>
> Signed-off-by: Alexandros Batsakis <[email protected]>
>
> Conflicts:
>
> =A0 =A0 =A0 =A0fs/nfs/pnfs.c
> ---
> =A0fs/nfs/pnfs.c | =A0 80 +++++++++++++++++++++++++++++++++++++------=
-------------
> =A01 files changed, 53 insertions(+), 27 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index b72c013..74cb998 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1,4 +1,4 @@
> -/*
> + /*
> =A0* =A0linux/fs/nfs/pnfs.c
> =A0*
> =A0* =A0pNFS functions to call and manage layout drivers.
> @@ -60,6 +60,8 @@ static int pnfs_initialized;
> =A0static void pnfs_free_layout(struct pnfs_layout_type *lo,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct nfs4_p=
nfs_layout_segment *range);
> =A0static enum pnfs_try_status pnfs_commit(struct nfs_write_data *dat=
a, int sync);
> +static inline void lock_current_layout(struct nfs_inode *nfsi);
> +static inline void unlock_current_layout(struct nfs_inode *nfsi);
>
> =A0/* Locking:
> =A0*
> @@ -153,16 +155,17 @@ pnfs_need_layoutcommit(struct nfs_inode *nfsi, =
struct nfs_open_context *ctx)
> =A0{
> =A0 =A0 =A0 =A0dprintk("%s: has_layout=3D%d layoutcommit_ctx=3D%p ctx=
=3D%p\n", __func__,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0has_layout(nfsi), nfsi->layout.layoutc=
ommit_ctx, ctx);
> - =A0 =A0 =A0 spin_lock(&nfsi->lo_lock);
> +
> + =A0 =A0 =A0 lock_current_layout(nfsi);
> =A0 =A0 =A0 =A0if (has_layout(nfsi) && !nfsi->layout.layoutcommit_ctx=
) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0nfsi->layout.layoutcommit_ctx =3D get_=
nfs_open_context(ctx);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0nfsi->change_attr++;
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 unlock_current_layout(nfsi);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0dprintk("%s: Set layoutcommit_ctx=3D%p=
\n", __func__,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0nfsi->layout.layoutcom=
mit_ctx);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return;
> =A0 =A0 =A0 =A0}
> - =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 unlock_current_layout(nfsi);
> =A0}
>
> =A0/* Update last_write_offset for layoutcommit.
> @@ -175,7 +178,7 @@ pnfs_update_last_write(struct nfs_inode *nfsi, lo=
ff_t offset, size_t extent)
> =A0{
> =A0 =A0 =A0 =A0loff_t end_pos;
>
> - =A0 =A0 =A0 spin_lock(&nfsi->lo_lock);
> + =A0 =A0 =A0 lock_current_layout(nfsi);
> =A0 =A0 =A0 =A0if (offset < nfsi->layout.pnfs_write_begin_pos)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0nfsi->layout.pnfs_write_begin_pos =3D =
offset;
> =A0 =A0 =A0 =A0end_pos =3D offset + extent - 1; /* I'm being inclusiv=
e */
> @@ -187,7 +190,7 @@ pnfs_update_last_write(struct nfs_inode *nfsi, lo=
ff_t offset, size_t extent)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0(unsigned long) offset ,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0(unsigned long) nfsi->layout.pnfs_writ=
e_begin_pos,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0(unsigned long) nfsi->layout.pnfs_writ=
e_end_pos);
> - =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 unlock_current_layout(nfsi);
> =A0}
>
> =A0/* Unitialize a mountpoint in a layout driver */
> @@ -296,12 +299,27 @@ pnfs_unregister_layoutdriver(struct pnfs_layout=
driver_type *ld_type)
> =A0* pNFS client layout cache
> =A0*/
> =A0#if defined(CONFIG_SMP)
> +#define BUG_ON_LOCKED_LO(lo) \
> + =A0 =A0 =A0 BUG_ON(spin_is_locked(&PNFS_NFS_INODE(lo)->lo_lock))
> =A0#define BUG_ON_UNLOCKED_LO(lo) \
> =A0 =A0 =A0 =A0BUG_ON(!spin_is_locked(&PNFS_NFS_INODE(lo)->lo_lock))
> =A0#else /* CONFIG_SMP */
> +#define BUG_ON_LOCKED_LO(lo) do {} while (0)
> =A0#define BUG_ON_UNLOCKED_LO(lo) do {} while (0)
> =A0#endif /* CONFIG_SMP */
>
> +static inline void lock_current_layout(struct nfs_inode *nfsi)
> +{
> + =A0 =A0 =A0 BUG_ON_LOCKED_LO((&nfsi->layout));

I just ran into this in testing. This check causes problems. If you
know it is already unlocked, you wouldn't have to "spin".

=46red

> + =A0 =A0 =A0 spin_lock(&nfsi->lo_lock);
> +}
> +
> +static inline void unlock_current_layout(struct nfs_inode *nfsi)
> +{
> + =A0 =A0 =A0 BUG_ON_UNLOCKED_LO((&nfsi->layout));
> + =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> +}
> +
> =A0/*
> =A0* get and lock nfsi->layout
> =A0*/
> @@ -310,10 +328,10 @@ get_lock_current_layout(struct nfs_inode *nfsi)
> =A0{
> =A0 =A0 =A0 =A0struct pnfs_layout_type *lo;
>
> + =A0 =A0 =A0 lock_current_layout(nfsi);
> =A0 =A0 =A0 =A0lo =3D &nfsi->layout;
> - =A0 =A0 =A0 spin_lock(&nfsi->lo_lock);
> =A0 =A0 =A0 =A0if (!lo->ld_data) {
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 unlock_current_layout(nfsi);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return NULL;
> =A0 =A0 =A0 =A0}
>
> @@ -333,7 +351,12 @@ put_unlock_current_layout(struct pnfs_layout_typ=
e *lo)
> =A0 =A0 =A0 =A0BUG_ON_UNLOCKED_LO(lo);
> =A0 =A0 =A0 =A0BUG_ON(lo->refcount <=3D 0);
>
> - =A0 =A0 =A0 if (--lo->refcount =3D=3D 0 && list_empty(&lo->segs)) {
> + =A0 =A0 =A0 lo->refcount--;
> +
> + =A0 =A0 =A0 if (lo->refcount > 0)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto out;
> +
> + =A0 =A0 =A0 if (list_empty(&lo->segs)) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct layoutdriver_io_operations *io_=
ops =3D
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0PNFS_LD_IO_OPS(lo);
>
> @@ -347,7 +370,8 @@ put_unlock_current_layout(struct pnfs_layout_type=
*lo)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0list_del_init(&nfsi->lo_inodes);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0spin_unlock(&clp->cl_lock);
> =A0 =A0 =A0 =A0}
> - =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> +out:
> + =A0 =A0 =A0 unlock_current_layout(nfsi);
> =A0}
>
> =A0void
> @@ -356,7 +380,7 @@ pnfs_layout_release(struct pnfs_layout_type *lo, =
atomic_t *count,
> =A0{
> =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D PNFS_NFS_INODE(lo);
>
> - =A0 =A0 =A0 spin_lock(&nfsi->lo_lock);
> + =A0 =A0 =A0 lock_current_layout(nfsi);
> =A0 =A0 =A0 =A0if (range)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0pnfs_free_layout(lo, range);
> =A0 =A0 =A0 =A0atomic_dec(count);
> @@ -375,6 +399,8 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
> =A0 =A0 =A0 =A0};
>
> =A0 =A0 =A0 =A0lo =3D get_lock_current_layout(nfsi);
> + =A0 =A0 =A0 if (!lo)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 return;
> =A0 =A0 =A0 =A0pnfs_free_layout(lo, &range);
> =A0 =A0 =A0 =A0put_unlock_current_layout(lo);
> =A0}
> @@ -652,7 +678,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi=
,
> =A0 =A0 =A0 =A0struct pnfs_layout_segment *lseg;
> =A0 =A0 =A0 =A0bool ret =3D false;
>
> - =A0 =A0 =A0 spin_lock(&nfsi->lo_lock);
> + =A0 =A0 =A0 lock_current_layout(nfsi);
> =A0 =A0 =A0 =A0list_for_each_entry (lseg, &nfsi->layout.segs, fi_list=
) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (!should_free_lseg(lseg, range))
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0continue;
> @@ -666,7 +692,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi=
,
> =A0 =A0 =A0 =A0}
> =A0 =A0 =A0 =A0if (atomic_read(&nfsi->layout.lgetcount))
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0ret =3D true;
> - =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 unlock_current_layout(nfsi);
>
> =A0 =A0 =A0 =A0dprintk("%s:Return %d\n", __func__, ret);
> =A0 =A0 =A0 =A0return ret;
> @@ -756,7 +782,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs=
4_pnfs_layout_segment *range,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* unlock w/o put rebalanced by eventu=
al call to
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 * pnfs_layout_release
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 */
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 unlock_current_layout(nfsi);
>
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (pnfs_return_layout_barrier(nfsi, &=
arg)) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0dprintk("%s: waiting\n=
", __func__);
> @@ -887,7 +913,7 @@ static int pnfs_wait_schedule(void *word)
> =A0*
> =A0* Note: If successful, nfsi->lo_lock is taken and the caller
> =A0* must put and unlock current_layout by using put_unlock_current_l=
ayout()
> - * when the returned layout is released.
> + * directly or pnfs_layout_release() when the returned layout is rel=
eased.
> =A0*/
> =A0static struct pnfs_layout_type *
> =A0get_lock_alloc_layout(struct inode *ino)
> @@ -922,7 +948,7 @@ get_lock_alloc_layout(struct inode *ino)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_client *clp=
=3D NFS_SERVER(ino)->nfs_client;
>
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* must grab the layou=
t lock before the client lock */
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_lock(&nfsi->lo_loc=
k);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 lock_current_layout(nfs=
i);
>
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0spin_lock(&clp->cl_loc=
k);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (list_empty(&nfsi->=
lo_inodes))
> @@ -1038,10 +1064,10 @@ void drain_layoutreturns(struct pnfs_layout_t=
ype *lo)
> =A0 =A0 =A0 =A0while (atomic_read(&lo->lretcount)) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0struct nfs_inode *nfsi =3D PNFS_NFS_IN=
ODE(lo);
>
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 unlock_current_layout(nfsi);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0dprintk("%s: waiting\n", __func__);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0wait_event(nfsi->lo_waitq, (atomic_rea=
d(&lo->lretcount) =3D=3D 0));
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_lock(&nfsi->lo_lock);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 lock_current_layout(nfsi);
> =A0 =A0 =A0 =A0}
> =A0}
>
> @@ -1080,13 +1106,13 @@ pnfs_update_layout(struct inode *ino,
> =A0 =A0 =A0 =A0/* Check to see if the layout for the given range alre=
ady exists */
> =A0 =A0 =A0 =A0lseg =3D pnfs_has_layout(lo, &arg, take_ref, !take_ref=
);
> =A0 =A0 =A0 =A0if (lseg && !lseg->valid) {
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 unlock_current_layout(nfsi);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (take_ref)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0put_lseg(lseg);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0for (;;) {
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0prepare_to_wait(&nfsi-=
>lo_waitq, &__wait,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
=A0 =A0TASK_KILLABLE);
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_lock(&nfsi->lo_loc=
k);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 lock_current_layout(nfs=
i);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0lseg =3D pnfs_has_layo=
ut(lo, &arg, take_ref, !take_ref);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (!lseg || lseg->val=
id)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break;
> @@ -1099,7 +1125,7 @@ pnfs_update_layout(struct inode *ino,
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0result=
=3D -ERESTARTSYS;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0break;
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0}
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 spin_unlock(&nfsi->lo_l=
ock);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 unlock_current_layout(n=
fsi);
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0schedule();
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0}
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0finish_wait(&nfsi->lo_waitq, &__wait);
> @@ -1136,7 +1162,7 @@ pnfs_update_layout(struct inode *ino,
> =A0 =A0 =A0 =A0/* Matching dec is done in .rpc_release (on non-error =
paths) */
> =A0 =A0 =A0 =A0atomic_inc(&lo->lgetcount);
> =A0 =A0 =A0 =A0/* Lose lock, but not reference, match this with pnfs_=
layout_release */
> - =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 unlock_current_layout(nfsi);
>
> =A0 =A0 =A0 =A0result =3D get_layout(ino, ctx, &arg, lsegpp, lo);
> =A0out:
> @@ -1286,7 +1312,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget =
*lgp)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0*lgp->lsegpp =3D lseg;
> =A0 =A0 =A0 =A0}
>
> - =A0 =A0 =A0 spin_lock(&nfsi->lo_lock);
> + =A0 =A0 =A0 lock_current_layout(nfsi);
> =A0 =A0 =A0 =A0pnfs_insert_layout(lo, lseg);
>
> =A0 =A0 =A0 =A0if (res->return_on_close) {
> @@ -1297,7 +1323,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget =
*lgp)
>
> =A0 =A0 =A0 =A0/* Done processing layoutget. Set the layout stateid *=
/
> =A0 =A0 =A0 =A0pnfs_set_layout_stateid(lo, &res->stateid);
> - =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 unlock_current_layout(nfsi);
> =A0out:
> =A0 =A0 =A0 =A0return status;
> =A0}
> @@ -2212,7 +2238,7 @@ pnfs_layoutcommit_inode(struct inode *inode, in=
t sync)
> =A0 =A0 =A0 =A0if (!data)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return -ENOMEM;
>
> - =A0 =A0 =A0 spin_lock(&nfsi->lo_lock);
> + =A0 =A0 =A0 lock_current_layout(nfsi);
> =A0 =A0 =A0 =A0if (!nfsi->layout.layoutcommit_ctx)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0goto out_unlock;
>
> @@ -2233,7 +2259,7 @@ pnfs_layoutcommit_inode(struct inode *inode, in=
t sync)
> =A0 =A0 =A0 =A0nfsi->layout.layoutcommit_ctx =3D NULL;
>
> =A0 =A0 =A0 =A0/* release lock on pnfs layoutcommit attrs */
> - =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 unlock_current_layout(nfsi);
>
> =A0 =A0 =A0 =A0data->is_sync =3D sync;
> =A0 =A0 =A0 =A0status =3D pnfs4_proc_layoutcommit(data);
> @@ -2242,7 +2268,7 @@ out:
> =A0 =A0 =A0 =A0return status;
> =A0out_unlock:
> =A0 =A0 =A0 =A0pnfs_layoutcommit_free(data);
> - =A0 =A0 =A0 spin_unlock(&nfsi->lo_lock);
> + =A0 =A0 =A0 unlock_current_layout(nfsi);
> =A0 =A0 =A0 =A0goto out;
> =A0}
>
> --
> 1.6.2.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to [email protected]
> More majordomo info at =A0http://vger.kernel.org/majordomo-info.html
>

2010-05-26 08:26:09

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 4/8] pnfs-submit: change stateid to be a union

On May. 17, 2010, 20:56 +0300, Alexandros Batsakis <[email protected]> wrote:
> In NFSv4.1 the stateid consists of the "other" and "seqid" fields. For layout
> processing we need to numerically compare the seqid value among layout stateids.
> To do so, introduce a union to nfs4_stateid to switch between opaque(16 bytes)
> and opaque(12 bytes) / __be32
>
> Signed-off-by: Alexandros Batsakis <[email protected]>
> ---
> fs/nfs/callback_proc.c | 13 +++++++------
> fs/nfs/callback_xdr.c | 2 +-
> fs/nfs/delegation.c | 19 +++++++++++--------
> fs/nfs/nfs4proc.c | 41 +++++++++++++++++++++++++----------------
> fs/nfs/nfs4state.c | 4 ++--
> fs/nfs/nfs4xdr.c | 38 +++++++++++++++++++++-----------------
> fs/nfs/pnfs.c | 11 ++++++-----
> fs/nfsd/nfs4callback.c | 1 -
> include/linux/nfs4.h | 18 ++++++++++++++++--
> 9 files changed, 89 insertions(+), 58 deletions(-)
>
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index ebf86df..8ef1502 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -121,8 +121,9 @@ out:
>
> int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation, const nfs4_stateid *stateid)
> {
> - if (delegation == NULL || memcmp(delegation->stateid.data, stateid->data,
> - sizeof(delegation->stateid.data)) != 0)
> + if (delegation == NULL || memcmp(delegation->stateid.u.data,
> + stateid->u.data,
> + sizeof(delegation->stateid.u.data)))
> return 0;
> return 1;
> }
> @@ -383,11 +384,11 @@ int nfs41_validate_delegation_stateid(struct nfs_delegation *delegation, const n
> if (delegation == NULL)
> return 0;
>
> - /* seqid is 4-bytes long */
> - if (((u32 *) &stateid->data)[0] != 0)
> + if (stateid->u.stateid.seqid != 0)
> return 0;
> - if (memcmp(&delegation->stateid.data[4], &stateid->data[4],
> - sizeof(stateid->data)-4))
> + if (memcmp(&delegation->stateid.u.stateid.other,
> + &stateid->u.stateid.other,
> + NFS4_STATEID_OTHER_SIZE))
> return 0;
>
> return 1;
> diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
> index 69a026d..7e34bb3 100644
> --- a/fs/nfs/callback_xdr.c
> +++ b/fs/nfs/callback_xdr.c
> @@ -138,7 +138,7 @@ static __be32 decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
> p = read_buf(xdr, 16);
> if (unlikely(p == NULL))
> return htonl(NFS4ERR_RESOURCE);
> - memcpy(stateid->data, p, 16);
> + memcpy(stateid->u.data, p, 16);
> return 0;
> }
>
> diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
> index 1567124..2e43ae2 100644
> --- a/fs/nfs/delegation.c
> +++ b/fs/nfs/delegation.c
> @@ -108,7 +108,8 @@ again:
> continue;
> if (!test_bit(NFS_DELEGATED_STATE, &state->flags))
> continue;
> - if (memcmp(state->stateid.data, stateid->data, sizeof(state->stateid.data)) != 0)
> + if (memcmp(state->stateid.u.data, stateid->u.data,
> + sizeof(state->stateid.u.data)) != 0)
> continue;
> get_nfs_open_context(ctx);
> spin_unlock(&inode->i_lock);
> @@ -134,8 +135,8 @@ void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred, st
>
> if (delegation == NULL)
> return;
> - memcpy(delegation->stateid.data, res->delegation.data,
> - sizeof(delegation->stateid.data));
> + memcpy(delegation->stateid.u.data, res->delegation.u.data,
> + sizeof(delegation->stateid.u.data));
> delegation->type = res->delegation_type;
> delegation->maxsize = res->maxsize;
> oldcred = delegation->cred;
> @@ -173,8 +174,9 @@ static struct nfs_delegation *nfs_detach_delegation_locked(struct nfs_inode *nfs
> if (delegation == NULL)
> goto nomatch;
> spin_lock(&delegation->lock);
> - if (stateid != NULL && memcmp(delegation->stateid.data, stateid->data,
> - sizeof(delegation->stateid.data)) != 0)
> + if (stateid != NULL && memcmp(delegation->stateid.u.data,
> + stateid->u.data,
> + sizeof(delegation->stateid.u.data)) != 0)
> goto nomatch_unlock;
> list_del_rcu(&delegation->super_list);
> delegation->inode = NULL;
> @@ -202,8 +204,8 @@ int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct
> delegation = kmalloc(sizeof(*delegation), GFP_KERNEL);
> if (delegation == NULL)
> return -ENOMEM;
> - memcpy(delegation->stateid.data, res->delegation.data,
> - sizeof(delegation->stateid.data));
> + memcpy(delegation->stateid.u.data, res->delegation.u.data,
> + sizeof(delegation->stateid.u.data));
> delegation->type = res->delegation_type;
> delegation->maxsize = res->maxsize;
> delegation->change_attr = nfsi->change_attr;
> @@ -546,7 +548,8 @@ int nfs4_copy_delegation_stateid(nfs4_stateid *dst, struct inode *inode)
> rcu_read_lock();
> delegation = rcu_dereference(nfsi->delegation);
> if (delegation != NULL) {
> - memcpy(dst->data, delegation->stateid.data, sizeof(dst->data));
> + memcpy(dst->u.data, delegation->stateid.u.data,
> + sizeof(dst->u.data));
> ret = 1;
> }
> rcu_read_unlock();
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index a8a6ad2..6739d15 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -865,8 +865,10 @@ static void update_open_stateflags(struct nfs4_state *state, fmode_t fmode)
> static void nfs_set_open_stateid_locked(struct nfs4_state *state, nfs4_stateid *stateid, fmode_t fmode)
> {
> if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
> - memcpy(state->stateid.data, stateid->data, sizeof(state->stateid.data));
> - memcpy(state->open_stateid.data, stateid->data, sizeof(state->open_stateid.data));
> + memcpy(state->stateid.u.data, stateid->u.data,
> + sizeof(state->stateid.u.data));
> + memcpy(state->open_stateid.u.data, stateid->u.data,
> + sizeof(state->open_stateid.u.data));
> switch (fmode) {
> case FMODE_READ:
> set_bit(NFS_O_RDONLY_STATE, &state->flags);
> @@ -894,7 +896,8 @@ static void __update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_s
> */
> write_seqlock(&state->seqlock);
> if (deleg_stateid != NULL) {
> - memcpy(state->stateid.data, deleg_stateid->data, sizeof(state->stateid.data));
> + memcpy(state->stateid.u.data, deleg_stateid->u.data,
> + sizeof(state->stateid.u.data));
> set_bit(NFS_DELEGATED_STATE, &state->flags);
> }
> if (open_stateid != NULL)
> @@ -925,7 +928,8 @@ static int update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_stat
>
> if (delegation == NULL)
> delegation = &deleg_cur->stateid;
> - else if (memcmp(deleg_cur->stateid.data, delegation->data, NFS4_STATEID_SIZE) != 0)
> + else if (memcmp(deleg_cur->stateid.u.data, delegation->u.data,
> + NFS4_STATEID_SIZE) != 0)
> goto no_delegation_unlock;
>
> nfs_mark_delegation_referenced(deleg_cur);
> @@ -987,7 +991,8 @@ static struct nfs4_state *nfs4_try_open_cached(struct nfs4_opendata *opendata)
> break;
> }
> /* Save the delegation */
> - memcpy(stateid.data, delegation->stateid.data, sizeof(stateid.data));
> + memcpy(stateid.u.data, delegation->stateid.u.data,
> + sizeof(stateid.u.data));
> rcu_read_unlock();
> ret = nfs_may_open(state->inode, state->owner->so_cred, open_mode);
> if (ret != 0)
> @@ -1152,10 +1157,13 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
> * Check if we need to update the current stateid.
> */
> if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0 &&
> - memcmp(state->stateid.data, state->open_stateid.data, sizeof(state->stateid.data)) != 0) {
> + memcmp(state->stateid.u.data, state->open_stateid.u.data,
> + sizeof(state->stateid.u.data)) != 0) {
> write_seqlock(&state->seqlock);
> if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
> - memcpy(state->stateid.data, state->open_stateid.data, sizeof(state->stateid.data));
> + memcpy(state->stateid.u.data,
> + state->open_stateid.u.data,
> + sizeof(state->stateid.u.data));
> write_sequnlock(&state->seqlock);
> }
> pnfs4_layout_reclaim(state);
> @@ -1226,8 +1234,8 @@ static int _nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs
> if (IS_ERR(opendata))
> return PTR_ERR(opendata);
> opendata->o_arg.claim = NFS4_OPEN_CLAIM_DELEGATE_CUR;
> - memcpy(opendata->o_arg.u.delegation.data, stateid->data,
> - sizeof(opendata->o_arg.u.delegation.data));
> + memcpy(opendata->o_arg.u.delegation.u.data, stateid->u.data,
> + sizeof(opendata->o_arg.u.delegation.u.data));
> ret = nfs4_open_recover(opendata, state);
> nfs4_opendata_put(opendata);
> return ret;
> @@ -1285,8 +1293,8 @@ static void nfs4_open_confirm_done(struct rpc_task *task, void *calldata)
> if (RPC_ASSASSINATED(task))
> return;
> if (data->rpc_status == 0) {
> - memcpy(data->o_res.stateid.data, data->c_res.stateid.data,
> - sizeof(data->o_res.stateid.data));
> + memcpy(data->o_res.stateid.u.data, data->c_res.stateid.u.data,
> + sizeof(data->o_res.stateid.u.data));
> nfs_confirm_seqid(&data->owner->so_seqid, 0);
> renew_lease(data->o_res.server, data->timestamp);
> data->rpc_done = 1;
> @@ -4095,9 +4103,10 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
> return;
> switch (task->tk_status) {
> case 0:
> - memcpy(calldata->lsp->ls_stateid.data,
> - calldata->res.stateid.data,
> - sizeof(calldata->lsp->ls_stateid.data));
> + memcpy(calldata->lsp->ls_stateid.u.data,
> + calldata->res.stateid.u.data,
> + sizeof(calldata->lsp->ls_stateid.u.
> + data));
> renew_lease(calldata->server, calldata->timestamp);
> break;
> case -NFS4ERR_BAD_STATEID:
> @@ -4310,8 +4319,8 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
> goto out;
> }
> if (data->rpc_status == 0) {
> - memcpy(data->lsp->ls_stateid.data, data->res.stateid.data,
> - sizeof(data->lsp->ls_stateid.data));
> + memcpy(data->lsp->ls_stateid.u.data, data->res.stateid.u.data,
> + sizeof(data->lsp->ls_stateid.u.data));
> data->lsp->ls_flags |= NFS_LOCK_INITIALIZED;
> renew_lease(NFS_SERVER(data->ctx->path.dentry->d_inode), data->timestamp);
> }
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 3765ca1..2d52300 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1049,8 +1049,8 @@ restart:
> * Open state on this file cannot be recovered
> * All we can do is revert to using the zero stateid.
> */
> - memset(state->stateid.data, 0,
> - sizeof(state->stateid.data));
> + memset(state->stateid.u.data, 0,
> + sizeof(state->stateid.u.data));
> /* Mark the file as being 'closed' */
> state->state = 0;
> break;
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 718824c..44f789a 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -1000,7 +1000,7 @@ static void encode_close(struct xdr_stream *xdr, const struct nfs_closeargs *arg
> p = reserve_space(xdr, 8+NFS4_STATEID_SIZE);
> *p++ = cpu_to_be32(OP_CLOSE);
> *p++ = cpu_to_be32(arg->seqid->sequence->counter);
> - xdr_encode_opaque_fixed(p, arg->stateid->data, NFS4_STATEID_SIZE);
> + xdr_encode_opaque_fixed(p, arg->stateid->u.data, NFS4_STATEID_SIZE);
> hdr->nops++;
> hdr->replen += decode_close_maxsz;
> }
> @@ -1174,7 +1174,8 @@ static void encode_lock(struct xdr_stream *xdr, const struct nfs_lock_args *args
> if (args->new_lock_owner){
> p = reserve_space(xdr, 4+NFS4_STATEID_SIZE+32);
> *p++ = cpu_to_be32(args->open_seqid->sequence->counter);
> - p = xdr_encode_opaque_fixed(p, args->open_stateid->data, NFS4_STATEID_SIZE);
> + p = xdr_encode_opaque_fixed(p, args->open_stateid->u.data,
> + NFS4_STATEID_SIZE);
> *p++ = cpu_to_be32(args->lock_seqid->sequence->counter);
> p = xdr_encode_hyper(p, args->lock_owner.clientid);
> *p++ = cpu_to_be32(16);
> @@ -1183,7 +1184,7 @@ static void encode_lock(struct xdr_stream *xdr, const struct nfs_lock_args *args
> }
> else {
> p = reserve_space(xdr, NFS4_STATEID_SIZE+4);
> - p = xdr_encode_opaque_fixed(p, args->lock_stateid->data, NFS4_STATEID_SIZE);
> + p = xdr_encode_opaque_fixed(p, args->lock_stateid->u.data, NFS4_STATEID_SIZE);
> *p = cpu_to_be32(args->lock_seqid->sequence->counter);
> }
> hdr->nops++;
> @@ -1215,7 +1216,8 @@ static void encode_locku(struct xdr_stream *xdr, const struct nfs_locku_args *ar
> *p++ = cpu_to_be32(OP_LOCKU);
> *p++ = cpu_to_be32(nfs4_lock_type(args->fl, 0));
> *p++ = cpu_to_be32(args->seqid->sequence->counter);
> - p = xdr_encode_opaque_fixed(p, args->stateid->data, NFS4_STATEID_SIZE);
> + p = xdr_encode_opaque_fixed(p, args->stateid->u.data,
> + NFS4_STATEID_SIZE);
> p = xdr_encode_hyper(p, args->fl->fl_start);
> xdr_encode_hyper(p, nfs4_lock_length(args->fl));
> hdr->nops++;
> @@ -1365,7 +1367,7 @@ static inline void encode_claim_delegate_cur(struct xdr_stream *xdr, const struc
>
> p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
> *p++ = cpu_to_be32(NFS4_OPEN_CLAIM_DELEGATE_CUR);
> - xdr_encode_opaque_fixed(p, stateid->data, NFS4_STATEID_SIZE);
> + xdr_encode_opaque_fixed(p, stateid->u.data, NFS4_STATEID_SIZE);
> encode_string(xdr, name->len, name->name);
> }
>
> @@ -1396,7 +1398,7 @@ static void encode_open_confirm(struct xdr_stream *xdr, const struct nfs_open_co
>
> p = reserve_space(xdr, 4+NFS4_STATEID_SIZE+4);
> *p++ = cpu_to_be32(OP_OPEN_CONFIRM);
> - p = xdr_encode_opaque_fixed(p, arg->stateid->data, NFS4_STATEID_SIZE);
> + p = xdr_encode_opaque_fixed(p, arg->stateid->u.data, NFS4_STATEID_SIZE);
> *p = cpu_to_be32(arg->seqid->sequence->counter);
> hdr->nops++;
> hdr->replen += decode_open_confirm_maxsz;
> @@ -1408,7 +1410,7 @@ static void encode_open_downgrade(struct xdr_stream *xdr, const struct nfs_close
>
> p = reserve_space(xdr, 4+NFS4_STATEID_SIZE+4);
> *p++ = cpu_to_be32(OP_OPEN_DOWNGRADE);
> - p = xdr_encode_opaque_fixed(p, arg->stateid->data, NFS4_STATEID_SIZE);
> + p = xdr_encode_opaque_fixed(p, arg->stateid->u.data, NFS4_STATEID_SIZE);
> *p = cpu_to_be32(arg->seqid->sequence->counter);
> encode_share_access(xdr, arg->fmode);
> hdr->nops++;
> @@ -1446,9 +1448,10 @@ static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context
> p = reserve_space(xdr, NFS4_STATEID_SIZE);
> if (ctx->state != NULL) {
> nfs4_copy_stateid(&stateid, ctx->state, ctx->lockowner);
> - xdr_encode_opaque_fixed(p, stateid.data, NFS4_STATEID_SIZE);
> + xdr_encode_opaque_fixed(p, stateid.u.data,
> + NFS4_STATEID_SIZE);
> } else
> - xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
> + xdr_encode_opaque_fixed(p, zero_stateid.u.data, NFS4_STATEID_SIZE);
> }
>
> static void encode_read(struct xdr_stream *xdr, const struct nfs_readargs *args, struct compound_hdr *hdr)
> @@ -1562,7 +1565,7 @@ encode_setacl(struct xdr_stream *xdr, struct nfs_setaclargs *arg, struct compoun
>
> p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
> *p++ = cpu_to_be32(OP_SETATTR);
> - xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
> + xdr_encode_opaque_fixed(p, zero_stateid.u.data, NFS4_STATEID_SIZE);
> p = reserve_space(xdr, 2*4);
> *p++ = cpu_to_be32(1);
> *p = cpu_to_be32(FATTR4_WORD0_ACL);
> @@ -1593,7 +1596,7 @@ static void encode_setattr(struct xdr_stream *xdr, const struct nfs_setattrargs
>
> p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
> *p++ = cpu_to_be32(OP_SETATTR);
> - xdr_encode_opaque_fixed(p, arg->stateid.data, NFS4_STATEID_SIZE);
> + xdr_encode_opaque_fixed(p, arg->stateid.u.data, NFS4_STATEID_SIZE);
> hdr->nops++;
> hdr->replen += decode_setattr_maxsz;
> encode_attrs(xdr, arg->iap, server);
> @@ -1656,7 +1659,7 @@ static void encode_delegreturn(struct xdr_stream *xdr, const nfs4_stateid *state
> p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
>
> *p++ = cpu_to_be32(OP_DELEGRETURN);
> - xdr_encode_opaque_fixed(p, stateid->data, NFS4_STATEID_SIZE);
> + xdr_encode_opaque_fixed(p, stateid->u.data, NFS4_STATEID_SIZE);
> hdr->nops++;
> hdr->replen += decode_delegreturn_maxsz;
> }
> @@ -1866,7 +1869,8 @@ encode_layoutget(struct xdr_stream *xdr,
> p = xdr_encode_hyper(p, args->lseg.offset);
> p = xdr_encode_hyper(p, args->lseg.length);
> p = xdr_encode_hyper(p, args->minlength);
> - p = xdr_encode_opaque_fixed(p, &args->stateid.data, NFS4_STATEID_SIZE);
> + p = xdr_encode_opaque_fixed(p, &args->stateid.u.data,
> + NFS4_STATEID_SIZE);
> *p = cpu_to_be32(args->maxcount);
>
> dprintk("%s: 1st type:0x%x iomode:%d off:%lu len:%lu mc:%d\n",
> @@ -1898,7 +1902,7 @@ encode_layoutcommit(struct xdr_stream *xdr,
> p = xdr_encode_hyper(p, args->lseg.offset);
> p = xdr_encode_hyper(p, args->lseg.length);
> *p++ = cpu_to_be32(0); /* reclaim */
> - p = xdr_encode_opaque_fixed(p, args->stateid.data, NFS4_STATEID_SIZE);
> + p = xdr_encode_opaque_fixed(p, args->stateid.u.data, NFS4_STATEID_SIZE);
> *p++ = cpu_to_be32(1); /* newoffset = TRUE */
> p = xdr_encode_hyper(p, args->lastbytewritten);
> *p = cpu_to_be32(args->time_modify_changed != 0);
> @@ -1945,7 +1949,7 @@ encode_layoutreturn(struct xdr_stream *xdr,
> p = reserve_space(xdr, 16 + NFS4_STATEID_SIZE);
> p = xdr_encode_hyper(p, args->lseg.offset);
> p = xdr_encode_hyper(p, args->lseg.length);
> - p = xdr_encode_opaque_fixed(p, &args->stateid.data,
> + p = xdr_encode_opaque_fixed(p, &args->stateid.u.data,
> NFS4_STATEID_SIZE);
>
> dprintk("%s: call %pF\n", __func__,
> @@ -3989,7 +3993,7 @@ static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
>
> static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
> {
> - return decode_opaque_fixed(xdr, stateid->data, NFS4_STATEID_SIZE);
> + return decode_opaque_fixed(xdr, stateid->u.data, NFS4_STATEID_SIZE);
> }
>
> static int decode_close(struct xdr_stream *xdr, struct nfs_closeres *res)
> @@ -5281,7 +5285,7 @@ static int decode_layoutget(struct xdr_stream *xdr, struct rpc_rqst *req,
> if (unlikely(!p))
> goto out_overflow;
> res->return_on_close = be32_to_cpup(p++);
> - p = xdr_decode_opaque_fixed(p, res->stateid.data, NFS4_STATEID_SIZE);
> + p = xdr_decode_opaque_fixed(p, res->stateid.u.data, NFS4_STATEID_SIZE);
> layout_count = be32_to_cpup(p);
> if (!layout_count) {
> dprintk("%s: server responded with empty layout array\n",
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index b2d203f..ecf6dc2 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -505,7 +505,7 @@ pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
> const nfs4_stateid *stateid)
> {
> write_seqlock(&lo->seqlock);
> - memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
> + memcpy(lo->stateid.u.data, stateid->u.data, sizeof(lo->stateid.u.data));
> write_sequnlock(&lo->seqlock);
> }
>
> @@ -518,7 +518,8 @@ pnfs_get_layout_stateid(nfs4_stateid *dst, struct pnfs_layout_type *lo)
>
> do {
> seq = read_seqbegin(&lo->seqlock);
> - memcpy(dst->data, lo->stateid.data, sizeof(lo->stateid.data));
> + memcpy(dst->u.data, lo->stateid.u.data,
> + sizeof(lo->stateid.u.data));
> } while (read_seqretry(&lo->seqlock, seq));
>
> dprintk("<-- %s\n", __func__);
> @@ -533,8 +534,8 @@ pnfs_layout_from_open_stateid(nfs4_stateid *dst, struct nfs4_state *state)
>
> do {
> seq = read_seqbegin(&state->seqlock);
> - memcpy(dst->data, state->stateid.data,
> - sizeof(state->stateid.data));
> + memcpy(dst->u.data, state->stateid.u.data,
> + sizeof(state->stateid.u.data));
> } while (read_seqretry(&state->seqlock, seq));
>
> dprintk("<-- %s\n", __func__);
> @@ -580,7 +581,7 @@ get_layout(struct inode *ino,
> lgp->args.inode = ino;
> lgp->lsegpp = lsegpp;
>
> - if (!memcmp(lo->stateid.data, &zero_stateid, NFS4_STATEID_SIZE)) {
> + if (!memcmp(lo->stateid.u.data, &zero_stateid, NFS4_STATEID_SIZE)) {
> struct nfs_open_context *oldctx = ctx;
>
> if (!oldctx) {
> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 7e32bd3..6a1fff7 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -40,7 +40,6 @@
>
> #define NFSPROC4_CB_NULL 0
> #define NFSPROC4_CB_COMPOUND 1
> -#define NFS4_STATEID_SIZE 16
>
> /* Index of predefined Linux callback client operations */
>
> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> index 2bb8eeb..89020e5 100644
> --- a/include/linux/nfs4.h
> +++ b/include/linux/nfs4.h
> @@ -17,7 +17,10 @@
>
> #define NFS4_BITMAP_SIZE 2
> #define NFS4_VERIFIER_SIZE 8
> -#define NFS4_STATEID_SIZE 16
> +#define NFS4_STATEID_SEQID_SIZE 4
> +#define NFS4_STATEID_OTHER_SIZE 12
> +#define NFS4_STATEID_SIZE (NFS4_STATEID_OTHER_SIZE + \
> + NFS4_STATEID_SEQID_SIZE)
> #define NFS4_FHSIZE 128
> #define NFS4_MAXPATHLEN PATH_MAX
> #define NFS4_MAXNAMLEN NAME_MAX
> @@ -174,7 +177,16 @@ struct nfs4_acl {
> };
>
> typedef struct { char data[NFS4_VERIFIER_SIZE]; } nfs4_verifier;
> -typedef struct { char data[NFS4_STATEID_SIZE]; } nfs4_stateid;
> +typedef struct {
> + union {
> + char data[NFS4_STATEID_SIZE];
> + struct {
> + __be32 seqid;
> + char other[NFS4_STATEID_OTHER_SIZE];
> + } stateid;

let's define this struct separately and use the __packed attribute.

> + } u;
> +} nfs4_stateid;
> +
>
> enum nfs_opnum4 {
> OP_ACCESS = 3,
> @@ -554,6 +566,8 @@ struct nfs4_sessionid {
> unsigned char data[NFS4_MAX_SESSIONID_LEN];
> };
>
> +
> +

Why? ;-)

Benny

> /* Create Session Flags */
> #define SESSION4_PERSIST 0x001
> #define SESSION4_BACK_CHAN 0x002


2010-05-26 08:28:40

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 2/8] pnfs-submit: clean locking infrastructure

On May. 17, 2010, 20:56 +0300, Alexandros Batsakis <[email protected]> wrote:
> (also minor cleanup of pnfs_free_layout())
>
> Signed-off-by: Alexandros Batsakis <[email protected]>
>
> Conflicts:
>
> fs/nfs/pnfs.c
> ---
> fs/nfs/pnfs.c | 80 +++++++++++++++++++++++++++++++++++++-------------------
> 1 files changed, 53 insertions(+), 27 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index b72c013..74cb998 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1,4 +1,4 @@
> -/*
> + /*

just picking nit...

Benny

> * linux/fs/nfs/pnfs.c
> *
> * pNFS functions to call and manage layout drivers.
> @@ -60,6 +60,8 @@ static int pnfs_initialized;
> static void pnfs_free_layout(struct pnfs_layout_type *lo,
> struct nfs4_pnfs_layout_segment *range);
> static enum pnfs_try_status pnfs_commit(struct nfs_write_data *data, int sync);
> +static inline void lock_current_layout(struct nfs_inode *nfsi);
> +static inline void unlock_current_layout(struct nfs_inode *nfsi);
>
> /* Locking:
> *
> @@ -153,16 +155,17 @@ pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
> {
> dprintk("%s: has_layout=%d layoutcommit_ctx=%p ctx=%p\n", __func__,
> has_layout(nfsi), nfsi->layout.layoutcommit_ctx, ctx);
> - spin_lock(&nfsi->lo_lock);
> +
> + lock_current_layout(nfsi);
> if (has_layout(nfsi) && !nfsi->layout.layoutcommit_ctx) {
> nfsi->layout.layoutcommit_ctx = get_nfs_open_context(ctx);
> nfsi->change_attr++;
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
> dprintk("%s: Set layoutcommit_ctx=%p\n", __func__,
> nfsi->layout.layoutcommit_ctx);
> return;
> }
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
> }
>
> /* Update last_write_offset for layoutcommit.
> @@ -175,7 +178,7 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
> {
> loff_t end_pos;
>
> - spin_lock(&nfsi->lo_lock);
> + lock_current_layout(nfsi);
> if (offset < nfsi->layout.pnfs_write_begin_pos)
> nfsi->layout.pnfs_write_begin_pos = offset;
> end_pos = offset + extent - 1; /* I'm being inclusive */
> @@ -187,7 +190,7 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
> (unsigned long) offset ,
> (unsigned long) nfsi->layout.pnfs_write_begin_pos,
> (unsigned long) nfsi->layout.pnfs_write_end_pos);
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
> }
>
> /* Unitialize a mountpoint in a layout driver */
> @@ -296,12 +299,27 @@ pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
> * pNFS client layout cache
> */
> #if defined(CONFIG_SMP)
> +#define BUG_ON_LOCKED_LO(lo) \
> + BUG_ON(spin_is_locked(&PNFS_NFS_INODE(lo)->lo_lock))
> #define BUG_ON_UNLOCKED_LO(lo) \
> BUG_ON(!spin_is_locked(&PNFS_NFS_INODE(lo)->lo_lock))
> #else /* CONFIG_SMP */
> +#define BUG_ON_LOCKED_LO(lo) do {} while (0)
> #define BUG_ON_UNLOCKED_LO(lo) do {} while (0)
> #endif /* CONFIG_SMP */
>
> +static inline void lock_current_layout(struct nfs_inode *nfsi)
> +{
> + BUG_ON_LOCKED_LO((&nfsi->layout));
> + spin_lock(&nfsi->lo_lock);
> +}
> +
> +static inline void unlock_current_layout(struct nfs_inode *nfsi)
> +{
> + BUG_ON_UNLOCKED_LO((&nfsi->layout));
> + spin_unlock(&nfsi->lo_lock);
> +}
> +
> /*
> * get and lock nfsi->layout
> */
> @@ -310,10 +328,10 @@ get_lock_current_layout(struct nfs_inode *nfsi)
> {
> struct pnfs_layout_type *lo;
>
> + lock_current_layout(nfsi);
> lo = &nfsi->layout;
> - spin_lock(&nfsi->lo_lock);
> if (!lo->ld_data) {
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
> return NULL;
> }
>
> @@ -333,7 +351,12 @@ put_unlock_current_layout(struct pnfs_layout_type *lo)
> BUG_ON_UNLOCKED_LO(lo);
> BUG_ON(lo->refcount <= 0);
>
> - if (--lo->refcount == 0 && list_empty(&lo->segs)) {
> + lo->refcount--;
> +
> + if (lo->refcount > 0)
> + goto out;
> +
> + if (list_empty(&lo->segs)) {
> struct layoutdriver_io_operations *io_ops =
> PNFS_LD_IO_OPS(lo);
>
> @@ -347,7 +370,8 @@ put_unlock_current_layout(struct pnfs_layout_type *lo)
> list_del_init(&nfsi->lo_inodes);
> spin_unlock(&clp->cl_lock);
> }
> - spin_unlock(&nfsi->lo_lock);
> +out:
> + unlock_current_layout(nfsi);
> }
>
> void
> @@ -356,7 +380,7 @@ pnfs_layout_release(struct pnfs_layout_type *lo, atomic_t *count,
> {
> struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
>
> - spin_lock(&nfsi->lo_lock);
> + lock_current_layout(nfsi);
> if (range)
> pnfs_free_layout(lo, range);
> atomic_dec(count);
> @@ -375,6 +399,8 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
> };
>
> lo = get_lock_current_layout(nfsi);
> + if (!lo)
> + return;
> pnfs_free_layout(lo, &range);
> put_unlock_current_layout(lo);
> }
> @@ -652,7 +678,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
> struct pnfs_layout_segment *lseg;
> bool ret = false;
>
> - spin_lock(&nfsi->lo_lock);
> + lock_current_layout(nfsi);
> list_for_each_entry (lseg, &nfsi->layout.segs, fi_list) {
> if (!should_free_lseg(lseg, range))
> continue;
> @@ -666,7 +692,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
> }
> if (atomic_read(&nfsi->layout.lgetcount))
> ret = true;
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
>
> dprintk("%s:Return %d\n", __func__, ret);
> return ret;
> @@ -756,7 +782,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
> /* unlock w/o put rebalanced by eventual call to
> * pnfs_layout_release
> */
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
>
> if (pnfs_return_layout_barrier(nfsi, &arg)) {
> dprintk("%s: waiting\n", __func__);
> @@ -887,7 +913,7 @@ static int pnfs_wait_schedule(void *word)
> *
> * Note: If successful, nfsi->lo_lock is taken and the caller
> * must put and unlock current_layout by using put_unlock_current_layout()
> - * when the returned layout is released.
> + * directly or pnfs_layout_release() when the returned layout is released.
> */
> static struct pnfs_layout_type *
> get_lock_alloc_layout(struct inode *ino)
> @@ -922,7 +948,7 @@ get_lock_alloc_layout(struct inode *ino)
> struct nfs_client *clp = NFS_SERVER(ino)->nfs_client;
>
> /* must grab the layout lock before the client lock */
> - spin_lock(&nfsi->lo_lock);
> + lock_current_layout(nfsi);
>
> spin_lock(&clp->cl_lock);
> if (list_empty(&nfsi->lo_inodes))
> @@ -1038,10 +1064,10 @@ void drain_layoutreturns(struct pnfs_layout_type *lo)
> while (atomic_read(&lo->lretcount)) {
> struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
>
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
> dprintk("%s: waiting\n", __func__);
> wait_event(nfsi->lo_waitq, (atomic_read(&lo->lretcount) == 0));
> - spin_lock(&nfsi->lo_lock);
> + lock_current_layout(nfsi);
> }
> }
>
> @@ -1080,13 +1106,13 @@ pnfs_update_layout(struct inode *ino,
> /* Check to see if the layout for the given range already exists */
> lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
> if (lseg && !lseg->valid) {
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
> if (take_ref)
> put_lseg(lseg);
> for (;;) {
> prepare_to_wait(&nfsi->lo_waitq, &__wait,
> TASK_KILLABLE);
> - spin_lock(&nfsi->lo_lock);
> + lock_current_layout(nfsi);
> lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
> if (!lseg || lseg->valid)
> break;
> @@ -1099,7 +1125,7 @@ pnfs_update_layout(struct inode *ino,
> result = -ERESTARTSYS;
> break;
> }
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
> schedule();
> }
> finish_wait(&nfsi->lo_waitq, &__wait);
> @@ -1136,7 +1162,7 @@ pnfs_update_layout(struct inode *ino,
> /* Matching dec is done in .rpc_release (on non-error paths) */
> atomic_inc(&lo->lgetcount);
> /* Lose lock, but not reference, match this with pnfs_layout_release */
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
>
> result = get_layout(ino, ctx, &arg, lsegpp, lo);
> out:
> @@ -1286,7 +1312,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
> *lgp->lsegpp = lseg;
> }
>
> - spin_lock(&nfsi->lo_lock);
> + lock_current_layout(nfsi);
> pnfs_insert_layout(lo, lseg);
>
> if (res->return_on_close) {
> @@ -1297,7 +1323,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
>
> /* Done processing layoutget. Set the layout stateid */
> pnfs_set_layout_stateid(lo, &res->stateid);
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
> out:
> return status;
> }
> @@ -2212,7 +2238,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
> if (!data)
> return -ENOMEM;
>
> - spin_lock(&nfsi->lo_lock);
> + lock_current_layout(nfsi);
> if (!nfsi->layout.layoutcommit_ctx)
> goto out_unlock;
>
> @@ -2233,7 +2259,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
> nfsi->layout.layoutcommit_ctx = NULL;
>
> /* release lock on pnfs layoutcommit attrs */
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
>
> data->is_sync = sync;
> status = pnfs4_proc_layoutcommit(data);
> @@ -2242,7 +2268,7 @@ out:
> return status;
> out_unlock:
> pnfs_layoutcommit_free(data);
> - spin_unlock(&nfsi->lo_lock);
> + unlock_current_layout(nfsi);
> goto out;
> }
>

2010-05-26 08:42:17

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 5/8] pnfs-submit: request whole file layouts only

On May. 17, 2010, 20:56 +0300, Alexandros Batsakis <[email protected]> wrote:
> In the first iteration of the pNFS code, we support only whole-file layouts.
> To facilitate the move to multiple-segments, we keep the segment processing
> code, but the segment list should always contain a max of one segment per I/O type
>
> Signed-off-by: Alexandros Batsakis <[email protected]>
> ---
> fs/nfs/callback_proc.c | 7 ++++---
> fs/nfs/pnfs.c | 25 +++++--------------------
> 2 files changed, 9 insertions(+), 23 deletions(-)
>
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index 8ef1502..bfada25 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -213,6 +213,10 @@ static int pnfs_recall_layout(void *data)
> then return layouts, resume after layoutreturns complete
> */
>
> + /* support whole file layouts only */
> + rl.cbl_seg.offset = 0;
> + rl.cbl_seg.length = NFS4_MAX_UINT64;
> +
> if (rl.cbl_recall_type == RETURN_FILE) {
> status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
> RETURN_FILE);
> @@ -221,9 +225,6 @@ static int pnfs_recall_layout(void *data)
> goto out;
> }
>
> - rl.cbl_seg.offset = 0;
> - rl.cbl_seg.length = NFS4_MAX_UINT64;
> -
> /* FIXME: This loop is inefficient, running in O(|s_inodes|^2) */
> while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) {
> /* XXX need to check status on pnfs_return_layout */
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index ecf6dc2..2cc8895 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -546,12 +546,6 @@ pnfs_layout_from_open_stateid(nfs4_stateid *dst, struct nfs4_state *state)
> * for now, assume that whole file layouts are requested.
> * arg->offset: 0
> * arg->length: all ones
> -*
> -* for now, assume the LAYOUTGET operation is triggered by an I/O request.
> -* the count field is the count in the I/O request, and will be used
> -* as the minlength. for the file operation that piggy-backs
> -* the LAYOUTGET operation with an OPEN, s
> -* arg->minlength = count.
> */
> static int
> get_layout(struct inode *ino,
> @@ -572,10 +566,10 @@ get_layout(struct inode *ino,
> return -ENOMEM;
> }
> lgp->lo = lo;
> - lgp->args.minlength = PAGE_CACHE_SIZE;
> + lgp->args.minlength = NFS4_MAX_UINT64;
> lgp->args.maxcount = PNFS_LAYOUT_MAXSIZE;
> lgp->args.lseg.iomode = range->iomode;
> - lgp->args.lseg.offset = range->offset;
> + lgp->args.lseg.offset = 0;
> lgp->args.lseg.length = max(range->length, lgp->args.minlength);

This is a protocol bug.

As per RFC5661, 18.43. Operation 50: LAYOUTGET:
The second range is between loga_offset and loga_offset +
loga_minlength - 1 inclusive. This range indicates the required
range the client needs the layout to cover. Thus, loga_minlength
MUST be less than or equal to loga_length.

Therefore, lseg.length must also be set to NFS4_MAX_UINT64

> lgp->args.type = server->pnfs_curr_ld->id;
> lgp->args.inode = ino;
> @@ -1068,8 +1062,8 @@ pnfs_update_layout(struct inode *ino,
> {
> struct nfs4_pnfs_layout_segment arg = {
> .iomode = iomode,
> - .offset = pos,
> - .length = countg
> + .offset = 0,
> + .length = ~0

let's use NFS4_MAX_UINT64 rather than counting on the compiler for promoting ~0
to u64.

> };
> struct nfs_inode *nfsi = NFS_I(ino);
> struct pnfs_layout_type *lo;
> @@ -1159,7 +1153,6 @@ out_put:
> void
> pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
> {
> - struct nfs4_pnfs_layoutget_res *res = &lgp->res;
> struct pnfs_layout_segment *lseg = NULL;
> struct nfs_inode *nfsi = PNFS_NFS_INODE(lgp->lo);
> time_t suspend = 0;
> @@ -1167,15 +1160,8 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
> dprintk("-->%s\n", __func__);
>
> lgp->status = rpc_status;
> - if (likely(!rpc_status)) {
> - if (unlikely(res->layout.len <= 0)) {
> - printk(KERN_ERR
> - "%s: ERROR! Layout size is ZERO!\n", __func__);
> - lgp->status = -EIO;
> - goto get_out;
> - }

This is an orthogonal issue.
Why not verify the resulting layout we got?

Benny

> + if (likely(!rpc_status))
> goto out;
> - }
>
> dprintk("%s: ERROR retrieving layout %d\n", __func__, rpc_status);
> switch (rpc_status) {
> @@ -1250,7 +1236,6 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
> break;
> }
>
> -get_out:
> /* remember that get layout failed and suspend trying */
> nfsi->layout.pnfs_layout_suspend = suspend;
> set_bit(lo_fail_bit(lgp->args.lseg.iomode),


2010-05-26 08:49:30

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 6/8] pnfs-submit: change layouts list to be similar to the other state list management

On 2010-05-17 20:56, Alexandros Batsakis wrote:
> The current design keeps a list (nfs_client) of inodes having layouts.
> In order to make that code more similar to delegation handling (and in general to the rest of the NFS code), this patch changes the list element to layouts directly.
> No backpointer from the layout to the inode is needed as the inode can be accesed by a container_of() call
>
> Signed-off-by: Alexandros Batsakis <[email protected]>
> ---
> fs/nfs/callback_proc.c | 9 +++++++--
> fs/nfs/client.c | 2 +-
> fs/nfs/inode.c | 4 ++--
> fs/nfs/pnfs.c | 10 ++++------
> include/linux/nfs_fs.h | 3 +--
> include/linux/nfs_fs_sb.h | 2 +-
> 6 files changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index bfada25..24e2571 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -76,7 +76,6 @@ static int (*nfs_validate_delegation_stateid(struct nfs_client *clp))(struct nfs
> return nfs4_validate_delegation_stateid;
> }
>
> -
> __be32 nfs4_callback_recall(struct cb_recallargs *args, void *dummy)
> {
> struct nfs_client *clp;
> @@ -140,6 +139,7 @@ nfs_layoutrecall_find_inode(struct nfs_client *clp,
> const struct cb_pnfs_layoutrecallargs *args)
> {
> struct nfs_inode *nfsi;
> + struct pnfs_layout_type *layout;
> struct nfs_server *server;
> struct inode *ino = NULL;
>
> @@ -147,9 +147,14 @@ nfs_layoutrecall_find_inode(struct nfs_client *clp,
> __func__, args->cbl_recall_type, clp);
>
> spin_lock(&clp->cl_lock);
> - list_for_each_entry(nfsi, &clp->cl_lo_inodes, lo_inodes) {
> + list_for_each_entry(layout, &clp->cl_layouts, lo_layouts) {
> + nfsi = PNFS_NFS_INODE(layout);
> + if (!nfsi)
> + continue;
> +
> dprintk("%s: Searching inode=%lu\n",
> __func__, nfsi->vfs_inode.i_ino);
> +
> if (args->cbl_recall_type == RETURN_FILE) {
> if (nfs_compare_fh(&args->cbl_fh, &nfsi->fh))
> continue;
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index d4759dd..d60e5d7 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -157,7 +157,7 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
> if (!IS_ERR(cred))
> clp->cl_machine_cred = cred;
> #if defined(CONFIG_NFS_V4_1)
> - INIT_LIST_HEAD(&clp->cl_lo_inodes);
> + INIT_LIST_HEAD(&clp->cl_layouts);
> #endif
> nfs_fscache_get_client_cookie(clp);
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index ab599be..adf3280 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1363,7 +1363,7 @@ static void pnfs_destroy_inode(struct nfs_inode *nfsi)
> if (!list_empty(&nfsi->layout.segs))
> pnfs_destroy_layout(nfsi);
>
> - BUG_ON(!list_empty(&nfsi->lo_inodes));
> + BUG_ON(!list_empty(&nfsi->layout.lo_layouts));
> BUG_ON(!list_empty(&nfsi->layout.segs));
> BUG_ON(nfsi->layout.refcount);
> BUG_ON(nfsi->layout.ld_data);
> @@ -1381,10 +1381,10 @@ void nfs_destroy_inode(struct inode *inode)
> static void pnfs_init_once(struct nfs_inode *nfsi)
> {
> #ifdef CONFIG_NFS_V4_1
> - INIT_LIST_HEAD(&nfsi->lo_inodes);
> init_waitqueue_head(&nfsi->lo_waitq);
> spin_lock_init(&nfsi->lo_lock);
> seqlock_init(&nfsi->layout.seqlock);
> + INIT_LIST_HEAD(&nfsi->layout.lo_layouts);
> INIT_LIST_HEAD(&nfsi->layout.segs);
> nfsi->layout.refcount = 0;
> nfsi->layout.ld_data = NULL;
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 2cc8895..3b4dea0 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -364,10 +364,10 @@ put_unlock_current_layout(struct pnfs_layout_type *lo)
> io_ops->free_layout(lo->ld_data);
> lo->ld_data = NULL;
>
> - /* Unlist the inode. */
> + /* Unlist the layout. */
> clp = NFS_SERVER(&nfsi->vfs_inode)->nfs_client;
> spin_lock(&clp->cl_lock);
> - list_del_init(&nfsi->lo_inodes);
> + list_del_init(&lo->lo_layouts);
> spin_unlock(&clp->cl_lock);
> }
> out:
> @@ -881,10 +881,8 @@ alloc_init_layout(struct inode *ino)
>
> BUG_ON(lo->ld_data != NULL);
> lo->ld_data = ld_data;
> - seqlock_init(&lo->seqlock);
> memset(&lo->stateid, 0, NFS4_STATEID_SIZE);
> lo->refcount = 1;
> - INIT_LIST_HEAD(&lo->segs);
> lo->roc_iomode = 0;
> return lo;
> }
> @@ -940,8 +938,8 @@ get_lock_alloc_layout(struct inode *ino)
> lock_current_layout(nfsi);
>
> spin_lock(&clp->cl_lock);
> - if (list_empty(&nfsi->lo_inodes))
> - list_add_tail(&nfsi->lo_inodes, &clp->cl_lo_inodes);
> + if (list_empty(&lo->lo_layouts))
> + list_add_tail(&lo->lo_layouts, &clp->cl_layouts);
> spin_unlock(&clp->cl_lock);
> } else
> lo = ERR_PTR(-ENOMEM);
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 2434bba..2a80be8 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -99,6 +99,7 @@ struct posix_acl;
>
> struct pnfs_layout_type {
> int refcount;
> + struct list_head lo_layouts; /* other inode layouts */

It's "other client layouts" actually.
I think the name is confusing.
How about "per_client" instead?

Benny

> struct list_head segs; /* layout segments list */
> int roc_iomode; /* iomode to return on close, 0=none */
> seqlock_t seqlock; /* Protects the stateid */
> @@ -205,8 +206,6 @@ struct nfs_inode {
>
> /* pNFS layout information */
> #if defined(CONFIG_NFS_V4_1)
> - /* Inodes having layouts */
> - struct list_head lo_inodes;
> wait_queue_head_t lo_waitq;
> spinlock_t lo_lock;
> struct pnfs_layout_type layout;
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 00a4e7e..0c92889 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -86,7 +86,7 @@ struct nfs_client {
> /* The flags used for obtaining the clientid during EXCHANGE_ID */
> u32 cl_exchange_flags;
> struct nfs4_session *cl_session; /* sharred session */
> - struct list_head cl_lo_inodes; /* Inodes having layouts */
> + struct list_head cl_layouts;
> struct nfs4_deviceid_cache *cl_devid_cache; /* pNFS deviceid cache */
> #endif /* CONFIG_NFS_V4_1 */
>


2010-05-26 09:21:03

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 7/8] pnfs-submit: forgetful client model

On 2010-05-17 20:56, Alexandros Batsakis wrote:
> Forgetful client model:
>
> If we receive a CB_LAYOUTRECALL
> - we spawn a thread to handle the recall
> (xxx: now only one recall can be active at a time, else NFS4ERR_DELAY)
> - we check the stateid seqid
> if it does not match we return NFS4ERR_DELAY
> - we check for pending I/O
> if there is we return NFS4ERR_DELAY
> Else we return NO_MATCHING_LAYOUT.
> Note that for whole file layouts there is no need to serialize LAYOUTGETs/LAYOUTRETURNs
> For bulk layouts, if there is a layout active, we return NFS4_OK and we start
> cleaning the layouts asynchronously. At the end we send a bulk LAYOUTRETURN.
> Note that there is no need to prevent any new LAYOUTGETs explicitly as the server should
> reject them.
>
> Signed-off-by: Alexandros Batsakis <[email protected]>
> ---
> fs/nfs/callback_proc.c | 131 ++++++++++++++++++++++++++++++++++--------------
> fs/nfs/inode.c | 2 +-
> fs/nfs/nfs4_fs.h | 1 +
> fs/nfs/nfs4proc.c | 4 +-
> fs/nfs/nfs4state.c | 2 +-
> fs/nfs/pnfs.c | 78 +++++++++++++---------------
> fs/nfs/pnfs.h | 8 ++-
> 7 files changed, 139 insertions(+), 87 deletions(-)
>
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index 24e2571..419fee7 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -129,6 +129,38 @@ int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation, const nf
>
> #if defined(CONFIG_NFS_V4_1)
>
> +static int

can return bool

> +pnfs_is_next_layout_stateid(const struct pnfs_layout_type *lo,
> + const nfs4_stateid stateid)
> +{
> + int seq;

how about calling it seqlock so it will be less confusing
with regards to seqid.

> + int res;

and res is "bool" too.

> + u32 oldseqid, newseqid;
> +
> + do {
> + seq = read_seqbegin(&lo->seqlock);
> + oldseqid = ntohl(lo->stateid.u.stateid.seqid);
> + newseqid = ntohl(stateid.u.stateid.seqid);

let's use the more modern be32_to_cpu form...

> + res = !memcmp(lo->stateid.u.stateid.other,
> + stateid.u.stateid.other,
> + NFS4_STATEID_OTHER_SIZE);
> + if (res) { /* comparing layout stateids */
> + if (oldseqid == ~0)
> + res = (newseqid == 1);
> + else
> + res = (newseqid == oldseqid + 1);
> + } else { /* open stateid */
> + res = !memcmp(lo->stateid.u.data,
> + &zero_stateid,
> + NFS4_STATEID_SIZE);
> + if (res)
> + res = (newseqid == 1);
> + }
> + } while (read_seqretry(&lo->seqlock, seq));
> +
> + return res;
> +}
> +
> /*
> * Retrieve an inode based on layout recall parameters
> *
> @@ -190,10 +222,11 @@ static int pnfs_recall_layout(void *data)
> {
> struct inode *inode, *ino;
> struct nfs_client *clp;
> + struct nfs4_pnfs_layoutreturn *lrp;
> struct cb_pnfs_layoutrecallargs rl;
> struct recall_layout_threadargs *args =
> (struct recall_layout_threadargs *)data;
> - int status;
> + int status = 0;
>
> daemonize("nfsv4-layoutreturn");
>
> @@ -204,46 +237,57 @@ static int pnfs_recall_layout(void *data)
> clp = args->clp;
> inode = args->inode;
> rl = *args->rl;
> - args->result = 0;
> - complete(&args->started);
> - args = NULL;
> - /* Note: args must not be used after this point!!! */
> -
> -/* FIXME: need barrier here:
> - pause I/O to data servers
> - pause layoutgets
> - drain all outstanding writes to storage devices
> - wait for any outstanding layoutreturns and layoutgets mentioned in
> - cb_sequence.
> - then return layouts, resume after layoutreturns complete
> - */
>
> /* support whole file layouts only */
> rl.cbl_seg.offset = 0;
> rl.cbl_seg.length = NFS4_MAX_UINT64;
>
> if (rl.cbl_recall_type == RETURN_FILE) {
> - status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
> - RETURN_FILE);
> + if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout, rl.cbl_stateid))
> + status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
> + RETURN_FILE, 0);
> + else
> + status = htonl(NFS4ERR_DELAY);

cpu_to_be32?

> + /* forgetful client */
> if (status)
> - dprintk("%s RETURN_FILE error: %d\n", __func__, status);
> + dprintk("%s RETURN_FILE : %d\n", __func__, status);
> + else
> + status = htonl(NFS4ERR_NOMATCHING_LAYOUT);

ditto

> + args->result = status;
> + complete(&args->started);
> goto out;
> }
>
> - /* FIXME: This loop is inefficient, running in O(|s_inodes|^2) */

Hmm, I would like this to be fixed, eventually.
Maybe FIXME is too harsh and IMPROVEME would be better :)

> + status = htonl(NFS4_OK);

ditto

> + args->result = status;
> + complete(&args->started);
> + args = NULL;
> +
> while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) {
> /* XXX need to check status on pnfs_return_layout */

s/XXX/FIXME/

> - pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE);
> + pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE, 0);
> iput(ino);
> }
>
> - /* send final layoutreturn */
> - status = pnfs_return_layout(inode, &rl.cbl_seg, NULL, rl.cbl_recall_type);
> - if (status)
> - printk(KERN_INFO "%s: ignoring pnfs_return_layout status=%d\n",
> - __func__, status);
> + lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
> + if (!lrp) {
> + dprintk("%s: allocation failed. Cannot send last LAYOUTRETURN\n",
> + __func__);
> + goto out;
> + }
> +
> + /* send final layoutreturn (bulk) */
> + lrp->args.reclaim = 0;
> + lrp->args.layout_type = rl.cbl_layout_type;
> + lrp->args.return_type = rl.cbl_recall_type;
> + lrp->args.lseg = rl.cbl_seg;
> + lrp->args.inode = inode;
> + lrp->lo = NULL;
> +
> + pnfs4_proc_layoutreturn(lrp);
> out:
> - iput(inode);
> + clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
> + nfs_put_client(clp);
> module_put_and_exit(0);
> dprintk("%s: exit status %d\n", __func__, 0);
> return 0;
> @@ -261,15 +305,18 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
> .rl = rl,
> };
> struct task_struct *t;
> - int status;
> -
> - /* should have returned NFS4ERR_NOMATCHING_LAYOUT... */
> - BUG_ON(inode == NULL);
> + int status = htonl(NFS4ERR_DELAY);
>
> dprintk("%s: -->\n", __func__);
>
> + /* XXX: do not allow two concurrent layout recalls */
> + if (test_and_set_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state))
> + return status;
> +
> init_completion(&data.started);
> __module_get(THIS_MODULE);
> + if (!atomic_inc_not_zero(&clp->cl_count))
> + goto out_put_no_client;
>
> t = kthread_run(pnfs_recall_layout, &data, "%s", "pnfs_recall_layout");
> if (IS_ERR(t)) {
> @@ -283,6 +330,9 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
> wait_for_completion(&data.started);
> return data.result;
> out_module_put:
> + nfs_put_client(clp);
> +out_put_no_client:
> + clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
> module_put(THIS_MODULE);
> return status;
> }
> @@ -309,19 +359,24 @@ __be32 pnfs_cb_layoutrecall(struct cb_pnfs_layoutrecallargs *args,
> do {
> struct nfs_client *prev = clp;
> num_client++;
> - inode = nfs_layoutrecall_find_inode(clp, args);
> - if (inode != NULL) {
> - if (PNFS_LD(&NFS_I(inode)->layout)->id ==
> - args->cbl_layout_type) {
> - /* Set up a helper thread to actually
> - * return the delegation */
> + /* the callback must come from the MDS personality */
> + if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_MDS))
> + goto loop;
> + if (args->cbl_recall_type == RETURN_FILE) {
> + inode = nfs_layoutrecall_find_inode(clp, args);
> + if (inode != NULL) {
> res = pnfs_async_return_layout(clp, inode, args);
> - if (res != 0)
> - res = htonl(NFS4ERR_RESOURCE);
> - break;
> + iput(inode);
> }
> + } else {
> + /* we need the inode to get the nfs_server struct */
> + inode = nfs_layoutrecall_find_inode(clp, args);
> + if (!inode)
> + goto loop;
> + res = pnfs_async_return_layout(clp, inode, args);
> iput(inode);
> }
> +loop:
> clp = nfs_find_client_next(prev);
> nfs_put_client(prev);
> } while (clp != NULL);
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index adf3280..584c9b8 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -1321,7 +1321,7 @@ void nfs4_clear_inode(struct inode *inode)
> /* First call standard NFS clear_inode() code */
> nfs_clear_inode(inode);
> #ifdef CONFIG_NFS_V4_1
> - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
> + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
> #endif /* CONFIG_NFS_V4_1 */
> }
> #endif
> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> index 40ebe1d..90c57b3 100644
> --- a/fs/nfs/nfs4_fs.h
> +++ b/fs/nfs/nfs4_fs.h
> @@ -47,6 +47,7 @@ enum nfs4_client_state {
> NFS4CLNT_SESSION_RESET,
> NFS4CLNT_SESSION_DRAINING,
> NFS4CLNT_RECALL_SLOT,
> + NFS4CLNT_LAYOUT_RECALL,
> };
>
> /*
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6739d15..14e90e1 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -1085,7 +1085,7 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
> /* FIXME: send gratuitous layout commits and return with the reclaim
> * flag during grace period
> */
> - pnfs_return_layout(state->inode, NULL, &state->open_stateid, RETURN_FILE);
> + pnfs_return_layout(state->inode, NULL, NULL, RETURN_FILE, 1);
> pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
> #endif /* CONFIG_NFS_V4_1 */
> }
> @@ -2382,7 +2382,7 @@ pnfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
>
> if (pnfs_enabled_sb(server) && has_layout(nfsi) &&
> pnfs_ld_layoutret_on_setattr(server->pnfs_curr_ld))
> - pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
> + pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
> return nfs4_proc_setattr(dentry, fattr, sattr);
> }
> #endif /* CONFIG_NFS_V4_1 */
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 2d52300..cd2ae83 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -598,7 +598,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
> range.offset = 0;
> range.length = NFS4_MAX_UINT64;
> pnfs_return_layout(state->inode, &range, NULL,
> - RETURN_FILE);
> + RETURN_FILE, 1);
> }
> #endif /* CONFIG_NFS_V4_1 */
> nfs4_do_close(path, state, wait);
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 3b4dea0..2c96723 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -610,7 +610,7 @@ get_layout(struct inode *ino,
> */
> static inline int
> should_free_lseg(struct pnfs_layout_segment *lseg,
> - struct nfs4_pnfs_layout_segment *range)
> + struct nfs4_pnfs_layout_segment *range)
> {
> return (range->iomode == IOMODE_ANY ||
> lseg->range.iomode == range->iomode) &&
> @@ -703,6 +703,8 @@ return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
>
> dprintk("--> %s\n", __func__);
>
> + BUG_ON(type != RETURN_FILE);
> +
> lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
> if (lrp == NULL) {
> if (lo && (type == RETURN_FILE))
> @@ -729,7 +731,8 @@ out:
> int
> _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
> const nfs4_stateid *stateid, /* optional */
> - enum pnfs_layoutreturn_type type)
> + enum pnfs_layoutreturn_type type,
> + int sendreturn)

s/int/bool/
and we should use true/false rather than 1/0 in call sites.

> {
> struct pnfs_layout_type *lo = NULL;
> struct nfs_inode *nfsi = NFS_I(ino);
> @@ -739,14 +742,19 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
> dprintk("--> %s type %d\n", __func__, type);
>
> if (range)
> - arg = *range;
> - else {
> + arg.iomode = range->iomode;
> + else
> arg.iomode = IOMODE_ANY;

how about
arg.iomode = range ? range->iomode : IOMODE_ANY;

> - arg.offset = 0;
> - arg.length = ~0;
> - }
> + arg.offset = 0;
> + arg.length = ~0;

NFS4_MAX_UINT64

> +
> if (type == RETURN_FILE) {
> if (nfsi->layout.layoutcommit_ctx) {
> + if (stateid && !sendreturn) { /* callback */
> + dprintk("%s: layoutcommit pending\n", __func__);
> + status = htonl(NFS4ERR_DELAY);

this function currently returns system errors, not nfs'p.

> + goto out;
> + }
> status = pnfs_layoutcommit_inode(ino, 1);
> if (status) {
> dprintk("%s: layoutcommit failed, status=%d. "
> @@ -763,11 +771,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
> }
> if (!lo) {
> dprintk("%s: no layout segments to return\n", __func__);
> - /* must send the LAYOUTRETURN in response to recall */
> - if (stateid)
> - goto send_return;
> - else
> - goto out;
> + goto out;

what about the sendreturn case?

> }
>
> /* unlock w/o put rebalanced by eventual call to
> @@ -776,13 +780,22 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
> unlock_current_layout(nfsi);
>
> if (pnfs_return_layout_barrier(nfsi, &arg)) {
> + if (stateid) { /* callback */
> + status = htonl(NFS4ERR_DELAY);

ditto

> + lock_current_layout(nfsi);
> + put_unlock_current_layout(lo);
> + goto out;
> + }
> dprintk("%s: waiting\n", __func__);
> wait_event(nfsi->lo_waitq,
> - !pnfs_return_layout_barrier(nfsi, &arg));
> + !pnfs_return_layout_barrier(nfsi, &arg));
> }
> +
> + if (sendreturn)
> + status = return_layout(ino, &arg, stateid, type, lo);
> + else
> + pnfs_layout_release(lo, &arg);
> }
> -send_return:
> - status = return_layout(ino, &arg, stateid, type, lo);
> out:
> dprintk("<-- %s status: %d\n", __func__, status);
> return status;
> @@ -1080,31 +1093,12 @@ pnfs_update_layout(struct inode *ino,
> /* Check to see if the layout for the given range already exists */
> lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
> if (lseg && !lseg->valid) {
> - unlock_current_layout(nfsi);
> if (take_ref)
> put_lseg(lseg);
> - for (;;) {
> - prepare_to_wait(&nfsi->lo_waitq, &__wait,
> - TASK_KILLABLE);
> - lock_current_layout(nfsi);
> - lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
> - if (!lseg || lseg->valid)
> - break;
> - dprintk("%s: invalid lseg %p ref %d\n", __func__,
> - lseg, atomic_read(&lseg->kref.refcount)-1);
> - if (take_ref)
> - put_lseg(lseg);
> - if (signal_pending(current)) {
> - lseg = NULL;
> - result = -ERESTARTSYS;
> - break;
> - }
> - unlock_current_layout(nfsi);
> - schedule();
> - }
> - finish_wait(&nfsi->lo_waitq, &__wait);
> - if (result)
> - goto out_put;
> +
> + /* someone is cleaning the layout */
> + result = -EAGAIN;

Ah well, I'm not sure if this works, did you test that?
Anyhow, will work properly with the long awaited state machine.

> + goto out_put;
> }
>
> if (lseg) {
> @@ -1434,7 +1428,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
>
> if (count > 0 && !below_threshold(inode, count, 0)) {
> status = pnfs_update_layout(inode, ctx, count,
> - loff, IOMODE_READ, NULL);
> + loff, IOMODE_READ, NULL);
> dprintk("%s *rsize %Zd virt update returned %d\n",
> __func__, *rsize, status);
> if (status != 0)
> @@ -1673,7 +1667,7 @@ pnfs_writeback_done(struct nfs_write_data *data)
> .length = data->args.count,
> };
> dprintk("%s: retrying\n", __func__);
> - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
> + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);

s/1/true/

> pnfs_initiate_write(data, NFS_CLIENT(data->inode),
> pdata->call_ops, pdata->how);
> }
> @@ -1804,7 +1798,7 @@ pnfs_read_done(struct nfs_read_data *data)
> .length = data->args.count,
> };
> dprintk("%s: retrying\n", __func__);
> - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
> + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);

ditto

> pnfs_initiate_read(data, NFS_CLIENT(data->inode),
> pdata->call_ops);
> }
> @@ -2030,7 +2024,7 @@ pnfs_commit_done(struct nfs_write_data *data)
> .length = data->args.count,
> };
> dprintk("%s: retrying\n", __func__);
> - _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
> + _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);

ditto

> pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
> pdata->call_ops, pdata->how);
> }
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 9e43973..794a2d3 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -40,7 +40,8 @@ int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>
> int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *,
> const nfs4_stateid *stateid, /* optional */
> - enum pnfs_layoutreturn_type);
> + enum pnfs_layoutreturn_type,
> + int sendreturn);

bool

> void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *mntfh, u32 id);
> void unmount_pnfs_layoutdriver(struct nfs_server *);
> int pnfs_use_read(struct inode *inode, ssize_t count);
> @@ -236,14 +237,15 @@ static inline void pnfs_modify_new_request(struct nfs_page *req,
> static inline int pnfs_return_layout(struct inode *ino,
> struct nfs4_pnfs_layout_segment *lseg,
> const nfs4_stateid *stateid, /* optional */
> - enum pnfs_layoutreturn_type type)
> + enum pnfs_layoutreturn_type type,
> + int sendreturn)

bool

Thanks!

Benny

> {
> struct nfs_inode *nfsi = NFS_I(ino);
> struct nfs_server *nfss = NFS_SERVER(ino);
>
> if (pnfs_enabled_sb(nfss) &&
> (type != RETURN_FILE || has_layout(nfsi)))
> - return _pnfs_return_layout(ino, lseg, stateid, type);
> + return _pnfs_return_layout(ino, lseg, stateid, type, sendreturn);
>
> return 0;
> }


2010-05-26 10:48:54

by Benny Halevy

[permalink] [raw]
Subject: Re: [PATCH 8/8] pnfs-submit: support for cb_recall_any (layouts)

On 2010-05-17 20:56, Alexandros Batsakis wrote:
> CB_RECALL_ANY serves as a hint to the client to return some server state.
> We reply immediately and we clean the layouts asycnhronously.
>
> FIXME: currently we return _all_ layouts
> FIXME: eventually we should treat layouts as delegations, marked them expired
> and fire the state manager to clean them.
>
> Signed-off-by: Alexandros Batsakis <[email protected]>
> ---
> fs/nfs/callback.h | 7 ++++++
> fs/nfs/callback_proc.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 58 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
> index 73f21bc..b39ac86 100644
> --- a/fs/nfs/callback.h
> +++ b/fs/nfs/callback.h
> @@ -115,6 +115,13 @@ extern int nfs41_validate_delegation_stateid(struct nfs_delegation *delegation,
>
> #define RCA4_TYPE_MASK_RDATA_DLG 0
> #define RCA4_TYPE_MASK_WDATA_DLG 1
> +#define RCA4_TYPE_MASK_DIR_DLG 2
> +#define RCA4_TYPE_MASK_FILE_LAYOUT 3
> +#define RCA4_TYPE_MASK_BLK_LAYOUT 4
> +#define RCA4_TYPE_MASK_OBJ_LAYOUT_MIN 8
> +#define RCA4_TYPE_MASK_OBJ_LAYOUT_MAX 9
> +#define RCA4_TYPE_MASK_OTHER_LAYOUT_MIN 12
> +#define RCA4_TYPE_MASK_OTHER_LAYOUT_MAX 15
>
> struct cb_recallanyargs {
> struct sockaddr *craa_addr;
> diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
> index 419fee7..d09fb07 100644
> --- a/fs/nfs/callback_proc.c
> +++ b/fs/nfs/callback_proc.c
> @@ -337,6 +337,26 @@ out_put_no_client:
> return status;
> }
>
> +static int pnfs_recall_all_layouts(struct nfs_client *clp)
> +{
> + struct cb_pnfs_layoutrecallargs rl;
> + struct inode *inode;
> + int status = 0;
> +
> + rl.cbl_recall_type = RETURN_ALL;
> + rl.cbl_seg.offset = 0;
> + rl.cbl_seg.length = NFS4_MAX_UINT64;
> + rl.cbl_seg.iomode = IOMODE_ANY;

nit: set the fields in actual memory order...

> + /* we need the inode to get the nfs_server struct */
> + inode = nfs_layoutrecall_find_inode(clp, &rl);
> + if (!inode)
> + return status;
> + status = pnfs_async_return_layout(clp, inode, &rl);
> + iput(inode);
> +
> + return status;
> +}
> +
> __be32 pnfs_cb_layoutrecall(struct cb_pnfs_layoutrecallargs *args,
> void *dummy)
> {
> @@ -651,6 +671,27 @@ out:
> return status;
> }
>
> +static inline int

bool maybe?

> +validate_bitmap_values(const unsigned long* mask)
> +{
> + int i;
> + if (test_bit(RCA4_TYPE_MASK_RDATA_DLG, mask) ||
> + test_bit(RCA4_TYPE_MASK_WDATA_DLG, mask) ||
> + test_bit(RCA4_TYPE_MASK_DIR_DLG, mask) ||
> + test_bit(RCA4_TYPE_MASK_FILE_LAYOUT, mask) ||
> + test_bit(RCA4_TYPE_MASK_BLK_LAYOUT, mask))
> + return 1;
> + for (i = RCA4_TYPE_MASK_OBJ_LAYOUT_MIN;
> + i <= RCA4_TYPE_MASK_OBJ_LAYOUT_MAX; i++)
> + if (test_bit(i, mask))
> + return 1;
> + for (i = RCA4_TYPE_MASK_OTHER_LAYOUT_MIN;
> + i <= RCA4_TYPE_MASK_OTHER_LAYOUT_MAX; i++)
> + if (test_bit(i, mask))
> + return 1;

What about undefined bits?
The spec requires returning NFS4ERR_INVAL in this case.
How about having a mask of all known bits (a constant)
and checking that (mask & ~known_mask) == 0
I'm not absolutely sure that returning INVAL when (mask & known_mask) == 0
(as you effectively do in this patchset) is the right thing to do.

2010-05-17 17:56:51

by Alexandros Batsakis

[permalink] [raw]
Subject: [PATCH 1/8] pnfs-submit: clean struct nfs_inode

by moving layout specific fields from nfs_inode to struct pnfs_layout_type

Signed-off-by: Alexandros Batsakis <[email protected]>
---
fs/nfs/inode.c | 11 ++++---
fs/nfs/nfs4state.c | 2 +-
fs/nfs/pnfs.c | 70 ++++++++++++++++++++++++++----------------------
fs/nfs/write.c | 3 +-
include/linux/nfs_fs.h | 25 ++++++++---------
5 files changed, 59 insertions(+), 52 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 9b8b655..ab599be 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1110,7 +1110,8 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
/*
* file needs layout commit, server attributes may be stale
*/
- if (nfsi->layoutcommit_ctx && nfsi->change_attr >= fattr->change_attr) {
+ if (nfsi->layout.layoutcommit_ctx &&
+ nfsi->change_attr >= fattr->change_attr) {
dprintk("NFS: %s: layoutcommit is needed for file %s/%ld\n",
__func__, inode->i_sb->s_id, inode->i_ino);
return 0;
@@ -1328,12 +1329,12 @@ void nfs4_clear_inode(struct inode *inode)
static void pnfs_alloc_init_inode(struct nfs_inode *nfsi)
{
#ifdef CONFIG_NFS_V4_1
- nfsi->pnfs_layout_state = 0;
+ nfsi->layout.pnfs_layout_state = 0;
memset(&nfsi->layout.stateid, 0, NFS4_STATEID_SIZE);
nfsi->layout.roc_iomode = 0;
- nfsi->layoutcommit_ctx = NULL;
- nfsi->pnfs_write_begin_pos = 0;
- nfsi->pnfs_write_end_pos = 0;
+ nfsi->layout.layoutcommit_ctx = NULL;
+ nfsi->layout.pnfs_write_begin_pos = 0;
+ nfsi->layout.pnfs_write_end_pos = 0;
#endif /* CONFIG_NFS_V4_1 */
}

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 15c8bc8..3765ca1 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -589,7 +589,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
#ifdef CONFIG_NFS_V4_1
struct nfs_inode *nfsi = NFS_I(state->inode);

- if (nfsi->layoutcommit_ctx)
+ if (nfsi->layout.ld_data && nfsi->layout.layoutcommit_ctx)
pnfs_layoutcommit_inode(state->inode, 0);
if (has_layout(nfsi) && nfsi->layout.roc_iomode) {
struct nfs4_pnfs_layout_segment range;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 8df1610..b72c013 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -152,14 +152,14 @@ void
pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
{
dprintk("%s: has_layout=%d layoutcommit_ctx=%p ctx=%p\n", __func__,
- has_layout(nfsi), nfsi->layoutcommit_ctx, ctx);
+ has_layout(nfsi), nfsi->layout.layoutcommit_ctx, ctx);
spin_lock(&nfsi->lo_lock);
- if (has_layout(nfsi) && !nfsi->layoutcommit_ctx) {
- nfsi->layoutcommit_ctx = get_nfs_open_context(ctx);
+ if (has_layout(nfsi) && !nfsi->layout.layoutcommit_ctx) {
+ nfsi->layout.layoutcommit_ctx = get_nfs_open_context(ctx);
nfsi->change_attr++;
spin_unlock(&nfsi->lo_lock);
dprintk("%s: Set layoutcommit_ctx=%p\n", __func__,
- nfsi->layoutcommit_ctx);
+ nfsi->layout.layoutcommit_ctx);
return;
}
spin_unlock(&nfsi->lo_lock);
@@ -176,17 +176,17 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
loff_t end_pos;

spin_lock(&nfsi->lo_lock);
- if (offset < nfsi->pnfs_write_begin_pos)
- nfsi->pnfs_write_begin_pos = offset;
+ if (offset < nfsi->layout.pnfs_write_begin_pos)
+ nfsi->layout.pnfs_write_begin_pos = offset;
end_pos = offset + extent - 1; /* I'm being inclusive */
- if (end_pos > nfsi->pnfs_write_end_pos)
- nfsi->pnfs_write_end_pos = end_pos;
+ if (end_pos > nfsi->layout.pnfs_write_end_pos)
+ nfsi->layout.pnfs_write_end_pos = end_pos;
dprintk("%s: Wrote %lu@%lu bpos %lu, epos: %lu\n",
__func__,
(unsigned long) extent,
(unsigned long) offset ,
- (unsigned long) nfsi->pnfs_write_begin_pos,
- (unsigned long) nfsi->pnfs_write_end_pos);
+ (unsigned long) nfsi->layout.pnfs_write_begin_pos,
+ (unsigned long) nfsi->layout.pnfs_write_end_pos);
spin_unlock(&nfsi->lo_lock);
}

@@ -726,7 +726,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
arg.length = ~0;
}
if (type == RETURN_FILE) {
- if (nfsi->layoutcommit_ctx) {
+ if (nfsi->layout.layoutcommit_ctx) {
status = pnfs_layoutcommit_inode(ino, 1);
if (status) {
dprintk("%s: layoutcommit failed, status=%d. "
@@ -903,7 +903,8 @@ get_lock_alloc_layout(struct inode *ino)
* wait until bit is cleared if we lost this race.
*/
res = wait_on_bit_lock(
- &nfsi->pnfs_layout_state, NFS_INO_LAYOUT_ALLOC,
+ &nfsi->layout.pnfs_layout_state,
+ NFS_INO_LAYOUT_ALLOC,
pnfs_wait_schedule, TASK_KILLABLE);
if (res) {
lo = ERR_PTR(res);
@@ -931,8 +932,10 @@ get_lock_alloc_layout(struct inode *ino)
lo = ERR_PTR(-ENOMEM);

/* release the NFS_INO_LAYOUT_ALLOC bit and wake up waiters */
- clear_bit_unlock(NFS_INO_LAYOUT_ALLOC, &nfsi->pnfs_layout_state);
- wake_up_bit(&nfsi->pnfs_layout_state, NFS_INO_LAYOUT_ALLOC);
+ clear_bit_unlock(NFS_INO_LAYOUT_ALLOC,
+ &nfsi->layout.pnfs_layout_state);
+ wake_up_bit(&nfsi->layout.pnfs_layout_state,
+ NFS_INO_LAYOUT_ALLOC);
break;
}

@@ -1116,13 +1119,13 @@ pnfs_update_layout(struct inode *ino,
}

/* if get layout already failed once goto out */
- if (test_bit(lo_fail_bit(iomode), &nfsi->pnfs_layout_state)) {
- if (unlikely(nfsi->pnfs_layout_suspend &&
- get_seconds() >= nfsi->pnfs_layout_suspend)) {
+ if (test_bit(lo_fail_bit(iomode), &nfsi->layout.pnfs_layout_state)) {
+ if (unlikely(nfsi->layout.pnfs_layout_suspend &&
+ get_seconds() >= nfsi->layout.pnfs_layout_suspend)) {
dprintk("%s: layout_get resumed\n", __func__);
clear_bit(lo_fail_bit(iomode),
- &nfsi->pnfs_layout_state);
- nfsi->pnfs_layout_suspend = 0;
+ &nfsi->layout.pnfs_layout_state);
+ nfsi->layout.pnfs_layout_suspend = 0;
} else {
result = 1;
goto out_put;
@@ -1138,7 +1141,8 @@ pnfs_update_layout(struct inode *ino,
result = get_layout(ino, ctx, &arg, lsegpp, lo);
out:
dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
- __func__, result, nfsi->pnfs_layout_state, lseg);
+ __func__, result, nfsi->layout.pnfs_layout_state,
+ lseg);
return result;
out_put:
if (lsegpp)
@@ -1243,13 +1247,14 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)

get_out:
/* remember that get layout failed and suspend trying */
- nfsi->pnfs_layout_suspend = suspend;
- set_bit(lo_fail_bit(lgp->args.lseg.iomode), &nfsi->pnfs_layout_state);
+ nfsi->layout.pnfs_layout_suspend = suspend;
+ set_bit(lo_fail_bit(lgp->args.lseg.iomode),
+ &nfsi->layout.pnfs_layout_state);
dprintk("%s: layout_get suspended until %ld\n",
__func__, suspend);
out:
dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
- __func__, lgp->status, nfsi->pnfs_layout_state, lseg);
+ __func__, lgp->status, nfsi->layout.pnfs_layout_state, lseg);
return;
}

@@ -2166,9 +2171,10 @@ pnfs_layoutcommit_setup(struct pnfs_layoutcommit_data *data, int sync)
/* Set values from inode so it can be reset
*/
data->args.lseg.iomode = IOMODE_RW;
- data->args.lseg.offset = nfsi->pnfs_write_begin_pos;
- data->args.lseg.length = nfsi->pnfs_write_end_pos - nfsi->pnfs_write_begin_pos + 1;
- data->args.lastbytewritten = nfsi->pnfs_write_end_pos;
+ data->args.lseg.offset = nfsi->layout.pnfs_write_begin_pos;
+ data->args.lseg.length = nfsi->layout.pnfs_write_end_pos -
+ nfsi->layout.pnfs_write_begin_pos + 1;
+ data->args.lastbytewritten = nfsi->layout.pnfs_write_end_pos;
data->args.bitmask = nfss->attr_bitmask;
data->res.server = nfss;

@@ -2207,12 +2213,12 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
return -ENOMEM;

spin_lock(&nfsi->lo_lock);
- if (!nfsi->layoutcommit_ctx)
+ if (!nfsi->layout.layoutcommit_ctx)
goto out_unlock;

data->args.inode = inode;
- data->cred = nfsi->layoutcommit_ctx->cred;
- data->ctx = nfsi->layoutcommit_ctx;
+ data->cred = nfsi->layout.layoutcommit_ctx->cred;
+ data->ctx = nfsi->layout.layoutcommit_ctx;

/* Set up layout commit args*/
status = pnfs_layoutcommit_setup(data, sync);
@@ -2222,9 +2228,9 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
/* Clear layoutcommit properties in the inode so
* new lc info can be generated
*/
- nfsi->pnfs_write_begin_pos = 0;
- nfsi->pnfs_write_end_pos = 0;
- nfsi->layoutcommit_ctx = NULL;
+ nfsi->layout.pnfs_write_begin_pos = 0;
+ nfsi->layout.pnfs_write_end_pos = 0;
+ nfsi->layout.layoutcommit_ctx = NULL;

/* release lock on pnfs layoutcommit attrs */
spin_unlock(&nfsi->lo_lock);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index d058781..57bfc85 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1494,7 +1494,8 @@ static int nfs_commit_inode(struct inode *inode, int how)
nfs_wait_bit_killable,
TASK_KILLABLE);
#ifdef CONFIG_NFS_V4_1
- if (may_wait && NFS_I(inode)->layoutcommit_ctx) {
+ if (may_wait && NFS_I(inode)->layout.ld_data &&
+ NFS_I(inode)->layout.layoutcommit_ctx) {
error = pnfs_layoutcommit_inode(inode, 1);
if (error < 0)
return error;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 68b3b5c..5048b97 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -106,6 +106,18 @@ struct pnfs_layout_type {
seqlock_t seqlock; /* Protects the stateid */
nfs4_stateid stateid;
void *ld_data; /* layout driver private data */
+ unsigned long pnfs_layout_state;
+ #define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
+ #define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
+ #define NFS_INO_LAYOUT_ALLOC 2 /* bit lock for layout allocation */
+ time_t pnfs_layout_suspend;
+ /* use rpc_creds in this open_context to send LAYOUTCOMMIT to MDS */
+ struct nfs_open_context *layoutcommit_ctx;
+ /* DH: These vars keep track of the maximum write range
+ * so the values can be used for layoutcommit.
+ */
+ loff_t pnfs_write_begin_pos;
+ loff_t pnfs_write_end_pos;
};

/*
@@ -197,22 +209,9 @@ struct nfs_inode {
#if defined(CONFIG_NFS_V4_1)
/* Inodes having layouts */
struct list_head lo_inodes;
-
- unsigned long pnfs_layout_state;
-#define NFS_INO_RO_LAYOUT_FAILED 0 /* get ro layout failed stop trying */
-#define NFS_INO_RW_LAYOUT_FAILED 1 /* get rw layout failed stop trying */
-#define NFS_INO_LAYOUT_ALLOC 2 /* bit lock for layout allocation */
- time_t pnfs_layout_suspend;
wait_queue_head_t lo_waitq;
spinlock_t lo_lock;
struct pnfs_layout_type layout;
- /* use rpc_creds in this open_context to send LAYOUTCOMMIT to MDS */
- struct nfs_open_context *layoutcommit_ctx;
- /* DH: These vars keep track of the maximum write range
- * so the values can be used for layoutcommit.
- */
- loff_t pnfs_write_begin_pos;
- loff_t pnfs_write_end_pos;
#endif /* CONFIG_NFS_V4_1 */
#endif /* CONFIG_NFS_V4*/
#ifdef CONFIG_NFS_FSCACHE
--
1.6.2.5


2010-05-17 17:56:54

by Alexandros Batsakis

[permalink] [raw]
Subject: [PATCH 5/8] pnfs-submit: request whole file layouts only

In the first iteration of the pNFS code, we support only whole-file layouts.
To facilitate the move to multiple-segments, we keep the segment processing
code, but the segment list should always contain a max of one segment per I/O type

Signed-off-by: Alexandros Batsakis <[email protected]>
---
fs/nfs/callback_proc.c | 7 ++++---
fs/nfs/pnfs.c | 25 +++++--------------------
2 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 8ef1502..bfada25 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -213,6 +213,10 @@ static int pnfs_recall_layout(void *data)
then return layouts, resume after layoutreturns complete
*/

+ /* support whole file layouts only */
+ rl.cbl_seg.offset = 0;
+ rl.cbl_seg.length = NFS4_MAX_UINT64;
+
if (rl.cbl_recall_type == RETURN_FILE) {
status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
RETURN_FILE);
@@ -221,9 +225,6 @@ static int pnfs_recall_layout(void *data)
goto out;
}

- rl.cbl_seg.offset = 0;
- rl.cbl_seg.length = NFS4_MAX_UINT64;
-
/* FIXME: This loop is inefficient, running in O(|s_inodes|^2) */
while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) {
/* XXX need to check status on pnfs_return_layout */
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index ecf6dc2..2cc8895 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -546,12 +546,6 @@ pnfs_layout_from_open_stateid(nfs4_stateid *dst, struct nfs4_state *state)
* for now, assume that whole file layouts are requested.
* arg->offset: 0
* arg->length: all ones
-*
-* for now, assume the LAYOUTGET operation is triggered by an I/O request.
-* the count field is the count in the I/O request, and will be used
-* as the minlength. for the file operation that piggy-backs
-* the LAYOUTGET operation with an OPEN, s
-* arg->minlength = count.
*/
static int
get_layout(struct inode *ino,
@@ -572,10 +566,10 @@ get_layout(struct inode *ino,
return -ENOMEM;
}
lgp->lo = lo;
- lgp->args.minlength = PAGE_CACHE_SIZE;
+ lgp->args.minlength = NFS4_MAX_UINT64;
lgp->args.maxcount = PNFS_LAYOUT_MAXSIZE;
lgp->args.lseg.iomode = range->iomode;
- lgp->args.lseg.offset = range->offset;
+ lgp->args.lseg.offset = 0;
lgp->args.lseg.length = max(range->length, lgp->args.minlength);
lgp->args.type = server->pnfs_curr_ld->id;
lgp->args.inode = ino;
@@ -1068,8 +1062,8 @@ pnfs_update_layout(struct inode *ino,
{
struct nfs4_pnfs_layout_segment arg = {
.iomode = iomode,
- .offset = pos,
- .length = count
+ .offset = 0,
+ .length = ~0
};
struct nfs_inode *nfsi = NFS_I(ino);
struct pnfs_layout_type *lo;
@@ -1159,7 +1153,6 @@ out_put:
void
pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
{
- struct nfs4_pnfs_layoutget_res *res = &lgp->res;
struct pnfs_layout_segment *lseg = NULL;
struct nfs_inode *nfsi = PNFS_NFS_INODE(lgp->lo);
time_t suspend = 0;
@@ -1167,15 +1160,8 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
dprintk("-->%s\n", __func__);

lgp->status = rpc_status;
- if (likely(!rpc_status)) {
- if (unlikely(res->layout.len <= 0)) {
- printk(KERN_ERR
- "%s: ERROR! Layout size is ZERO!\n", __func__);
- lgp->status = -EIO;
- goto get_out;
- }
+ if (likely(!rpc_status))
goto out;
- }

dprintk("%s: ERROR retrieving layout %d\n", __func__, rpc_status);
switch (rpc_status) {
@@ -1250,7 +1236,6 @@ pnfs_get_layout_done(struct nfs4_pnfs_layoutget *lgp, int rpc_status)
break;
}

-get_out:
/* remember that get layout failed and suspend trying */
nfsi->layout.pnfs_layout_suspend = suspend;
set_bit(lo_fail_bit(lgp->args.lseg.iomode),
--
1.6.2.5


2010-05-17 17:56:53

by Alexandros Batsakis

[permalink] [raw]
Subject: [PATCH 3/8] pnfs-submit: remove lgetcount, lretcount (outstanding LAYOUTGETs/LAYOUTRETUNs)

This is in order to prepare for the forgetful client. There is no need to explicitly count the number of outstanding layout operations, as the protocol has provision for it (seqid of stateid -- e.g. section 12.5.5.2.1.2). As long as no requests for intersecting layouts are issued, LAYOUTGETs/LAYOUTRETURNs can be sent in parallel

Signed-off-by: Alexandros Batsakis <[email protected]>
---
fs/nfs/nfs4proc.c | 5 ++---
fs/nfs/pnfs.c | 46 ++++++++++++----------------------------------
fs/nfs/pnfs.h | 3 +--
include/linux/nfs_fs.h | 2 --
4 files changed, 15 insertions(+), 41 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index df908be..a8a6ad2 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5501,7 +5501,7 @@ static void nfs4_pnfs_layoutget_release(void *calldata)
struct nfs4_pnfs_layoutget *lgp = calldata;

dprintk("--> %s\n", __func__);
- pnfs_layout_release(lgp->lo, &lgp->lo->lgetcount, NULL);
+ pnfs_layout_release(lgp->lo, NULL);
if (lgp->res.layout.buf != NULL)
free_page((unsigned long) lgp->res.layout.buf);
kfree(calldata);
@@ -5722,8 +5722,7 @@ static void nfs4_pnfs_layoutreturn_release(void *calldata)
if (lrp->lo && (lrp->args.return_type == RETURN_FILE)) {
if (!lrp->res.lrs_present)
pnfs_set_layout_stateid(lrp->lo, &zero_stateid);
- pnfs_layout_release(lrp->lo, &lrp->lo->lretcount,
- &lrp->args.lseg);
+ pnfs_layout_release(lrp->lo, &lrp->args.lseg);
}
kfree(calldata);
dprintk("<-- %s\n", __func__);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 74cb998..b2d203f 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -375,7 +375,7 @@ out:
}

void
-pnfs_layout_release(struct pnfs_layout_type *lo, atomic_t *count,
+pnfs_layout_release(struct pnfs_layout_type *lo,
struct nfs4_pnfs_layout_segment *range)
{
struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
@@ -383,7 +383,6 @@ pnfs_layout_release(struct pnfs_layout_type *lo, atomic_t *count,
lock_current_layout(nfsi);
if (range)
pnfs_free_layout(lo, range);
- atomic_dec(count);
put_unlock_current_layout(lo);
wake_up_all(&nfsi->lo_waitq);
}
@@ -568,7 +567,7 @@ get_layout(struct inode *ino,

lgp = kzalloc(sizeof(*lgp), GFP_KERNEL);
if (lgp == NULL) {
- pnfs_layout_release(lo, &lo->lgetcount, NULL);
+ pnfs_layout_release(lo, NULL);
return -ENOMEM;
}
lgp->lo = lo;
@@ -642,6 +641,13 @@ has_layout_to_return(struct pnfs_layout_type *lo,
return out;
}

+static inline bool
+_pnfs_can_return_lseg(struct pnfs_layout_segment *lseg)
+{
+ return atomic_read(&lseg->kref.refcount) == 1;
+}
+
+
static void
pnfs_free_layout(struct pnfs_layout_type *lo,
struct nfs4_pnfs_layout_segment *range)
@@ -652,7 +658,8 @@ pnfs_free_layout(struct pnfs_layout_type *lo,

BUG_ON_UNLOCKED_LO(lo);
list_for_each_entry_safe (lseg, next, &lo->segs, fi_list) {
- if (!should_free_lseg(lseg, range))
+ if (!should_free_lseg(lseg, range) ||
+ !_pnfs_can_return_lseg(lseg))
continue;
dprintk("%s: freeing lseg %p iomode %d "
"offset %llu length %llu\n", __func__,
@@ -665,12 +672,6 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
dprintk("%s:Return\n", __func__);
}

-static inline bool
-_pnfs_can_return_lseg(struct pnfs_layout_segment *lseg)
-{
- return atomic_read(&lseg->kref.refcount) == 1;
-}
-
static bool
pnfs_return_layout_barrier(struct nfs_inode *nfsi,
struct nfs4_pnfs_layout_segment *range)
@@ -690,8 +691,6 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
ret = true;
}
}
- if (atomic_read(&nfsi->layout.lgetcount))
- ret = true;
unlock_current_layout(nfsi);

dprintk("%s:Return %d\n", __func__, ret);
@@ -712,7 +711,7 @@ return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
if (lrp == NULL) {
if (lo && (type == RETURN_FILE))
- pnfs_layout_release(lo, &lo->lretcount, NULL);
+ pnfs_layout_release(lo, NULL);
goto out;
}
lrp->args.reclaim = 0;
@@ -776,9 +775,6 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
goto out;
}

- /* Matching dec is done in .rpc_release (on non-error paths) */
- atomic_inc(&lo->lretcount);
-
/* unlock w/o put rebalanced by eventual call to
* pnfs_layout_release
*/
@@ -893,8 +889,6 @@ alloc_init_layout(struct inode *ino)
seqlock_init(&lo->seqlock);
memset(&lo->stateid, 0, NFS4_STATEID_SIZE);
lo->refcount = 1;
- atomic_set(&lo->lgetcount, 0);
- atomic_set(&lo->lretcount, 0);
INIT_LIST_HEAD(&lo->segs);
lo->roc_iomode = 0;
return lo;
@@ -1058,19 +1052,6 @@ pnfs_find_get_lseg(struct inode *inode,
return lseg;
}

-/* Called with spin lock held */
-void drain_layoutreturns(struct pnfs_layout_type *lo)
-{
- while (atomic_read(&lo->lretcount)) {
- struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);
-
- unlock_current_layout(nfsi);
- dprintk("%s: waiting\n", __func__);
- wait_event(nfsi->lo_waitq, (atomic_read(&lo->lretcount) == 0));
- lock_current_layout(nfsi);
- }
-}
-
/* Update the file's layout for the given range and iomode.
* Layout is retreived from the server if needed.
* If lsegpp is given, the appropriate layout segment is referenced and
@@ -1158,9 +1139,6 @@ pnfs_update_layout(struct inode *ino,
}
}

- drain_layoutreturns(lo);
- /* Matching dec is done in .rpc_release (on non-error paths) */
- atomic_inc(&lo->lgetcount);
/* Lose lock, but not reference, match this with pnfs_layout_release */
unlock_current_layout(nfsi);

diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 8318112..9e43973 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -71,8 +71,7 @@ void pnfs_free_fsdata(struct pnfs_fsdata *fsdata);
ssize_t pnfs_file_write(struct file *, const char __user *, size_t, loff_t *);
void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
-void pnfs_layout_release(struct pnfs_layout_type *, atomic_t *,
- struct nfs4_pnfs_layout_segment *range);
+void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segment *range);
void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
const nfs4_stateid *stateid);
void pnfs_destroy_layout(struct nfs_inode *);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 5048b97..2434bba 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -99,8 +99,6 @@ struct posix_acl;

struct pnfs_layout_type {
int refcount;
- atomic_t lretcount; /* Layoutreturns outstanding */
- atomic_t lgetcount; /* Layoutgets outstanding */
struct list_head segs; /* layout segments list */
int roc_iomode; /* iomode to return on close, 0=none */
seqlock_t seqlock; /* Protects the stateid */
--
1.6.2.5


2010-05-17 17:56:53

by Alexandros Batsakis

[permalink] [raw]
Subject: [PATCH 6/8] pnfs-submit: change layouts list to be similar to the other state list management

The current design keeps a list (nfs_client) of inodes having layouts.
In order to make that code more similar to delegation handling (and in general to the rest of the NFS code), this patch changes the list element to layouts directly.
No backpointer from the layout to the inode is needed as the inode can be accesed by a container_of() call

Signed-off-by: Alexandros Batsakis <[email protected]>
---
fs/nfs/callback_proc.c | 9 +++++++--
fs/nfs/client.c | 2 +-
fs/nfs/inode.c | 4 ++--
fs/nfs/pnfs.c | 10 ++++------
include/linux/nfs_fs.h | 3 +--
include/linux/nfs_fs_sb.h | 2 +-
6 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index bfada25..24e2571 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -76,7 +76,6 @@ static int (*nfs_validate_delegation_stateid(struct nfs_client *clp))(struct nfs
return nfs4_validate_delegation_stateid;
}

-
__be32 nfs4_callback_recall(struct cb_recallargs *args, void *dummy)
{
struct nfs_client *clp;
@@ -140,6 +139,7 @@ nfs_layoutrecall_find_inode(struct nfs_client *clp,
const struct cb_pnfs_layoutrecallargs *args)
{
struct nfs_inode *nfsi;
+ struct pnfs_layout_type *layout;
struct nfs_server *server;
struct inode *ino = NULL;

@@ -147,9 +147,14 @@ nfs_layoutrecall_find_inode(struct nfs_client *clp,
__func__, args->cbl_recall_type, clp);

spin_lock(&clp->cl_lock);
- list_for_each_entry(nfsi, &clp->cl_lo_inodes, lo_inodes) {
+ list_for_each_entry(layout, &clp->cl_layouts, lo_layouts) {
+ nfsi = PNFS_NFS_INODE(layout);
+ if (!nfsi)
+ continue;
+
dprintk("%s: Searching inode=%lu\n",
__func__, nfsi->vfs_inode.i_ino);
+
if (args->cbl_recall_type == RETURN_FILE) {
if (nfs_compare_fh(&args->cbl_fh, &nfsi->fh))
continue;
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index d4759dd..d60e5d7 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -157,7 +157,7 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
if (!IS_ERR(cred))
clp->cl_machine_cred = cred;
#if defined(CONFIG_NFS_V4_1)
- INIT_LIST_HEAD(&clp->cl_lo_inodes);
+ INIT_LIST_HEAD(&clp->cl_layouts);
#endif
nfs_fscache_get_client_cookie(clp);

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index ab599be..adf3280 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1363,7 +1363,7 @@ static void pnfs_destroy_inode(struct nfs_inode *nfsi)
if (!list_empty(&nfsi->layout.segs))
pnfs_destroy_layout(nfsi);

- BUG_ON(!list_empty(&nfsi->lo_inodes));
+ BUG_ON(!list_empty(&nfsi->layout.lo_layouts));
BUG_ON(!list_empty(&nfsi->layout.segs));
BUG_ON(nfsi->layout.refcount);
BUG_ON(nfsi->layout.ld_data);
@@ -1381,10 +1381,10 @@ void nfs_destroy_inode(struct inode *inode)
static void pnfs_init_once(struct nfs_inode *nfsi)
{
#ifdef CONFIG_NFS_V4_1
- INIT_LIST_HEAD(&nfsi->lo_inodes);
init_waitqueue_head(&nfsi->lo_waitq);
spin_lock_init(&nfsi->lo_lock);
seqlock_init(&nfsi->layout.seqlock);
+ INIT_LIST_HEAD(&nfsi->layout.lo_layouts);
INIT_LIST_HEAD(&nfsi->layout.segs);
nfsi->layout.refcount = 0;
nfsi->layout.ld_data = NULL;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2cc8895..3b4dea0 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -364,10 +364,10 @@ put_unlock_current_layout(struct pnfs_layout_type *lo)
io_ops->free_layout(lo->ld_data);
lo->ld_data = NULL;

- /* Unlist the inode. */
+ /* Unlist the layout. */
clp = NFS_SERVER(&nfsi->vfs_inode)->nfs_client;
spin_lock(&clp->cl_lock);
- list_del_init(&nfsi->lo_inodes);
+ list_del_init(&lo->lo_layouts);
spin_unlock(&clp->cl_lock);
}
out:
@@ -881,10 +881,8 @@ alloc_init_layout(struct inode *ino)

BUG_ON(lo->ld_data != NULL);
lo->ld_data = ld_data;
- seqlock_init(&lo->seqlock);
memset(&lo->stateid, 0, NFS4_STATEID_SIZE);
lo->refcount = 1;
- INIT_LIST_HEAD(&lo->segs);
lo->roc_iomode = 0;
return lo;
}
@@ -940,8 +938,8 @@ get_lock_alloc_layout(struct inode *ino)
lock_current_layout(nfsi);

spin_lock(&clp->cl_lock);
- if (list_empty(&nfsi->lo_inodes))
- list_add_tail(&nfsi->lo_inodes, &clp->cl_lo_inodes);
+ if (list_empty(&lo->lo_layouts))
+ list_add_tail(&lo->lo_layouts, &clp->cl_layouts);
spin_unlock(&clp->cl_lock);
} else
lo = ERR_PTR(-ENOMEM);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 2434bba..2a80be8 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -99,6 +99,7 @@ struct posix_acl;

struct pnfs_layout_type {
int refcount;
+ struct list_head lo_layouts; /* other inode layouts */
struct list_head segs; /* layout segments list */
int roc_iomode; /* iomode to return on close, 0=none */
seqlock_t seqlock; /* Protects the stateid */
@@ -205,8 +206,6 @@ struct nfs_inode {

/* pNFS layout information */
#if defined(CONFIG_NFS_V4_1)
- /* Inodes having layouts */
- struct list_head lo_inodes;
wait_queue_head_t lo_waitq;
spinlock_t lo_lock;
struct pnfs_layout_type layout;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 00a4e7e..0c92889 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -86,7 +86,7 @@ struct nfs_client {
/* The flags used for obtaining the clientid during EXCHANGE_ID */
u32 cl_exchange_flags;
struct nfs4_session *cl_session; /* sharred session */
- struct list_head cl_lo_inodes; /* Inodes having layouts */
+ struct list_head cl_layouts;
struct nfs4_deviceid_cache *cl_devid_cache; /* pNFS deviceid cache */
#endif /* CONFIG_NFS_V4_1 */

--
1.6.2.5


2010-05-17 17:56:55

by Alexandros Batsakis

[permalink] [raw]
Subject: [PATCH 8/8] pnfs-submit: support for cb_recall_any (layouts)

CB_RECALL_ANY serves as a hint to the client to return some server state.
We reply immediately and we clean the layouts asycnhronously.

FIXME: currently we return _all_ layouts
FIXME: eventually we should treat layouts as delegations, marked them expired
and fire the state manager to clean them.

Signed-off-by: Alexandros Batsakis <[email protected]>
---
fs/nfs/callback.h | 7 ++++++
fs/nfs/callback_proc.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 58 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/callback.h b/fs/nfs/callback.h
index 73f21bc..b39ac86 100644
--- a/fs/nfs/callback.h
+++ b/fs/nfs/callback.h
@@ -115,6 +115,13 @@ extern int nfs41_validate_delegation_stateid(struct nfs_delegation *delegation,

#define RCA4_TYPE_MASK_RDATA_DLG 0
#define RCA4_TYPE_MASK_WDATA_DLG 1
+#define RCA4_TYPE_MASK_DIR_DLG 2
+#define RCA4_TYPE_MASK_FILE_LAYOUT 3
+#define RCA4_TYPE_MASK_BLK_LAYOUT 4
+#define RCA4_TYPE_MASK_OBJ_LAYOUT_MIN 8
+#define RCA4_TYPE_MASK_OBJ_LAYOUT_MAX 9
+#define RCA4_TYPE_MASK_OTHER_LAYOUT_MIN 12
+#define RCA4_TYPE_MASK_OTHER_LAYOUT_MAX 15

struct cb_recallanyargs {
struct sockaddr *craa_addr;
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 419fee7..d09fb07 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -337,6 +337,26 @@ out_put_no_client:
return status;
}

+static int pnfs_recall_all_layouts(struct nfs_client *clp)
+{
+ struct cb_pnfs_layoutrecallargs rl;
+ struct inode *inode;
+ int status = 0;
+
+ rl.cbl_recall_type = RETURN_ALL;
+ rl.cbl_seg.offset = 0;
+ rl.cbl_seg.length = NFS4_MAX_UINT64;
+ rl.cbl_seg.iomode = IOMODE_ANY;
+ /* we need the inode to get the nfs_server struct */
+ inode = nfs_layoutrecall_find_inode(clp, &rl);
+ if (!inode)
+ return status;
+ status = pnfs_async_return_layout(clp, inode, &rl);
+ iput(inode);
+
+ return status;
+}
+
__be32 pnfs_cb_layoutrecall(struct cb_pnfs_layoutrecallargs *args,
void *dummy)
{
@@ -651,6 +671,27 @@ out:
return status;
}

+static inline int
+validate_bitmap_values(const unsigned long* mask)
+{
+ int i;
+ if (test_bit(RCA4_TYPE_MASK_RDATA_DLG, mask) ||
+ test_bit(RCA4_TYPE_MASK_WDATA_DLG, mask) ||
+ test_bit(RCA4_TYPE_MASK_DIR_DLG, mask) ||
+ test_bit(RCA4_TYPE_MASK_FILE_LAYOUT, mask) ||
+ test_bit(RCA4_TYPE_MASK_BLK_LAYOUT, mask))
+ return 1;
+ for (i = RCA4_TYPE_MASK_OBJ_LAYOUT_MIN;
+ i <= RCA4_TYPE_MASK_OBJ_LAYOUT_MAX; i++)
+ if (test_bit(i, mask))
+ return 1;
+ for (i = RCA4_TYPE_MASK_OTHER_LAYOUT_MIN;
+ i <= RCA4_TYPE_MASK_OTHER_LAYOUT_MAX; i++)
+ if (test_bit(i, mask))
+ return 1;
+ return 0;
+}
+
__be32 nfs4_callback_recallany(struct cb_recallanyargs *args, void *dummy)
{
struct nfs_client *clp;
@@ -665,16 +706,25 @@ __be32 nfs4_callback_recallany(struct cb_recallanyargs *args, void *dummy)
dprintk("NFS: RECALL_ANY callback request from %s\n",
rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR));

+ status = htonl(NFS4ERR_INVAL);
+ if (!validate_bitmap_values((const unsigned long *)
+ &args->craa_type_mask))
+ return status;
+
if (test_bit(RCA4_TYPE_MASK_RDATA_DLG, (const unsigned long *)
&args->craa_type_mask))
flags = FMODE_READ;
if (test_bit(RCA4_TYPE_MASK_WDATA_DLG, (const unsigned long *)
&args->craa_type_mask))
flags |= FMODE_WRITE;
+ if (test_bit(RCA4_TYPE_MASK_FILE_LAYOUT, (const unsigned long *)
+ &args->craa_type_mask))
+ status = pnfs_recall_all_layouts(clp);

if (flags)
nfs_expire_all_delegation_types(clp, flags);
- status = htonl(NFS4_OK);
+ if (status != htonl(NFS4ERR_DELAY))
+ status = htonl(NFS4_OK);
out:
dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
return status;
--
1.6.2.5


2010-05-17 17:56:53

by Alexandros Batsakis

[permalink] [raw]
Subject: [PATCH 4/8] pnfs-submit: change stateid to be a union

In NFSv4.1 the stateid consists of the "other" and "seqid" fields. For layout
processing we need to numerically compare the seqid value among layout stateids.
To do so, introduce a union to nfs4_stateid to switch between opaque(16 bytes)
and opaque(12 bytes) / __be32

Signed-off-by: Alexandros Batsakis <[email protected]>
---
fs/nfs/callback_proc.c | 13 +++++++------
fs/nfs/callback_xdr.c | 2 +-
fs/nfs/delegation.c | 19 +++++++++++--------
fs/nfs/nfs4proc.c | 41 +++++++++++++++++++++++++----------------
fs/nfs/nfs4state.c | 4 ++--
fs/nfs/nfs4xdr.c | 38 +++++++++++++++++++++-----------------
fs/nfs/pnfs.c | 11 ++++++-----
fs/nfsd/nfs4callback.c | 1 -
include/linux/nfs4.h | 18 ++++++++++++++++--
9 files changed, 89 insertions(+), 58 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index ebf86df..8ef1502 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -121,8 +121,9 @@ out:

int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation, const nfs4_stateid *stateid)
{
- if (delegation == NULL || memcmp(delegation->stateid.data, stateid->data,
- sizeof(delegation->stateid.data)) != 0)
+ if (delegation == NULL || memcmp(delegation->stateid.u.data,
+ stateid->u.data,
+ sizeof(delegation->stateid.u.data)))
return 0;
return 1;
}
@@ -383,11 +384,11 @@ int nfs41_validate_delegation_stateid(struct nfs_delegation *delegation, const n
if (delegation == NULL)
return 0;

- /* seqid is 4-bytes long */
- if (((u32 *) &stateid->data)[0] != 0)
+ if (stateid->u.stateid.seqid != 0)
return 0;
- if (memcmp(&delegation->stateid.data[4], &stateid->data[4],
- sizeof(stateid->data)-4))
+ if (memcmp(&delegation->stateid.u.stateid.other,
+ &stateid->u.stateid.other,
+ NFS4_STATEID_OTHER_SIZE))
return 0;

return 1;
diff --git a/fs/nfs/callback_xdr.c b/fs/nfs/callback_xdr.c
index 69a026d..7e34bb3 100644
--- a/fs/nfs/callback_xdr.c
+++ b/fs/nfs/callback_xdr.c
@@ -138,7 +138,7 @@ static __be32 decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
p = read_buf(xdr, 16);
if (unlikely(p == NULL))
return htonl(NFS4ERR_RESOURCE);
- memcpy(stateid->data, p, 16);
+ memcpy(stateid->u.data, p, 16);
return 0;
}

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 1567124..2e43ae2 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -108,7 +108,8 @@ again:
continue;
if (!test_bit(NFS_DELEGATED_STATE, &state->flags))
continue;
- if (memcmp(state->stateid.data, stateid->data, sizeof(state->stateid.data)) != 0)
+ if (memcmp(state->stateid.u.data, stateid->u.data,
+ sizeof(state->stateid.u.data)) != 0)
continue;
get_nfs_open_context(ctx);
spin_unlock(&inode->i_lock);
@@ -134,8 +135,8 @@ void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred, st

if (delegation == NULL)
return;
- memcpy(delegation->stateid.data, res->delegation.data,
- sizeof(delegation->stateid.data));
+ memcpy(delegation->stateid.u.data, res->delegation.u.data,
+ sizeof(delegation->stateid.u.data));
delegation->type = res->delegation_type;
delegation->maxsize = res->maxsize;
oldcred = delegation->cred;
@@ -173,8 +174,9 @@ static struct nfs_delegation *nfs_detach_delegation_locked(struct nfs_inode *nfs
if (delegation == NULL)
goto nomatch;
spin_lock(&delegation->lock);
- if (stateid != NULL && memcmp(delegation->stateid.data, stateid->data,
- sizeof(delegation->stateid.data)) != 0)
+ if (stateid != NULL && memcmp(delegation->stateid.u.data,
+ stateid->u.data,
+ sizeof(delegation->stateid.u.data)) != 0)
goto nomatch_unlock;
list_del_rcu(&delegation->super_list);
delegation->inode = NULL;
@@ -202,8 +204,8 @@ int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct
delegation = kmalloc(sizeof(*delegation), GFP_KERNEL);
if (delegation == NULL)
return -ENOMEM;
- memcpy(delegation->stateid.data, res->delegation.data,
- sizeof(delegation->stateid.data));
+ memcpy(delegation->stateid.u.data, res->delegation.u.data,
+ sizeof(delegation->stateid.u.data));
delegation->type = res->delegation_type;
delegation->maxsize = res->maxsize;
delegation->change_attr = nfsi->change_attr;
@@ -546,7 +548,8 @@ int nfs4_copy_delegation_stateid(nfs4_stateid *dst, struct inode *inode)
rcu_read_lock();
delegation = rcu_dereference(nfsi->delegation);
if (delegation != NULL) {
- memcpy(dst->data, delegation->stateid.data, sizeof(dst->data));
+ memcpy(dst->u.data, delegation->stateid.u.data,
+ sizeof(dst->u.data));
ret = 1;
}
rcu_read_unlock();
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a8a6ad2..6739d15 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -865,8 +865,10 @@ static void update_open_stateflags(struct nfs4_state *state, fmode_t fmode)
static void nfs_set_open_stateid_locked(struct nfs4_state *state, nfs4_stateid *stateid, fmode_t fmode)
{
if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
- memcpy(state->stateid.data, stateid->data, sizeof(state->stateid.data));
- memcpy(state->open_stateid.data, stateid->data, sizeof(state->open_stateid.data));
+ memcpy(state->stateid.u.data, stateid->u.data,
+ sizeof(state->stateid.u.data));
+ memcpy(state->open_stateid.u.data, stateid->u.data,
+ sizeof(state->open_stateid.u.data));
switch (fmode) {
case FMODE_READ:
set_bit(NFS_O_RDONLY_STATE, &state->flags);
@@ -894,7 +896,8 @@ static void __update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_s
*/
write_seqlock(&state->seqlock);
if (deleg_stateid != NULL) {
- memcpy(state->stateid.data, deleg_stateid->data, sizeof(state->stateid.data));
+ memcpy(state->stateid.u.data, deleg_stateid->u.data,
+ sizeof(state->stateid.u.data));
set_bit(NFS_DELEGATED_STATE, &state->flags);
}
if (open_stateid != NULL)
@@ -925,7 +928,8 @@ static int update_open_stateid(struct nfs4_state *state, nfs4_stateid *open_stat

if (delegation == NULL)
delegation = &deleg_cur->stateid;
- else if (memcmp(deleg_cur->stateid.data, delegation->data, NFS4_STATEID_SIZE) != 0)
+ else if (memcmp(deleg_cur->stateid.u.data, delegation->u.data,
+ NFS4_STATEID_SIZE) != 0)
goto no_delegation_unlock;

nfs_mark_delegation_referenced(deleg_cur);
@@ -987,7 +991,8 @@ static struct nfs4_state *nfs4_try_open_cached(struct nfs4_opendata *opendata)
break;
}
/* Save the delegation */
- memcpy(stateid.data, delegation->stateid.data, sizeof(stateid.data));
+ memcpy(stateid.u.data, delegation->stateid.u.data,
+ sizeof(stateid.u.data));
rcu_read_unlock();
ret = nfs_may_open(state->inode, state->owner->so_cred, open_mode);
if (ret != 0)
@@ -1152,10 +1157,13 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
* Check if we need to update the current stateid.
*/
if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0 &&
- memcmp(state->stateid.data, state->open_stateid.data, sizeof(state->stateid.data)) != 0) {
+ memcmp(state->stateid.u.data, state->open_stateid.u.data,
+ sizeof(state->stateid.u.data)) != 0) {
write_seqlock(&state->seqlock);
if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
- memcpy(state->stateid.data, state->open_stateid.data, sizeof(state->stateid.data));
+ memcpy(state->stateid.u.data,
+ state->open_stateid.u.data,
+ sizeof(state->stateid.u.data));
write_sequnlock(&state->seqlock);
}
pnfs4_layout_reclaim(state);
@@ -1226,8 +1234,8 @@ static int _nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs
if (IS_ERR(opendata))
return PTR_ERR(opendata);
opendata->o_arg.claim = NFS4_OPEN_CLAIM_DELEGATE_CUR;
- memcpy(opendata->o_arg.u.delegation.data, stateid->data,
- sizeof(opendata->o_arg.u.delegation.data));
+ memcpy(opendata->o_arg.u.delegation.u.data, stateid->u.data,
+ sizeof(opendata->o_arg.u.delegation.u.data));
ret = nfs4_open_recover(opendata, state);
nfs4_opendata_put(opendata);
return ret;
@@ -1285,8 +1293,8 @@ static void nfs4_open_confirm_done(struct rpc_task *task, void *calldata)
if (RPC_ASSASSINATED(task))
return;
if (data->rpc_status == 0) {
- memcpy(data->o_res.stateid.data, data->c_res.stateid.data,
- sizeof(data->o_res.stateid.data));
+ memcpy(data->o_res.stateid.u.data, data->c_res.stateid.u.data,
+ sizeof(data->o_res.stateid.u.data));
nfs_confirm_seqid(&data->owner->so_seqid, 0);
renew_lease(data->o_res.server, data->timestamp);
data->rpc_done = 1;
@@ -4095,9 +4103,10 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
return;
switch (task->tk_status) {
case 0:
- memcpy(calldata->lsp->ls_stateid.data,
- calldata->res.stateid.data,
- sizeof(calldata->lsp->ls_stateid.data));
+ memcpy(calldata->lsp->ls_stateid.u.data,
+ calldata->res.stateid.u.data,
+ sizeof(calldata->lsp->ls_stateid.u.
+ data));
renew_lease(calldata->server, calldata->timestamp);
break;
case -NFS4ERR_BAD_STATEID:
@@ -4310,8 +4319,8 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
goto out;
}
if (data->rpc_status == 0) {
- memcpy(data->lsp->ls_stateid.data, data->res.stateid.data,
- sizeof(data->lsp->ls_stateid.data));
+ memcpy(data->lsp->ls_stateid.u.data, data->res.stateid.u.data,
+ sizeof(data->lsp->ls_stateid.u.data));
data->lsp->ls_flags |= NFS_LOCK_INITIALIZED;
renew_lease(NFS_SERVER(data->ctx->path.dentry->d_inode), data->timestamp);
}
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 3765ca1..2d52300 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1049,8 +1049,8 @@ restart:
* Open state on this file cannot be recovered
* All we can do is revert to using the zero stateid.
*/
- memset(state->stateid.data, 0,
- sizeof(state->stateid.data));
+ memset(state->stateid.u.data, 0,
+ sizeof(state->stateid.u.data));
/* Mark the file as being 'closed' */
state->state = 0;
break;
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 718824c..44f789a 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1000,7 +1000,7 @@ static void encode_close(struct xdr_stream *xdr, const struct nfs_closeargs *arg
p = reserve_space(xdr, 8+NFS4_STATEID_SIZE);
*p++ = cpu_to_be32(OP_CLOSE);
*p++ = cpu_to_be32(arg->seqid->sequence->counter);
- xdr_encode_opaque_fixed(p, arg->stateid->data, NFS4_STATEID_SIZE);
+ xdr_encode_opaque_fixed(p, arg->stateid->u.data, NFS4_STATEID_SIZE);
hdr->nops++;
hdr->replen += decode_close_maxsz;
}
@@ -1174,7 +1174,8 @@ static void encode_lock(struct xdr_stream *xdr, const struct nfs_lock_args *args
if (args->new_lock_owner){
p = reserve_space(xdr, 4+NFS4_STATEID_SIZE+32);
*p++ = cpu_to_be32(args->open_seqid->sequence->counter);
- p = xdr_encode_opaque_fixed(p, args->open_stateid->data, NFS4_STATEID_SIZE);
+ p = xdr_encode_opaque_fixed(p, args->open_stateid->u.data,
+ NFS4_STATEID_SIZE);
*p++ = cpu_to_be32(args->lock_seqid->sequence->counter);
p = xdr_encode_hyper(p, args->lock_owner.clientid);
*p++ = cpu_to_be32(16);
@@ -1183,7 +1184,7 @@ static void encode_lock(struct xdr_stream *xdr, const struct nfs_lock_args *args
}
else {
p = reserve_space(xdr, NFS4_STATEID_SIZE+4);
- p = xdr_encode_opaque_fixed(p, args->lock_stateid->data, NFS4_STATEID_SIZE);
+ p = xdr_encode_opaque_fixed(p, args->lock_stateid->u.data, NFS4_STATEID_SIZE);
*p = cpu_to_be32(args->lock_seqid->sequence->counter);
}
hdr->nops++;
@@ -1215,7 +1216,8 @@ static void encode_locku(struct xdr_stream *xdr, const struct nfs_locku_args *ar
*p++ = cpu_to_be32(OP_LOCKU);
*p++ = cpu_to_be32(nfs4_lock_type(args->fl, 0));
*p++ = cpu_to_be32(args->seqid->sequence->counter);
- p = xdr_encode_opaque_fixed(p, args->stateid->data, NFS4_STATEID_SIZE);
+ p = xdr_encode_opaque_fixed(p, args->stateid->u.data,
+ NFS4_STATEID_SIZE);
p = xdr_encode_hyper(p, args->fl->fl_start);
xdr_encode_hyper(p, nfs4_lock_length(args->fl));
hdr->nops++;
@@ -1365,7 +1367,7 @@ static inline void encode_claim_delegate_cur(struct xdr_stream *xdr, const struc

p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
*p++ = cpu_to_be32(NFS4_OPEN_CLAIM_DELEGATE_CUR);
- xdr_encode_opaque_fixed(p, stateid->data, NFS4_STATEID_SIZE);
+ xdr_encode_opaque_fixed(p, stateid->u.data, NFS4_STATEID_SIZE);
encode_string(xdr, name->len, name->name);
}

@@ -1396,7 +1398,7 @@ static void encode_open_confirm(struct xdr_stream *xdr, const struct nfs_open_co

p = reserve_space(xdr, 4+NFS4_STATEID_SIZE+4);
*p++ = cpu_to_be32(OP_OPEN_CONFIRM);
- p = xdr_encode_opaque_fixed(p, arg->stateid->data, NFS4_STATEID_SIZE);
+ p = xdr_encode_opaque_fixed(p, arg->stateid->u.data, NFS4_STATEID_SIZE);
*p = cpu_to_be32(arg->seqid->sequence->counter);
hdr->nops++;
hdr->replen += decode_open_confirm_maxsz;
@@ -1408,7 +1410,7 @@ static void encode_open_downgrade(struct xdr_stream *xdr, const struct nfs_close

p = reserve_space(xdr, 4+NFS4_STATEID_SIZE+4);
*p++ = cpu_to_be32(OP_OPEN_DOWNGRADE);
- p = xdr_encode_opaque_fixed(p, arg->stateid->data, NFS4_STATEID_SIZE);
+ p = xdr_encode_opaque_fixed(p, arg->stateid->u.data, NFS4_STATEID_SIZE);
*p = cpu_to_be32(arg->seqid->sequence->counter);
encode_share_access(xdr, arg->fmode);
hdr->nops++;
@@ -1446,9 +1448,10 @@ static void encode_stateid(struct xdr_stream *xdr, const struct nfs_open_context
p = reserve_space(xdr, NFS4_STATEID_SIZE);
if (ctx->state != NULL) {
nfs4_copy_stateid(&stateid, ctx->state, ctx->lockowner);
- xdr_encode_opaque_fixed(p, stateid.data, NFS4_STATEID_SIZE);
+ xdr_encode_opaque_fixed(p, stateid.u.data,
+ NFS4_STATEID_SIZE);
} else
- xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
+ xdr_encode_opaque_fixed(p, zero_stateid.u.data, NFS4_STATEID_SIZE);
}

static void encode_read(struct xdr_stream *xdr, const struct nfs_readargs *args, struct compound_hdr *hdr)
@@ -1562,7 +1565,7 @@ encode_setacl(struct xdr_stream *xdr, struct nfs_setaclargs *arg, struct compoun

p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
*p++ = cpu_to_be32(OP_SETATTR);
- xdr_encode_opaque_fixed(p, zero_stateid.data, NFS4_STATEID_SIZE);
+ xdr_encode_opaque_fixed(p, zero_stateid.u.data, NFS4_STATEID_SIZE);
p = reserve_space(xdr, 2*4);
*p++ = cpu_to_be32(1);
*p = cpu_to_be32(FATTR4_WORD0_ACL);
@@ -1593,7 +1596,7 @@ static void encode_setattr(struct xdr_stream *xdr, const struct nfs_setattrargs

p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);
*p++ = cpu_to_be32(OP_SETATTR);
- xdr_encode_opaque_fixed(p, arg->stateid.data, NFS4_STATEID_SIZE);
+ xdr_encode_opaque_fixed(p, arg->stateid.u.data, NFS4_STATEID_SIZE);
hdr->nops++;
hdr->replen += decode_setattr_maxsz;
encode_attrs(xdr, arg->iap, server);
@@ -1656,7 +1659,7 @@ static void encode_delegreturn(struct xdr_stream *xdr, const nfs4_stateid *state
p = reserve_space(xdr, 4+NFS4_STATEID_SIZE);

*p++ = cpu_to_be32(OP_DELEGRETURN);
- xdr_encode_opaque_fixed(p, stateid->data, NFS4_STATEID_SIZE);
+ xdr_encode_opaque_fixed(p, stateid->u.data, NFS4_STATEID_SIZE);
hdr->nops++;
hdr->replen += decode_delegreturn_maxsz;
}
@@ -1866,7 +1869,8 @@ encode_layoutget(struct xdr_stream *xdr,
p = xdr_encode_hyper(p, args->lseg.offset);
p = xdr_encode_hyper(p, args->lseg.length);
p = xdr_encode_hyper(p, args->minlength);
- p = xdr_encode_opaque_fixed(p, &args->stateid.data, NFS4_STATEID_SIZE);
+ p = xdr_encode_opaque_fixed(p, &args->stateid.u.data,
+ NFS4_STATEID_SIZE);
*p = cpu_to_be32(args->maxcount);

dprintk("%s: 1st type:0x%x iomode:%d off:%lu len:%lu mc:%d\n",
@@ -1898,7 +1902,7 @@ encode_layoutcommit(struct xdr_stream *xdr,
p = xdr_encode_hyper(p, args->lseg.offset);
p = xdr_encode_hyper(p, args->lseg.length);
*p++ = cpu_to_be32(0); /* reclaim */
- p = xdr_encode_opaque_fixed(p, args->stateid.data, NFS4_STATEID_SIZE);
+ p = xdr_encode_opaque_fixed(p, args->stateid.u.data, NFS4_STATEID_SIZE);
*p++ = cpu_to_be32(1); /* newoffset = TRUE */
p = xdr_encode_hyper(p, args->lastbytewritten);
*p = cpu_to_be32(args->time_modify_changed != 0);
@@ -1945,7 +1949,7 @@ encode_layoutreturn(struct xdr_stream *xdr,
p = reserve_space(xdr, 16 + NFS4_STATEID_SIZE);
p = xdr_encode_hyper(p, args->lseg.offset);
p = xdr_encode_hyper(p, args->lseg.length);
- p = xdr_encode_opaque_fixed(p, &args->stateid.data,
+ p = xdr_encode_opaque_fixed(p, &args->stateid.u.data,
NFS4_STATEID_SIZE);

dprintk("%s: call %pF\n", __func__,
@@ -3989,7 +3993,7 @@ static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)

static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
{
- return decode_opaque_fixed(xdr, stateid->data, NFS4_STATEID_SIZE);
+ return decode_opaque_fixed(xdr, stateid->u.data, NFS4_STATEID_SIZE);
}

static int decode_close(struct xdr_stream *xdr, struct nfs_closeres *res)
@@ -5281,7 +5285,7 @@ static int decode_layoutget(struct xdr_stream *xdr, struct rpc_rqst *req,
if (unlikely(!p))
goto out_overflow;
res->return_on_close = be32_to_cpup(p++);
- p = xdr_decode_opaque_fixed(p, res->stateid.data, NFS4_STATEID_SIZE);
+ p = xdr_decode_opaque_fixed(p, res->stateid.u.data, NFS4_STATEID_SIZE);
layout_count = be32_to_cpup(p);
if (!layout_count) {
dprintk("%s: server responded with empty layout array\n",
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b2d203f..ecf6dc2 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -505,7 +505,7 @@ pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
const nfs4_stateid *stateid)
{
write_seqlock(&lo->seqlock);
- memcpy(lo->stateid.data, stateid->data, sizeof(lo->stateid.data));
+ memcpy(lo->stateid.u.data, stateid->u.data, sizeof(lo->stateid.u.data));
write_sequnlock(&lo->seqlock);
}

@@ -518,7 +518,8 @@ pnfs_get_layout_stateid(nfs4_stateid *dst, struct pnfs_layout_type *lo)

do {
seq = read_seqbegin(&lo->seqlock);
- memcpy(dst->data, lo->stateid.data, sizeof(lo->stateid.data));
+ memcpy(dst->u.data, lo->stateid.u.data,
+ sizeof(lo->stateid.u.data));
} while (read_seqretry(&lo->seqlock, seq));

dprintk("<-- %s\n", __func__);
@@ -533,8 +534,8 @@ pnfs_layout_from_open_stateid(nfs4_stateid *dst, struct nfs4_state *state)

do {
seq = read_seqbegin(&state->seqlock);
- memcpy(dst->data, state->stateid.data,
- sizeof(state->stateid.data));
+ memcpy(dst->u.data, state->stateid.u.data,
+ sizeof(state->stateid.u.data));
} while (read_seqretry(&state->seqlock, seq));

dprintk("<-- %s\n", __func__);
@@ -580,7 +581,7 @@ get_layout(struct inode *ino,
lgp->args.inode = ino;
lgp->lsegpp = lsegpp;

- if (!memcmp(lo->stateid.data, &zero_stateid, NFS4_STATEID_SIZE)) {
+ if (!memcmp(lo->stateid.u.data, &zero_stateid, NFS4_STATEID_SIZE)) {
struct nfs_open_context *oldctx = ctx;

if (!oldctx) {
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 7e32bd3..6a1fff7 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -40,7 +40,6 @@

#define NFSPROC4_CB_NULL 0
#define NFSPROC4_CB_COMPOUND 1
-#define NFS4_STATEID_SIZE 16

/* Index of predefined Linux callback client operations */

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 2bb8eeb..89020e5 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -17,7 +17,10 @@

#define NFS4_BITMAP_SIZE 2
#define NFS4_VERIFIER_SIZE 8
-#define NFS4_STATEID_SIZE 16
+#define NFS4_STATEID_SEQID_SIZE 4
+#define NFS4_STATEID_OTHER_SIZE 12
+#define NFS4_STATEID_SIZE (NFS4_STATEID_OTHER_SIZE + \
+ NFS4_STATEID_SEQID_SIZE)
#define NFS4_FHSIZE 128
#define NFS4_MAXPATHLEN PATH_MAX
#define NFS4_MAXNAMLEN NAME_MAX
@@ -174,7 +177,16 @@ struct nfs4_acl {
};

typedef struct { char data[NFS4_VERIFIER_SIZE]; } nfs4_verifier;
-typedef struct { char data[NFS4_STATEID_SIZE]; } nfs4_stateid;
+typedef struct {
+ union {
+ char data[NFS4_STATEID_SIZE];
+ struct {
+ __be32 seqid;
+ char other[NFS4_STATEID_OTHER_SIZE];
+ } stateid;
+ } u;
+} nfs4_stateid;
+

enum nfs_opnum4 {
OP_ACCESS = 3,
@@ -554,6 +566,8 @@ struct nfs4_sessionid {
unsigned char data[NFS4_MAX_SESSIONID_LEN];
};

+
+
/* Create Session Flags */
#define SESSION4_PERSIST 0x001
#define SESSION4_BACK_CHAN 0x002
--
1.6.2.5


2010-05-17 17:56:51

by Alexandros Batsakis

[permalink] [raw]
Subject: [PATCH 2/8] pnfs-submit: clean locking infrastructure

(also minor cleanup of pnfs_free_layout())

Signed-off-by: Alexandros Batsakis <[email protected]>

Conflicts:

fs/nfs/pnfs.c
---
fs/nfs/pnfs.c | 80 +++++++++++++++++++++++++++++++++++++-------------------
1 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b72c013..74cb998 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1,4 +1,4 @@
-/*
+ /*
* linux/fs/nfs/pnfs.c
*
* pNFS functions to call and manage layout drivers.
@@ -60,6 +60,8 @@ static int pnfs_initialized;
static void pnfs_free_layout(struct pnfs_layout_type *lo,
struct nfs4_pnfs_layout_segment *range);
static enum pnfs_try_status pnfs_commit(struct nfs_write_data *data, int sync);
+static inline void lock_current_layout(struct nfs_inode *nfsi);
+static inline void unlock_current_layout(struct nfs_inode *nfsi);

/* Locking:
*
@@ -153,16 +155,17 @@ pnfs_need_layoutcommit(struct nfs_inode *nfsi, struct nfs_open_context *ctx)
{
dprintk("%s: has_layout=%d layoutcommit_ctx=%p ctx=%p\n", __func__,
has_layout(nfsi), nfsi->layout.layoutcommit_ctx, ctx);
- spin_lock(&nfsi->lo_lock);
+
+ lock_current_layout(nfsi);
if (has_layout(nfsi) && !nfsi->layout.layoutcommit_ctx) {
nfsi->layout.layoutcommit_ctx = get_nfs_open_context(ctx);
nfsi->change_attr++;
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);
dprintk("%s: Set layoutcommit_ctx=%p\n", __func__,
nfsi->layout.layoutcommit_ctx);
return;
}
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);
}

/* Update last_write_offset for layoutcommit.
@@ -175,7 +178,7 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
{
loff_t end_pos;

- spin_lock(&nfsi->lo_lock);
+ lock_current_layout(nfsi);
if (offset < nfsi->layout.pnfs_write_begin_pos)
nfsi->layout.pnfs_write_begin_pos = offset;
end_pos = offset + extent - 1; /* I'm being inclusive */
@@ -187,7 +190,7 @@ pnfs_update_last_write(struct nfs_inode *nfsi, loff_t offset, size_t extent)
(unsigned long) offset ,
(unsigned long) nfsi->layout.pnfs_write_begin_pos,
(unsigned long) nfsi->layout.pnfs_write_end_pos);
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);
}

/* Unitialize a mountpoint in a layout driver */
@@ -296,12 +299,27 @@ pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
* pNFS client layout cache
*/
#if defined(CONFIG_SMP)
+#define BUG_ON_LOCKED_LO(lo) \
+ BUG_ON(spin_is_locked(&PNFS_NFS_INODE(lo)->lo_lock))
#define BUG_ON_UNLOCKED_LO(lo) \
BUG_ON(!spin_is_locked(&PNFS_NFS_INODE(lo)->lo_lock))
#else /* CONFIG_SMP */
+#define BUG_ON_LOCKED_LO(lo) do {} while (0)
#define BUG_ON_UNLOCKED_LO(lo) do {} while (0)
#endif /* CONFIG_SMP */

+static inline void lock_current_layout(struct nfs_inode *nfsi)
+{
+ BUG_ON_LOCKED_LO((&nfsi->layout));
+ spin_lock(&nfsi->lo_lock);
+}
+
+static inline void unlock_current_layout(struct nfs_inode *nfsi)
+{
+ BUG_ON_UNLOCKED_LO((&nfsi->layout));
+ spin_unlock(&nfsi->lo_lock);
+}
+
/*
* get and lock nfsi->layout
*/
@@ -310,10 +328,10 @@ get_lock_current_layout(struct nfs_inode *nfsi)
{
struct pnfs_layout_type *lo;

+ lock_current_layout(nfsi);
lo = &nfsi->layout;
- spin_lock(&nfsi->lo_lock);
if (!lo->ld_data) {
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);
return NULL;
}

@@ -333,7 +351,12 @@ put_unlock_current_layout(struct pnfs_layout_type *lo)
BUG_ON_UNLOCKED_LO(lo);
BUG_ON(lo->refcount <= 0);

- if (--lo->refcount == 0 && list_empty(&lo->segs)) {
+ lo->refcount--;
+
+ if (lo->refcount > 0)
+ goto out;
+
+ if (list_empty(&lo->segs)) {
struct layoutdriver_io_operations *io_ops =
PNFS_LD_IO_OPS(lo);

@@ -347,7 +370,8 @@ put_unlock_current_layout(struct pnfs_layout_type *lo)
list_del_init(&nfsi->lo_inodes);
spin_unlock(&clp->cl_lock);
}
- spin_unlock(&nfsi->lo_lock);
+out:
+ unlock_current_layout(nfsi);
}

void
@@ -356,7 +380,7 @@ pnfs_layout_release(struct pnfs_layout_type *lo, atomic_t *count,
{
struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);

- spin_lock(&nfsi->lo_lock);
+ lock_current_layout(nfsi);
if (range)
pnfs_free_layout(lo, range);
atomic_dec(count);
@@ -375,6 +399,8 @@ pnfs_destroy_layout(struct nfs_inode *nfsi)
};

lo = get_lock_current_layout(nfsi);
+ if (!lo)
+ return;
pnfs_free_layout(lo, &range);
put_unlock_current_layout(lo);
}
@@ -652,7 +678,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
struct pnfs_layout_segment *lseg;
bool ret = false;

- spin_lock(&nfsi->lo_lock);
+ lock_current_layout(nfsi);
list_for_each_entry (lseg, &nfsi->layout.segs, fi_list) {
if (!should_free_lseg(lseg, range))
continue;
@@ -666,7 +692,7 @@ pnfs_return_layout_barrier(struct nfs_inode *nfsi,
}
if (atomic_read(&nfsi->layout.lgetcount))
ret = true;
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);

dprintk("%s:Return %d\n", __func__, ret);
return ret;
@@ -756,7 +782,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
/* unlock w/o put rebalanced by eventual call to
* pnfs_layout_release
*/
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);

if (pnfs_return_layout_barrier(nfsi, &arg)) {
dprintk("%s: waiting\n", __func__);
@@ -887,7 +913,7 @@ static int pnfs_wait_schedule(void *word)
*
* Note: If successful, nfsi->lo_lock is taken and the caller
* must put and unlock current_layout by using put_unlock_current_layout()
- * when the returned layout is released.
+ * directly or pnfs_layout_release() when the returned layout is released.
*/
static struct pnfs_layout_type *
get_lock_alloc_layout(struct inode *ino)
@@ -922,7 +948,7 @@ get_lock_alloc_layout(struct inode *ino)
struct nfs_client *clp = NFS_SERVER(ino)->nfs_client;

/* must grab the layout lock before the client lock */
- spin_lock(&nfsi->lo_lock);
+ lock_current_layout(nfsi);

spin_lock(&clp->cl_lock);
if (list_empty(&nfsi->lo_inodes))
@@ -1038,10 +1064,10 @@ void drain_layoutreturns(struct pnfs_layout_type *lo)
while (atomic_read(&lo->lretcount)) {
struct nfs_inode *nfsi = PNFS_NFS_INODE(lo);

- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);
dprintk("%s: waiting\n", __func__);
wait_event(nfsi->lo_waitq, (atomic_read(&lo->lretcount) == 0));
- spin_lock(&nfsi->lo_lock);
+ lock_current_layout(nfsi);
}
}

@@ -1080,13 +1106,13 @@ pnfs_update_layout(struct inode *ino,
/* Check to see if the layout for the given range already exists */
lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
if (lseg && !lseg->valid) {
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);
if (take_ref)
put_lseg(lseg);
for (;;) {
prepare_to_wait(&nfsi->lo_waitq, &__wait,
TASK_KILLABLE);
- spin_lock(&nfsi->lo_lock);
+ lock_current_layout(nfsi);
lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
if (!lseg || lseg->valid)
break;
@@ -1099,7 +1125,7 @@ pnfs_update_layout(struct inode *ino,
result = -ERESTARTSYS;
break;
}
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);
schedule();
}
finish_wait(&nfsi->lo_waitq, &__wait);
@@ -1136,7 +1162,7 @@ pnfs_update_layout(struct inode *ino,
/* Matching dec is done in .rpc_release (on non-error paths) */
atomic_inc(&lo->lgetcount);
/* Lose lock, but not reference, match this with pnfs_layout_release */
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);

result = get_layout(ino, ctx, &arg, lsegpp, lo);
out:
@@ -1286,7 +1312,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
*lgp->lsegpp = lseg;
}

- spin_lock(&nfsi->lo_lock);
+ lock_current_layout(nfsi);
pnfs_insert_layout(lo, lseg);

if (res->return_on_close) {
@@ -1297,7 +1323,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)

/* Done processing layoutget. Set the layout stateid */
pnfs_set_layout_stateid(lo, &res->stateid);
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);
out:
return status;
}
@@ -2212,7 +2238,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
if (!data)
return -ENOMEM;

- spin_lock(&nfsi->lo_lock);
+ lock_current_layout(nfsi);
if (!nfsi->layout.layoutcommit_ctx)
goto out_unlock;

@@ -2233,7 +2259,7 @@ pnfs_layoutcommit_inode(struct inode *inode, int sync)
nfsi->layout.layoutcommit_ctx = NULL;

/* release lock on pnfs layoutcommit attrs */
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);

data->is_sync = sync;
status = pnfs4_proc_layoutcommit(data);
@@ -2242,7 +2268,7 @@ out:
return status;
out_unlock:
pnfs_layoutcommit_free(data);
- spin_unlock(&nfsi->lo_lock);
+ unlock_current_layout(nfsi);
goto out;
}

--
1.6.2.5


2010-05-17 17:56:54

by Alexandros Batsakis

[permalink] [raw]
Subject: [PATCH 7/8] pnfs-submit: forgetful client model

Forgetful client model:

If we receive a CB_LAYOUTRECALL
- we spawn a thread to handle the recall
(xxx: now only one recall can be active at a time, else NFS4ERR_DELAY)
- we check the stateid seqid
if it does not match we return NFS4ERR_DELAY
- we check for pending I/O
if there is we return NFS4ERR_DELAY
Else we return NO_MATCHING_LAYOUT.
Note that for whole file layouts there is no need to serialize LAYOUTGETs/LAYOUTRETURNs
For bulk layouts, if there is a layout active, we return NFS4_OK and we start
cleaning the layouts asynchronously. At the end we send a bulk LAYOUTRETURN.
Note that there is no need to prevent any new LAYOUTGETs explicitly as the server should
reject them.

Signed-off-by: Alexandros Batsakis <[email protected]>
---
fs/nfs/callback_proc.c | 131 ++++++++++++++++++++++++++++++++++--------------
fs/nfs/inode.c | 2 +-
fs/nfs/nfs4_fs.h | 1 +
fs/nfs/nfs4proc.c | 4 +-
fs/nfs/nfs4state.c | 2 +-
fs/nfs/pnfs.c | 78 +++++++++++++---------------
fs/nfs/pnfs.h | 8 ++-
7 files changed, 139 insertions(+), 87 deletions(-)

diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c
index 24e2571..419fee7 100644
--- a/fs/nfs/callback_proc.c
+++ b/fs/nfs/callback_proc.c
@@ -129,6 +129,38 @@ int nfs4_validate_delegation_stateid(struct nfs_delegation *delegation, const nf

#if defined(CONFIG_NFS_V4_1)

+static int
+pnfs_is_next_layout_stateid(const struct pnfs_layout_type *lo,
+ const nfs4_stateid stateid)
+{
+ int seq;
+ int res;
+ u32 oldseqid, newseqid;
+
+ do {
+ seq = read_seqbegin(&lo->seqlock);
+ oldseqid = ntohl(lo->stateid.u.stateid.seqid);
+ newseqid = ntohl(stateid.u.stateid.seqid);
+ res = !memcmp(lo->stateid.u.stateid.other,
+ stateid.u.stateid.other,
+ NFS4_STATEID_OTHER_SIZE);
+ if (res) { /* comparing layout stateids */
+ if (oldseqid == ~0)
+ res = (newseqid == 1);
+ else
+ res = (newseqid == oldseqid + 1);
+ } else { /* open stateid */
+ res = !memcmp(lo->stateid.u.data,
+ &zero_stateid,
+ NFS4_STATEID_SIZE);
+ if (res)
+ res = (newseqid == 1);
+ }
+ } while (read_seqretry(&lo->seqlock, seq));
+
+ return res;
+}
+
/*
* Retrieve an inode based on layout recall parameters
*
@@ -190,10 +222,11 @@ static int pnfs_recall_layout(void *data)
{
struct inode *inode, *ino;
struct nfs_client *clp;
+ struct nfs4_pnfs_layoutreturn *lrp;
struct cb_pnfs_layoutrecallargs rl;
struct recall_layout_threadargs *args =
(struct recall_layout_threadargs *)data;
- int status;
+ int status = 0;

daemonize("nfsv4-layoutreturn");

@@ -204,46 +237,57 @@ static int pnfs_recall_layout(void *data)
clp = args->clp;
inode = args->inode;
rl = *args->rl;
- args->result = 0;
- complete(&args->started);
- args = NULL;
- /* Note: args must not be used after this point!!! */
-
-/* FIXME: need barrier here:
- pause I/O to data servers
- pause layoutgets
- drain all outstanding writes to storage devices
- wait for any outstanding layoutreturns and layoutgets mentioned in
- cb_sequence.
- then return layouts, resume after layoutreturns complete
- */

/* support whole file layouts only */
rl.cbl_seg.offset = 0;
rl.cbl_seg.length = NFS4_MAX_UINT64;

if (rl.cbl_recall_type == RETURN_FILE) {
- status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
- RETURN_FILE);
+ if (pnfs_is_next_layout_stateid(&NFS_I(inode)->layout, rl.cbl_stateid))
+ status = pnfs_return_layout(inode, &rl.cbl_seg, &rl.cbl_stateid,
+ RETURN_FILE, 0);
+ else
+ status = htonl(NFS4ERR_DELAY);
+ /* forgetful client */
if (status)
- dprintk("%s RETURN_FILE error: %d\n", __func__, status);
+ dprintk("%s RETURN_FILE : %d\n", __func__, status);
+ else
+ status = htonl(NFS4ERR_NOMATCHING_LAYOUT);
+ args->result = status;
+ complete(&args->started);
goto out;
}

- /* FIXME: This loop is inefficient, running in O(|s_inodes|^2) */
+ status = htonl(NFS4_OK);
+ args->result = status;
+ complete(&args->started);
+ args = NULL;
+
while ((ino = nfs_layoutrecall_find_inode(clp, &rl)) != NULL) {
/* XXX need to check status on pnfs_return_layout */
- pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE);
+ pnfs_return_layout(ino, &rl.cbl_seg, NULL, RETURN_FILE, 0);
iput(ino);
}

- /* send final layoutreturn */
- status = pnfs_return_layout(inode, &rl.cbl_seg, NULL, rl.cbl_recall_type);
- if (status)
- printk(KERN_INFO "%s: ignoring pnfs_return_layout status=%d\n",
- __func__, status);
+ lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
+ if (!lrp) {
+ dprintk("%s: allocation failed. Cannot send last LAYOUTRETURN\n",
+ __func__);
+ goto out;
+ }
+
+ /* send final layoutreturn (bulk) */
+ lrp->args.reclaim = 0;
+ lrp->args.layout_type = rl.cbl_layout_type;
+ lrp->args.return_type = rl.cbl_recall_type;
+ lrp->args.lseg = rl.cbl_seg;
+ lrp->args.inode = inode;
+ lrp->lo = NULL;
+
+ pnfs4_proc_layoutreturn(lrp);
out:
- iput(inode);
+ clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
+ nfs_put_client(clp);
module_put_and_exit(0);
dprintk("%s: exit status %d\n", __func__, 0);
return 0;
@@ -261,15 +305,18 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
.rl = rl,
};
struct task_struct *t;
- int status;
-
- /* should have returned NFS4ERR_NOMATCHING_LAYOUT... */
- BUG_ON(inode == NULL);
+ int status = htonl(NFS4ERR_DELAY);

dprintk("%s: -->\n", __func__);

+ /* XXX: do not allow two concurrent layout recalls */
+ if (test_and_set_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state))
+ return status;
+
init_completion(&data.started);
__module_get(THIS_MODULE);
+ if (!atomic_inc_not_zero(&clp->cl_count))
+ goto out_put_no_client;

t = kthread_run(pnfs_recall_layout, &data, "%s", "pnfs_recall_layout");
if (IS_ERR(t)) {
@@ -283,6 +330,9 @@ static int pnfs_async_return_layout(struct nfs_client *clp, struct inode *inode,
wait_for_completion(&data.started);
return data.result;
out_module_put:
+ nfs_put_client(clp);
+out_put_no_client:
+ clear_bit(NFS4CLNT_LAYOUT_RECALL, &clp->cl_state);
module_put(THIS_MODULE);
return status;
}
@@ -309,19 +359,24 @@ __be32 pnfs_cb_layoutrecall(struct cb_pnfs_layoutrecallargs *args,
do {
struct nfs_client *prev = clp;
num_client++;
- inode = nfs_layoutrecall_find_inode(clp, args);
- if (inode != NULL) {
- if (PNFS_LD(&NFS_I(inode)->layout)->id ==
- args->cbl_layout_type) {
- /* Set up a helper thread to actually
- * return the delegation */
+ /* the callback must come from the MDS personality */
+ if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_MDS))
+ goto loop;
+ if (args->cbl_recall_type == RETURN_FILE) {
+ inode = nfs_layoutrecall_find_inode(clp, args);
+ if (inode != NULL) {
res = pnfs_async_return_layout(clp, inode, args);
- if (res != 0)
- res = htonl(NFS4ERR_RESOURCE);
- break;
+ iput(inode);
}
+ } else {
+ /* we need the inode to get the nfs_server struct */
+ inode = nfs_layoutrecall_find_inode(clp, args);
+ if (!inode)
+ goto loop;
+ res = pnfs_async_return_layout(clp, inode, args);
iput(inode);
}
+loop:
clp = nfs_find_client_next(prev);
nfs_put_client(prev);
} while (clp != NULL);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index adf3280..584c9b8 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1321,7 +1321,7 @@ void nfs4_clear_inode(struct inode *inode)
/* First call standard NFS clear_inode() code */
nfs_clear_inode(inode);
#ifdef CONFIG_NFS_V4_1
- pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
+ pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
#endif /* CONFIG_NFS_V4_1 */
}
#endif
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 40ebe1d..90c57b3 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -47,6 +47,7 @@ enum nfs4_client_state {
NFS4CLNT_SESSION_RESET,
NFS4CLNT_SESSION_DRAINING,
NFS4CLNT_RECALL_SLOT,
+ NFS4CLNT_LAYOUT_RECALL,
};

/*
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6739d15..14e90e1 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1085,7 +1085,7 @@ static void pnfs4_layout_reclaim(struct nfs4_state *state)
/* FIXME: send gratuitous layout commits and return with the reclaim
* flag during grace period
*/
- pnfs_return_layout(state->inode, NULL, &state->open_stateid, RETURN_FILE);
+ pnfs_return_layout(state->inode, NULL, NULL, RETURN_FILE, 1);
pnfs_set_layout_stateid(&NFS_I(state->inode)->layout, &zero_stateid);
#endif /* CONFIG_NFS_V4_1 */
}
@@ -2382,7 +2382,7 @@ pnfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,

if (pnfs_enabled_sb(server) && has_layout(nfsi) &&
pnfs_ld_layoutret_on_setattr(server->pnfs_curr_ld))
- pnfs_return_layout(inode, NULL, NULL, RETURN_FILE);
+ pnfs_return_layout(inode, NULL, NULL, RETURN_FILE, 1);
return nfs4_proc_setattr(dentry, fattr, sattr);
}
#endif /* CONFIG_NFS_V4_1 */
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 2d52300..cd2ae83 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -598,7 +598,7 @@ static void __nfs4_close(struct path *path, struct nfs4_state *state, fmode_t fm
range.offset = 0;
range.length = NFS4_MAX_UINT64;
pnfs_return_layout(state->inode, &range, NULL,
- RETURN_FILE);
+ RETURN_FILE, 1);
}
#endif /* CONFIG_NFS_V4_1 */
nfs4_do_close(path, state, wait);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 3b4dea0..2c96723 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -610,7 +610,7 @@ get_layout(struct inode *ino,
*/
static inline int
should_free_lseg(struct pnfs_layout_segment *lseg,
- struct nfs4_pnfs_layout_segment *range)
+ struct nfs4_pnfs_layout_segment *range)
{
return (range->iomode == IOMODE_ANY ||
lseg->range.iomode == range->iomode) &&
@@ -703,6 +703,8 @@ return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,

dprintk("--> %s\n", __func__);

+ BUG_ON(type != RETURN_FILE);
+
lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
if (lrp == NULL) {
if (lo && (type == RETURN_FILE))
@@ -729,7 +731,8 @@ out:
int
_pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
const nfs4_stateid *stateid, /* optional */
- enum pnfs_layoutreturn_type type)
+ enum pnfs_layoutreturn_type type,
+ int sendreturn)
{
struct pnfs_layout_type *lo = NULL;
struct nfs_inode *nfsi = NFS_I(ino);
@@ -739,14 +742,19 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
dprintk("--> %s type %d\n", __func__, type);

if (range)
- arg = *range;
- else {
+ arg.iomode = range->iomode;
+ else
arg.iomode = IOMODE_ANY;
- arg.offset = 0;
- arg.length = ~0;
- }
+ arg.offset = 0;
+ arg.length = ~0;
+
if (type == RETURN_FILE) {
if (nfsi->layout.layoutcommit_ctx) {
+ if (stateid && !sendreturn) { /* callback */
+ dprintk("%s: layoutcommit pending\n", __func__);
+ status = htonl(NFS4ERR_DELAY);
+ goto out;
+ }
status = pnfs_layoutcommit_inode(ino, 1);
if (status) {
dprintk("%s: layoutcommit failed, status=%d. "
@@ -763,11 +771,7 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
}
if (!lo) {
dprintk("%s: no layout segments to return\n", __func__);
- /* must send the LAYOUTRETURN in response to recall */
- if (stateid)
- goto send_return;
- else
- goto out;
+ goto out;
}

/* unlock w/o put rebalanced by eventual call to
@@ -776,13 +780,22 @@ _pnfs_return_layout(struct inode *ino, struct nfs4_pnfs_layout_segment *range,
unlock_current_layout(nfsi);

if (pnfs_return_layout_barrier(nfsi, &arg)) {
+ if (stateid) { /* callback */
+ status = htonl(NFS4ERR_DELAY);
+ lock_current_layout(nfsi);
+ put_unlock_current_layout(lo);
+ goto out;
+ }
dprintk("%s: waiting\n", __func__);
wait_event(nfsi->lo_waitq,
- !pnfs_return_layout_barrier(nfsi, &arg));
+ !pnfs_return_layout_barrier(nfsi, &arg));
}
+
+ if (sendreturn)
+ status = return_layout(ino, &arg, stateid, type, lo);
+ else
+ pnfs_layout_release(lo, &arg);
}
-send_return:
- status = return_layout(ino, &arg, stateid, type, lo);
out:
dprintk("<-- %s status: %d\n", __func__, status);
return status;
@@ -1080,31 +1093,12 @@ pnfs_update_layout(struct inode *ino,
/* Check to see if the layout for the given range already exists */
lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
if (lseg && !lseg->valid) {
- unlock_current_layout(nfsi);
if (take_ref)
put_lseg(lseg);
- for (;;) {
- prepare_to_wait(&nfsi->lo_waitq, &__wait,
- TASK_KILLABLE);
- lock_current_layout(nfsi);
- lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
- if (!lseg || lseg->valid)
- break;
- dprintk("%s: invalid lseg %p ref %d\n", __func__,
- lseg, atomic_read(&lseg->kref.refcount)-1);
- if (take_ref)
- put_lseg(lseg);
- if (signal_pending(current)) {
- lseg = NULL;
- result = -ERESTARTSYS;
- break;
- }
- unlock_current_layout(nfsi);
- schedule();
- }
- finish_wait(&nfsi->lo_waitq, &__wait);
- if (result)
- goto out_put;
+
+ /* someone is cleaning the layout */
+ result = -EAGAIN;
+ goto out_put;
}

if (lseg) {
@@ -1434,7 +1428,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,

if (count > 0 && !below_threshold(inode, count, 0)) {
status = pnfs_update_layout(inode, ctx, count,
- loff, IOMODE_READ, NULL);
+ loff, IOMODE_READ, NULL);
dprintk("%s *rsize %Zd virt update returned %d\n",
__func__, *rsize, status);
if (status != 0)
@@ -1673,7 +1667,7 @@ pnfs_writeback_done(struct nfs_write_data *data)
.length = data->args.count,
};
dprintk("%s: retrying\n", __func__);
- _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
+ _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);
pnfs_initiate_write(data, NFS_CLIENT(data->inode),
pdata->call_ops, pdata->how);
}
@@ -1804,7 +1798,7 @@ pnfs_read_done(struct nfs_read_data *data)
.length = data->args.count,
};
dprintk("%s: retrying\n", __func__);
- _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
+ _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);
pnfs_initiate_read(data, NFS_CLIENT(data->inode),
pdata->call_ops);
}
@@ -2030,7 +2024,7 @@ pnfs_commit_done(struct nfs_write_data *data)
.length = data->args.count,
};
dprintk("%s: retrying\n", __func__);
- _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE);
+ _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE, 1);
pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
pdata->call_ops, pdata->how);
}
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 9e43973..794a2d3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -40,7 +40,8 @@ int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,

int _pnfs_return_layout(struct inode *, struct nfs4_pnfs_layout_segment *,
const nfs4_stateid *stateid, /* optional */
- enum pnfs_layoutreturn_type);
+ enum pnfs_layoutreturn_type,
+ int sendreturn);
void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *mntfh, u32 id);
void unmount_pnfs_layoutdriver(struct nfs_server *);
int pnfs_use_read(struct inode *inode, ssize_t count);
@@ -236,14 +237,15 @@ static inline void pnfs_modify_new_request(struct nfs_page *req,
static inline int pnfs_return_layout(struct inode *ino,
struct nfs4_pnfs_layout_segment *lseg,
const nfs4_stateid *stateid, /* optional */
- enum pnfs_layoutreturn_type type)
+ enum pnfs_layoutreturn_type type,
+ int sendreturn)
{
struct nfs_inode *nfsi = NFS_I(ino);
struct nfs_server *nfss = NFS_SERVER(ino);

if (pnfs_enabled_sb(nfss) &&
(type != RETURN_FILE || has_layout(nfsi)))
- return _pnfs_return_layout(ino, lseg, stateid, type);
+ return _pnfs_return_layout(ino, lseg, stateid, type, sendreturn);

return 0;
}
--
1.6.2.5


2010-05-18 14:16:37

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 7/8] pnfs-submit: forgetful client model

On Mon, May 17, 2010 at 05:06:09PM -0700, Alexandros Batsakis wrote:
> Sure, I ll add the stuff below to the wiki and to the Documention folder.

Thanks! Looks good.

> a) if stateid.seqid is not the expected one return NFS4ERR_DELAY.
> b) we set the specific segments as not valid (no more I/O to them)
> c) if there is pending I/O we return NFS4ERR_DELAY.

Up to you whether it's worth documenting, but: I remember there being
some disagreement about this, but don't remember the resolution.

> d) we return NFS4ERR_NOMATCHING_LAYOUT
>
> Here you can find a simple flowchart that highlights the description above.
> https://docs.google.com/leaf?id=0B9egH40ld0WsMjg0ZmYzMTQtNDY5NC00ZjRjLTkwNzktNjFhZDFhYjJjYmFj&hl=en

That URL doesn't work for me.

--b.

2010-05-18 18:22:13

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 7/8] pnfs-submit: forgetful client model

On Tue, May 18, 2010 at 10:33:32AM -0700, Alexandros Batsakis wrote:
>
> On May 18, 2010, at 7:16 AM, J. Bruce Fields wrote:
>
> > On Mon, May 17, 2010 at 05:06:09PM -0700, Alexandros Batsakis wrote:
> >> Sure, I ll add the stuff below to the wiki and to the Documention folder.
> >
> > Thanks! Looks good.
> >
> >> a) if stateid.seqid is not the expected one return NFS4ERR_DELAY.
> >> b) we set the specific segments as not valid (no more I/O to them)
> >> c) if there is pending I/O we return NFS4ERR_DELAY.
> >
> > Up to you whether it's worth documenting, but: I remember there being
> > some disagreement about this, but don't remember the resolution.
> >
> >> d) we return NFS4ERR_NOMATCHING_LAYOUT
> >>
> >> Here you can find a simple flowchart that highlights the description above.
> >> https://docs.google.com/leaf?id=0B9egH40ld0WsMjg0ZmYzMTQtNDY5NC00ZjRjLTkwNzktNjFhZDFhYjJjYmFj&hl=en
> >
> > That URL doesn't work for me.
> >
>
> Apologies
> This should work (it's a .pdf file)
> https://docs.google.com/fileview?id=0B9egH40ld0WsNWQzM2RhMTQtM2E5ZS00YTk3LThmNDItMDU0YWMzZDczNDli&hl=en

Yep--thanks.

--b.