by Benny Halevy

[permalink] [raw]

Subject: Re: [PATCH 1/9] Revert "pnfs-submit: wave2: remove forgotten layoutreturn struct definitions"

On 2010-12-15 20:32, Trond Myklebust wrote:
> On Wed, 2010-12-15 at 20:30 +0200, Benny Halevy wrote:
>> This reverts commit 19e1e5ae1ec0a3f5d997a1a5d924d482e147bea2.
>> ---
>> include/linux/nfs4.h | 1 +
>> include/linux/nfs_xdr.h | 23 +++++++++++++++++++++++
>> 2 files changed, 24 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
>> index 8ca7700..55511e8 100644
>> --- a/include/linux/nfs4.h
>> +++ b/include/linux/nfs4.h
>> @@ -557,6 +557,7 @@ enum {
>> NFSPROC4_CLNT_RECLAIM_COMPLETE,
>> NFSPROC4_CLNT_LAYOUTGET,
>> NFSPROC4_CLNT_LAYOUTCOMMIT,
>> + NFSPROC4_CLNT_LAYOUTRETURN,
>> NFSPROC4_CLNT_GETDEVICEINFO,
>> NFSPROC4_CLNT_PNFS_WRITE,
>> NFSPROC4_CLNT_PNFS_COMMIT,
>> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
>> index 9d847ac..a651574 100644
>> --- a/include/linux/nfs_xdr.h
>> +++ b/include/linux/nfs_xdr.h
>> @@ -258,6 +258,29 @@ struct nfs4_layoutcommit_data {
>> int status;
>> };
>>
>> +struct nfs4_layoutreturn_args {
>> + __u32 reclaim;
>> + __u32 layout_type;
>> + __u32 return_type;
>> + struct pnfs_layout_range range;
>> + struct inode *inode;
>> + struct nfs4_sequence_args seq_args;
>> +};
>> +
>> +struct nfs4_layoutreturn_res {
>> + struct nfs4_sequence_res seq_res;
>> + u32 lrs_present;
>> + nfs4_stateid stateid;
>> +};
>> +
>> +struct nfs4_layoutreturn {
>> + struct nfs4_layoutreturn_args args;
>> + struct nfs4_layoutreturn_res res;
>> + struct rpc_cred *cred;
>> + struct nfs_client *clp;
>> + int rpc_status;
>> +};
>> +
>> struct nfs4_getdeviceinfo_args {
>> struct pnfs_device *pdev;
>> struct nfs4_sequence_args seq_args;
>
> Why? We don't need or even want layoutreturn. It adds too much
> serialisation crap.

Define "we" :)

First, the object layout driver relies on layout return for returning I/O error
information. On the common, successful path, with return_on_close (that Panasas
uses but others may not) I agree that CLOSE with the implicit layoutreturn
semantics we discussed should do a good job too (accompanied with a respective
LAYOUTCOMMIT if needed).

Then, when there's a large number of outstanding layout segments (again
prob. non-files layout presuming server implementations are going to utilize
whole-file layouts) proactive layoutreturn comes handy in capping the
state both the client and server keep - reducing time wasted on walking long
lists of items.

For CB_LAYOUTRECALL response the heart of the debate is around synchronizing
with layouts in-use and in-flight layoutgets. There, having the server poll
the client, who's retuning NFS4ERR_DELAY should essentially work but may be
inefficient and unreliable in use cases where contention is likely enough.
Eventually, when CB_LAYOUTRECALL is clear to go sending the LAYOUTRETURN
or replying with CB_NOMATCHING_LAYOUT (assuming no I/O error to report
for pnfs-obj) should be equivalent [note: need errata to clarify the
resulting stateid after NOMATCHING_LAYOUT].
Is this the serialization "crap" you're talking about?
What makes checking the conditions for returning NFS4ERR_DELAY to
CB_LAYOUTRECALL so different from implementing a barrier and doing the
returns asynchronously with the CB_LAYOUTRECALL?

Benny

>
>