Date: Wed, 7 Jul 2010 18:44:54 -0400
Message-ID: <C2D311A6F086424F99E385949ECFEBCB030F2A80@CORPUSMX80B.corp.emc.com>
In-Reply-To: <BF3BB6D12298F54B89C8DCC1E4073D8001ADDDA5@CORPUSMX50A.corp.emc.com>
References: <A062FCC8662DA848949F7C3046B9BEAE01F3A6ED@us-email.terastack.bluearc.com><A062FCC8662DA848949F7C3046B9BEAE01F3A6EE@us-email.terastack.bluearc.com><6206CE0E-0A32-46A7-B648-3FCC12ED1961@netapp.com><B9A709F368FAAF4DB4B33870F72A141DFB88F3@CORPUSMX30A.corp.emc.com><0E2B1FE3-3B42-4BF2-BECE-A611DADF3983@netapp.com><B9A709F368FAAF4DB4B33870F72A141D01017F94@CORPUSMX30A.corp.emc.com><1278448834.16176.5.camel@heimdal.trondhjem.org><4C346D80.8010405@panasas.com><1278507985.2804.30.camel@heimdal.trondhjem.org><1278508696.2804.35.camel@heimdal.trondhjem.org><4C348679.6010507@panasas.com><1278511416.2804.52.camel@heimdal.trondhjem.org><B9A709F368FAAF4DB4B33870F72A141D0106B6B0@CORPUSMX30A.corp.emc.com><1278536484.12889.4.camel@heimdal.trondhjem.org>
	<BF3BB6D12298F54B89C8DCC1E4073D8001ADDDA5@CORPUSMX50A.corp.emc.com>
From: <david.black@emc.com>
To: <Noveck_David@emc.com>, <Trond.Myklebust@netapp.com>,
        <Daniel.Muntz@emc.com>
Cc: linux-nfs@vger.kernel.org, garth@panasas.com, welch@panasas.com,
        nfsv4@ietf.org, andros@netapp.com, bhalevy@panasas.com
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
Content-Type: text/plain; charset="us-ascii"
Sender: nfsv4-bounces@ietf.org
Errors-To: nfsv4-bounces@ietf.org
MIME-Version: 1.0

Let me try this ...

A correct client will always send LAYOUTCOMMIT.
Assume that the client is correct.
Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.

Important implication: A missing LAYOUTCOMMIT is an error/failure case.
The server's handling of it just has to work; it doesn't have to be fast.

Suggestion: If a client dies while holding writeable layouts that permit
write-in-place, and the client doesn't reappear or doesn't reclaim those
layouts, then the server should assume that the files involved were
written before the client died, and set the file attributes accordingly
as part of internally reclaiming the layout that the client has
abandoned.

Caveat: It may take a while for the server to determine that the client
has abandoned a layout.

This approach can result in false positives (file appears to be modified
when it wasn't) but won't yield false negatives (file does not appear to
be modified even though it was modified).
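
As a rough illustration of the suggestion above, here is a minimal Python
sketch of the server-side failure path. Everything in it (MetadataServer,
Layout, FileState, reclaim_abandoned, and the field names) is hypothetical
and not taken from RFC 5661 or any real server; it only assumes that the
server tracks outstanding layouts per client and can bump a file's change
and time_modify attributes.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FileState:
    change: int = 0            # stand-in for the NFSv4 change attribute
    time_modify: float = 0.0   # stand-in for time_modify


@dataclass
class Layout:
    client_id: str
    file_id: str
    writeable: bool
    write_in_place: bool


@dataclass
class MetadataServer:
    files: dict = field(default_factory=dict)     # file_id -> FileState
    layouts: list = field(default_factory=list)   # outstanding layouts

    def reclaim_abandoned(self, client_id, now=None):
        """Failure path: the client is gone and never reclaimed its layouts.

        Conservatively assume every file covered by a writeable,
        write-in-place layout was written, and update its attributes.
        Returns the sorted list of file ids that were marked modified.
        """
        stamp = time.time() if now is None else now
        kept, touched = [], set()
        for layout in self.layouts:
            if layout.client_id != client_id:
                kept.append(layout)          # other clients keep their layouts
                continue
            if layout.writeable and layout.write_in_place:
                touched.add(layout.file_id)  # mark each file only once
        for file_id in touched:
            state = self.files[file_id]
            state.change += 1                # may be a false positive
            state.time_modify = stamp
        self.layouts = kept                  # the abandoned layouts are dropped
        return sorted(touched)
```

Calling reclaim_abandoned() once the server has decided the client is gone
marks every possibly-written file as modified, so another client that
revalidates its cache sees a changed change attribute even when no
LAYOUTCOMMIT ever arrived.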

Thanks,
--David

> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf Of Noveck_David@emc.com
> Sent: Wednesday, July 07, 2010 6:04 PM
> To: Trond.Myklebust@netapp.com; Muntz, Daniel
> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com;
> nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> 
> > > Yes. I would agree that the client cannot rely on the updates being made
> > > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > > compliant server MUST also have a valid strategy for dealing with the
> > > case where the client doesn't send it.
> 
> So you are saying the updates "MUST be made visible" through the
> server's valid strategy.  Is that right?
> 
> And that the client cannot rely on that.  Why not, if the server must
> have a valid strategy?
> 
> Is this just prudent "belt and suspenders" design or what?
> 
> It seems to me that if one side here is MUST (and the spec needs to be
> clearer about what might or might not constitute a valid strategy), then
> the other side should be SHOULD.
> 
> If both sides are "MUST", then if things don't work out then the client
> and server can equally point to one another and say "It's his fault".
> 
> Am I missing something here?
> 
> 
> 
> -----Original Message-----
> From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org] On Behalf
> Of Trond Myklebust
> Sent: Wednesday, July 07, 2010 5:01 PM
> To: Muntz, Daniel
> Cc: linux-nfs@vger.kernel.org; garth@panasas.com; welch@panasas.com;
> nfsv4@ietf.org; andros@netapp.com; bhalevy@panasas.com
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> 
> On Wed, 2010-07-07 at 16:39 -0400, Daniel.Muntz@emc.com wrote:
> > > To bring this discussion full circle, since we agree that a compliant
> > > server can implement a scheme where written data does not become visible
> > > until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> > > "MUST" from a compliant client (independent of layout type)?
> 
> > Yes. I would agree that the client cannot rely on the updates being made
> > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > compliant server MUST also have a valid strategy for dealing with the
> > case where the client doesn't send it.
> 
> Cheers
>   Trond
> 
> >   -Dan
> >
> > > -----Original Message-----
> > > From: nfsv4-bounces@ietf.org [mailto:nfsv4-bounces@ietf.org]
> > > On Behalf Of Trond Myklebust
> > > Sent: Wednesday, July 07, 2010 7:04 AM
> > > To: Benny Halevy
> > > Cc: andros@netapp.com; linux-nfs@vger.kernel.org; Garth
> > > Gibson; Brent Welch; NFSv4
> > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >
> > > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > > > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> > > > > >>>> On Tue, 2010-07-06 at 15:20 -0400, Daniel.Muntz@emc.com wrote:
> > > > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > > > >>>>> point, so even if the non-clustered server does not want to update
> > > > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > > > >>>>> in the control protocol.
> > > > >>>>
> > > > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > > > >>>> be atomically accompanied with a change attribute update.
> > > > >>>>
> > > > >>>
> > > > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > > > >>>
> > > > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > > > >>> in particular:
> > > > >>>
> > > > > >>>    For some layout protocols, the storage device is able to notify the
> > > > > >>>    metadata server of the occurrence of an I/O; as a result, the change
> > > > > >>>    and time_modify attributes may be updated at the metadata server.
> > > > > >>>    For a metadata server that is capable of monitoring updates to the
> > > > > >>>    change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > > > >>>    required to update the change attribute.  In this case, the metadata
> > > > > >>>    server must ensure that no further update to the data has occurred
> > > > > >>>    since the last update of the attributes; file-based protocols may
> > > > > >>>    have enough information to make this determination or may update the
> > > > > >>>    change attribute upon each file modification.  This also applies for
> > > > > >>>    the time_modify attribute.  If the server implementation is able to
> > > > > >>>    determine that the file has not been modified since the last
> > > > > >>>    time_modify update, the server need not update time_modify at
> > > > > >>>    LAYOUTCOMMIT.  At LAYOUTCOMMIT completion, the updated attributes
> > > > > >>>    should be visible if that file was modified since the latest previous
> > > > > >>>    LAYOUTCOMMIT or LAYOUTGET
> > > > >>
> > > > > >> I know. However the above paragraph does not state that the server
> > > > > >> should make those changes visible to clients other than the one that is
> > > > > >> writing.
> > > > > >>
> > > > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > > > >> change attributes to be updated (if and only if the file data is
> > > > > >> modified). Several other sections rely on this behaviour, including
> > > > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > > > >>
> > > > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > > > >> 13.10, which states that clients can't expect to see changes
> > > > > >> immediately, but that they must be able to expect close-to-open
> > > > > >> semantics to work. Again, if this is to be the case, then the server
> > > > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > > > >> issue the LAYOUTCOMMIT.
> > > >
> > > > Agreed.
> > > >
> > > > >>
> > > > >>
> > > > > >>>> As I see it, if your server allows one client to read data that may have
> > > > > >>>> been modified by another client that holds a WRITE layout for that range
> > > > > >>>> then (since that is a visible data change) it should provide a change
> > > > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > > > >>>> sent.
> > > > >>>
> > > > > >>> The requirement for the server in WRITE's implementation section
> > > > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > > > >>> cause the time_modified and change attributes of the file to be updated."
> > > > > >>>
> > > > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > > > >>> to be visible until LAYOUTCOMMIT.  In a broader sense, assuming the clients
> > > > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > > > >>> may be visible to other processes on the same host but not to others until
> > > > > >>> fsync() or close() - close-to-open semantics are the only thing the client
> > > > > >>> guarantees, right?  Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > > > > >>> data is committed to stable storage and is visible to all other clients in
> > > > > >>> the cluster.
> > > > >>
> > > > > >> See above. I'm not disputing your statement that 'the written data is
> > > > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > > > >> assumption that 'the written data may be visible without an accompanying
> > > > > >> change attribute update'.
> > > > >
> > > > >
> > > > > > In other words, I'd expect the following scenario to give the same
> > > > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > > >
> > > > > That's a strong requirement that may limit the scalability of the server.
> > > > >
> > > > > The spirit of the pNFS operations, at least from Panasas perspective was that
> > > > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > > > to clients other than the one who wrote it, and its associated metadata MUST
> > > > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > > > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > > > >
> > > > > Without locking, what do the stronger semantics buy you?
> > > > > Even if a client verified the change_attribute new data may become visible
> > > > > at any time after the GETATTR if the file/byte range aren't locked.
> > >
> > > There is no locking needed in the scenario below: it is ordinary
> > > close-to-open semantics.
> > >
> > > > The point is that if you remove the one and only way that clients have
> > > > to determine whether or not their data caches are valid, then they can
> > > > no longer cache data at all, and server scalability will be shot to
> > > > smithereens anyway.
> > >
> > > Trond
> > >
> > > > Benny
> > > >
> > > > >
> > > > > Client 1			Client 2
> > > > > ========			========
> > > > >
> > > > > OPEN foo
> > > > > READ
> > > > > CLOSE
> > > > > 				OPEN
> > > > > 				LAYOUTGET ...
> > > > > 				WRITE via DS
> > > > > 				<dies>...
> > > > > OPEN foo
> > > > > verify change_attr
> > > > > READ if above WRITE is visible
> > > > > CLOSE
> > > > >
> > > > > Trond
> > > > > _______________________________________________
> > > > > nfsv4 mailing list
> > > > > nfsv4@ietf.org
> > > > > https://www.ietf.org/mailman/listinfo/nfsv4
> > >
> > >
> 
> 
