2010-07-02 00:12:52

by Sandeep Joshi

Subject: 4.1 client - LAYOUTCOMMIT & close


In certain cases, I don't see layoutcommit on a file at all even after doing many writes.



Client side operations:

open
write(s)
close


On server side (observed operations):

open
layoutget's
close


But I do not see layoutcommit at all. In terms of data written by the client, it is about 4-5MB.

When does the client issue layoutcommit?


regards,

Sandeep



2010-07-06 22:50:29

by Daniel.Muntz

Subject: RE: 4.1 client - LAYOUTCOMMIT & close



> -----Original Message-----
> From: Trond Myklebust [mailto:[email protected]]
> Sent: Tuesday, July 06, 2010 1:41 PM
> To: Muntz, Daniel
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: RE: 4.1 client - LAYOUTCOMMIT & close
>
> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > point, so even if the non-clustered server does not want to update
> > metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > execute whatever synchronization mechanism the implementer wishes to put
> > in the control protocol.
>
> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> pNFS servers to break the rule that any visible change to the data must
> be atomically accompanied with a change attribute update.

As we've discussed before, until a LAYOUTCOMMIT occurs, new data may or
may not be visible to clients.

Suppose my server takes the approach that a COMMIT guarantees that data
is written to a persistent intent log in NVRAM. On LAYOUTCOMMIT, file
data is updated from NVRAM and there is a change attribute update
(atomic). A client that does not issue LAYOUTCOMMITs will not be able
to write data.

If every WRITE to a DS has to atomically update metadata on the MDS,
perhaps we could improve performance by co-locating data and metadata on
a single server [1/2 :-)]
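Dan's NVRAM intent-log design can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not his implementation: IntentLogFile and its methods are hypothetical, a Python list stands in for the NVRAM log, and a single lock stands in for whatever makes the server's update atomic. COMMIT only makes data durable in the log; LAYOUTCOMMIT applies the log and bumps the change attribute in one critical section.

```python
import threading


class IntentLogFile:
    """Hypothetical toy model of the NVRAM intent-log scheme described above."""

    def __init__(self) -> None:
        self.data = bytearray()   # contents visible to readers
        self.change = 0           # change attribute reported to clients
        self._log = []            # pending (offset, payload) records ("NVRAM")
        self._lock = threading.Lock()

    def commit(self, offset: int, payload: bytes) -> None:
        # COMMIT: the write is durable in the intent log but not yet visible.
        with self._lock:
            self._log.append((offset, bytes(payload)))

    def layoutcommit(self) -> int:
        # LAYOUTCOMMIT: apply all logged writes and bump the change attribute
        # while holding the same lock, so readers see both or neither.
        with self._lock:
            if self._log:
                for offset, payload in self._log:
                    end = offset + len(payload)
                    if end > len(self.data):
                        self.data.extend(bytes(end - len(self.data)))
                    self.data[offset:end] = payload
                self._log.clear()
                self.change += 1
            return self.change

    def read(self) -> tuple:
        # Readers get contents and change attribute as one snapshot.
        with self._lock:
            return bytes(self.data), self.change
```

In this model a client that never sends LAYOUTCOMMIT never makes its data visible, which is the consequence Dan points out.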

>
> As I see it, if your server allows one client to read data that may have
> been modified by another client that holds a WRITE layout for that range
> then (since that is a visible data change) it should provide a change
> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> sent.
> If your MDS is incapable of determining whether or not data has changed
> on the DSes, then it should probably recall the WRITE layout if someone
> tries to read data that may have been modified. Said server also needs a
> strategy for determining if a data change occurred if the client that
> held the WRITE layout died before it could send the LAYOUTCOMMIT.

Sounds like you're suggesting treating layouts as capabilities in the
files case, which is one way to solve the problem. Is anyone doing
this, or are the files implementations still all treating layouts as
simply data locators?

>
> Cheers
> Trond
>

2010-07-06 20:40:45

by Trond Myklebust

Subject: RE: 4.1 client - LAYOUTCOMMIT & close

On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> point, so even if the non-clustered server does not want to update
> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> execute whatever synchronization mechanism the implementer wishes to put
> in the control protocol.

As far as I'm aware, there are no exceptions in RFC5661 that would allow
pNFS servers to break the rule that any visible change to the data must
be atomically accompanied with a change attribute update.

As I see it, if your server allows one client to read data that may have
been modified by another client that holds a WRITE layout for that range
then (since that is a visible data change) it should provide a change
attribute update irrespective of whether or not a LAYOUTCOMMIT has been
sent.
If your MDS is incapable of determining whether or not data has changed
on the DSes, then it should probably recall the WRITE layout if someone
tries to read data that may have been modified. Said server also needs a
strategy for determining if a data change occurred if the client that
held the WRITE layout died before it could send the LAYOUTCOMMIT.
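Trond's fallback for an MDS that cannot see DS writes (recall the WRITE layout before letting another client read a possibly modified range) might look like the minimal sketch below. WriteLayout, MetadataServer and the recall callback are hypothetical; the callback stands in for CB_LAYOUTRECALL, and the recall is modelled as synchronous, whereas a real server would have to wait for the client's LAYOUTRETURN (or for its lease to expire) before serving the read.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class WriteLayout:
    client_id: str
    offset: int
    length: int

    def overlaps(self, offset: int, length: int) -> bool:
        # Half-open byte ranges [a, a + len) intersect.
        return self.offset < offset + length and offset < self.offset + self.length


class MetadataServer:
    """Hypothetical MDS that cannot observe DS writes directly."""

    def __init__(self, recall: Callable[[WriteLayout], None]) -> None:
        self.write_layouts: List[WriteLayout] = []
        self.recall = recall  # stands in for issuing CB_LAYOUTRECALL

    def grant_write_layout(self, client_id: str, offset: int, length: int) -> WriteLayout:
        layout = WriteLayout(client_id, offset, length)
        self.write_layouts.append(layout)
        return layout

    def before_read(self, reader: str, offset: int, length: int) -> List[WriteLayout]:
        # Recall every WRITE layout held by another client that covers the
        # range being read, since the MDS cannot tell whether it was modified.
        conflicts = [lo for lo in self.write_layouts
                     if lo.client_id != reader and lo.overlaps(offset, length)]
        for lo in conflicts:
            self.recall(lo)
            self.write_layouts.remove(lo)
        return conflicts
```

A reader's own layouts are skipped, since a client's own cached writes are already visible to it.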

Cheers
Trond

> -Dan
>
> > -----Original Message-----
> > From: Andy Adamson [mailto:[email protected]]
> > Sent: Tuesday, July 06, 2010 6:38 AM
> > To: Muntz, Daniel
> > Cc: [email protected]; [email protected]; [email protected]
> > Subject: Re: 4.1 client - LAYOUTCOMMIT & close
> >
> >
> > On Jul 2, 2010, at 5:46 PM, <[email protected]> wrote:
> >
> > > By "extremely lame server" I assume you mean any pNFS server that
> > > doesn't have a cluster FS on the back end.
> >
> > No, I mean a pNFS file layout type server that depends upon the 'hint'
> > of file size given by LAYOUTCOMMIT. This does not mean that the file
> > system has to be a cluster FS.
> >
> > If COMMIT through MDS is set, the MDS to DS protocol (be it a cluster
> > FS or not) ensures the data is "committed" on the DSs. LAYOUTCOMMIT is
> > not needed.
> >
> > If COMMITs are sent to the DSs (or FILE_SYNC writes), then the MDS to
> > DS protocol (be it a cluster FS or not) should kick off a back-end DS
> > to MDS communication to update the file size on the MDS.
> >
> > What I consider an 'extremely lame pNFS file layout server' is one
> > that requires COMMITs to the DS and then depends upon the LAYOUTCOMMIT
> > to communicate the committed data size to the MDS.
> >
> > -->Andy
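Andy's cases can be written down as a small decision function. This is a sketch of his argument only, not of any client implementation: the flag names are hypothetical (commit_through_mds stands for the file layout's commit-through-MDS setting, mds_tracks_ds_writes for a back end that reports DS writes to the MDS itself). Dan's counter-position in this thread is that the client should send LAYOUTCOMMIT in every case anyway.

```python
from enum import Enum
from typing import List


class StableHow(Enum):
    # The stable_how4 values used by NFSv4 WRITE.
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


def follow_up_ops(stable: StableHow, commit_through_mds: bool,
                  mds_tracks_ds_writes: bool) -> List[str]:
    """Operations still owed after a WRITE to a DS, per Andy's argument."""
    ops: List[str] = []
    if stable is not StableHow.FILE_SYNC:
        ops.append("COMMIT to MDS" if commit_through_mds else "COMMIT to DS")
    # The MDS is involved only if a COMMIT actually goes through it.
    mds_saw_commit = commit_through_mds and stable is not StableHow.FILE_SYNC
    if not mds_saw_commit and not mds_tracks_ds_writes:
        # The case Andy calls "lame": the MDS learns size only via LAYOUTCOMMIT.
        ops.append("LAYOUTCOMMIT")
    return ops
```

FILE_SYNC writes are grouped with COMMIT-to-DS here, as in Andy's second paragraph, because no COMMIT passes through the MDS in that case.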
> >
> >
> > > So while this might work well for NetApp (as long as NetApp never
> > > ships a non-clustered pNFS), it might break others, or at least
> > > severely impact their performance. For example, will the Solaris pNFS
> > > server work correctly without LAYOUTCOMMIT? IMHO, the client MUST
> > > issue the appropriate LAYOUTCOMMIT, but the server is free to handle
> > > it as a no-op if the server implementation does not need to utilize
> > > the payload.
> > >
> > > -Dan
> > >
> > >> -----Original Message-----
> > >> From: [email protected]
> > >> [mailto:[email protected]] On Behalf Of Andy Adamson
> > >> Sent: Friday, July 02, 2010 8:41 AM
> > >> To: Sandeep Joshi
> > >> Cc: [email protected]; [email protected]
> > >> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
> > >>
> > >>
> > >> On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:
> > >>
> > >> Hi Sandeep
> > >>
> > >>>
> > >>> In certain cases, I don't see layoutcommit on a file at all even
> > >>> after doing many writes.
> > >>
> > >> FYI:
> > >>
> > >> You should not be paying attention to layoutcommits - they have no
> > >> value for the file layout type.
> > >>
> > >> From RFC 5661:
> > >>
> > >> "The LAYOUTCOMMIT operation commits changes in the layout represented
> > >> by the current filehandle, client ID (derived from the session ID in
> > >> the preceding SEQUENCE operation), byte-range, and stateid."
> > >>
> > >> For the block layout type, this sentence has meaning in that there is
> > >> a layoutupdate4 payload that enumerates the blocks that have changed
> > >> state from being 'handed out' to being 'written'.
> > >>
> > >> The file layout type has no layoutupdate4 payload, and the layout does
> > >> not change due to writes, and thus the LAYOUTCOMMIT call is useless.
> > >>
> > >> The only field in the LAYOUTCOMMIT4args that might possibly be useful
> > >> is the loca_last_write_offset which tells the server what the client
> > >> thinks is the EOF of the file after WRITE. It is an extremely lame
> > >> server (file layout type server) that depends upon clients for this
> > >> info.
> > >>
> > >>>
> > >>>
> > >>>
> > >>> Client side operations:
> > >>>
> > >>> open
> > >>> write(s)
> > >>> close
> > >>>
> > >>>
> > >>> On server side (observed operations):
> > >>>
> > >>> open
> > >>> layoutget's
> > >>> close
> > >>>
> > >>>
> > >>> But, I do not see laycommit at all. In terms data written by client
> > >>> it is about 4-5MB.
> > >>>
> > >>> When does client issue laycommit?
> > >>
> > >> The latest linux client sends a layout commit when the VFS does a
> > >> super_operations.write_inode call which happens when the metadata of
> > >> an inode needs updating. We are seriously considering removing the
> > >> layoutcommit call from the file layout client.
> > >>
> > >> -->Andy
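A client that defers LAYOUTCOMMIT until a flush point, as Andy describes for the Linux client's write_inode hook, only needs to remember the byte range written through DSs since the last LAYOUTCOMMIT. The sketch below is hypothetical (PnfsInode, LayoutCommitArgs and the method names are not Linux kernel APIs); fsync and close are included as the trigger points Benny proposes later in this thread. It also shows loca_last_write_offset falling out as the offset of the last byte written, one less than the end of the range.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LayoutCommitArgs:
    offset: int             # start of the byte range being committed
    length: int             # length of that range
    last_write_offset: int  # last byte written, cf. loca_last_write_offset


class PnfsInode:
    """Hypothetical per-file client state for deferring LAYOUTCOMMIT."""

    def __init__(self) -> None:
        self._start: Optional[int] = None  # lowest offset written via a DS
        self._end: Optional[int] = None    # one past the highest byte written
        self.sent: List[LayoutCommitArgs] = []

    def ds_write_done(self, offset: int, count: int) -> None:
        # Widen the outstanding range to cover this completed DS write.
        end = offset + count
        self._start = offset if self._start is None else min(self._start, offset)
        self._end = end if self._end is None else max(self._end, end)

    def _flush(self) -> Optional[LayoutCommitArgs]:
        # Send one LAYOUTCOMMIT covering everything since the last one.
        if self._start is None:
            return None
        args = LayoutCommitArgs(self._start, self._end - self._start, self._end - 1)
        self.sent.append(args)
        self._start = self._end = None
        return args

    def write_inode(self) -> Optional[LayoutCommitArgs]:
        return self._flush()

    def fsync(self) -> Optional[LayoutCommitArgs]:
        return self._flush()

    def close(self) -> Optional[LayoutCommitArgs]:
        return self._flush()
```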
> > >>
> > >>>
> > >>>
> > >>> regards,
> > >>>
> > >>> Sandeep
> > >>>
> > >>> --
> > >>> To unsubscribe from this list: send the line "unsubscribe
> > >> linux-nfs"
> > >>> in
> > >>> the body of a message to [email protected]
> > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >>

2010-07-06 14:05:03

by Boaz Harrosh

Subject: Re: 4.1 client - LAYOUTCOMMIT & close

On 07/06/2010 04:37 PM, Andy Adamson wrote:
>
> On Jul 2, 2010, at 5:46 PM, <[email protected]> wrote:
>
> What I consider an 'extremely lame pNFS file layout server' is one
> that requires COMMITs to the DS and then depends upon the LAYOUTCOMMIT
> to communicate the committed data size to the MDS.
>
(And mtime)

This is not "lame", this is "smart". There are tens of DSs but
thousands of clients with thousands of open files each; better to
make the clients busy than the servers.

You are not looking at scale.

> -->Andy
>

Boaz
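The MDS side of the approach Boaz defends (DSs only store data; the MDS updates size and mtime once per LAYOUTCOMMIT rather than once per DS I/O) can be sketched as below. MdsAttrs and process_layoutcommit are hypothetical; None models the "no value supplied" case of loca_last_write_offset and loca_time_modify. The size rule (grow to last_write_offset + 1, never shrink) follows from loca_last_write_offset being the offset of the last byte written; the mtime and change handling are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MdsAttrs:
    size: int
    mtime: float
    change: int


def process_layoutcommit(attrs: MdsAttrs,
                         last_write_offset: Optional[int],
                         client_mtime: Optional[float],
                         now: float) -> Optional[int]:
    """Apply one LAYOUTCOMMIT; return the new size if it grew, else None."""
    new_size = None
    # The last byte written is at offset N, so the file is at least N + 1 long.
    if last_write_offset is not None and last_write_offset + 1 > attrs.size:
        attrs.size = last_write_offset + 1
        new_size = attrs.size
    # Assumption: trust the client-supplied time if present, else server time.
    attrs.mtime = client_mtime if client_mtime is not None else now
    # Assumption: a LAYOUTCOMMIT implies data changed, so bump change.
    attrs.change += 1
    return new_size
```

The cost to the MDS is one call per LAYOUTCOMMIT, which is the scaling argument; the price is that attributes lag the DS writes until that call arrives.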

2010-07-07 22:53:26

by Myklebust, Trond

Subject: RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Wed, 2010-07-07 at 18:44 -0400, [email protected] wrote:
> Let me try this ...
>
> A correct client will always send LAYOUTCOMMIT.
> Assume that the client is correct.
> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
>
> Important implication: No LAYOUTCOMMIT is an error/failure case. It
> just has to work; it doesn't have to be fast.
>
> Suggestion: If a client dies while holding writeable layouts that permit
> write-in-place, and the client doesn't reappear or doesn't reclaim those
> layouts, then the server should assume that the files involved were
> written before the client died, and set the file attributes accordingly
> as part of internally reclaiming the layout that the client has
> abandoned.
>
> Caveat: It may take a while for the server to determine that the client
> has abandoned a layout.
>
> This can result in false positives (file appears to be modified when it
> wasn't) but won't yield false negatives (file does not appear to be
> modified even though it was modified).

OK... So we're going to have to turn off client side file caching
entirely for pNFS? I can do that...

The above won't work. Think readahead...

Trond

> Thanks,
> --David
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> > Sent: Wednesday, July 07, 2010 6:04 PM
> > To: [email protected]; Muntz, Daniel
> > Cc: [email protected]; [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > > Yes. I would agree that the client cannot rely on the updates being made
> > > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > > compliant server MUST also have a valid strategy for dealing with the
> > > case where the client doesn't send it.
> >
> > So you are saying the updates "MUST be made visible" through the
> > server's valid strategy. Is that right?
> >
> > And that the client cannot rely on that. Why not, if the server must
> > have a valid strategy?
> >
> > Is this just prudent "belt and suspenders" design or what?
> >
> > It seems to me that if one side here is MUST (and the spec needs to be
> > clearer about what might or might not constitute a valid strategy), then
> > the other side should be SHOULD.
> >
> > If both sides are "MUST", then if things don't work out then the client
> > and server can equally point to one another and say "It's his fault".
> >
> > Am I missing something here?
> >
> >
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf
> > Of Trond Myklebust
> > Sent: Wednesday, July 07, 2010 5:01 PM
> > To: Muntz, Daniel
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
> > > To bring this discussion full circle, since we agree that a compliant
> > > server can implement a scheme where written data does not become visible
> > > until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> > > "MUST" from a compliant client (independent of layout type)?
> >
> > Yes. I would agree that the client cannot rely on the updates being made
> > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > compliant server MUST also have a valid strategy for dealing with the
> > case where the client doesn't send it.
> >
> > Cheers
> > Trond
> >
> > > -Dan
> > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:[email protected]]
> > > > On Behalf Of Trond Myklebust
> > > > Sent: Wednesday, July 07, 2010 7:04 AM
> > > > To: Benny Halevy
> > > > Cc: [email protected]; [email protected]; Garth
> > > > Gibson; Brent Welch; NFSv4
> > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > >
> > > > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
> > > > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> > > > > >>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > > > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > > > >>>>> point, so even if the non-clustered server does not want to update
> > > > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > > > >>>>> in the control protocol.
> > > > > >>>>
> > > > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > > > >>>> be atomically accompanied with a change attribute update.
> > > > > >>>>
> > > > > >>>
> > > > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > > > >>>
> > > > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > > > >>> in particular:
> > > > > >>>
> > > > > >>> For some layout protocols, the storage device is able to notify the
> > > > > >>> metadata server of the occurrence of an I/O; as a result, the change
> > > > > >>> and time_modify attributes may be updated at the metadata server.
> > > > > >>> For a metadata server that is capable of monitoring updates to the
> > > > > >>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > > > >>> required to update the change attribute. In this case, the metadata
> > > > > >>> server must ensure that no further update to the data has occurred
> > > > > >>> since the last update of the attributes; file-based protocols may
> > > > > >>> have enough information to make this determination or may update the
> > > > > >>> change attribute upon each file modification. This also applies for
> > > > > >>> the time_modify attribute. If the server implementation is able to
> > > > > >>> determine that the file has not been modified since the last
> > > > > >>> time_modify update, the server need not update time_modify at
> > > > > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > > > > >>> should be visible if that file was modified since the latest previous
> > > > > >>> LAYOUTCOMMIT or LAYOUTGET
> > > > > >>
> > > > > >> I know. However the above paragraph does not state that the server
> > > > > >> should make those changes visible to clients other than the one that is
> > > > > >> writing.
> > > > > >>
> > > > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > > > >> change attributes to be updated (if and only if the file data is
> > > > > >> modified). Several other sections rely on this behaviour, including
> > > > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > > > >>
> > > > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > > > >> 13.10, which states that clients can't expect to see changes
> > > > > >> immediately, but that they must be able to expect close-to-open
> > > > > >> semantics to work. Again, if this is to be the case, then the server
> > > > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > > > >> issue the LAYOUTCOMMIT.
> > > > >
> > > > > Agreed.
> > > > >
> > > > > >>
> > > > > >>
> > > > > >>>> As I see it, if your server allows one client to read data that may have
> > > > > >>>> been modified by another client that holds a WRITE layout for that range
> > > > > >>>> then (since that is a visible data change) it should provide a change
> > > > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > > > >>>> sent.
> > > > > >>>
> > > > > >>> the requirement for the server in WRITE's implementation section
> > > > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > > > >>> cause the time_modified and change attributes of the file to be updated."
> > > > > >>>
> > > > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > > > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > > > >>> may be visible to other processes on the same host but not to others until
> > > > > >>> fsync() or close() - open-to-close semantics are the only thing the client
> > > > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
> > > > > >>> data is committed to stable storage and is visible to all other clients in
> > > > > >>> the cluster.
> > > > > >>
> > > > > >> See above. I'm not disputing your statement that 'the written data is
> > > > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > > > >> assumption that 'the written data may be visible without an accompanying
> > > > > >> change attribute update'.
> > > > > >
> > > > > >
> > > > > > In other words, I'd expect the following scenario to give the same
> > > > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > > > >
> > > > > That's a strong requirement that may limit the scalability of the server.
> > > > >
> > > > > The spirit of the pNFS operations, at least from Panasas perspective was that
> > > > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > > > to clients other than the one who wrote it, and its associated metadata MUST
> > > > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > > > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > > > >
> > > > > Without locking, what do the stronger semantics buy you?
> > > > > Even if a client verified the change_attribute new data may become visible
> > > > > at any time after the GETATTR if the file/byte range aren't locked.
> > > >
> > > > There is no locking needed in the scenario below: it is ordinary
> > > > close-to-open semantics.
> > > >
> > > > The point is that if you remove the one and only way that clients have
> > > > to determine whether or not their data caches are valid, then they can
> > > > no longer cache data at all, and server scalability will be shot to
> > > > smithereens anyway.
> > > >
> > > > Trond
> > > >
> > > > > Benny
> > > > >
> > > > > >
> > > > > > Client 1                          Client 2
> > > > > > ========                          ========
> > > > > >
> > > > > > OPEN foo
> > > > > > READ
> > > > > > CLOSE
> > > > > >                                   OPEN
> > > > > >                                   LAYOUTGET ...
> > > > > >                                   WRITE via DS
> > > > > >                                   <dies>...
> > > > > > OPEN foo
> > > > > > verify change_attr
> > > > > > READ if above WRITE is visible
> > > > > > CLOSE
> > > > > >
> > > > > > Trond
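Trond's scenario can be replayed with a toy model of close-to-open caching. Server and Client are hypothetical and collapse MDS and DS into one object; the point is only that Client 1's OPEN-time revalidation compares change attributes, so if Client 2's DS write becomes visible without a change attribute update, Client 1 keeps serving its stale cache.

```python
class Server:
    """Hypothetical MDS+DS pair reduced to one file's data and change attr."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.change = 1

    def ds_write(self, data: bytes, bump_change: bool) -> None:
        # Another client's WRITE via a DS becomes visible; whether the change
        # attribute moves with it is exactly what the thread is arguing about.
        self.data = data
        if bump_change:
            self.change += 1


class Client:
    """Client that revalidates its data cache on OPEN (close-to-open)."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache = None
        self.cached_change = None

    def open_and_read(self) -> bytes:
        change = self.server.change  # GETATTR(change) at OPEN
        if self.cache is None or change != self.cached_change:
            self.cache = self.server.data  # first read, or cache invalidated
            self.cached_change = change
        return self.cache
```

With bump_change=False, Client 1's second OPEN returns the old bytes even though the server now holds the new ones; with bump_change=True it refetches.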
> > > > > > _______________________________________________
> > > > > > nfsv4 mailing list
> > > > > > [email protected]
> > > > > > https://www.ietf.org/mailman/listinfo/nfsv4



2010-07-07 22:44:54

by david.black

Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

Let me try this ...

A correct client will always send LAYOUTCOMMIT.
Assume that the client is correct.
Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.

Important implication: No LAYOUTCOMMIT is an error/failure case. It
just has to work; it doesn't have to be fast.

Suggestion: If a client dies while holding writeable layouts that permit
write-in-place, and the client doesn't reappear or doesn't reclaim those
layouts, then the server should assume that the files involved were
written before the client died, and set the file attributes accordingly
as part of internally reclaiming the layout that the client has
abandoned.

Caveat: It may take a while for the server to determine that the client
has abandoned a layout.

This can result in false positives (file appears to be modified when it
wasn't) but won't yield false negatives (file does not appear to be
modified even though it was modified).
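David's conservative reclaim rule could look like this in a toy MDS. The class and method names are hypothetical, lease handling is reduced to a last-renewal timestamp per client, and time is passed in explicitly so the behavior is deterministic. Files under an abandoned write layout get their change attribute and mtime bumped whether or not they were actually written, which is the false-positive trade-off David describes.

```python
from typing import Dict, Set


class Mds:
    """Hypothetical MDS applying a conservative rule on layout reclaim."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.last_renewal: Dict[str, float] = {}
        self.write_layouts: Dict[str, Set[str]] = {}  # client -> file ids
        self.change: Dict[str, int] = {}
        self.mtime: Dict[str, float] = {}

    def renew(self, client: str, now: float) -> None:
        self.last_renewal[client] = now

    def grant_write_layout(self, client: str, fileid: str, now: float) -> None:
        self.renew(client, now)
        self.write_layouts.setdefault(client, set()).add(fileid)
        self.change.setdefault(fileid, 0)

    def layoutcommit_and_return(self, client: str, fileid: str, now: float) -> None:
        # Normal path: the client reports its writes and returns the layout.
        self.change[fileid] += 1
        self.mtime[fileid] = now
        self.write_layouts.get(client, set()).discard(fileid)

    def reap_expired(self, now: float) -> Set[str]:
        # Failure path: the lease lapsed, so assume every file the client held
        # a write layout for was modified (false positives are acceptable).
        touched: Set[str] = set()
        for client, last in list(self.last_renewal.items()):
            if now - last <= self.lease_seconds:
                continue
            for fileid in self.write_layouts.pop(client, set()):
                self.change[fileid] += 1
                self.mtime[fileid] = now
                touched.add(fileid)
            del self.last_renewal[client]
        return touched
```

The bump happens only at reclaim time; Trond's reply in this thread argues that this is not enough once client-side caching and readahead are taken into account.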

Thanks,
--David

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Wednesday, July 07, 2010 6:04 PM
> To: [email protected]; Muntz, Daniel
> Cc: [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>
> > Yes. I would agree that the client cannot rely on the updates being made
> > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > compliant server MUST also have a valid strategy for dealing with the
> > case where the client doesn't send it.
>
> So you are saying the updates "MUST be made visible" through the
> server's valid strategy. Is that right?
>
> And that the client cannot rely on that. Why not, if the server must
> have a valid strategy?
>
> Is this just prudent "belt and suspenders" design or what?
>
> It seems to me that if one side here is MUST (and the spec needs to be
> clearer about what might or might not constitute a valid strategy), then
> the other side should be SHOULD.
>
> If both sides are "MUST", then if things don't work out then the client
> and server can equally point to one another and say "It's his fault".
>
> Am I missing something here?
>
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf
> Of Trond Myklebust
> Sent: Wednesday, July 07, 2010 5:01 PM
> To: Muntz, Daniel
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>
> On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
> > To bring this discussion full circle, since we agree that a compliant
> > server can implement a scheme where written data does not become visible
> > until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> > "MUST" from a compliant client (independent of layout type)?
>
> Yes. I would agree that the client cannot rely on the updates being made
> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> compliant server MUST also have a valid strategy for dealing with the
> case where the client doesn't send it.
>
> Cheers
> Trond
>
> > -Dan
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:[email protected]]
> > > On Behalf Of Trond Myklebust
> > > Sent: Wednesday, July 07, 2010 7:04 AM
> > > To: Benny Halevy
> > > Cc: [email protected]; [email protected]; Garth
> > > Gibson; Brent Welch; NFSv4
> > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >
> > > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
> > > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> > > > >>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > > >>>>> point, so even if the non-clustered server does not want to update
> > > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > > >>>>> in the control protocol.
> > > > >>>>
> > > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > > >>>> be atomically accompanied with a change attribute update.
> > > > >>>>
> > > > >>>
> > > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > > >>>
> > > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > > >>> in particular:
> > > > >>>
> > > > >>> For some layout protocols, the storage device is able to notify the
> > > > >>> metadata server of the occurrence of an I/O; as a result, the change
> > > > >>> and time_modify attributes may be updated at the metadata server.
> > > > >>> For a metadata server that is capable of monitoring updates to the
> > > > >>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > > >>> required to update the change attribute. In this case, the metadata
> > > > >>> server must ensure that no further update to the data has occurred
> > > > >>> since the last update of the attributes; file-based protocols may
> > > > >>> have enough information to make this determination or may update the
> > > > >>> change attribute upon each file modification. This also applies for
> > > > >>> the time_modify attribute. If the server implementation is able to
> > > > >>> determine that the file has not been modified since the last
> > > > >>> time_modify update, the server need not update time_modify at
> > > > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > > > >>> should be visible if that file was modified since the latest previous
> > > > >>> LAYOUTCOMMIT or LAYOUTGET
> > > > >>
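The section 12.5.4.1 text quoted above can be read as a small decision procedure on the metadata server (MDS). The following is a minimal Python sketch of that reading. It is not any server's implementation: `MdsFileState`, `handle_layoutcommit()`, and the `unreflected_ds_write` flag are hypothetical names, and the client's LAYOUTCOMMIT arguments are reduced to a single `client_reports_write` boolean.

```python
from dataclasses import dataclass


@dataclass
class MdsFileState:
    """Hypothetical per-file attribute state on a pNFS metadata server."""
    change: int = 0           # NFSv4 change attribute
    time_modify: float = 0.0  # seconds since the epoch
    # Only used when the MDS monitors DS I/O: True if a DS write has been
    # observed that change/time_modify do not yet reflect.
    unreflected_ds_write: bool = False


def handle_layoutcommit(state: MdsFileState, mds_monitors_ds_io: bool,
                        client_reports_write: bool, now: float) -> bool:
    """Decide whether LAYOUTCOMMIT must bump change/time_modify.

    Returns True if the attributes were updated.
    """
    if mds_monitors_ds_io:
        # The MDS already sees DS I/O, so LAYOUTCOMMIT only has to make
        # sure no write has slipped in since the last attribute update.
        modified = state.unreflected_ds_write
    else:
        # The MDS cannot see DS I/O and must rely on the client's report.
        modified = client_reports_write
    if modified:
        state.change += 1
        state.time_modify = now
        state.unreflected_ds_write = False
    return modified
```

The sketch only covers what happens to the attributes at LAYOUTCOMMIT time. As the replies argue, it does not decide what other clients may see before a LAYOUTCOMMIT arrives.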
> > > > >> I know. However the above paragraph does not state that the server
> > > > >> should make those changes visible to clients other than the one that is
> > > > >> writing.
> > > > >>
> > > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > > >> change attributes to be updated (if and only if the file data is
> > > > >> modified). Several other sections rely on this behaviour, including
> > > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > > >>
> > > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > > >> 13.10, which states that clients can't expect to see changes
> > > > >> immediately, but that they must be able to expect close-to-open
> > > > >> semantics to work. Again, if this is to be the case, then the server
> > > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > > >> issue the LAYOUTCOMMIT.
> > > >
> > > > Agreed.
> > > >
> > > > >>
> > > > >>
> > > > >>>> As I see it, if your server allows one client to read data that may have
> > > > >>>> been modified by another client that holds a WRITE layout for that range
> > > > >>>> then (since that is a visible data change) it should provide a change
> > > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > > >>>> sent.
> > > > >>>
> > > > >>> the requirement for the server in WRITE's implementation section
> > > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > > >>> cause the time_modified and change attributes of the file to be updated."
> > > > >>>
> > > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > > >>> may be visible to other processes on the same host but not to others until
> > > > >>> fsync() or close() - open-to-close semantics are the only thing the client
> > > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
> > > > >>> data is committed to stable storage and is visible to all other clients in
> > > > >>> the cluster.
> > > > >>
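Benny's description of a write-behind client that sends LAYOUTCOMMIT from fsync() and close() also bears on the question that opened this thread. The sketch below models one plausible ordering of operations on such a client. Everything in it is hypothetical: `PnfsFile` and its methods are invented for illustration, wire operations are only recorded as strings, and it assumes the client writes to the data server (DS) with UNSTABLE writes that later need a COMMIT. It is not the Linux client's code path.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PnfsFile:
    """Hypothetical client-side state for one file written through a layout."""
    dirty: List[bytes] = field(default_factory=list)  # write-behind cache
    unstable_on_ds: bool = False      # UNSTABLE data on the DS awaiting COMMIT
    needs_layoutcommit: bool = False  # DS data the MDS has not been told about
    last_write_offset: int = -1       # offset of the last byte written
    sent: List[str] = field(default_factory=list)  # operations, for illustration

    def write(self, offset: int, data: bytes) -> None:
        # Buffered locally only; nothing is sent yet.
        if not data:
            return
        self.dirty.append(data)
        self.last_write_offset = max(self.last_write_offset,
                                     offset + len(data) - 1)

    def _flush(self) -> None:
        if self.dirty:
            self.sent.append("WRITE(DS, UNSTABLE)")
            self.dirty.clear()
            self.unstable_on_ds = True
            self.needs_layoutcommit = True

    def fsync(self) -> None:
        self._flush()
        if self.unstable_on_ds:
            self.sent.append("COMMIT(DS)")  # make the data stable on the DS
            self.unstable_on_ds = False
        if self.needs_layoutcommit:
            # Let the MDS update size/change/time_modify for the new data.
            self.sent.append(
                f"LAYOUTCOMMIT(MDS, last_write_offset={self.last_write_offset})")
            self.needs_layoutcommit = False

    def close(self) -> None:
        self.fsync()  # close-to-open: data and LAYOUTCOMMIT go out before CLOSE
        self.sent.append("CLOSE(MDS)")
```

With this ordering, the server should see a LAYOUTCOMMIT between the last DS write and the CLOSE. A client that skipped the LAYOUTCOMMIT step would instead produce a trace like the open / LAYOUTGET / close sequence reported at the start of the thread.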
> > > > >> See above. I'm not disputing your statement that 'the written data is
> > > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > > >> assumption that 'the written data may be visible without an accompanying
> > > > >> change attribute update'.
> > > > >
> > > > >
> > > > > In other words, I'd expect the following scenario to give the same
> > > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > > >
> > > > That's a strong requirement that may limit the scalability of the server.
> > > >
> > > > The spirit of the pNFS operations, at least from Panasas perspective was that
> > > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > > to clients other than the one who wrote it, and its associated metadata MUST
> > > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > > >
> > > > Without locking, what do the stronger semantics buy you?
> > > > Even if a client verified the change_attribute new data may become visible
> > > > at any time after the GETATTR if the file/byte range aren't locked.
> > >
> > > There is no locking needed in the scenario below: it is ordinary
> > > close-to-open semantics.
> > >
> > > The point is that if you remove the one and only way that clients have
> > > to determine whether or not their data caches are valid, then they can
> > > no longer cache data at all, and server scalability will be shot to
> > > smithereens anyway.
> > >
> > > Trond
> > >
> > > > Benny
> > > >
> > > > >
> > > > > Client 1                 Client 2
> > > > > ========                 ========
> > > > >
> > > > >                          OPEN foo
> > > > >                          READ
> > > > >                          CLOSE
> > > > > OPEN
> > > > > LAYOUTGET ...
> > > > > WRITE via DS
> > > > > <dies>...
> > > > >                          OPEN foo
> > > > >                          verify change_attr
> > > > >                          READ if above WRITE is visible
> > > > >                          CLOSE
> > > > >
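Trond's scenario depends on ordinary close-to-open cache validation by Client 2. On OPEN, Client 2 compares the change attribute with the one it cached, and reuses its cached data only if the two match. Below is a toy model under stated assumptions: `Server`, `Client`, `Version`, and `ds_write()` are hypothetical, and `bump_change` stands for whether the server pairs a visible DS write with a change attribute update.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Version:
    change_attr: int
    data: bytes


@dataclass
class Server:
    """Hypothetical server view: what a READ or GETATTR would return."""
    files: Dict[str, Version] = field(default_factory=dict)

    def change_attr(self, name: str) -> int:
        return self.files[name].change_attr

    def read(self, name: str) -> bytes:
        return self.files[name].data

    def ds_write(self, name: str, data: bytes, bump_change: bool) -> None:
        # Another client's DS write becomes visible; the server may or may
        # not pair it with a change attribute update.
        old = self.files[name]
        self.files[name] = Version(old.change_attr + int(bump_change), data)


@dataclass
class Client:
    """Hypothetical client doing close-to-open cache validation."""
    cache: Dict[str, Version] = field(default_factory=dict)
    reads_from_server: int = 0

    def open_and_read(self, name: str, server: Server) -> bytes:
        current = server.change_attr(name)  # GETATTR sent along with OPEN
        cached = self.cache.get(name)
        if cached is not None and cached.change_attr == current:
            return cached.data              # cache judged valid: no READ
        data = server.read(name)            # invalidate and re-read
        self.reads_from_server += 1
        self.cache[name] = Version(current, data)
        return data


srv = Server({"foo": Version(1, b"old")})
c2 = Client()
c2.open_and_read("foo", srv)                     # caches b"old"
srv.ds_write("foo", b"new", bump_change=False)   # Client 1 writes, then dies
print(c2.open_and_read("foo", srv))              # b'old' (stale)
```

On the second open, Client 2 still returns `b"old"` even though a READ would now return `b"new"`. That silent staleness is the outcome Trond argues a compliant server must not allow.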
> > > > > Trond
> > > > > _______________________________________________
> > > > > nfsv4 mailing list
> > > > > [email protected]
> > > > > https://www.ietf.org/mailman/listinfo/nfsv4
> > >
> > >
>
>

2010-07-07 20:39:42

by Daniel.Muntz

[permalink] [raw]
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

To bring this discussion full circle, since we agree that a compliant
server can implement a scheme where written data does not become visible
until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
"MUST" from a compliant client (independent of layout type)?

-Dan

> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Trond Myklebust
> Sent: Wednesday, July 07, 2010 7:04 AM
> To: Benny Halevy
> Cc: [email protected]; [email protected]; Garth
> Gibson; Brent Welch; NFSv4
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>
> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust
> <[email protected]> wrote:
> > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust
> <[email protected]> wrote:
> > >>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > >>>>> point, so even if the non-clustered server does not want to update
> > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > >>>>> in the control protocol.
> > >>>>
> > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > >>>> pNFS servers to break the rule that any visible change to the data must
> > >>>> be atomically accompanied with a change attribute update.
> > >>>>
> > >>>
> > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > >>>
> > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > >>> in particular:
> > >>>
> > >>> For some layout protocols, the storage device is able to notify the
> > >>> metadata server of the occurrence of an I/O; as a result, the change
> > >>> and time_modify attributes may be updated at the metadata server.
> > >>> For a metadata server that is capable of monitoring updates to the
> > >>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> > >>> required to update the change attribute. In this case, the metadata
> > >>> server must ensure that no further update to the data has occurred
> > >>> since the last update of the attributes; file-based protocols may
> > >>> have enough information to make this determination or may update the
> > >>> change attribute upon each file modification. This also applies for
> > >>> the time_modify attribute. If the server implementation is able to
> > >>> determine that the file has not been modified since the last
> > >>> time_modify update, the server need not update time_modify at
> > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > >>> should be visible if that file was modified since the latest previous
> > >>> LAYOUTCOMMIT or LAYOUTGET
> > >>
> > >> I know. However the above paragraph does not state that the server
> > >> should make those changes visible to clients other than the one that is
> > >> writing.
> > >>
> > >> Section 18.32.4 states that writes will cause the time_modified and
> > >> change attributes to be updated (if and only if the file data is
> > >> modified). Several other sections rely on this behaviour, including
> > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > >>
> > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > >> 13.10, which states that clients can't expect to see changes
> > >> immediately, but that they must be able to expect close-to-open
> > >> semantics to work. Again, if this is to be the case, then the server
> > >> _must_ be able to deal with the case where client 1 dies before it can
> > >> issue the LAYOUTCOMMIT.
> >
> > Agreed.
> >
> > >>
> > >>
> > >>>> As I see it, if your server allows one client to read data that may have
> > >>>> been modified by another client that holds a WRITE layout for that range
> > >>>> then (since that is a visible data change) it should provide a change
> > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > >>>> sent.
> > >>>
> > >>> the requirement for the server in WRITE's implementation section
> > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > >>> cause the time_modified and change attributes of the file to be updated."
> > >>>
> > >>> The difference here is that for pNFS the written data is not guaranteed
> > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > >>> are caching dirty data and use a write-behind cache, application-written data
> > >>> may be visible to other processes on the same host but not to others until
> > >>> fsync() or close() - open-to-close semantics are the only thing the client
> > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
> > >>> data is committed to stable storage and is visible to all other clients in
> > >>> the cluster.
> > >>
> > >> See above. I'm not disputing your statement that 'the written data is
> > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > >> assumption that 'the written data may be visible without an accompanying
> > >> change attribute update'.
> > >
> > >
> > > In other words, I'd expect the following scenario to give the same
> > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> >
> > That's a strong requirement that may limit the scalability of the server.
> >
> > The spirit of the pNFS operations, at least from Panasas perspective was that
> > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > to clients other than the one who wrote it, and its associated metadata MUST
> > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > undefined, i.e. it's up to the server implementation whether to update it or not.
> >
> > Without locking, what do the stronger semantics buy you?
> > Even if a client verified the change_attribute new data may become visible
> > at any time after the GETATTR if the file/byte range aren't locked.
>
> There is no locking needed in the scenario below: it is ordinary
> close-to-open semantics.
>
> The point is that if you remove the one and only way that clients have
> to determine whether or not their data caches are valid, then they can
> no longer cache data at all, and server scalability will be shot to
> smithereens anyway.
>
> Trond
>
> > Benny
> >
> > >
> > > Client 1                 Client 2
> > > ========                 ========
> > >
> > >                          OPEN foo
> > >                          READ
> > >                          CLOSE
> > > OPEN
> > > LAYOUTGET ...
> > > WRITE via DS
> > > <dies>...
> > >                          OPEN foo
> > >                          verify change_attr
> > >                          READ if above WRITE is visible
> > >                          CLOSE
> > >
> > > Trond

2010-07-07 23:15:06

by Trond Myklebust

[permalink] [raw]
Subject: RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> > On Wed, 2010-07-07 at 18:44 -0400, [email protected] wrote:
> > > Let me try this ...
> > >
> > > A correct client will always send LAYOUTCOMMIT.
> > > Assume that the client is correct.
> > > Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> > >
> > > Important implication: No LAYOUTCOMMIT is an error/failure case. It
> > > just has to work; it doesn't have to be fast.
> > >
> > > Suggestion: If a client dies while holding writeable layouts that permit
> > > write-in-place, and the client doesn't reappear or doesn't reclaim those
> > > layouts, then the server should assume that the files involved were
> > > written before the client died, and set the file attributes accordingly
> > > as part of internally reclaiming the layout that the client has
> > > abandoned.
> > >
> > > Caveat: It may take a while for the server to determine that the client
> > > has abandoned a layout.
> > >
> > > This can result in false positives (file appears to be modified when it
> > > wasn't) but won't yield false negatives (file does not appear to be
> > > modified even though it was modified).
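David's suggestion is a conservative cleanup step that runs when the server reclaims an abandoned writable layout. Here is a minimal sketch. It assumes a hypothetical `Mds` class that tracks only the change attribute and the set of clients holding writable layouts for each file. A real server would also update time_modify and possibly size.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Mds:
    """Hypothetical MDS bookkeeping for writable layouts."""
    change: Dict[str, int] = field(default_factory=dict)
    write_layouts: Dict[str, Set[str]] = field(default_factory=dict)  # file -> client ids

    def reclaim_abandoned_layouts(self, dead_client: str) -> None:
        for name, holders in self.write_layouts.items():
            if dead_client in holders:
                holders.discard(dead_client)
                # We cannot tell whether the client wrote before it died, so
                # assume it did. A needless bump only costs other clients a
                # cache invalidation (false positive); skipping a needed one
                # would let them keep stale data (false negative).
                self.change[name] = self.change.get(name, 0) + 1
```

Per the caveat in the same message, this step only runs once the server has decided the layout is abandoned. That can be well after other clients have reopened the file.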
> >
> > OK... So we're going to have to turn off client side file caching
> > entirely for pNFS? I can do that...
> >
> > The above won't work. Think readahead...
>
> So... What can work, is if you modify it to work explicitly for
> close-to-open
>
> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> check that it has received LAYOUTCOMMITs from any other clients that may
> have the file open for writing. If it hasn't, then it MUST take some
> action to ensure that any file data changes are accompanied by a change
                           ^ potentially visible
> attribute update."
>
> Then you can add the above suggestion without the offending caveat. Note
> however that it does break the "SHOULD NOT" admonition in section
> 18.32.4.
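Trond's amended rule moves the check to OPEN, LOCK, and WANT_DELEGATION. The sketch below is one hypothetical reading of it. `MetadataServer`, `uncommitted_writers`, and the choice of "some action" (here, simply bumping the change attribute) are assumptions for illustration, not RFC 5661 text. Because the bump can happen when no data actually changed, it is the kind of behavior Trond says breaks the "SHOULD NOT" admonition in section 18.32.4.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FileState:
    change: int = 0
    # Clients writing through a layout that have not yet sent LAYOUTCOMMIT.
    uncommitted_writers: Set[str] = field(default_factory=set)


class MetadataServer:
    """Hypothetical MDS applying the proposed OPEN/LOCK/WANT_DELEGATION check."""

    def __init__(self) -> None:
        self.files: Dict[str, FileState] = {}

    def _state(self, name: str) -> FileState:
        return self.files.setdefault(name, FileState())

    def layout_write_started(self, name: str, client: str) -> None:
        self._state(name).uncommitted_writers.add(client)

    def layoutcommit(self, name: str, client: str) -> None:
        st = self._state(name)
        st.uncommitted_writers.discard(client)
        st.change += 1

    def open_lock_or_want_delegation(self, name: str, client: str) -> int:
        st = self._state(name)
        if st.uncommitted_writers - {client}:
            # "Some action": conservatively bump the change attribute so any
            # potentially visible data change is covered. Recalling the other
            # clients' layouts is another conceivable option.
            st.change += 1
        return st.change  # e.g. returned by a GETATTR in the same COMPOUND
```

The check runs only on these three operations rather than on every GETATTR, which is the condition Benny attaches to the proposal later in the thread.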
>
> Trond
>
>
> > Trond
> >
> > > Thanks,
> > > --David
> > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> > > > Sent: Wednesday, July 07, 2010 6:04 PM
> > > > To: [email protected]; Muntz, Daniel
> > > > Cc: [email protected]; [email protected]; [email protected]; [email protected];
> > > > [email protected]; [email protected]
> > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > >
> > > > > Yes. I would agree that the client cannot rely on the updates being made
> > > > > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > > > > compliant server MUST also have a valid strategy for dealing with the
> > > > > case where the client doesn't send it.
> > > >
> > > > So you are saying the updates "MUST be made visible" through the
> > > > server's valid strategy. Is that right.
> > > >
> > > > And that the client cannot rely on that. Why not, if the server must
> > > > have a valid strategy.
> > > >
> > > > Is this just prudent "belt and suspenders" design or what?
> > > >
> > > > It seems to me that if one side here is MUST (and the spec needs to be
> > > > clearer about what might or might not constitute a valid strategy), then
> > > > the other side should be SHOULD.
> > > >
> > > > If both sides are "MUST", then if things don't work out then the client
> > > > and server can equally point to one another and say "It's his fault".
> > > >
> > > > Am I missing something here?
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:[email protected]] On Behalf
> > > > Of Trond Myklebust
> > > > Sent: Wednesday, July 07, 2010 5:01 PM
> > > > To: Muntz, Daniel
> > > > Cc: [email protected]; [email protected]; [email protected];
> > > > [email protected]; [email protected]; [email protected]
> > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > >
> > > > On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
> > > > > To bring this discussion full circle, since we agree that a compliant
> > > > > server can implement a scheme where written data does not become visible
> > > > > until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> > > > > "MUST" from a compliant client (independent of layout type)?
> > > >
> > > > Yes. I would agree that the client cannot rely on the updates being made
> > > > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > > > compliant server MUST also have a valid strategy for dealing with the
> > > > case where the client doesn't send it.
> > > >
> > > > Cheers
> > > > Trond
> > > >
> > > > > -Dan
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: [email protected] [mailto:[email protected]]
> > > > > > On Behalf Of Trond Myklebust
> > > > > > Sent: Wednesday, July 07, 2010 7:04 AM
> > > > > > To: Benny Halevy
> > > > > > Cc: [email protected]; [email protected]; Garth
> > > > > > Gibson; Brent Welch; NFSv4
> > > > > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > > > >
> > > > > > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > > > > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust
> > > > > > <[email protected]> wrote:
> > > > > > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > > > > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > > > > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust
> > > > > > <[email protected]> wrote:
> > > > > > > >>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > > > > > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > > > > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > > > > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > > > > > >>>>> point, so even if the non-clustered server does not want to update
> > > > > > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > > > > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > > > > > >>>>> in the control protocol.
> > > > > > > >>>>
> > > > > > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > > > > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > > > > > >>>> be atomically accompanied with a change attribute update.
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > > > > > >>>
> > > > > > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > > > > > >>> in particular:
> > > > > > > >>>
> > > > > > > >>> For some layout protocols, the storage device is able to notify the
> > > > > > > >>> metadata server of the occurrence of an I/O; as a result, the change
> > > > > > > >>> and time_modify attributes may be updated at the metadata server.
> > > > > > > >>> For a metadata server that is capable of monitoring updates to the
> > > > > > > >>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > > > > > >>> required to update the change attribute. In this case, the metadata
> > > > > > > >>> server must ensure that no further update to the data has occurred
> > > > > > > >>> since the last update of the attributes; file-based protocols may
> > > > > > > >>> have enough information to make this determination or may update the
> > > > > > > >>> change attribute upon each file modification. This also applies for
> > > > > > > >>> the time_modify attribute. If the server implementation is able to
> > > > > > > >>> determine that the file has not been modified since the last
> > > > > > > >>> time_modify update, the server need not update time_modify at
> > > > > > > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > > > > > > >>> should be visible if that file was modified since the latest previous
> > > > > > > >>> LAYOUTCOMMIT or LAYOUTGET
> > > > > > > >>
> > > > > > > >> I know. However the above paragraph does not state that the server
> > > > > > > >> should make those changes visible to clients other than the one that is
> > > > > > > >> writing.
> > > > > > > >>
> > > > > > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > > > > > >> change attributes to be updated (if and only if the file data is
> > > > > > > >> modified). Several other sections rely on this behaviour, including
> > > > > > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > > > > > >>
> > > > > > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > > > > > >> 13.10, which states that clients can't expect to see changes
> > > > > > > >> immediately, but that they must be able to expect close-to-open
> > > > > > > >> semantics to work. Again, if this is to be the case, then the server
> > > > > > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > > > > > >> issue the LAYOUTCOMMIT.
> > > > > > >
> > > > > > > Agreed.
> > > > > > >
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>>> As I see it, if your server allows one client to read data that may have
> > > > > > > >>>> been modified by another client that holds a WRITE layout for that range
> > > > > > > >>>> then (since that is a visible data change) it should provide a change
> > > > > > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > > > > > >>>> sent.
> > > > > > > >>>
> > > > > > > >>> the requirement for the server in WRITE's implementation section
> > > > > > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > > > > > >>> cause the time_modified and change attributes of the file to be updated."
> > > > > > > >>>
> > > > > > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > > > > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > > > > > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > > > > > >>> may be visible to other processes on the same host but not to others until
> > > > > > > >>> fsync() or close() - open-to-close semantics are the only thing the client
> > > > > > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
> > > > > > > >>> data is committed to stable storage and is visible to all other clients in
> > > > > > > >>> the cluster.
> > > > > > > >>
> > > > > > > >> See above. I'm not disputing your statement that 'the written data is
> > > > > > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > > > > > >> assumption that 'the written data may be visible without an accompanying
> > > > > > > >> change attribute update'.
> > > > > > > >
> > > > > > > >
> > > > > > > > In other words, I'd expect the following scenario to give the same
> > > > > > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > > > > > >
> > > > > > > That's a strong requirement that may limit the scalability of the server.
> > > > > > >
> > > > > > > The spirit of the pNFS operations, at least from Panasas perspective was that
> > > > > > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > > > > > to clients other than the one who wrote it, and its associated metadata MUST
> > > > > > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > > > > > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > > > > > >
> > > > > > > Without locking, what do the stronger semantics buy you?
> > > > > > > Even if a client verified the change_attribute new data may become visible
> > > > > > > at any time after the GETATTR if the file/byte range aren't locked.
> > > > > >
> > > > > > There is no locking needed in the scenario below: it is ordinary
> > > > > > close-to-open semantics.
> > > > > >
> > > > > > The point is that if you remove the one and only way that clients have
> > > > > > to determine whether or not their data caches are valid, then they can
> > > > > > no longer cache data at all, and server scalability will be shot to
> > > > > > smithereens anyway.
> > > > > >
> > > > > > Trond
> > > > > >
> > > > > > > Benny
> > > > > > >
> > > > > > > >
> > > > > > > > Client 1                 Client 2
> > > > > > > > ========                 ========
> > > > > > > >
> > > > > > > >                          OPEN foo
> > > > > > > >                          READ
> > > > > > > >                          CLOSE
> > > > > > > > OPEN
> > > > > > > > LAYOUTGET ...
> > > > > > > > WRITE via DS
> > > > > > > > <dies>...
> > > > > > > >                          OPEN foo
> > > > > > > >                          verify change_attr
> > > > > > > >                          READ if above WRITE is visible
> > > > > > > >                          CLOSE
> > > > > > > >
> > > > > > > > Trond
> > > > > >
> > > > > >
> > > >
> > > >
> > >
> >
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html




2010-07-08 22:12:47

by Sorin Faibish

[permalink] [raw]
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

All,

After discussing this issue with Dave Noveck, and as I mentioned in the
call today, I think this is a serious issue and a disconnect between the
behaviors of different layout types. My proposal is to have this
discussion F2F in Maastricht at the whiteboard, so I will add an agenda
item on this topic for the WG. I could describe the behavior of the block
layout, but it is not something we want to mimic, since we all agreed at
Cthon to avoid LAYOUTCOMMIT as much as possible for the file layout. If we
solve the issue using the mechanism Trond proposed, we will create a
conflict with the use of LAYOUTCOMMIT. Just as a hint, the difference from
block is that the block layout treats read and write layouts as different
leases: when a client holds a layout for read, the server always recalls
it (CB_LAYOUTRECALL) when either upgrading that client's lease to write or
sending a layout for write to another client. I don't think we want to do
the same for the file layout. My 2c.

/Sorin

On Thu, 08 Jul 2010 16:30:48 -0400, <[email protected]> wrote:

>> Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
>> written to the file. I'm not sure what about the blocks case though, do you
>> implicitly free up any provisionally allocated blocks that the client had not
>> explicitly committed using LAYOUTCOMMIT?
>
> In principle, yes as the blocks are no longer promised to the client, although
> lazy evaluation of this is an obvious optimization.
>
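David's answer for the block layout can be sketched as follows. Once the client returns the layout, any provisionally allocated blocks it never committed are no longer promised to it, and the server can free them lazily. `BlockMds` and its methods are hypothetical, blocks are plain integers, and nothing here is drawn from an actual block-layout server.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class BlockMds:
    """Hypothetical block-layout MDS tracking provisionally allocated blocks."""
    committed: Set[int] = field(default_factory=set)                # part of the file
    provisional: Dict[str, Set[int]] = field(default_factory=dict)  # client -> promised
    to_free: List[int] = field(default_factory=list)                # released lazily

    def layoutget_for_write(self, client: str, blocks: Iterable[int]) -> None:
        self.provisional.setdefault(client, set()).update(blocks)

    def layoutcommit(self, client: str, blocks: Iterable[int]) -> None:
        promised = self.provisional.setdefault(client, set())
        done = set(blocks) & promised
        self.committed |= done
        promised -= done  # in-place: updates the set stored in the dict

    def layoutreturn(self, client: str) -> None:
        # Blocks never covered by a LAYOUTCOMMIT are no longer promised to the
        # client; queue them instead of freeing them immediately.
        self.to_free.extend(sorted(self.provisional.pop(client, set())))

    def reap(self) -> List[int]:
        # The lazy step: actually release whatever has been queued so far.
        freed, self.to_free = self.to_free, []
        return freed
```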
>> >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
>> >> check that it has received LAYOUTCOMMITs from any other clients that may
>> >> have the file open for writing. If it hasn't, then it MUST take some
>> >> action to ensure that any file data changes are accompanied by a change
>> >                           ^ potentially visible
>> >> attribute update."
>>
>> That should be OK as long as it's not for every GETATTR for the change, mtime,
>> or size attributes.
>>
>> >>
>> >> Then you can add the above suggestion without the offending caveat. Note
>> >> however that it does break the "SHOULD NOT" admonition in section
>> >> 18.32.4.
>>
>> Better be safe than sorry in this rare error case.
>
> I concur with Benny on both of the above - in essence, the unrecovered
> client failure is a reason to potentially ignore the "SHOULD" (server
> can't know whether it actually ignored the "SHOULD", hence better safe
> than sorry). We probably ought to find a someplace appropriate to add a
> paragraph or two explaining this in one of the 4.2 documents.
>
> Thanks,
> --David
>
>
>> -----Original Message-----
>> From: Benny Halevy [mailto:[email protected]] On Behalf Of Benny Halevy
>> Sent: Thursday, July 08, 2010 12:00 PM
>> To: Trond Myklebust
>> Cc: Black, David; Noveck, David; Muntz, Daniel; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected]
>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>>
>> On Jul. 08, 2010, 2:14 +0300, Trond Myklebust <[email protected]> wrote:
>> > On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
>> >> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
>> >>> On Wed, 2010-07-07 at 18:44 -0400, [email protected] wrote:
>> >>>> Let me try this ...
>> >>>>
>> >>>> A correct client will always send LAYOUTCOMMIT.
>> >>>> Assume that the client is correct.
>> >>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
>> >>>>
>> >>>> Important implication: No LAYOUTCOMMIT is an error/failure case. It
>> >>>> just has to work; it doesn't have to be fast.
>> >>>>
>>
>> Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
>> written to the file. I'm not sure what about the blocks case though, do you
>> implicitly free up any provisionally allocated blocks that the client had not
>> explicitly committed using LAYOUTCOMMIT?
>>
>> >>>> Suggestion: If a client dies while holding writeable layouts that permit
>> >>>> write-in-place, and the client doesn't reappear or doesn't reclaim those
>> >>>> layouts, then the server should assume that the files involved were
>> >>>> written before the client died, and set the file attributes accordingly
>> >>>> as part of internally reclaiming the layout that the client has
>> >>>> abandoned.
>>
>> Of course. That's part of the server recovery.
>>
>> >>>>
>> >>>> Caveat: It may take a while for the server to determine that the client
>> >>>> has abandoned a layout.
>>
>> That's two lease times after a respective CB_LAYOUTRECALL.
>>
>> >>>>
>> >>>> This can result in false positives (file appears to be modified when it
>> >>>> wasn't) but won't yield false negatives (file does not appear to be
>> >>>> modified even though it was modified).
>> >>>
>> >>> OK... So we're going to have to turn off client side file caching
>> >>> entirely for pNFS? I can do that...
>> >>>
>> >>> The above won't work. Think readahead...
>> >>
>> >> So... What can work, is if you modify it to work explicitly for
>> >> close-to-open
>> >>
>> >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
>> >> check that it has received LAYOUTCOMMITs from any other clients that may
>> >> have the file open for writing. If it hasn't, then it MUST take some
>> >> action to ensure that any file data changes are accompanied by a change
>> > ^ potentially visible
>> >> attribute update."
>>
>> That should be OK as long as it's not for every GETATTR for the change, mtime,
>> or size attributes.
>>
>> >>
>> >> Then you can add the above suggestion without the offending caveat. Note
>> >> however that it does break the "SHOULD NOT" admonition in section
>> >> 18.32.4.
>>
>> Better be safe than sorry in this rare error case.
>>
>> Benny
>>
>> >>
>> >> Trond
>> >>
>> >>
>> >>> Trond
>> >>>
>> >>>> Thanks,
>> >>>> --David
>> >>>>
>> >>>>> -----Original Message-----
>> >>>>> From: [email protected] [mailto:[email protected]] On Behalf
>> >>>>> Of [email protected]
>> >>>>> Sent: Wednesday, July 07, 2010 6:04 PM
>> >>>>> To: [email protected]; Muntz, Daniel
>> >>>>> Cc: [email protected]; [email protected]; [email protected];
>> >>>>> [email protected]; [email protected]; [email protected]
>> >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>> >>>>>
>> >>>>>> Yes. I would agree that the client cannot rely on the updates being made
>> >>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
>> >>>>>> compliant server MUST also have a valid strategy for dealing with the
>> >>>>>> case where the client doesn't send it.
>> >>>>>
>> >>>>> So you are saying the updates "MUST be made visible" through the
>> >>>>> server's valid strategy. Is that right?
>> >>>>>
>> >>>>> And that the client cannot rely on that. Why not, if the server must
>> >>>>> have a valid strategy?
>> >>>>>
>> >>>>> Is this just prudent "belt and suspenders" design or what?
>> >>>>>
>> >>>>> It seems to me that if one side here is MUST (and the spec needs to be
>> >>>>> clearer about what might or might not constitute a valid strategy), then
>> >>>>> the other side should be SHOULD.
>> >>>>>
>> >>>>> If both sides are "MUST", then if things don't work out then the client
>> >>>>> and server can equally point to one another and say "It's his fault".
>> >>>>>
>> >>>>> Am I missing something here?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> -----Original Message-----
>> >>>>> From: [email protected] [mailto:[email protected]] On Behalf
>> >>>>> Of Trond Myklebust
>> >>>>> Sent: Wednesday, July 07, 2010 5:01 PM
>> >>>>> To: Muntz, Daniel
>> >>>>> Cc: [email protected]; [email protected]; [email protected];
>> >>>>> [email protected]; [email protected]; [email protected]
>> >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>> >>>>>
>> >>>>> On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
>> >>>>>> To bring this discussion full circle, since we agree that a compliant
>> >>>>>> server can implement a scheme where written data does not become visible
>> >>>>>> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
>> >>>>>> "MUST" from a compliant client (independent of layout type)?
>> >>>>>
>> >>>>> Yes. I would agree that the client cannot rely on the updates being made
>> >>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
>> >>>>> compliant server MUST also have a valid strategy for dealing with the
>> >>>>> case where the client doesn't send it.
>> >>>>>
>> >>>>> Cheers
>> >>>>> Trond
>> >>>>>
>> >>>>>> -Dan
>> >>>>>>
>> >>>>>>> -----Original Message-----
>> >>>>>>> From: [email protected] [mailto:[email protected]]
>> >>>>>>> On Behalf Of Trond Myklebust
>> >>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM
>> >>>>>>> To: Benny Halevy
>> >>>>>>> Cc: [email protected]; [email protected]; Garth
>> >>>>>>> Gibson; Brent Welch; NFSv4
>> >>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>> >>>>>>>
>> >>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
>> >>>>>>>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
>> >>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
>> >>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
>> >>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
>> >>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
>> >>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
>> >>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
>> >>>>>>>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
>> >>>>>>>>>>>>> point, so even if the non-clustered server does not want to update
>> >>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
>> >>>>>>>>>>>>> execute whatever synchronization mechanism the implementer wishes to put
>> >>>>>>>>>>>>> in the control protocol.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
>> >>>>>>>>>>>> pNFS servers to break the rule that any visible change to the data must
>> >>>>>>>>>>>> be atomically accompanied with a change attribute update.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is specified.
>> >>>>>>>>>>>
>> >>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
>> >>>>>>>>>>> in particular:
>> >>>>>>>>>>>
>> >>>>>>>>>>> For some layout protocols, the storage device is able to notify the
>> >>>>>>>>>>> metadata server of the occurrence of an I/O; as a result, the change
>> >>>>>>>>>>> and time_modify attributes may be updated at the metadata server.
>> >>>>>>>>>>> For a metadata server that is capable of monitoring updates to the
>> >>>>>>>>>>> change and time_modify attributes, LAYOUTCOMMIT processing is not
>> >>>>>>>>>>> required to update the change attribute. In this case, the metadata
>> >>>>>>>>>>> server must ensure that no further update to the data has occurred
>> >>>>>>>>>>> since the last update of the attributes; file-based protocols may
>> >>>>>>>>>>> have enough information to make this determination or may update the
>> >>>>>>>>>>> change attribute upon each file modification. This also applies for
>> >>>>>>>>>>> the time_modify attribute. If the server implementation is able to
>> >>>>>>>>>>> determine that the file has not been modified since the last
>> >>>>>>>>>>> time_modify update, the server need not update time_modify at
>> >>>>>>>>>>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
>> >>>>>>>>>>> should be visible if that file was modified since the latest previous
>> >>>>>>>>>>> LAYOUTCOMMIT or LAYOUTGET
>> >>>>>>>>>>
>> >>>>>>>>>> I know. However the above paragraph does not state that the server
>> >>>>>>>>>> should make those changes visible to clients other than the one that is
>> >>>>>>>>>> writing.
>> >>>>>>>>>>
>> >>>>>>>>>> Section 18.32.4 states that writes will cause the time_modified and
>> >>>>>>>>>> change attributes to be updated (if and only if the file data is
>> >>>>>>>>>> modified). Several other sections rely on this behaviour, including
>> >>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7.
>> >>>>>>>>>>
>> >>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS is in section
>> >>>>>>>>>> 13.10, which states that clients can't expect to see changes
>> >>>>>>>>>> immediately, but that they must be able to expect close-to-open
>> >>>>>>>>>> semantics to work. Again, if this is to be the case, then the server
>> >>>>>>>>>> _must_ be able to deal with the case where client 1 dies before it can
>> >>>>>>>>>> issue the LAYOUTCOMMIT.
>> >>>>>>>>
>> >>>>>>>> Agreed.
>> >>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>>> As I see it, if your server allows one client to read data that may have
>> >>>>>>>>>>>> been modified by another client that holds a WRITE layout for that range
>> >>>>>>>>>>>> then (since that is a visible data change) it should provide a change
>> >>>>>>>>>>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
>> >>>>>>>>>>>> sent.
>> >>>>>>>>>>>
>> >>>>>>>>>>> the requirement for the server in WRITE's implementation section
>> >>>>>>>>>>> is quite weak: "It is assumed that the act of writing data to a file will
>> >>>>>>>>>>> cause the time_modified and change attributes of the file to be updated."
>> >>>>>>>>>>>
>> >>>>>>>>>>> The difference here is that for pNFS the written data is not guaranteed
>> >>>>>>>>>>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
>> >>>>>>>>>>> are caching dirty data and use a write-behind cache, application-written data
>> >>>>>>>>>>> may be visible to other processes on the same host but not to others until
>> >>>>>>>>>>> fsync() or close() - open-to-close semantics are the only thing the client
>> >>>>>>>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
>> >>>>>>>>>>> data is committed to stable storage and is visible to all other clients in
>> >>>>>>>>>>> the cluster.
>> >>>>>>>>>>
>> >>>>>>>>>> See above. I'm not disputing your statement that 'the written data is
>> >>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
>> >>>>>>>>>> assumption that 'the written data may be visible without an accompanying
>> >>>>>>>>>> change attribute update'.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> In other words, I'd expect the following scenario to give the same
>> >>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4:
>> >>>>>>>>
>> >>>>>>>> That's a strong requirement that may limit the scalability of the server.
>> >>>>>>>>
>> >>>>>>>> The spirit of the pNFS operations, at least from Panasas perspective was that
>> >>>>>>>> the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
>> >>>>>>>> to clients other than the one who wrote it, and its associated metadata MUST
>> >>>>>>>> be updated and describe the new data only on LAYOUTCOMMIT and until then it's
>> >>>>>>>> undefined, i.e. it's up to the server implementation whether to update it or not.
>> >>>>>>>>
>> >>>>>>>> Without locking, what do the stronger semantics buy you?
>> >>>>>>>> Even if a client verified the change_attribute new data may become visible
>> >>>>>>>> at any time after the GETATTR if the file/byte range aren't locked.
>> >>>>>>>
>> >>>>>>> There is no locking needed in the scenario below: it is ordinary
>> >>>>>>> close-to-open semantics.
>> >>>>>>>
>> >>>>>>> The point is that if you remove the one and only way that clients have
>> >>>>>>> to determine whether or not their data caches are valid, then they can
>> >>>>>>> no longer cache data at all, and server scalability will be shot to
>> >>>>>>> smithereens anyway.
>> >>>>>>>
>> >>>>>>> Trond
>> >>>>>>>
>> >>>>>>>> Benny
>> >>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Client 1 Client 2
>> >>>>>>>>> ======== ========
>> >>>>>>>>>
>> >>>>>>>>> OPEN foo
>> >>>>>>>>> READ
>> >>>>>>>>> CLOSE
>> >>>>>>>>> OPEN
>> >>>>>>>>> LAYOUTGET ...
>> >>>>>>>>> WRITE via DS
>> >>>>>>>>> <dies>...
>> >>>>>>>>> OPEN foo
>> >>>>>>>>> verify change_attr
>> >>>>>>>>> READ if above WRITE is visible
>> >>>>>>>>> CLOSE
>> >>>>>>>>>
>> >>>>>>>>> Trond
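Trond's scenario hinges on how a client decides, at OPEN, whether data it cached under an earlier OPEN is still valid. Below is a minimal sketch of close-to-open revalidation, assuming the change attribute is the only validator (a real client tracks more state than this); `CloseToOpenCache`, `on_close` and `on_open` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedFile:
    data: bytes
    change_attr: int  # change attribute observed when the data was cached


class CloseToOpenCache:
    """Toy client cache: data cached under one OPEN is reused on the next
    OPEN only if the server reports the same change attribute."""

    def __init__(self) -> None:
        self.files: Dict[str, CachedFile] = {}

    def on_close(self, name: str, data: bytes, change_attr: int) -> None:
        self.files[name] = CachedFile(data, change_attr)

    def on_open(self, name: str, server_change_attr: int) -> Optional[bytes]:
        cached = self.files.get(name)
        if cached is not None and cached.change_attr == server_change_attr:
            return cached.data  # still valid: no READ needed
        self.files.pop(name, None)  # stale or missing: must READ again
        return None


cache = CloseToOpenCache()
cache.on_close("foo", b"old contents", change_attr=7)
# If client 2's DS WRITE became visible but the server left change_attr at 7,
# this OPEN would wrongly reuse b"old contents".
print(cache.on_open("foo", server_change_attr=7))
```

The equality check is the only thing standing between the client and a stale read, which is why a visible data change that does not move the change attribute defeats caching entirely.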
>> >>>>>>>>> _______________________________________________
>> >>>>>>>>> nfsv4 mailing list
>> >>>>>>>>> [email protected]
>> >>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> >> the body of a message to [email protected]
>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> >
>>
>
>
>
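Trond's proposed OPEN/LOCK/WANT_DELEGATION rule, combined with the suggestion that a server assume an abandoned write layout was used, can be sketched as below. Assumptions: the server cannot observe DS writes directly, so it treats a client as possibly dirty from the time it is granted a write layout until its next LAYOUTCOMMIT (a simplification that ignores writes issued after that LAYOUTCOMMIT); all function and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FileState:
    change_attr: int = 0
    # Clients holding a write layout with no LAYOUTCOMMIT received since.
    possibly_dirty: Set[str] = field(default_factory=set)


def grant_write_layout(state: FileState, client: str) -> None:
    state.possibly_dirty.add(client)


def layoutcommit(state: FileState, client: str) -> None:
    # Attributes now describe the client's writes.
    state.change_attr += 1
    state.possibly_dirty.discard(client)


def reclaim_abandoned_layout(state: FileState, client: str) -> None:
    # Client died and never reclaimed: assume it wrote. At worst this is a
    # false positive (a needless cache invalidation on other clients).
    if client in state.possibly_dirty:
        state.change_attr += 1
        state.possibly_dirty.discard(client)


def change_attr_for_open(state: FileState, requester: str) -> int:
    """Change attribute to report on OPEN, LOCK or WANT_DELEGATION.
    If another client may hold uncommitted writes, move the attribute so
    the requester cannot mistake possibly changed data for unchanged data."""
    if state.possibly_dirty - {requester}:
        state.change_attr += 1
    return state.change_attr


s = FileState()
grant_write_layout(s, "client2")
print(change_attr_for_open(s, "client2"))  # 0: the writer itself is not affected
print(change_attr_for_open(s, "client1"))  # 1
```

The check runs only on OPEN, LOCK and WANT_DELEGATION, not on every GETATTR, which keeps the cost in line with the condition Benny attached to the rule.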



-- 
Best Regards

Sorin Faibish
Corporate Distinguished Engineer
Network Storage Group

EMC²
where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : [email protected]

2010-07-02 21:46:49

by Daniel.Muntz

[permalink] [raw]
Subject: RE: 4.1 client - LAYOUTCOMMIT & close

By "extremely lame server" I assume you mean any pNFS server that
doesn't have a cluster FS on the back end. So while this might work
well for NetApp (as long as NetApp never ships a non-clustered pNFS), it
might break others, or at least severely impact their performance. For
example, will the Solaris pNFS server work correctly without
LAYOUTCOMMIT? IMHO, the client MUST issue the appropriate LAYOUTCOMMIT,
but the server is free to handle it as a no-op if the server
implementation does not need to utilize the payload.

-Dan
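Dan's position (the client always sends LAYOUTCOMMIT; a server that does not need the payload may treat it as a no-op) and Andy's remark about loca_last_write_offset can be combined in a small server-side sketch. Assumptions: only the last-write-offset part of LAYOUTCOMMIT4args (RFC 5661) is modeled, with `None` standing in for "no new offset"; `mds_tracks_ds_writes` is a hypothetical flag for a server whose back end already reports DS writes to the MDS; the size update uses the usual "last byte written plus one" rule, and RFC 5661 section 18.42 should be checked for the exact requirements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayoutCommitArgs:
    # Offset of the last byte the client wrote, or None if no new offset was sent.
    last_write_offset: Optional[int]


@dataclass
class MdsFile:
    size: int
    change_attr: int = 0


def handle_layoutcommit(f: MdsFile, args: LayoutCommitArgs,
                        mds_tracks_ds_writes: bool) -> Optional[int]:
    """Process a LAYOUTCOMMIT; return the new size if it grew, else None."""
    if mds_tracks_ds_writes:
        # Back end already keeps size and change current: accept as a no-op.
        return None
    # The writes covered by this LAYOUTCOMMIT become visible now.
    f.change_attr += 1
    if args.last_write_offset is not None:
        new_size = args.last_write_offset + 1  # offset of last byte, plus one
        if new_size > f.size:
            f.size = new_size
            return new_size
    return None


f = MdsFile(size=100)
print(handle_layoutcommit(f, LayoutCommitArgs(4095), mds_tracks_ds_writes=False))  # 4096
```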

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Andy Adamson
> Sent: Friday, July 02, 2010 8:41 AM
> To: Sandeep Joshi
> Cc: [email protected]; [email protected]
> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
>
>
> On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:
>
> Hi Sandeep
>
> >
> > In certain cases, I don't see layoutcommit on a file at all even
> > after doing many writes.
>
> FYI:
>
> You should not be paying attention to layoutcommits - they have no
> value for the file layout type.
>
> From RFC 5661:
>
> "The LAYOUTCOMMIT operation commits changes in the layout represented
> by the current filehandle, client ID (derived from the session ID in
> the preceding SEQUENCE operation), byte-range, and stateid."
>
> For the block layout type, this sentence has meaning in that there is
> a layoutupdate4 payload that enumerates the blocks that have changed
> state from being 'handed out' to being 'written'.
>
> The file layout type has no layoutupdate4 payload, and the layout does
> not change due to writes, and thus the LAYOUTCOMMIT call is useless.
>
> The only field in the LAYOUTCOMMIT4args that might possibly be useful
> is the loca_last_write_offset which tells the server what the client
> thinks is the EOF of the file after WRITE. It is an extremely lame
> server (file layout type server) that depends upon clients for this
> info.
>
> >
> >
> >
> > Client side operations:
> >
> > open
> > write(s)
> > close
> >
> >
> > On server side (observed operations):
> >
> > open
> > layoutget's
> > close
> >
> >
> > But, I do not see laycommit at all. In terms of data written by the client,
> > it is about 4-5MB.
> >
> > When does client issue laycommit?
>
> The latest linux client sends a layout commit when the VFS does a
> super_operations.write_inode call which happens when the metadata of
> an inode needs updating. We are seriously considering removing the
> layoutcommit call from the file layout client.
>
> -->Andy
>
> >
> >
> > regards,
> >
> > Sandeep
> >
>
>
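Sandeep's trace (OPEN, WRITEs, CLOSE, no LAYOUTCOMMIT) is what one would expect from a client that only sends LAYOUTCOMMIT from inode metadata writeback, as Andy describes. A client that instead treats fsync and close as explicit flush points might track DS writes roughly as follows. This is a hypothetical model, not the Linux client's code; `ToyClient`, `PnfsInode`, `ds_write` and `flush_point` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PnfsInode:
    # Highest byte offset written through a DS since the last LAYOUTCOMMIT.
    last_write_offset: Optional[int] = None


@dataclass
class ToyClient:
    # last_write_offset carried by each LAYOUTCOMMIT this client "sent".
    sent: List[int] = field(default_factory=list)

    def ds_write(self, inode: PnfsInode, offset: int, length: int) -> None:
        if length <= 0:
            return
        end = offset + length - 1
        if inode.last_write_offset is None or end > inode.last_write_offset:
            inode.last_write_offset = end

    def flush_point(self, inode: PnfsInode) -> bool:
        """Call at fsync, close, or metadata writeback. Sends at most one
        LAYOUTCOMMIT covering every DS write since the previous one."""
        if inode.last_write_offset is None:
            return False  # nothing written through a DS
        self.sent.append(inode.last_write_offset)
        inode.last_write_offset = None
        return True


client, inode = ToyClient(), PnfsInode()
client.ds_write(inode, 0, 4096)
client.ds_write(inode, 4096, 4096)
client.flush_point(inode)   # close(): one LAYOUTCOMMIT
print(client.sent)          # [8191]
```

Batching all DS writes into a single LAYOUTCOMMIT per flush point keeps the MDS traffic to one extra round trip per close or fsync, regardless of how many WRITEs went to the DSes.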

2010-07-06 19:20:50

by Daniel.Muntz

[permalink] [raw]
Subject: RE: 4.1 client - LAYOUTCOMMIT & close

The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
point, so even if the non-clustered server does not want to update
metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
execute whatever synchronization mechanism the implementer wishes to put
in the control protocol.

-Dan
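Dan's distinction between a COMMIT to a DS (data durability) and the metadata update on the MDS can be made concrete with two independent pieces of state. In this sketch, `MetadataServer.layoutcommit` stands in for whatever control-protocol synchronization an implementer might trigger at LAYOUTCOMMIT; it is a hypothetical stand-in, not a defined protocol, and all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataServer:
    unstable: Dict[int, bytes] = field(default_factory=dict)  # offset -> data
    stable: Dict[int, bytes] = field(default_factory=dict)

    def write_unstable(self, offset: int, data: bytes) -> None:
        self.unstable[offset] = data

    def commit(self) -> None:
        # COMMIT to this DS: the data becomes durable here and nowhere else.
        self.stable.update(self.unstable)
        self.unstable.clear()

    def stable_end(self) -> int:
        return max((off + len(d) for off, d in self.stable.items()), default=0)


@dataclass
class MetadataServer:
    size: int = 0
    change_attr: int = 0

    def layoutcommit(self, dses: List[DataServer]) -> None:
        # Hypothetical control-protocol step: fold what the DSes hold stably
        # into the MDS attributes in a single update.
        self.size = max([self.size] + [ds.stable_end() for ds in dses])
        self.change_attr += 1


ds, mds = DataServer(), MetadataServer()
ds.write_unstable(0, b"x" * 4096)
ds.commit()
print(mds.size)        # 0: data is durable on the DS, metadata unchanged
mds.layoutcommit([ds])
print(mds.size)        # 4096
```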

> -----Original Message-----
> From: Andy Adamson [mailto:[email protected]]
> Sent: Tuesday, July 06, 2010 6:38 AM
> To: Muntz, Daniel
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
>
>
> On Jul 2, 2010, at 5:46 PM, <[email protected]> wrote:
>
> > By "extremely lame server" I assume you mean any pNFS server that
> > doesn't have a cluster FS on the back end.
>
> No, I mean a pNFS file layout type server that depends upon the 'hint'
> of file size given by LAYOUTCOMMIT. This does not mean that the file
> system has to be a cluster FS.
>
> If COMMIT through MDS is set, the MDS to DS protocol (be it a cluster
> FS or not) ensures the data is "committed" on the DSs. LAYOUTCOMMIT is
> not needed.
>
> If COMMITs are sent to the DSs (or FILE_SYNC writes), then the MDS to
> DS protocol (be it a cluster FS or not) should kick off a back-end DS
> to MDS communication to update the file size on the MDS.
>
> What I consider an 'extremely lame pNFS file layout server' is one
> that requires COMMITs to the DS and then depends upon the LAYOUTCOMMIT
> to communicate the committed data size to the MDS.
>
> -->Andy
>
>
> > So while this might work
> > well for NetApp (as long as NetApp never ships a non-clustered pNFS), it
> > might break others, or at least severely impact their performance. For
> > example, will the Solaris pNFS server work correctly without
> > LAYOUTCOMMIT? IMHO, the client MUST issue the appropriate LAYOUTCOMMIT,
> > but the server is free to handle it as a no-op if the server
> > implementation does not need to utilize the payload.
> >
> > -Dan
> >
>
>
>
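Andy's three cases can be summarised as a small classification function. `NFL4_UFLG_COMMIT_THRU_MDS` is the file layout's nfl_util flag from RFC 5661; `ds_notifies_mds` is a hypothetical stand-in for a back-end DS-to-MDS update path, and the returned labels are illustrative only. This only says where the MDS can learn the new size; it does not settle Dan's argument that the client should send LAYOUTCOMMIT anyway.

```python
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002  # nfl_util flag in the file layout (RFC 5661)


def size_update_path(nfl_util: int, ds_notifies_mds: bool) -> str:
    """Where a file layout MDS learns about committed data, per Andy's cases."""
    if nfl_util & NFL4_UFLG_COMMIT_THRU_MDS:
        # COMMIT goes to the MDS, and the MDS-to-DS protocol settles the data.
        return "commit-through-mds"
    if ds_notifies_mds:
        # COMMITs (or FILE_SYNC writes) go to the DSes, and the back end
        # pushes the new size to the MDS on its own.
        return "ds-to-mds-backend"
    # Only LAYOUTCOMMIT tells the MDS: the case Andy calls "extremely lame".
    return "layoutcommit-only"


for util, notify in [(0x2, False), (0x0, True), (0x0, False)]:
    print(hex(util), notify, size_update_path(util, notify))
```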

2010-07-07 12:05:29

by Benny Halevy

[permalink] [raw]
Subject: Re: 4.1 client - LAYOUTCOMMIT & close

On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
>> point, so even if the non-clustered server does not want to update
>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
>> execute whatever synchronization mechanism the implementer wishes to put
>> in the control protocol.
>
> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> pNFS servers to break the rule that any visible change to the data must
> be atomically accompanied with a change attribute update.
>

Trond, I'm not sure how this rule you mentioned is specified.

See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
in particular:

For some layout protocols, the storage device is able to notify the
metadata server of the occurrence of an I/O; as a result, the change
and time_modify attributes may be updated at the metadata server.
For a metadata server that is capable of monitoring updates to the
change and time_modify attributes, LAYOUTCOMMIT processing is not
required to update the change attribute. In this case, the metadata
server must ensure that no further update to the data has occurred
since the last update of the attributes; file-based protocols may
have enough information to make this determination or may update the
change attribute upon each file modification. This also applies for
the time_modify attribute. If the server implementation is able to
determine that the file has not been modified since the last
time_modify update, the server need not update time_modify at
LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
should be visible if that file was modified since the latest previous
LAYOUTCOMMIT or LAYOUTGET

> As I see it, if your server allows one client to read data that may have
> been modified by another client that holds a WRITE layout for that range
> then (since that is a visible data change) it should provide a change
> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> sent.

the requirement for the server in WRITE's implementation section
is quite weak: "It is assumed that the act of writing data to a file will
cause the time_modified and change attributes of the file to be updated."

The difference here is that for pNFS the written data is not guaranteed
to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
are caching dirty data and use a write-behind cache, application-written data
may be visible to other processes on the same host but not to others until
fsync() or close() - open-to-close semantics are the only thing the client
guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensure the
data is committed to stable storage and is visible to all other clients in
the cluster.

Benny

> If your MDS is incapable of determining whether or not data has changed
> on the DSes, then it should probably recall the WRITE layout if someone
> tries to read data that may have been modified. Said server also needs a
> strategy for determining if a data change occurred if the client that
> held the WRITE layout died before it could send the LAYOUTCOMMIT.
>
> Cheers
> Trond
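Trond's fallback for an MDS that cannot tell whether DS data changed (recall the WRITE layout before letting another client read, and cope with a holder that died) might look like the sketch below. Assumptions: CB_LAYOUTRECALL is only recorded, not sent; the give-up deadline of two lease times follows Benny's estimate elsewhere in this thread; a returned or abandoned layout is conservatively counted as a data change; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WriteLayout:
    recall_sent_at: Optional[float] = None  # when CB_LAYOUTRECALL went out


@dataclass
class MdsFileState:
    change_attr: int = 0
    write_layouts: Dict[str, WriteLayout] = field(default_factory=dict)


def ready_for_read(state: MdsFileState, reader: str, now: float,
                   lease_time: float) -> bool:
    """Return True once the change attribute can be trusted for `reader`."""
    waiting = False
    for holder, layout in list(state.write_layouts.items()):
        if holder == reader:
            continue
        if layout.recall_sent_at is None:
            layout.recall_sent_at = now   # issue CB_LAYOUTRECALL (not modeled)
            waiting = True
        elif now - layout.recall_sent_at >= 2 * lease_time:
            # Holder never answered: assume it wrote, then reclaim the layout.
            state.change_attr += 1
            del state.write_layouts[holder]
        else:
            waiting = True
    return not waiting


def layoutreturn(state: MdsFileState, holder: str) -> None:
    # Without DS visibility the server cannot tell whether the holder wrote,
    # so count the return as a change (a false positive at worst).
    if state.write_layouts.pop(holder, None) is not None:
        state.change_attr += 1


st = MdsFileState(write_layouts={"client2": WriteLayout()})
print(ready_for_read(st, "client1", now=0.0, lease_time=90.0))  # False: recall sent
layoutreturn(st, "client2")
print(ready_for_read(st, "client1", now=1.0, lease_time=90.0))  # True
print(st.change_attr)                                           # 1
```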
>
>> -Dan
>>
>>> -----Original Message-----
>>> From: Andy Adamson [mailto:[email protected]]
>>> Sent: Tuesday, July 06, 2010 6:38 AM
>>> To: Muntz, Daniel
>>> Cc: [email protected]; [email protected]; [email protected]
>>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
>>>
>>>
>>> On Jul 2, 2010, at 5:46 PM, <[email protected]> wrote:
>>>
>>>> By "extremely lame server" I assume you mean any pNFS server that
>>>> doesn't have a cluster FS on the back end.
>>>
>>> No, I mean a pNFS file layout type server that depends upon
>>> the 'hint'
>>> of file size given by LAYOUTCOMMIT. This does not mean that the file
>>> system has to be a cluster FS.
>>>
>>> If COMMIT through MDS is set, the MDS to DS protocol (be it a
>>> cluster
>>> FS or not) ensures the data is "commited" on the DSs.
>>> LAYOUTCOMMIT is
>>> not needed.
>>>
>>> If COMMITs are sent to the DSs (or FILE_SYNC writes), then
>>> the MDS to
>>> DS protocol (be it a cluster FS or not) should kick off a
>>> back-end DS
>>> to MDS communication to update the file size on the MDS.
>>>
>>> What I consider an 'extremely lame pNFS file layout server' is one
>>> that requires COMMITs to the DS and then depends upon the
>>> LAYOUTCOMMIT
>>> to communicate the commited data size to the MDS.
>>>
>>> -->Andy
>>>
>>>
>>>> So while this might work
>>>> well for NetApp (as long as NetApp never ships a non-clustered
>>>> pNFS), it
>>>> might break others, or at least severely impact their
>>> performance.
>>>> For
>>>> example, will the Solaris pNFS server work correctly without
>>>> LAYOUTCOMMIT? IMHO, the client MUST issue the appropriate
>>>> LAYOUTCOMMIT,
>>>> but the server is free to handle it as a no-op if the server
>>>> implementation does not need to utilize the payload.
>>>>
>>>> -Dan
>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Andy Adamson
>>>>> Sent: Friday, July 02, 2010 8:41 AM
>>>>> To: Sandeep Joshi
>>>>> Cc: [email protected]; [email protected]
>>>>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
>>>>>
>>>>>
>>>>> On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:
>>>>>
>>>>> Hi Sandeep
>>>>>
>>>>>>
>>>>>> In certain cases, I don't see layoutcommit on a file at all even
>>>>>> after doing many writes.
>>>>>
>>>>> FYI:
>>>>>
>>>>> You should not be paying attention to layoutcommits - they have no
>>>>> value for the file layout type.
>>>>>
>>>>> From RFC 5661:
>>>>>
>>>>> "The LAYOUTCOMMIT operation commits chages in the layout
>>> represented
>>>>> by the current filehandle, client ID (derived from the
>>> session ID in
>>>>> the preceding SEQUENCE operation), byte-range, and stateid."
>>>>>
>>>>> For the block layout type, this sentence has meaning in that
>>>>> there is
>>>>> a layoutupdate4 payload that enumerates the blocks that
>>> have changed
>>>>> state from being 'handed out' to being 'written'.
>>>>>
>>>>> The file layout type has no layoutupdate4 payload, and the
>>>>> layout does
>>>>> not change due to writes, and thus the LAYOUTCOMMIT call
>>> is useless.
>>>>>
>>>>> The only field in the LAYOUTCOMMIT4args that might possibly
>>>>> be useful
>>>>> is the loca_last_write_offset which tells the server what
>>> the client
>>>>> thinks is the EOF of the file after WRITE. It is an extremely lame
>>>>> server (file layout type server) that depends upon clients for this
>>>>> info.
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Client side operations:
>>>>>>
>>>>>> open
>>>>>> write(s)
>>>>>> close
>>>>>>
>>>>>>
>>>>>> On server side (observed operations):
>>>>>>
>>>>>> open
>>>>>> layoutget's
>>>>>> close
>>>>>>
>>>>>>
>>>>>> But, I do not see laycommit at all. In terms data written
>>>>> by client
>>>>>> it is about 4-5MB.
>>>>>>
>>>>>> When does client issue laycommit?
>>>>>
>>>>> The latest linux client sends a layout commit when the VFS does a
>>>>> super_operations.write_inode call which happens when the
>>> metadata of
>>>>> an inode needs updating. We are seriously considering removing the
>>>>> layoutcommit call from the file layout client.
>>>>>
>>>>> -->Andy
>>>>>
>>>>>>
>>>>>>
>>>>>> regards,
>>>>>>
>>>>>> Sandeep
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>> linux-nfs"
>>>>>> in
>>>>>> the body of a message to [email protected]
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>> linux-nfs" in
>>>>> the body of a message to [email protected]
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>
>
>
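For illustration only: a minimal Python sketch of the client-side bookkeeping behind the block-layout payload Andy describes, where LAYOUTCOMMIT reports which handed-out ranges have since been written. The class name, method names, and the (offset, length) byte-range representation are hypothetical; this is not the RFC 5663 XDR, and it does not model extent states or extent splitting.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (byte offset, length)


class BlockLayoutTracker:
    """Hypothetical client bookkeeping for one block-layout write layout."""

    def __init__(self, handed_out: List[Extent]) -> None:
        self.handed_out = sorted(handed_out)  # extents the MDS handed out for writing
        self.written: List[Extent] = []       # sub-ranges written since last LAYOUTCOMMIT

    def record_write(self, offset: int, length: int) -> None:
        # Only the parts of a WRITE that fall inside handed-out extents change
        # state from 'handed out' to 'written'.
        for ext_off, ext_len in self.handed_out:
            start = max(offset, ext_off)
            end = min(offset + length, ext_off + ext_len)
            if start < end:
                self.written.append((start, end - start))

    def layoutupdate_payload(self) -> List[Extent]:
        """Return coalesced written ranges for the next LAYOUTCOMMIT, then reset."""
        merged: List[Extent] = []
        for off, length in sorted(self.written):
            if merged and off <= merged[-1][0] + merged[-1][1]:
                # Overlapping or adjacent: extend the previous range.
                prev_off, prev_len = merged[-1]
                merged[-1] = (prev_off, max(prev_len, off + length - prev_off))
            else:
                merged.append((off, length))
        self.written.clear()
        return merged
```

For example, with extents (0, 4096) and (8192, 4096) handed out, writes at (0, 1024), (1024, 1024) and (8000, 1000) would be reported as [(0, 2048), (8192, 808)]; the part of the last write below 8192 was never handed out, so it is not reported. A file-layout client has no equivalent payload to build, which is the asymmetry Andy points at.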

2010-07-07 22:15:12

by Noveck_David

Subject: RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

> Yes. I would agree that the client cannot rely on the updates being made
> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> compliant server MUST also have a valid strategy for dealing with the
> case where the client doesn't send it.

So you are saying the updates "MUST be made visible" through the
server's valid strategy. Is that right?

And that the client cannot rely on that. Why not, if the server must
have a valid strategy?

Is this just prudent "belt and suspenders" design or what?

It seems to me that if one side here is MUST (and the spec needs to be
clearer about what might or might not constitute a valid strategy), then
the other side should be SHOULD.

If both sides are "MUST", then if things don't work out then the client
and server can equally point to one another and say "It's his fault".

Am I missing something here?



-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Trond Myklebust
Sent: Wednesday, July 07, 2010 5:01 PM
To: Muntz, Daniel
Cc: [email protected]; [email protected]; [email protected];
[email protected]; [email protected]; [email protected]
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
> To bring this discussion full circle, since we agree that a compliant
> server can implement a scheme where written data does not become visible
> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> "MUST" from a compliant client (independent of layout type)?

Yes. I would agree that the client cannot rely on the updates being made
visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
compliant server MUST also have a valid strategy for dealing with the
case where the client doesn't send it.

Cheers
Trond

> -Dan
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]]
> > On Behalf Of Trond Myklebust
> > Sent: Wednesday, July 07, 2010 7:04 AM
> > To: Benny Halevy
> > Cc: [email protected]; [email protected]; Garth
> > Gibson; Brent Welch; NFSv4
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
> > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> > > >>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > >>>>> point, so even if the non-clustered server does not want to update
> > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > >>>>> in the control protocol.
> > > >>>>
> > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > >>>> be atomically accompanied with a change attribute update.
> > > >>>>
> > > >>>
> > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > >>>
> > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > >>> in particular:
> > > >>>
> > > >>> For some layout protocols, the storage device is able to notify the
> > > >>> metadata server of the occurrence of an I/O; as a result, the change
> > > >>> and time_modify attributes may be updated at the metadata server.
> > > >>> For a metadata server that is capable of monitoring updates to the
> > > >>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > >>> required to update the change attribute. In this case, the metadata
> > > >>> server must ensure that no further update to the data has occurred
> > > >>> since the last update of the attributes; file-based protocols may
> > > >>> have enough information to make this determination or may update the
> > > >>> change attribute upon each file modification. This also applies for
> > > >>> the time_modify attribute. If the server implementation is able to
> > > >>> determine that the file has not been modified since the last
> > > >>> time_modify update, the server need not update time_modify at
> > > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > > >>> should be visible if that file was modified since the latest previous
> > > >>> LAYOUTCOMMIT or LAYOUTGET.
> > > >>
> > > >> I know. However the above paragraph does not state that the server
> > > >> should make those changes visible to clients other than the one that is
> > > >> writing.
> > > >>
> > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > >> change attributes to be updated (if and only if the file data is
> > > >> modified). Several other sections rely on this behaviour, including
> > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > >>
> > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > >> 13.10, which states that clients can't expect to see changes
> > > >> immediately, but that they must be able to expect close-to-open
> > > >> semantics to work. Again, if this is to be the case, then the server
> > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > >> issue the LAYOUTCOMMIT.
> > >
> > > Agreed.
> > >
> > > >>
> > > >>
> > > >>>> As I see it, if your server allows one client to read data that may have
> > > >>>> been modified by another client that holds a WRITE layout for that range
> > > >>>> then (since that is a visible data change) it should provide a change
> > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > >>>> sent.
> > > >>>
> > > >>> The requirement for the server in WRITE's implementation section
> > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > >>> cause the time_modified and change attributes of the file to be updated."
> > > >>>
> > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > >>> may be visible to other processes on the same host but not to others until
> > > >>> fsync() or close(); close-to-open semantics are the only thing the client
> > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > > >>> data is committed to stable storage and is visible to all other clients in
> > > >>> the cluster.
> > > >>
> > > >> See above. I'm not disputing your statement that 'the written data is
> > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > >> assumption that 'the written data may be visible without an accompanying
> > > >> change attribute update'.
> > > >
> > > >
> > > > In other words, I'd expect the following scenario to give the same
> > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > >
> > > That's a strong requirement that may limit the scalability of the server.
> > >
> > > The spirit of the pNFS operations, at least from the Panasas perspective, was that
> > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > to clients other than the one who wrote it, and its associated metadata MUST
> > > be updated and describe the new data only on LAYOUTCOMMIT; until then it's
> > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > >
> > > Without locking, what do the stronger semantics buy you?
> > > Even if a client verified the change_attribute, new data may become visible
> > > at any time after the GETATTR if the file/byte range aren't locked.
> >
> > There is no locking needed in the scenario below: it is ordinary
> > close-to-open semantics.
> >
> > The point is that if you remove the one and only way that clients have
> > to determine whether or not their data caches are valid, then they can
> > no longer cache data at all, and server scalability will be shot to
> > smithereens anyway.
> >
> > Trond
> >
> > > Benny
> > >
> > > >
> > > > Client 1                    Client 2
> > > > ========                    ========
> > > >
> > > >                             OPEN foo
> > > >                             READ
> > > >                             CLOSE
> > > > OPEN
> > > > LAYOUTGET ...
> > > > WRITE via DS
> > > > <dies>...
> > > >                             OPEN foo
> > > >                             verify change_attr
> > > >                             READ if above WRITE is visible
> > > >                             CLOSE
> > > >
> > > > Trond
> > > > _______________________________________________
> > > > nfsv4 mailing list
> > > > [email protected]
> > > > https://www.ietf.org/mailman/listinfo/nfsv4
> >
> >
> >
> >
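A toy Python model of Trond's close-to-open scenario above, assuming a client that revalidates its cache purely by comparing the change attribute at OPEN. All names (Server, CachingClient, run_scenario, bump_change) are hypothetical. With bump_change=False the DS write becomes visible on the server without a change-attribute update, so Client 2's check passes and it returns stale cached data, which is the failure Trond is arguing against.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Server:
    """Toy MDS: file contents plus a change attribute per file."""
    data: Dict[str, bytes] = field(default_factory=dict)
    change: Dict[str, int] = field(default_factory=dict)

    def ds_write(self, name: str, payload: bytes, bump_change: bool) -> None:
        # A WRITE via a data server. Whether the change attribute moves with
        # the now-visible data is the point under debate.
        self.data[name] = payload
        if bump_change:
            self.change[name] = self.change.get(name, 0) + 1


@dataclass
class CachingClient:
    """Client 2: caches data and revalidates it at OPEN (close-to-open)."""
    server: Server
    cache: Dict[str, bytes] = field(default_factory=dict)
    cached_change: Dict[str, int] = field(default_factory=dict)

    def open_and_read(self, name: str) -> bytes:
        current = self.server.change.get(name, 0)  # "verify change_attr"
        if self.cached_change.get(name) != current:
            # Attribute moved (or nothing cached yet): READ from the server.
            self.cache[name] = self.server.data.get(name, b"")
            self.cached_change[name] = current
        return self.cache[name]


def run_scenario(bump_change: bool) -> bytes:
    srv = Server(data={"foo": b"old"}, change={"foo": 1})
    client2 = CachingClient(srv)
    client2.open_and_read("foo")              # Client 2: OPEN foo, READ, CLOSE
    srv.ds_write("foo", b"new", bump_change)  # Client 1: LAYOUTGET, WRITE via DS, dies
    return client2.open_and_read("foo")       # Client 2: OPEN foo, verify change_attr, READ


print(run_scenario(bump_change=True))   # b'new'
print(run_scenario(bump_change=False))  # b'old': stale cache, data changed silently
```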




2010-07-07 23:09:09

by Myklebust, Trond

Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> On Wed, 2010-07-07 at 18:44 -0400, [email protected] wrote:
> > Let me try this ...
> >
> > A correct client will always send LAYOUTCOMMIT.
> > Assume that the client is correct.
> > Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> >
> > Important implication: No LAYOUTCOMMIT is an error/failure case. It
> > just has to work; it doesn't have to be fast.
> >
> > Suggestion: If a client dies while holding writeable layouts that permit
> > write-in-place, and the client doesn't reappear or doesn't reclaim those
> > layouts, then the server should assume that the files involved were
> > written before the client died, and set the file attributes accordingly
> > as part of internally reclaiming the layout that the client has
> > abandoned.
> >
> > Caveat: It may take a while for the server to determine that the client
> > has abandoned a layout.
> >
> > This can result in false positives (file appears to be modified when it
> > wasn't) but won't yield false negatives (file does not appear to be
> > modified even though it was modified).
>
> OK... So we're going to have to turn off client side file caching
> entirely for pNFS? I can do that...
>
> The above won't work. Think readahead...

So... what can work is if you modify it to work explicitly for close-to-open:

"Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
check that it has received LAYOUTCOMMITs from any other clients that may
have the file open for writing. If it hasn't, then it MUST take some
action to ensure that any file data changes are accompanied by a change
attribute update."

Then you can add the above suggestion without the offending caveat. Note
however that it does break the "SHOULD NOT" admonition in section
18.32.4.
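A minimal Python sketch of Trond's proposed OPEN-time rule, using David's conservative fallback as the "some action": if another client holds a write layout (and so may have written via the DSes without a LAYOUTCOMMIT), or abandons one when its lease expires, the server bumps the change attribute. The class and method names are hypothetical, and the model is simplified: it tracks whole files only and treats any outstanding write layout as possibly uncommitted, since an MDS that cannot see DS writes has no finer information. It accepts false positives (a bump with no actual write) but does not miss a write made before the OPEN.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FileState:
    change: int = 0
    write_layout_holders: Set[str] = field(default_factory=set)


class MetadataServer:
    """Hypothetical MDS that cannot see DS writes directly."""

    def __init__(self) -> None:
        self.files: Dict[str, FileState] = {}

    def _state(self, name: str) -> FileState:
        return self.files.setdefault(name, FileState())

    def layoutget_write(self, name: str, client: str) -> None:
        self._state(name).write_layout_holders.add(client)

    def layoutcommit(self, name: str, client: str) -> None:
        # The client reports its DS writes: make them visible with a bump.
        self._state(name).change += 1

    def layoutreturn(self, name: str, client: str) -> None:
        st = self._state(name)
        if client in st.write_layout_holders:
            st.write_layout_holders.discard(client)
            st.change += 1  # conservative: writes may not have been committed

    def lease_expired(self, client: str) -> None:
        # David's suggestion: reclaim abandoned write layouts and assume the
        # client wrote before it died.
        for st in self.files.values():
            if client in st.write_layout_holders:
                st.write_layout_holders.discard(client)
                st.change += 1

    def open(self, name: str, client: str) -> int:
        """OPEN (the same check would apply to LOCK and WANT_DELEGATION)."""
        st = self._state(name)
        if st.write_layout_holders - {client}:
            # Another client may have written via the DSes without a
            # LAYOUTCOMMIT: assume it did. False positives, no false negatives.
            st.change += 1
        return st.change
```

Unlike the readahead-sensitive version Trond objects to, the check runs only at OPEN time, so a reader that revalidates at OPEN always sees a changed attribute if a write could have happened.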

Trond


> Trond
>
> > Thanks,
> > --David
> >
>
>



2010-07-06 13:36:00

by Benny Halevy

Subject: Re: 4.1 client - LAYOUTCOMMIT & close

On Jul. 03, 2010, 0:46 +0300, <[email protected]> wrote:
> By "extremely lame server" I assume you mean any pNFS server that
> doesn't have a cluster FS on the back end. So while this might work
> well for NetApp (as long as NetApp never ships a non-clustered pNFS), it
> might break others, or at least severely impact their performance. For
> example, will the Solaris pNFS server work correctly without
> LAYOUTCOMMIT? IMHO, the client MUST issue the appropriate LAYOUTCOMMIT,
> but the server is free to handle it as a no-op if the server
> implementation does not need to utilize the payload.

I completely agree.

Only with Dave Noveck's suggestion of adding "LAYOUT_{DATA,FILE}_SYNC4"
stable_how4 values (or maybe a LAYOUT_SYNC4=4 or higher power of 2 flag)
to be returned by a DS on WRITE could the DS say that it ensures metadata
synchronization with the MDS in a cluster-coherent way, so that the client
can relax and avoid sending LAYOUTCOMMIT to the MDS.

Otherwise, the Linux implementation can potentially support a mount option
telling the client not to send a LAYOUTCOMMIT to the MDS as an optimization
if the admin is sure that the server doesn't require it.
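A Python sketch of the client-side decision outlined above. StableHow4 mirrors the NFSv4 stable_how4 values (UNSTABLE4=0, DATA_SYNC4=1, FILE_SYNC4=2) that a DS returns in the WRITE reply's committed field. LAYOUT_SYNC4 is only the flag proposed in this thread and is not part of RFC 5661, and skip_layoutcommit stands in for the mount option; both are hypothetical here.

```python
from enum import IntEnum
from typing import Iterable


class StableHow4(IntEnum):
    # stable_how4 values as defined for NFSv4 WRITE.
    UNSTABLE4 = 0
    DATA_SYNC4 = 1
    FILE_SYNC4 = 2


# HYPOTHETICAL: the flag proposed in this thread; not part of RFC 5661.
LAYOUT_SYNC4 = 4


def needs_layoutcommit(committed: Iterable[int], skip_layoutcommit: bool = False) -> bool:
    """Decide whether to send LAYOUTCOMMIT after WRITEs through the DSes.

    committed: the 'committed' value each DS returned for this file's WRITEs.
    skip_layoutcommit: stands in for a hypothetical admin mount option.
    """
    replies = list(committed)
    if skip_layoutcommit or not replies:
        return False
    # Only skip LAYOUTCOMMIT if every DS promised MDS synchronization.
    return not all(reply & LAYOUT_SYNC4 for reply in replies)
```

Requiring the flag on every reply, rather than any one, keeps the default conservative: a single DS that did not promise MDS synchronization is enough to fall back to sending LAYOUTCOMMIT.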

Benny


>
> -Dan
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Andy Adamson
>> Sent: Friday, July 02, 2010 8:41 AM
>> To: Sandeep Joshi
>> Cc: [email protected]; [email protected]
>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
>>
>>
>> On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:
>>
>> Hi Sandeep
>>
>>>
>>> In certain cases, I don't see layoutcommit on a file at all even
>>> after doing many writes.
>>
>> FYI:
>>
>> You should not be paying attention to layoutcommits - they have no
>> value for the file layout type.
>>
>> From RFC 5661:
>>
>> "The LAYOUTCOMMIT operation commits changes in the layout represented
>> by the current filehandle, client ID (derived from the session ID in
>> the preceding SEQUENCE operation), byte-range, and stateid."
>>
>> For the block layout type, this sentence has meaning in that there is
>> a layoutupdate4 payload that enumerates the blocks that have changed
>> state from being 'handed out' to being 'written'.
>>
>> The file layout type has no layoutupdate4 payload, and the layout does
>> not change due to writes, and thus the LAYOUTCOMMIT call is useless.
>>
>> The only field in the LAYOUTCOMMIT4args that might possibly be useful
>> is the loca_last_write_offset which tells the server what the client
>> thinks is the EOF of the file after WRITE. It is an extremely lame
>> server (file layout type server) that depends upon clients for this
>> info.
>>
>>>
>>>
>>>
>>> Client side operations:
>>>
>>> open
>>> write(s)
>>> close
>>>
>>>
>>> On server side (observed operations):
>>>
>>> open
>>> layoutget's
>>> close
>>>
>>>
>>> But I do not see laycommit at all. In terms of data written by the
>>> client, it is about 4-5MB.
>>>
>>> When does client issue laycommit?
>>
>> The latest Linux client sends a layout commit when the VFS does a
>> super_operations.write_inode call, which happens when the metadata of
>> an inode needs updating. We are seriously considering removing the
>> layoutcommit call from the file layout client.
>>
>> -->Andy
>>
>>>
>>>
>>> regards,
>>>
>>> Sandeep
>>>
>>
>>
>>

2010-07-07 13:06:42

by Trond Myklebust

Subject: Re: 4.1 client - LAYOUTCOMMIT & close

On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> > On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> >> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> >> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> >> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> >> point, so even if the non-clustered server does not want to update
> >> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> >> execute whatever synchronization mechanism the implementer wishes to put
> >> in the control protocol.
> >
> > As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > pNFS servers to break the rule that any visible change to the data must
> > be atomically accompanied with a change attribute update.
> >
>
> Trond, I'm not sure how this rule you mentioned is specified.
>
> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> in particular:
>
> For some layout protocols, the storage device is able to notify the
> metadata server of the occurrence of an I/O; as a result, the change
> and time_modify attributes may be updated at the metadata server.
> For a metadata server that is capable of monitoring updates to the
> change and time_modify attributes, LAYOUTCOMMIT processing is not
> required to update the change attribute. In this case, the metadata
> server must ensure that no further update to the data has occurred
> since the last update of the attributes; file-based protocols may
> have enough information to make this determination or may update the
> change attribute upon each file modification. This also applies for
> the time_modify attribute. If the server implementation is able to
> determine that the file has not been modified since the last
> time_modify update, the server need not update time_modify at
> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> should be visible if that file was modified since the latest previous
> LAYOUTCOMMIT or LAYOUTGET.

I know. However the above paragraph does not state that the server
should make those changes visible to clients other than the one that is
writing.

Section 18.32.4 states that writes will cause the time_modified and
change attributes to be updated (if and only if the file data is
modified). Several other sections rely on this behaviour, including
section 10.3.1, section 11.7.2.2, and section 11.7.7.

The only 'special behaviour' that I see allowed for pNFS is in section
13.10, which states that clients can't expect to see changes
immediately, but that they must be able to expect close-to-open
semantics to work. Again, if this is to be the case, then the server
_must_ be able to deal with the case where client 1 dies before it can
issue the LAYOUTCOMMIT.
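A minimal Python model of the RFC 5661 section 12.5.4.1 text Benny quotes above, assuming an MDS that storage devices notify about writes. When notified, it updates change and time_modify immediately, and LAYOUTCOMMIT bumps them only if the file was modified since the last attribute update. The class, its fields, and mark_modified() are hypothetical; how the MDS learns of an unreported modification is left to the control protocol.

```python
from dataclasses import dataclass


@dataclass
class NotifiedMDS:
    """Hypothetical MDS that storage devices notify about I/O."""
    change: int = 0
    time_modify: float = 0.0
    modified_since_attr_update: bool = False

    def on_ds_write_notification(self, now: float) -> None:
        # Notified of a write: update both attributes immediately.
        self.change += 1
        self.time_modify = now
        self.modified_since_attr_update = False

    def mark_modified(self) -> None:
        # The MDS learned (by some control-protocol means) that data changed
        # without the attributes having been updated yet.
        self.modified_since_attr_update = True

    def layoutcommit(self, now: float) -> None:
        # Attributes are bumped only if the file changed since their last update.
        if self.modified_since_attr_update:
            self.change += 1
            self.time_modify = now
            self.modified_since_attr_update = False
```

Note that nothing here says when other clients see the new data; that is exactly the gap Trond identifies in the paragraph.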


> > As I see it, if your server allows one client to read data that may have
> > been modified by another client that holds a WRITE layout for that range
> > then (since that is a visible data change) it should provide a change
> > attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > sent.
>
> The requirement for the server in WRITE's implementation section
> is quite weak: "It is assumed that the act of writing data to a file will
> cause the time_modified and change attributes of the file to be updated."
>
> The difference here is that for pNFS the written data is not guaranteed
> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> are caching dirty data and use a write-behind cache, application-written data
> may be visible to other processes on the same host but not to others until
> fsync() or close(); close-to-open semantics are the only thing the client
> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> data is committed to stable storage and is visible to all other clients in
> the cluster.

See above. I'm not disputing your statement that 'the written data is
not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
assumption that 'the written data may be visible without an accompanying
change attribute update'.
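A Python sketch of the client behaviour Benny describes: track the highest byte written through the DSes and send a LAYOUTCOMMIT on fsync() and close(), carrying that value as loca_last_write_offset (which, as I read RFC 5661, is the offset of the last byte written). The class is hypothetical, the sent list stands in for RPCs to the MDS, and the DS COMMITs that would precede the LAYOUTCOMMIT are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PnfsOpenFile:
    """Hypothetical client state for one open file holding a write layout."""
    sent: List[Tuple[str, int]] = field(default_factory=list)  # stand-in for RPCs to the MDS
    last_byte: Optional[int] = None  # highest byte written via DSes since the last LAYOUTCOMMIT

    def ds_write(self, offset: int, length: int) -> None:
        if length > 0:
            end = offset + length - 1
            self.last_byte = end if self.last_byte is None else max(self.last_byte, end)

    def _layoutcommit_if_dirty(self) -> None:
        # loca_last_write_offset: offset of the last byte written.
        if self.last_byte is not None:
            self.sent.append(("LAYOUTCOMMIT", self.last_byte))
            self.last_byte = None

    def fsync(self) -> None:
        # DS COMMITs would be sent first; omitted here.
        self._layoutcommit_if_dirty()

    def close(self) -> None:
        self._layoutcommit_if_dirty()
        self.sent.append(("CLOSE", 0))
```

For example, writes at (0, 4096) and (8192, 100) followed by fsync() and close() would send one LAYOUTCOMMIT with offset 8291 and then the CLOSE; the close() finds nothing dirty, so it does not repeat the LAYOUTCOMMIT.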

Trond

> Benny
>
> > If your MDS is incapable of determining whether or not data has changed
> > on the DSes, then it should probably recall the WRITE layout if someone
> > tries to read data that may have been modified. Said server also needs a
> > strategy for determining if a data change occurred if the client that
> > held the WRITE layout died before it could send the LAYOUTCOMMIT.
> >
> > Cheers
> > Trond
> >
> >> -Dan
> >>
> >>> -----Original Message-----
> >>> From: Andy Adamson [mailto:[email protected]]
> >>> Sent: Tuesday, July 06, 2010 6:38 AM
> >>> To: Muntz, Daniel
> >>> Cc: [email protected]; [email protected]; [email protected]
> >>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
> >>>
> >>>
> >>> On Jul 2, 2010, at 5:46 PM, <[email protected]> wrote:
> >>>
> >>>> By "extremely lame server" I assume you mean any pNFS server that
> >>>> doesn't have a cluster FS on the back end.
> >>>
> >>> No, I mean a pNFS file layout type server that depends upon the 'hint'
> >>> of file size given by LAYOUTCOMMIT. This does not mean that the file
> >>> system has to be a cluster FS.
> >>>
> >>> If COMMIT through MDS is set, the MDS to DS protocol (be it a cluster
> >>> FS or not) ensures the data is "committed" on the DSs. LAYOUTCOMMIT is
> >>> not needed.
> >>>
> >>> If COMMITs are sent to the DSs (or FILE_SYNC writes), then the MDS to
> >>> DS protocol (be it a cluster FS or not) should kick off a back-end DS
> >>> to MDS communication to update the file size on the MDS.
> >>>
> >>> What I consider an 'extremely lame pNFS file layout server' is one
> >>> that requires COMMITs to the DS and then depends upon the LAYOUTCOMMIT
> >>> to communicate the committed data size to the MDS.
> >>>
> >>> -->Andy
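
The COMMIT routing that Andy distinguishes (COMMIT through the MDS versus COMMIT to the DSs, or no COMMIT at all after FILE_SYNC writes) can be sketched as a small decision function. The constant values follow the stable_how4 enum and the file layout's nfl_util flags in RFC 5661 and are worth checking against the RFC; commit_target is a hypothetical name, and treating only FILE_SYNC4 as needing no COMMIT is a conservative simplification. Note that this only decides where COMMIT goes; whether a LAYOUTCOMMIT should follow is exactly what this thread debates.

```python
from typing import Optional

# Values from RFC 5661: stable_how4, and the nfl_util flag of the file layout.
UNSTABLE4, DATA_SYNC4, FILE_SYNC4 = 0, 1, 2
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000020


def commit_target(nfl_util: int, committed: int, ds: str, mds: str) -> Optional[str]:
    """Where (if anywhere) to send COMMIT for a WRITE that went to data server `ds`.

    `committed` is the stability level reported in the WRITE reply.
    """
    if committed == FILE_SYNC4:
        return None  # already on stable storage; no COMMIT needed
    if nfl_util & NFL4_UFLG_COMMIT_THRU_MDS:
        return mds   # the layout asks for COMMIT through the metadata server
    return ds        # otherwise COMMIT the data server that took the WRITE
```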
> >>>
> >>>
> >>>> So while this might work well for NetApp (as long as NetApp never
> >>>> ships a non-clustered pNFS), it might break others, or at least
> >>>> severely impact their performance. For example, will the Solaris pNFS
> >>>> server work correctly without LAYOUTCOMMIT? IMHO, the client MUST issue
> >>>> the appropriate LAYOUTCOMMIT, but the server is free to handle it as a
> >>>> no-op if the server implementation does not need to utilize the payload.
> >>>>
> >>>> -Dan
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: [email protected]
> >>>>> [mailto:[email protected]] On Behalf Of Andy Adamson
> >>>>> Sent: Friday, July 02, 2010 8:41 AM
> >>>>> To: Sandeep Joshi
> >>>>> Cc: [email protected]; [email protected]
> >>>>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
> >>>>>
> >>>>>
> >>>>> On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:
> >>>>>
> >>>>> Hi Sandeep
> >>>>>
> >>>>>>
> >>>>>> In certain cases, I don't see layoutcommit on a file at all even
> >>>>>> after doing many writes.
> >>>>>
> >>>>> FYI:
> >>>>>
> >>>>> You should not be paying attention to layoutcommits - they have no
> >>>>> value for the file layout type.
> >>>>>
> >>>>> From RFC 5661:
> >>>>>
> >>>>> "The LAYOUTCOMMIT operation commits changes in the layout represented
> >>>>> by the current filehandle, client ID (derived from the session ID in
> >>>>> the preceding SEQUENCE operation), byte-range, and stateid."
> >>>>>
> >>>>> For the block layout type, this sentence has meaning in that there is
> >>>>> a layoutupdate4 payload that enumerates the blocks that have changed
> >>>>> state from being 'handed out' to being 'written'.
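
A toy model of the block-layout case Andy contrasts with the file layout: the client reports extents it wrote while they were still in the "handed out" (invalid-data) state, and the server flips them to valid data. The enum values follow pnfs_block_extent_state4 from RFC 5663. The function names, the tuple representation, and the assumption that writes cover whole extents are illustrative simplifications, and the real layoutupdate4 payload is XDR-encoded rather than a Python list.

```python
from enum import IntEnum
from typing import List, Set, Tuple


class ExtentState(IntEnum):
    """Extent states of the pNFS block layout (pnfs_block_extent_state4, RFC 5663)."""
    READ_WRITE_DATA = 0  # valid data, client may read and write
    READ_DATA = 1        # valid data, read-only
    INVALID_DATA = 2     # allocated but not yet written ("handed out")
    NONE_DATA = 3        # no storage allocated (a hole)


Extent = Tuple[int, int, ExtentState]  # (offset, length, state)


def build_commit_list(extents: List[Extent], written: Set[int]) -> List[Tuple[int, int]]:
    """Client side: INVALID_DATA extents (by start offset) the client has written into."""
    return [(off, length) for off, length, state in extents
            if state == ExtentState.INVALID_DATA and off in written]


def apply_commit(extents: List[Extent], commit_list: List[Tuple[int, int]]) -> List[Extent]:
    """Server side: mark the reported extents as holding valid data."""
    committed = set(commit_list)
    return [(off, length, ExtentState.READ_WRITE_DATA if (off, length) in committed else state)
            for off, length, state in extents]
```

For the file layout there is no equivalent per-extent state to report, which is the asymmetry Andy points out.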
> >>>>>
> >>>>> The file layout type has no layoutupdate4 payload, and the layout does
> >>>>> not change due to writes, and thus the LAYOUTCOMMIT call is useless.
> >>>>>
> >>>>> The only field in the LAYOUTCOMMIT4args that might possibly be useful
> >>>>> is the loca_last_write_offset which tells the server what the client
> >>>>> thinks is the EOF of the file after WRITE. It is an extremely lame
> >>>>> server (file layout type server) that depends upon clients for this
> >>>>> info.
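
What loca_last_write_offset carries can be shown with a short sketch. RFC 5661 describes it as the offset of the last byte written, so it is at most one less than the resulting file size. NewOffset mirrors the newoffset4 union; the function names and the size-growing policy in size_after_layoutcommit are illustrative assumptions, not a description of any particular server. The server-side helper is exactly the kind of client-supplied size hint Andy argues a file-layout server should not depend on.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class NewOffset:
    """Python stand-in for the newoffset4 union in LAYOUTCOMMIT4args."""
    no_newoffset: bool
    no_offset: int = 0  # only meaningful when no_newoffset is True


def last_write_offset(writes: Iterable[Tuple[int, int]]) -> NewOffset:
    """Compute loca_last_write_offset from (offset, count) pairs sent to the DSes."""
    last: Optional[int] = None
    for offset, count in writes:
        if count <= 0:
            continue
        end = offset + count - 1  # offset of the last byte this WRITE touched
        if last is None or end > last:
            last = end
    if last is None:
        return NewOffset(no_newoffset=False)
    return NewOffset(no_newoffset=True, no_offset=last)


def size_after_layoutcommit(current_size: int, hint: NewOffset) -> int:
    """One plausible MDS policy: grow the file if the client wrote past EOF."""
    if hint.no_newoffset and hint.no_offset + 1 > current_size:
        return hint.no_offset + 1
    return current_size
```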
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Client side operations:
> >>>>>>
> >>>>>> open
> >>>>>> write(s)
> >>>>>> close
> >>>>>>
> >>>>>>
> >>>>>> On server side (observed operations):
> >>>>>>
> >>>>>> open
> >>>>>> layoutget's
> >>>>>> close
> >>>>>>
> >>>>>>
> >>>>>> But, I do not see laycommit at all. In terms of data written by client
> >>>>>> it is about 4-5MB.
> >>>>>>
> >>>>>> When does client issue laycommit?
> >>>>>
> >>>>> The latest Linux client sends a layout commit when the VFS does a
> >>>>> super_operations.write_inode call which happens when the metadata of
> >>>>> an inode needs updating. We are seriously considering removing the
> >>>>> layoutcommit call from the file layout client.
> >>>>>
> >>>>> -->Andy
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> regards,
> >>>>>>
> >>>>>> Sandeep
> >>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe
> >>>>> linux-nfs"
> >>>>>> in
> >>>>>> the body of a message to [email protected]
> >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>>>




2010-07-07 21:01:24

by Myklebust, Trond

Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
> To bring this discussion full circle, since we agree that a compliant
> server can implement a scheme where written data does not become visible
> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> "MUST" from a compliant client (independent of layout type)?

Yes. I would agree that the client cannot rely on the updates being made
visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
compliant server MUST also have a valid strategy for dealing with the
case where the client doesn't send it.
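
The client-side half of this agreement, which Benny also describes (send LAYOUTCOMMIT on fsync() and close() if anything was written through a data server), can be sketched as below. PnfsOpenFile and the send callback are hypothetical; the WRITE and COMMIT RPCs are elided, and this is not the Linux client's actual code path, which per Andy hooks super_operations.write_inode.

```python
from typing import Callable


class PnfsOpenFile:
    """Hypothetical client-side state for one file opened with a pNFS layout."""

    def __init__(self, send: Callable[..., None]) -> None:
        self._send = send                 # send(op_name, **args): stand-in for the RPC layer
        self._needs_layoutcommit = False  # DS writes not yet reported to the MDS
        self._last_byte = -1              # highest byte offset written via a data server

    def write_via_ds(self, offset: int, count: int) -> None:
        # The WRITE itself (and any COMMIT to the DS) is elided here.
        if count <= 0:
            return
        self._needs_layoutcommit = True
        self._last_byte = max(self._last_byte, offset + count - 1)

    def fsync(self) -> None:
        if self._needs_layoutcommit:
            self._send("LAYOUTCOMMIT", last_write_offset=self._last_byte)
            self._needs_layoutcommit = False

    def close(self) -> None:
        self.fsync()  # report DS writes to the MDS before CLOSE
        self._send("CLOSE")
```

Per Trond's point, the server still cannot assume this LAYOUTCOMMIT will arrive; the client crashing between write_via_ds and close is the case it must handle.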

Cheers
Trond

> -Dan
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of Trond Myklebust
> > Sent: Wednesday, July 07, 2010 7:04 AM
> > To: Benny Halevy
> > Cc: [email protected]; [email protected]; Garth Gibson; Brent Welch; NFSv4
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
> > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> > > >>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > >>>>> point, so even if the non-clustered server does not want to update
> > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > >>>>> in the control protocol.
> > > >>>>
> > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > >>>> be atomically accompanied with a change attribute update.
> > > >>>>
> > > >>>
> > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > >>>
> > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > >>> in particular:
> > > >>>
> > > >>> For some layout protocols, the storage device is able to notify the
> > > >>> metadata server of the occurrence of an I/O; as a result, the change
> > > >>> and time_modify attributes may be updated at the metadata server.
> > > >>> For a metadata server that is capable of monitoring updates to the
> > > >>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > >>> required to update the change attribute. In this case, the metadata
> > > >>> server must ensure that no further update to the data has occurred
> > > >>> since the last update of the attributes; file-based protocols may
> > > >>> have enough information to make this determination or may update the
> > > >>> change attribute upon each file modification. This also applies for
> > > >>> the time_modify attribute. If the server implementation is able to
> > > >>> determine that the file has not been modified since the last
> > > >>> time_modify update, the server need not update time_modify at
> > > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > > >>> should be visible if that file was modified since the latest previous
> > > >>> LAYOUTCOMMIT or LAYOUTGET
> > > >>
> > > >> I know. However the above paragraph does not state that the server
> > > >> should make those changes visible to clients other than the one that is
> > > >> writing.
> > > >>
> > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > >> change attributes to be updated (if and only if the file data is
> > > >> modified). Several other sections rely on this behaviour, including
> > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > >>
> > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > >> 13.10, which states that clients can't expect to see changes
> > > >> immediately, but that they must be able to expect close-to-open
> > > >> semantics to work. Again, if this is to be the case, then the server
> > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > >> issue the LAYOUTCOMMIT.
> > >
> > > Agreed.
> > >
> > > >>
> > > >>
> > > >>>> As I see it, if your server allows one client to read data that may have
> > > >>>> been modified by another client that holds a WRITE layout for that range
> > > >>>> then (since that is a visible data change) it should provide a change
> > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > >>>> sent.
> > > >>>
> > > >>> The requirement for the server in WRITE's implementation section
> > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > >>> cause the time_modified and change attributes of the file to be updated."
> > > >>>
> > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > >>> may be visible to other processes on the same host but not to others until
> > > >>> fsync() or close() - open-to-close semantics are the only thing the client
> > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > > >>> data is committed to stable storage and is visible to all other clients in
> > > >>> the cluster.
> > > >>
> > > >> See above. I'm not disputing your statement that 'the written data is
> > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > >> assumption that 'the written data may be visible without an accompanying
> > > >> change attribute update'.
> > > >
> > > >
> > > > In other words, I'd expect the following scenario to give the same
> > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > >
> > > That's a strong requirement that may limit the scalability of the server.
> > >
> > > The spirit of the pNFS operations, at least from Panasas perspective was that
> > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > to clients other than the one who wrote it, and its associated metadata MUST
> > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > >
> > > Without locking, what do the stronger semantics buy you?
> > > Even if a client verified the change_attribute, new data may become visible
> > > at any time after the GETATTR if the file/byte range aren't locked.
> >
> > There is no locking needed in the scenario below: it is ordinary
> > close-to-open semantics.
> >
> > The point is that if you remove the one and only way that clients have
> > to determine whether or not their data caches are valid, then they can
> > no longer cache data at all, and server scalability will be shot to
> > smithereens anyway.
> >
> > Trond
> >
> > > Benny
> > >
> > > >
> > > > Client 1                 Client 2
> > > > ========                 ========
> > > >
> > > >                          OPEN foo
> > > >                          READ
> > > >                          CLOSE
> > > > OPEN
> > > > LAYOUTGET ...
> > > > WRITE via DS
> > > > <dies>...
> > > >                          OPEN foo
> > > >                          verify change_attr
> > > >                          READ if above WRITE is visible
> > > >                          CLOSE
> > > >
> > > > Trond
> > > > _______________________________________________
> > > > nfsv4 mailing list
> > > > [email protected]
> > > > https://www.ietf.org/mailman/listinfo/nfsv4



2010-07-07 13:52:02

by Benny Halevy

Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
>>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
>>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
>>>>> point, so even if the non-clustered server does not want to update
>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
>>>>> execute whatever synchronization mechanism the implementer wishes to put
>>>>> in the control protocol.
>>>>
>>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
>>>> pNFS servers to break the rule that any visible change to the data must
>>>> be atomically accompanied with a change attribute update.
>>>>
>>>
>>> Trond, I'm not sure how this rule you mentioned is specified.
>>>
>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
>>> in particular:
>>>
>>> For some layout protocols, the storage device is able to notify the
>>> metadata server of the occurrence of an I/O; as a result, the change
>>> and time_modify attributes may be updated at the metadata server.
>>> For a metadata server that is capable of monitoring updates to the
>>> change and time_modify attributes, LAYOUTCOMMIT processing is not
>>> required to update the change attribute. In this case, the metadata
>>> server must ensure that no further update to the data has occurred
>>> since the last update of the attributes; file-based protocols may
>>> have enough information to make this determination or may update the
>>> change attribute upon each file modification. This also applies for
>>> the time_modify attribute. If the server implementation is able to
>>> determine that the file has not been modified since the last
>>> time_modify update, the server need not update time_modify at
>>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
>>> should be visible if that file was modified since the latest previous
>>> LAYOUTCOMMIT or LAYOUTGET
>>
>> I know. However the above paragraph does not state that the server
>> should make those changes visible to clients other than the one that is
>> writing.
>>
>> Section 18.32.4 states that writes will cause the time_modified and
>> change attributes to be updated (if and only if the file data is
>> modified). Several other sections rely on this behaviour, including
>> section 10.3.1, section 11.7.2.2, and section 11.7.7.
>>
>> The only 'special behaviour' that I see allowed for pNFS is in section
>> 13.10, which states that clients can't expect to see changes
>> immediately, but that they must be able to expect close-to-open
>> semantics to work. Again, if this is to be the case, then the server
>> _must_ be able to deal with the case where client 1 dies before it can
>> issue the LAYOUTCOMMIT.

Agreed.

>>
>>
>>>> As I see it, if your server allows one client to read data that may have
>>>> been modified by another client that holds a WRITE layout for that range
>>>> then (since that is a visible data change) it should provide a change
>>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
>>>> sent.
>>>
>>> The requirement for the server in WRITE's implementation section
>>> is quite weak: "It is assumed that the act of writing data to a file will
>>> cause the time_modified and change attributes of the file to be updated."
>>>
>>> The difference here is that for pNFS the written data is not guaranteed
>>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
>>> are caching dirty data and use a write-behind cache, application-written data
>>> may be visible to other processes on the same host but not to others until
>>> fsync() or close() - open-to-close semantics are the only thing the client
>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
>>> data is committed to stable storage and is visible to all other clients in
>>> the cluster.
>>
>> See above. I'm not disputing your statement that 'the written data is
>> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
>> assumption that 'the written data may be visible without an accompanying
>> change attribute update'.
>
>
> In other words, I'd expect the following scenario to give the same
> results in NFSv4.1 w/pNFS as it does in NFSv4:

That's a strong requirement that may limit the scalability of the server.

The spirit of the pNFS operations, at least from Panasas perspective was that
the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
to clients other than the one who wrote it, and its associated metadata MUST
be updated and describe the new data only on LAYOUTCOMMIT and until then it's
undefined, i.e. it's up to the server implementation whether to update it or not.

Without locking, what do the stronger semantics buy you?
Even if a client verified the change_attribute, new data may become visible
at any time after the GETATTR if the file/byte range aren't locked.

Benny

>
> Client 1                 Client 2
> ========                 ========
>
>                          OPEN foo
>                          READ
>                          CLOSE
> OPEN
> LAYOUTGET ...
> WRITE via DS
> <dies>...
>                          OPEN foo
>                          verify change_attr
>                          READ if above WRITE is visible
>                          CLOSE
>
> Trond

2010-07-07 14:03:36

by Myklebust, Trond

Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
> > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> >>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> >>>>> point, so even if the non-clustered server does not want to update
> >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> >>>>> execute whatever synchronization mechanism the implementer wishes to put
> >>>>> in the control protocol.
> >>>>
> >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> >>>> pNFS servers to break the rule that any visible change to the data must
> >>>> be atomically accompanied with a change attribute update.
> >>>>
> >>>
> >>> Trond, I'm not sure how this rule you mentioned is specified.
> >>>
> >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> >>> in particular:
> >>>
> >>> For some layout protocols, the storage device is able to notify the
> >>> metadata server of the occurrence of an I/O; as a result, the change
> >>> and time_modify attributes may be updated at the metadata server.
> >>> For a metadata server that is capable of monitoring updates to the
> >>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> >>> required to update the change attribute. In this case, the metadata
> >>> server must ensure that no further update to the data has occurred
> >>> since the last update of the attributes; file-based protocols may
> >>> have enough information to make this determination or may update the
> >>> change attribute upon each file modification. This also applies for
> >>> the time_modify attribute. If the server implementation is able to
> >>> determine that the file has not been modified since the last
> >>> time_modify update, the server need not update time_modify at
> >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> >>> should be visible if that file was modified since the latest previous
> >>> LAYOUTCOMMIT or LAYOUTGET
> >>
> >> I know. However the above paragraph does not state that the server
> >> should make those changes visible to clients other than the one that is
> >> writing.
> >>
> >> Section 18.32.4 states that writes will cause the time_modified and
> >> change attributes to be updated (if and only if the file data is
> >> modified). Several other sections rely on this behaviour, including
> >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> >>
> >> The only 'special behaviour' that I see allowed for pNFS is in section
> >> 13.10, which states that clients can't expect to see changes
> >> immediately, but that they must be able to expect close-to-open
> >> semantics to work. Again, if this is to be the case, then the server
> >> _must_ be able to deal with the case where client 1 dies before it can
> >> issue the LAYOUTCOMMIT.
>
> Agreed.
>
> >>
> >>
> >>>> As I see it, if your server allows one client to read data that may have
> >>>> been modified by another client that holds a WRITE layout for that range
> >>>> then (since that is a visible data change) it should provide a change
> >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> >>>> sent.
> >>>
> >>> The requirement for the server in WRITE's implementation section
> >>> is quite weak: "It is assumed that the act of writing data to a file will
> >>> cause the time_modified and change attributes of the file to be updated."
> >>>
> >>> The difference here is that for pNFS the written data is not guaranteed
> >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> >>> are caching dirty data and use a write-behind cache, application-written data
> >>> may be visible to other processes on the same host but not to others until
> >>> fsync() or close() - open-to-close semantics are the only thing the client
> >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> >>> data is committed to stable storage and is visible to all other clients in
> >>> the cluster.
> >>
> >> See above. I'm not disputing your statement that 'the written data is
> >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> >> assumption that 'the written data may be visible without an accompanying
> >> change attribute update'.
> >
> >
> > In other words, I'd expect the following scenario to give the same
> > results in NFSv4.1 w/pNFS as it does in NFSv4:
>
> That's a strong requirement that may limit the scalability of the server.
>
> The spirit of the pNFS operations, at least from Panasas perspective was that
> the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> to clients other than the one who wrote it, and its associated metadata MUST
> be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> undefined, i.e. it's up to the server implementation whether to update it or not.
>
> Without locking, what do the stronger semantics buy you?
> Even if a client verified the change_attribute, new data may become visible
> at any time after the GETATTR if the file/byte range aren't locked.

There is no locking needed in the scenario below: it is ordinary
close-to-open semantics.

The point is that if you remove the one and only way that clients have
to determine whether or not their data caches are valid, then they can
no longer cache data at all, and server scalability will be shot to
smithereens anyway.
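
The scenario quoted below can be reduced to a toy model of close-to-open caching to show why a data change that becomes visible without a change-attribute bump breaks client caches. ToyServer and CachingClient are hypothetical names; GETATTR, OPEN and READ are collapsed into attribute accesses, and nothing here models pNFS layouts.

```python
from typing import Optional


class ToyServer:
    """Holds one file's data and change attribute (illustrative only)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.change_attr = 1


class CachingClient:
    """Close-to-open caching: revalidate cached data with the change attribute at OPEN."""

    def __init__(self, server: ToyServer) -> None:
        self.server = server
        self.cache: Optional[bytes] = None
        self.cached_change: Optional[int] = None

    def open_and_read(self) -> bytes:
        change = self.server.change_attr  # stands in for the GETATTR sent with OPEN
        if self.cache is None or change != self.cached_change:
            self.cache = self.server.data  # cache invalid: READ from the server
            self.cached_change = change
        return self.cache  # otherwise the cached copy is trusted


server = ToyServer(b"old")
client2 = CachingClient(server)
client2.open_and_read()   # first OPEN/READ/CLOSE fills client 2's cache
server.data = b"new"      # client 1's DS WRITE becomes visible, change attr untouched
print(client2.open_and_read(), CachingClient(server).open_and_read())
# prints: b'old' b'new'
```

Bumping server.change_attr together with the data change makes client 2 refetch on its next OPEN, which is the NFSv4 behaviour Trond expects pNFS to preserve.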

Trond

> Benny
>
> >
> > Client 1                 Client 2
> > ========                 ========
> >
> >                          OPEN foo
> >                          READ
> >                          CLOSE
> > OPEN
> > LAYOUTGET ...
> > WRITE via DS
> > <dies>...
> >                          OPEN foo
> >                          verify change_attr
> >                          READ if above WRITE is visible
> >                          CLOSE
> >
> > Trond

2010-07-06 23:23:35

by Trond Myklebust

Subject: RE: 4.1 client - LAYOUTCOMMIT & close

On Tue, 2010-07-06 at 18:50 -0400, [email protected] wrote:

> As we've discussed before, until a LAYOUTCOMMIT occurs, new data may or
> may not be visible to clients.
>
> Suppose my server takes the approach that a COMMIT guarantees that data
> is written to a persistent intent log in NVRAM. On LAYOUTCOMMIT, file
> data is updated from NVRAM and there is a change attribute update
> (atomic). A client that does not issue LAYOUTCOMMITs will not be able
> to write data.

That's fine unless you make those updates visible to other clients. It's
a rather expensive way of solving the problem, though.
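
Dan's NVRAM intent-log design can be reduced to a single-process Python sketch. IntentLogServer and its methods are hypothetical names, and a bytearray plus a list stand in for the file and the NVRAM log. The sketch keeps the property Trond says is required: readers see the new bytes only in the same step that bumps the change attribute. It also shows Dan's caveat: if no LAYOUTCOMMIT ever arrives, the logged data never becomes visible.

```python
from typing import List, Tuple


class IntentLogServer:
    """Toy version of the scheme: COMMIT logs data, LAYOUTCOMMIT publishes it."""

    def __init__(self) -> None:
        self._file = bytearray()
        self._log: List[Tuple[int, bytes]] = []  # stands in for the NVRAM intent log
        self.change_attr = 0

    def ds_commit(self, offset: int, data: bytes) -> None:
        # Durable in the real scheme, but not yet visible to readers.
        self._log.append((offset, bytes(data)))

    def layoutcommit(self) -> int:
        # Apply the logged writes and bump the change attribute in one step.
        # Nothing runs concurrently in this toy; a real server would need
        # locking to make the two steps atomic.
        if self._log:
            for offset, data in self._log:
                end = offset + len(data)
                if end > len(self._file):
                    self._file.extend(b"\0" * (end - len(self._file)))
                self._file[offset:end] = data
            self._log.clear()
            self.change_attr += 1
        return self.change_attr

    def read(self) -> bytes:
        return bytes(self._file)
```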

> If every WRITE to a DS has to atomically update metadata on the MDS,
> perhaps we could improve performance by co-locating data and metadata on
> a single server [1/2 :-)]

You only need to update the metadata when someone requests a change
attribute or mtime through a GETATTR request to the MDS, so it shouldn't
be that difficult to implement.
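
The lazy update Trond describes can be sketched as an MDS that merely records "some DS wrote to this file" and folds any number of such notifications into a single change-attribute bump when a GETATTR arrives. LazyChangeMds and on_ds_write are hypothetical names, and the sketch assumes the control protocol delivers the notification before the written data can be read; otherwise a GETATTR could race with the write and return a stale value.

```python
class LazyChangeMds:
    """Toy MDS that folds DS write notifications into one change-attribute bump."""

    def __init__(self) -> None:
        self._change_attr = 0
        self._dirty = False

    def on_ds_write(self) -> None:
        # Hypothetical control-protocol message: "a DS wrote to this file".
        self._dirty = True

    def getattr_change(self) -> int:
        # Bump only when someone actually asks, so many DS writes cost one update.
        if self._dirty:
            self._change_attr += 1
            self._dirty = False
        return self._change_attr
```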

> >
> > As I see it, if your server allows one client to read data that may have
> > been modified by another client that holds a WRITE layout for that range
> > then (since that is a visible data change) it should provide a change
> > attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > sent.
> > If your MDS is incapable of determining whether or not data has changed
> > on the DSes, then it should probably recall the WRITE layout if someone
> > tries to read data that may have been modified. Said server also needs a
> > strategy for determining if a data change occurred if the client that
> > held the WRITE layout died before it could send the LAYOUTCOMMIT.
>
> Sounds like you're suggesting treating layouts as capabilities in the
> files case, which is one way to solve the problem. Is anyone doing
> this, or are the files implementations still all treating layouts as
> simply data locators?

You shouldn't need it if you have a control protocol that conforms to
the definition in section 12.2.6.

Cheers
Trond

> >
> > Cheers
> > Trond
> >
> > > -Dan
> > >
> > > > -----Original Message-----
> > > > From: Andy Adamson [mailto:[email protected]]
> > > > Sent: Tuesday, July 06, 2010 6:38 AM
> > > > To: Muntz, Daniel
> > > > Cc: [email protected]; [email protected]; [email protected]
> > > > Subject: Re: 4.1 client - LAYOUTCOMMIT & close
> > > >
> > > >
> > > > On Jul 2, 2010, at 5:46 PM, <[email protected]> wrote:
> > > >
> > > > > By "extremely lame server" I assume you mean any pNFS server that
> > > > > doesn't have a cluster FS on the back end.
> > > >
> > > > No, I mean a pNFS file layout type server that depends upon the 'hint'
> > > > of file size given by LAYOUTCOMMIT. This does not mean that the file
> > > > system has to be a cluster FS.
> > > >
> > > > If COMMIT through MDS is set, the MDS to DS protocol (be it a cluster
> > > > FS or not) ensures the data is "committed" on the DSs. LAYOUTCOMMIT is
> > > > not needed.
> > > >
> > > > If COMMITs are sent to the DSs (or FILE_SYNC writes), then the MDS to
> > > > DS protocol (be it a cluster FS or not) should kick off a back-end DS
> > > > to MDS communication to update the file size on the MDS.
> > > >
> > > > What I consider an 'extremely lame pNFS file layout server' is one
> > > > that requires COMMITs to the DS and then depends upon the LAYOUTCOMMIT
> > > > to communicate the committed data size to the MDS.
> > > >
> > > > -->Andy
> > > >
> > > >
> > > > > So while this might work well for NetApp (as long as NetApp never
> > > > > ships a non-clustered pNFS), it might break others, or at least
> > > > > severely impact their performance. For example, will the Solaris pNFS
> > > > > server work correctly without LAYOUTCOMMIT? IMHO, the client MUST issue
> > > > > the appropriate LAYOUTCOMMIT, but the server is free to handle it as a
> > > > > no-op if the server implementation does not need to utilize the payload.
> > > > >
> > > > > -Dan
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: [email protected]
> > > > >> [mailto:[email protected]] On Behalf Of Andy Adamson
> > > > >> Sent: Friday, July 02, 2010 8:41 AM
> > > > >> To: Sandeep Joshi
> > > > >> Cc: [email protected]; [email protected]
> > > > >> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
> > > > >>
> > > > >>
> > > > >> On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:
> > > > >>
> > > > >> Hi Sandeep
> > > > >>
> > > > >>>
> > > > >>> In certain cases, I don't see layoutcommit on a file at all even
> > > > >>> after doing many writes.
> > > > >>
> > > > >> FYI:
> > > > >>
> > > > >> You should not be paying attention to layoutcommits - they have no
> > > > >> value for the file layout type.
> > > > >>
> > > > >> From RFC 5661:
> > > > >>
> > > > >> "The LAYOUTCOMMIT operation commits changes in the layout represented
> > > > >> by the current filehandle, client ID (derived from the session ID in
> > > > >> the preceding SEQUENCE operation), byte-range, and stateid."
> > > > >>
> > > > >> For the block layout type, this sentence has meaning in that there is
> > > > >> a layoutupdate4 payload that enumerates the blocks that have changed
> > > > >> state from being 'handed out' to being 'written'.
> > > > >>
> > > > >> The file layout type has no layoutupdate4 payload, and the layout does
> > > > >> not change due to writes, and thus the LAYOUTCOMMIT call is useless.
> > > > >>
> > > > >> The only field in the LAYOUTCOMMIT4args that might possibly be useful
> > > > >> is the loca_last_write_offset which tells the server what the client
> > > > >> thinks is the EOF of the file after WRITE. It is an extremely lame
> > > > >> server (file layout type server) that depends upon clients for this
> > > > >> info.
> > > > >>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> Client side operations:
> > > > >>>
> > > > >>> open
> > > > >>> write(s)
> > > > >>> close
> > > > >>>
> > > > >>>
> > > > >>> On server side (observed operations):
> > > > >>>
> > > > >>> open
> > > > >>> layoutget's
> > > > >>> close
> > > > >>>
> > > > >>>
> > > > >>> But, I do not see laycommit at all. In terms data written
> > > > >> by client
> > > > >>> it is about 4-5MB.
> > > > >>>
> > > > >>> When does client issue laycommit?
> > > > >>
> > > > >> The latest Linux client sends a layout commit when the VFS does a
> > > > >> super_operations.write_inode call which happens when the metadata of
> > > > >> an inode needs updating. We are seriously considering removing the
> > > > >> layoutcommit call from the file layout client.
> > > > >>
> > > > >> -->Andy
> > > > >>
> > > > >>>
> > > > >>>
> > > > >>> regards,
> > > > >>>
> > > > >>> Sandeep
> > > > >>>
> > > > >>> --
> > > > >>> To unsubscribe from this list: send the line "unsubscribe
> > > > >> linux-nfs"
> > > > >>> in
> > > > >>> the body of a message to [email protected]
> > > > >>> More majordomo info at
> > http://vger.kernel.org/majordomo-info.html




2010-07-07 22:27:58

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Wed, 2010-07-07 at 18:04 -0400, [email protected] wrote:
> > Yes. I would agree that the client cannot rely on the updates being made
> > visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > compliant server MUST also have a valid strategy for dealing with the
> > case where the client doesn't send it.
>
> So you are saying the updates "MUST be made visible" through the
> server's valid strategy. Is that right?
>
> And that the client cannot rely on that. Why not, if the server must
> have a valid strategy?
>
> Is this just prudent "belt and suspenders" design or what?
>
> It seems to me that if one side here is MUST (and the spec needs to be
> clearer about what might or might not constitute a valid strategy), then
> the other side should be SHOULD.
>
> If both sides are "MUST", then if things don't work out then the client
> and server can equally point to one another and say "It's his fault".
>
> Am I missing something here?

See the example at the very bottom of this email. If the client dies
after it has written data to the data servers, but before it can issue
LAYOUTCOMMIT, then the server needs to have a strategy for dealing with
that. Either it has to figure out that changes have been made, and to
update the change attribute so that close-to-open cache consistency
works, or it needs to ensure that those changes are not made visible.

A "solution" where the file data changes, but the client can't detect it
is not acceptable.

Trond

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf
> Of Trond Myklebust
> Sent: Wednesday, July 07, 2010 5:01 PM
> To: Muntz, Daniel
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>
> On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
> > To bring this discussion full circle, since we agree that a compliant
> > server can implement a scheme where written data does not become visible
> > until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> > "MUST" from a compliant client (independent of layout type)?
>
> Yes. I would agree that the client cannot rely on the updates being made
> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> compliant server MUST also have a valid strategy for dealing with the
> case where the client doesn't send it.
>
> Cheers
> Trond
>
> > -Dan
> >
> > > -----Original Message-----
> > > From: [email protected] [mailto:[email protected]]
> > > On Behalf Of Trond Myklebust
> > > Sent: Wednesday, July 07, 2010 7:04 AM
> > > To: Benny Halevy
> > > Cc: [email protected]; [email protected]; Garth
> > > Gibson; Brent Welch; NFSv4
> > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >
> > > On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > > On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
> > > > > On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > > >> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > > >>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> > > > >>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > > > >>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > > >>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > > > >>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > > > >>>>> point, so even if the non-clustered server does not want to update
> > > > >>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > > > >>>>> execute whatever synchronization mechanism the implementer wishes to put
> > > > >>>>> in the control protocol.
> > > > >>>>
> > > > >>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > > >>>> pNFS servers to break the rule that any visible change to the data must
> > > > >>>> be atomically accompanied with a change attribute update.
> > > > >>>>
> > > > >>>
> > > > >>> Trond, I'm not sure how this rule you mentioned is specified.
> > > > >>>
> > > > >>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > > > >>> in particular:
> > > > >>>
> > > > >>> For some layout protocols, the storage device is able to notify the
> > > > >>> metadata server of the occurrence of an I/O; as a result, the change
> > > > >>> and time_modify attributes may be updated at the metadata server.
> > > > >>> For a metadata server that is capable of monitoring updates to the
> > > > >>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > > >>> required to update the change attribute. In this case, the metadata
> > > > >>> server must ensure that no further update to the data has occurred
> > > > >>> since the last update of the attributes; file-based protocols may
> > > > >>> have enough information to make this determination or may update the
> > > > >>> change attribute upon each file modification. This also applies for
> > > > >>> the time_modify attribute. If the server implementation is able to
> > > > >>> determine that the file has not been modified since the last
> > > > >>> time_modify update, the server need not update time_modify at
> > > > >>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > > > >>> should be visible if that file was modified since the latest previous
> > > > >>> LAYOUTCOMMIT or LAYOUTGET
> > > > >>
> > > > >> I know. However the above paragraph does not state that the server
> > > > >> should make those changes visible to clients other than the one that is
> > > > >> writing.
> > > > >>
> > > > >> Section 18.32.4 states that writes will cause the time_modified and
> > > > >> change attributes to be updated (if and only if the file data is
> > > > >> modified). Several other sections rely on this behaviour, including
> > > > >> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > > >>
> > > > >> The only 'special behaviour' that I see allowed for pNFS is in section
> > > > >> 13.10, which states that clients can't expect to see changes
> > > > >> immediately, but that they must be able to expect close-to-open
> > > > >> semantics to work. Again, if this is to be the case, then the server
> > > > >> _must_ be able to deal with the case where client 1 dies before it can
> > > > >> issue the LAYOUTCOMMIT.
> > > >
> > > > Agreed.
> > > >
> > > > >>
> > > > >>
> > > > >>>> As I see it, if your server allows one client to read data that may have
> > > > >>>> been modified by another client that holds a WRITE layout for that range
> > > > >>>> then (since that is a visible data change) it should provide a change
> > > > >>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > > >>>> sent.
> > > > >>>
> > > > >>> the requirement for the server in WRITE's implementation section
> > > > >>> is quite weak: "It is assumed that the act of writing data to a file will
> > > > >>> cause the time_modified and change attributes of the file to be updated."
> > > > >>>
> > > > >>> The difference here is that for pNFS the written data is not guaranteed
> > > > >>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > > > >>> are caching dirty data and use a write-behind cache, application-written data
> > > > >>> may be visible to other processes on the same host but not to others until
> > > > >>> fsync() or close() - close-to-open semantics are the only thing the client
> > > > >>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > > > >>> data is committed to stable storage and is visible to all other clients in
> > > > >>> the cluster.
> > > > >>
> > > > >> See above. I'm not disputing your statement that 'the written data is
> > > > >> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > > >> assumption that 'the written data may be visible without an accompanying
> > > > >> change attribute update'.
> > > > >
> > > > >
> > > > > In other words, I'd expect the following scenario to give the same
> > > > > results in NFSv4.1 w/pNFS as it does in NFSv4:
> > > >
> > > > That's a strong requirement that may limit the scalability of the server.
> > > >
> > > > The spirit of the pNFS operations, at least from Panasas perspective was that
> > > > the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > > > to clients other than the one who wrote it, and its associated metadata MUST
> > > > be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > > > undefined, i.e. it's up to the server implementation whether to update it or not.
> > > >
> > > > Without locking, what do the stronger semantics buy you?
> > > > Even if a client verified the change_attribute, new data may become visible
> > > > at any time after the GETATTR if the file/byte range aren't locked.
> > >
> > > There is no locking needed in the scenario below: it is ordinary
> > > close-to-open semantics.
> > >
> > > The point is that if you remove the one and only way that clients have
> > > to determine whether or not their data caches are valid, then they can
> > > no longer cache data at all, and server scalability will be shot to
> > > smithereens anyway.
> > >
> > > Trond
> > >
> > > > Benny
> > > >
> > > > >
> > > > > Client 1                      Client 2
> > > > > ========                      ========
> > > > >
> > > > > OPEN foo
> > > > > READ
> > > > > CLOSE
> > > > >                               OPEN
> > > > >                               LAYOUTGET ...
> > > > >                               WRITE via DS
> > > > >                               <dies>...
> > > > > OPEN foo
> > > > > verify change_attr
> > > > > READ if above WRITE is visible
> > > > > CLOSE
> > > > >
> > > > > Trond
> > > > > _______________________________________________
> > > > > nfsv4 mailing list
> > > > > [email protected]
> > > > > https://www.ietf.org/mailman/listinfo/nfsv4
> > >
>



2010-07-06 13:37:48

by Andy Adamson

[permalink] [raw]
Subject: Re: 4.1 client - LAYOUTCOMMIT & close


On Jul 2, 2010, at 5:46 PM, <[email protected]> wrote:

> By "extremely lame server" I assume you mean any pNFS server that
> doesn't have a cluster FS on the back end.

No, I mean a pNFS file layout type server that depends upon the 'hint'
of file size given by LAYOUTCOMMIT. This does not mean that the file
system has to be a cluster FS.

If COMMIT through MDS is set, the MDS to DS protocol (be it a cluster
FS or not) ensures the data is "committed" on the DSs. LAYOUTCOMMIT is
not needed.

If COMMITs are sent to the DSs (or FILE_SYNC writes), then the MDS to
DS protocol (be it a cluster FS or not) should kick off a back-end DS
to MDS communication to update the file size on the MDS.

What I consider an 'extremely lame pNFS file layout server' is one
that requires COMMITs to the DS and then depends upon the LAYOUTCOMMIT
to communicate the committed data size to the MDS.
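A minimal sketch of the two COMMIT paths described above, assuming the RFC 5661 file layout flag NFL4_UFLG_COMMIT_THRU_MDS (0x00000002 in nfl_util). The size_update_path() classification is a hypothetical model of the argument, not any server's actual logic.

```python
from typing import List

# RFC 5661 file layout flag carried in nfl_util: when set, the client
# sends COMMIT to the MDS rather than to the data servers.
NFL4_UFLG_COMMIT_THRU_MDS = 0x00000002


def commit_targets(nfl_util: int, written_ds: List[str], mds: str) -> List[str]:
    """Return the servers a client would send COMMIT to."""
    if nfl_util & NFL4_UFLG_COMMIT_THRU_MDS:
        return [mds]
    # Otherwise each DS that received unstable WRITEs gets its own COMMIT.
    return sorted(set(written_ds))


def size_update_path(nfl_util: int, has_ds_to_mds_backend: bool) -> str:
    """Hypothetical classification of how the MDS learns the new file size."""
    if nfl_util & NFL4_UFLG_COMMIT_THRU_MDS:
        # The commit goes through the MDS, so the MDS-to-DS protocol sees it.
        return "commit-through-mds"
    if has_ds_to_mds_backend:
        # COMMITs (or FILE_SYNC writes) at the DSs trigger a back-end update.
        return "ds-to-mds-backend"
    # The 'extremely lame' case: size is known only from the LAYOUTCOMMIT hint.
    return "layoutcommit-hint"
```

Only the last branch makes the server depend on the client's LAYOUTCOMMIT for size information.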

-->Andy


> So while this might work well for NetApp (as long as NetApp never ships
> a non-clustered pNFS), it might break others, or at least severely
> impact their performance. For example, will the Solaris pNFS server
> work correctly without LAYOUTCOMMIT? IMHO, the client MUST issue the
> appropriate LAYOUTCOMMIT, but the server is free to handle it as a
> no-op if the server implementation does not need to utilize the
> payload.
>
> -Dan
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Andy Adamson
>> Sent: Friday, July 02, 2010 8:41 AM
>> To: Sandeep Joshi
>> Cc: [email protected]; [email protected]
>> Subject: Re: 4.1 client - LAYOUTCOMMIT & close
>>
>>
>> On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:
>>
>> Hi Sandeep
>>
>>>
>>> In certain cases, I don't see layoutcommit on a file at all even
>>> after doing many writes.
>>
>> FYI:
>>
>> You should not be paying attention to layoutcommits - they have no
>> value for the file layout type.
>>
>> From RFC 5661:
>>
>> "The LAYOUTCOMMIT operation commits changes in the layout represented
>> by the current filehandle, client ID (derived from the session ID in
>> the preceding SEQUENCE operation), byte-range, and stateid."
>>
>> For the block layout type, this sentence has meaning in that there is
>> a layoutupdate4 payload that enumerates the blocks that have changed
>> state from being 'handed out' to being 'written'.
>>
>> The file layout type has no layoutupdate4 payload, and the layout does
>> not change due to writes, and thus the LAYOUTCOMMIT call is useless.
>>
>> The only field in the LAYOUTCOMMIT4args that might possibly be useful
>> is the loca_last_write_offset which tells the server what the client
>> thinks is the EOF of the file after WRITE. It is an extremely lame
>> server (file layout type server) that depends upon clients for this
>> info.
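A minimal sketch of how a file layout MDS might apply the loca_last_write_offset hint, assuming it only ever extends the size: the hint names the last byte written, so the implied size is that offset plus one. The dataclass is a simplified, hypothetical stand-in for LAYOUTCOMMIT4args, not its XDR definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayoutCommitArgs:
    # Simplified stand-in for LAYOUTCOMMIT4args. None models the
    # newoffset4 union arm where the client supplies no new offset.
    loca_offset: int
    loca_length: int
    loca_last_write_offset: Optional[int]


def size_after_layoutcommit(current_size: int, args: LayoutCommitArgs) -> int:
    """File size the MDS would record after this LAYOUTCOMMIT."""
    if args.loca_last_write_offset is None:
        return current_size
    # Last byte written + 1 is the smallest size that contains it;
    # never shrink the file based on a client hint.
    return max(current_size, args.loca_last_write_offset + 1)


print(size_after_layoutcommit(4096, LayoutCommitArgs(0, 8192, 8191)))  # 8192
print(size_after_layoutcommit(4096, LayoutCommitArgs(0, 100, 99)))     # 4096
```

A server that already learns sizes through its own DS back end would simply ignore this value.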
>>
>>>
>>>
>>>
>>> Client side operations:
>>>
>>> open
>>> write(s)
>>> close
>>>
>>>
>>> On server side (observed operations):
>>>
>>> open
>>> layoutget's
>>> close
>>>
>>>
>>> But, I do not see laycommit at all. In terms of data written by client
>>> it is about 4-5MB.
>>>
>>> When does client issue laycommit?
>>
>> The latest Linux client sends a layout commit when the VFS does a
>> super_operations.write_inode call which happens when the metadata of
>> an inode needs updating. We are seriously considering removing the
>> layoutcommit call from the file layout client.
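A toy model of the behaviour described above, which may explain the trace in the original question: if LAYOUTCOMMIT is sent only from the write_inode path, an open/write/close sequence can finish without one. The class and method names are hypothetical and do not correspond to Linux kernel functions; the commit_on_close option models the alternative, argued for in this thread, of also sending LAYOUTCOMMIT from close()/fsync().

```python
from typing import List, Optional, Tuple


class FileLayoutClientModel:
    """Toy model of when a pNFS file layout client emits LAYOUTCOMMIT."""

    def __init__(self, commit_on_close: bool = False) -> None:
        self.commit_on_close = commit_on_close
        self.mds_ops: List[Tuple] = []  # operations sent to the MDS
        # Highest byte written to a DS since the last LAYOUTCOMMIT, if any.
        self.last_write_offset: Optional[int] = None

    def ds_write(self, offset: int, length: int) -> None:
        """A WRITE sent directly to a data server; the MDS sees nothing."""
        if length <= 0:
            raise ValueError("length must be positive")
        end = offset + length - 1
        if self.last_write_offset is None or end > self.last_write_offset:
            self.last_write_offset = end

    def _layoutcommit(self) -> None:
        if self.last_write_offset is not None:
            self.mds_ops.append(("LAYOUTCOMMIT", self.last_write_offset))
            self.last_write_offset = None

    def write_inode(self) -> None:
        """Called when the VFS writes back inode metadata."""
        self._layoutcommit()

    def close(self) -> None:
        if self.commit_on_close:
            self._layoutcommit()
        self.mds_ops.append(("CLOSE",))


MiB = 1 << 20
c = FileLayoutClientModel()
c.ds_write(0, 4 * MiB)
c.close()
print(c.mds_ops)  # [('CLOSE',)]
```

With commit_on_close=True the same sequence yields [('LAYOUTCOMMIT', 4194303), ('CLOSE',)].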
>>
>> -->Andy
>>
>>>
>>>
>>> regards,
>>>
>>> Sandeep
>>>


2010-07-07 17:45:24

by Dean

[permalink] [raw]
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close



On 7/7/2010 7:03 AM, Trond Myklebust wrote:
> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
>
>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust<[email protected]> wrote:
>>
>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
>>>
>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
>>>>
>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust<[email protected]> wrote:
>>>>>
>>>>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
>>>>>>
>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
>>>>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
>>>>>>> point, so even if the non-clustered server does not want to update
>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
>>>>>>> execute whatever synchronization mechanism the implementer wishes to put
>>>>>>> in the control protocol.
>>>>>>>
>>>>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
>>>>>> pNFS servers to break the rule that any visible change to the data must
>>>>>> be atomically accompanied with a change attribute update.
>>>>>>
>>>>>>
>>>>> Trond, I'm not sure how this rule you mentioned is specified.
>>>>>
>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
>>>>> in particular:
>>>>>
>>>>> For some layout protocols, the storage device is able to notify the
>>>>> metadata server of the occurrence of an I/O; as a result, the change
>>>>> and time_modify attributes may be updated at the metadata server.
>>>>> For a metadata server that is capable of monitoring updates to the
>>>>> change and time_modify attributes, LAYOUTCOMMIT processing is not
>>>>> required to update the change attribute. In this case, the metadata
>>>>> server must ensure that no further update to the data has occurred
>>>>> since the last update of the attributes; file-based protocols may
>>>>> have enough information to make this determination or may update the
>>>>> change attribute upon each file modification. This also applies for
>>>>> the time_modify attribute. If the server implementation is able to
>>>>> determine that the file has not been modified since the last
>>>>> time_modify update, the server need not update time_modify at
>>>>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
>>>>> should be visible if that file was modified since the latest previous
>>>>> LAYOUTCOMMIT or LAYOUTGET
>>>>>
>>>> I know. However the above paragraph does not state that the server
>>>> should make those changes visible to clients other than the one that is
>>>> writing.
>>>>
>>>> Section 18.32.4 states that writes will cause the time_modified and
>>>> change attributes to be updated (if and only if the file data is
>>>> modified). Several other sections rely on this behaviour, including
>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7.
>>>>
>>>> The only 'special behaviour' that I see allowed for pNFS is in section
>>>> 13.10, which states that clients can't expect to see changes
>>>> immediately, but that they must be able to expect close-to-open
>>>> semantics to work. Again, if this is to be the case, then the server
>>>> _must_ be able to deal with the case where client 1 dies before it can
>>>> issue the LAYOUTCOMMIT.
>>>>
>> Agreed.
>>
>>
>>>>
>>>>
>>>>>> As I see it, if your server allows one client to read data that may have
>>>>>> been modified by another client that holds a WRITE layout for that range
>>>>>> then (since that is a visible data change) it should provide a change
>>>>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
>>>>>> sent.
>>>>>>
>>>>> the requirement for the server in WRITE's implementation section
>>>>> is quite weak: "It is assumed that the act of writing data to a file will
>>>>> cause the time_modified and change attributes of the file to be updated."
>>>>>
>>>>> The difference here is that for pNFS the written data is not guaranteed
>>>>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
>>>>> are caching dirty data and use a write-behind cache, application-written data
>>>>> may be visible to other processes on the same host but not to others until
>>>>> fsync() or close() - close-to-open semantics are the only thing the client
>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
>>>>> data is committed to stable storage and is visible to all other clients in
>>>>> the cluster.
>>>>>
>>>> See above. I'm not disputing your statement that 'the written data is
>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
>>>> assumption that 'the written data may be visible without an accompanying
>>>> change attribute update'.
>>>>
>>>
>>> In other words, I'd expect the following scenario to give the same
>>> results in NFSv4.1 w/pNFS as it does in NFSv4:
>>>
>> That's a strong requirement that may limit the scalability of the server.
>>
>> The spirit of the pNFS operations, at least from Panasas perspective was that
>> the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
>> to clients other than the one who wrote it, and its associated metadata MUST
>> be updated and describe the new data only on LAYOUTCOMMIT and until then it's
>> undefined, i.e. it's up to the server implementation whether to update it or not.
>>
>> Without locking, what do the stronger semantics buy you?
>> Even if a client verified the change_attribute, new data may become visible
>> at any time after the GETATTR if the file/byte range aren't locked.
>>
> There is no locking needed in the scenario below: it is ordinary
> close-to-open semantics.
>
> The point is that if you remove the one and only way that clients have
> to determine whether or not their data caches are valid, then they can
> no longer cache data at all, and server scalability will be shot to
> smithereens anyway.
>

It would seem that when the change_attr is changed depends on the server
implementation. If the server implementation promises NOT to modify the
file in place on a write, then it can postpone updating the change_attr
until LAYOUTCOMMIT (at which time the actual file data is updated). If
not, that is, if client 1 can see the write by client 2 in the example
below, then the change_attr should be updated on every write (I would
guess it would only be updated when some server actually requested it).
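A minimal model of the two server strategies described above, assuming a single file with an integer change attribute. A "deferred" server keeps DS writes in a hypothetical shadow buffer and publishes them, with one change bump, at LAYOUTCOMMIT; an "in-place" server makes each write visible at once and so bumps the change attribute per write. Neither is taken from a real server.

```python
from typing import List, Tuple


class ServerFileModel:
    def __init__(self, data: bytes, in_place: bool) -> None:
        self.data = bytearray(data)
        self.change = 1
        self.in_place = in_place
        self.shadow: List[Tuple[int, bytes]] = []  # deferred writes, not yet visible

    def _apply(self, offset: int, buf: bytes) -> None:
        end = offset + len(buf)
        if end > len(self.data):
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = buf

    def ds_write(self, offset: int, buf: bytes) -> None:
        if self.in_place:
            self._apply(offset, buf)
            self.change += 1  # the write is visible now, so bump now
        else:
            self.shadow.append((offset, bytes(buf)))  # invisible until LAYOUTCOMMIT

    def layoutcommit(self) -> None:
        if self.shadow:
            for offset, buf in self.shadow:
                self._apply(offset, buf)
            self.shadow.clear()
            self.change += 1  # data and change attribute become visible together

    def read(self) -> bytes:
        return bytes(self.data)


deferred = ServerFileModel(b"aaaa", in_place=False)
deferred.ds_write(0, b"bb")
print(deferred.read(), deferred.change)  # b'aaaa' 1
deferred.layoutcommit()
print(deferred.read(), deferred.change)  # b'bbaa' 2
```

In both modes, the data a reader can see never changes without the change attribute also changing.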
Dean
> Trond
>
>
>> Benny
>>
>>
>>> Client 1                      Client 2
>>> ========                      ========
>>>
>>> OPEN foo
>>> READ
>>> CLOSE
>>>                               OPEN
>>>                               LAYOUTGET ...
>>>                               WRITE via DS
>>>                               <dies>...
>>> OPEN foo
>>> verify change_attr
>>> READ if above WRITE is visible
>>> CLOSE
>>>
>>> Trond

2010-07-07 13:18:16

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> > > On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > >> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > >> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > >> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > >> point, so even if the non-clustered server does not want to update
> > >> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > >> execute whatever synchronization mechanism the implementer wishes to put
> > >> in the control protocol.
> > >
> > > As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > > pNFS servers to break the rule that any visible change to the data must
> > > be atomically accompanied with a change attribute update.
> > >
> >
> > Trond, I'm not sure how this rule you mentioned is specified.
> >
> > See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > in particular:
> >
> > For some layout protocols, the storage device is able to notify the
> > metadata server of the occurrence of an I/O; as a result, the change
> > and time_modify attributes may be updated at the metadata server.
> > For a metadata server that is capable of monitoring updates to the
> > change and time_modify attributes, LAYOUTCOMMIT processing is not
> > required to update the change attribute. In this case, the metadata
> > server must ensure that no further update to the data has occurred
> > since the last update of the attributes; file-based protocols may
> > have enough information to make this determination or may update the
> > change attribute upon each file modification. This also applies for
> > the time_modify attribute. If the server implementation is able to
> > determine that the file has not been modified since the last
> > time_modify update, the server need not update time_modify at
> > LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > should be visible if that file was modified since the latest previous
> > LAYOUTCOMMIT or LAYOUTGET
>
> I know. However the above paragraph does not state that the server
> should make those changes visible to clients other than the one that is
> writing.
>
> Section 18.32.4 states that writes will cause the time_modified and
> change attributes to be updated (if and only if the file data is
> modified). Several other sections rely on this behaviour, including
> section 10.3.1, section 11.7.2.2, and section 11.7.7.
>
> The only 'special behaviour' that I see allowed for pNFS is in section
> 13.10, which states that clients can't expect to see changes
> immediately, but that they must be able to expect close-to-open
> semantics to work. Again, if this is to be the case, then the server
> _must_ be able to deal with the case where client 1 dies before it can
> issue the LAYOUTCOMMIT.
>
>
> > > As I see it, if your server allows one client to read data that may have
> > > been modified by another client that holds a WRITE layout for that range
> > > then (since that is a visible data change) it should provide a change
> > > attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > > sent.
> >
> > the requirement for the server in WRITE's implementation section
> > is quite weak: "It is assumed that the act of writing data to a file will
> > cause the time_modified and change attributes of the file to be updated."
> >
> > The difference here is that for pNFS the written data is not guaranteed
> > to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > are caching dirty data and use a write-behind cache, application-written data
> > may be visible to other processes on the same host but not to others until
> > fsync() or close() - close-to-open semantics are the only thing the client
> > guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > data is committed to stable storage and is visible to all other clients in
> > the cluster.
>
> See above. I'm not disputing your statement that 'the written data is
> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> assumption that 'the written data may be visible without an accompanying
> change attribute update'.


In other words, I'd expect the following scenario to give the same
results in NFSv4.1 w/pNFS as it does in NFSv4:

Client 1                      Client 2
========                      ========

OPEN foo
READ
CLOSE
                              OPEN
                              LAYOUTGET ...
                              WRITE via DS
                              <dies>...
OPEN foo
verify change_attr
READ if above WRITE is visible
CLOSE
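A minimal sketch of why Client 1's "verify change_attr" step matters in this scenario, assuming a client that revalidates its cache on OPEN by comparing the change attribute. The classes are hypothetical. If the server lets Client 2's DS write become visible without bumping the change attribute, Client 1 keeps serving its stale cache even though the server's data differs.

```python
from typing import Optional


class Server:
    def __init__(self, data: bytes, bump_on_visible_write: bool) -> None:
        self.data = data
        self.change = 1
        self.bump_on_visible_write = bump_on_visible_write

    def make_ds_write_visible(self, new_data: bytes) -> None:
        # e.g. Client 2 wrote via a DS and died before LAYOUTCOMMIT
        self.data = new_data
        if self.bump_on_visible_write:
            self.change += 1


class CachingClient:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Optional[bytes] = None
        self.cached_change: Optional[int] = None

    def open_and_read(self) -> bytes:
        change = self.server.change  # GETATTR(change) as part of OPEN
        if self.cache is None or change != self.cached_change:
            self.cache = self.server.data  # cache invalid: READ from server
            self.cached_change = change
        return self.cache  # otherwise close-to-open lets us reuse the cache


for bump in (True, False):
    server = Server(b"old", bump_on_visible_write=bump)
    client1 = CachingClient(server)
    client1.open_and_read()
    server.make_ds_write_visible(b"new")
    print(bump, client1.open_and_read(), server.data)
```

This prints `True b'new' b'new'` and then `False b'old' b'new'`: without the bump, Client 1 cannot detect that its cache is stale.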

Trond

2010-07-08 15:59:31

by Benny Halevy

[permalink] [raw]
Subject: Re: 4.1 client - LAYOUTCOMMIT & close

On Jul. 08, 2010, 2:14 +0300, Trond Myklebust <[email protected]> wrote:
> On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
>> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
>>> On Wed, 2010-07-07 at 18:44 -0400, [email protected] wrote:
>>>> Let me try this ...
>>>>
>>>> A correct client will always send LAYOUTCOMMIT.
>>>> Assume that the client is correct.
>>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
>>>>
>>>> Important implication: No LAYOUTCOMMIT is an error/failure case. It
>>>> just has to work; it doesn't have to be fast.
>>>>

Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
written to the file. I'm not sure about the blocks case, though: do you
implicitly free up any provisionally allocated blocks that the client had not
explicitly committed using LAYOUTCOMMIT?

>>>> Suggestion: If a client dies while holding writeable layouts that permit
>>>> write-in-place, and the client doesn't reappear or doesn't reclaim those
>>>> layouts, then the server should assume that the files involved were
>>>> written before the client died, and set the file attributes accordingly
>>>> as part of internally reclaiming the layout that the client has
>>>> abandoned.

Of course. That's part of the server recovery.

>>>>
>>>> Caveat: It may take a while for the server to determine that the client
>>>> has abandoned a layout.

That's two lease times after a respective CB_LAYOUTRECALL.
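A minimal sketch of the recovery rule suggested above. It assumes the "two lease times after CB_LAYOUTRECALL" bound and a 90-second lease chosen purely for illustration (servers pick their own lease time). When an unanswered recall expires, the server reclaims the layout and, if no LAYOUTCOMMIT covered the client's writes, conservatively bumps the change attribute, accepting a possible false positive. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

LEASE_TIME = 90.0  # seconds; an assumed value for illustration


@dataclass
class WriteLayout:
    client_id: str
    recall_sent_at: float    # when CB_LAYOUTRECALL was sent
    committed: bool = False  # LAYOUTCOMMIT received for the client's writes


def reclaim_deadline(recall_sent_at: float, lease_time: float = LEASE_TIME) -> float:
    return recall_sent_at + 2 * lease_time


def maybe_reclaim(layout: WriteLayout, now: float, attrs: Dict[str, int],
                  lease_time: float = LEASE_TIME) -> bool:
    """Reclaim an abandoned write layout; return True if reclaimed."""
    if now < reclaim_deadline(layout.recall_sent_at, lease_time):
        return False
    if not layout.committed:
        # Assume the client wrote before dying: a false positive is
        # acceptable, a false negative is not.
        attrs["change"] += 1
    return True
```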

>>>>
>>>> This can result in false positives (file appears to be modified when it
>>>> wasn't) but won't yield false negatives (file does not appear to be
>>>> modified even though it was modified).
>>>
>>> OK... So we're going to have to turn off client side file caching
>>> entirely for pNFS? I can do that...
>>>
>>> The above won't work. Think readahead...
>>
>> So... What can work, is if you modify it to work explicitly for close-to-open
>>
>> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
>> check that it has received LAYOUTCOMMITs from any other clients that may
>> have the file open for writing. If it hasn't, then it MUST take some
>> action to ensure that any file data changes are accompanied by a change
> ^ potentially visible
>> attribute update."

That should be OK as long as it's not for every GETATTR for the change, mtime,
or size attributes.
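A minimal sketch of the proposed rule. It assumes the MDS tracks which clients hold write layouts with no subsequent LAYOUTCOMMIT, and that the "action" taken is the simplest conservative one, a change attribute bump (recalling the layouts would be another option). Following the reply above, the check runs only for OPEN, LOCK and WANT_DELEGATION, not for GETATTR. All names are hypothetical.

```python
from typing import Set

CHECKED_OPS = {"OPEN", "LOCK", "WANT_DELEGATION"}


class MdsFileState:
    def __init__(self) -> None:
        self.change = 1
        # Clients holding write layouts with no LAYOUTCOMMIT since.
        self.uncommitted_writers: Set[str] = set()

    def layoutget_write(self, client: str) -> None:
        self.uncommitted_writers.add(client)

    def layoutcommit(self, client: str) -> None:
        self.uncommitted_writers.discard(client)
        self.change += 1  # assume the commit covered new writes

    def handle(self, op: str, client: str) -> None:
        if op not in CHECKED_OPS:
            return  # e.g. GETATTR: no extra work on the hot path
        if self.uncommitted_writers - {client}:
            # Another client may have made data changes visible without
            # LAYOUTCOMMIT; make sure they come with a change update.
            self.change += 1
```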

>>
>> Then you can add the above suggestion without the offending caveat. Note
>> however that it does break the "SHOULD NOT" admonition in section
>> 18.32.4.

Better to be safe than sorry in this rare error case.

Benny

>>
>> Trond
>>
>>
>>> Trond
>>>
>>>> Thanks,
>>>> --David
>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:[email protected]] On Behalf
>>>> Of [email protected]
>>>>> Sent: Wednesday, July 07, 2010 6:04 PM
>>>>> To: [email protected]; Muntz, Daniel
>>>>> Cc: [email protected]; [email protected]; [email protected];
>>>> [email protected];
>>>>> [email protected]; [email protected]
>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>>>>>
>>>>>> Yes. I would agree that the client cannot rely on the updates being made
>>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
>>>>>> compliant server MUST also have a valid strategy for dealing with the
>>>>>> case where the client doesn't send it.
>>>>>
>>>>> So you are saying the updates "MUST be made visible" through the
>>>>> server's valid strategy. Is that right?
>>>>>
>>>>> And that the client cannot rely on that. Why not, if the server must
>>>>> have a valid strategy?
>>>>>
>>>>> Is this just prudent "belt and suspenders" design or what?
>>>>>
>>>>> It seems to me that if one side here is MUST (and the spec needs to be
>>>>> clearer about what might or might not constitute a valid strategy),
>>>>> then the other side should be SHOULD.
>>>>>
>>>>> If both sides are "MUST", then if things don't work out then the
>>>>> client and server can equally point to one another and say "It's his fault".
>>>>>
>>>>> Am I missing something here?
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] [mailto:[email protected]] On Behalf
>>>>> Of Trond Myklebust
>>>>> Sent: Wednesday, July 07, 2010 5:01 PM
>>>>> To: Muntz, Daniel
>>>>> Cc: [email protected]; [email protected]; [email protected];
>>>>> [email protected]; [email protected]; [email protected]
>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>>>>>
>>>>> On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
>>>>>> To bring this discussion full circle, since we agree that a compliant
>>>>>> server can implement a scheme where written data does not become visible
>>>>>> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
>>>>>> "MUST" from a compliant client (independent of layout type)?
>>>>>
>>>>> Yes. I would agree that the client cannot rely on the updates being
>>>>> made visible if it fails to send the LAYOUTCOMMIT. My point was simply that
>>>>> a compliant server MUST also have a valid strategy for dealing with the
>>>>> case where the client doesn't send it.
>>>>>
>>>>> Cheers
>>>>> Trond
>>>>>
>>>>>> -Dan
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected] [mailto:[email protected]]
>>>>>>> On Behalf Of Trond Myklebust
>>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM
>>>>>>> To: Benny Halevy
>>>>>>> Cc: [email protected]; [email protected]; Garth
>>>>>>> Gibson; Brent Welch; NFSv4
>>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>>>>>>>
>>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
>>>>>>>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
>>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
>>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
>>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
>>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
>>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
>>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
>>>>>>>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
>>>>>>>>>>>>> point, so even if the non-clustered server does not want to update
>>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
>>>>>>>>>>>>> execute whatever synchronization mechanism the implementer wishes to put
>>>>>>>>>>>>> in the control protocol.
>>>>>>>>>>>>
>>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
>>>>>>>>>>>> pNFS servers to break the rule that any visible change to the data must
>>>>>>>>>>>> be atomically accompanied with a change attribute update.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is specified.
>>>>>>>>>>>
>>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
>>>>>>>>>>> in particular:
>>>>>>>>>>>
>>>>>>>>>>>    For some layout protocols, the storage device is able to notify the
>>>>>>>>>>>    metadata server of the occurrence of an I/O; as a result, the change
>>>>>>>>>>>    and time_modify attributes may be updated at the metadata server.
>>>>>>>>>>>    For a metadata server that is capable of monitoring updates to the
>>>>>>>>>>>    change and time_modify attributes, LAYOUTCOMMIT processing is not
>>>>>>>>>>>    required to update the change attribute. In this case, the metadata
>>>>>>>>>>>    server must ensure that no further update to the data has occurred
>>>>>>>>>>>    since the last update of the attributes; file-based protocols may
>>>>>>>>>>>    have enough information to make this determination or may update the
>>>>>>>>>>>    change attribute upon each file modification. This also applies for
>>>>>>>>>>>    the time_modify attribute. If the server implementation is able to
>>>>>>>>>>>    determine that the file has not been modified since the last
>>>>>>>>>>>    time_modify update, the server need not update time_modify at
>>>>>>>>>>>    LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
>>>>>>>>>>>    should be visible if that file was modified since the latest previous
>>>>>>>>>>>    LAYOUTCOMMIT or LAYOUTGET.
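The RFC 5661 text quoted above distinguishes two kinds of metadata server. Below is a rough sketch of how each might treat change and time_modify at LAYOUTCOMMIT. The names (MdsFile, on_storage_device_notify, on_layoutcommit) are hypothetical. The sketch also assumes that a monitoring server has been notified of every DS write before the LAYOUTCOMMIT arrives.

```python
from dataclasses import dataclass


@dataclass
class MdsFile:
    """Hypothetical MDS attribute state for one file."""
    change_attr: int = 0
    time_modify: float = 0.0


def on_storage_device_notify(f: MdsFile, now: float) -> None:
    # Layout types where the storage device notifies the MDS of an I/O:
    # change and time_modify can be updated as the writes happen.
    f.change_attr += 1
    f.time_modify = now


def on_layoutcommit(f: MdsFile, now: float, client_wrote: bool,
                    mds_monitors_io: bool) -> None:
    """client_wrote: the client modified the file since its latest previous
    LAYOUTCOMMIT or LAYOUTGET."""
    if mds_monitors_io:
        # Attributes were already updated on each notification (assuming
        # every DS write was reported), so LAYOUTCOMMIT need not bump them.
        return
    if client_wrote:
        # Without monitoring, LAYOUTCOMMIT is where the MDS learns of the
        # writes and makes the updated attributes visible.
        f.change_attr += 1
        f.time_modify = now
```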
>>>>>>>>>>
>>>>>>>>>> I know. However the above paragraph does not state that the server
>>>>>>>>>> should make those changes visible to clients other than the one that is
>>>>>>>>>> writing.
>>>>>>>>>>
>>>>>>>>>> Section 18.32.4 states that writes will cause the time_modified and
>>>>>>>>>> change attributes to be updated (if and only if the file data is
>>>>>>>>>> modified). Several other sections rely on this behaviour, including
>>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7.
>>>>>>>>>>
>>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS is in section
>>>>>>>>>> 13.10, which states that clients can't expect to see changes
>>>>>>>>>> immediately, but that they must be able to expect close-to-open
>>>>>>>>>> semantics to work. Again, if this is to be the case, then the server
>>>>>>>>>> _must_ be able to deal with the case where client 1 dies before it can
>>>>>>>>>> issue the LAYOUTCOMMIT.
>>>>>>>>
>>>>>>>> Agreed.
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> As I see it, if your server allows one client to read data that may have
>>>>>>>>>>>> been modified by another client that holds a WRITE layout for that range
>>>>>>>>>>>> then (since that is a visible data change) it should provide a change
>>>>>>>>>>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
>>>>>>>>>>>> sent.
>>>>>>>>>>>
>>>>>>>>>>> The requirement for the server in WRITE's implementation section
>>>>>>>>>>> is quite weak: "It is assumed that the act of writing data to a file will
>>>>>>>>>>> cause the time_modified and change attributes of the file to be updated."
>>>>>>>>>>>
>>>>>>>>>>> The difference here is that for pNFS the written data is not guaranteed
>>>>>>>>>>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
>>>>>>>>>>> are caching dirty data and use a write-behind cache, application-written data
>>>>>>>>>>> may be visible to other processes on the same host but not to others until
>>>>>>>>>>> fsync() or close() - close-to-open semantics are the only thing the client
>>>>>>>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
>>>>>>>>>>> data is committed to stable storage and is visible to all other clients in
>>>>>>>>>>> the cluster.
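The fsync()/close() behaviour Benny describes might look like the sequence below on the client side. It is also what the original poster in this thread expected to see after OPEN, WRITEs and CLOSE. PnfsFile, app_write, flush and close are hypothetical. The sketch only records the operations it would send. It assumes DS writes go out as UNSTABLE4, so a DS COMMIT precedes the LAYOUTCOMMIT. It does not describe any particular client implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PnfsFile:
    """Hypothetical client-side state for one file opened over pNFS."""
    dirty: List[Tuple[int, int]] = field(default_factory=list)  # (offset, length) only in cache
    unstable_on_ds: bool = False      # written to a DS as UNSTABLE4, not yet COMMITted
    needs_layoutcommit: bool = False  # DS data not yet reported to the MDS
    trace: List[str] = field(default_factory=list)  # operations "sent"


def app_write(f: PnfsFile, offset: int, length: int) -> None:
    # Write-behind: the data only lands in the client cache for now.
    f.dirty.append((offset, length))


def flush(f: PnfsFile) -> None:
    """What fsync() or close() might trigger."""
    for offset, length in f.dirty:
        f.trace.append(f"WRITE(DS) off={offset} len={length} UNSTABLE4")
        f.unstable_on_ds = True
        f.needs_layoutcommit = True
    f.dirty.clear()
    if f.unstable_on_ds:
        f.trace.append("COMMIT(DS)")  # make the data stable on the DS
        f.unstable_on_ds = False
    if f.needs_layoutcommit:
        f.trace.append("LAYOUTCOMMIT(MDS)")  # lets the MDS update e.g. size/change/mtime
        f.needs_layoutcommit = False


def close(f: PnfsFile) -> None:
    flush(f)
    f.trace.append("CLOSE(MDS)")
```

A file that was opened and closed without writes produces only the CLOSE, which matches Benny's later point that LAYOUTCOMMIT is only needed when something was written.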
>>>>>>>>>>
>>>>>>>>>> See above. I'm not disputing your statement that 'the written data is
>>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
>>>>>>>>>> assumption that 'the written data may be visible without an accompanying
>>>>>>>>>> change attribute update'.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In other words, I'd expect the following scenario to give the same
>>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4:
>>>>>>>>
>>>>>>>> That's a strong requirement that may limit the scalability of the server.
>>>>>>>>
>>>>>>>> The spirit of the pNFS operations, at least from the Panasas perspective, was
>>>>>>>> that the data is transient until LAYOUTCOMMIT, meaning it may or may not be
>>>>>>>> visible to clients other than the one who wrote it, and its associated metadata
>>>>>>>> MUST be updated and describe the new data only on LAYOUTCOMMIT and until then
>>>>>>>> it's undefined, i.e. it's up to the server implementation whether to update it
>>>>>>>> or not.
>>>>>>>>
>>>>>>>> Without locking, what do the stronger semantics buy you?
>>>>>>>> Even if a client verified the change_attribute, new data may become visible
>>>>>>>> at any time after the GETATTR if the file/byte range aren't locked.
>>>>>>>
>>>>>>> There is no locking needed in the scenario below: it is ordinary
>>>>>>> close-to-open semantics.
>>>>>>>
>>>>>>> The point is that if you remove the one and only way that clients have
>>>>>>> to determine whether or not their data caches are valid, then they can
>>>>>>> no longer cache data at all, and server scalability will be shot to
>>>>>>> smithereens anyway.
>>>>>>>
>>>>>>> Trond
>>>>>>>
>>>>>>>> Benny
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Client 1                Client 2
>>>>>>>>> ========                ========
>>>>>>>>>
>>>>>>>>> OPEN foo
>>>>>>>>> READ
>>>>>>>>> CLOSE
>>>>>>>>>                         OPEN
>>>>>>>>>                         LAYOUTGET ...
>>>>>>>>>                         WRITE via DS
>>>>>>>>>                         <dies>...
>>>>>>>>> OPEN foo
>>>>>>>>> verify change_attr
>>>>>>>>> READ if above WRITE is visible
>>>>>>>>> CLOSE
>>>>>>>>>
>>>>>>>>> Trond
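Client 1's steps in the scenario above ("verify change_attr", then "READ if above WRITE is visible") are ordinary close-to-open cache revalidation. Here is a sketch with a hypothetical CachedFile type. It assumes the server's change attribute comes back from a GETATTR in the OPEN compound. If Client 2's DS write became visible without a change attribute update, this check would keep serving the stale cached data. That is the failure Trond describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedFile:
    """Hypothetical client cache entry for one file."""
    change_attr: Optional[int] = None  # change attribute when the data was cached
    data: Optional[bytes] = None


def revalidate_on_open(cache: CachedFile, server_change_attr: int) -> bool:
    """Return True if the cached data may be reused after OPEN."""
    if cache.data is not None and cache.change_attr == server_change_attr:
        return True  # unchanged since it was cached: no READ needed
    # Changed (or nothing cached): drop the cache; later READs refill it.
    cache.data = None
    cache.change_attr = server_change_attr
    return False
```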
>>>>>>>>> _______________________________________________
>>>>>>>>> nfsv4 mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html

_______________________________________________
nfsv4 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nfsv4

2010-07-08 20:38:07

by david.black

[permalink] [raw]
Subject: RE: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

> Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> written to the file.  I'm not sure what about the blocks case though, do you
> implicitly free up any provisionally allocated blocks that the client had not
> explicitly committed using LAYOUTCOMMIT?

In principle, yes, as the blocks are no longer promised to the client, although
lazy evaluation of this is an obvious optimization.

> >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> >> check that it has received LAYOUTCOMMITs from any other clients that may
> >> have the file open for writing. If it hasn't, then it MUST take some
> >> action to ensure that any file data changes are accompanied by a change
> >                            ^ potentially visible
> >> attribute update."
>
> That should be OK as long as it's not for every GETATTR for the change, mtime,
> or size attributes.
>
> >>
> >> Then you can add the above suggestion without the offending caveat. Note
> >> however that it does break the "SHOULD NOT" admonition in section
> >> 18.32.4.
>
> Better be safe than sorry in this rare error case.

I concur with Benny on both of the above - in essence, the unrecovered client
failure is a reason to potentially ignore the "SHOULD" (server can't know
whether it actually ignored the "SHOULD", hence better safe than sorry).  We
probably ought to find someplace appropriate to add a paragraph or two
explaining this in one of the 4.2 documents.

Thanks,
--David


> -----Original Message-----
> From: Benny Halevy [mailto:[email protected]] On Behalf Of Benny Halevy
> Sent: Thursday, July 08, 2010 12:00 PM
> To: Trond Myklebust
> Cc: Black, David; Noveck, David; Muntz, Daniel; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

2010-07-08 21:16:11

by Trond Myklebust

Subject: Re: 4.1 client - LAYOUTCOMMIT & close

On Thu, 2010-07-08 at 16:30 -0400, [email protected] wrote:
> > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> > written to the file. I'm not sure what about the blocks case though, do you
> > implicitly free up any provisionally allocated blocks that the client had not
> > explicitly committed using LAYOUTCOMMIT?
>
> In principle, yes as the blocks are no longer promised to the client, although
> lazy evaluation of this is an obvious optimization.
>
> > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> > >> check that it has received LAYOUTCOMMITs from any other clients that may
> > >> have the file open for writing. If it hasn't, then it MUST take some
> > >> action to ensure that any file data changes are accompanied by a change
> > >                            ^ potentially visible
> > >> attribute update."
> >
> > That should be OK as long as it's not for every GETATTR for the change, mtime,
> > or size attributes.
> >
> > >>
> > >> Then you can add the above suggestion without the offending caveat. Note
> > >> however that it does break the "SHOULD NOT" admonition in section
> > >> 18.32.4.
> >
> > Better be safe than sorry in this rare error case.
>
> I concur with Benny on both of the above - in essence, the unrecovered client failure is a reason to potentially ignore the "SHOULD" (server can't know whether it actually ignored the "SHOULD", hence better safe than sorry). We probably ought to find someplace appropriate to add a paragraph or two explaining this in one of the 4.2 documents.

Right. I'm only interested in fixing the close-to-open case. The case of
general GETATTR calls might be nice to fix too, but it should not be
essential in order to ensure that well-behaved applications continue to
work as expected.

Note, however, that legacy support for stateless protocols like NFSv2
and NFSv3 may be problematic: there is no equivalent of OPEN, and so the
server may have to do the above check on all NFSPROC2_GETATTR,
NFSPROC3_GETATTR, NFSPROC2_LOOKUP and NFSPROC3_LOOKUP requests.
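The proposed close-to-open rule, including the NFSv2/v3 caveat, can be illustrated with a toy model of a metadata server. This is only a sketch under stated assumptions: the class and method names are hypothetical, the server is assumed to track which clients hold write layouts whose DS writes have not yet been covered by a LAYOUTCOMMIT, and bumping the change attribute is just one possible way to satisfy the "MUST take some action" clause; RFC 5661 does not prescribe it.

```python
from dataclasses import dataclass, field

# Toy model of a pNFS metadata server (MDS). All names here are
# hypothetical; they are not taken from RFC 5661 or any real server.

# NFSv4.1 operations that begin a close-to-open sequence for a client.
V41_TRIGGERS = {"OPEN", "LOCK", "WANT_DELEGATION"}
# NFSv2/v3 have no OPEN, so the check would have to run on these instead.
LEGACY_TRIGGERS = {"NFSPROC2_GETATTR", "NFSPROC3_GETATTR",
                   "NFSPROC2_LOOKUP", "NFSPROC3_LOOKUP"}


@dataclass
class FileState:
    change: int = 0
    # Clients holding a write layout whose DS writes have not yet been
    # covered by a LAYOUTCOMMIT.
    uncommitted_writers: set = field(default_factory=set)


class MetadataServer:
    def __init__(self):
        self.files = {}

    def state(self, fh):
        return self.files.setdefault(fh, FileState())

    def layoutget_write(self, fh, client):
        self.state(fh).uncommitted_writers.add(client)

    def layoutcommit(self, fh, client):
        st = self.state(fh)
        st.uncommitted_writers.discard(client)
        st.change += 1  # assume the committed range was modified

    def handle(self, op, fh, client):
        """Return the change attribute the caller would observe."""
        st = self.state(fh)
        if op in V41_TRIGGERS or op in LEGACY_TRIGGERS:
            if st.uncommitted_writers - {client}:
                # Another client may have made DS writes visible without a
                # LAYOUTCOMMIT. The conservative action sketched here is to
                # bump the change attribute (possibly a false positive, never
                # a false negative) so the caller revalidates its cache.
                st.change += 1
        return st.change
```

A plain NFSv4.1 GETATTR deliberately does not trigger the check in this sketch, in line with the remark that the check should not run for every GETATTR of the change, mtime or size attributes.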

Trond

> Thanks,
> --David
>
>
> > -----Original Message-----
> > From: Benny Halevy [mailto:[email protected]] On Behalf Of Benny Halevy
> > Sent: Thursday, July 08, 2010 12:00 PM
> > To: Trond Myklebust
> > Cc: Black, David; Noveck, David; Muntz, Daniel; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> >
> > On Jul. 08, 2010, 2:14 +0300, Trond Myklebust <[email protected]> wrote:
> > > On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
> > >> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> > >>> On Wed, 2010-07-07 at 18:44 -0400, [email protected] wrote:
> > >>>> Let me try this ...
> > >>>>
> > >>>> A correct client will always send LAYOUTCOMMIT.
> > >>>> Assume that the client is correct.
> > >>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> > >>>>
> > >>>> Important implication: No LAYOUTCOMMIT is an error/failure case. It
> > >>>> just has to work; it doesn't have to be fast.
> > >>>>
> >
> > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> > written to the file. I'm not sure what about the blocks case though, do you
> > implicitly free up any provisionally allocated blocks that the client had not
> > explicitly committed using LAYOUTCOMMIT?
> >
> > >>>> Suggestion: If a client dies while holding writeable layouts that permit
> > >>>> write-in-place, and the client doesn't reappear or doesn't reclaim those
> > >>>> layouts, then the server should assume that the files involved were
> > >>>> written before the client died, and set the file attributes accordingly
> > >>>> as part of internally reclaiming the layout that the client has
> > >>>> abandoned.
> >
> > Of course. That's part of the server recovery.
> >
> > >>>>
> > >>>> Caveat: It may take a while for the server to determine that the client
> > >>>> has abandoned a layout.
> >
> > That's two lease times after a respective CB_LAYOUTRECALL.
> >
> > >>>>
> > >>>> This can result in false positives (file appears to be modified when it
> > >>>> wasn't) but won't yield false negatives (file does not appear to be
> > >>>> modified even though it was modified).
> > >>>
> > >>> OK... So we're going to have to turn off client side file caching
> > >>> entirely for pNFS? I can do that...
> > >>>
> > >>> The above won't work. Think readahead...
> > >>
> > >> So... What can work, is if you modify it to work explicitly for
> > >> close-to-open
> > >>
> > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> > >> check that it has received LAYOUTCOMMITs from any other clients that may
> > >> have the file open for writing. If it hasn't, then it MUST take some
> > >> action to ensure that any file data changes are accompanied by a change
> > >                            ^ potentially visible
> > >> attribute update."
> >
> > That should be OK as long as it's not for every GETATTR for the change, mtime,
> > or size attributes.
> >
> > >>
> > >> Then you can add the above suggestion without the offending caveat. Note
> > >> however that it does break the "SHOULD NOT" admonition in section
> > >> 18.32.4.
> >
> > Better be safe than sorry in this rare error case.
> >
> > Benny
> >
> > >>
> > >> Trond
> > >>
> > >>
> > >>> Trond
> > >>>
> > >>>> Thanks,
> > >>>> --David
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> > >>>>> Sent: Wednesday, July 07, 2010 6:04 PM
> > >>>>> To: [email protected]; Muntz, Daniel
> > >>>>> Cc: [email protected]; [email protected]; [email protected]; [email protected];
> > >>>>> [email protected]; [email protected]
> > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >>>>>
> > >>>>>> Yes. I would agree that the client cannot rely on the updates being made
> > >>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > >>>>>> compliant server MUST also have a valid strategy for dealing with the
> > >>>>>> case where the client doesn't send it.
> > >>>>>
> > >>>>> So you are saying the updates "MUST be made visible" through the
> > >>>>> server's valid strategy. Is that right?
> > >>>>>
> > >>>>> And that the client cannot rely on that. Why not, if the server must
> > >>>>> have a valid strategy?
> > >>>>>
> > >>>>> Is this just prudent "belt and suspenders" design or what?
> > >>>>>
> > >>>>> It seems to me that if one side here is MUST (and the spec needs to be
> > >>>>> clearer about what might or might not constitute a valid strategy), then
> > >>>>> the other side should be SHOULD.
> > >>>>>
> > >>>>> If both sides are "MUST", then if things don't work out, the client
> > >>>>> and server can equally point to one another and say "It's his fault".
> > >>>>> Am I missing something here?
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> -----Original Message-----
> > >>>>> From: [email protected] [mailto:[email protected]] On Behalf Of Trond Myklebust
> > >>>>> Sent: Wednesday, July 07, 2010 5:01 PM
> > >>>>> To: Muntz, Daniel
> > >>>>> Cc: [email protected]; [email protected]; [email protected];
> > >>>>> [email protected]; [email protected]; [email protected]
> > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >>>>>
> > >>>>> On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
> > >>>>>> To bring this discussion full circle, since we agree that a compliant
> > >>>>>> server can implement a scheme where written data does not become visible
> > >>>>>> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> > >>>>>> "MUST" from a compliant client (independent of layout type)?
> > >>>>>
> > >>>>> Yes. I would agree that the client cannot rely on the updates being made
> > >>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > >>>>> compliant server MUST also have a valid strategy for dealing with the
> > >>>>> case where the client doesn't send it.
> > >>>>>
> > >>>>> Cheers
> > >>>>> Trond
> > >>>>>
> > >>>>>> -Dan
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: [email protected] [mailto:[email protected]]
> > >>>>>>> On Behalf Of Trond Myklebust
> > >>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM
> > >>>>>>> To: Benny Halevy
> > >>>>>>> Cc: [email protected]; [email protected]; Garth
> > >>>>>>> Gibson; Brent Welch; NFSv4
> > >>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >>>>>>>
> > >>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > >>>>>>>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
> > >>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > >>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > >>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> > >>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > >>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > >>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm wrong).
> > >>>>>>>>>>>>> As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a synchronization
> > >>>>>>>>>>>>> point, so even if the non-clustered server does not want to update
> > >>>>>>>>>>>>> metadata on every DS I/O, the LAYOUTCOMMIT could also be a trigger to
> > >>>>>>>>>>>>> execute whatever synchronization mechanism the implementer wishes to put
> > >>>>>>>>>>>>> in the control protocol.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661 that would allow
> > >>>>>>>>>>>> pNFS servers to break the rule that any visible change to the data must
> > >>>>>>>>>>>> be atomically accompanied with a change attribute update.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is specified.
> > >>>>>>>>>>>
> > >>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and change/time_modify
> > >>>>>>>>>>> in particular:
> > >>>>>>>>>>>
> > >>>>>>>>>>> For some layout protocols, the storage device is able to notify the
> > >>>>>>>>>>> metadata server of the occurrence of an I/O; as a result, the change
> > >>>>>>>>>>> and time_modify attributes may be updated at the metadata server.
> > >>>>>>>>>>> For a metadata server that is capable of monitoring updates to the
> > >>>>>>>>>>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> > >>>>>>>>>>> required to update the change attribute. In this case, the metadata
> > >>>>>>>>>>> server must ensure that no further update to the data has occurred
> > >>>>>>>>>>> since the last update of the attributes; file-based protocols may
> > >>>>>>>>>>> have enough information to make this determination or may update the
> > >>>>>>>>>>> change attribute upon each file modification. This also applies for
> > >>>>>>>>>>> the time_modify attribute. If the server implementation is able to
> > >>>>>>>>>>> determine that the file has not been modified since the last
> > >>>>>>>>>>> time_modify update, the server need not update time_modify at
> > >>>>>>>>>>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > >>>>>>>>>>> should be visible if that file was modified since the latest previous
> > >>>>>>>>>>> LAYOUTCOMMIT or LAYOUTGET.
> > >>>>>>>>>>
> > >>>>>>>>>> I know. However the above paragraph does not state that the server
> > >>>>>>>>>> should make those changes visible to clients other than the one that is
> > >>>>>>>>>> writing.
> > >>>>>>>>>>
> > >>>>>>>>>> Section 18.32.4 states that writes will cause the time_modified and
> > >>>>>>>>>> change attributes to be updated (if and only if the file data is
> > >>>>>>>>>> modified). Several other sections rely on this behaviour, including
> > >>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > >>>>>>>>>>
> > >>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS is in section
> > >>>>>>>>>> 13.10, which states that clients can't expect to see changes
> > >>>>>>>>>> immediately, but that they must be able to expect close-to-open
> > >>>>>>>>>> semantics to work. Again, if this is to be the case, then the server
> > >>>>>>>>>> _must_ be able to deal with the case where client 1 dies before it can
> > >>>>>>>>>> issue the LAYOUTCOMMIT.
> > >>>>>>>>
> > >>>>>>>> Agreed.
> > >>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>>> As I see it, if your server allows one client to read data that may have
> > >>>>>>>>>>>> been modified by another client that holds a WRITE layout for that range
> > >>>>>>>>>>>> then (since that is a visible data change) it should provide a change
> > >>>>>>>>>>>> attribute update irrespective of whether or not a LAYOUTCOMMIT has been
> > >>>>>>>>>>>> sent.
> > >>>>>>>>>>>
> > >>>>>>>>>>> The requirement for the server in WRITE's implementation section
> > >>>>>>>>>>> is quite weak: "It is assumed that the act of writing data to a file will
> > >>>>>>>>>>> cause the time_modified and change attributes of the file to be updated."
> > >>>>>>>>>>>
> > >>>>>>>>>>> The difference here is that for pNFS the written data is not guaranteed
> > >>>>>>>>>>> to be visible until LAYOUTCOMMIT. In a broader sense, assuming the clients
> > >>>>>>>>>>> are caching dirty data and use a write-behind cache, application-written data
> > >>>>>>>>>>> may be visible to other processes on the same host but not to others until
> > >>>>>>>>>>> fsync() or close() - open-to-close semantics are the only thing the client
> > >>>>>>>>>>> guarantees, right? Issuing LAYOUTCOMMIT on fsync() and close() ensures the
> > >>>>>>>>>>> data is committed to stable storage and is visible to all other clients in
> > >>>>>>>>>>> the cluster.
> > >>>>>>>>>>
> > >>>>>>>>>> See above. I'm not disputing your statement that 'the written data is
> > >>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > >>>>>>>>>> assumption that 'the written data may be visible without an accompanying
> > >>>>>>>>>> change attribute update'.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> In other words, I'd expect the following scenario to give the same
> > >>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4:
> > >>>>>>>>
> > >>>>>>>> That's a strong requirement that may limit the scalability of the server.
> > >>>>>>>>
> > >>>>>>>> The spirit of the pNFS operations, at least from the Panasas perspective, was that
> > >>>>>>>> the data is transient until LAYOUTCOMMIT, meaning it may or may not be visible
> > >>>>>>>> to clients other than the one who wrote it, and its associated metadata MUST
> > >>>>>>>> be updated and describe the new data only on LAYOUTCOMMIT and until then it's
> > >>>>>>>> undefined, i.e. it's up to the server implementation whether to update it or not.
> > >>>>>>>>
> > >>>>>>>> Without locking, what do the stronger semantics buy you?
> > >>>>>>>> Even if a client verified the change_attribute, new data may become visible
> > >>>>>>>> at any time after the GETATTR if the file/byte range aren't locked.
> > >>>>>>>
> > >>>>>>> There is no locking needed in the scenario below: it is ordinary
> > >>>>>>> close-to-open semantics.
> > >>>>>>>
> > >>>>>>> The point is that if you remove the one and only way that clients have
> > >>>>>>> to determine whether or not their data caches are valid, then they can
> > >>>>>>> no longer cache data at all, and server scalability will be shot to
> > >>>>>>> smithereens anyway.
> > >>>>>>>
> > >>>>>>> Trond
> > >>>>>>>
> > >>>>>>>> Benny
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Client 1                        Client 2
> > >>>>>>>>> ========                        ========
> > >>>>>>>>>
> > >>>>>>>>> OPEN foo
> > >>>>>>>>> READ
> > >>>>>>>>> CLOSE
> > >>>>>>>>>                                 OPEN
> > >>>>>>>>>                                 LAYOUTGET ...
> > >>>>>>>>>                                 WRITE via DS
> > >>>>>>>>>                                 <dies>...
> > >>>>>>>>> OPEN foo
> > >>>>>>>>> verify change_attr
> > >>>>>>>>> READ if above WRITE is visible
> > >>>>>>>>> CLOSE
> > >>>>>>>>>
> > >>>>>>>>> Trond
> > >>>>>>>>> _______________________________________________
> > >>>>>>>>> nfsv4 mailing list
> > >>>>>>>>> [email protected]
> > >>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>
> > >>
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > >> the body of a message to [email protected]
> > >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > >
> > >
> > >
> >
>




2010-07-08 23:01:16

by Tom Haynes

Subject: Re: 4.1 client - LAYOUTCOMMIT & close

On 07/ 8/10 05:12 PM, sfaibish wrote:
> All, after discussing this issue with Dave Noveck, and as I mentioned in the
> call today, I think that this is a serious issue and a disconnect between
> the behavior of different layout types. My proposal is to have this
> discussion F2F in Maastricht on the white board, so I will add an agenda
> item to the WG on this topic. I could address the behavior of the block
> layout, but it is not something we want to mimic, as we all agreed at cthon
> to avoid LAYOUTCOMMIT as much as possible for the file layout. If we solve
> the issue using the proposed mechanism (Trond's), we will create a conflict
> with the use of LAYOUTCOMMIT. Just as a hint, the difference from block is
> that block uses layouts for write and read as different leases, and
> when a client has a layout for read the server will always send him
> a LAYOUTRETURN when either upgrading his lease to write or sending a layout
> for write to another client. I don't think we want to do the same for file.
> My 2c.
>
> /Sorin

When I hear the words "white board", I immediately think unorganized and
likely to get out of hand. I don't know how much time we are up to now, but
we must be close to running out of it.

I have a counter-proposal: why doesn't someone, say Trond, put together
some slides on this so we can discuss them?

Or, if there is a strong consensus that we do need to do this on a white
board, why don't we ask the IETF for an additional slot in the morning?

2010-07-08 23:51:47

by Daniel.Muntz

Subject: Re: 4.1 client - LAYOUTCOMMIT & close



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Trond Myklebust
> Sent: Thursday, July 08, 2010 2:16 PM
> To: Black, David
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
>
> On Thu, 2010-07-08 at 16:30 -0400, [email protected] wrote:
> > > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> > > written to the file. I'm not sure what about the blocks case though, do you
> > > implicitly free up any provisionally allocated blocks that the client had not
> > > explicitly committed using LAYOUTCOMMIT?
> >
> > In principle, yes as the blocks are no longer promised to the client, although
> > lazy evaluation of this is an obvious optimization.
> >
> > > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> > > >> check that it has received LAYOUTCOMMITs from any other clients that may
> > > >> have the file open for writing. If it hasn't, then it MUST take some
> > > >> action to ensure that any file data changes are accompanied by a change
> > > >                            ^ potentially visible
> > > >> attribute update."
> > >
> > > That should be OK as long as it's not for every GETATTR for the change, mtime,
> > > or size attributes.
> > >
> > > >>
> > > >> Then you can add the above suggestion without the offending caveat. Note
> > > >> however that it does break the "SHOULD NOT" admonition in section
> > > >> 18.32.4.
> > >
> > > Better be safe than sorry in this rare error case.
> >
> > I concur with Benny on both of the above - in essence, the unrecovered client failure is a reason to potentially ignore the "SHOULD" (server can't know whether it actually ignored the "SHOULD", hence better safe than sorry). We probably ought to find someplace appropriate to add a paragraph or two explaining this in one of the 4.2 documents.
>
> Right. I'm only interested in fixing the close-to-open case. The case of
> general GETATTR calls might be nice to fix too, but it should not be
> essential in order to ensure that well-behaved applications continue to
> work as expected.

I think we have close-to-open covered. A client will do a LAYOUTCOMMIT
after the last WRITE before a CLOSE (otherwise it has no guarantee that
the data becomes "visible"). So, written data may be "visible" to other
clients without the change attribute being updated, *but* at CLOSE time
we are guaranteed the change attribute is updated.
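The close path described above can be sketched as a sequence of operations recorded for one open file. This is a simplified model, not the Linux client's code: PnfsOpenFile and its fields are hypothetical, DS writes are assumed to be UNSTABLE (so a COMMIT to the DS is needed first), and operations are appended to a list instead of being sent.

```python
# Hypothetical sketch of the order of operations a pNFS client could use
# when closing a file it wrote through a data server (DS). The class and
# field names are illustrative only; nothing is actually sent on the wire.


class PnfsOpenFile:
    def __init__(self, fh):
        self.fh = fh
        self.dirty = []              # (offset, length) still in the page cache
        self.unstable = False        # DS WRITEs not yet made stable by COMMIT
        self.last_write_end = None   # one past the highest byte written via a DS
        self.sent = []               # operations issued, in order

    def write(self, offset, data):
        # Application write(): buffered locally (write-behind cache).
        self.dirty.append((offset, len(data)))

    def _flush(self):
        # Push buffered data to the DS, assumed here to use UNSTABLE writes.
        for off, length in self.dirty:
            self.sent.append(("DS", "WRITE", off, length))
            self.unstable = True
            end = off + length
            if self.last_write_end is None or end > self.last_write_end:
                self.last_write_end = end
        self.dirty.clear()

    def close(self):
        self._flush()
        if self.unstable:
            self.sent.append(("DS", "COMMIT"))
            self.unstable = False
        if self.last_write_end is not None:
            # Tell the MDS about the new data before CLOSE, passing the last
            # byte offset written so it can update size/change/time_modify.
            self.sent.append(("MDS", "LAYOUTCOMMIT", self.last_write_end - 1))
            self.last_write_end = None
        self.sent.append(("MDS", "CLOSE"))
        return self.sent
```

A file opened only for reading sends just CLOSE. If the LAYOUTCOMMIT step were skipped, the MDS would see OPEN, LAYOUTGET and CLOSE but no LAYOUTCOMMIT, and under the semantics discussed in this thread the new data need not be reflected in the file's attributes.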

In the failure case (client dies before sending LAYOUTCOMMIT and/or
CLOSE), the server will eventually have to close the file. At this
point, the server can, e.g., use its knowledge of the layout(s) that may
have been used by the client to check DSs (via control protocol) to
synthesize the appropriate attributes including change attribute and set
them before completing the server close operation. This is hand-wavy,
but I think there's a way to solve close-to-open without updating the
change attribute with every DS write.
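A minimal sketch of that recovery step follows, assuming the control protocol can report, for each DS, whether anything was written under the abandoned layout and the highest byte offset written. DsReport, FileAttrs and reclaim_abandoned_layout are hypothetical names, and the fallback when the DSs cannot be queried follows the earlier suggestion in this thread to assume the file was written (a possible false positive, never a false negative).

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch of MDS-side recovery when a client dies holding a
# write layout and never sends LAYOUTCOMMIT. Control-protocol query results
# are modelled as DsReport objects; this is not a real server or API.


@dataclass(frozen=True)
class DsReport:
    modified: bool              # DS saw writes under the abandoned layout
    highest_byte_written: int   # -1 if nothing was written


@dataclass(frozen=True)
class FileAttrs:
    size: int
    change: int


def reclaim_abandoned_layout(attrs: FileAttrs,
                             reports: Optional[List[DsReport]]) -> FileAttrs:
    """Synthesize attributes before the server completes its own close."""
    if reports is None:
        # DSs could not be queried: assume the file was written. This may be
        # a spurious change bump, but never a missed modification.
        return FileAttrs(size=attrs.size, change=attrs.change + 1)
    if not any(r.modified for r in reports):
        return attrs
    new_size = max([attrs.size] + [r.highest_byte_written + 1 for r in reports])
    return FileAttrs(size=new_size, change=attrs.change + 1)
```

The sketch only ever grows the size and bumps the change attribute once; handling truncation or updating time_modify would need more information from the control protocol than is assumed here.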

However, I think we may still have a problem when locking/delegations
are combined with client caching and attempting to decouple the DS write
from the change attribute update. I'm still looking into this.

>
> Note, however, that legacy support for stateless protocols like NFSv2
> and NFSv3 may be problematic: there is no equivalent of OPEN, and so the
> server may have to do the above check on all NFSPROC2_GETATTR,
> NFSPROC3_GETATTR, NFSPROC2_LOOKUP and NFSPROC3_LOOKUP requests.
>
> Trond
>
> > Thanks,
> > --David
> >
> >
> > > -----Original Message-----
> > > From: Benny Halevy [mailto:[email protected]] On Behalf Of Benny Halevy
> > > Sent: Thursday, July 08, 2010 12:00 PM
> > > To: Trond Myklebust
> > > Cc: Black, David; Noveck, David; Muntz, Daniel; [email protected]; [email protected];
> > > [email protected]; [email protected]; [email protected]
> > > Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > >
> > > On Jul. 08, 2010, 2:14 +0300, Trond Myklebust <[email protected]> wrote:
> > > > On Wed, 2010-07-07 at 19:09 -0400, Trond Myklebust wrote:
> > > >> On Wed, 2010-07-07 at 18:52 -0400, Trond Myklebust wrote:
> > > >>> On Wed, 2010-07-07 at 18:44 -0400, [email protected] wrote:
> > > >>>> Let me try this ...
> > > >>>>
> > > >>>> A correct client will always send LAYOUTCOMMIT.
> > > >>>> Assume that the client is correct.
> > > >>>> Hence if the LAYOUTCOMMIT doesn't arrive, something's failed.
> > > >>>>
> > > >>>> Important implication: No LAYOUTCOMMIT is an error/failure case. It
> > > >>>> just has to work; it doesn't have to be fast.
> > > >>>>
> > >
> > > Note that a LAYOUTRETURN can arrive without LAYOUTCOMMIT if the client hasn't
> > > written to the file. I'm not sure what about the blocks case though, do you
> > > implicitly free up any provisionally allocated blocks that the client had not
> > > explicitly committed using LAYOUTCOMMIT?
> > >
> > > >>>> Suggestion: If a client dies while holding writeable layouts that permit
> > > >>>> write-in-place, and the client doesn't reappear or doesn't reclaim those
> > > >>>> layouts, then the server should assume that the files involved were
> > > >>>> written before the client died, and set the file attributes accordingly
> > > >>>> as part of internally reclaiming the layout that the client has
> > > >>>> abandoned.
> > >
> > > Of course. That's part of the server recovery.
> > >
> > > >>>>
> > > >>>> Caveat: It may take a while for the server to determine that the client
> > > >>>> has abandoned a layout.
> > >
> > > That's two lease times after a respective CB_LAYOUTRECALL.
> > >
> > > >>>>
> > > >>>> This can result in false positives (file appears to be modified when it
> > > >>>> wasn't) but won't yield false negatives (file does not appear to be
> > > >>>> modified even though it was modified).
> > > >>>
> > > >>> OK... So we're going to have to turn off client side file caching
> > > >>> entirely for pNFS? I can do that...
> > > >>>
> > > >>> The above won't work. Think readahead...
> > > >>
> > > >> So... What can work, is if you modify it to work explicitly for
> > > >> close-to-open
> > > >>
> > > >> "Upon receiving an OPEN, LOCK or a WANT_DELEGATION, the server must
> > > >> check that it has received LAYOUTCOMMITs from any other clients that may
> > > >> have the file open for writing. If it hasn't, then it MUST take some
> > > >> action to ensure that any file data changes are accompanied by a change
> > > >                            ^ potentially visible
> > > >> attribute update."
> > >
> > > That should be OK as long as it's not for every GETATTR for the change, mtime,
> > > or size attributes.
> > >
> > > >>
> > > >> Then you can add the above suggestion without the offending caveat. Note
> > > >> however that it does break the "SHOULD NOT" admonition in section
> > > >> 18.32.4.
> > >
> > > Better be safe than sorry in this rare error case.
> > >
> > > Benny
> > >
> > > >>
> > > >> Trond
> > > >>
> > > >>
> > > >>> Trond
> > > >>>
> > > >>>> Thanks,
> > > >>>> --David
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> > > >>>>> Sent: Wednesday, July 07, 2010 6:04 PM
> > > >>>>> To: [email protected]; Muntz, Daniel
> > > >>>>> Cc: [email protected]; [email protected]; [email protected]; [email protected];
> > > >>>>> [email protected]; [email protected]
> > > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > >>>>>
> > > >>>>>> Yes. I would agree that the client cannot rely on the updates being made
> > > >>>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > > >>>>>> compliant server MUST also have a valid strategy for dealing with the
> > > >>>>>> case where the client doesn't send it.
> > > >>>>>
> > > >>>>> So you are saying the updates "MUST be made visible" through the
> > > >>>>> server's valid strategy. Is that right?
> > > >>>>>
> > > >>>>> And that the client cannot rely on that. Why not, if the server must
> > > >>>>> have a valid strategy?
> > > >>>>>
> > > >>>>> Is this just prudent "belt and suspenders" design or what?
> > > >>>>>
> > > >>>>> It seems to me that if one side here is MUST (and the spec needs to be
> > > >>>>> clearer about what might or might not constitute a valid strategy), then
> > > >>>>> the other side should be SHOULD.
> > > >>>>>
> > > >>>>> If both sides are "MUST", then if things don't work out, the client
> > > >>>>> and server can equally point to one another and say "It's his fault".
> > > >>>>>
> > > >>>>> Am I missing something here?
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: [email protected] [mailto:[email protected]] On Behalf Of Trond Myklebust
> > > >>>>> Sent: Wednesday, July 07, 2010 5:01 PM
> > > >>>>> To: Muntz, Daniel
> > > >>>>> Cc: [email protected]; [email protected]; [email protected];
> > > >>>>> [email protected]; [email protected]; [email protected]
> > > >>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > >>>>>
> > > >>>>> On Wed, 2010-07-07 at 16:39 -0400, [email protected] wrote:
> > > >>>>>> To bring this discussion full circle, since we agree that a compliant
> > > >>>>>> server can implement a scheme where written data does not become visible
> > > >>>>>> until after a LAYOUTCOMMIT, do we also agree that LAYOUTCOMMIT is a
> > > >>>>>> "MUST" from a compliant client (independent of layout type)?
> > > >>>>>
> > > >>>>> Yes. I would agree that the client cannot rely on the updates being made
> > > >>>>> visible if it fails to send the LAYOUTCOMMIT. My point was simply that a
> > > >>>>> compliant server MUST also have a valid strategy for dealing with the
> > > >>>>> case where the client doesn't send it.
> > > >>>>>
> > > >>>>> Cheers
> > > >>>>> Trond
> > > >>>>>
> > > >>>>>> -Dan
> > > >>>>>>
> > > >>>>>>> -----Original Message-----
> > > >>>>>>> From: [email protected] [mailto:[email protected]]
> > > >>>>>>> On Behalf Of Trond Myklebust
> > > >>>>>>> Sent: Wednesday, July 07, 2010 7:04 AM
> > > >>>>>>> To: Benny Halevy
> > > >>>>>>> Cc: [email protected]; [email protected]; Garth
> > > >>>>>>> Gibson; Brent Welch; NFSv4
> > > >>>>>>> Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close
> > > >>>>>>>
> > > >>>>>>> On Wed, 2010-07-07 at 16:51 +0300, Benny Halevy wrote:
> > > >>>>>>>> On Jul. 07, 2010, 16:18 +0300, Trond Myklebust <[email protected]> wrote:
> > > >>>>>>>>> On Wed, 2010-07-07 at 09:06 -0400, Trond Myklebust wrote:
> > > >>>>>>>>>> On Wed, 2010-07-07 at 15:05 +0300, Benny Halevy wrote:
> > > >>>>>>>>>>> On Jul. 06, 2010, 23:40 +0300, Trond Myklebust <[email protected]> wrote:
> > > >>>>>>>>>>>> On Tue, 2010-07-06 at 15:20 -0400, [email protected] wrote:
> > > >>>>>>>>>>>>> The COMMIT to the DS, ttbomk, commits data on the DS. I see it as
> > > >>>>>>>>>>>>> orthogonal to updating the metadata on the MDS (but perhaps I'm
> > > >>>>>>>>>>>>> wrong). As sjoshi@bluearc mentioned, the LAYOUTCOMMIT provides a
> > > >>>>>>>>>>>>> synchronization point, so even if the non-clustered server does not
> > > >>>>>>>>>>>>> want to update metadata on every DS I/O, the LAYOUTCOMMIT could also
> > > >>>>>>>>>>>>> be a trigger to execute whatever synchronization mechanism the
> > > >>>>>>>>>>>>> implementer wishes to put in the control protocol.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> As far as I'm aware, there are no exceptions in RFC5661 that would
> > > >>>>>>>>>>>> allow pNFS servers to break the rule that any visible change to the
> > > >>>>>>>>>>>> data must be atomically accompanied with a change attribute update.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Trond, I'm not sure how this rule you mentioned is specified.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> See more in section 12.5.4 and 12.5.4.1. LAYOUTCOMMIT and
> > > >>>>>>>>>>> change/time_modify in particular:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> For some layout protocols, the storage device is able to notify the
> > > >>>>>>>>>>> metadata server of the occurrence of an I/O; as a result, the change
> > > >>>>>>>>>>> and time_modify attributes may be updated at the metadata server.
> > > >>>>>>>>>>> For a metadata server that is capable of monitoring updates to the
> > > >>>>>>>>>>> change and time_modify attributes, LAYOUTCOMMIT processing is not
> > > >>>>>>>>>>> required to update the change attribute. In this case, the metadata
> > > >>>>>>>>>>> server must ensure that no further update to the data has occurred
> > > >>>>>>>>>>> since the last update of the attributes; file-based protocols may
> > > >>>>>>>>>>> have enough information to make this determination or may update the
> > > >>>>>>>>>>> change attribute upon each file modification. This also applies for
> > > >>>>>>>>>>> the time_modify attribute. If the server implementation is able to
> > > >>>>>>>>>>> determine that the file has not been modified since the last
> > > >>>>>>>>>>> time_modify update, the server need not update time_modify at
> > > >>>>>>>>>>> LAYOUTCOMMIT. At LAYOUTCOMMIT completion, the updated attributes
> > > >>>>>>>>>>> should be visible if that file was modified since the latest previous
> > > >>>>>>>>>>> LAYOUTCOMMIT or LAYOUTGET.
> > > >>>>>>>>>>
> > > >>>>>>>>>> I know. However, the above paragraph does not state that the server
> > > >>>>>>>>>> should make those changes visible to clients other than the one that
> > > >>>>>>>>>> is writing.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Section 18.32.4 states that writes will cause the time_modified and
> > > >>>>>>>>>> change attributes to be updated (if and only if the file data is
> > > >>>>>>>>>> modified). Several other sections rely on this behaviour, including
> > > >>>>>>>>>> section 10.3.1, section 11.7.2.2, and section 11.7.7.
> > > >>>>>>>>>>
> > > >>>>>>>>>> The only 'special behaviour' that I see allowed for pNFS is in
> > > >>>>>>>>>> section 13.10, which states that clients can't expect to see changes
> > > >>>>>>>>>> immediately, but that they must be able to expect close-to-open
> > > >>>>>>>>>> semantics to work. Again, if this is to be the case, then the server
> > > >>>>>>>>>> _must_ be able to deal with the case where client 1 dies before it
> > > >>>>>>>>>> can issue the LAYOUTCOMMIT.
> > > >>>>>>>>
> > > >>>>>>>> Agreed.
> > > >>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>>> As I see it, if your server allows one client to read data that may
> > > >>>>>>>>>>>> have been modified by another client that holds a WRITE layout for
> > > >>>>>>>>>>>> that range, then (since that is a visible data change) it should
> > > >>>>>>>>>>>> provide a change attribute update irrespective of whether or not a
> > > >>>>>>>>>>>> LAYOUTCOMMIT has been sent.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> The requirement for the server in WRITE's implementation section is
> > > >>>>>>>>>>> quite weak: "It is assumed that the act of writing data to a file will
> > > >>>>>>>>>>> cause the time_modified and change attributes of the file to be
> > > >>>>>>>>>>> updated."
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> The difference here is that for pNFS the written data is not
> > > >>>>>>>>>>> guaranteed to be visible until LAYOUTCOMMIT. In a broader sense,
> > > >>>>>>>>>>> assuming the clients are caching dirty data and use a write-behind
> > > >>>>>>>>>>> cache, application-written data may be visible to other processes on
> > > >>>>>>>>>>> the same host but not to others until fsync() or close(); close-to-open
> > > >>>>>>>>>>> semantics are the only thing the client guarantees, right? Issuing
> > > >>>>>>>>>>> LAYOUTCOMMIT on fsync() and close() ensures the data is committed to
> > > >>>>>>>>>>> stable storage and is visible to all other clients in the cluster.
> > > >>>>>>>>>>
> > > >>>>>>>>>> See above. I'm not disputing your statement that 'the written data is
> > > >>>>>>>>>> not guaranteed to be visible until LAYOUTCOMMIT'. I am disputing an
> > > >>>>>>>>>> assumption that 'the written data may be visible without an
> > > >>>>>>>>>> accompanying change attribute update'.
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> In other words, I'd expect the following scenario to give the same
> > > >>>>>>>>> results in NFSv4.1 w/pNFS as it does in NFSv4:
> > > >>>>>>>>
> > > >>>>>>>> That's a strong requirement that may limit the scalability of the
> > > >>>>>>>> server.
> > > >>>>>>>>
> > > >>>>>>>> The spirit of the pNFS operations, at least from the Panasas
> > > >>>>>>>> perspective, was that the data is transient until LAYOUTCOMMIT,
> > > >>>>>>>> meaning it may or may not be visible to clients other than the one
> > > >>>>>>>> who wrote it, and its associated metadata MUST be updated and describe
> > > >>>>>>>> the new data only on LAYOUTCOMMIT; until then it's undefined, i.e.
> > > >>>>>>>> it's up to the server implementation whether to update it or not.
> > > >>>>>>>>
> > > >>>>>>>> Without locking, what do the stronger semantics buy you? Even if a
> > > >>>>>>>> client verified the change_attribute, new data may become visible at
> > > >>>>>>>> any time after the GETATTR if the file/byte range isn't locked.
> > > >>>>>>>
> > > >>>>>>> There is no locking needed in the scenario below: it is ordinary
> > > >>>>>>> close-to-open semantics.
> > > >>>>>>>
> > > >>>>>>> The point is that if you remove the one and only way that clients have
> > > >>>>>>> to determine whether or not their data caches are valid, then they can
> > > >>>>>>> no longer cache data at all, and server scalability will be shot to
> > > >>>>>>> smithereens anyway.
> > > >>>>>>>
> > > >>>>>>> Trond
> > > >>>>>>>
> > > >>>>>>>> Benny
> > > >>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Client 1                    Client 2
> > > >>>>>>>>> ========                    ========
> > > >>>>>>>>>
> > > >>>>>>>>> OPEN foo
> > > >>>>>>>>> READ
> > > >>>>>>>>> CLOSE
> > > >>>>>>>>>                             OPEN
> > > >>>>>>>>>                             LAYOUTGET ...
> > > >>>>>>>>>                             WRITE via DS
> > > >>>>>>>>>                             <dies>...
> > > >>>>>>>>> OPEN foo
> > > >>>>>>>>> verify change_attr
> > > >>>>>>>>> READ if above WRITE is visible
> > > >>>>>>>>> CLOSE
> > > >>>>>>>>>
> > > >>>>>>>>> Trond
> > > >>>>>>>>> _______________________________________________
> > > >>>>>>>>> nfsv4 mailing list
> > > >>>>>>>>> [email protected]
> > > >>>>>>>>> https://www.ietf.org/mailman/listinfo/nfsv4
> > > >>>>>>>
> > > >>>>>>>
> > > >>
> > > >>
> > > >> --
> > > >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > > >> the body of a message to [email protected]
> > > >> More majordomo info at http://vger.kernel.org/majordomo-info.html
_______________________________________________
nfsv4 mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/nfsv4
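Trond's close-to-open scenario can be made concrete with a toy model. The sketch below is a hypothetical illustration, not any real server or the Linux client: `ToyMDS`, its three `policy` values, `ToyClient` and `run_scenario` are invented names. Client 2 writes through the DS and dies before sending LAYOUTCOMMIT; client 1 then reopens the file and revalidates its cache by comparing the change attribute, as close-to-open semantics require.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyMDS:
    """Hypothetical metadata server holding one file's data and change attribute.

    policy:
      "bump"    - a DS WRITE becomes visible together with a change bump
      "no_bump" - a DS WRITE becomes visible, change attribute untouched
      "defer"   - a DS WRITE stays invisible until LAYOUTCOMMIT
    """
    policy: str
    data: bytes = b"old"
    change: int = 1
    staged: Optional[bytes] = None

    def ds_write(self, new: bytes) -> None:
        if self.policy == "defer":
            self.staged = new             # held back until LAYOUTCOMMIT
        else:
            self.data = new               # visible to other clients at once
            if self.policy == "bump":
                self.change += 1

    def layoutcommit(self) -> None:
        if self.staged is not None:
            # Publish the data and bump the change attribute together.
            self.data, self.staged = self.staged, None
            self.change += 1


@dataclass
class ToyClient:
    """Caches file data and revalidates it on OPEN via the change attribute."""
    mds: ToyMDS
    cached: Optional[bytes] = None
    cached_change: Optional[int] = None

    def open_and_read(self) -> bytes:
        change = self.mds.change          # GETATTR(change) at OPEN
        if self.cached is None or change != self.cached_change:
            self.cached = self.mds.data   # cache invalid: READ from server
            self.cached_change = change
        return self.cached


def run_scenario(policy: str) -> Tuple[bytes, bytes]:
    """Return (what client 1 reads, what an uncached reader would get)."""
    mds = ToyMDS(policy)
    client1 = ToyClient(mds)
    client1.open_and_read()               # client 1: OPEN foo, READ, CLOSE
    mds.ds_write(b"new")                  # client 2: LAYOUTGET, WRITE via DS, dies
    return client1.open_and_read(), mds.data  # client 1: OPEN, verify, READ
```

With `"bump"` and `"defer"` the two values agree; with `"no_bump"` client 1 keeps serving `b"old"` from its cache while the server already hands `b"new"` to other readers, which is the inconsistency Trond objects to. The `"defer"` policy stays consistent here only because the staged write is never published; what a real server should eventually do with data from a client that died before LAYOUTCOMMIT is the open question in this thread.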

2010-07-08 23:57:37

by Sorin Faibish

[permalink] [raw]
Subject: Re: 4.1 client - LAYOUTCOMMIT & close

On Thu, 08 Jul 2010 19:01:16 -0400, Tom Haynes <[email protected]> wrote:

> On 07/ 8/10 05:12 PM, sfaibish wrote:
>> All, after discussing this issue with Dave Noveck, and as I mentioned in
>> the call today, I think that this is a serious issue and a disconnect
>> between the behaviors of different layout types. My proposal is to have
>> this discussion F2F in Maastricht on the white board, so I will add an
>> agenda item to the WG on this topic. I could address the behavior of the
>> block layout, but it is not something we want to mimic, as we all agreed
>> at cthon to avoid LAYOUTCOMMIT as much as possible for the file layout.
>> If we solve the issue using the mechanism Trond proposed, we will create
>> a conflict with the use of LAYOUTCOMMIT. Just as a hint, the difference
>> from block is that block treats read and write layouts as different
>> leases, and when a client holds a read layout, the server will always
>> recall it (the client answers with LAYOUTRETURN) when either upgrading
>> that client's layout to write or sending a write layout to another
>> client. I don't think we want to do the same for file. My 2c.
>>
>> /Sorin
>
> When I hear the words "white board", I immediately think unorganized and
> likely to get out of hand. I don't know how much time we are up to now,
> but we must be close to running out of it.
>
> I have a counter-proposal: why doesn't someone, say Trond, put together
> some slides on this and we discuss them.
Agreed. This is what I meant by "white board": a presentation followed by
a discussion on a plan of action, perhaps a new 4.2 draft if there is a
need. We can continue it by email after we decide what to do.

>
> Or, if there is a strong consensus that we do need to do this on a white
> board, why don't we ask ietf for an additional slot in the morning?
My bad for using the wrong term. I don't think we need a special time slot,
but we can decide on the spot in Maastricht. We should be able to find an
available room.

>
>
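Sorin's hint about the block layout describes a recall rule: a client holding a read layout has it recalled when that client upgrades to a write layout or when a write layout goes to another client. The sketch below models that rule under stated assumptions: layouts are whole-file (real servers track byte ranges), and, beyond what the message says, a write grant also recalls other clients' write layouts. `ToyBlockLayoutServer` and the `"READ"`/`"RW"` strings are hypothetical stand-ins for LAYOUTIOMODE4_READ and LAYOUTIOMODE4_RW; each recorded recall models CB_LAYOUTRECALL answered by LAYOUTRETURN.

```python
from typing import Dict, List

IOMODE_READ = "READ"   # stands in for LAYOUTIOMODE4_READ
IOMODE_RW = "RW"       # stands in for LAYOUTIOMODE4_RW


class ToyBlockLayoutServer:
    """Hypothetical whole-file layout table for one file (real servers track ranges)."""

    def __init__(self) -> None:
        self.held: Dict[str, str] = {}    # client id -> iomode currently held
        self.recalls: List[str] = []      # clients recalled, in order

    def _recall(self, client: str) -> None:
        # Models CB_LAYOUTRECALL to the client followed by its LAYOUTRETURN.
        self.recalls.append(client)
        del self.held[client]

    def layoutget(self, client: str, iomode: str) -> None:
        if iomode == IOMODE_RW:
            for other, mode in list(self.held.items()):
                # Any READ layout (the requester's own, on upgrade, or another
                # client's) is recalled; other clients' RW layouts are recalled
                # too, which is this sketch's assumption.
                if mode == IOMODE_READ or other != client:
                    self._recall(other)
        self.held[client] = iomode
```

Granting READ to `c1` and `c2` and then RW to `c1` recalls both read layouts, in that order. Sorin's concern is that the file layout avoids this kind of recall traffic, so importing the rule would cost the file layout something it currently does not pay.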



--
Best Regards

Sorin Faibish
Corporate Distinguished Engineer
Network Storage Group

EMC²
where information lives

Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : [email protected]

2010-07-09 00:41:18

by Trond Myklebust

[permalink] [raw]
Subject: Re: [nfsv4] 4.1 client - LAYOUTCOMMIT & close

On Thu, 2010-07-08 at 18:01 -0500, Tom Haynes wrote:

> I have a counter-proposal: why doesn't someone, say Trond, put together
> some slides on this and we discuss them.

Say who, what???? :-)

OK. I can put something together, but it will take 5-10 minutes of
meeting time (10 being the more realistic estimate).

Trond


2010-07-02 15:41:15

by Andy Adamson

[permalink] [raw]
Subject: Re: 4.1 client - LAYOUTCOMMIT & close


On Jul 1, 2010, at 8:07 PM, Sandeep Joshi wrote:

Hi Sandeep

>
> In certain cases, I don't see layoutcommit on a file at all even
> after doing many writes.

FYI:

You should not be paying attention to layoutcommits - they have no
value for the file layout type.

From RFC 5661:

"The LAYOUTCOMMIT operation commits changes in the layout represented
by the current filehandle, client ID (derived from the session ID in
the preceding SEQUENCE operation), byte-range, and stateid."

For the block layout type, this sentence has meaning in that there is
a layoutupdate4 payload that enumerates the blocks that have changed
state from being 'handed out' to being 'written'.

The file layout type has no layoutupdate4 payload, and the layout does
not change due to writes, and thus the LAYOUTCOMMIT call is useless.

The only field in the LAYOUTCOMMIT4args that might possibly be useful
is the loca_last_write_offset which tells the server what the client
thinks is the EOF of the file after WRITE. It is an extremely lame
server (file layout type server) that depends upon clients for this
info.
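To make this concrete, the sketch below mirrors the LAYOUTCOMMIT4args fields from RFC 5661 as Python dataclasses (the XDR unions newoffset4 and newtime4 are modeled as `Optional` values) and shows a hypothetical file-layout MDS handler that consults only loca_last_write_offset and, optionally, loca_time_modify. `ToyFile` and `file_layout_layoutcommit` are invented for illustration; RFC 5661 also lets the server ignore the client's suggested modification time and use its own.

```python
from dataclasses import dataclass
from typing import Optional

LAYOUT4_NFSV4_1_FILES = 1   # layouttype4 value for the file layout
LAYOUT4_BLOCK_VOLUME = 3    # layouttype4 value for the block layout


@dataclass
class LayoutUpdate4:
    lou_type: int
    lou_body: bytes = b""   # opaque; the block layout puts its extent list here


@dataclass
class LayoutCommit4Args:
    loca_offset: int
    loca_length: int
    loca_reclaim: bool
    loca_stateid: bytes                     # 16-byte stateid4
    loca_last_write_offset: Optional[int]   # newoffset4; None if not supplied
    loca_time_modify: Optional[float]       # newtime4; None if not supplied
    loca_layoutupdate: LayoutUpdate4


@dataclass
class ToyFile:
    size: int
    change: int
    time_modify: float


def file_layout_layoutcommit(f: ToyFile, args: LayoutCommit4Args) -> Optional[int]:
    """Hypothetical MDS handler; returns the new size if it grew, else None."""
    if args.loca_layoutupdate.lou_type != LAYOUT4_NFSV4_1_FILES:
        raise ValueError("toy handler only covers the file layout type")
    # Nothing in lou_body is consulted: the file layout has no update payload.
    new_size = None
    if args.loca_last_write_offset is not None:
        # The offset names the last byte written, so the implied size is +1.
        candidate = args.loca_last_write_offset + 1
        if candidate > f.size:
            f.size = new_size = candidate
    if args.loca_time_modify is not None:
        f.time_modify = args.loca_time_modify  # server may accept the hint
    if new_size is not None or args.loca_time_modify is not None:
        f.change += 1
    return new_size
```

A server that learns file sizes from its data servers through its own control protocol would not need even this field, which is the sense in which relying on the client for it is called "lame" above.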

>
>
>
> Client side operations:
>
> open
> write(s)
> close
>
>
> On server side (observed operations):
>
> open
> layoutget's
> close
>
>
> But, I do not see laycommit at all. In terms data written by client
> it is about 4-5MB.
>
> When does client issue laycommit?

The latest Linux client sends a LAYOUTCOMMIT when the VFS makes a
super_operations.write_inode call, which happens when the metadata of
an inode needs updating. We are seriously considering removing the
layoutcommit call from the file layout client.
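A client-side sketch of the trigger logic discussed in this thread: the Linux client sends LAYOUTCOMMIT from super_operations.write_inode, while elsewhere in the thread issuing it on fsync() and close() is proposed. The hypothetical `ToyPnfsInode` below (not the Linux client's data structures) records the last byte written through a DS and sends at most one LAYOUTCOMMIT per batch of writes, at whichever trigger fires first. Under a write_inode-only policy, a short open/write/close sequence may finish before any trigger fires, which would be consistent with the behaviour Sandeep observed, though the thread does not confirm that cause.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyPnfsInode:
    """Hypothetical per-file client state for deciding when to send LAYOUTCOMMIT."""
    lc_needed: bool = False
    last_write_offset: Optional[int] = None   # highest byte written in this batch
    sent: List[Tuple[str, int]] = field(default_factory=list)

    def ds_write(self, offset: int, length: int) -> None:
        """Record a WRITE sent directly to a data server."""
        if length <= 0:
            return
        end = offset + length - 1             # offset of the last byte written
        if self.last_write_offset is None or end > self.last_write_offset:
            self.last_write_offset = end
        self.lc_needed = True

    def _maybe_layoutcommit(self, trigger: str) -> bool:
        """Record one LAYOUTCOMMIT if DS writes happened since the previous one."""
        if not self.lc_needed or self.last_write_offset is None:
            return False
        self.sent.append((trigger, self.last_write_offset))
        self.lc_needed = False
        self.last_write_offset = None
        return True

    # Illustrative trigger points; a real client wires these to VFS hooks.
    def write_inode(self) -> bool:
        return self._maybe_layoutcommit("write_inode")

    def fsync(self) -> bool:
        return self._maybe_layoutcommit("fsync")

    def close(self) -> bool:
        return self._maybe_layoutcommit("close")
```

For example, two DS writes covering the first 5 MiB followed by `close()` record a single `("close", 5242879)` entry; a second `close()` sends nothing because no DS write happened in between.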

-->Andy

>
>
> regards,
>
> Sandeep