2019-02-25 08:49:42

by Mkrtchyan, Tigran

[permalink] [raw]
Subject: Re: [nfsv4] file size and getattr



----- Original Message -----
> From: "Dave Noveck" <[email protected]>
> To: "Marc Eshel" <[email protected]>
> Cc: "linux-nfs" <[email protected]>, "Ganesha-devel" <[email protected]>, "NFSv4" <[email protected]>,
> "linux-nfs-owner" <[email protected]>
> Sent: Saturday, February 23, 2019 12:38:56 AM
> Subject: Re: [nfsv4] file size and getattr

>> Does the NFSv4 spec allows the server to return file size that doesn't
>> include the unstable write to the writer or any other NFS client?
>
> I would say "no". Consider the followiing sentence in the description of
> COMMIT.
>

I would say "yes". Let's have a look at LAYOUTCOMMIT. Spec says:

"The metadata server may
use this information to determine whether the file's size needs to be
updated. If the metadata server updates the file's size as the
result of the LAYOUTCOMMIT operation, it must return the new size
(locr_newsize.ns_size) as part of the results."

I read this as: Metadata server may not know real file size until client sends
the file size information. Our DSes return DATA_SYNC and relay on client to
issue LAYOUTCOMMIT to update file size (well, on close we force data servers
up update file anyway).

Tigran.


>> If the count is zero, a flush from the offset to the end of the file is
> done.
>
> Note that he size returned by GETATTR gives you the end of the file so
> that, if it did not
> reflect the unstable writes, COMMIT wouldn't work right.
>
> On Fri, Feb 22, 2019 at 6:25 PM Marc Eshel <[email protected]> wrote:
>
>> What is the file size returned by the NFS server for getattr after an
>> unstable write to the NFS client that did the write or to other NFS
>> clients.
>>
>> As far as I know most file systems will always return the file size that
>> includes the unstable writes.
>>
>> Does the NFSv4 spec allows the server to return file size that doesn't
>> include the unstable write to the writer or any other NFS client?
>>
>> Marc.
>>
>>
>
> _______________________________________________
> nfsv4 mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/nfsv4


2019-02-26 02:54:52

by Rick Macklem

[permalink] [raw]
Subject: Re: [nfsv4] file size and getattr

Mkrtchyan, Tigran wrote:
>> From: "Dave Noveck" <[email protected]>
>> To: "Marc Eshel" <[email protected]>
>> Cc: "linux-nfs" <[email protected]>, "Ganesha-devel" <[email protected]>, "NFSv4" <[email protected]>,
>> "linux-nfs-owner" <[email protected]>
>> Sent: Saturday, February 23, 2019 12:38:56 AM
>> Subject: Re: [nfsv4] file size and getattr
>>
>>> Does the NFSv4 spec allows the server to return file size that doesn't
>>> include the unstable write to the writer or any other NFS client?
>>
>> I would say "no". Consider the followiing sentence in the description of
>> COMMIT.
>>
>
>I would say "yes". Let's have a look at LAYOUTCOMMIT. Spec says:
>
>"The metadata server may
> use this information to determine whether the file's size needs to be
> updated. If the metadata server updates the file's size as the
> result of the LAYOUTCOMMIT operation, it must return the new size
> (locr_newsize.ns_size) as part of the results."
>
>I read this as: Metadata server may not know real file size until client sends
>the file size information. Our DSes return DATA_SYNC and relay on client to
>issue LAYOUTCOMMIT to update file size (well, on close we force data servers
>up update file anyway).

For the pNFS case, when I implemented a pNFS server for FreeBSD, I assumed
that the Metadata server only needed to return a correct size for a file
to a client doing a Getattr with a read/write layout after it did a LayoutCommit.
This implementation worked ok for the FreeBSD client, but did not work
correctly for the Linux-4.17-rc2 kernel.
To fix the server implementation to interoperate with the above Linux kernel,
I had to add a tunable that makes the server get an up to date size for the
Getattr operation for a client when a read/write layout is issued to the client.
Just to be clear, I am referring to the case where the Getattr was done by the
client that holds the read/write layout and not another client.
(This does result in additional overheads whenever a client holds a read/write
layout for the file.)
I can't remember exactly how the Linux client failed, but I suspect it would
see a premature EOF when the size returned by the MDS was smaller than
the actual size of the file on the DS.

I honestly don't know if this is a bug in the Linux client or a misinterpretation
of RFC5661?

rick


2019-02-26 03:23:46

by Rick Macklem

[permalink] [raw]
Subject: Re: [nfsv4] file size and getattr

I wrote:
[stuff snipped]
>For the pNFS case, when I implemented a pNFS server for FreeBSD, I assumed
>that the Metadata server only needed to return a correct size for a file
>to a client doing a Getattr with a read/write layout after it did a LayoutCommit.
>This implementation worked ok for the FreeBSD client, but did not work
>correctly for the Linux-4.17-rc2 kernel.
>To fix the server implementation to interoperate with the above Linux kernel,
>I had to add a tunable that makes the server get an up to date size for the
>Getattr operation for a client when a read/write layout is issued to the client.
>Just to be clear, I am referring to the case where the Getattr was done by the
>client that holds the read/write layout and not another client.
>(This does result in additional overheads whenever a client holds a read/write
> layout for the file.)
>I can't remember exactly how the Linux client failed, but I suspect it would
>see a premature EOF when the size returned by the MDS was smaller than
>the actual size of the file on the DS.
>
>I honestly don't know if this is a bug in the Linux client or a misinterpretation
>of RFC5661?
A little more info on this.

If I recall correctly, the Linux client only did LayoutCommits after Commits.
As such, no LayoutCommit were done after writes to a DS that were FILE_SYNC
(or stable, if you prefer). Without the LayoutCommit, the pNFS server didn't
know when it needed to get an up to date size.
The FreeBSD client does a LayoutCommit after writing to a DS whenever it is done
writing, such as an fsync() or close() syscall on the file or an unlock of a write lock
on the file.

rick


2019-02-26 12:49:13

by Trond Myklebust

[permalink] [raw]
Subject: Re: [nfsv4] file size and getattr

Hi Rick,

On Tue, 2019-02-26 at 03:23 +0000, Rick Macklem wrote:
>
> If I recall correctly, the Linux client only did LayoutCommits after
> Commits.
> As such, no LayoutCommit were done after writes to a DS that were
> FILE_SYNC
> (or stable, if you prefer). Without the LayoutCommit, the pNFS server
> didn't
> know when it needed to get an up to date size.
> The FreeBSD client does a LayoutCommit after writing to a DS whenever
> it is done
> writing, such as an fsync() or close() syscall on the file or an
> unlock of a write lock
> on the file.
>

Please see the Errata ID 2751 http://www.rfc-editor.org/errata/eid2751

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-02-27 00:13:05

by Rick Macklem

[permalink] [raw]
Subject: Re: [nfsv4] file size and getattr

Trond Myklebust wrote:
[stuff snipped]
> Please see the Errata ID 2751 http://www.rfc-editor.org/errata/eid2751

I'll admit I hadn't seen this errata before. However, it seems to be specific to
the File Layout. For the Flexible File Layout...

When I look in RFC-8435, I cannot find anything that states that a LayoutCommit
is only required for case(s) where a Commit to the Storage Server is required.
Sec. 2.1
Clearly states that a Commit to the Storage Server is required before the client
does a LayoutCommit when the write(s) were not done FILE_SYNC.
However, I do not see any indication that the LayoutCommit is not to be done
for the case where the write(s) are done FILE_SYNC.

FF_FLAGS_NO_LAYOUTCOMMIT can be used to indicate to a client that
LayoutCommits are not required, but this does not be dependent on how
the write(s) to the Storage Server were done.

The only way a Flexible File layout Metadata server can know what the
current file size is (when a read/write layout is issued to a client) is to do a
Getattr to the Storage Server.
If a client is not required to do a LayoutCommit when the write(s) to the
Storage Server are done FILE_SYNC, then the Metadata server must do
Getattr RPCs to the Storage Server whenever it needs an up to date file size
if a read/write layout is issued to a client.

This can result in a lot of overhead that can be avoided by requiring the
LayoutCommit to be done by a client after writing to a Storage Server,
irrespective of the need for a Commit to the Storage Server.
As such, I would rather not have this errata applied to RFC-8435.

rick


2019-02-27 01:12:21

by Trond Myklebust

[permalink] [raw]
Subject: Re: [nfsv4] file size and getattr

On Wed, 2019-02-27 at 00:13 +0000, Rick Macklem wrote:
> Trond Myklebust wrote:
> [stuff snipped]
> > Please see the Errata ID 2751
> > http://www.rfc-editor.org/errata/eid2751
>
> I'll admit I hadn't seen this errata before. However, it seems to be
> specific to
> the File Layout. For the Flexible File Layout...
>
> When I look in RFC-8435, I cannot find anything that states that a
> LayoutCommit
> is only required for case(s) where a Commit to the Storage Server is
> required.
> Sec. 2.1
> Clearly states that a Commit to the Storage Server is required
> before the client
> does a LayoutCommit when the write(s) were not done FILE_SYNC.
> However, I do not see any indication that the LayoutCommit is not
> to be done
> for the case where the write(s) are done FILE_SYNC.
>
> FF_FLAGS_NO_LAYOUTCOMMIT can be used to indicate to a client that
> LayoutCommits are not required, but this does not be dependent on how
> the write(s) to the Storage Server were done.
>
> The only way a Flexible File layout Metadata server can know what the
> current file size is (when a read/write layout is issued to a client)
> is to do a
> Getattr to the Storage Server.
> If a client is not required to do a LayoutCommit when the write(s) to
> the
> Storage Server are done FILE_SYNC, then the Metadata server must do
> Getattr RPCs to the Storage Server whenever it needs an up to date
> file size
> if a read/write layout is issued to a client.
>
> This can result in a lot of overhead that can be avoided by requiring
> the
> LayoutCommit to be done by a client after writing to a Storage
> Server,
> irrespective of the need for a Commit to the Storage Server.
> As such, I would rather not have this errata applied to RFC-8435.
>

Fair enough. I agree that the errata in question only applies to the
pNFS files layout, however you were talking about RFC5661 and whether
or not we were interpreting that correctly. Since RFC5661 only refers
to about the behaviour of the pNFS files layout, then I assumed that
was what you were referring to.

For flexfiles we may have a bug in the layoutcommit case. However note
that the counter argument to what you state above is that _if_ the
server requires a layoutcommit before it will acknowledge a file size
change, then pNFS is likely to underperform for applications such as
databases or VMs where each record is required to be written in stable
mode.
IOW: If all writes that need to be stable are also required to be
acknowledged with a layoutcommit (to the MDS), then your ability to
scale out your server will be in doubt.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]