2015-01-07 20:04:44

by Chuck Lever

Subject: close(2) behavior when client holds a write delegation

Hi-

Dai noticed that when a 3.17 Linux NFS client is granted a
write delegation, it neglects to flush dirty data synchronously
with close(2). The data is flushed asynchronously, and close(2)
completes immediately. Normally that’s OK. But Dai observed that:

1. If the server can’t accommodate the dirty data (eg ENOSPC or
EIO) the application is not notified, even via close(2) return
code.

2. If the server is down, the application does not hang, but it
can leave dirty data in the client’s page cache with no
indication to applications or administrators.

The disposition of that data remains unknown even if a umount
is attempted. While the server is down, the umount will hang
trying to flush that data without giving an indication of why.

3. If a shutdown is attempted while the server is down and there
is a pending flush, the shutdown will hang, even though there
are no running applications with open files.

4. The behavior is non-deterministic from the application’s
perspective. It occurs only if the server has granted a write
delegation for that file; otherwise close(2) behaves like it
does for NFSv2/3 or NFSv4 without a delegation present
(close(2) waits synchronously for the flush to complete).

Should close(2) wait synchronously for a data flush even in the
presence of a write delegation?

It’s certainly reasonable for umount to try hard to flush pinned
data, but that makes shutdown unreliable.
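
For what it’s worth, an application that cares can force the error
to surface today by flushing explicitly before close(2); fsync(2)
writes back and commits dirty data synchronously even when a
delegation is held, so ENOSPC/EIO is reported to the caller. A
minimal sketch:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Flush dirty data and report any write-back error before
     * closing, independent of delegation state. */
    static int close_sync(int fd)
    {
            int err = 0;

            if (fsync(fd) == -1) {
                    err = errno;
                    fprintf(stderr, "fsync: %s\n", strerror(err));
            }
            if (close(fd) == -1 && err == 0)
                    err = errno;

            errno = err;
            return err ? -1 : 0;
    }

That doesn’t help when the server is down, of course; the
application then hangs in fsync(2) instead.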

Thanks for any thoughts!

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2015-01-08 00:05:44

by Trond Myklebust

Subject: Re: close(2) behavior when client holds a write delegation

On Wed, Jan 7, 2015 at 12:04 PM, Chuck Lever <[email protected]> wrote:
> Hi-
>
> Dai noticed that when a 3.17 Linux NFS client is granted a
> write delegation, it neglects to flush dirty data synchronously
> with close(2). The data is flushed asynchronously, and close(2)
> completes immediately. Normally that’s OK. But Dai observed that:
>
> 1. If the server can’t accommodate the dirty data (eg ENOSPC or
> EIO) the application is not notified, even via close(2) return
> code.
>
> 2. If the server is down, the application does not hang, but it
> can leave dirty data in the client’s page cache with no
> indication to applications or administrators.
>
> The disposition of that data remains unknown even if a umount
> is attempted. While the server is down, the umount will hang
> trying to flush that data without giving an indication of why.
>
> 3. If a shutdown is attempted while the server is down and there
> is a pending flush, the shutdown will hang, even though there
> are no running applications with open files.
>
> 4. The behavior is non-deterministic from the application’s
> perspective. It occurs only if the server has granted a write
> delegation for that file; otherwise close(2) behaves like it
> does for NFSv2/3 or NFSv4 without a delegation present
> (close(2) waits synchronously for the flush to complete).
>
> Should close(2) wait synchronously for a data flush even in the
> presence of a write delegation?
>
> It’s certainly reasonable for umount to try hard to flush pinned
> data, but that makes shutdown unreliable.

We should probably start paying more attention to the "space_limit"
field in the write delegation. That field is supposed to tell the
client precisely how much data it is allowed to cache on close().
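
For reference, the relevant XDR, quoted from memory, so double-check
against RFC 3530 before relying on the details:

    enum limit_by4 {
            NFS_LIMIT_SIZE          = 1,
            NFS_LIMIT_BLOCKS        = 2
    };

    struct nfs_modified_limit4 {
            uint32_t        num_blocks;
            uint32_t        bytes_per_block;
    };

    union nfs_space_limit4 switch (limit_by4 limitby) {
     case NFS_LIMIT_SIZE:
             uint64_t               filesize;
     case NFS_LIMIT_BLOCKS:
             nfs_modified_limit4    mod_blocks;
    };

The write delegation returns this in its "space_limit" field.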

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2015-01-08 01:11:31

by Thomas Haynes

Subject: Re: close(2) behavior when client holds a write delegation

Adding NFSv4 WG ....

On Wed, Jan 07, 2015 at 04:05:43PM -0800, Trond Myklebust wrote:
> On Wed, Jan 7, 2015 at 12:04 PM, Chuck Lever <[email protected]> wrote:
> > Hi-
> >
> > Dai noticed that when a 3.17 Linux NFS client is granted a

Hi, is this new behavior for 3.17 or does it happen to prior
versions as well?

> > write delegation, it neglects to flush dirty data synchronously
> > with close(2). The data is flushed asynchronously, and close(2)
> > completes immediately. Normally that’s OK. But Dai observed that:
> >
> > 1. If the server can’t accommodate the dirty data (eg ENOSPC or
> > EIO) the application is not notified, even via close(2) return
> > code.
> >
> > 2. If the server is down, the application does not hang, but it
> > can leave dirty data in the client’s page cache with no
> > indication to applications or administrators.
> >
> > The disposition of that data remains unknown even if a umount
> > is attempted. While the server is down, the umount will hang
> > trying to flush that data without giving an indication of why.
> >
> > 3. If a shutdown is attempted while the server is down and there
> > is a pending flush, the shutdown will hang, even though there
> > are no running applications with open files.
> >
> > 4. The behavior is non-deterministic from the application’s
> > perspective. It occurs only if the server has granted a write
> > delegation for that file; otherwise close(2) behaves like it
> > does for NFSv2/3 or NFSv4 without a delegation present
> > (close(2) waits synchronously for the flush to complete).
> >
> > Should close(2) wait synchronously for a data flush even in the
> > presence of a write delegation?
> >
> > It’s certainly reasonable for umount to try hard to flush pinned
> > data, but that makes shutdown unreliable.
>
> We should probably start paying more attention to the "space_limit"
> field in the write delegation. That field is supposed to tell the
> client precisely how much data it is allowed to cache on close().
>

Sure, but what does that mean?

Is the space_limit supposed to be on the file or the amount of data that
can be cached by the client?

Note that Spencer Dawkins effectively asked this question a couple of years ago:

| In this text:
|
| 15.18.3. RESULT
|
| nfs_space_limit4
| space_limit; /* Defines condition that
| the client must check to
| determine whether the
| file needs to be flushed
| to the server on close. */
|
| I'm no expert, but could I ask you to check whether this is the right
| description for this struct? nfs_space_limit4 looks like it's either
| a file size or a number of blocks, and I wasn't understanding how that
| was a "condition" or how the limit had anything to do with flushing a
| file to the server on close, so I'm wondering about a cut-and-paste error.
|

Does any server set the space_limit?

And to what?

Note, it seems that OpenSolaris does set it to be NFS_LIMIT_SIZE and
UINT64_MAX. Which means that it is effectively saying that the client
is guaranteed a lot of space. :-)

2015-01-08 02:58:21

by Trond Myklebust

Subject: Re: close(2) behavior when client holds a write delegation

On Wed, Jan 7, 2015 at 5:11 PM, Tom Haynes
<[email protected]> wrote:
> Adding NFSv4 WG ....
>
> On Wed, Jan 07, 2015 at 04:05:43PM -0800, Trond Myklebust wrote:
>> On Wed, Jan 7, 2015 at 12:04 PM, Chuck Lever <[email protected]> wrote:
>> > Hi-
>> >
>> > Dai noticed that when a 3.17 Linux NFS client is granted a
>
> Hi, is this new behavior for 3.17 or does it happen to prior
> versions as well?
>
>> > write delegation, it neglects to flush dirty data synchronously
>> > with close(2). The data is flushed asynchronously, and close(2)
>> > completes immediately. Normally that’s OK. But Dai observed that:
>> >
>> > 1. If the server can’t accommodate the dirty data (eg ENOSPC or
>> > EIO) the application is not notified, even via close(2) return
>> > code.
>> >
>> > 2. If the server is down, the application does not hang, but it
>> > can leave dirty data in the client’s page cache with no
>> > indication to applications or administrators.
>> >
>> > The disposition of that data remains unknown even if a umount
>> > is attempted. While the server is down, the umount will hang
>> > trying to flush that data without giving an indication of why.
>> >
>> > 3. If a shutdown is attempted while the server is down and there
>> > is a pending flush, the shutdown will hang, even though there
>> > are no running applications with open files.
>> >
>> > 4. The behavior is non-deterministic from the application’s
>> > perspective. It occurs only if the server has granted a write
>> > delegation for that file; otherwise close(2) behaves like it
>> > does for NFSv2/3 or NFSv4 without a delegation present
>> > (close(2) waits synchronously for the flush to complete).
>> >
>> > Should close(2) wait synchronously for a data flush even in the
>> > presence of a write delegation?
>> >
>> > It’s certainly reasonable for umount to try hard to flush pinned
>> > data, but that makes shutdown unreliable.
>>
>> We should probably start paying more attention to the "space_limit"
>> field in the write delegation. That field is supposed to tell the
>> client precisely how much data it is allowed to cache on close().
>>
>
> Sure, but what does that mean?
>
> Is the space_limit supposed to be on the file or the amount of data that
> can be cached by the client?
>
> Note that Spencer Dawkins effectively asked this question a couple of years ago:
>
> | In this text:
> |
> | 15.18.3. RESULT
> |
> | nfs_space_limit4
> | space_limit; /* Defines condition that
> | the client must check to
> | determine whether the
> | file needs to be flushed
> | to the server on close. */
> |
> | I'm no expert, but could I ask you to check whether this is the right
> | description for this struct? nfs_space_limit4 looks like it's either
> | a file size or a number of blocks, and I wasn't understanding how that
> | was a "condition" or how the limit had anything to do with flushing a
> | file to the server on close, so I'm wondering about a cut-and-paste error.
> |
>
> Does any server set the space_limit?
>
> And to what?
>
> Note, it seems that OpenSolaris does set it to be NFS_LIMIT_SIZE and
> UINT64_MAX. Which means that it is effectively saying that the client
> is guaranteed a lot of space. :-)

Yes... Unless they plan never to return NFS4ERR_NOSPC, that suggests
we should probably file an errata deprecating the feature
altogether.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2015-01-08 03:13:08

by Dai Ngo

Subject: Re: close(2) behavior when client holds a write delegation

On 1/7/15 5:11 PM, Tom Haynes wrote:
> Adding NFSv4 WG ....
>
> On Wed, Jan 07, 2015 at 04:05:43PM -0800, Trond Myklebust wrote:
>> On Wed, Jan 7, 2015 at 12:04 PM, Chuck Lever <[email protected]> wrote:
>>> Hi-
>>>
>>> Dai noticed that when a 3.17 Linux NFS client is granted a
> Hi, is this new behavior for 3.17 or does it happen to prior
> versions as well?
Same behavior was observed in 3.16:

aus-x4170m2-02# uname -a
Linux aus-x4170m2-02 3.16.0-00034-ga1caddc #5 SMP Fri Sep 19 11:36:14
MDT 2014 x86_64 x86_64 x86_64 GNU/Linux

-Dai
>
>>> write delegation, it neglects to flush dirty data synchronously
>>> with close(2). The data is flushed asynchronously, and close(2)
>>> completes immediately. Normally that’s OK. But Dai observed that:
>>>
>>> 1. If the server can’t accommodate the dirty data (eg ENOSPC or
>>> EIO) the application is not notified, even via close(2) return
>>> code.
>>>
>>> 2. If the server is down, the application does not hang, but it
>>> can leave dirty data in the client’s page cache with no
>>> indication to applications or administrators.
>>>
>>> The disposition of that data remains unknown even if a umount
>>> is attempted. While the server is down, the umount will hang
>>> trying to flush that data without giving an indication of why.
>>>
>>> 3. If a shutdown is attempted while the server is down and there
>>> is a pending flush, the shutdown will hang, even though there
>>> are no running applications with open files.
>>>
>>> 4. The behavior is non-deterministic from the application’s
>>> perspective. It occurs only if the server has granted a write
>>> delegation for that file; otherwise close(2) behaves like it
>>> does for NFSv2/3 or NFSv4 without a delegation present
>>> (close(2) waits synchronously for the flush to complete).
>>>
>>> Should close(2) wait synchronously for a data flush even in the
>>> presence of a write delegation?
>>>
>>> It’s certainly reasonable for umount to try hard to flush pinned
>>> data, but that makes shutdown unreliable.
>> We should probably start paying more attention to the "space_limit"
>> field in the write delegation. That field is supposed to tell the
>> client precisely how much data it is allowed to cache on close().
>>
> Sure, but what does that mean?
>
> Is the space_limit supposed to be on the file or the amount of data that
> can be cached by the client?
>
> Note that Spencer Dawkins effectively asked this question a couple of years ago:
>
> | In this text:
> |
> | 15.18.3. RESULT
> |
> | nfs_space_limit4
> | space_limit; /* Defines condition that
> | the client must check to
> | determine whether the
> | file needs to be flushed
> | to the server on close. */
> |
> | I'm no expert, but could I ask you to check whether this is the right
> | description for this struct? nfs_space_limit4 looks like it's either
> | a file size or a number of blocks, and I wasn't understanding how that
> | was a "condition" or how the limit had anything to do with flushing a
> | file to the server on close, so I'm wondering about a cut-and-paste error.
> |
>
> Does any server set the space_limit?
>
> And to what?
>
> Note, it seems that OpenSolaris does set it to be NFS_LIMIT_SIZE and
> UINT64_MAX. Which means that it is effectively saying that the client
> is guaranteed a lot of space. :-)
>

2015-01-08 15:46:17

by Rick Macklem

Subject: Re: [nfsv4] close(2) behavior when client holds a write delegation

Tom Haynes wrote:
> Adding NFSv4 WG ....
>
> On Wed, Jan 07, 2015 at 04:05:43PM -0800, Trond Myklebust wrote:
> > On Wed, Jan 7, 2015 at 12:04 PM, Chuck Lever
> > <[email protected]> wrote:
> > > Hi-
> > >
> > > Dai noticed that when a 3.17 Linux NFS client is granted a
>
> Hi, is this new behavior for 3.17 or does it happen to prior
> versions as well?
>
> > > write delegation, it neglects to flush dirty data synchronously
> > > with close(2). The data is flushed asynchronously, and close(2)
> > > completes immediately. Normally that’s OK. But Dai observed that:
> > >
> > > 1. If the server can’t accommodate the dirty data (eg ENOSPC or
> > > EIO) the application is not notified, even via close(2) return
> > > code.
> > >
> > > 2. If the server is down, the application does not hang, but it
> > > can leave dirty data in the client’s page cache with no
> > > indication to applications or administrators.
> > >
> > > The disposition of that data remains unknown even if a umount
> > > is attempted. While the server is down, the umount will hang
> > > trying to flush that data without giving an indication of why.
> > >
> > > 3. If a shutdown is attempted while the server is down and there
> > > is a pending flush, the shutdown will hang, even though there
> > > are no running applications with open files.
> > >
> > > 4. The behavior is non-deterministic from the application’s
> > > perspective. It occurs only if the server has granted a write
> > > delegation for that file; otherwise close(2) behaves like it
> > > does for NFSv2/3 or NFSv4 without a delegation present
> > > (close(2) waits synchronously for the flush to complete).
> > >
> > > Should close(2) wait synchronously for a data flush even in the
> > > presence of a write delegation?
> > >
> > > It’s certainly reasonable for umount to try hard to flush pinned
> > > data, but that makes shutdown unreliable.
> >
> > We should probably start paying more attention to the "space_limit"
> > field in the write delegation. That field is supposed to tell the
> > client precisely how much data it is allowed to cache on close().
> >
>
> Sure, but what does that mean?
>
> Is the space_limit supposed to be on the file or the amount of data
> that
> can be cached by the client?
>
My understanding of this was that the space limit is how much the
server guarantees the client can grow the file by without failing
due to ENOSPC, done via pre-allocation of blocks to the file on the
server or something like that.
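
In other words, something like this on the client side (purely
illustrative; the names are made up, not from any real
implementation):

    #include <stdint.h>

    /* Illustrative only; field names are made up. */
    struct space_limit {
            int      by_size;         /* NFS_LIMIT_SIZE vs _BLOCKS */
            uint64_t filesize;        /* guaranteed size in bytes */
            uint32_t num_blocks;
            uint32_t bytes_per_block;
    };

    /* Defer the flush at close() only while the file still fits
     * within the space the server has guaranteed; otherwise flush
     * synchronously so the application sees any ENOSPC. */
    static int can_defer_flush(const struct space_limit *lim,
                               uint64_t file_size)
    {
            uint64_t limit = lim->by_size ?
                    lim->filesize :
                    (uint64_t)lim->num_blocks * lim->bytes_per_block;

            return file_size <= limit;
    }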

I'll admit I can't remember what the FreeBSD server sets this to.
(Hopefully 0, because it doesn't do pre-allocation, but I should go take a look. ;-)

For the other cases, such as a crashed server or a network
partition, there will never be a good solution. I think for these it
is just a design choice for the client implementor. (In FreeBSD,
this tends to end up controllable via a mount option; I think the
FreeBSD client uses the "nocto" option to decide whether it will
flush on close.)
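
Something like the following, if I'm remembering the option spelling
correctly:

    # mount -t nfs -o nfsv4,nocto server:/export /mnt

As I understand it, with "nocto" the client skips the close-to-open
consistency flush, so close(2) returns without waiting for
writeback.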

rick

> Note that Spencer Dawkins effectively asked this question a couple of
> years ago:
>
> | In this text:
> |
> | 15.18.3. RESULT
> |
> | nfs_space_limit4
> | space_limit; /* Defines condition that
> | the client must check to
> | determine whether the
> | file needs to be flushed
> | to the server on close. */
> |
> | I'm no expert, but could I ask you to check whether this is the
> | right
> | description for this struct? nfs_space_limit4 looks like it's
> | either
> | a file size or a number of blocks, and I wasn't understanding how
> | that
> | was a "condition" or how the limit had anything to do with flushing
> | a
> | file to the server on close, so I'm wondering about a cut-and-paste
> | error.
> |
>
> Does any server set the space_limit?
>
> And to what?
>
> Note, it seems that OpenSolaris does set it to be NFS_LIMIT_SIZE and
> UINT64_MAX. Which means that it is effectively saying that the client
> is guaranteed a lot of space. :-)