LinuxLists.cc - flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

2014-04-27 09:16:08

Subject: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

[Trimming some folk from CC, and adding various NFS people]

On 04/27/2014 06:51 AM, NeilBrown wrote:

[...]

> Note to Michael: The text
> flock() does not lock files over NFS.
> in flock(2) is no longer accurate. The reality is ... complex.
> See nfs(5), and search for "local_lock".

Ahhh -- I see:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2

Thanks for the heads up.

Just in general, it would be great if the flock(2) and fcntl(2) man pages
contained correct details for NFS, of course. So, for example, if there
are any current gotchas for NFS and fcntl() byte-range locking, I'd like
to add those to the fcntl(2) man page.

Anyway, returning to your point about flock(), how would this text
look for the flock(2) manual page:

NOTES
Since kernel 2.0, flock() is implemented as a system call in
its own right rather than being emulated in the GNU C library
as a call to fcntl(2). This yields classical BSD semantics:
there is no interaction between the types of lock placed by
flock() and fcntl(2), and flock() does not detect deadlock.
(Note, however, that on some modern BSDs, flock() and fcntl(2)
locks do interact with one another.)

In Linux kernels up to 2.6.11, flock() does not lock files over
NFS (i.e., the scope of locks was limited to the local system).
Instead, one could use fcntl(2) byte-range locking, which does
work over NFS, given a sufficiently recent version of Linux and
a server which supports locking. Since Linux 2.6.12, NFS
clients support flock() locks by emulating them as byte-range
locks on the entire file. This means that fcntl(2) and flock()
locks do interact with one another over NFS. Since Linux
2.6.37, the kernel supports a compatibility mode that allows
flock() locks (and also fcntl(2) byte region locks) to be
treated as local; see the discussion of the local_lock option
in nfs(5).
?
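
One way to see which of these behaviours a given mount exhibits is to test it directly. What follows is a minimal sketch in C, not part of the proposed man-page text. The parent holds an exclusive flock() while a child process attempts a whole-file fcntl(2) write lock on the same file. The names try_fcntl_wrlock() and flock_conflicts_with_fcntl() are made up for illustration. Going by the draft text above, the expected result is 0 (no interaction) on a local Linux filesystem, and 1 on an NFS mount under Linux 2.6.12 or later when local_lock is not in effect.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try a non-blocking fcntl(2) write lock covering the whole file.
   Returns 0 on success, -1 on failure with errno set by fcntl(). */
static int try_fcntl_wrlock(int fd)
{
    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,             /* 0 means "through end of file" */
    };
    return fcntl(fd, F_SETLK, &fl);
}

/* Hold an exclusive flock() on `path` in the parent, then let a child
   process attempt an fcntl() write lock on the same file.
   Returns 1 if the two lock types conflicted, 0 if they did not,
   and -1 if the experiment could not be set up. */
int flock_conflicts_with_fcntl(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: its own process and its own open file description. */
        int cfd = open(path, O_RDWR);
        if (cfd == -1)
            _exit(2);
        if (try_fcntl_wrlock(cfd) == 0)
            _exit(0);           /* lock granted: no conflict */
        /* EAGAIN or EACCES means another lock blocked us. */
        _exit((errno == EAGAIN || errno == EACCES) ? 1 : 2);
    }

    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)
            && WEXITSTATUS(status) <= 1)
        result = WEXITSTATUS(status);

    close(fd);                  /* releases the parent's flock() */
    return result;
}
```

Calling flock_conflicts_with_fcntl() once on a file under a local directory and once on a file under an NFS mount point shows the difference described above. A child process is used because fcntl() locks are owned by the process, so the attempt has to come from a different process than the one holding the flock().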

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2014-04-27 11:11:54

by Michael Kerrisk (man-pages)

Subject: Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

On Sun, Apr 27, 2014 at 12:04 PM, NeilBrown <[email protected]> wrote:
> On Sun, 27 Apr 2014 11:16:02 +0200 "Michael Kerrisk (man-pages)"
> <[email protected]> wrote:
>
>> [Trimming some folk from CC, and adding various NFS people]
>>
>> On 04/27/2014 06:51 AM, NeilBrown wrote:
>>
>> [...]
>>
>> > Note to Michael: The text
>> > flock() does not lock files over NFS.
>> > in flock(2) is no longer accurate. The reality is ... complex.
>> > See nfs(5), and search for "local_lock".
>>
>> Ahhh -- I see:
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2
>>
>> Thanks for the heads up.
>>
>> Just in general, it would be great if the flock(2) and fcntl(2) man pages
>> contained correct details for NFS, of course. So, for example, if there
>> are any current gotchas for NFS and fcntl() byte-range locking, I'd like
>> to add those to the fcntl(2) man page.
>
> The only peculiarities I can think of are:
> - With NFS, locking or unlocking a region forces a flush of any cached data
> for that file (or maybe for the region of the file). I'm not sure if this
> is worth mentioning.

I agree that it's probably not necessary to mention.

> - With NFSv4 the client can lose a lock if it is out of contact with the
> server for a period of time. When this happens, any IO to the file by a
> process which "thinks" it holds a lock will fail until that process closes
> and re-opens the file.
> This behaviour is since 3.12. Prior to that the client might lose and
> regain the lock without ever knowing thus potentially risking corruption
> (but only if client and server lost contact for an extended period).

Do you have a pointer for that commit to 3.12?

>> Anyway, returning to your point about flock(), how would this text
>> look for the flock(2) manual page:
>>
>> NOTES
>> Since kernel 2.0, flock() is implemented as a system call in
>> its own right rather than being emulated in the GNU C library
>> as a call to fcntl(2). This yields classical BSD semantics:
>> there is no interaction between the types of lock placed by
>> flock() and fcntl(2), and flock() does not detect deadlock.
>> (Note, however, that on some modern BSDs, flock() and fcntl(2)
>> locks do interact with one another.)
>>
>> In Linux kernels up to 2.6.11, flock() does not lock files over
>> NFS (i.e., the scope of locks was limited to the local system).
>> Instead, one could use fcntl(2) byte-range locking, which does
>> work over NFS, given a sufficiently recent version of Linux and
>> a server which supports locking. Since Linux 2.6.12, NFS
>> clients support flock() locks by emulating them as byte-range
>> locks on the entire file. This means that fcntl(2) and flock()
>> locks do interact with one another over NFS. Since Linux
>> 2.6.37, the kernel supports a compatibility mode that allows
>> flock() locks (and also fcntl(2) byte region locks) to be
>> treated as local; see the discussion of the local_lock option
>> in nfs(5).
>> ?
>
> That seems to cover it quite well - thanks.

Thanks for checking it.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2014-04-29 09:53:44

by Michael Kerrisk (man-pages)

Subject: Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

On 04/29/2014 11:24 AM, NeilBrown wrote:
> On Tue, 29 Apr 2014 11:07:16 +0200 "Michael Kerrisk (man-pages)"
> <[email protected]> wrote:
>
>> On 04/27/2014 11:28 PM, NeilBrown wrote:
>>> On Sun, 27 Apr 2014 13:11:33 +0200 "Michael Kerrisk (man-pages)"
>>> <[email protected]> wrote:
>>>
>>>> On Sun, Apr 27, 2014 at 12:04 PM, NeilBrown <[email protected]> wrote:
>>>>> On Sun, 27 Apr 2014 11:16:02 +0200 "Michael Kerrisk (man-pages)"
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> [Trimming some folk from CC, and adding various NFS people]
>>>>>>
>>>>>> On 04/27/2014 06:51 AM, NeilBrown wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> Note to Michael: The text
>>>>>>> flock() does not lock files over NFS.
>>>>>>> in flock(2) is no longer accurate. The reality is ... complex.
>>>>>>> See nfs(5), and search for "local_lock".
>>>>>>
>>>>>> Ahhh -- I see:
>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2
>>>>>>
>>>>>> Thanks for the heads up.
>>>>>>
>>>>>> Just in general, it would be great if the flock(2) and fcntl(2) man pages
>>>>>> contained correct details for NFS, of course. So, for example, if there
>>>>>> are any current gotchas for NFS and fcntl() byte-range locking, I'd like
>>>>>> to add those to the fcntl(2) man page.
>>>>>
>>>>> The only peculiarities I can think of are:
>>>>> - With NFS, locking or unlocking a region forces a flush of any cached data
>>>>> for that file (or maybe for the region of the file). I'm not sure if this
>>>>> is worth mentioning.
>>>>
>>>> I agree that it's probably not necessary to mention.
>>>>
>>>>> - With NFSv4 the client can lose a lock if it is out of contact with the
>>>>> server for a period of time. When this happens, any IO to the file by a
>>>>> process which "thinks" it holds a lock will fail until that process closes
>>>>> and re-opens the file.
>>>>> This behaviour is since 3.12. Prior to that the client might lose and
>>>>> regain the lock without ever knowing thus potentially risking corruption
>>>>> (but only if client and server lost contact for an extended period).
>>>>
>>>> Do you have a pointer for that commit to 3.12?
>>>>
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef1820f9be27b6ad158f433ab38002ab8131db4d
>>>
>>> did most of the work while the subsequent commit
>>>
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f6de7a39c181dfb8a2c534661a53c73afb3081cd
>>>
>>> changed some details, added some documentation, and inverted the default
>>> behaviour.
>>
>> Thanks for that detail. What do you think of the following text for the
>> fcntl(2) man page:
>>
>> Before Linux 3.12, if an NFS client is out of contact with the
>> server for a period of time, it might lose and regain a lock
>> without ever being aware of the fact. This scenario
>> potentially risks data corruption, since another process might
>> acquire a lock in the intervening period and perform file I/O.
>> Since Linux 3.12, if the client loses contact with the server,
>> any I/O to the file by a process which "thinks" it holds a lock
>> will fail until that process closes and reopens the file. A
>> kernel parameter, nfs.recover_lost_locks, can be set to 1 to
>> obtain the pre-3.12 behavior, whereby the client will attempt
>> to recover lost locks when contact is reestablished with the
>> server. Because of the attendant risk of data corruption, this
>> parameter defaults to 0 (disabled).
>>
>
> Mostly good.
>
> I'm just a little concerned about "if the client loses contact with the
> server" in the middle there. It is no longer qualified and it isn't clear
> that the "for a period of time" qualification still applied. And we should
> probably quantify the period of time - which defaults to 90 seconds.
> I don't remember just now the difference between
> /proc/fs/nfsd/nfsv4{lease,grace}time
> but this 90 seconds is one of those.
>
> Also this is NFSv4 specific. With NFSv3 the failure mode is the reverse. If
> the server loses contact with a client then any lock stays in place
> indefinitely ("why can't I read my mail"... I remember it well).
>
> Before Linux 3.12, if an NFSv4 client loses contact with the server
> (defined as more than 90 seconds with no communication), it might lose
> and regain ....

Thanks, Neil. Changed as you suggest. I'd quite like to mention
which of /proc/fs/nfsd/nfsv4{lease,grace}time is relevant here. I had a
quick scan, but could not determine it with complete confidence. My suspicion,
looking at fs/lockd/svcproc.c and fs/lockd/grace.c::locks_in_grace()
is that it is /proc/fs/nfsd/nfsv4gracetime that is relevant here. Can anyone
confirm?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2014-04-29 12:20:14

by Michael Kerrisk (man-pages)

Subject: Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

On 04/29/2014 01:34 PM, Jeff Layton wrote:
> On Tue, 29 Apr 2014 11:53:40 +0200
> "Michael Kerrisk (man-pages)" <[email protected]> wrote:
>
>> On 04/29/2014 11:24 AM, NeilBrown wrote:
>>> On Tue, 29 Apr 2014 11:07:16 +0200 "Michael Kerrisk (man-pages)"
>>> <[email protected]> wrote:
>>>
>>>> On 04/27/2014 11:28 PM, NeilBrown wrote:
>>>>> On Sun, 27 Apr 2014 13:11:33 +0200 "Michael Kerrisk (man-pages)"
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> On Sun, Apr 27, 2014 at 12:04 PM, NeilBrown <[email protected]> wrote:
>>>>>>> On Sun, 27 Apr 2014 11:16:02 +0200 "Michael Kerrisk (man-pages)"
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> [Trimming some folk from CC, and adding various NFS people]
>>>>>>>>
>>>>>>>> On 04/27/2014 06:51 AM, NeilBrown wrote:
>>>>>>>>
>>>>>>>> [...]
>>>>>>>>
>>>>>>>>> Note to Michael: The text
>>>>>>>>> flock() does not lock files over NFS.
>>>>>>>>> in flock(2) is no longer accurate. The reality is ... complex.
>>>>>>>>> See nfs(5), and search for "local_lock".
>>>>>>>>
>>>>>>>> Ahhh -- I see:
>>>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2
>>>>>>>>
>>>>>>>> Thanks for the heads up.
>>>>>>>>
>>>>>>>> Just in general, it would be great if the flock(2) and fcntl(2) man pages
>>>>>>>> contained correct details for NFS, of course. So, for example, if there
>>>>>>>> are any current gotchas for NFS and fcntl() byte-range locking, I'd like
>>>>>>>> to add those to the fcntl(2) man page.
>>>>>>>
>>>>>>> The only peculiarities I can think of are:
>>>>>>> - With NFS, locking or unlocking a region forces a flush of any cached data
>>>>>>> for that file (or maybe for the region of the file). I'm not sure if this
>>>>>>> is worth mentioning.
>>>>>>
>>>>>> I agree that it's probably not necessary to mention.
>>>>>>
>>>>>>> - With NFSv4 the client can lose a lock if it is out of contact with the
>>>>>>> server for a period of time. When this happens, any IO to the file by a
>>>>>>> process which "thinks" it holds a lock will fail until that process closes
>>>>>>> and re-opens the file.
>>>>>>> This behaviour is since 3.12. Prior to that the client might lose and
>>>>>>> regain the lock without ever knowing thus potentially risking corruption
>>>>>>> (but only if client and server lost contact for an extended period).
>>>>>>
>>>>>> Do you have a pointer for that commit to 3.12?
>>>>>>
>>>>>
>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef1820f9be27b6ad158f433ab38002ab8131db4d
>>>>>
>>>>> did most of the work while the subsequent commit
>>>>>
>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f6de7a39c181dfb8a2c534661a53c73afb3081cd
>>>>>
>>>>> changed some details, added some documentation, and inverted the default
>>>>> behaviour.
>>>>
>>>> Thanks for that detail. What do you think of the following text for the
>>>> fcntl(2) man page:
>>>>
>>>> Before Linux 3.12, if an NFS client is out of contact with the
>>>> server for a period of time, it might lose and regain a lock
>>>> without ever being aware of the fact. This scenario
>>>> potentially risks data corruption, since another process might
>>>> acquire a lock in the intervening period and perform file I/O.
>>>> Since Linux 3.12, if the client loses contact with the server,
>>>> any I/O to the file by a process which "thinks" it holds a lock
>>>> will fail until that process closes and reopens the file. A
>>>> kernel parameter, nfs.recover_lost_locks, can be set to 1 to
>>>> obtain the pre-3.12 behavior, whereby the client will attempt
>>>> to recover lost locks when contact is reestablished with the
>>>> server. Because of the attendant risk of data corruption, this
>>>> parameter defaults to 0 (disabled).
>>>>
>>>
>>> Mostly good.
>>>
>>> I'm just a little concerned about "if the client loses contact with the
>>> server" in the middle there. It is no longer qualified and it isn't clear
>>> that the "for a period of time" qualification still applied. And we should
>>> probably quantify the period of time - which defaults to 90 seconds.
>>> I don't remember just now the difference between
>>> /proc/fs/nfsd/nfsv4{lease,grace}time
>>> but this 90 seconds is one of those.
>>>
>>> Also this is NFSv4 specific. With NFSv3 the failure mode is the reverse. If
>>> the server loses contact with a client then any lock stays in place
>>> indefinitely ("why can't I read my mail"... I remember it well).
>>>
>>> Before Linux 3.12, if an NFSv4 client loses contact with the server
>>> (defined as more than 90 seconds with no communication), it might lose
>>> and regain ....
>>
>> Thanks, Neil. Changed as you suggest. I'd quite like to mention
>> which of /proc/fs/nfsd/nfsv4{lease,grace}time is relevant here. I had a
>> quick scan, but could not determine it with complete confidence. My suspicion,
>> looking at fs/lockd/svcproc.c and fs/lockd/grace.c::locks_in_grace()
>> is that it is /proc/fs/nfsd/nfsv4gracetime that is relevant here. Can anyone
>> confirm?
>>
>
> The difference here is subtle. The gracetime is how long after a reboot
> should knfsd allow clients to reclaim state (and deny the creation of
> new locks and opens). The leasetime is how long the NFSv4 lease period
> is. There is a relationship between the two that's illustrated in the
> comments above write_gracetime:
>
> /**
> * write_gracetime - Set or report current NFSv4 grace period time
> *
> * As above, but sets the time of the NFSv4 grace period.
> *
> * Note this should never be set to less than the *previous*
> * lease-period time, but we don't try to enforce this. (In the common
> * case (a new boot), we don't know what the previous lease time was
> * anyway.)
> */
>
> The value you're interested in here is the nfsv4leasetime. If the
> client doesn't renew its lease within that period, then it's subject to
> the server giving up on it and dropping any state that it holds on that
> client's behalf.
>
> Note that this is not a firm timeout. The server runs a job
> periodically to clean out expired stateful objects, and it's likely
> that there is some time (maybe even up to another whole lease period)
> between when the timeout expires and the job actually runs. If the
> client gets a RENEW in there within that window, its lease will be
> renewed and its state preserved.
>
> Also note that all of the above just applies to the Linux knfsd. There
> are many other servers in the field and they have different rules for
> dropping state held by clients that have gone AWOL.
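
As a rough illustration of how the lease time discussed above could be inspected on a Linux knfsd server, here is a small C sketch. read_seconds() is a hypothetical helper, not an existing API; it reads one integer from a procfs-style file. The path /proc/fs/nfsd/nfsv4leasetime is the one named in this thread, and it is only present on a machine running knfsd with the nfsd filesystem mounted, so the helper reports -1 anywhere else.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single non-negative integer (a number of seconds) from a
   procfs-style file such as /proc/fs/nfsd/nfsv4leasetime.
   Returns the value, or -1 if the file is missing or unparsable. */
long read_seconds(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long secs;
    int matched = fscanf(f, "%ld", &secs);
    fclose(f);

    return (matched == 1 && secs >= 0) ? secs : -1;
}
```

On a knfsd server with default settings, read_seconds("/proc/fs/nfsd/nfsv4leasetime") would be expected to return 90. As noted above, that figure is not a firm timeout, and servers other than the Linux knfsd have their own rules.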

Thanks for the detailed explanation, Jeff. I've updated the draft text to
mention nfsv4leasetime. I won't add the subtleties you mention above
(but they'll go into the commit message).

The text is now:

Record locking and NFS
Before Linux 3.12, if an NFSv4 client loses contact with the
server for a period of time (defined as more than 90 seconds
with no communication), it might lose and regain a lock without
ever being aware of the fact. (The period of time after which
contact is assumed lost is defined by
/proc/fs/nfsd/nfsv4leasetime, which expresses the period in
seconds. The default value for this file is 90.) This scenario
potentially risks data corruption, since another process might
acquire a lock in the intervening period and perform file I/O.

Since Linux 3.12, if an NFSv4 client loses contact with the
server, any I/O to the file by a process which "thinks" it
holds a lock will fail until that process closes and reopens
the file. A kernel parameter, nfs.recover_lost_locks, can be
set to 1 to obtain the pre-3.12 behavior, whereby the client
will attempt to recover lost locks when contact is
reestablished with the server. Because of the attendant risk of data
corruption, this parameter defaults to 0 (disabled).
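
To make the "closes and reopens the file" step concrete, here is a hedged C sketch of what an application might do after such a failure. It is an illustration only. The thread does not say which error the failed I/O reports, so the use of EIO below is an assumption, and reopen_and_relock() and locked_pwrite() are hypothetical names. Since another process may have modified the file while the lock was lost, a real program would normally re-read and revalidate its data before retrying the write, rather than retrying blindly as this sketch does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Close `old_fd` (pass -1 if there is none), reopen `path`, and take a
   blocking whole-file fcntl(2) write lock on the new descriptor.
   Returns the new descriptor, or -1 on failure with errno set. */
int reopen_and_relock(const char *path, int old_fd)
{
    assert(path != NULL);

    if (old_fd != -1)
        close(old_fd);

    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    struct flock fl = {
        .l_type = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,             /* 0 means "through end of file" */
    };
    if (fcntl(fd, F_SETLKW, &fl) == -1) {
        int saved = errno;      /* keep fcntl()'s errno across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Write `len` bytes at offset `off`. If the write fails with EIO
   (assumed here to be how a lost lock shows up), reopen the file,
   re-acquire the lock, and retry the write once.
   `*fd` is updated to the new descriptor, or -1 if recovery failed. */
ssize_t locked_pwrite(const char *path, int *fd,
                      const void *buf, size_t len, off_t off)
{
    ssize_t n = pwrite(*fd, buf, len, off);
    if (n == -1 && errno == EIO) {
        *fd = reopen_and_relock(path, *fd);
        if (*fd == -1)
            return -1;
        n = pwrite(*fd, buf, len, off);
    }
    return n;
}
```

The descriptor is passed by pointer so that the caller keeps using the reopened file after a recovery. The alternative described above, setting nfs.recover_lost_locks to 1, avoids the need for this kind of handling at the cost of the corruption risk the parameter's default is meant to prevent.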

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2014-04-27 10:04:45

by NeilBrown

Subject: Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

On Sun, 27 Apr 2014 11:16:02 +0200 "Michael Kerrisk (man-pages)"
<[email protected]> wrote:

> [Trimming some folk from CC, and adding various NFS people]
>
> On 04/27/2014 06:51 AM, NeilBrown wrote:
>
> [...]
>
> > Note to Michael: The text
> > flock() does not lock files over NFS.
> > in flock(2) is no longer accurate. The reality is ... complex.
> > See nfs(5), and search for "local_lock".
>
> Ahhh -- I see:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2
>
> Thanks for the heads up.
>
> Just in general, it would be great if the flock(2) and fcntl(2) man pages
> contained correct details for NFS, of course. So, for example, if there
> are any current gotchas for NFS and fcntl() byte-range locking, I'd like
> to add those to the fcntl(2) man page.

The only peculiarities I can think of are:
- With NFS, locking or unlocking a region forces a flush of any cached data
for that file (or maybe for the region of the file). I'm not sure if this
is worth mentioning.

- With NFSv4 the client can lose a lock if it is out of contact with the
server for a period of time. When this happens, any IO to the file by a
process which "thinks" it holds a lock will fail until that process closes
and re-opens the file.
This behaviour is since 3.12. Prior to that the client might lose and
regain the lock without ever knowing thus potentially risking corruption
(but only if client and server lost contact for an extended period).

>
> Anyway, returning to your point about flock(), how would this text
> look for the flock(2) manual page:
>
> NOTES
> Since kernel 2.0, flock() is implemented as a system call in
> its own right rather than being emulated in the GNU C library
> as a call to fcntl(2). This yields classical BSD semantics:
> there is no interaction between the types of lock placed by
> flock() and fcntl(2), and flock() does not detect deadlock.
> (Note, however, that on some modern BSDs, flock() and fcntl(2)
> locks do interact with one another.)
>
> In Linux kernels up to 2.6.11, flock() does not lock files over
> NFS (i.e., the scope of locks was limited to the local system).
> Instead, one could use fcntl(2) byte-range locking, which does
> work over NFS, given a sufficiently recent version of Linux and
> a server which supports locking. Since Linux 2.6.12, NFS
> clients support flock() locks by emulating them as byte-range
> locks on the entire file. This means that fcntl(2) and flock()
> locks do interact with one another over NFS. Since Linux
> 2.6.37, the kernel supports a compatibility mode that allows
> flock() locks (and also fcntl(2) byte region locks) to be
> treated as local; see the discussion of the local_lock option
> in nfs(5).
> ?

That seems to cover it quite well - thanks.

NeilBrown

>
> Thanks,
>
> Michael
>
>

2014-04-29 09:25:10

by NeilBrown

Subject: Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

On Tue, 29 Apr 2014 11:07:16 +0200 "Michael Kerrisk (man-pages)"
<[email protected]> wrote:

> On 04/27/2014 11:28 PM, NeilBrown wrote:
> > On Sun, 27 Apr 2014 13:11:33 +0200 "Michael Kerrisk (man-pages)"
> > <[email protected]> wrote:
> >
> >> On Sun, Apr 27, 2014 at 12:04 PM, NeilBrown <[email protected]> wrote:
> >>> On Sun, 27 Apr 2014 11:16:02 +0200 "Michael Kerrisk (man-pages)"
> >>> <[email protected]> wrote:
> >>>
> >>>> [Trimming some folk from CC, and adding various NFS people]
> >>>>
> >>>> On 04/27/2014 06:51 AM, NeilBrown wrote:
> >>>>
> >>>> [...]
> >>>>
> >>>>> Note to Michael: The text
> >>>>> flock() does not lock files over NFS.
> >>>>> in flock(2) is no longer accurate. The reality is ... complex.
> >>>>> See nfs(5), and search for "local_lock".
> >>>>
> >>>> Ahhh -- I see:
> >>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2
> >>>>
> >>>> Thanks for the heads up.
> >>>>
> >>>> Just in general, it would be great if the flock(2) and fcntl(2) man pages
> >>>> contained correct details for NFS, of course. So, for example, if there
> >>>> are any current gotchas for NFS and fcntl() byte-range locking, I'd like
> >>>> to add those to the fcntl(2) man page.
> >>>
> >>> The only peculiarities I can think of are:
> >>> - With NFS, locking or unlocking a region forces a flush of any cached data
> >>> for that file (or maybe for the region of the file). I'm not sure if this
> >>> is worth mentioning.
> >>
> >> I agree that it's probably not necessary to mention.
> >>
> >>> - With NFSv4 the client can lose a lock if it is out of contact with the
> >>> server for a period of time. When this happens, any IO to the file by a
> >>> process which "thinks" it holds a lock will fail until that process closes
> >>> and re-opens the file.
> >>> This behaviour is since 3.12. Prior to that the client might lose and
> >>> regain the lock without ever knowing thus potentially risking corruption
> >>> (but only if client and server lost contact for an extended period).
> >>
> >> Do you have a pointer for that commit to 3.12?
> >>
> >
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef1820f9be27b6ad158f433ab38002ab8131db4d
> >
> > did most of the work while the subsequent commit
> >
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f6de7a39c181dfb8a2c534661a53c73afb3081cd
> >
> > changed some details, added some documentation, and inverted the default
> > behaviour.
>
> Thanks for that detail. What do you think of the following text for the
> fcntl(2) man page:
>
> Before Linux 3.12, if an NFS client is out of contact with the
> server for a period of time, it might lose and regain a lock
> without ever being aware of the fact. This scenario
> potentially risks data corruption, since another process might
> acquire a lock in the intervening period and perform file I/O.
> Since Linux 3.12, if the client loses contact with the server,
> any I/O to the file by a process which "thinks" it holds a lock
> will fail until that process closes and reopens the file. A
> kernel parameter, nfs.recover_lost_locks, can be set to 1 to
> obtain the pre-3.12 behavior, whereby the client will attempt
> to recover lost locks when contact is reestablished with the
> server. Because of the attendant risk of data corruption, this
> parameter defaults to 0 (disabled).
>

Mostly good.

I'm just a little concerned about "if the client loses contact with the
server" in the middle there. It is no longer qualified and it isn't clear
that the "for a period of time" qualification still applied. And we should
probably quantify the period of time - which defaults to 90 seconds.
I don't remember just now the difference between
/proc/fs/nfsd/nfsv4{lease,grace}time
but this 90 seconds is one of those.

Also this is NFSv4 specific. With NFSv3 the failure mode is the reverse. If
the server loses contact with a client then any lock stays in place
indefinitely ("why can't I read my mail"... I remember it well).

Before Linux 3.12, if an NFSv4 client loses contact with the server
(defined as more than 90 seconds with no communication), it might lose
and regain ....

Just changing that bit should cover it I think.

NeilBrown

2014-04-29 09:07:22

by Michael Kerrisk (man-pages)

Subject: Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

On 04/27/2014 11:28 PM, NeilBrown wrote:
> On Sun, 27 Apr 2014 13:11:33 +0200 "Michael Kerrisk (man-pages)"
> <[email protected]> wrote:
>
>> On Sun, Apr 27, 2014 at 12:04 PM, NeilBrown <[email protected]> wrote:
>>> On Sun, 27 Apr 2014 11:16:02 +0200 "Michael Kerrisk (man-pages)"
>>> <[email protected]> wrote:
>>>
>>>> [Trimming some folk from CC, and adding various NFS people]
>>>>
>>>> On 04/27/2014 06:51 AM, NeilBrown wrote:
>>>>
>>>> [...]
>>>>
>>>>> Note to Michael: The text
>>>>> flock() does not lock files over NFS.
>>>>> in flock(2) is no longer accurate. The reality is ... complex.
>>>>> See nfs(5), and search for "local_lock".
>>>>
>>>> Ahhh -- I see:
>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2
>>>>
>>>> Thanks for the heads up.
>>>>
>>>> Just in general, it would be great if the flock(2) and fcntl(2) man pages
>>>> contained correct details for NFS, of course. So, for example, if there
>>>> are any current gotchas for NFS and fcntl() byte-range locking, I'd like
>>>> to add those to the fcntl(2) man page.
>>>
>>> The only peculiarities I can think of are:
>>> - With NFS, locking or unlocking a region forces a flush of any cached data
>>> for that file (or maybe for the region of the file). I'm not sure if this
>>> is worth mentioning.
>>
>> I agree that it's probably not necessary to mention.
>>
>>> - With NFSv4 the client can lose a lock if it is out of contact with the
>>> server for a period of time. When this happens, any IO to the file by a
>>> process which "thinks" it holds a lock will fail until that process closes
>>> and re-opens the file.
>>> This behaviour is since 3.12. Prior to that the client might lose and
>>> regain the lock without ever knowing thus potentially risking corruption
>>> (but only if client and server lost contact for an extended period).
>>
>> Do you have a pointer for that commit to 3.12?
>>
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef1820f9be27b6ad158f433ab38002ab8131db4d
>
> did most of the work while the subsequent commit
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f6de7a39c181dfb8a2c534661a53c73afb3081cd
>
> changed some details, added some documentation, and inverted the default
> behaviour.

Thanks for that detail. What do you think of the following text for the
fcntl(2) man page:

Before Linux 3.12, if an NFS client is out of contact with the
server for a period of time, it might lose and regain a lock
without ever being aware of the fact. This scenario
potentially risks data corruption, since another process might
acquire a lock in the intervening period and perform file I/O.
Since Linux 3.12, if the client loses contact with the server,
any I/O to the file by a process which "thinks" it holds a lock
will fail until that process closes and reopens the file. A
kernel parameter, nfs.recover_lost_locks, can be set to 1 to
obtain the pre-3.12 behavior, whereby the client will attempt
to recover lost locks when contact is reestablished with the
server. Because of the attendant risk of data corruption, this
parameter defaults to 0 (disabled).

?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2014-04-27 21:28:59

by NeilBrown

Subject: Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

On Sun, 27 Apr 2014 13:11:33 +0200 "Michael Kerrisk (man-pages)"
<[email protected]> wrote:

> On Sun, Apr 27, 2014 at 12:04 PM, NeilBrown <[email protected]> wrote:
> > On Sun, 27 Apr 2014 11:16:02 +0200 "Michael Kerrisk (man-pages)"
> > <[email protected]> wrote:
> >
> >> [Trimming some folk from CC, and adding various NFS people]
> >>
> >> On 04/27/2014 06:51 AM, NeilBrown wrote:
> >>
> >> [...]
> >>
> >> > Note to Michael: The text
> >> > flock() does not lock files over NFS.
> >> > in flock(2) is no longer accurate. The reality is ... complex.
> >> > See nfs(5), and search for "local_lock".
> >>
> >> Ahhh -- I see:
> >> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2
> >>
> >> Thanks for the heads up.
> >>
> >> Just in general, it would be great if the flock(2) and fcntl(2) man pages
> >> contained correct details for NFS, of course. So, for example, if there
> >> are any current gotchas for NFS and fcntl() byte-range locking, I'd like
> >> to add those to the fcntl(2) man page.
> >
> > The only peculiarities I can think of are:
> > - With NFS, locking or unlocking a region forces a flush of any cached data
> > for that file (or maybe for the region of the file). I'm not sure if this
> > is worth mentioning.
>
> I agree that it's probably not necessary to mention.
>
> > - With NFSv4 the client can lose a lock if it is out of contact with the
> > server for a period of time. When this happens, any IO to the file by a
> > process which "thinks" it holds a lock will fail until that process closes
> > and re-opens the file.
> > This behaviour is since 3.12. Prior to that the client might lose and
> > regain the lock without ever knowing thus potentially risking corruption
> > (but only if client and server lost contact for an extended period).
>
> Do you have a pointer for that commit to 3.12?
>

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef1820f9be27b6ad158f433ab38002ab8131db4d

did most of the work while the subsequent commit

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f6de7a39c181dfb8a2c534661a53c73afb3081cd

changed some details, added some documentation, and inverted the default
behaviour.

NeilBrown
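
A rough sketch of what the NFSv4 behaviour quoted above (since 3.12) could mean for an application: if I/O fails while the process believes it holds a lock, the file has to be closed and reopened before I/O can succeed again. The helper name locked_write_with_reopen() and the single-retry policy are illustrative assumptions, not something taken from the kernel or the man pages. A real application would also need to revalidate the file contents after a retry, since another process may have held the lock in between.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to place an exclusive fcntl() lock covering the whole file. */
int lock_whole_file(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Hypothetical helper: open, lock, write at offset 0, close.
 * If the write fails, retry once with a fresh open(), because a
 * lost lock makes I/O fail until the file is closed and reopened.
 * Returns 0 on success, -1 on failure.
 */
int locked_write_with_reopen(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    for (int attempt = 0; attempt < 2; attempt++) {
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return -1;
        if (lock_whole_file(fd) < 0) {
            close(fd);
            return -1;
        }
        ssize_t n = write(fd, buf, len);
        close(fd);              /* closing the file also drops the lock */
        if (n == (ssize_t)len)
            return 0;
        /* Write failed or was short: fall through and retry once. */
    }
    return -1;
}
```

On a local filesystem the first attempt normally succeeds, so the retry path only matters in the lost-lock case.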

2014-04-29 11:34:58

by Jeff Layton

Subject: Re: flock() and NFS [Was: Re: [PATCH] locks: rename file-private locks to file-description locks]

On Tue, 29 Apr 2014 11:53:40 +0200
"Michael Kerrisk (man-pages)" <[email protected]> wrote:

> On 04/29/2014 11:24 AM, NeilBrown wrote:
> > On Tue, 29 Apr 2014 11:07:16 +0200 "Michael Kerrisk (man-pages)"
> > <[email protected]> wrote:
> >
> >> On 04/27/2014 11:28 PM, NeilBrown wrote:
> >>> On Sun, 27 Apr 2014 13:11:33 +0200 "Michael Kerrisk (man-pages)"
> >>> <[email protected]> wrote:
> >>>
> >>>> On Sun, Apr 27, 2014 at 12:04 PM, NeilBrown <[email protected]> wrote:
> >>>>> On Sun, 27 Apr 2014 11:16:02 +0200 "Michael Kerrisk (man-pages)"
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>>> [Trimming some folk from CC, and adding various NFS people]
> >>>>>>
> >>>>>> On 04/27/2014 06:51 AM, NeilBrown wrote:
> >>>>>>
> >>>>>> [...]
> >>>>>>
> >>>>>>> Note to Michael: The text
> >>>>>>> flock() does not lock files over NFS.
> >>>>>>> in flock(2) is no longer accurate. The reality is ... complex.
> >>>>>>> See nfs(5), and search for "local_lock".
> >>>>>>
> >>>>>> Ahhh -- I see:
> >>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eebde23223aeb0ad2d9e3be6590ff8bbfab0fc2
> >>>>>>
> >>>>>> Thanks for the heads up.
> >>>>>>
> >>>>>> Just in general, it would be great if the flock(2) and fcntl(2) man pages
> >>>>>> contained correct details for NFS, of course. So, for example, if there
> >>>>>> are any current gotchas for NFS and fcntl() byte-range locking, I'd like
> >>>>>> to add those to the fcntl(2) man page.
> >>>>>
> >>>>> The only peculiarities I can think of are:
> >>>>> - With NFS, locking or unlocking a region forces a flush of any cached data
> >>>>> for that file (or maybe for the region of the file). I'm not sure if this
> >>>>> is worth mentioning.
> >>>>
> >>>> I agree that it's probably not necessary to mention.
> >>>>
> >>>>> - With NFSv4 the client can lose a lock if it is out of contact with the
> >>>>> server for a period of time. When this happens, any IO to the file by a
> >>>>> process which "thinks" it holds a lock will fail until that process closes
> >>>>> and re-opens the file.
> >>>>> This behaviour is since 3.12. Prior to that the client might lose and
> >>>>> regain the lock without ever knowing thus potentially risking corruption
> >>>>> (but only if client and server lost contact for an extended period).
> >>>>
> >>>> Do you have a pointer for that commit to 3.12?
> >>>>
> >>>
> >>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef1820f9be27b6ad158f433ab38002ab8131db4d
> >>>
> >>> did most of the work while the subsequent commit
> >>>
> >>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f6de7a39c181dfb8a2c534661a53c73afb3081cd
> >>>
> >>> changed some details, added some documentation, and inverted the default
> >>> behaviour.
> >>
> >> Thanks for that detail. What do you think of the following text for the
> >> fcntl(2) man page:
> >>
> >> Before Linux 3.12, if an NFS client is out of contact with the
> >> server for a period of time, it might lose and regain a lock
> >> without ever being aware of the fact. This scenario potentially
> >> risks data corruption, since another process might
> >> acquire a lock in the intervening period and perform file I/O.
> >> Since Linux 3.12, if the client loses contact with the server,
> >> any I/O to the file by a process which "thinks" it holds a lock
> >> will fail until that process closes and reopens the file. A
> >> kernel parameter, nfs.recover_lost_locks, can be set to 1 to
> >> obtain the pre-3.12 behavior, whereby the client will attempt
> >> to recover lost locks when contact is reestablished with the
> >> server. Because of the attendant risk of data corruption, this
> >> parameter defaults to 0 (disabled).
> >>
> >
> > Mostly good.
> >
> > I'm just a little concerned about "if the client loses contact with the
> > server" in the middle there. It is no longer qualified and it isn't clear
> > that the "for a period of time" qualification still applied. And we should
> > probably quantify the period of time - which defaults to 90 seconds.
> > I don't remember just now the difference between
> > /proc/fs/nfsd/nfsv4{lease,grace}time
> > but this 90 seconds is one of those.
> >
> > Also this is NFSv4 specific. With NFSv3 the failure mode is the reverse. If
> > the server loses contact with a client then any lock stays in place
> > indefinitely ("why can't I read my mail"... I remember it well).
> >
> > Before Linux 3.12, if an NFSv4 client loses contact with the server
> > (defined as more than 90 seconds with no communication), it might lose
> > and regain ....
>
> Thanks, Neil. Changed as you suggest. I'd quite like to mention
> which of /proc/fs/nfsd/nfsv4{lease,grace}time is relevant here. I had a
> quick scan, but could not determine it with complete confidence. My suspicion,
> looking at fs/lockd/svcproc.c and fs/lockd/grace.c::locks_in_grace()
> is that it is /proc/fs/nfsd/nfsv4gracetime that is relevant here. Can anyone
> confirm?
>

The difference here is subtle. The gracetime is how long after a reboot
knfsd should allow clients to reclaim state (while denying the creation
of new locks and opens). The leasetime is the length of the NFSv4 lease
period. The relationship between the two is illustrated in the comment
above write_gracetime:

/**
 * write_gracetime - Set or report current NFSv4 grace period time
 *
 * As above, but sets the time of the NFSv4 grace period.
 *
 * Note this should never be set to less than the *previous*
 * lease-period time, but we don't try to enforce this. (In the common
 * case (a new boot), we don't know what the previous lease time was
 * anyway.)
 */

The value you're interested in here is the nfsv4leasetime. If the
client doesn't renew its lease within that period, then it's subject to
the server giving up on it and dropping any state that it holds on that
client's behalf.
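
For anyone who wants to check the two values on a server, here is a minimal sketch that reads them. It assumes a Linux knfsd server with the nfsd filesystem mounted at /proc/fs/nfsd, and that each file holds a single decimal number of seconds. The helper name read_seconds() is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Paths as named in this thread; present only on a Linux NFS server. */
#define NFSD_LEASETIME_PATH "/proc/fs/nfsd/nfsv4leasetime"
#define NFSD_GRACETIME_PATH "/proc/fs/nfsd/nfsv4gracetime"

/*
 * Hypothetical helper: read one decimal value (seconds) from `path`.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
int read_seconds(const char *path, long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling read_seconds(NFSD_LEASETIME_PATH, &secs) on a machine that is not running knfsd simply returns -1, since the file does not exist there.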

Note that this is not a firm timeout. The server runs a job
periodically to clean out expired stateful objects, and it's likely
that there is some delay (maybe even up to another whole lease period)
between when the timeout expires and when the job actually runs. If the
client gets a RENEW in within that window, its lease will be renewed
and its state preserved.
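
To make the soft timeout concrete, here is a small model of the three cases. It is only a sketch: the upper bound of two lease periods is an assumption derived from the "up to another whole lease period" estimate above, not a guarantee made by knfsd, and the names are invented for illustration.

```c
#include <assert.h>

enum lease_state {
    LEASE_VALID,            /* renewed within the lease period */
    LEASE_AT_RISK,          /* expired, but the cleanup job may not have run */
    LEASE_LIKELY_DROPPED    /* past the assumed worst-case cleanup delay */
};

/*
 * Classify how long a client has gone without renewing its lease.
 * Assumes the cleanup job runs at most one lease period after expiry.
 */
enum lease_state classify_silence(long lease_secs, long silent_secs)
{
    assert(lease_secs > 0 && silent_secs >= 0);

    if (silent_secs <= lease_secs)
        return LEASE_VALID;
    if (silent_secs <= 2 * lease_secs)
        return LEASE_AT_RISK;
    return LEASE_LIKELY_DROPPED;
}
```

With a 90 second lease, 60 seconds of silence is classified as LEASE_VALID, 120 seconds as LEASE_AT_RISK (a RENEW may still save the state), and 200 seconds as LEASE_LIKELY_DROPPED.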

Also note that all of the above applies only to the Linux knfsd. There
are many other servers in the field, and they have different rules for
dropping state held by clients that have gone AWOL.

--
Jeff Layton <[email protected]>