Hi,
I was looking at the code for nlmclnt_lock and wanted to ask a
question about how the Linux kernel client and the NLMv4 protocol
handle errors in certain edge cases.
Specifically, I think there is a race condition around two threads of
the same program acquiring a lock, one of the threads being
interrupted, and the NFS client sending an unlock when none of the
program threads called unlock.
On NFS server machine S:
    there exists an unlocked file F

On NFS client machine C:
    in program P:
        thread 1 tries to lock(F) with fd A
        thread 2 tries to lock(F) with fd B
The Linux client will issue two NLM_LOCK calls with the same svid and
the same range, because the svid is derived from the owning process,
not from the individual thread or file descriptor.
For whatever reason, assume the connection is broken (cable gets pulled etc...)
and `status = nlmclnt_call(cred, req, NLMPROC_LOCK);` fails.
The Linux client will retry the request, but at some point thread 1
receives a signal and nlmclnt_lock breaks out of its loop. Because the
request failed, it falls through to the out_unlock label, where it
will want to send an unlock request.
Assume that at some point the connection is reestablished.
The Linux kernel client now has two outstanding lock requests to send
to the remote server: one for a lock that thread 2 is still trying to
acquire, and one for an unlock of thread 1 that failed and was
interrupted.
I'm worried that the Linux client may first send the lock request and
tell thread 2 that it acquired the lock, and then send the unlock
request left over from thread 1's interrupted attempt.
The server will successfully process both requests, because the svid
is the same for both, and the true server side state will be that the
file is unlocked.
One can talk about the wisdom of using multiple threads to acquire the
same file lock, but this behavior is weird, because none of the
threads called unlock.
I have experimented with reproducing this, but have not been
successful in triggering this ordering of events.
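A rough sketch of the kind of reproducer I mean is below; the mount
point, the signal, and the timings are arbitrary, and the "cable pull"
is done by hand:

/*
 * Rough reproducer sketch (illustrative only: the mount point, the
 * signal, and the timings are arbitrary; the "cable pull" is manual).
 * Build with -pthread and run against a file on an NFSv3 mount.
 */
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void wakeup(int sig)
{
	(void)sig;		/* no SA_RESTART, so fcntl() returns EINTR */
}

static void *locker(void *arg)
{
	struct flock fl = {
		.l_type   = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 0,	/* same whole-file range in both threads */
	};
	int fd = open("/mnt/nfs3/F", O_RDWR);	/* arbitrary path */

	if (fd < 0) {
		perror("open");
		return NULL;
	}
	/* With the server unreachable, both threads sit in nlmclnt_lock()
	 * retrying NLM_LOCK with the same svid. */
	if (fcntl(fd, F_SETLKW, &fl) < 0) {
		perror("F_SETLKW");	/* thread 1: EINTR after the signal */
		return NULL;		/* the kernel now wants to undo the lock */
	}
	printf("thread %ld: lock granted\n", (long)(intptr_t)arg);
	sleep(60);	/* hold the lock while the server state is inspected */
	return NULL;
}

int main(void)
{
	struct sigaction sa;
	pthread_t t1, t2;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = wakeup;
	sigaction(SIGUSR1, &sa, NULL);

	pthread_create(&t1, NULL, locker, (void *)1L);
	pthread_create(&t2, NULL, locker, (void *)2L);

	sleep(2);			/* let both NLM_LOCK requests go out */
	/* <-- break connectivity here, then: */
	pthread_kill(t1, SIGUSR1);	/* interrupt thread 1's retry loop */
	pthread_join(t1, NULL);
	/* <-- restore connectivity; watch the NLM_LOCK/NLM_UNLOCK ordering */
	pthread_join(t2, NULL);
	return 0;
}
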
I've also looked at the code in clntproc.c, and I don't see anywhere
that outstanding failed lock/unlock requests are checked while new
lock requests are being processed. Am I missing something?
Thanks,
-Jan
On Fri, 2022-08-05 at 19:17 -0400, Jan Kasiak wrote:
> Specifically, I think there is a race condition around two threads of
> the same program acquiring a lock, one of the threads being
> interrupted, and the NFS client sending an unlock when none of the
> program threads called unlock.
Nobody here is likely to want to waste much time trying to 'fix' the
NLM locking protocol. The protocol itself is known to be extremely
fragile, and the endemic problems constitute some of the main
motivations for the development of the NFSv4 protocol
(See https://datatracker.ietf.org/doc/html/rfc2624#section-8
and https://datatracker.ietf.org/doc/html/rfc7530#section-9).
If you need more reliable support for POSIX locks beyond what exists
today for NLM, then please consider NFSv4.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
Hi Trond,
The v4 RFCs do mention protocol design flaws, but don't go into more detail.
I was trying to understand those flaws in order to understand how and
why v3 was problematic.
-Jan
On Sat, 2022-08-06 at 11:03 -0400, Jan Kasiak wrote:
> Hi Trond,
>
> The v4 RFCs do mention protocol design flaws, but don't go into more
> detail.
>
> I was trying to understand those flaws in order to understand how and
> why v3 was problematic.
>
>
The main issues derive from the fact that NLM is a side band protocol,
meaning that it has no ability to influence the NFS protocol
operations. In particular, there is no way to ensure safe ordering of
locks and I/O. For example, if your readahead code kicks in while you
are unlocking the file, there is nothing that guarantees the page
reads happened while the lock was in place on the server.
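As a toy illustration of that window, using nothing but ordinary POSIX
calls (the path and buffer size are arbitrary):

/*
 * Toy example of the window (path and buffer size are arbitrary).
 * The application assumes everything it read between F_SETLKW and
 * F_UNLCK was fetched while the lock was held on the server, but the
 * readahead triggered by read() is ordinary NFS READ traffic with no
 * ordering against the NLM_UNLOCK.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	struct flock fl = { .l_type = F_RDLCK, .l_whence = SEEK_SET };
	int fd = open("/mnt/nfs3/shared", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	fcntl(fd, F_SETLKW, &fl);	/* NLM_LOCK, whole file */
	read(fd, buf, sizeof(buf));	/* NFS READ; the kernel may read ahead */

	fl.l_type = F_UNLCK;
	fcntl(fd, F_SETLK, &fl);	/* NLM_UNLOCK */
	/* Readahead READs may still be in flight here, so pages cached
	 * "under the lock" may in fact have been read after it was
	 * dropped on the server. */
	return 0;
}
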
The same weakness also causes problems for reboots: if your client
doesn't notice that the server rebooted (and lost your locks) because
the statd callback mechanism failed, then you're SOL. Your I/O may
succeed, but can end up causing problems for another client that has
since grabbed the lock and assumes it now has exclusive access to the
file.
NLM also suffers from intrinsic problems of its own such as lack of
only-once semantics. If you send a blocking LOCK request and
subsequently send a CANCEL, who knows whether the lock or the cancel
gets processed first by the server? Many servers will
reply LCK_GRANTED to the CANCEL even if they did not find the lock
request. Sending an UNLOCK can also cause issues if the lock was
granted via a blocking lock callback (NLM_GRANTED) since there is no
ordering between the reply to the NLM_GRANTED and the UNLOCK.
Finally, as already mentioned, there are multiple issues associated
with client or server reboot. The NLM mechanism is pretty dependent on
yet another side band mechanism (STATD) to tell you when this occurs,
but that mechanism does not work to release the locks held by a client
if it fails to come back after reboot. Even if the client does come
back, it might forget to invoke the statd process, or it might use a
different identifier than it did during the last boot instance (e.g.
because DHCP allocated a different IP address, or the IP address is not
unique due to use of NAT, or a hostname was used that is non-unique,
...).
If the server reboots, then it may fail to notify the client of that
reboot through the callback mechanism. Reasons may include the
existence of a NAT, failure of the rpcbind/portmapper process on the
client, firewalls,...
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 8/6/2022 3:49 PM, Trond Myklebust wrote:
> The same weakness also causes problems for reboots: if your client
> doesn't notice that the server rebooted (and lost your locks) because
> the statd callback mechanism failed, then you're SOL. Your I/O may
> succeed, but can end up causing problems for another client that has
> since grabbed the lock and assumes it now has exclusive access to the
> file.
That brought back memories.
http://www.nfsv4bat.org/Documents/ConnectAThon/2006/talpey-cthon06-nsm.pdf
Here's an even older issues list for NLM on Solaris, circa 1996.
The portrait-mode slides are in reverse order. :)
http://www.nfsv4bat.org/Documents/ConnectAThon/1996/lockmgr.pdf
The NLM protocol is an antique and hasn't been looked at in well
over a decade (or two!). NLMv4 (circa 1995) widened offsets to
64-bit, which was the last innovation it got. None of the RPC
sideband protocols were ever standardized, btw.
Jan, what are you planning to use it for? Personally I'd advise
against pretty much anything.
Tom.
Thanks for all of the resources!
I was trying to implement an NFS server, and v3 sounded like an easier
place to start :-)
I think I'll move on to v4.
If we're revisiting the past, maybe just one last historical question:
Do either of you know why the Linux kernel uses only the IP
address/svid to identify the caller?
FreeBSD uses the owner field as well.
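For reference, the identification fields in an NLMv4 lock request look
roughly like this; this is my own C-style transcription of nlm4_lock
from the XNFS spec, with netobj simplified, so treat it as a sketch
rather than an authoritative definition:

#include <stdint.h>

/* simplified variable-length opaque */
struct netobj {
	uint32_t	 len;
	unsigned char	*data;
};

/* sketch of the NLMv4 lock identification fields */
struct nlm4_lock {
	char		*caller_name;	/* host name supplied by the client */
	struct netobj	 fh;		/* file handle of the file to lock */
	struct netobj	 oh;		/* opaque "owner handle" */
	int32_t		 svid;		/* per-process id chosen by the client */
	uint64_t	 l_offset;	/* byte-range start */
	uint64_t	 l_len;		/* byte-range length */
};

So the question is really which of caller_name, oh, and svid (plus the
source address of the request) a server should treat as identifying
the lock owner.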
Jan
Sorry, I meant FreeBSD uses the caller field as well.
-Jan