Hi folks,
Given that not everything in NFSv3 has a specification, I'm posting a
question here (since it concerns the Linux v3 client implementation),
but it's really a generic question about the NOTIFY call sent by an
NFS server. A NOTIFY message sent by an NFS server upon reboot carries
a monitor name and a state. This "state" is an integer that changes on
each server reboot. My question is: what about state value uniqueness?
Is it written down anywhere that this value has to be unique (as, say,
a random value would be)?
Here's a problem. Say a client has two mounts, to ip1 and ip2 (both
representing the same DNS name), and acquires a lock per mount. Now
say each of those servers reboots. Once up, they each send a NOTIFY
call, and each uses a timestamp as the basis for its "state" value --
which is very likely to produce the same value for two servers
rebooted at the same time (or, for the Linux server, a value that
looks like a counter). On the client side, once the client processes
the first NOTIFY call, it updates the "state" for the monitor name
(i.e., the client monitors based on a DNS name, which is the same for
ip1 and ip2). Then, in the current code, because the second NOTIFY
carries the same "state" value, that NOTIFY call is ignored. The Linux
client would never reclaim the second lock (and the application
obviously would never know it's missing a lock) --- data corruption.
Who is to blame: is the server not allowed to send a non-unique state
value? Or is the client at fault here for some reason?
I'd appreciate the feedback.
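The failure sequence above can be sketched in a few lines of C. This is a hypothetical simplification of the client-side check being described, not the actual lockd/statd code; the struct and function names are made up:

```c
/*
 * Toy model of the behaviour described above: the client keeps one
 * "state" value per monitor name, and treats a NOTIFY carrying the
 * same state as a duplicate to be ignored.
 */
#include <string.h>

struct monitor {
	char name[256];   /* DNS name the client monitors by */
	int  state;       /* last NSM state seen for that name */
};

/* Returns 1 if the NOTIFY triggers lock reclaim, 0 if it is ignored. */
int process_notify(struct monitor *m, const char *mon_name, int new_state)
{
	if (strcmp(m->name, mon_name) != 0)
		return 0;          /* not a host we monitor */
	if (new_state == m->state)
		return 0;          /* same state: dropped as a duplicate */
	m->state = new_state;      /* record the new reboot state */
	return 1;                  /* reclaim locks */
}
```

If ip1 and ip2 both derive their state from the same reboot timestamp, the first NOTIFY is processed and the second returns 0, which is exactly the lost-lock scenario: the second mount's lock is never reclaimed.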
> On May 14, 2024, at 2:56 PM, Olga Kornievskaia <[email protected]> wrote:
> Given that not everything for NFSv3 has a specification, I post a
> question here (as it concerns linux v3 (client) implementation) but I
> ask a generic question with respect to NOTIFY sent by an NFS server.
There is a standard:
https://pubs.opengroup.org/onlinepubs/9629799/chap11.htm
> [...]
> Who is to blame: is the server not allowed to send "non-unique" state
> value? Or is the client at fault here for some reason?
The state value is supposed to be specific to the monitored
host. If the client is indeed ignoring the second reboot
notification, that's incorrect behavior, IMO.
--
Chuck Lever
On Tue, May 14, 2024 at 5:09 PM Chuck Lever III <[email protected]> wrote:
> > [...]
>
> There is a standard:
>
> https://pubs.opengroup.org/onlinepubs/9629799/chap11.htm
Thank you, Chuck. This too says nothing about the uniqueness of the
state value.
> > [...]
> > Who is to blame: is the server not allowed to send "non-unique" state
> > value? Or is the client at fault here for some reason?
>
> The state value is supposed to be specific to the monitored
> host. If the client is indeed ignoring the second reboot
> notification, that's incorrect behavior, IMO.
The state, I think, is supposed to guard against replays. The client
is within its rights to update the state value upon processing a
reboot notification. The fact that another SM_NOTIFY arrives with the
same state (and from the same DNS monitor name) makes it look like a
retry, and thus grounds for ignoring it.
On Tue, May 14, 2024 at 5:36 PM Frank Filz <[email protected]> wrote:
> [...]
> If you are using multiple server IP addresses with the same DNS name, you may want to set:
>
> sysctl fs.nfs.nsm_use_hostnames=0
>
> The NLM will register with statd using the IP address as name instead of host name. Then your two IP addresses will each have a separate monitor entry and state value monitored.
In my setup I already have this set to 0. But I'll look around the
code to see what it is supposed to do.
> [...]
> In my setup I already have this set to 0. But I'll look around the code to see what
> it is supposed to do.
Hmm, maybe it doesn't work on the client side. I don't often test NLM clients with my Ganesha work because I only run one VM and NLM clients can’t function on the same host as any server other than knfsd...
Frank
On Tue, May 14, 2024 at 6:13 PM Frank Filz <[email protected]> wrote:
> [...]
> Hmm, maybe it doesn't work on the client side. I don't often test NLM clients with my Ganesha work because I only run one VM and NLM clients can’t function on the same host as any server other than knfsd...
I've been staring at and tracing the code, and here's what I conclude:
the nsm_use_hostnames setting toggles nothing that helps. No matter
what, statd always stores what it is monitoring based on the DNS name
(git blame says this is due to nfs-utils commit
0da56f7d359475837008ea4b8d3764fe982ef512, "statd - use dnsname to
ensure correct matching of NOTIFY requests"). What's worse is that
when statd receives a second monitoring request from lockd for
something that maps to the same DNS name, statd overwrites the
previous monitoring information it had. When a NOTIFY arrives from an
IP matching the DNS name, statd does the downcall and sends whatever
monitoring information lockd last gave it. Therefore all the other
locks will never be recovered.
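The overwrite behaviour described above can be sketched as follows. This is a hypothetical toy model of a DNS-name-keyed monitor table, not the real statd code; the names, table size, and cookie format are invented:

```c
/*
 * Toy model: monitor records keyed by DNS name, so a second SM_MON
 * request that resolves to the same name clobbers the first entry's
 * opaque cookie, and only the last one survives to the downcall.
 */
#include <string.h>

struct mon_rec {
	char dnsname[256];
	char priv[32];    /* opaque cookie lockd wants back in the downcall */
	int  valid;
};

#define NREC 8
static struct mon_rec table[NREC];

void sm_mon(const char *dnsname, const char *priv)
{
	int i, free_slot = -1;

	for (i = 0; i < NREC; i++) {
		if (table[i].valid && strcmp(table[i].dnsname, dnsname) == 0) {
			strcpy(table[i].priv, priv);  /* old cookie is lost */
			return;
		}
		if (!table[i].valid && free_slot < 0)
			free_slot = i;
	}
	if (free_slot >= 0) {
		strcpy(table[free_slot].dnsname, dnsname);
		strcpy(table[free_slot].priv, priv);
		table[free_slot].valid = 1;
	}
}

/* On NOTIFY, only the last-stored cookie is handed to lockd. */
const char *sm_notify_downcall(const char *dnsname)
{
	int i;

	for (i = 0; i < NREC; i++)
		if (table[i].valid && strcmp(table[i].dnsname, dnsname) == 0)
			return table[i].priv;
	return 0;
}
```

After two SM_MON calls for the same DNS name (one per IP), the downcall only ever sees the second cookie, so locks tied to the first registration are never reclaimed.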
What I struggle with is how to solve this problem. Say ip1 and ip2
each run an NFS server and both are known under the same DNS name:
foo.bar.com. Does that mean they represent the "same" server? Can we
assume that if one of them rebooted, the other rebooted as well? It
seems we can't go backwards and return to monitoring by IP: we'd get
in trouble if the rebooted server comes back up with a different IP
(same DNS name), because it would never match the old entry and the
lock would never be recovered (though I also think lockd would only
send the reclaim to the IP it stored previously, which in this case
would be unreachable). If statd continues to monitor by DNS name and
matches either IP to the stored entry, then the problem comes with the
"state" update. Once statd processes one NOTIFY that matched the DNS
name, its state "should" be updated, but that leads us back into the
problem of ignoring the second NOTIFY call. If statd were changed to
store multiple monitor handles that lockd asked it to monitor, then
when the first NOTIFY call comes we could ask lockd to recover "all"
the stored handles. But that circles back to my question: can we
assume that if one IP rebooted, all IPs rebooted?
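The "store multiple handles" alternative might look roughly like this. Again a hypothetical sketch, made up for illustration, and it only works under the stated assumption that one NOTIFY for the DNS name means every IP behind it rebooted:

```c
/*
 * Toy model of the alternative: statd keeps every monitor handle
 * registered under a DNS name and, on the first NOTIFY for that name,
 * hands all of them to lockd for recovery. A NOTIFY repeating the
 * already-recorded state is still treated as a retry.
 */
#include <string.h>

#define MAXH 8

struct dns_monitor {
	char name[256];
	char handles[MAXH][32];   /* one opaque handle per lockd registration */
	int  nhandles;
	int  state;               /* last NSM state processed for this name */
};

void add_handle(struct dns_monitor *m, const char *h)
{
	if (m->nhandles < MAXH)
		strcpy(m->handles[m->nhandles++], h);
}

/* Returns how many handles were handed to lockd for recovery. */
int notify(struct dns_monitor *m, int new_state)
{
	if (new_state == m->state)
		return 0;             /* duplicate/retry: nothing to do */
	m->state = new_state;
	return m->nhandles;           /* recover "all" stored handles */
}
```

Here the first NOTIFY recovers both registrations at once, so the duplicate-state problem disappears; but if only ip1 actually rebooted, ip2's still-valid locks get needlessly reclaimed, which is the open question above.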
Perhaps it's lockd that needs to change in how it keeps track of
servers that hold locks. The behaviour seems to have changed in 2010
(with commit 8ea6ecc8b0759756a766c05dc7c98c51ec90de37, "lockd: Create
client-side nlm_host cache"), when the nlm_host cache was introduced,
keyed on a hash of the IP. It seems that before that, things were
based on the DNS name, in line with statd.
Does anybody have thoughts on whether statd or lockd needs to change?
On Wed, 2024-05-22 at 09:57 -0400, Olga Kornievskaia wrote:
> [...]
> Anybody has any thoughts as to whether statd or lockd needs to
> change?
>
I believe Tom Talpey is to blame for the nsm_use_hostnames stuff. That
all came from his 2006 Connectathon talk:
https://nfsv4bat.org/Documents/ConnectAThon/2006/talpey-cthon06-nsm.pdf
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 5/22/2024 12:20 PM, Trond Myklebust wrote:
> [...]
> I believe Tom Talpey is to blame for the nsm_use_hostname stuff. That
> all came from his 2006 Connectathon talk
> https://nfsv4bat.org/Documents/ConnectAThon/2006/talpey-cthon06-nsm.pdf
I deny that!! :) All that talk was intended to do was point out how
deeply flawed the statmon protocol is, and how badly it was
implemented at the time. That said, hostnames may be a slight
improvement over the mess that was 2006, and it's been kinda sorta
working since then.
Personally, I still think trying to "fix" NSM is a fool's errand. It's
just never, ever going to succeed, particularly if both the clients
*and* the servers have to change. NFSv4.1 is the better way.
Tom.