On 5/22/2024 12:20 PM, Trond Myklebust wrote:
> On Wed, 2024-05-22 at 09:57 -0400, Olga Kornievskaia wrote:
>> On Tue, May 14, 2024 at 6:13 PM Frank Filz wrote:
>>>
>>>> -----Original Message-----
>>>> From: Olga Kornievskaia
>>>> Sent: Tuesday, May 14, 2024 2:50 PM
>>>> To: Frank Filz
>>>> Cc: Chuck Lever III; Linux NFS Mailing List
>>>> Subject: Re: sm notify (nlm) question
>>>>
>>>> On Tue, May 14, 2024 at 5:36 PM Frank Filz wrote:
>>>>>
>>>>>>> On May 14, 2024, at 2:56 PM, Olga Kornievskaia wrote:
>>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> Given that not everything for NFSv3 has a specification, I'm
>>>>>>> posting this question here (as it concerns the Linux v3 client
>>>>>>> implementation), but it is a generic question about the NOTIFY
>>>>>>> sent by an NFS server.
>>>>>>
>>>>>> There is a standard:
>>>>>>
>>>>>>> A NOTIFY message that is sent by an NFS server upon reboot has a
>>>>>>> monitor name and a state. This "state" is an integer and is
>>>>>>> modified on each server reboot. My question is: what about state
>>>>>>> value uniqueness? Is there a notion anywhere that this value has
>>>>>>> to be unique (say, a random value)?
>>>>>>>
>>>>>>> Here's a problem. Say a client has 2 mounts to ip1 and ip2 (both
>>>>>>> representing the same DNS name) and acquires a lock per mount. Now
>>>>>>> say each of those servers reboots. Once up, they each send a NOTIFY
>>>>>>> call and each uses a timestamp as the basis for its "state" value,
>>>>>>> which is very likely to produce the same value for 2 servers
>>>>>>> rebooted at the same time (or, for the Linux server, where the value
>>>>>>> looks like a counter).
>>>>>>> On the client side, once the client processes the 1st NOTIFY call,
>>>>>>> it updates the "state" for the monitor name (i.e., the client
>>>>>>> monitors based on a DNS name, which is the same for ip1 and ip2),
>>>>>>> and then, in the current code, because the 2nd NOTIFY has the same
>>>>>>> "state" value, that NOTIFY call is ignored. The Linux client never
>>>>>>> reclaims the 2nd lock (and the application never knows it's
>>>>>>> missing a lock): data corruption.
>>>>>>>
>>>>>>> Who is to blame: is the server not allowed to send a non-unique
>>>>>>> state value? Or is the client at fault here for some reason?
>>>>>>
>>>>>> The state value is supposed to be specific to the monitored host. If
>>>>>> the client is indeed ignoring the second reboot notification, that's
>>>>>> incorrect behavior, IMO.
>>>>>
>>>>> If you are using multiple server IP addresses with the same DNS name,
>>>>> you may want to set:
>>>>>
>>>>>   sysctl fs.nfs.nsm_use_hostnames=0
>>>>>
>>>>> The NLM will register with statd using the IP address as the name
>>>>> instead of the host name. Then your two IP addresses will each have a
>>>>> separate monitor entry and their own monitored state value.
>>>>
>>>> In my setup I already have this set to 0. But I'll look around the
>>>> code to see what it is supposed to do.
>>>
>>> Hmm, maybe it doesn't work on the client side. I don't often test NLM
>>> clients with my Ganesha work because I only run one VM, and NLM clients
>>> can't function on the same host as any server other than knfsd...
>>
>> I've been staring at and tracing the code, and here's what I conclude:
>> the nsm_use_hostnames setting toggles nothing that helps.
>> No matter what, statd always stores whatever it is monitoring based on
>> the DNS name (git blame says it's due to nfs-utils commit
>> 0da56f7d359475837008ea4b8d3764fe982ef512, "statd - use dnsname to
>> ensure correct matching of NOTIFY requests"). What's worse, when statd
>> receives a 2nd monitoring request from lockd for something that maps to
>> the same DNS name, statd overwrites the previous monitoring information
>> it had. When a NOTIFY arrives from an IP matching the DNS name, statd
>> does the downcall and sends whatever monitoring information lockd gave
>> it last. Therefore none of the other locks will ever be recovered.
>>
>> What I struggle with is how to solve this problem. Say ip1 and ip2 run
>> an NFS server and both are known under the same DNS name:
>>
>> Do they represent the "same" server? Can we assume that if one of them
>> "rebooted" then the other rebooted as well? It seems we can't go
>> backwards to monitoring by IP. In that case I can see that we'd get in
>> trouble if the rebooted server indeed comes back up with a different IP
>> (same DNS name): it would never match the old entry and the lock would
>> never be recovered (though I also think lockd would only send the lock
>> to the previously stored IP, which in this case would be unreachable).
>> If statd continues to monitor by DNS name and matches either IP to the
>> stored entry, then the problem comes with the "state" update. Once
>> statd processes one NOTIFY that matched the DNS name, its state
>> "should" be updated, but that leads us back into the problem of
>> ignoring the 2nd NOTIFY call. If statd were changed to store the
>> multiple monitor handles lockd asked it to monitor, then when the 1st
>> NOTIFY call comes we could ask lockd to recover "all" the stored
>> handles.
But then >> it >> circles back to my question: can we assume that if one IP rebooted >> does it imply all IPs rebooted? >> >> Perhaps it's lockd that needs to change in how it keeps track of >> servers that hold locks. The behaviour seems to have changed in 2010 >> (with commit 8ea6ecc8b0759756a766c05dc7c98c51ec90de37 "lockd: Create >> client-side nlm_host cache") when nlm_host cache was introduced >> written to be based on hash of IP. It seems that before things were >> based on a DNS name making it in line with statd. >> >> Anybody has any thoughts as to whether statd or lockd needs to >> change? >> > > I believe Tom Talpey is to blame for the nsm_use_hostname stuff. That > all came from his 2006 Connectathon talk > I deny that!! :) All that talk intended to do was to point out how deeply flawed the statmon protocol is, and how badly it was then implemented. However, hostnames may be a slight improvement over the mess that was 2006. And it's been kinda sorta working since then. Personally I still think trying to "fix" nsm is a fool's errand. It's just never ever going to succeed. Particularly if both the clients *and* servers have to change. NFS4.1 is the better way. Tom.