
2006-06-05 07:44:03

Subject: The host->h_inuse NLM's nlm_gc_hosts()

I'm chasing a bug in our distribution: when lockd is signalled, the lock
references do not seem to get released. I just spotted that nlm_gc_hosts()
explicitly sets

host->h_inuse = 0

and then calls nlmsvc_mark_resources() to set h_inuse back to 1 if there
is a lock reference, via:

nlmsvc_mark_resources ->
nlm_traverse_files ->
nlm_inspect_file ->
nlm_traverse_locks ->

183 again:
184         file->f_locks = 0;
185         for (fl = inode->i_flock; fl; fl = fl->fl_next) {
186                 if (fl->fl_lmops != &nlmsvc_lock_operations)
187                         continue;
188
189                 /* update current lock count */
190                 file->f_locks++;
191                 lockhost = (struct nlm_host *) fl->fl_owner;
192                 if (action == NLM_ACT_MARK)
193                         lockhost->h_inuse = 1;
194                 else if (action == NLM_ACT_CHECK)
195                         return 1;
196                 else if (action == NLM_ACT_UNLOCK) {
197                         struct file_lock lock = *fl;
198
199                         if (host && lockhost != host)
200                                 continue;

So in theory, at #193, lockhost->h_inuse is supposed to get set back
to 1. However, I'm wondering whether fl->fl_owner could get overwritten
anywhere by (task_struct *)current->files in some other code path, so
that we end up updating a different structure. That could allow the
host to be freed, so that nlmsvc_invalidate_all() would no longer find
it.
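
The clear/mark/sweep pattern described above can be modelled in userspace
roughly as follows. This is only a sketch under stated assumptions: the
struct layouts, nlm_ops_tag, mark_hosts() and gc_hosts() are hypothetical
stand-ins and are not the kernel's definitions. It shows why the
fl_lmops check matters: the owner pointer is only treated as a host when
the lock carries the NLM operations pointer.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumption). */
struct host {
    int inuse;   /* models h_inuse */
    int freed;   /* set when the sweep phase would release the host */
};

struct lock {
    const void *lmops;   /* models fl_lmops */
    void *owner;         /* models fl_owner */
    struct lock *next;   /* models fl_next */
};

/* The address of this object stands in for &nlmsvc_lock_operations. */
static const int nlm_ops_tag = 0;

/* Mark phase: flag the owner of every lock that carries the NLM tag. */
static void mark_hosts(struct lock *head)
{
    for (struct lock *fl = head; fl; fl = fl->next) {
        if (fl->lmops != &nlm_ops_tag)
            continue;   /* not an NLM lock, so owner is not a host */
        ((struct host *)fl->owner)->inuse = 1;
    }
}

/* Clear every mark, re-mark from the lock list, then sweep.
 * Returns the number of hosts released by this pass. */
static int gc_hosts(struct host *hosts, size_t n, struct lock *locks)
{
    int swept = 0;

    for (size_t i = 0; i < n; i++)
        hosts[i].inuse = 0;

    mark_hosts(locks);

    for (size_t i = 0; i < n; i++) {
        if (!hosts[i].inuse && !hosts[i].freed) {
            hosts[i].freed = 1;
            swept++;
        }
    }
    return swept;
}
```

In this model a host survives the pass only if some lock with the NLM
tag still points at it; a lock whose owner is some other kind of object
is skipped by the tag check and never cast to a host.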

I will continue this bug hunt tomorrow, but I'm wondering whether someone
who knows this code well can assure me that this can't happen. Many
thanks in advance.

Also, the assumption here is that sending lockd a signal (any signal)
kicks off nlmsvc_invalidate_all() to remove the locks. Is this a correct
assumption?

-- Wendy

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2006-06-08 10:43:20

by Wendy Cheng


Subject: Re: The host->h_inuse NLM's nlm_gc_hosts()

On Mon, 2006-06-05 at 03:54 -0400, Wendy Cheng wrote:
> I'm chasing a bug in our distribution - when signalling lockd, the lock
> references do not seem to get released.

It turned out to be a false alarm. We had reports that RHCS (Red Hat
Cluster Suite) was in some cases unable to umount the filesystem during
active-active failover whenever an NFS client was holding POSIX locks.

Extensive tracing shows that the umount failure is actually caused by a
vfsmount reference held by the export entry, due to the way unexport was
handled. Sending a signal to lockd *did* release the POSIX locks as
expected.

On the other hand, we do have some other NLM issues (with patches) that
we would like to discuss with the list. More on that later.

-- Wendy
