2002-06-12 16:27:43

by Daniel Forrest

Subject: [PATCH] lockd hangs during host lookup garbage collection

A deadlock occurs when the following call sequence takes file->f_sema
a second time while it is still held:

->nlmsvc_lock calls down(&file->f_sema)
->nlmsvc_create_block
->nlmclnt_lookup_host
->nlm_lookup_host may do garbage collection
->nlm_gc_hosts
->nlmsvc_mark_resources
->nlm_traverse_files action = NLM_ACT_MARK
->nlm_inspect_file loops over all files
->nlmsvc_traverse_blocks calls down(&file->f_sema)

Under heavy load (i.e. >32 client machines repeatedly locking and
unlocking the same NFS-mounted file) the deadlock occurs within seconds.
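
The sequence above can be reproduced in user space as a rough sketch. This
is not the lockd code: the demo_* names are hypothetical stand-ins, a POSIX
unnamed semaphore (Linux/glibc assumed) plays the role of f_sema, and
sem_trywait() replaces the blocking down() in the traverse step so that the
example reports the would-be hang instead of hanging. The drop_sema flag
selects the up()/down() bracketing used by the patch below.

```c
#include <assert.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nlm_file; only the semaphore matters. */
struct demo_file {
    sem_t f_sema;               /* binary and non-recursive, like f_sema */
};

/*
 * Stand-in for nlmsvc_traverse_blocks(), which needs f_sema.
 * Returns false where a blocking down() would sleep forever.
 */
static bool demo_traverse_blocks(struct demo_file *file)
{
    if (sem_trywait(&file->f_sema) != 0)
        return false;           /* semaphore already held: deadlock */
    sem_post(&file->f_sema);
    return true;
}

/* Stand-in for a host lookup that happens to run garbage collection. */
static bool demo_lookup_host(struct demo_file *file)
{
    return demo_traverse_blocks(file);
}

/*
 * Mirrors nlmsvc_lock -> nlmsvc_create_block -> host lookup.
 * Returns true if the lookup completed without needing to block.
 */
static bool demo_lock(struct demo_file *file, bool drop_sema)
{
    bool ok;

    sem_wait(&file->f_sema);            /* down() in nlmsvc_lock */
    if (drop_sema)
        sem_post(&file->f_sema);        /* up() before the lookup */
    ok = demo_lookup_host(file);
    if (drop_sema)
        sem_wait(&file->f_sema);        /* down() again afterwards */
    sem_post(&file->f_sema);            /* final release on exit */
    return ok;
}

/* Runs one scenario on a fresh semaphore with an initial count of 1. */
static bool demo_run(bool drop_sema)
{
    struct demo_file file;
    bool ok;
    int rc = sem_init(&file.f_sema, 0, 1);

    assert(rc == 0);
    (void)rc;
    ok = demo_lock(&file, drop_sema);
    sem_destroy(&file.f_sema);
    return ok;
}
```

Calling demo_run(false) models the unpatched path, where the nested
acquire finds the semaphore already held; demo_run(true) models the
patched path. Older glibc versions may need -pthread when linking. As in
the real code, anything protected by the semaphore may change while it is
dropped, so the caller has to revalidate afterwards.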

I discussed this with Trond Myklebust a couple of months ago, but the
question became one of why f_sema was being used at all. I don't know
the answer to that question. Until someone else removes f_sema, the
following patch will avoid the problem.

Please apply.

Dan

--- fs/lockd/svclock.c.ORIG Thu Oct 11 09:52:18 2001
+++ fs/lockd/svclock.c Tue Jun 11 17:06:01 2002
@@ -176,8 +176,14 @@
 	struct nlm_rqst *call;
 
 	/* Create host handle for callback */
+	/* We must up the semaphore in case the host lookup does
+	 * garbage collection (which calls nlmsvc_traverse_blocks),
+	 * but this shouldn't be a problem because nlmsvc_lock has
+	 * to retry the lock after this anyway */
+	up(&file->f_sema);
 	host = nlmclnt_lookup_host(&rqstp->rq_addr,
 				rqstp->rq_prot, rqstp->rq_vers);
+	down(&file->f_sema);
 	if (host == NULL)
 		return NULL;