2008-06-05 21:42:38

by Steve Gaarder

Subject: File locking quits

I am running an NFS (version 3 and 4) server on Red Hat Enterprise Linux 5,
which I upgraded to kernel version 2.6.18-92.el5 a couple of weeks ago. A
couple of days ago, NFS file locking stopped working completely. Any program
that tried to lock a file would hang, including this piece of Python code:

import fcntl

# Request an exclusive, non-blocking POSIX lock on a file on the NFS mount.
fp = open("lock-test4", "a")
fcntl.lockf(fp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
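Because the test above hangs instead of failing, it is awkward to run from a monitoring script. LOCK_NB only avoids waiting for a conflicting lock; it does not stop the call from waiting on a lock manager that never replies. Below is a minimal sketch of a version that gives up after a timeout. It assumes Unix, Python 3, and that the hung call can be interrupted by a signal (a process stuck in uninterruptible sleep will not see the alarm). The name `probe_lock` and its return values are illustrative only:

```python
import fcntl
import signal


class LockTimeout(Exception):
    """Raised by the SIGALRM handler when the lock call takes too long."""


def _on_alarm(signum, frame):
    raise LockTimeout()


def probe_lock(path, timeout=10):
    """Try a non-blocking exclusive POSIX lock on `path`.

    Returns "locked" if the lock was granted (it is released again),
    "busy" if another process holds a conflicting lock, or "timeout"
    if open/lock did not return within `timeout` seconds.
    SIGALRM handlers can only be installed from the main thread.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        with open(path, "a") as fp:
            try:
                fcntl.lockf(fp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            except (BlockingIOError, PermissionError):
                # lockf with LOCK_NB fails with EAGAIN or EACCES on conflict
                return "busy"
            fcntl.lockf(fp.fileno(), fcntl.LOCK_UN)
            return "locked"
    except LockTimeout:
        return "timeout"
    finally:
        # Cancel any pending alarm and restore the previous handler.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
```

Run periodically (for example from cron) as `probe_lock("lock-test4")` in the NFS-mounted directory, a log of the results could help pin down when locking stops.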

A packet capture showed requests to the lock manager port being
retransmitted periodically, with no replies. I am running iptables with
that port (among others) allowed through. Restarting iptables did not
help; only a reboot did. The next day the problem recurred, and this time
I rebooted into the older kernel, 2.6.18-53.1.4.el5. So far it has run
for a little over 24 hours without incident.
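Since the requests go to the lock manager port and iptables is involved, one thing worth checking is which port the lock manager (RPC program 100021, nlockmgr) is actually registered on with the portmapper (port 111). Unless it has been pinned, that port is chosen dynamically. `rpcinfo -p <server>` lists the registrations. Below is a minimal Python sketch of the same GETPORT query (RFC 1833) over UDP, assuming the portmapper is reachable and NLM version 4. The function names are hypothetical:

```python
import os
import socket
import struct

PMAP_PORT = 111
PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
NLM_PROG = 100021                 # nlockmgr, the NFS lock manager
IPPROTO_TCP, IPPROTO_UDP = 6, 17


def build_getport_request(xid, prog, vers, prot):
    """Encode an ONC RPC CALL to portmapper GETPORT with AUTH_NULL."""
    # xid, msg_type=CALL(0), rpcvers=2, program, version, procedure
    header = struct.pack(">6I", xid, 0, 2, PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    auth_null = struct.pack(">2I", 0, 0)    # flavor AUTH_NULL, empty body
    # mapping {prog, vers, prot, port}; port is ignored for a lookup
    args = struct.pack(">4I", prog, vers, prot, 0)
    return header + auth_null + auth_null + args   # credential + verifier


def parse_getport_reply(data, xid):
    """Return the port from a GETPORT reply; 0 means 'not registered'."""
    rxid, msg_type, reply_stat = struct.unpack_from(">3I", data, 0)
    if rxid != xid or msg_type != 1 or reply_stat != 0:
        raise ValueError("not an accepted reply to this call")
    (verf_len,) = struct.unpack_from(">I", data, 16)
    offset = 20 + ((verf_len + 3) & ~3)     # verifier body padded to 4 bytes
    accept_stat, port = struct.unpack_from(">2I", data, offset)
    if accept_stat != 0:
        raise ValueError(f"RPC call failed, accept_stat={accept_stat}")
    return port


def getport(host, prog=NLM_PROG, vers=4, prot=IPPROTO_UDP, timeout=3.0):
    """Ask the portmapper on `host` which port prog/vers/prot is using."""
    xid = struct.unpack(">I", os.urandom(4))[0]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_getport_request(xid, prog, vers, prot), (host, PMAP_PORT))
        data, _addr = sock.recvfrom(4096)
    return parse_getport_reply(data, xid)
```

The port returned by, for example, `getport("nfs-server.example", prot=IPPROTO_TCP)` (a hypothetical host name) can be compared with the ports iptables lets through. A `socket.timeout` suggests the portmapper itself is not reachable.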

- any idea what might be going on?
- am I correct that this locking is handled in the kernel?
- is there a way to restart locking short of rebooting?
- how would I go about debugging this further?
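One place to look when debugging further is /proc/locks on the server. This assumes that locks granted through the lock manager are recorded there like any other POSIX lock, and that the file uses the format recent Linux kernels produce, in which a waiter blocked behind lock N is listed as `N: -> ...`. Below is a minimal parsing sketch; `parse_proc_locks` is a hypothetical helper:

```python
def parse_proc_locks(text):
    """Parse Linux /proc/locks text into a list of dicts.

    Held locks look like
        1: POSIX  ADVISORY  WRITE 1234 08:01:123456 0 EOF
    and waiters blocked behind lock 1 look like
        1: -> POSIX  ADVISORY  WRITE 1235 08:01:123456 0 EOF
    """
    locks = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        blocked = len(fields) > 1 and fields[1] == "->"
        rest = fields[2:] if blocked else fields[1:]
        if len(rest) < 7:
            continue  # skip lines that do not match the expected layout
        kind, mode, access, pid, dev_inode, start, end = rest[:7]
        locks.append({
            "id": int(fields[0].rstrip(":")),
            "blocked": blocked,
            "type": kind,
            "mode": mode,
            "access": access,
            "pid": int(pid),
            "dev_inode": dev_inode,   # major:minor:inode of the locked file
            "start": start,
            "end": end,
        })
    return locks
```

On the server, reading `/proc/locks` while clients are hanging and filtering for `lock["blocked"]` entries might show whether requests are queued behind a lock that is never released, as opposed to never arriving at all.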

thanks,

Steve Gaarder
System Administrator, Dept of Mathematics
Cornell University, Ithaca, NY, USA