2006-05-03 03:17:11

by Erik Walthinsen

Subject: Breaking locks on a lame server?

We've got a configuration in which a NAS machine (3ware SATA RAID) running a
2.4.26 kernel is in use by a number of client machines, each running
multiple instances of user-mode-linux. Almost all "block devices" presented
to the UML instances are located on the NAS, exported via NFS. The clients
are currently a mix of 2.4.26 and 2.6.15.6.

The problem is, whenever there's any kind of abnormal shutdown of either a
UML instance or the machine that's running it, an NFS lock is left in place.
This means that UML refuses to start up again using those files, unless I
patch the UML kernel to avoid F_SETLK, which is of course very unsafe (two
UMLs touching the same file means EDEADFILE).
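
For readers who have not used F_SETLK directly, the behaviour described above can be sketched in Python. This is only an illustration, not UML's actual code: `try_lock` is a hypothetical helper, and it relies on `fcntl.lockf` with `LOCK_NB`, which makes a non-blocking `fcntl()` record-lock request. On an NFS mount that request is forwarded to the server's lock manager, which is why a lock left behind on the server makes the attempt fail no matter which client asks.

```python
import errno
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive, non-blocking lock on the whole file.

    Returns the open file descriptor on success, or None if another
    process (or a stale server-side lock) already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # LOCK_NB makes this a non-blocking request: fail at once
        # instead of waiting for the current holder to release.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None  # somebody else holds the lock
        raise
    return fd
```

A caller that gets `None` back is in the situation described here: it cannot tell a live holder from a stale lock, so the only safe choice is to refuse to start.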

I've seen other suggestions for breaking locks that involve restarting statd
on the clients, etc., but the problem is this: these locks are persistent
across reboots of the clients. IIRC they're also persistent *between*
clients, so I can't start a UML instance on an alternate machine either.

We're planning an upgrade of the NAS box to 2.6 as soon as we can, but that
means a system-wide shutdown of all our customers' UML instances, which
isn't something we do lightly.

1) am I to expect 2.6's NFS server implementation to somehow solve this?
2) if not, and in the meantime, are there any means by which I can inspect
(lslk does nothing) and hopefully kill these locks?

(I'm not on the list, already on too many, so please make sure to send to:
or cc: me)

Thankx,
Omega
aka Erik Walthinsen
[email protected]


-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-05-03 14:34:09

by Jeff Layton

Subject: Re: Breaking locks on a lame server?

On Tue, 2006-05-02 at 20:17 -0700, Erik Walthinsen wrote:
> 1) am I to expect 2.6's NFS server implementation to somehow solve this?
> 2) if not, and in the meantime, are there any means by which I can inspect
> (lslk does nothing) and hopefully kill these locks?

One thing you can do (albeit not a nice thing) is to send a SIGKILL to
lockd on the NFS server. This will make it drop all of its locks.
Restarting statd just afterward should cue the active clients to
reacquire them. This has races, but may be better than restarting your
NAS. This works with 2.4 and 2.6 kernels.
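
The first step above could be sketched roughly as follows, to be run as root on the NFS server. This is a hedged illustration rather than a tested procedure: `find_pids` and `drop_all_nfs_locks` are hypothetical helper names, and the sketch assumes lockd shows up under the name "lockd" in `/proc/<pid>/stat`. Restarting statd is left out because the command for it differs between distributions.

```python
import os
import signal


def find_pids(name, proc_root="/proc"):
    """Return the PIDs whose command name in <proc_root>/<pid>/stat is name."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, errors="replace") as f:
                stat = f.read()
        except OSError:
            continue  # process went away, or not a process directory
        # The command name sits between the first "(" and the last ")".
        comm = stat[stat.find("(") + 1:stat.rfind(")")]
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


def drop_all_nfs_locks(proc_root="/proc"):
    """Send SIGKILL to lockd so that it drops every lock it holds."""
    pids = find_pids("lockd", proc_root)
    for pid in pids:
        os.kill(pid, signal.SIGKILL)
    return pids
```

Note that this drops every lock on the server, not only the stale ones, so it is only reasonable if the active clients do reacquire theirs after statd is restarted.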

I also recently posted a patch to add a procfs interface that would
allow dropping of locks on just a single block device (mostly to help in
clustered situations), but I received no comment on it. Depending on
your setup, that may help you as well.

-- Jeff



