2004-05-31 11:02:06

by NeilBrown

Subject: Re: fcntl locks prevent unmounting of underlying filesystem

On Wednesday May 26, [email protected] wrote:
> I seem to have run across a problem with NFS and fcntl locks. I'm
> trying to implement a HA-NFS solution using heartbeat, DRBD, LVM2, etc.
> I'm running the following:
>
> 2.6.6 kernel
> nfs-kernel-server and nfs-common 1.0.6-3 (debian packages)
>
> The underlying filesystem is reiserfs. Essentially what I'm seeing is
> that when I try to shut down NFS and unmount the filesystem for a
> failover, I'm unable to unmount if I have an fcntl lock on the file.

You need to make sure that lockd gets killed as well.
Just shutting down nfsd doesn't necessarily kill lockd: if you have
any active NFS mounts, lockd will stay up for them.
So, when you have shut down nfsd and before you try to unmount, could
you check if lockd is still running or not?
If it is, send it a SIGKILL. It won't exit, but it should release any
locks that it is holding.
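
The check described above could be scripted roughly as follows. This is only a sketch: it assumes the procps tools pgrep and pkill are installed, and the function name release_lockd_locks is made up for illustration.

```shell
#!/bin/sh
# Sketch only: signal lockd so that it drops the locks it is holding.
# Assumes pgrep/pkill from procps; release_lockd_locks is a hypothetical name.

release_lockd_locks() {
    if pgrep -x lockd >/dev/null 2>&1; then
        # lockd is a kernel thread, so SIGKILL will not make it exit,
        # but it should make it release any locks it holds.
        pkill -KILL -x lockd
        echo "lockd signalled"
    else
        echo "lockd not running"
    fi
}
```

The function would be called after nfsd has been shut down and before the unmount is attempted.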

If lockd has gone away at this point but locks are still being held,
then that is a real problem and I will try to look into it.

NeilBrown



-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g.
Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-05-30 10:53:16

by Jeff Layton

Subject: Re: fcntl locks prevent unmounting of underlying filesystem

> You need to make sure that lockd gets killed as well.
> Just shutting down nfsd doesn't necessarily kill lockd: if you have
> any active NFS mounts, lockd will stay up for them.
> So, when you have shut down nfsd and before you try to unmount, could
> you check if lockd is still running or not?
> If it is, send it a SIGKILL. It won't exit, but it should release any
> locks that it is holding.
>
> If lockd has gone away at this point but locks are still being held,
> then that is a real problem and I will try to look into it.
>
> NeilBrown

Ahh, Thanks for the info! That does indeed seem to take care of the hold
on the underlying filesystem. I need to do a little more testing to see
how locks are handled, but that at least removes my current logjam.

Cheers!
--
Jeff Layton <[email protected]>



2004-06-04 12:03:10

by Jeff Layton

Subject: Re: [NFS] fcntl locks prevent unmounting of underlying filesystem

On Sat, 2004-05-29 at 18:51, Neil Brown wrote:
> If lockd has gone away at this point but locks are still being held,
> then that is a real problem and I will try to look into it.
>
> NeilBrown

Ok, I think we have a test case where even a SIGKILL to lockd won't
help. When I did a single POSIX lock on the filesystem, sending the
SIGKILL to lockd did seem to clear up the problem unmounting the
filesystem. The folks on the linux-ha list were still having problems
and suggested I try a Connectathon test to see if the SIGKILL still
worked after that.

I downloaded connectathon:

http://www.connectathon.org/nfstests.html

I ran the locking test on a filesystem I had mounted from the server,
then ran the following on the server:

exportfs -u <filesystem>
exportfs -f
pkill -KILL -x lockd

And tried to unmount the filesystem. It wouldn't unmount. I then killed
connectathon and tried to unmount it. It wouldn't unmount. I then
unmounted the filesystem from the client, and still I couldn't unmount
the underlying filesystem. At this point, I rebooted the box, as I
didn't see any alternative.

So whatever connectathon does, it seems to hose up Linux NFS locking
pretty solidly. One thing it seems to do is take _a_lot_ of locks, so
perhaps the problem is the sheer number of them. If you could download
and try it, perhaps you could get to the bottom of the problem.
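
One way to see whether locks are still held on the underlying device after lockd has been signalled is to count the matching entries in /proc/locks. This is a rough sketch, not something I ran as part of the test above. It assumes the usual /proc/locks layout in which the sixth field is MAJOR:MINOR:INODE with the device numbers in hex; the function name count_locks_on_dev and the device number in the usage line are made up for illustration.

```shell
#!/bin/sh
# Sketch only: count locks recorded in /proc/locks for one block device.
# Assumes field 6 of each entry is MAJOR:MINOR:INODE (device numbers in hex).
# count_locks_on_dev is a hypothetical helper name.

# usage: count_locks_on_dev MAJOR:MINOR [locks_file]
count_locks_on_dev() {
    dev=$1
    locks_file=${2:-/proc/locks}
    awk -v dev="$dev" '
        $2 == "->" { next }              # skip entries for blocked waiters
        index($6, dev ":") == 1 { n++ }  # lock is on the requested device
        END { print n + 0 }
    ' "$locks_file"
}
```

For example, `count_locks_on_dev fe:00` would report the number of locks held on a device with major number 0xfe and minor number 0. A count that stays above zero after the SIGKILL would suggest that the locks were not released.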

FWIW, I'm using a 2.6.6 kernel on the server, with /proc/fs/nfsd mounted
and nfs-utils 1.0.6-3 from the Debian archive, but Guochun Shi said that
he's been able to replicate this problem on recent 2.4 kernels as well.

Many thanks for your help!
-- Jeff


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha