From: Wendy Cheng <wcheng@redhat.com>
Subject: Re: [PATCH 1/2] NLM failover unlock commands
Date: Thu, 17 Jan 2008 10:48:56 -0500
Message-ID: <478F78E8.40601@redhat.com>
References: <20080108170220.GA21401@infradead.org>
	<20080108174958.GA25025@infradead.org>
	<4783E3C9.3040803@redhat.com>
	<20080109180214.GA31071@infradead.org>
	<20080110075959.GA9623@infradead.org> <4788665B.4020405@redhat.com>
	<18315.62909.330258.83038@notabene.brown>
	<478D14C5.1000804@redhat.com>
	<18317.7319.443532.62244@notabene.brown>
	<478D3820.9080402@redhat.com> <20080117151007.GB16581@fieldses.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Cc: Neil Brown <neilb@suse.de>,
	Christoph Hellwig <hch@infradead.org>,
	NFS list <linux-nfs@vger.kernel.org>, cluster-devel@redhat.com
To: "J. Bruce Fields" <bfields@fieldses.org>
In-Reply-To: <20080117151007.GB16581@fieldses.org>
Sender: cluster-devel-bounces@redhat.com
Errors-To: cluster-devel-bounces@redhat.com

J. Bruce Fields wrote:
> Remind me: why do we need both per-ip and per-filesystem methods?  In
> practice, I assume that we'll always do *both*?
>   

Failover is normally done via a virtual IP address, so the per-ip 
method should be the core routine. However, for a non-cluster 
filesystem such as ext3/4, changing servers also implies a umount. If 
some clients do not follow the rule and obtain locks via a different IP 
interface, the umount would fail, which ends up aborting the failover 
process. That is where we need the per-filesystem method.

ServerA:
1. Tear down the IP address
2. Unexport the path
3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
4. If unmount is required, write the path name to
   /proc/fs/nfsd/unlock_filesystem, then unmount.
5. Signal peer to begin take-over.
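
The ServerA sequence above could be scripted roughly as follows. This is 
only a sketch, not part of the patch: the function name, its parameters 
and the four callbacks (tear_down_ip, unexport, unmount, signal_peer) are 
hypothetical placeholders for whatever the cluster manager already uses, 
and the trailing newline on each write is an assumption that mirrors what 
"echo" would send. Only the two file names come from the steps above.

```python
from pathlib import Path
from typing import Callable

# Directory holding the nfsd control files named in steps 3 and 4.
NFSD_DIR = Path("/proc/fs/nfsd")


def write_control(nfsd_dir: Path, name: str, value: str) -> None:
    """Write one value to an nfsd control file.

    The trailing newline is an assumption (it is what `echo` would send).
    """
    with open(nfsd_dir / name, "w") as f:
        f.write(value + "\n")


def fail_over_from_this_server(
    virtual_ip: str,
    export_path: str,
    *,
    needs_unmount: bool,
    tear_down_ip: Callable[[str], None],   # hypothetical: step 1
    unexport: Callable[[str], None],       # hypothetical: step 2
    unmount: Callable[[str], None],        # hypothetical: second half of step 4
    signal_peer: Callable[[], None],       # hypothetical: step 5
    nfsd_dir: Path = NFSD_DIR,
) -> None:
    """Run the ServerA side of the failover in the order listed above."""
    tear_down_ip(virtual_ip)                               # 1. tear down the IP
    unexport(export_path)                                  # 2. unexport the path
    write_control(nfsd_dir, "unlock_ip", virtual_ip)       # 3. drop locks taken via the IP
    if needs_unmount:                                      # 4. non-cluster fs only
        # Catch locks that clients obtained through some other interface,
        # so that the umount does not fail.
        write_control(nfsd_dir, "unlock_filesystem", export_path)
        unmount(export_path)
    signal_peer()                                          # 5. peer begins take-over
```

For a cluster filesystem that stays mounted, the caller would pass 
needs_unmount=False and only the per-ip write happens.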

Some time ago we looked at "export name" as the core method (so the 
per-filesystem method would be a subset of that). Unfortunately, the 
prototype efforts showed the code would be too intrusive (if a 
filesystem sub-tree is exported).

> We're migrating clients by moving a server ip address from one node to
> another.  And I assume we're permitting at most one node to export each
> filesystem at a time.  So it *should* be the case that the set of locks
> held on the filesystem(s) that are moving are the same as the set of
> locks held by the virtual ip that is moving.
>   

This is true for a non-cluster filesystem, but a cluster filesystem can 
be exported from multiple servers.

> But presumably in some scenarios clients can get confused, and we need
> to ensure that stale locks are not left behind?
>   

Yes.

> We've discussed this before, but we should get the answer into comments
> in the code (or on the patches).
>
>   
OK, working on it. Or should we add something to linux/Documentation 
to describe the overall logic?

-- Wendy