2006-07-20 06:09:49

by Rajat Upadhyaya

Subject: NFSv4 Active-Active Cluster Queries

Hi,

I have been working on setting up NFSv4 in cluster scenario. But when
it comes to an Active-Active setup, it seems there are a few problems.

In Active-Passive case, the /var/lib/nfs directory itself is moved to
shared storage & on the nodes in the cluster, symlinks are created
pointing to the location on shared storage. This isn't possible in case
of Active-Active setup. In this case, we will need to use scripts to
merge the information in /var/lib/nfs from the failed node with that on
the running one and then do a re-export.
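
As a rough illustration of the Active-Passive arrangement described above,
the following Python sketch moves a state directory onto shared storage and
leaves a symlink in its place. The function name link_state_to_shared and the
".local" backup suffix are hypothetical, chosen only for this example; a real
setup would presumably stop the NFS services first and let the cluster
manager handle mounting the shared storage.

```python
import os
import shutil


def link_state_to_shared(state_dir, shared_dir):
    """Point state_dir at shared_dir, seeding shared_dir on first use.

    state_dir  -- local path such as /var/lib/nfs
    shared_dir -- location on shared storage (hypothetical path)
    """
    if os.path.islink(state_dir):
        # A link is already in place: drop it and recreate it below.
        os.unlink(state_dir)
    elif os.path.isdir(state_dir):
        if os.path.exists(shared_dir):
            # Another node already populated the shared copy, so keep
            # the local data as a backup instead of overwriting it.
            os.rename(state_dir, state_dir + ".local")
        else:
            # First node: move the local state onto shared storage.
            shutil.move(state_dir, shared_dir)
    os.symlink(shared_dir, state_dir)
```

Each node would call this once with the same shared_dir, for example
link_state_to_shared("/var/lib/nfs", "/shared/nfs"), where "/shared/nfs" is
an assumed mount point.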

1) From the High Availability wiki (http://wiki.linux-ha.org), I got to
this page -
http://chilli.linuxmds.com/~mschilli/NFS/active-active-nfs.html where
a method of setting up Active-Active NFS failover was described.
Basically a set of scripts is used to duplicate the rmtab entries on the
node that takes over the process during failover. Will this method hold
in case of NFSv4 too? What other data would need to be duplicated?

2) Assuming there are 2 nodes in a cluster, a client mounts different
exports from both the nodes. Then if one node goes down, the other node
takes over the failed node's IP & re-exports its exports. In this case,
if (1) holds, what happens to the rmtab entry? Is it possible to merge
the 2 entries?

3) Locking issues - revocation & reacquiring of locks.

4) I can see that there is some HA callout code in nfs-utils (under
support). Is this applicable for NFSv4 also?
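
To make the rmtab merge asked about in (1) and (2) concrete, here is a
minimal Python sketch. It assumes each rmtab line has the form
host:path:0xCOUNT (a hexadecimal use count), treats a missing count as 1,
and sums the counts when the same host:path pair appears on both nodes.
Those choices, and the function names parse_rmtab and merge_rmtab, are
assumptions made for illustration and are not taken from nfs-utils or from
the scripts on the page linked above.

```python
def parse_rmtab(text):
    """Return {(host, path): count} from rmtab-style text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        host, rest = line.split(":", 1)
        path, sep, count = rest.rpartition(":")
        if not sep:
            # No count field present: assume a single mount.
            path, count = rest, "1"
        key = (host, path)
        entries[key] = entries.get(key, 0) + int(count, 16)
    return entries


def merge_rmtab(local_text, failed_text):
    """Merge the failed node's entries into the surviving node's."""
    merged = parse_rmtab(local_text)
    for key, count in parse_rmtab(failed_text).items():
        merged[key] = merged.get(key, 0) + count
    return "".join(
        f"{host}:{path}:0x{count:08x}\n"
        for (host, path), count in sorted(merged.items())
    )
```

In the two-node case from (2), where a client mounts different exports from
each node, the paths differ and the merge is a plain union of the two files.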

Is my understanding here correct? If so, are there any
work-arounds/solutions for the issues? Also, are there any other
problems which I have missed out?

Regards,
Rajat

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-07-26 08:58:31

by Yatin Manerker

Subject: Re: NFSv4 Active-Active Cluster Queries

> Yes. Even active-passive failover is problematic right now, mainly
> because the method we use for storing reboot recovery information
> (analogous to the state recorded by statd) is still in flux.
>
> We're aware of these problems and working on solving them. There are
> some rough (possibly out of date) notes here:
>
> http://wiki.linux-nfs.org/index.php/Cluster_Coherent_NFS_design
>
> but those are more of interest to someone working on design of a
> solution than to someone trying to set up and use any of this (which we
> don't recommend doing yet).
>
> --b.

Hi,

We have tested NFSv4 in Active/Passive HA clusters. I would like to know
whether we have missed any test scenarios. I did not find any issues in
the scenarios listed below.

Here are the test results:

Server: SUSE Linux Enterprise Server 10
Build: RC3
Kernel: 2.6.16.21
Server hardware: 32-bit, 2P P4, 2 GB RAM
HA clusters: Active/Passive
Client hardware: 1P P3, 512 MB RAM

The following scenarios were tested and passed for Active/Passive HA
clusters (iSCSI):

1. Client connection during failover and failback. Tested with 40
   clients.
2. Large file transfer from 4 clients, with a single 8 GB file from each
   client. The test was performed during failover and failback.
3. File listing from 40 clients (a single directory containing 0.9
   million files).
4. Recovery test: physical link breakage and power-down recovery while
   the client is running the Connectathon tests on an NFS-mounted volume.
5. Backward compatibility: NFSv3/v4 client connecting to NFSv4/v3 server.
6. Interop: Novell iFolder client data on an NFSv4 mount point. An NFSv4
   client mounts the NFSv4-exported file system and the iFolder client
   stores data on the remote NFS file system.
7. iozone and Connectathon tests executed for an NFSv4 client in the
   cluster environment.


The following scenarios were tested and passed for Active/Passive HA
clusters (DRBD), with a single client:

1. Client connection during failover and failback (v3 and v4 clients).
2. Directory listing (v3 and v4 clients).
3. Connectathon tests (lock testing is done as part of this suite).


Regards,
Yatin

2006-07-20 07:12:25

by NeilBrown

Subject: Re: NFSv4 Active-Active Cluster Queries

On Thursday July 20, [email protected] wrote:
> Hi,
>
> I have been working on setting up NFSv4 in cluster scenario. But when
> it comes to an Active-Active setup, it seems there are a few problems.
>
> In Active-Passive case, the /var/lib/nfs directory itself is moved to
> shared storage & on the nodes in the cluster, symlinks are created
> pointing to the location on shared storage. This isn't possible in case
> of Active-Active setup. In this case, we will need to use scripts to
> merge the information in /var/lib/nfs from the failed node with that on
> the running one and then do a re-export.
>
> 1) From the High Availability wiki (http://wiki.linux-ha.org), I got to
> this page -
> http://chilli.linuxmds.com/~mschilli/NFS/active-active-nfs.html where
> a method of setting up Active-Active NFS failover was described.
> Basically a set of scripts is used to duplicate the rmtab entries on the
> node that takes over the process during failover. Will this method hold
> in case of NFSv4 too? What other data would need to be duplicated?

rmtab is simply not an issue - you can ignore it.
Provided /proc/fs/nfsd is mounted, the contents of rmtab are ignored.
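
A quick way to check that condition is to look for the nfsd filesystem in
the mount table. The Python sketch below parses text in /proc/mounts format;
the function name nfsd_fs_mounted is hypothetical, and the sketch assumes
the filesystem type is reported as "nfsd".

```python
def nfsd_fs_mounted(mounts_text, mount_point="/proc/fs/nfsd"):
    """Return True if an nfsd filesystem is mounted at mount_point.

    mounts_text is the content of /proc/mounts, whose fields are:
    device, mount point, filesystem type, options, dump, pass.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point and fields[2] == "nfsd":
            return True
    return False
```

On a running server this could be called as
nfsd_fs_mounted(open("/proc/mounts").read()).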

What is important is locking state. For NFSv4, that means the
contents of /var/lib/nfs/v4recovery. I don't know exactly how this is
handled.

>
> 3) Locking issues - revocation & reacquiring of locks.

Wendy Cheng <[email protected]> has been looking at what kernel/utils
changes are needed to support this fully, though I think she is only
looking at v3.

>
> 4) I can see that there is some HA callout code in nfs-utils (under
> support). Is this applicable for NFSv4 also?

No. Currently all v4 state is handled by the kernel, though that may change.

NeilBrown

2006-07-20 13:52:19

by J. Bruce Fields

Subject: Re: NFSv4 Active-Active Cluster Queries

On Thu, Jul 20, 2006 at 12:11:26AM -0600, Rajat Upadhyaya wrote:
> I have been working on setting up NFSv4 in cluster scenario. But when
> it comes to an Active-Active setup, it seems there are a few problems.

Yes. Even active-passive failover is problematic right now, mainly
because the method we use for storing reboot recovery information
(analogous to the state recorded by statd) is still in flux.

We're aware of these problems and working on solving them. There are
some rough (possibly out of date) notes here:

http://wiki.linux-nfs.org/index.php/Cluster_Coherent_NFS_design

but those are more of interest to someone working on design of a
solution than to someone trying to set up and use any of this (which we
don't recommend doing yet).

--b.


2006-07-21 17:27:24

by Wendy Cheng

Subject: Re: NFSv4 Active-Active Cluster Queries

Neil Brown wrote:

>On Thursday July 20, [email protected] wrote:
>
>
>>Hi,
>>
>>I have been working on setting up NFSv4 in cluster scenario. But when
>>it comes to an Active-Active setup, it seems there are a few problems.
>>
>>
>>
>>3) Locking issues - revocation & reacquiring of locks.
>>
>>
>
>Wendy Cheng <[email protected]> has been looking at what kernel/utils
>changes are needed to support this fully, though I think she is only
>looking at v3.
>
>
>
The v3 patches are currently undergoing functional verification tests with
the RHCS cluster suite; so far so good
(http://people.redhat.com/wcheng/Patches/NLM/). After documenting the
procedure and restrictions, the formal submission (to the nfs list) is
scheduled for the end of this month.


Not sure what to do with V4 yet .. need to chat with CITI folks first.


-- Wendy


