2002-11-18 14:58:09

by Rashmi Agrawal

Subject: Failover in NFS

Hi All,

I might be very wrong, but I am trying to do the following:

1. I have a 4-node cluster running NFSv3 on all nodes, with the server
running on one of the 2 nodes connected to shared storage, and the 2
other nodes acting as clients.
2. If the NFS server node crashes, I need to fail over to another node,
where I need access to the lock state of the previous server, and I
need to tell the clients that the IP address of the NFS server node
has changed. Is this possible, and what can be done to implement it?

Another scenario:
1. When one of the clients crashes, it has to fail over to another
client. If the client crashed with some locks held, how is the newly
started client going to grab the locks again when they have not been
released? The server might already be taking care of this, but I am
not sure.

I hope I can use NFS for this kind of setup; if not, what will it take
to build this kind of setup using NFS?

Regards
Rashmi



2002-11-18 15:37:19

by Ragnar Kjørstad

Subject: Re: Failover in NFS

On Mon, Nov 18, 2002 at 08:34:55PM +0530, Rashmi Agrawal wrote:
> 1. I have a 4 node cluster and nfsv3 in all the nodes of cluster with
> server running in one
> of the 2 nodes connected to shared storage and 2 other nodes are acting
> as clients.
> 2. If nfs server node crashes, I need to failover to another node
> wherein I need to have access
> to the lock state of the previous server and I need to tell the clients
> that the IP address of the
> nfs server node has changed. IS IT POSSIBLE or what can be done to
> implement it?

No, you need to move the IP address from the old NFS server to the new
one. To the clients it will then look like a regular reboot. (Check out
heartbeat, at http://www.linux-ha.org/)

You need to make sure that NFS is using the shared IP (the one you move
around) rather than the fixed IP. (I assume you will have a fixed IP on
each host in addition to the one you move around.) Also, you need to put
/var/lib/nfs on shared storage. See the archive for more details.
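
As a concrete illustration, the pieces above (a floating IP, shared storage holding the exports and /var/lib/nfs, NFS restarted on the takeover node) map onto a single heartbeat haresources line. This is only a sketch with made-up node, device, and address names; nfslock/nfs stand in for whatever your distribution's NFS init scripts are called:

```
# /etc/ha.d/haresources -- node1 normally owns the floating IP, the
# shared disk mounted at /export (which also holds /var/lib/nfs), and
# the NFS services; heartbeat moves all of them to node2 on failure.
node1 IPaddr::192.168.1.50 Filesystem::/dev/sdb1::/export::ext3 nfslock nfs
```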



--
Ragnar Kjørstad
Big Storage

2002-11-18 22:05:36

by Jesse Pollard

Subject: Re: Failover in NFS

On Monday 18 November 2002 09:44 am, Ragnar Kjørstad wrote:
> On Mon, Nov 18, 2002 at 08:34:55PM +0530, Rashmi Agrawal wrote:
> > 1. I have a 4 node cluster and nfsv3 in all the nodes of cluster with
> > server running in one
> > of the 2 nodes connected to shared storage and 2 other nodes are acting
> > as clients.
> > 2. If nfs server node crashes, I need to failover to another node
> > wherein I need to have access
> > to the lock state of the previous server and I need to tell the clients
> > that the IP address of the
> > nfs server node has changed. IS IT POSSIBLE or what can be done to
> > implement it?
>
> No, you need to move the IP-address from the old nfs-server to the new
> one. Then to the clients it will look like a regular reboot. (Check out
> heartbeat, at http://www.linux-ha.org/)
>
> You need to make sure that NFS is using the shared ip (the one you move
> around) rather than the fixed ip. (I assume you will have a fixed ip on
> each host in addition to the one you move around). Also, you need to put
> /var/lib/nfs on shared storage. See the archive for more details.

It would actually be better to use two floating IP numbers. That way during
normal operation, both servers would be functioning simultaneously
(based on the shared storage on two nodes).

Then during failover, the floating IP of the failed node is activated on the
remaining node (total of 3 IP numbers now, one real, two floating). The NFS
recovery cycle should then cause the clients to remount the filesystem from
the backup server.

When the failed node is recovered, the active server should then disable the
floating IP associated with the recovered server, causing only the mounts
using that IP number to fall back to the proper node, balancing the load
again.
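
The takeover and failback Jesse describes could be sketched as follows on the surviving node. This is illustrative only: the interface, netmask, and addresses are made up, and a real script would also send a gratuitous ARP (e.g. heartbeat's send_arp) after raising the alias.

```shell
# Failover: raise the failed peer's floating IP as an alias on the
# surviving server. Clients of 10.0.0.22 retry and recover against us.
ifconfig eth0:1 10.0.0.22 netmask 255.255.255.0 up

# Failback: once the peer is healthy again, drop the alias here; only
# the mounts using 10.0.0.22 go through a recovery cycle back to the
# recovered node, restoring the load balance.
ifconfig eth0:1 down
```
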
--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2002-11-18 22:28:48

by Jan Niehusmann

Subject: Re: Failover in NFS

On Mon, Nov 18, 2002 at 08:34:55PM +0530, Rashmi Agrawal wrote:
> 2. If nfs server node crashes, I need to failover to another node
> wherein I need to have access
> to the lock state of the previous server and I need to tell the clients
> that the IP address of the
> nfs server node has changed. IS IT POSSIBLE or what can be done to
> implement it?

Have a look at drbd, http://www.complang.tuwien.ac.at/reisner/drbd/.
Using that, together with heartbeat, you can build a nice failover NFS
server.

Jan

2002-11-18 22:34:59

by Jesse Pollard

Subject: Re: Failover in NFS

On Monday 18 November 2002 04:22 pm, Ragnar Kjørstad wrote:
> On Mon, Nov 18, 2002 at 04:11:06PM -0600, Jesse Pollard wrote:
> > > No, you need to move the IP-address from the old nfs-server to the new
> > > one. Then to the clients it will look like a regular reboot. (Check out
> > > heartbeat, at http://www.linux-ha.org/)
> > >
> > > You need to make sure that NFS is using the shared ip (the one you move
> > > around) rather than the fixed ip. (I assume you will have a fixed ip on
> > > each host in addition to the one you move around). Also, you need to
> > > put /var/lib/nfs on shared storage. See the archive for more details.
> >
> > It would actually be better to use two floating IP numbers. That way
> > during normal operation, both servers would be functioning simultaneously
> > (based on the shared storage on two nodes).
> >
> > Then during failover, the floating IP of the failed node is activated on
> > the remaining node (total of 3 IP numbers now, one real, two floating).
> > The NFS recovery cycle should then cause the clients to remount the
> > filesystem from the backup server.
>
> Yes, that would be better.
>
> But it would not work as described above. There are some important
> limitations here:
>
> - I assumed that /var/lib/nfs is shared. If you want two servers to
> be active at once you need a different way to share lock-data.
>
> - AFAIK there is no way for statd to service 2 IP's at once.
> It will (AFAIK) bind to both addresses, but the problem is the
> message that is sent out at startup and includes the ip of
> the local host.
>
> Neither limitation is a law of nature. They can be fixed. I think there
> is work going on to change the way locks are stored, and I'm sure the
> second problem can be solved as well.

Actually, I was thinking that each server served a different mountpoint
instead of both providing the same one.

I'm not sure how the locks currently would be provided unless the
distributed lock from the shared storage interacts with each server's
statd properly. Otherwise you will already have problems.

Second, I thought that statd didn't care about the lock requests coming
from two IP numbers. This should be no different from having two network
interfaces attached to one server (and that works under Solaris). The
client should be using the name from the IP number, not the router used
between the client and server. I view the floating IP as existing behind
a router using the real IP. Since none of the clients are using the real
IP, the naming should remain consistent (I think).

> There may be solutions out there already. E.g. maybe Lifekeeper or
> Convolo include better support for this?

I don't know.
--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2002-11-18 22:26:50

by Ragnar Kjørstad

Subject: Re: Failover in NFS

On Mon, Nov 18, 2002 at 04:11:06PM -0600, Jesse Pollard wrote:
> > No, you need to move the IP-address from the old nfs-server to the new
> > one. Then to the clients it will look like a regular reboot. (Check out
> > heartbeat, at http://www.linux-ha.org/)
> >
> > You need to make sure that NFS is using the shared ip (the one you move
> > around) rather than the fixed ip. (I assume you will have a fixed ip on
> > each host in addition to the one you move around). Also, you need to put
> > /var/lib/nfs on shared storage. See the archive for more details.
>
> It would actually be better to use two floating IP numbers. That way during
> normal operation, both servers would be functioning simultaneously
> (based on the shared storage on two nodes).
>
> Then during failover, the floating IP of the failed node is activated on the
> remaining node (total of 3 IP numbers now, one real, two floating). The NFS
> recovery cycle should then cause the clients to remount the filesystem from
> the backup server.

Yes, that would be better.

But it would not work as described above. There are some important
limitations here:

- I assumed that /var/lib/nfs is shared. If you want two servers to
be active at once you need a different way to share lock-data.

- AFAIK there is no way for statd to service 2 IPs at once.
It will (AFAIK) bind to both addresses, but the problem is the
message that is sent out at startup, which includes the IP of
the local host.


Neither limitation is a law of nature. They can be fixed. I think there
is work going on to change the way locks are stored, and I'm sure the
second problem can be solved as well.

There may be solutions out there already. E.g. maybe Lifekeeper or
Convolo include better support for this?



--
Ragnar Kjørstad
Big Storage

2002-11-18 22:44:29

by Ragnar Kjørstad

Subject: Re: Failover in NFS

On Mon, Nov 18, 2002 at 04:41:37PM -0600, Jesse Pollard wrote:
> Actually, I was thinking that each server served a different mountpoint
> instead of both providing the same one.

I know.

> I'm not sure how the locks currently would be provided unless the
> distributed lock from the shared storage interacts with each servers statd
> properly. Otherwise you will already have problems.

"The distributed lock"? Are you talking about scsi-level locks?
No, there is no link between the locks on the lower levels and NFS.

> Second, I thought that statd didn't care about the lock requests coming
> from two IP numbers. This should be no different than having two network
> interfaces attached to one server (and that works under Solaris). The
> client should be using the name from the IP number, not the router used
> between the client and server. I view the floating IP as existing behind
> a router using the real IP. Since none of the clients are using the real
> IP, the naming should remain consistant (I think).

Yes, it's similar to having two network interfaces on one server. If it
works on Solaris then clearly it can be made to work on Linux as well.

Older versions of nfs-utils used only the IP from
gethostbyname(gethostname()); clearly that didn't work for setups like
this.

I wrote a patch that made it possible to change the IP address to a
"service IP". That allowed us to do failover as described in an
earlier mail.

That feature has since been extended and modified by others. It is
possible that it now allows multiple IP addresses. If that's the case,
then half the problem is solved.

The other half remains, though.
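
If the extended feature does support it, telling statd which name to present could look something like this; rpc.statd's -n option sets the name it uses as the local hostname, and "nfsserv" here is a hypothetical hostname bound to the floating service IP:

```shell
# Present the floating service name, not the node's own hostname, so
# clients re-establish locks against the service IP after a takeover.
# "nfsserv" is a made-up name for the floating address.
rpc.statd -n nfsserv
```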


--
Ragnar Kjørstad
Big Storage

2002-11-19 01:29:56

by Michael Clark

Subject: Re: Failover in NFS

On 11/19/02 06:22, Ragnar Kjørstad wrote:

> But it would not work as described above. There are some important
> limitations here:
>
> - I assumed that /var/lib/nfs is shared. If you want two servers to
> be active at once you need a different way to share lock-data.

I'm looking at this problem right now. Basically, to support multiple
virtual NFS servers with failover support, lockd could be modified to
communicate with the local statd using the destination IP used in the
locking operation - then statd can be modified to look at the peer
address (which is normally 127.0.0.1) to find out which NFS virtual
server the monitor request is for. This would also allow you to run a
statd per virtual NFS server (bound to the specific address instead of
INADDR_ANY).

> - AFAIK there is no way for statd to service 2 IP's at once.
> It will (AFAIK) bind to both addresses, but the problem is the
> message that is sent out at startup and includes the ip of
> the local host.

The nfs-utils in CVS has an undocumented notify-only mode, and the
hostname used in the reboot notify message can also be overridden
on the command line.

So during a takeover for a virtual NFS server, the taking-over node
(which already has its statd running) can run another copy of statd in
notify-only mode to send out the reboot messages using the name
of the virtual host.

This all assumes /var/lib/nfs/sm can be synchronised between the
hosts, either with something like GFS or possibly a modified statd
that communicates monitor requests to its cluster peers.

I have just written some code to sync the /var/lib/nfs/sm directories.
It is a little daemon that uses dnotify to get realtime directory
update notifications; it then sends UDP messages to its cluster peers
to keep /var/lib/nfs/sm in sync.
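
Michael's daemon is event-driven via dnotify and pushes UDP messages; purely for illustration, the same sm-directory mirroring can be sketched as a crude polling function in shell. The paths here are stand-ins (a real setup would write to the peer over the network, not to a local directory):

```shell
# Mirror statd's monitor directory to a peer's copy of it (sketch).
# SM_DIR would really be /var/lib/nfs/sm; PEER_SM_DIR stands in for
# however the peer's copy is reached (rsync target, shared fs, ...).
SM_DIR=${SM_DIR:-/tmp/sm-local}
PEER_SM_DIR=${PEER_SM_DIR:-/tmp/sm-peer}

sync_sm() {
    mkdir -p "$SM_DIR" "$PEER_SM_DIR"
    # Push new monitor entries (one file per monitored host) to the peer
    for f in "$SM_DIR"/*; do
        if [ -f "$f" ]; then cp -p "$f" "$PEER_SM_DIR/"; fi
    done
    # Propagate unmonitors: drop peer entries that vanished locally
    for f in "$PEER_SM_DIR"/*; do
        if [ -f "$f" ] && [ ! -f "$SM_DIR/${f##*/}" ]; then
            rm -f "$f"
        fi
    done
}

# A real daemon would block on dnotify events instead of polling:
# while :; do sync_sm; sleep 2; done
```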

The problem is that until statd is virtual-IP aware, if a host
is connected to 2 of the virtual NFS servers and sends an unmonitor
for one, both will get unmonitored.

I'm thinking that instead of statd just storing the monitor IP
address in /var/lib/nfs/sm/1.2.3.4, with the lockd changes
it could also store the peer address (virtual NFS server), i.e.
/var/lib/nfs/sm/5.6.7.8:1.2.3.4 (this would be compatible with an
old lockd, as the monitor would be stored as
/var/lib/nfs/sm/127.0.0.1:1.2.3.4).

> Neither limitation is a law of nature. They can be fixed. I think there
> is work going on to change the way locks are stored, and I'm sure the
> second problem can be solved as well.

Yes, I hope so.

~mc

2002-11-19 05:01:08

by Rashmi Agrawal

Subject: Re: Failover in NFS

Michael Clark wrote:

> On 11/19/02 06:22, Ragnar Kjørstad wrote:
>
> > But it would not work as described above. There are some important
> > limitations here:
> >
> > - I assumed that /var/lib/nfs is shared. If you want two servers to
> > be active at once you need a different way to share lock-data.
>
> I'm looking at this problem right now. Basically to support multiple
> virtual NFS servers with failover support, lockd could be modified to
> communicate with the local statd using the dest IP used in the locking
> operation - then statd can modified to look at the peer address
> (which is normally 127.0.0.1) to find out which NFS virtual server
> the monitor request is for. This would also allow you to run a statd
> per virtual NFS server (bound to the specific address instead of
> IPADDR_ANY).

What is the plan for this implementation? Is it going to be part of the
mainline kernel? If yes, when will it be available, and if not, when
will it be available in the form of patches?

Should we expect NFS failover support in the Linux kernel soon?

Rashmi

2002-11-19 07:33:24

by Michael Clark

Subject: Re: Failover in NFS

On 11/19/02 13:07, Rashmi Agrawal wrote:
> Michael Clark wrote:
>
>
>>On 11/19/02 06:22, Ragnar Kjørstad wrote:
>>
>>
>>>But it would not work as described above. There are some important
>>>limitations here:
>>>
>>>- I assumed that /var/lib/nfs is shared. If you want two servers to
>>> be active at once you need a different way to share lock-data.
>>
>>I'm looking at this problem right now. Basically to support multiple
>>virtual NFS servers with failover support, lockd could be modified to
>>communicate with the local statd using the dest IP used in the locking
>>operation - then statd can modified to look at the peer address
>>(which is normally 127.0.0.1) to find out which NFS virtual server
>>the monitor request is for. This would also allow you to run a statd
>>per virtual NFS server (bound to the specific address instead of
>>IPADDR_ANY).
>
>
> What is the plan for this implementation? Is it going to be part of main line
> kernel.
> If yes then when will it be available and if no, when will it be available in
> the form of
> patches.

I still don't know how hard the lockd changes would be.
At the moment I'm just thinking about how it could be done.

> Should we expect NFs failover support in linux kernel soon??

Simple failover with a single NFS service is already supported.

As people have said - you just fail over the /var/lib/nfs directory
along with the exported partition, modify the rpc.statd script
to use the virtual node name, and add a notify call to the cluster
service start script, plus a few other little things.

One little trick is blocking the NFS port for the NFS IP alias
during failover, assuming you are using exportfs and exportfs -u
in your cluster service script to raise and lower the export
(to allow for unmounting of the failover partition, etc.).
Doing this is required to stop the takeover node from returning
stale NFS handle errors after the IP alias has been raised (and
gratuitously ARPed for) but before the newly mounted fs has been
exported.

I.e., in your boot scripts, disable access to NFS and create
a chain to add accept rules to later:

iptables -N nfs_allow
iptables -A INPUT -p udp -m udp --dport 2049 -j nfs_allow
iptables -A INPUT -p udp -m udp --dport 2049 -j DROP

and in the cluster service scripts I then do:

case "$1" in
'start')
exportfs -o rw,sync someclient.somedomain:/foo/bar
iptables -A nfs_allow --dest $nfs_ip_alias -j ACCEPT
rstatd -N -n $nfs_virtual_hostname
;;
'stop')
iptables -D nfs_allow --dest $nfs_ip_alias -j ACCEPT
exportfs -u someclient.somedomain:/foo/bar
;;

Multiple virtual NFS server failover is what doesn't
currently work right, due to the way statd works. This is what I'm
looking at at the moment - no timeline - no promises of patches -
just ideas.

I do have one patch to make the kernel RPC code reply using
the IP alias address instead of the base host IP (although it
is a bit of a hack, as it directly changes the address of the
nfsd server socket before replies - maybe introducing problems
under concurrency). I'm looking for a better solution to this.

This is needed for UDP NFS clients like MacOS X that do a connect
and thus refuse to see packets coming from the Linux NFS server
originating from a different address. It also solves problems with
accessing an IP-alias NFS server through NAT or a firewall.

~mc

2002-11-19 18:19:41

by Juan Gomez

Subject: Re: Failover in NFS






I do not think we need changes to lockd. Earlier this year I sent a patch
to Alan Cox that lets you control the server lockd's grace period
from userland, which helps you do the takeover from server to server. I
have also sent in patches, included in the latest version of nfs-utils,
that let you do takeover along with the lockd patch of the kernel,
using a shared /var/lib/nfs/sm directory. This is a good time to check
whether Alan has had time to include that patch.

Juan




2002-11-21 20:52:34

by Bill Davidsen

Subject: Re: Failover in NFS

On Mon, 18 Nov 2002, Jesse Pollard wrote:

> It would actually be better to use two floating IP numbers. That way during
> normal operation, both servers would be functioning simultaneously
> (based on the shared storage on two nodes).
>
> Then during failover, the floating IP of the failed node is activated on the
> remaining node (total of 3 IP numbers now, one real, two floating). The NFS
> recovery cycle should then cause the clients to remount the filesystem from
> the backup server.
>
> When the failed node is recovered, the active server should then disable the
> floating IP associated with the recovered server, causing only the mounts
> using that IP number to fall back to the proper node, balancing the load
> again.

That works for stateless connections, but for stateful connections like
POP, NNTP, SMTP, etc., you will lose all the connections currently
active.

A proper solution is to have the recovered server accept ESTABLISHED and
--syn packets, then DNAT the rest to the fallback server, while the
fallback server takes any new (--syn) packets and DNATs them to the
recovered server.

I'm not sure iptables can do this right; you probably need a program to
get the DNAT part just correct. There may be one of the experimental
patches which adds that capability, since people do load balancing with
Linux. It might take source routing, and it will certainly be harder than
just turning off the alias ;-)

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-11-21 22:46:23

by Jesse Pollard

Subject: Re: Failover in NFS

On Thursday 21 November 2002 02:58 pm, Bill Davidsen wrote:
> On Mon, 18 Nov 2002, Jesse Pollard wrote:
> > It would actually be better to use two floating IP numbers. That way
> > during normal operation, both servers would be functioning simultaneously
> > (based on the shared storage on two nodes).
> >
> > Then during failover, the floating IP of the failed node is activated on
> > the remaining node (total of 3 IP numbers now, one real, two floating).
> > The NFS recovery cycle should then cause the clients to remount the
> > filesystem from the backup server.
> >
> > When the failed node is recovered, the active server should then disable
> > the floating IP associated with the recovered server, causing only the
> > mounts using that IP number to fall back to the proper node, balancing
> > the load again.
>
> That works for stateless connections, but for stateful connections like
> POP, NNTP, SMTP, etc, you will lose all the connections currently
> actively.

Yes, that is the point. NFS v3/v4 CAN use TCP connections. The only way
I know to force them back to the recovered server IS to kill the connection.

> A proper solution is the have the recovered server accept ESTABLISHED and
> --syn packets, then DNAT the rest to the fallback server, while the
> fallback server takes and new (--syn) packets and does DNAT to the
> recovered server.

Ahhh, no, that doesn't work. The current connections have to be terminated,
since what you are describing sounds like a fallback to a fallback.

If you want something like this you have to perform load balancing at
a router (with NAT/DNAT) where the load balancing implementation is
independent of the host. Then each host in the cluster (since there may
be more than two) has to inform the router of the current load (say once
every 5, 10, or 15 seconds). If you are in a high availability configuration,
I would expect that there would need to be at least two load balancing
routers (a primary and backup). Then if a router fails, the higher up network
router would select an alternate path which would end up at the backup load
balancer.

TCP context would be saved in that situation. Even traffic loads could
be balanced between the two routers. This works because the "state"
information is only source/destination routes for packets, not TCP.

If a host node fails (not a router), then NEW connections can be redirected.
Unfortunately, the context of existing connections to the failed host is lost.

> I'm not sure iptables can do this right, you probably need a program to
> get the DNAT part just correct. There may be some one of the experimental
> patches which adds that capability, since people do load balancing with
> Linux. It might take source routing, and certainly will be harder than
> just turning off the alias ;-)

I don't think the host itself CAN do it, since you then get into the case
of a destination also being a router. It also means the load really doesn't
get balanced since the host must still carry the load of forwarding traffic
to the real server.

This can get really nasty in a cluster if it becomes necessary to reboot
various nodes. Suddenly the nodes start forwarding traffic around and
not doing the work.

If you are describing the use of a Linux router, however, you are back
in the second discussion with the load balancing router, and I think
(based on other discussion, not personal knowledge) iptables might
do it, with a little user land assist (the load balancing computations
could dynamically change the iptables entries for destinations).

This all started without a load-balancing router, as a question of how
to get NFS to switch servers. That, I think, is not that complicated,
other than the tricky IP enable/disable between the two servers; and it
does ASSUME (yes, I know - "ass out of you and me" :-) a stateless
communication protocol.

--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2002-11-22 07:01:03

by Rashmi Agrawal

Subject: Re: Failover in NFS

Hi all,

As an alternative to NFS, how about using OpenAFS (the Andrew File
System), which happens to provide the following:

1. Failover
2. Common namespace
3. Client caching and an efficient wide-area protocol for excellent
performance.

Or how about using Samba?

Any views on the pros and cons?

Regards
Rashmi
Ragnar Kjørstad wrote:

> On Mon, Nov 18, 2002 at 04:11:06PM -0600, Jesse Pollard wrote:
> > > No, you need to move the IP-address from the old nfs-server to the new
> > > one. Then to the clients it will look like a regular reboot. (Check out
> > > heartbeat, at http://www.linux-ha.org/)
> > >
> > > You need to make sure that NFS is using the shared ip (the one you move
> > > around) rather than the fixed ip. (I assume you will have a fixed ip on
> > > each host in addition to the one you move around). Also, you need to put
> > > /var/lib/nfs on shared storage. See the archive for more details.
> >
> > It would actually be better to use two floating IP numbers. That way during
> > normal operation, both servers would be functioning simultaneously
> > (based on the shared storage on two nodes).
> >
> > Then during failover, the floating IP of the failed node is activated on the
> > remaining node (total of 3 IP numbers now, one real, two floating). The NFS
> > recovery cycle should then cause the clients to remount the filesystem from
> > the backup server.
>
> Yes, that would be better.
>
> But it would not work as described above. There are some important
> limitations here:
>
> - I assumed that /var/lib/nfs is shared. If you want two servers to
> be active at once you need a different way to share lock-data.
>
> - AFAIK there is no way for statd to service 2 IP's at once.
> It will (AFAIK) bind to both addresses, but the problem is the
> message that is sent out at startup and includes the ip of
> the local host.
>
> Neither limitation is a law of nature. They can be fixed. I think there
> is work going on to change the way locks are stored, and I'm sure the
> second problem can be solved as well.
>
> There may be solutions out there already. E.g. maybe Lifekeeper or
> Convolo include better support for this?
>
> --
> Ragnar Kjørstad
> Big Storage

2002-11-22 18:07:54

by Gunther Mayer

Subject: Re: Failover in NFS

Jesse Pollard wrote:

>On Thursday 21 November 2002 02:58 pm, Bill Davidsen wrote:
>
>
>>On Mon, 18 Nov 2002, Jesse Pollard wrote:
>>
>>
>>>It would actually be better to use two floating IP numbers. That way
>>>during normal operation, both servers would be functioning simultaneously
>>>(based on the shared storage on two nodes).
>>>
>>>Then during failover, the floating IP of the failed node is activated on
>>>the remaining node (total of 3 IP numbers now, one real, two floating).
>>>The NFS recovery cycle should then cause the clients to remount the
>>>filesystem from the backup server.
>>>
>>>When the failed node is recovered, the active server should then disable
>>>the floating IP associated with the recovered server, causing only the
>>>mounts using that IP number to fall back to the proper node, balancing
>>>the load again.
>>>
>>>
>>That works for stateless connections, but for stateful connections like
>>POP, NNTP, SMTP, etc, you will lose all the connections currently
>>actively.
>>
>>
>
>yes. That is the point. NFS v3/4 CAN use TCP connections. The only way
>I know to force them back to the recovered server IS to kill the connection.
>
NFS over TCP does work very well for such failover configurations with a
virtual IP address.

To the NFS client, a failover is indistinguishable from a server
crash+reboot, which is guaranteed to work by the NFS standard
definition.