2002-05-31 20:19:32

by Bruce Allan

Subject: nfs-utils patch to statd for multi-homed hosts

The attached patch provides the capability for statd to work with
multi-homed hosts (i.e. NFS servers and clients with more than one IP
interface each), which until now has been listed as a "FIXME" comment in
utils/statd/rmtcall.c.

The theory is that when a server or client crashes, reboots, and attempts to
notify the other of its status change, if either host's interface on which
the connection was originally made is no longer functional, statd will
attempt to use alternate interfaces if they exist. This is made possible
by changing the addr member of the notify_list structure to a list of
addresses equivalent to that returned by a gethostbyname() call. A failed
attempt at notifying an address removes that address from the list.
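
For illustration only, the fallback described above could be sketched
roughly as follows. This is Python rather than statd's C, and it is not
the attached patch: NotifyEntry, notify_first_reachable() and the send
callback are hypothetical names, and timeouts and retries are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class NotifyEntry:
    """Hypothetical stand-in for one notify_list element."""
    name: str
    # Candidate addresses, in the order a host lookup returned them.
    addrs: List[str] = field(default_factory=list)


def notify_first_reachable(entry: NotifyEntry,
                           send: Callable[[str], bool]) -> Optional[str]:
    """Try each address in turn and drop the ones that fail.

    `send` is an assumed callback that returns True when the
    notification to that address succeeded. Returns the address that
    worked, or None once the list is exhausted.
    """
    while entry.addrs:
        addr = entry.addrs[0]
        if send(addr):
            return addr
        # Failed attempt: remove this address from the list.
        entry.addrs.pop(0)
    return None
```

With three candidate addresses of which only the second answers, the
first is dropped from the list and the second is returned.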

Sorry this is an attachment; the mail program I use munges tabs into
spaces and renders the patch almost useless (changing mailers is on my todo
list). The patch was created against nfs-utils-1-0-1-pre7. Comments,
etc. welcome!

(See attached file: patch)

Regards,
---
Bruce Allan <[email protected]>
Software Engineer, Linux Technology Center
IBM Corporation, Beaverton OR
503-578-4187 IBM Tie-line 775-4187


Attachments:
patch (8.16 kB)

2002-06-01 10:27:35

by Ragnar Kjørstad

Subject: Re: nfs-utils patch to statd for multi-homed hosts

On Fri, May 31, 2002 at 01:18:27PM -0700, Bruce Allan wrote:
> The attached patch provides the capability for statd to work with
> multi-homed hosts (i.e. NFS servers and clients with more than one IP
> interface each), which until now has been listed as a "FIXME" comment in
> utils/statd/rmtcall.c.
Great! :-)

> The theory is that when a server or client crashes, reboots, and attempts to
> notify the other of its status change, if either host's interface on which
> the connection was originally made is no longer functional, statd will
> attempt to use alternate interfaces if they exist. This is made possible
> by changing the addr member of the notify_list structure to a list of
> addresses equivalent to that returned by a gethostbyname() call. A failed
> attempt at notifying an address removes that address from the list.

Are you saying notifications used to be sent out on the interface where
the connection happened? I thought it only happened on the primary
interface....

Does the new code work with aliases as well as physical interfaces?

How does this interact with the "--name" switch?

I believe --name originally replaced the gethostbyname() name with the
user-supplied one. Now that there are multiple interfaces to worry about,
the old approach no longer works well.
I see a few different alternatives for fixing that:
1. Remove "--name". It was mostly written to fix the exact same problem
   that your patch fixes in a better way. Unfortunately this will not
   solve the problem of multiple names mapping to the same IP.
2. Change "--name" so that it _adds_ the hostname to the list of
   names to use in the notification message, rather than replacing
   the existing one(s). I suppose it should also be changed to accept
   multiple hostnames.
3. Could we possibly send the IP to the clients rather than the
   hostnames? That would eliminate the problem of multiple names
   matching the same hostname.


I suppose I should go read the spec and find out how this really works
before making comments, but I don't have the time just now, so take the
comments with a grain of salt.



-- 
Ragnar Kjørstad
Big Storage

_______________________________________________________________

Don't miss the 2002 Sprint PCS Application Developer's Conference
August 25-28 in Las Vegas -- http://devcon.sprintpcs.com/adp/index.cfm

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-06-06 22:14:59

by Bruce Allan

Subject: Re: nfs-utils patch to statd for multi-homed hosts


See my responses below...

Regards,
---
Bruce Allan <[email protected]>
Software Engineer, Linux Technology Center
IBM Corporation, Beaverton OR
503-578-4187 IBM Tie-line 775-4187


From: Ragnar Kjørstad
To: Bruce Allan/Beaverton/IBM@IBMUS
Date: 06/01/2002 03:27 AM
Subject: Re: [NFS] nfs-utils patch to statd for multi-homed hosts



> On Fri, May 31, 2002 at 01:18:27PM -0700, Bruce Allan wrote:
> > The attached patch provides the capability for statd to work with
> > multi-homed hosts (i.e. NFS servers and clients with more than one IP
> > interface each), which until now has been listed as a "FIXME" comment in
> > utils/statd/rmtcall.c.
> Great! :-)

> > The theory is that when a server or client crashes, reboots, and attempts to
> > notify the other of its status change, if either host's interface on which
> > the connection was originally made is no longer functional, statd will
> > attempt to use alternate interfaces if they exist. This is made possible
> > by changing the addr member of the notify_list structure to a list of
> > addresses equivalent to that returned by a gethostbyname() call. A failed
> > attempt at notifying an address removes that address from the list.

> Are you saying notifications used to be sent out on the interface where
> the connection happened? I thought it only happened on the primary
> interface....

That's not what I was trying to say, and you are correct: notifications
were/are sent out on the primary interface. But if the primary interface
for some reason stops working on either the client or server during the
change in state, there is a problem. The most likely case of this would be
when the server crashes and its primary interface is hosed upon reboot.

> Does the new code work with aliases as well as physical interfaces?

Not really; it starts with the IP address as provided on stable storage
(i.e. /var/lib/nfs/sm/*) and does a reverse lookup followed by a forward
lookup and tries to notify each of the addresses in the h_addr_list in
succession until one is valid. The new code doesn't check for aliases.
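
As a rough illustration (Python instead of statd's C, and not the patch
code itself), the reverse-then-forward lookup could look like the sketch
below. candidate_addresses() is a hypothetical name, the resolvers are
parameters so the logic can be exercised without DNS, and falling back to
the stored address when a lookup fails is my own assumption here.

```python
import socket
from typing import Callable, List, Tuple

# Shape returned by both resolver calls: (name, aliases, addresses).
HostEnt = Tuple[str, List[str], List[str]]


def candidate_addresses(
    stored_ip: str,
    reverse: Callable[[str], HostEnt] = socket.gethostbyaddr,
    forward: Callable[[str], HostEnt] = socket.gethostbyname_ex,
) -> List[str]:
    """Return the addresses to try for a host recorded by IP.

    Step 1: reverse lookup of the stored IP gives a host name.
    Step 2: forward lookup of that name gives its address list
            (the h_addr_list equivalent).
    Aliases are ignored, matching the behaviour described above.
    """
    try:
        name, _aliases, _addrs = reverse(stored_ip)
        _name, _aliases, addrs = forward(name)
    except OSError:
        # Assumption: if either lookup fails, only the stored IP is left.
        return [stored_ip]
    return list(addrs) if addrs else [stored_ip]
```

Each address in the returned list would then be tried in order until one
notification goes through.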


> How does this interact with the "--name" switch?

It should work fine with the use of the "--name" switch. When a host gets
an SM_MON request which is specified with the "--name" switch, a forward
lookup is done on that name and the IP address is saved to stable storage.
When a change of state occurs and the host needs to send an SM_NOTIFY, it
does the reverse and forward lookups to get the list of addresses to try.
By doing the reverse lookup, we should get the same hostname as provided
with the "--name" switch.
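
To make the round trip concrete, here is a toy model in Python (not
statd code). The FORWARD and REVERSE tables are a made-up name service,
and on_sm_mon() and on_state_change() are hypothetical names. It also
shows why the result is only "should": the name comes back unchanged
only when the reverse mapping points at the name that was supplied.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical name service tables, standing in for DNS or /etc/hosts.
FORWARD: Dict[str, List[str]] = {
    "nfs1.example.com": ["192.0.2.10", "198.51.100.10"],
    "nfs-alias.example.com": ["192.0.2.10"],
}
REVERSE: Dict[str, str] = {
    "192.0.2.10": "nfs1.example.com",
    "198.51.100.10": "nfs1.example.com",
}


def on_sm_mon(name: str, stable_storage: Set[str]) -> str:
    """Forward-resolve the monitored name and save only its first IP."""
    ip = FORWARD[name][0]
    stable_storage.add(ip)
    return ip


def on_state_change(ip: str) -> Tuple[str, List[str]]:
    """Reverse then forward lookup: recover a name and its address list."""
    name = REVERSE[ip]
    return name, FORWARD[name]
```

Monitoring "nfs1.example.com" round-trips to the same name with both
addresses, while monitoring "nfs-alias.example.com" stores the same IP
and comes back as "nfs1.example.com".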

> I believe --name originally replaced the gethostbyname() name with the
> user-supplied one. Now that there are multiple interfaces to worry about,
> the old approach no longer works well.
> I see a few different alternatives for fixing that:
> 1. Remove "--name". It was mostly written to fix the exact same problem
>    that your patch fixes in a better way. Unfortunately this will not
>    solve the problem of multiple names mapping to the same IP.
> 2. Change "--name" so that it _adds_ the hostname to the list of
>    names to use in the notification message, rather than replacing
>    the existing one(s). I suppose it should also be changed to accept
>    multiple hostnames.
> 3. Could we possibly send the IP to the clients rather than the
>    hostnames? That would eliminate the problem of multiple names
>    matching the same hostname.
>
> I suppose I should go read the spec and find out how this really works
> before making comments, but I don't have the time just now, so take the
> comments with a grain of salt.

For option 1, I can't think of a reason off-hand, but I suspect there may
still be cases for keeping the "--name" switch around, even if just for
historical reasons.

For options 2 and 3, the Open Group Technical Standard - Protocols for
Internetworking: XNFS, Version 3W, page 173 states that an SM_NOTIFY sends
'"mon_name" [which] is the name of the host that had the state change'; so
I don't think these suggestions are possible.
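
For reference, a rough Python sketch of what goes on the wire. It
assumes, from my reading of the status monitor protocol definition, that
the SM_NOTIFY argument is a stat_chge structure holding mon_name (a
string of at most SM_MAXSTRLEN = 1024 bytes) followed by an integer
state; please check both against the spec. encode_stat_chge() is a
hypothetical helper and only the standard XDR string and int encodings
are used.

```python
import struct

SM_MAXSTRLEN = 1024  # assumed limit on mon_name, to be checked against the spec


def encode_stat_chge(mon_name: str, state: int) -> bytes:
    """XDR-encode an assumed stat_chge { string mon_name; int state; }.

    XDR strings are a 4-byte big-endian length, the bytes themselves,
    then zero padding up to a multiple of four. XDR ints are 4-byte
    big-endian signed values.
    """
    raw = mon_name.encode("ascii")
    if len(raw) > SM_MAXSTRLEN:
        raise ValueError("mon_name longer than SM_MAXSTRLEN")
    pad = (4 - len(raw) % 4) % 4
    return (struct.pack(">I", len(raw)) + raw + b"\x00" * pad
            + struct.pack(">i", state))
```

There is a single name slot in this layout, which is why sending a list
of names, or an address in place of the name, does not fit the message
as specified.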

Thanks for the feedback!
Bruce.
