2002-08-24 22:16:19

by James Bottomley

Subject: High Availability NFS Proposal

Hi All,

My company, SteelEye Technology, already has a HA-NFS offering (using
filehandle aliasing and kernel authentication add-ons). However, we'd like to
enhance the existing open source solution (actually, our current patches are
open source; I just got tired of merging them into each revision of whatever
distribution's kernel). The new fsid option already performs the same task as
our kernel filehandle aliasing patches, so the only remaining issues are
authentication and locking.

Unfortunately, our product, LifeKeeper, is more complex than a simple two-node
active/passive cluster, so the usual "just share /var/lib/nfs" solution
won't work for us. What I'd like to propose instead is to place hooks inside
mountd and statd that would allow them to propagate the necessary state
information into a cluster. The best way I can think of is to designate two
executable hooks (say /var/lib/nfs/mountd-hook and /var/lib/nfs/statd-hook).
If mountd and statd don't find these on start up, they proceed normally.
However, if they exist, mountd and statd will execute them with certain
arguments to allow the clustering software to keep track of the client
machines correctly.
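
As a rough illustration of the start-up check described above, here is a
minimal sketch in Python (the real daemons are written in C). The function
names, the (event, client) argument list and the "exit status 0 means success"
convention are all assumptions made for illustration; the proposal itself
leaves the "certain arguments" unspecified.

```python
import os
import subprocess

# Path taken from the proposal; everything else below is hypothetical.
MOUNTD_HOOK = "/var/lib/nfs/mountd-hook"


def find_hook(path):
    """Return path if it names an executable regular file, else None."""
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None


def run_hook(hook, event, client):
    """Invoke the hook and wait for it; without a hook, proceed normally.

    The (event, client) arguments and the exit-status convention are
    assumed, not part of the proposal.
    """
    if hook is None:
        return True
    return subprocess.run([hook, event, client]).returncode == 0
```

A daemon would call find_hook(MOUNTD_HOOK) once at start-up and then pass the
result to run_hook() for each client event; when no hook is installed,
run_hook() simply returns True and the daemon behaves as it does today.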

These modifications should be enough to allow HA-NFS. However, to preserve
locks on failover, I need to introduce statd to the concept of virtual hosts,
so it can be told to stop tracking a virtual host, or behave as though a
virtual host had crashed. There are already the beginnings of IP aliasing
support in statd.c, so it shouldn't be too hard to progress to the full blown
solution.
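
One way to picture the virtual host bookkeeping is the sketch below, written
in Python for brevity. The class and method names are hypothetical, and nothing
here reflects the actual data structures in statd.c; it only models the two
operations mentioned above (stop tracking a virtual host, and treat a virtual
host as crashed).

```python
from collections import defaultdict


class VirtualHostMonitor:
    """Toy model: which clients are monitored under which virtual host."""

    def __init__(self):
        self._clients = defaultdict(set)   # virtual host -> client names

    def monitor(self, vhost, client):
        """Record that client holds locks via the given virtual host."""
        self._clients[vhost].add(client)

    def stop_tracking(self, vhost):
        """Forget a virtual host, e.g. once another node has taken it over."""
        self._clients.pop(vhost, None)

    def simulate_crash(self, vhost):
        """Return the clients to notify as though vhost had rebooted."""
        return sorted(self._clients.get(vhost, ()))
```

In this model the list returned by simulate_crash() would drive the reboot
notifications that let clients reclaim their locks after a failover.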

I'll proceed in two phases: mountd first to give a HA solution and statd next
to give the complete HA-NFS solution. The hooks should be useable by any HA
clustering software on Linux.

Any feedback anyone might have on this proposal would be more than welcome.

James Bottomley




-------------------------------------------------------
This sf.net email is sponsored by: OSDN - Tired of that same old
cell phone? Get a new here for FREE!
https://www.inphonic.com/r.asp?r=sourceforge1&refcode1=vs3390
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2002-08-25 05:28:47

by Bill Rugolsky Jr.

Subject: Re: High Availability NFS Proposal

On Sat, Aug 24, 2002 at 06:16:03PM -0400, James Bottomley wrote:
> Unfortunately, our product, LifeKeeper, is more complex than a simple two-node
> active/passive cluster, so the usual "just share /var/lib/nfs" solution
> won't work for us. What I'd like to propose instead is to place hooks inside
> mountd and statd that would allow them to propagate the necessary state
> information into a cluster. The best way I can think of is to designate two
> executable hooks (say /var/lib/nfs/mountd-hook and /var/lib/nfs/statd-hook).
> If mountd and statd don't find these on start up, they proceed normally.
> However, if they exist, mountd and statd will execute them with certain
> arguments to allow the clustering software to keep track of the client
> machines correctly.

James,

Neil Brown has said that he is working on the NFS authentication infrastructure,
which apparently includes some userland notification infrastructure for mountd.
You will probably want to take note of his work when deciding on the format
of those "certain arguments" to the notification hooks.

Back in 2.2.x, David Woodhouse wrote code (never merged) that added
mountd upcalls to knfsd. The code was modified by Elvis Pfützenreuter
<[email protected]>, then updated by Luis Claudio R. Goncalves
<[email protected]>. IIRC, the original impetus was to handle
re-authentication in the presence of wildcard (netgroup, etc.) exports
in a sane way.

You can find Elvis's original patchset here:

http://marc.theaimsgroup.com/?l=linux-nfs&m=87941837303295&w=2

Some of Neil Brown's early comments on how to improve and generalize the
infrastructure can be found here:

http://marc.theaimsgroup.com/?l=linux-nfs&m=87941837303307&w=2

I expect that the details of Neil's work on this have evolved in light of the
direction that NFSv4 has taken, but remain largely the same in broad outline.
Best to ask Neil about that, of course. :-)

Regards,

Bill Rugolsky



2002-08-26 04:36:36

by Bryan O'Sullivan

Subject: Re: High Availability NFS Proposal

On Sat, 2002-08-24 at 15:16, James Bottomley wrote:

> Unfortunately, our product, LifeKeeper, is more complex than a simple two-node
> active/passive cluster, so the usual "just share /var/lib/nfs" solution
> won't work for us. What I'd like to propose instead is to place hooks inside
> mountd and statd that would allow them to propagate the necessary state
> information into a cluster. The best way I can think of is to designate two
> executable hooks (say /var/lib/nfs/mountd-hook and /var/lib/nfs/statd-hook).

What would you plan on sharing via these hooks that isn't already
maintained in /var/lib/nfs? Why not use dnotify to capture changes to
those files, and push them out as clients come and go?

<b



2002-08-26 04:46:57

by James Bottomley

Subject: Re: High Availability NFS Proposal

> Back in 2.2.x, David Woodhouse wrote code (never merged) that added
> mountd upcalls to knfsd. The code was modified by Elvis Pfützenreuter
> <[email protected]>, then updated by Luis Claudio R. Goncalves
> <[email protected]>. IIRC, the original impetus was to handle
> re-authentication in the presence of wildcard (netgroup, etc.) exports
> in a sane way.

> You can find Elvis's original patchset here:

> http://marc.theaimsgroup.com/?l=linux-nfs&m=87941837303295&w=2

> Some of Neil Brown's early comments on how to improve and generalize
> the infrastructure can be found here:

> http://marc.theaimsgroup.com/?l=linux-nfs&m=87941837303307&w=2

Actually, this is exactly what our current HA-NFS patch set (for 2.2 and 2.4)
does. It's available at:

http://licensing.steeleye.com/open_source/sourcepage.php?prefix=nfs

but I agree with Linus: If you can do it from user level, there's no need for
kernel hooks. That's essentially the reason I'd like to abandon this approach
and go with an entirely user level scheme based on hooks in mountd and statd.

James





2002-08-26 04:51:57

by James Bottomley

Subject: Re: High Availability NFS Proposal

[email protected] said:
> What would you plan on sharing via these hooks that isn't already
> maintained in /var/lib/nfs? Why not use dnotify to capture changes to
> those files, and push them out as clients come and go?

Extra information for statd would pertain to the virtual host, but by and large,
the information is identical. Dnotify won't work because in order to close
race windows, the cluster has to be an active participant (i.e. mountd may not
reply until the cluster confirms it has the information), it can't just wait
on triggers.
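
To make the ordering concrete, here is a minimal sketch in Python of what
"active participant" means: the reply is only produced after the cluster has
acknowledged the new state. The names handle_mount, rmtab and confirm, and the
"ok"/"denied" return values, are hypothetical stand-ins; real mountd returns a
filehandle and a status, and the confirmation would come from the cluster
software via the hook.

```python
def handle_mount(client, export, rmtab, confirm):
    """Answer a mount request only after the cluster has the state."""
    rmtab.append((client, export))        # local record of the mount
    if not confirm(client, export):       # block until the cluster answers
        rmtab.remove((client, export))    # roll back so both views agree
        return "denied"
    return "ok"                           # only now is the client answered
```

A dnotify-style watcher would instead learn about the new rmtab entry some time
after the reply had already gone out, which is the window a failover could
fall into.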

James





2002-08-26 17:55:34

by Spencer Shepler

Subject: Re: High Availability NFS Proposal

On Sun, James Bottomley wrote:
> > Back in 2.2.x, David Woodhouse wrote code (never merged) that added
> > mountd upcalls to knfsd. The code was modified by Elvis Pfützenreuter
> > <[email protected]>, then updated by Luis Claudio R. Goncalves
> > <[email protected]>. IIRC, the original impetus was to handle
> > re-authentication in the presence of wildcard (netgroup, etc.) exports
> > in a sane way.
>
> > You can find Elvis's original patchset here:
>
> > http://marc.theaimsgroup.com/?l=linux-nfs&m=87941837303295&w=2
>
> > Some of Neil Brown's early comments on how to improve and generalize
> > the infrastructure can be found here:
>
> > http://marc.theaimsgroup.com/?l=linux-nfs&m=87941837303307&w=2
>
> Actually, this is exactly what our current HA-NFS patch set (for 2.2 and 2.4)
> does. It's available at:
>
> http://licensing.steeleye.com/open_source/sourcepage.php?prefix=nfs
>
> but I agree with Linus: If you can do it from user level, there's no need for
> kernel hooks. That's essentially the reason I'd like to abandon this approach
> and go with an entirely user level scheme based on hooks in mountd and statd.

How would you propose dealing with NFSv4, where the mountd and statd
daemons are not involved with that implementation?

--
Spencer




2002-08-26 19:04:28

by James Bottomley

Subject: Re: High Availability NFS Proposal

[email protected] said:
> How would you propose dealing with NFSv4, where the mountd and statd
> daemons are not involved with that implementation?

Distributions are only just starting to use V3. V4 is so far out as not to
need consideration at this time.

James





2002-08-26 19:42:03

by Bryan O'Sullivan

Subject: Re: High Availability NFS Proposal

On Sun, 2002-08-25 at 21:51, James Bottomley wrote:

> Extra information for statd would pertain to the virtual host, but by and large,
> the information is identical. Dnotify won't work because in order to close
> race windows, the cluster has to be an active participant (i.e. mountd may not
> reply until the cluster confirms it has the information), it can't just wait
> on triggers.

Good point. By the way, the race condition you describe is catastrophic
for at least one Linux NFS client.

I replicated the condition that a client would see as a result of the
race under 2.4.19 (with Trond's jumbo NFS patch) as follows:

* Server exports an fs.
* Client mounts it.
* Server unexports it.
* Client attempts to access the fs it thinks it still has
mounted.

The client oopsed, and its VFS got completely wedged. The client in
question was running headless (I wasn't expecting this operation to
cause a disaster), so I haven't captured the oops yet, but it was ...
more exciting than I expected.

I'll post a full report of the problem later, as time permits.

So yes, server-side hooks seem, er, somewhat important.

<b



2002-08-26 22:01:55

by Spencer Shepler

Subject: Re: High Availability NFS Proposal

On Mon, James Bottomley wrote:
> [email protected] said:
> > How would you propose dealing with NFSv4, where the mountd and statd
> > daemons are not involved with that implementation?
>
> Distributions are only just starting to use V3. V4 is so far out as not to
> need consideration at this time.

My comment was meant to draw out a discussion that with NFSv4 the
user-level hooks will not be available and an in-kernel implementation
would need to be considered. If the current in-kernel implementation
is moving towards user-level, then future v4 work will need to be redone.

--
Spencer




2002-09-23 13:54:25

by James Bottomley

Subject: Re: High Availability NFS Proposal

[email protected] said:
> Any progress on this?... The idea seems quite sound. My only
> suggestion would be not to depend on the existence of a program in
> /var/lib/nfs, but rather to have such a program specified as a command
> line argument. It makes it more explicit.

Not yet...I was hoping to have this ready for our HA NFS in RHAS2.1, but the
time scales wouldn't line up. I'll probably have time to get back to it in
October.

I'll make the program directory be an explicit configuration option, thanks.

> Also, the need for this support in mountd will almost certainly go
> away in 2.6 (that is what I am working on at the moment), but of course
> 2.4 is going to be around for a while, so it is probably still
> worthwhile.

Yes, historically the 2.2 kernel was still being released up to a year after
2.4 came out. How are you planning to change mountd? make the kernel do the
authentication checks?

James




-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2002-09-24 04:17:57

by NeilBrown

Subject: Re: High Availability NFS Proposal

On Monday September 23, [email protected] wrote:
> [email protected] said:
> > Any progress on this?... The idea seems quite sound. My only
> > suggestion would be not to depend on the existence of a program in
> > /var/lib/nfs, but rather to have such a program specified as a command
> > line argument. It makes it more explicit.
>
> Not yet...I was hoping to have this ready for our HA NFS in RHAS2.1, but the
> time scales wouldn't line up. I'll probably have time to get back to it in
> October.

"The best laid plans..." and all that. :-)

>
> I'll make the program directory be an explicit configuration option, thanks.
>
> > Also, the need for this support in mountd will almost certainly go
> > away in 2.6 (that is what I am working on at the moment), but of course
> > 2.4 is going to be around for a while, so it is probably still
> > worthwhile.
>
> Yes, historically the 2.2 kernel was still being released up to a year after
> 2.4 came out. How are you planning to change mountd? make the kernel do the
> authentication checks?

I haven't given a lot of thought to the exact changes to mountd, but
basically it will do what it currently does, but also listen on
a connection from the kernel.
The kernel will say
"I got a request from xx.yy.zz.ww, who is that?"
and mountd will tell the kernel about client "fred" which has that IP
address.
Then the kernel will say
"I got a filehandle from fred with a 0xnnmmnnmm filesystem
identifier. Where is that?"
and mountd will tell the kernel about whichever export point for that
client matches the filehandle.
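
The two questions above can be pictured as a pair of table lookups. The sketch
below is in Python purely for illustration; the table contents, the function
names who_is and where_is, and the example address and fsid are all made up,
and the real kernel/mountd channel is not modelled at all.

```python
# Hypothetical tables that mountd would build from its exports configuration.
CLIENTS = {"192.0.2.10": "fred"}                  # IP address -> client name
EXPORTS = {("fred", 0x1234): "/export/home"}      # (client, fsid) -> export


def who_is(ip):
    """Answer the kernel's first question: which client owns this address?"""
    return CLIENTS.get(ip)


def where_is(client, fsid):
    """Answer the second question: which export matches this filesystem id?"""
    return EXPORTS.get((client, fsid))
```

Both functions return None for an unknown address or filehandle, which in this
model is where the kernel would reject the request.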

The old tools will still work on the new kernel (as well as they work
currently) and hopefully the new tools will work on old kernels
(though that isn't quite so high a priority). I hope to start
submitting some of this stuff in October after I have had a week off.

NeilBrown



2002-09-26 16:14:13

by James Bottomley

Subject: Re: High Availability NFS Proposal

[email protected] said:
> I haven't given a lot of thought to the exact changes to mountd, but
> basically it will do what it currently does, but also listen on a
> connection from the kernel. The kernel will say
> "I got a request from xx.yy.zz.ww, who is that?" and mountd will
> tell the kernel about client "fred" which has that IP address. Then
> the kernel will say
> "I got a filehandle from fred with a 0xnnmmnnmm filesystem
> identifier. Where is that?" and mountd will tell the kernel about
> whichever export point for that client matches the filehandle.

This is sort of the way our current HA-NFS authentication scheme works. We
have a daemon that sleeps on an NFS call. When a request comes in that the
kernel doesn't recognise, it sends the data up to the daemon, which then gets
mountd to authenticate or reject the request.

You're welcome to the code (it's at
http://licensing.steeleye.com/open_source/sourcepage.php?prefix=nfs). I'm
afraid it's not the best code I ever wrote, but it's GPL'd, so take whatever
you can salvage.

James





2002-09-15 10:13:04

by NeilBrown

Subject: Re: High Availability NFS Proposal

On Saturday August 24, [email protected] wrote:
>
> I'll proceed in two phases: mountd first to give a HA solution and statd next
> to give the complete HA-NFS solution. The hooks should be useable by any HA
> clustering software on Linux.

Any progress on this?... The idea seems quite sound.
My only suggestion would be not to depend on the existence of a
program in /var/lib/nfs, but rather to have such a program specified
as a command line argument. It makes it more explicit.

Also, the need for this support in mountd will almost certainly go
away in 2.6 (that is what I am working on at the moment), but of course
2.4 is going to be around for a while, so it is probably still
worthwhile.

NeilBrown



2002-09-15 10:18:36

by NeilBrown

Subject: Re: High Availability NFS Proposal

On Monday August 26, [email protected] wrote:
> On Mon, James Bottomley wrote:
> > [email protected] said:
> > > How would you propose dealing with NFSv4, where the mountd and statd
> > > daemons are not involved with that implementation?
> >
> > Distributions are only just starting to use V3. V4 is so far out as not to
> > need consideration at this time.
>
> My comment was meant to draw out a discussion that with NFSv4 the
> user-level hooks will not be available and an in-kernel implementation
> would need to be considered. If the current in-kernel implementation
> is moving towards user-level, then future v4 work will need to be redone.

I'm definitely considering NFSv4 and I don't buy the argument that it
can all be done from user-space. I'm working toward what I hope will
be a nice clean interface for kernel and userspace to work together on
a number of related issues.

The only way to leave it to user-space that I can see would be to
move the bulk of the NFS server into userspace. If fast web servers
can be done in user-space, why not fast NFS servers? There would need
to be a direct interface for filehandle lookup, but I suspect it
could be done, and would at least be an interesting project. But I
personally am not particularly motivated to try it. It's largely
working in the kernel and I am happy for it to stay there.

NeilBrown

