2003-09-18 16:02:27

by Chris Worley

Subject: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

Hi,

background...

Configuration: FC SAN serving all luns to multiple dual-cpu 3.0GHz XEON
I/O servers (using Qlogic 24xx HBAs) running GFS, re-exported via NFS
to about a dozen clients per NFS server. Each I/O server has 96 nfsd
threads running. NFS is being served over Myrinet over IP. Any
problems listed below are true for both NFS over Ethernet and over
Myrinet over IP (but Myrinet is a lot more stable, with no frag
problems). Servers and clients all running RH7.3 w/ a 2.4.21 kernel.

Patches: GFS, Direct-I/O and related kernel patches (doesn't seem to
work with IOZone "-I" option), and NFSSVC_MAXBLKSIZE set to 32768. Any
problems listed below are both with and without these changes (except
GFS patches, I gotta have those). Qla23x0 driver: both SG_SEGMENTS and
MAX_OUTSTANDING_COMMANDS set to 4096.

Clients mount with options (all performance related):

bg,nocto,intr,vers=3,rsize=32768,wsize=32768,hard,retrans=1000,timeo=3,nolock,async

1) NFS won't shut down.

No matter the number of nfsd threads, NFS won't shut down. It hangs and
eventually times out trying to kill the nfsd threads. With only one
client, this isn't a problem, so it seems to be related to the number
of clients.

If NFS doesn't shut down, then I can't gracefully unmount and shut down
GFS... which means the only way to reboot an NFS server is to take down
the network and let the lock server fence the I/O server. Not pretty.

Any ideas on forcing NFS down?

2) IP address takeover between NFS servers.

With NFS stateless, and not running lock servers, I thought a simple IP
address takeover scheme (when an I/O server goes down, another just adds
the failed server's IP address as a virtual interface) would allow
clients to immediately renegotiate with the same IP address pointing to
another NFS server (serving the same partitions). The take-over is
successful: the clients can communicate with the new I/O server, but I
get "permission denied" (as root or otherwise) on the NFS mounted
partitions most of the time (sometimes it works).

What am I missing?

3) Zero-copy NFS patches had been available for kernels prior to
2.4.21... but are missing from Trond's 2.4.21 patches. I have to use
2.4.21 for the time being (can't use 2.6).

Is there hope of getting these patches for this kernel rev?

4) I need to have more outstanding SCSI requests.

The SAN I'm using can parallelize many more outstanding SCSI requests
than I'm sending it. The Qlogic scatter-gather list size and
outstanding command queue seem to be big enough to handle more
requests. I'm seeing, at most, 5 outstanding requests per NFS server.

Is there something at the SCSI layer or driver layer that will allow for
more outstanding I/O requests?

Is there a way to find out if this is a SCSI layer problem, vs. driver
or NFS or GFS file system problem (i.e. something in proc I can monitor
to see outstanding requests at these different levels)?

5) How come the "retrans" and "timeo" values set on the client mount
don't show up in /proc/mounts?

6) Any performance hints I'm missing?
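
For question 5, a minimal sketch of how one might list which requested
mount options are absent from a /proc/mounts entry. The sample
/proc/mounts line and the host and path names in it are made up for
illustration; only the requested option string comes from this message.
The comparison is by exact string, so an option the kernel reports under
a different spelling also shows up as missing.

```python
def missing_mount_options(requested, proc_mounts_line):
    """Return the requested options that do not appear in a /proc/mounts line."""
    fields = proc_mounts_line.split()
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass
    reported = set(fields[3].split(","))
    return [opt for opt in requested.split(",") if opt not in reported]


# Option string as given on the client mount in this message.
requested = ("bg,nocto,intr,vers=3,rsize=32768,wsize=32768,"
             "hard,retrans=1000,timeo=3,nolock,async")

# Hypothetical /proc/mounts line, invented for this example.
sample = ("ioserver1:/gfs1 /mnt/gfs1 nfs "
          "rw,v3,rsize=32768,wsize=32768,hard,intr,nolock 0 0")

print(missing_mount_options(requested, sample))
```

On a real client the second argument would be the matching line read
from /proc/mounts rather than a hard-coded string.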

Thanks,

Chris



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-09-18 21:22:04

by James Pearson

Subject: Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

Chris Worley wrote:
>
> 2) IP address takeover between NFS servers.
>
> With NFS stateless, and not running lock servers, I thought a simple IP
> address takeover scheme (when an I/O server goes down, another just adds
> the failed server's IP address as a virtual interface) would allow
> clients to immediately renegotiate with the same IP address pointing to
> another NFS server (serving the same partitions). The take-over is
> successful: the clients can communicate with the new I/O server, but I
> get "permission denied" (as root or otherwise) on the NFS mounted
> partitions most of the time (sometimes it works).
>
> What am I missing?

I don't know a lot about NFS fail over, but I guess unless the take-over
server has a copy (or the same copy) of /var/lib/nfs/rmtab, then you
will get "permission denied" - the server needs an entry for each
client. I guess it works some of the time as the take-over server has a
'valid' entry in its rmtab file for that client.

James Pearson




2003-09-19 15:36:57

by rnews

Subject: Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

Chris Worley <[email protected]> wrote:
| 2) IP address takeover between NFS servers.
|
| With NFS stateless, and not running lock servers, I thought a simple IP
| address takeover scheme (when an I/O server goes down, another just adds
| the failed server's IP address as a virtual interface) would allow
| clients to immediately renegotiate with the same IP address pointing to
| another NFS server (serving the same partitions). The take-over is
| successful: the clients can communicate with the new I/O server, but I
| get "permission denied" (as root or otherwise) on the NFS mounted
| partitions most of the time (sometimes it works).
|
| What am I missing?

You should migrate the appropriate entries from /var/lib/nfs/rmtab to
the new server. Also, when the new server uses a different device
number, you need to use the "fsid" option in /etc/exports, to avoid
changes in the NFS file handles.
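
A minimal sketch of the rmtab migration suggested above. It assumes each
rmtab line has the form "client:path:0xCOUNT" (check this against the
nfs-utils version in use), and the client and path names are
hypothetical. Entries already present on the takeover server are kept;
entries known only to the failed server are appended.

```python
def merge_rmtab(takeover_text, failed_text):
    """Merge a failed server's rmtab entries into the takeover server's.

    Assumes lines of the form "client:path:0x00000001". For a
    (client, path) pair present in both files, the takeover server's
    entry wins.
    """
    merged = {}
    for line in takeover_text.splitlines() + failed_text.splitlines():
        line = line.strip()
        if line.count(":") < 2:
            continue  # skip blank or malformed lines
        client, rest = line.split(":", 1)
        path, count = rest.rsplit(":", 1)
        merged.setdefault((client, path), count)
    return "".join(f"{c}:{p}:{n}\n" for (c, p), n in merged.items())


# Hypothetical contents of the two rmtab files.
takeover = "node01:/gfs1:0x00000001\n"
failed = "node01:/gfs1:0x00000002\nnode02:/gfs1:0x00000001\n"
print(merge_rmtab(takeover, failed), end="")
```

The merged text would then replace /var/lib/nfs/rmtab on the takeover
server before the exports are refreshed.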

--
Dick Streefland //// Altium BV
[email protected] (@ @) http://www.altium.com
--------------------------------oOO--(_)--OOo---------------------------




2003-09-19 01:14:14

by Chris Worley

Subject: Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

James,

Can I just "cat" fake entries into rmtab, or does the client really need
to mount it once?

When is rmtab flushed (deleted of old/defunct entries)?

Thanks!

Chris
On Thu, 2003-09-18 at 15:21, James Pearson wrote:
> Chris Worley wrote:
> >
> > 2) IP address takeover between NFS servers.
> >
> > With NFS stateless, and not running lock servers, I thought a simple IP
> > address takeover scheme (when an I/O server goes down, another just adds
> > the failed server's IP address as a virtual interface) would allow
> > clients to immediately renegotiate with the same IP address pointing to
> > another NFS server (serving the same partitions). The take-over is
> > successful: the clients can communicate with the new I/O server, but I
> > get "permission denied" (as root or otherwise) on the NFS mounted
> > partitions most of the time (sometimes it works).
> >
> > What am I missing?
>
> I don't know a lot about NFS fail over, but I guess unless the take-over
> server has a copy (or the same copy) of /var/lib/nfs/rmtab, then you
> will get "permission denied" - the server needs an entry for each
> client. I guess it works some of the time as the take-over server has a
> 'valid' entry in its rmtab file for that client.
>
> James Pearson




2003-09-19 13:24:57

by Matt Schillinger

Subject: Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

On Thu, 2003-09-18 at 16:21, James Pearson wrote:
> Chris Worley wrote:
> >
> > 2) IP address takeover between NFS servers.
> >
> > With NFS stateless, and not running lock servers, I thought a simple IP
> > address takeover scheme (when an I/O server goes down, another just adds
> > the failed server's IP address as a virtual interface) would allow
> > clients to immediately renegotiate with the same IP address pointing to
> > another NFS server (serving the same partitions). The take-over is
> > successful: the clients can communicate with the new I/O server, but I
> > get "permission denied" (as root or otherwise) on the NFS mounted
> > partitions most of the time (sometimes it works).
> >
> > What am I missing?
>
> I don't know a lot about NFS fail over, but I guess unless the take-over
> server has a copy (or the same copy) of /var/lib/nfs/rmtab, then you
> will get "permission denied" - the server needs an entry for each
> client. I guess it works some of the time as the take-over server has a
> 'valid' entry in its rmtab file for that client.
>

Yes, I made a little daemon that, for each exported directory, generates
a 'directory specific' rmtab file in
/path/to/directory/.nfs_cluster/rmtab. Every 2 seconds, for each export
directory, it copies the rmtab to rmtab.old and then generates a clean
rmtab. It has worked fine for me to prevent permission denied/Stale NFS
handle errors.

Upon failover, after the filesystem is mounted by the 'takeover
server', the .nfs_cluster/rmtab is processed and added to
/var/lib/nfs/rmtab before the exportfs instances are run on the export
directories.

What are you using for failover?

I use heartbeat.

I also had to make a 2 second delay after IP takeover, for the IP to
actually be working, before I run the exportfs for the newly mounted
filesystem.
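
A rough sketch of the per-directory rmtab idea described above, not the
actual daemon. It assumes rmtab lines of the form "client:path:0xCOUNT";
the export paths and client names are hypothetical. One function groups
rmtab lines by export directory, the other rotates rmtab to rmtab.old
and writes a fresh copy under <export>/.nfs_cluster/.

```python
import os


def split_rmtab_by_export(rmtab_text, export_dirs):
    """Group rmtab lines by the export directory that contains their path."""
    per_export = {d: [] for d in export_dirs}
    # Try longer export paths first so nested exports match the deepest one.
    ordered = sorted(export_dirs, key=len, reverse=True)
    for line in rmtab_text.splitlines():
        parts = line.strip().split(":")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        path = ":".join(parts[1:-1])
        for d in ordered:
            if path == d or path.startswith(d.rstrip("/") + "/"):
                per_export[d].append(line.strip())
                break
    return per_export


def write_cluster_rmtab(export_dir, lines):
    """Keep the previous copy as rmtab.old, then write a clean rmtab."""
    cluster_dir = os.path.join(export_dir, ".nfs_cluster")
    os.makedirs(cluster_dir, exist_ok=True)
    current = os.path.join(cluster_dir, "rmtab")
    if os.path.exists(current):
        os.replace(current, current + ".old")
    with open(current, "w") as fh:
        fh.writelines(entry + "\n" for entry in lines)


# Hypothetical rmtab contents and export list.
rmtab = "node01:/gfs1:0x00000001\nnode02:/gfs2/scratch:0x00000001\n"
print(split_rmtab_by_export(rmtab, ["/gfs1", "/gfs2"]))
```

A daemon along these lines would call both functions in a loop with a
2 second sleep; on failover the takeover server would read each
.nfs_cluster/rmtab back and append it to /var/lib/nfs/rmtab.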

Matt Schillinger
[email protected]


> James Pearson






2003-09-19 12:50:31

by Bernd Schubert

Subject: Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

> 2) IP address takeover between NFS servers.
>
> With NFS stateless, and not running lock servers, I thought a simple IP
> address takeover scheme (when an I/O server goes down, another just adds
> the failed server's IP address as a virtual interface) would allow
> clients to immediately renegotiate with the same IP address pointing to
> another NFS server (serving the same partitions). The take-over is
> successful: the clients can communicate with the new I/O server, but I
> get "permission denied" (as root or otherwise) on the NFS mounted
> partitions most of the time (sometimes it works).
>
> What am I missing?
>

Don't know what's going wrong on your network, but it works pretty well
on our systems.
How do you sync both server systems (main server and fallback server)?
Well, since this is a root-fs server here and changes are made rarely,
we simply connect them via nbd and sync using dd of the whole device
once a night.
For our main data server (/home, etc.) there is currently no fallback,
but we plan to add one in the near future. To do so, we want to connect
via enbd and do a network RAID1 between both systems.
Let me guess, do you sync using tar, etc.? This won't work since you
need the very same inode numbers. The only way to do so is to use
network RAID or a 1:1 copy of the whole partition device.
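
To illustrate the inode point, here is a small sketch that compares
inode numbers between two directory trees. The function names are made
up for this example. A file-level copy (tar, cp and the like) creates
new inodes, so the comparison reports mismatches, whereas a block-level
copy of the device preserves them.

```python
import os
import shutil
import tempfile


def inode_map(root):
    """Map each path below root (relative to root) to its inode number."""
    inodes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            inodes[os.path.relpath(full, root)] = os.lstat(full).st_ino
    return inodes


def inode_mismatches(primary_root, replica_root):
    """Return relative paths whose inode differs or is missing on the replica."""
    primary = inode_map(primary_root)
    replica = inode_map(replica_root)
    return sorted(p for p in primary if replica.get(p) != primary[p])


if __name__ == "__main__":
    # Demonstration with a throwaway tree copied at the file level.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "primary")
        os.mkdir(src)
        with open(os.path.join(src, "data.txt"), "w") as fh:
            fh.write("example\n")
        dst = os.path.join(tmp, "replica")
        shutil.copytree(src, dst)
        print(inode_mismatches(src, dst))
```

Running the same check against a mounted 1:1 device copy would be
expected to report no mismatches, since the on-disk inodes are identical.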

Cheers,
Bernd


Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [email protected]



2003-09-19 09:05:52

by rnews

Subject: Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

Chris Worley <[email protected]> wrote:
| 2) IP address takeover between NFS servers.
|
| With NFS stateless, and not running lock servers, I thought a simple IP
| address takeover scheme (when an I/O server goes down, another just adds
| the failed server's IP address as a virtual interface) would allow
| clients to immediately renegotiate with the same IP address pointing to
| another NFS server (serving the same partitions). The take-over is
| successful: the clients can communicate with the new I/O server, but I
| get "permission denied" (as root or otherwise) on the NFS mounted
| partitions most of the time (sometimes it works).

You need to migrate the entries in /var/lib/nfs/rmtab to the new
machine. Also, when the device number on the new machine differs
from the device number on the old machine, you need to use the "fsid"
option in /etc/exports, to make sure the file handles don't change.
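
A small sketch of a consistency check for the "fsid" advice above: it
compares two /etc/exports files and reports paths whose fsid values are
missing or differ between the servers. The export lines are
hypothetical, and the parser is deliberately simple (one export per
line, no backslash continuations).

```python
import re


def export_fsids(exports_text):
    """Map each exported path to the set of fsid= values found on its line."""
    fsids = {}
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        path = line.split()[0]
        fsids[path] = set(re.findall(r"fsid=([^,)\s]+)", line))
    return fsids


def fsid_problems(exports_a, exports_b):
    """Return paths with no fsid, or with fsid values that differ."""
    a, b = export_fsids(exports_a), export_fsids(exports_b)
    problems = []
    for path in sorted(set(a) | set(b)):
        if not a.get(path) or a.get(path) != b.get(path):
            problems.append(path)
    return problems


# Hypothetical exports files for two I/O servers.
server_a = "/gfs1 *(rw,fsid=1)\n/gfs2 *(rw,fsid=2)\n"
server_b = "/gfs1 *(rw,fsid=1)\n/gfs2 *(rw)\n"
print(fsid_problems(server_a, server_b))
```

Any path reported here would be a candidate for file handles changing
after a takeover, since the two servers would not agree on its fsid.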

--
Dick Streefland //// Altium BV
[email protected] (@ @) http://www.altium.com
--------------------------------oOO--(_)--OOo---------------------------




2003-09-19 22:44:58

by Chris Worley

Subject: Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

On Fri, 2003-09-19 at 07:26, Matt Schillinger wrote:
> On Thu, 2003-09-18 at 16:21, James Pearson wrote:
> > Chris Worley wrote:
> > >
> > > 2) IP address takeover between NFS servers.
> > >
> > > With NFS stateless, and not running lock servers, I thought a simple IP
> > > address takeover scheme (when an I/O server goes down, another just adds
> > > the failed server's IP address as a virtual interface) would allow
> > > clients to immediately renegotiate with the same IP address pointing to
> > > another NFS server (serving the same partitions). The take-over is
> > > successful: the clients can communicate with the new I/O server, but I
> > > get "permission denied" (as root or otherwise) on the NFS mounted
> > > partitions most of the time (sometimes it works).
> > >
> > > What am I missing?
> >
> > I don't know a lot about NFS fail over, but I guess unless the take-over
> > server has a copy (or the same copy) of /var/lib/nfs/rmtab, then you
> > will get "permission denied" - the server needs an entry for each
> > client. I guess it works some of the time as the take-over server has a
> > 'valid' entry in its rmtab file for that client.
> >
>
> Yes, i made a little daemon that for each exported directory, generates
> a 'directory specific' rmtab file in
> /path/to/directory/.nfs_cluster/rmtab. It updates each export directory
> every 2 seconds and copies the rmtab to rmtab.old , then generates a
> clean rmtab. it has worked fine for me to prevent permission
> denied/Stale NFS handles.
>
> Upon the failover, after the filesystem is mounted by the 'takeover
> server', the .nfs_cluster/rmtab is processed and added to
> /var/lib/nfs/rmtab before exportfs instances are ran on the export
> directories.
>
> What are you using for failover?
>
> I use heartbeat.
>
> I also had to make a 2 second delay after IP takeover, for the IP to
> actually be working, before I run the exportfs for the newly mounted
> filesystem.

GFS has a built-in "fence" system, which employs some sort of
heartbeat. When a server stops responding, a "fence" script gets run by
GFS's lock server... there's a fence script for our ICE Boxes: it power
cycles the server and adds the virtual interface on the next server (the
IP takeover).

Chris
>
> Matt Schillinger
> [email protected]
>
>
> > James Pearson




2003-09-20 10:21:23

by ian sison (mailing list)

Subject: An NFS-HA HOWTO anyone? WAS> Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc


On Fri, 19 Sep 2003 [email protected] wrote:

> Chris Worley <[email protected]> wrote:
> | 2) IP address takeover between NFS servers.
> |
> | With NFS stateless, and not running lock servers, I thought a simple IP
> | address takeover scheme (when an I/O server goes down, another just adds
> | the failed server's IP address as a virtual interface) would allow
> | clients to immediately renegotiate with the same IP address pointing to
> | another NFS server (serving the same partitions). The take-over is
> | successful: the clients can communicate with the new I/O server, but I
> | get "permission denied" (as root or otherwise) on the NFS mounted
> | partitions most of the time (sometimes it works).
>
> You need to migrate the entries in /var/lib/nfs/rmtab to the new
> machine. Also, when the device number on the new machine differs
> from the device number on the old machine, you need to use the "fsid"
> option in /etc/exports, to make sure the file handles don't change.

This is important information, and it should go into a howto. NFS-HA is a
very common requirement nowadays, and I've seen very little information
on the net except for the Mission Critical Linux site.






2003-09-22 14:26:19

by Matt Schillinger

Subject: Re: An NFS-HA HOWTO anyone? WAS> Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

On Sat, 2003-09-20 at 05:22, ian sison (mailing list) wrote:
>
> On Fri, 19 Sep 2003 [email protected] wrote:
>
> > Chris Worley <[email protected]> wrote:
> > | 2) IP address takeover between NFS servers.
> > |
> > | With NFS stateless, and not running lock servers, I thought a simple IP
> > | address takeover scheme (when an I/O server goes down, another just adds
> > | the failed server's IP address as a virtual interface) would allow
> > | clients to immediately renegotiate with the same IP address pointing to
> > | another NFS server (serving the same partitions). The take-over is
> > | successful: the clients can communicate with the new I/O server, but I
> > | get "permission denied" (as root or otherwise) on the NFS mounted
> > | partitions most of the time (sometimes it works).
> >
> > You need to migrate the entries in /var/lib/nfs/rmtab to the new
> > machine. Also, when the device number on the new machine differs
> > from the device number on the old machine, you need to use the "fsid"
> > option in /etc/exports, to make sure the file handles don't change.
>
> This is important information, and it should go into a howto. NFS-HA is a
> very common requirement nowadays, and there's very little information i've
> seen on the net except for the Mission Critical Linux site.
>
I need to update the info on the page and, quite honestly, I think the
info I have is buggy (I will update it this week), but I have a document
showing how I achieve 'Active-Active' HA NFS over shared SCSI. Mainly,
it's by replicating rmtab entries: mountpoint-specific rmtab entries are
stored on the shared storage mountpoint.

the address is:

http://chilli.linuxmds.com/~mschilli/NFS/

Bookmark the page, and I promise that by midweek I'll have it updated. I
will post to the mailing list when the update is complete.

Matt Schillinger
[email protected]





2003-09-22 16:34:26

by Tom McNeal

Subject: Re: An NFS-HA HOWTO anyone? WAS> Re: Stopping NFS, ip address take over, zero-copy NFS for 2.4.21, and misc

Hi -

The NFS failover is not that simple, depending on the HA cluster used.
Mission Critical Linux used an IP migration scheme, but had to jump through
hoops to force the client to renegotiate; otherwise the new server would just
look at the file handle, which had MAC address info, and return errors.
They (ok, We) did not publish that in open source, so that the Convolo
cluster had lock failover capability, but the open source Kimberlite cluster
did not.

Anyway, since the behavior is so product dependent, it hasn't been really
addressed in the FAQ, but I'll look at it again.

Regards -

Tom

--
Tom McNeal
(650)906-0761(cell)
(650)964-8459(fax)
> On Sat, 2003-09-20 at 05:22, ian sison (mailing list) wrote:
> >
> > On Fri, 19 Sep 2003 [email protected] wrote:
> >
> > > Chris Worley <[email protected]> wrote:
> > > | 2) IP address takeover between NFS servers.
> > > |
> > > | With NFS stateless, and not running lock servers, I thought a simple IP
> > > | address takeover scheme (when an I/O server goes down, another just adds
> > > | the failed server's IP address as a virtual interface) would allow
> > > | clients to immediately renegotiate with the same IP address pointing to
> > > | another NFS server (serving the same partitions). The take-over is
> > > | successful: the clients can communicate with the new I/O server, but I
> > > | get "permission denied" (as root or otherwise) on the NFS mounted
> > > | partitions most of the time (sometimes it works).
> > >
> > > You need to migrate the entries in /var/lib/nfs/rmtab to the new
> > > machine. Also, when the device number on the new machine differs
> > > from the device number on the old machine, you need to use the "fsid"
> > > option in /etc/exports, to make sure the file handles don't change.
> >
> > This is important information, and it should go into a howto. NFS-HA is a
> > very common requirement nowadays, and there's very little information i've
> > seen on the net except for the Mission Critical Linux site.
> >
> I need to update info on the page, and quite honestly, i think the info
> i have is buggy.. (I will update this week), but i have a document
> showing how I achieve 'Active-Active' HA NFS over Shared SCSI.. Mainly,
> it's by replicating rmtab entries by storing mountpoint specific rmtab
> entries on the shared storage mountpoint...
>
> the address is:
>
> http://chilli.linuxmds.com/~mschilli/NFS/
>
> Bookmark the page, and I promise by mid week, i'll have it updated.. I
> will post to the mailing list when the update is complete.
>
> Matt Schillinger
> [email protected]

