2006-03-31 13:59:41

by Chris Osicki

Subject: Q: NFS server cluster

Hi

I've set up a two-node Linux cluster (Service Guard) as an NFS server.
The nodes share storage from two Clariions and do host-based
mirroring (RAID1) using the standard Linux RAID driver, md.
So, when activating a node, I assemble the array with mdadm --assemble,
activate LVM with vgchange -ay, then mount the LVOLs and export them.
Everything's fine.
My test client mounts the exported filesystem and then runs a program
which opens a file on the NFS-mounted filesystem and writes a few
words into it every second.
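A test writer like the one described might look like the minimal Python sketch below. It is not the actual program from this thread; the function name write_heartbeat and the message text are hypothetical. Calling os.fsync after every line makes the NFS client push the data to the server right away, so a fail-over shows up as a stall or an error on the next write instead of being hidden in the client's page cache.

```python
import os
import time


def write_heartbeat(path, count, interval=1.0, message="still alive"):
    """Append `count` timestamped lines to `path`, one every `interval` seconds.

    Hypothetical test writer; returns the list of lines it wrote.
    """
    written = []
    with open(path, "a", encoding="utf-8") as f:
        for i in range(count):
            line = f"{i} {time.strftime('%H:%M:%S')} {message}\n"
            f.write(line)
            f.flush()             # move the data from Python's buffer to the kernel
            os.fsync(f.fileno())  # on an NFS mount: send it to the server now
            written.append(line)
            if i < count - 1:
                time.sleep(interval)
    return written
```

On the client it would be started as, for example, write_heartbeat("/mnt/osk/testfile", count=600); the count (roughly ten minutes) is an arbitrary choice.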

Now, if I initiate a fail-over (on the currently active node: unexport,
umount, vgchange -an, mdadm --stop; then activate the second node as
described above), I see two kinds of behaviour on the client:

1. The client doesn't notice the fail-over and happily keeps writing
to the file (I checked on the server).

2. The client hangs for about nine minutes (during this time I get "NFS
server mstlnfsv1 not responding still trying" when trying to read
this file), and then the client recovers.
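The fail-over sequence above (deactivate on the active node in reverse order, then activate on the other node) can be written down as an ordered command plan. This is a minimal Python sketch, not the Service Guard package script: the md device, member disks, volume group, LV, mount point and export client spec are all hypothetical placeholders, the export options are an assumption, and the relocatable IP address (which the cluster software moves) is not shown. It only prints the commands unless dry_run=False.

```python
import shlex
import subprocess

# Hypothetical names: substitute the real md device, members, VG and paths.
MD_DEVICE = "/dev/md0"
MD_MEMBERS = ["/dev/sdb1", "/dev/sdc1"]       # e.g. one LUN from each Clariion
VOLUME_GROUP = "vgnfs"
MOUNTS = {"/dev/vgnfs/lvol1": "/export/osk"}  # LV -> mount point
CLIENTS = "*"                                 # export client spec


def activate_plan():
    """Ordered commands to bring the NFS service up on this node."""
    plan = [["mdadm", "--assemble", MD_DEVICE, *MD_MEMBERS],
            ["vgchange", "-ay", VOLUME_GROUP]]
    plan += [["mount", dev, mnt] for dev, mnt in MOUNTS.items()]
    plan += [["exportfs", "-o", "rw,sync", f"{CLIENTS}:{mnt}"]
             for mnt in MOUNTS.values()]
    return plan


def deactivate_plan():
    """Ordered commands to release the service: activate_plan() undone in reverse."""
    mounts = list(MOUNTS.values())[::-1]  # unmount in reverse mount order
    plan = [["exportfs", "-u", f"{CLIENTS}:{mnt}"] for mnt in mounts]
    plan += [["umount", mnt] for mnt in mounts]
    plan += [["vgchange", "-an", VOLUME_GROUP],
             ["mdadm", "--stop", MD_DEVICE]]
    return plan


def run_plan(plan, dry_run=True):
    """Echo each command; execute it only when dry_run is False."""
    for argv in plan:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # abort on the first failure
```

Keeping the two halves as mirror images makes it easy to check that nothing is left active on the old node (an assembled array or an active VG) before the other node assembles the same disks.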

A few minutes later (sorry for not being precise, I've only just started
experimenting) the client reports:

NFS getattr failed for server mstlnfsv1: error 7 (RPC: Authentication error)
NFS write failed for server mstlnfsv1: error 7 (RPC: Authentication error)
Error writing to /mnt/osk/testfile: Permission denied
NFS commit failed for server mstlnfsv1: error 7 (RPC: Authentication error)
NFS commit failed for server mstlnfsv1: error 7 (RPC: Authentication error)
NFS commit failed for server mstlnfsv1: error 7 (RPC: Authentication error)
NFS commit failed for server mstlnfsv1: error 7 (RPC: Authentication error)
NFS commit failed for server mstlnfsv1: error 7 (RPC: Authentication error)
NFS commit failed for server mstlnfsv1: error 7 (RPC: Authentication error)
NFS commit failed for server mstlnfsv1: error 7 (RPC: Authentication error)

Now I cannot access this file:

ls -l /mnt/osk/testfile
NFS getattr failed for server mstlnfsv1: error 7 (RPC: Authentication error)
ls: /mnt/osk/testfile: Permission denied


The test client is a SPARC/Solaris 10 machine.

Is there anything I can do to eliminate this behaviour?
If I understand correctly, the problem is on the server side: for the
client nothing changes (the same IP address to talk to, the same
filesystem/filehandle). Or am I missing something?

I would be very thankful for any help.

Regards,
Chris


-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-04-01 12:16:47

by Jeff Layton

Subject: Re: Q: NFS server cluster

On Fri, 2006-03-31 at 16:02 +0200, Chris Osicki wrote:
> Is there anything I can do to eliminate this behaviour?
> If I understand correctly the problem is on the server side for the
> client nothing changes, the same IP-address to talk to, the same
> filesystem/filehandle. Or am I missing something.
>

Be sure the blockdev has the same device major/minor number on both
hosts, though it sounds like you may already have checked that.
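The reason the numbers matter: unless an export sets the fsid= option (see exports(5)), the Linux NFS server has traditionally built filehandles from the number of the device holding the filesystem, so both nodes must agree on it or clients' handles go stale after a fail-over. Device-mapper minors are normally assigned at activation time; LVM can pin one with lvchange's --persistent/--minor options. A minimal Python sketch for comparing the numbers on each node (the paths in the usage note are hypothetical):

```python
import os
import stat


def block_device_numbers(path):
    """(major, minor) of the block device node at `path`; symlinks are followed."""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} is not a block device")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def filesystem_device_numbers(path):
    """(major, minor) of the device holding the filesystem that contains `path`."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev)
```

Run block_device_numbers("/dev/vgnfs/lvol1") and filesystem_device_numbers("/export/osk") on each node after activation; the pairs should be identical on both.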

Otherwise, this sounds a lot like an issue with ARP. When you float IP
addresses you need to have the new address owner send gratuitous ARPs so
that its neighbors update their caches. Make sure your failover software
is doing this, and that the client and/or router isn't ignoring them.
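To make "gratuitous ARP" concrete: it is an ARP request broadcast by the new owner in which the sender and target IP addresses are both the floating IP, so neighbours that already cache that IP overwrite the old MAC. The Python sketch below only builds such a frame; the MAC and IP in the usage note are hypothetical documentation values, and send_frame is Linux-only and needs root or CAP_NET_RAW. In practice the cluster software (or a tool such as iputils arping with its -U option) normally sends these.

```python
import ipaddress
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETH_P_ARP = 0x0806
ETH_MIN_FRAME = 60  # minimum Ethernet frame length, excluding the FCS


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"bad MAC address: {mac}")
    return raw


def gratuitous_arp_frame(mac, ip):
    """Ethernet frame carrying an ARP request that announces `ip` at `mac`."""
    src = mac_bytes(mac)
    ip_raw = ipaddress.IPv4Address(ip).packed
    eth = struct.pack("!6s6sH", BROADCAST_MAC, src, ETH_P_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                    # hardware type: Ethernet
        0x0800,               # protocol type: IPv4
        6, 4,                 # hardware / protocol address lengths
        1,                    # operation: request
        src, ip_raw,          # sender MAC / sender IP
        b"\x00" * 6, ip_raw,  # target MAC (unknown) / target IP = own IP
    )
    # Pad to the Ethernet minimum; many NICs would also do this themselves.
    return (eth + arp).ljust(ETH_MIN_FRAME, b"\x00")


def send_frame(ifname, frame):
    """Send a raw frame on `ifname` (Linux AF_PACKET socket)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as s:
        s.bind((ifname, 0))
        s.send(frame)
```

For example, gratuitous_arp_frame("02:00:00:00:00:01", "192.0.2.10") would be sent once or twice by the node that has just taken over the address.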

-- Jeff


2006-04-04 13:13:17

by Chris Osicki

Subject: Re: Q: NFS server cluster

On Sat, 01 Apr 2006 07:16:06 -0500
Jeff Layton <[email protected]> wrote:

> On Fri, 2006-03-31 at 16:02 +0200, Chris Osicki wrote:
> > Is there anything I can do to eliminate this behaviour?
> > If I understand correctly the problem is on the server side for the
> > client nothing changes, the same IP-address to talk to, the same
> > filesystem/filehandle. Or am I missing something.
> >
>
> Be sure the blockdev has the same device major/minor number on both
> hosts, though it sounds like you may already have checked that.
>
> Otherwise, this sounds a lot like an issue with ARP. When you float IP
> addresses you need to have the new address owner send gratuitous ARPs so
> that its neighbors update their caches. Make sure your failover software
> is doing this, and that the client and/or router isn't ignoring them.
>
> -- Jeff
>
>


Thanks for your reply, Jeff.
The problem was self-induced: my fail-over script, while trying to free
the filesystem (on the server) so that it could be unmounted, used
"fuser" incorrectly and killed rpc.mountd. It works as expected now.
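One way to avoid this kind of mistake is to collect the PIDs first and filter out the NFS support daemons before signalling anything. The Python sketch below is an illustration, not the fixed script from this thread: the PROTECTED set is a hypothetical example list, it relies on psmisc fuser writing bare PIDs to stdout (access letters go to stderr), and it reads names from /proc/<pid>/comm, which assumes a Linux kernel that provides that file. It only reports what it would do unless dry_run=False.

```python
import os
import signal
import subprocess

# Hypothetical example list of daemons the fail-over script must never kill.
PROTECTED = frozenset({"rpc.mountd", "rpc.statd", "rpc.idmapd", "rpc.rquotad",
                       "portmap", "rpcbind", "nfsd", "lockd"})


def process_name(pid):
    """Command name of `pid` from /proc, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/comm", encoding="utf-8") as f:
            return f.read().strip()
    except OSError:
        return None


def pids_on_mount(mountpoint):
    """PIDs using the filesystem at `mountpoint`, as reported by `fuser -m`.

    A non-zero exit from fuser just means nothing was found, so it is ignored.
    """
    out = subprocess.run(["fuser", "-m", mountpoint],
                         capture_output=True, text=True).stdout
    return [int(tok) for tok in out.split() if tok.isdigit()]


def select_victims(pid_names, protected=PROTECTED, self_pid=None):
    """PIDs safe to signal: skip protected daemons, unknown names and ourselves."""
    self_pid = os.getpid() if self_pid is None else self_pid
    return sorted(pid for pid, name in pid_names.items()
                  if pid != self_pid and name is not None
                  and name not in protected)


def free_mount(mountpoint, dry_run=True):
    """SIGTERM the non-protected users of `mountpoint`; only report when dry_run."""
    names = {pid: process_name(pid) for pid in pids_on_mount(mountpoint)}
    for pid in select_victims(names):
        print(f"SIGTERM -> {pid} ({names[pid]})")
        if not dry_run:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # the process exited in the meantime
```

Sending SIGTERM rather than SIGKILL, and doing it only after exportfs -u, gives well-behaved local users a chance to close their files before the umount.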

Regards,
Chris
