2006-02-15 16:16:26

by Bernd Strieder

Subject: NFS mount in a changing network

Hello,

A bunch of machines had to change their IP addresses, or, put
differently, had to be moved into another IP network. One of the
machines had a long-running job that should continue to run. This
job (a theorem prover) had its stdout redirected to a log file on
an NFS auto-mounted volume. The client machine runs kernel
2.4.27, the server a SuSE kernel 2.4.21. Both are SuSE 9.0.

Because of the open file, it was not possible to umount the volume
and remount it with the new IP addresses.

So just changing the IP address was not an option. Instead, I added
the required new IP address to the interfaces of the server and the
client and kept the old address for the running mount. New mounts
use the new addresses; the old mount continues to work. I had
tried this some days before, and at that time it seemed to work.

The process seemed to be running fine; it has accumulated over
200000 minutes of CPU time since it was started, more than 100000
of them by the time the change happened. But somehow the log file
stopped being written to, and I did not notice until a few days
ago. The last write was at about the time the IP address changes
were made and the server was rebooted. The server reboot should
have made the client wait until the server returned.

The situation is as follows, observe the "(deleted)" for the log
file.

# ll /proc/22664/fd

dr-x------ 2 user42 assis 0 2006-02-15 16:20 .
dr-xr-xr-x 3 user42 assis 0 2006-02-15 16:20 ..
lrwx------ 1 user42 assis 64 2006-02-15 16:20 0 -> /dev/pts/0 (deleted)
l-wx------ 1 user42 assis 64 2006-02-15 16:20 1 -> /home/user42/aaron/commendo.f0-5.cp.log.3 (deleted)
l-wx------ 1 user42 assis 64 2006-02-15 16:20 2 -> /home/user42/aaron/nohup.out

# ll /home/user42/aaron/commendo.f0-5.cp.log.3
-rw-r--r-- 1 user42 assis 397473403 2006-01-10 21:28 /home/user42/aaron/commendo.f0-5.cp.log.3

lsof does not list the file as open.
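
The check above can also be scripted. The following is a minimal Python sketch, not part of the original report, that reads the symlinks in a /proc/<pid>/fd directory and reports those whose target ends in " (deleted)". The function name deleted_fds is hypothetical. A file whose real name happens to end in " (deleted)" would be reported as a false positive.

```python
import os


def deleted_fds(fd_dir):
    """Return {fd_name: target} for links whose target ends in ' (deleted)'."""
    flagged = {}
    for name in sorted(os.listdir(fd_dir)):
        path = os.path.join(fd_dir, name)
        try:
            target = os.readlink(path)
        except OSError:
            # Not a symlink, or the descriptor vanished while scanning.
            continue
        if target.endswith(" (deleted)"):
            flagged[name] = target
    return flagged
```

For the process shown above, this would be called as deleted_fds("/proc/22664/fd").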

The memory mappings also show "(deleted)" for the executable,
which is on the same NFS volume.

# cat /proc/22664/maps
08048000-08101000 r-xp 00000000 00:0d 12590679 /home/user42/WMS/NEWHEAD/Waldmeister/bin/WaldmeisterII.fast (deleted)
08101000-08106000 rw-p 000b8000 00:0d 12590679 /home/user42/WMS/NEWHEAD/Waldmeister/bin/WaldmeisterII.fast (deleted)
08106000-0b331000 rwxp 00000000 00:00 0
40000000-40018000 r-xp 00000000 08:05 41943554 /lib/ld-2.3.2.so
40018000-40019000 rw-p 00017000 08:05 41943554 /lib/ld-2.3.2.so
40019000-40022000 rw-p 00000000 00:00 0
40032000-40054000 r-xp 00000000 08:05 16777782 /lib/i686/libm.so.6
40054000-40055000 rw-p 00021000 08:05 16777782 /lib/i686/libm.so.6
40055000-40181000 r-xp 00000000 08:05 16777781 /lib/i686/libc.so.6
40181000-40186000 rw-p 0012c000 08:05 16777781 /lib/i686/libc.so.6
40186000-402bc000 rw-p 00000000 00:00 0
....
97415000-a3d7c000 rw-p 5728f000 00:00 0
bfffc000-c0000000 rwxp ffffd000 00:00 0

After attaching gdb to the process, setting a breakpoint on printf,
and calling ferror (which returns 1), it is clear that printf does
not output anything. errno has the value 116 (ESTALE).

How can this happen, an existing file gets into deleted status on
the client, without anybody having issued an unlink or whatever?
Or do stale filehandles finally result in deleted?

Since a large part of the log is missing, the run is ultimately
worthless and will be killed at some point. I'm leaving it running
for now, in case more information can still be gathered.

I don't know yet whether my posting will be accepted, but please
CC me, as I'm not yet subscribed to the list. I will check the
archives anyway.

Bernd Strieder


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems? Stop! Download the new AJAX search engine that makes
searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-02-15 19:24:20

by Trond Myklebust

Subject: Re: NFS mount in a changing network

On Wed, 2006-02-15 at 17:16 +0100, Bernd Strieder wrote:

> After attaching gdb to the process, setting a breakpoint on printf,
> and calling ferror (which returns 1), it is clear that printf does
> not output anything. errno has the value 116 (ESTALE).
>
> How can this happen, an existing file gets into deleted status on
> the client, without anybody having issued an unlink or whatever?
> Or do stale filehandles finally result in deleted?

See http://nfs.sourceforge.net/#faq_a10

Cheers,
Trond




2006-02-16 10:16:37

by Bernd Strieder

Subject: Re: NFS mount in a changing network

Hello,

Trond Myklebust wrote:
> On Wed, 2006-02-15 at 17:16 +0100, Bernd Strieder wrote:
> > After attaching gdb to the process, setting a breakpoint on
> > printf, and calling ferror (which returns 1), it is clear that
> > printf does not output anything. errno has the value 116
> > (ESTALE).
> >
> > How can this happen, an existing file gets into deleted status
> > on the client, without anybody having issued an unlink or
> > whatever? Or do stale filehandles finally result in deleted?
>
> See http://nfs.sourceforge.net/#faq_a10

There are 5 common reasons listed.

Reasons 2 and 3 do not apply: the file has not been touched on the
server or on any other client.

Reason 5 does not apply; the server side uses XFS. There used to be
issues with NFS and XFS in the 2.4.x era: several times I had
xfsdump hang, probably because of concurrent NFS accesses. But
xfsdump has not been used for a few months now.

This leaves two reasons, 1 and 4.

Reason 4, i.e. a changed device ID on the server, should not be the
cause: the filesystem on the server is XFS on a regular SCSI
disk, /dev/sdc1, with manually assigned SCSI IDs and no hardware
changes. Nothing should change on bootup.

If it is reason 1, then I must have made a mistake somewhere in the
process of adding the additional IP addresses. The biggest mistake
was probably trying it at all without any instructions.

Usually, when a server is rebooted, NFS clients hang until the
server is back. This is what always happened in the case of the
hanging xfsdumps, and it is what I was relying on when rebooting
the server. The question is whether, during a normal server reboot
and before all daemons are back, a client might see anything that
it interprets as meaning that the server or the file in question
will never be reachable again; in that case ESTALE would be the
only acceptable response. For example, when the IP addresses were
added, the old DNS names were not kept. Could this matter? And
what happens if directories from the same filesystem are mounted
via different IP addresses? Might this matter?

The exports line in question is:

/homes @crime(rw,async,no_root_squash) 192.168.16.28
(rw,async,no_root_squash)

@crime is a netgroup containing the new addresses. This netgroup
used to contain the old addresses.

The ultimate question, which might be helpful to others and on
which I haven't found anything written anywhere, is:

What are the things to take care of if you want to switch the
network configuration of an NFS server and an NFS client and umount
must be avoided? Is there any chance of getting this done cleanly?

Thanks,

Bernd Strieder



2006-02-17 16:08:30

by Trond Myklebust

Subject: Re: NFS mount in a changing network

On Thu, 2006-02-16 at 11:16 +0100, Bernd Strieder wrote:

> The ultimate question, which might be helpful to others and on
> which I haven't found anything written anywhere, is:
>
> What are the things to take care of if you want to switch the
> network configuration of an NFS server and an NFS client and umount
> must be avoided? Is there any chance of getting this done cleanly?

The basic rule of thumb follows directly from 1:

If you have clients that have mounted a given partition, then do
_not_ bring up the server until you have set up all the exports.

IOW: if you want to move the clients from one server to another, then
the sequence is

1) Kill nfsd on _both_ servers
2) Move the IP address from server 1 to server 2
3) Add the exports for the clients from server 1 to /etc/exports on
server 2 (and possibly run exportfs)
4) Now bring nfsd up on server 2.
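
Step 3 is the one that is easy to get wrong, so it can help to check the exports file before starting nfsd in step 4. The following is a minimal Python sketch under stated assumptions, not an official tool: the function names parse_exports and missing_exports are hypothetical, clients are compared as literal strings (no netgroup, wildcard or netmask expansion), and backslash line continuations and quoted paths are not handled.

```python
def parse_exports(text):
    """Map each exported path to the set of client specs listed for it."""
    exports = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        path, *specs = line.split()
        for spec in specs:
            client = spec.split("(", 1)[0]  # strip the "(options)" part
            exports.setdefault(path, set()).add(client)
    return exports


def missing_exports(text, required):
    """Return the (path, client) pairs from `required` that `text` lacks."""
    exports = parse_exports(text)
    return [(p, c) for p, c in required if c not in exports.get(p, set())]
```

Called with the contents of /etc/exports on server 2 and the list of (path, client) pairs that were exported on server 1, an empty result means every pair is literally present.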

Cheers,
Trond




2006-02-17 16:55:51

by Bernd Schubert

Subject: Re: NFS mount in a changing network

> The basic rule of thumb follows directly from 1:
>
> If you have clients that have mounted a given partition, then do
> _not_ bring up the server until you have set up all the exports.
>
> IOW: if you want to move the clients from one server to another, then
> the sequence is
>
> 1) Kill nfsd on _both_ servers
> 2) Move the IP address from server 1 to server 2
> 3) Add the exports for the clients from server 1 to /etc/exports on
> server 2 (and possibly run exportfs)
> 4) Now bring nfsd up on server 2.
>

We also always set fsid entries in the exports, to prevent problems if device
names accidentally change.
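
Whether every export entry actually carries an fsid option can be checked mechanically. The following is a minimal Python sketch, assuming the usual "path client(options)" layout of /etc/exports; the function name entries_without_fsid is hypothetical, and backslash line continuations are not handled.

```python
def entries_without_fsid(text):
    """Return (path, client) pairs whose option list has no fsid= setting."""
    missing = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        path, *specs = line.split()
        for spec in specs:
            client, _, rest = spec.partition("(")
            options = rest.rstrip(")").split(",") if rest else []
            if not any(opt.startswith("fsid=") for opt in options):
                missing.append((path, client))
    return missing
```

An empty result means each client entry in the given text has some fsid= option; the sketch does not check that the values are unique or sensible.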


Btw, information like this should be managed in a wiki. Recently I found
http://wiki.linux-nfs.org, but it seems mainly related to NFSv4. Would
anybody mind if I made some modifications there?

Cheers,
Bernd

-- 
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: [email protected]



2006-02-17 18:36:21

by Trond Myklebust

Subject: Re: NFS mount in a changing network

On Fri, 2006-02-17 at 17:55 +0100, Bernd Schubert wrote:
> > The basic rule of thumb follows directly from 1:
> >
> > If you have clients that have mounted a given partition, then do
> > _not_ bring up the server until you have set up all the exports.
> >
> > IOW: if you want to move the clients from one server to another, then
> > the sequence is
> >
> > 1) Kill nfsd on _both_ servers
> > 2) Move the IP address from server 1 to server 2
> > 3) Add the exports for the clients from server 1 to /etc/exports on
> > server 2 (and possibly run exportfs)
> > 4) Now bring nfsd up on server 2.
> >
>
> We also always set fsid entries in the exports, to prevent problems if device
> names accidentally change.
>
>
> Btw, information like this should be managed in a wiki. Recently I found
> http://wiki.linux-nfs.org, but it seems mainly related to NFSv4. Would
> anybody mind if I made some modifications there?

Feel free. I don't think we need to limit the subject matter to NFSv4 in
that wiki.

Cheers,
Trond




2006-02-20 09:50:20

by Bernd Strieder

Subject: Re: NFS mount in a changing network

Hello,

On Friday, 17 February 2006 17:08, you wrote:
> On Thu, 2006-02-16 at 11:16 +0100, Bernd Strieder wrote:
> > The ultimate question, which might be helpful to others and on
> > which I haven't found anything written anywhere, is:
> >
> > What are the things to take care of if you want to switch the
> > network configuration of an NFS server and an NFS client and
> > umount must be avoided? Is there any chance of getting this
> > done cleanly?
>
> The basic rule of thumb follows directly from 1:
>
> If you have clients that have mounted a given partition,
> then do _not_ bring up the server until you have set up all the
> exports.

IOW, if the server comes up and does not have the exports a running
client expects, then the file handles on the client become stale;
the client does not block until the server has the expected
exports again. So there is no safety net in that respect, which I
would prefer if I had the choice. Many applications do not get
into a non-recoverable state when blocked, but ESTALE is
different. I don't yet know whether closing the stale file handle
is possible, so handling this is not pleasant.
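
One way an application could cope with ESTALE is to drop the stale handle and reopen the log by path. The following is a minimal Python sketch of that idea, under several assumptions: the program opens its log file itself (unlike a job whose stdout was redirected by the shell), a fresh lookup of the path succeeds once the server exports the filesystem again, and losing data that was buffered before the error is acceptable. The function name write_with_reopen is hypothetical.

```python
import errno


def write_with_reopen(path, handle, data):
    """Write `data` to `handle`; on ESTALE, reopen `path` and retry once.

    Returns the handle to use for subsequent writes.
    """
    try:
        handle.write(data)
        handle.flush()
        return handle
    except OSError as exc:
        if exc.errno != errno.ESTALE:
            raise
    # The old handle is stale; closing it may fail as well, so ignore that.
    try:
        handle.close()
    except OSError:
        pass
    fresh = open(path, "a")  # a new lookup by path yields a new file handle
    fresh.write(data)
    fresh.flush()
    return fresh
```

The caller keeps the returned handle, for example log = write_with_reopen(log_path, log, line). If the reopen itself fails, the exception propagates to the caller.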

>
> IOW: if you want to move the clients from one server to another,
> then the sequence is
>
> 1) Kill nfsd on _both_ servers
> 2) Move the IP address from server 1 to server 2
> 3) Add the exports for the clients from server 1 to
> /etc/exports on server 2 (and possibly run exportfs)
> 4) Now bring nfsd up on server 2.

I did not exactly move the server, but added additional IP
addresses of a different network to the server.

I cannot remember the exact sequence of things I did, but I added
export entries a few days before the main IP addresses changed,
and I verified that I could mount using both IP addresses of the
server, and I was happy, because it meant that I could possibly do
the change without killing that process on the client.

I used private addresses before, and used another private network
to try it out. I was not aware that a missing export entry makes
the client's file handles stale. I did not experience any problems
like that during testing, probably because I did it right then. But
in the end it is quite possible that I really did miss some export
entries when it counted. I need not tell you.

Thanks a lot,

Bernd Strieder

