2006-10-16 07:13:05

by Mohit Katiyar

Subject: Re: NFS inconsistent behaviour

Hi all,

I am currently using the 2.6.16.21-0.8-smp kernel on the SLES10 distribution.
I have two machines, Machine1 and Machine2, both 4-processor SMP
machines.

The /etc/fstab file on both machines is as follows:

/dev/sda3 / reiserfs acl,user_xattr 1 1
/dev/sda4 /home reiserfs acl,user_xattr 1 2
/dev/sda2 swap swap defaults 0 0
proc /proc proc defaults 0 0
sysfs /sys sysfs noauto 0 0
debugfs /sys/kernel/debug debugfs noauto 0 0
usbfs /proc/bus/usb usbfs noauto 0 0
devpts /dev/pts devpts mode=0620,gid=5 0 0

## NFS
##
server1k:/export/server1/s25vol1 /mnt/nfs1 nfs hard,intr,tcp
server2k:/export/server2/s25vol2 /mnt/nfs2 nfs hard,intr,tcp
server2k:/export/server2/s25_2sp /mnt/nfs3 nfs hard,intr,tcp

Now on Machine1, when I repeatedly mount and unmount the NFS
partitions with the following loop:

[Machine1:] while :; do mount -a -F -t nfs ;umount -a -t nfs ; done

The partitions are mounted and unmounted two or three times, and then the
process comes to a halt. After some time mounting/unmounting stops and the
following message keeps appearing:

mount: RPC: Remote system error - Connection timed out
mount: RPC: Remote system error - Connection timed out
mount: RPC: Remote system error - Connection timed out
...

On Machine2, when I run the same mount/unmount loop, the following
messages start to appear after about 30 seconds:

mount: server2k:/export/server2/s25_2sp: can't read superblock
mount: server1k:/export/server1/s25vol1: can't read superblock
mount: server2k:/export/server2/s25vol2: can't read superblock
mount: server2k:/export/server2/s25_2sp: can't read superblock
mount: server1k:/export/server1/s25vol1: can't read superblock

...

Also, several lockd daemons are created:

> $ ps -ef | grep lockd
> root 21 15 0 13:51 ? 00:00:00 [kblockd/0]
> root 22 15 0 13:51 ? 00:00:00 [kblockd/1]
> root 23 15 0 13:51 ? 00:00:00 [kblockd/2]
> root 24 15 0 13:51 ? 00:00:00 [kblockd/3]
> root 11715 1 0 13:58 ? 00:00:00 [lockd]
> root 11762 1 0 13:58 ? 00:00:00 [lockd]
> root 11765 1 0 13:58 ? 00:00:00 [lockd]
> root 11800 1 0 13:58 ? 00:00:00 [lockd]
> root 11801 1 0 13:58 ? 00:00:00 [lockd]
> root 11811 1 0 13:58 ? 00:00:00 [lockd]
> root 11823 1 0 13:58 ? 00:00:00 [lockd]
> root 11825 1 0 13:58 ? 00:00:00 [lockd]
> root 11834 1 0 13:58 ? 00:00:00 [lockd]
> root 11835 1 0 13:58 ? 00:00:00 [lockd]
> root 11839 1 0 13:58 ? 00:00:00 [lockd]
> root 11859 1 0 13:58 ? 00:00:00 [lockd]
> root 11860 1 0 13:58 ? 00:00:00 [lockd]
> root 11870 1 0 13:58 ? 00:00:00 [lockd]
> root 11879 1 0 13:58 ? 00:00:00 [lockd]
> root 11883 1 0 13:58 ? 00:00:00 [lockd]
> root 11884 1 0 13:58 ? 00:00:00 [lockd]
> root 11894 1 0 13:59 ? 00:00:00 [lockd]
> root 11913 1 0 13:59 ? 00:00:00 [lockd]
> root 11915 1 0 13:59 ? 00:00:00 [lockd]
> root 11927 1 0 13:59 ? 00:00:00 [lockd]
> root 11928 1 0 13:59 ? 00:00:00 [lockd]
> root 11929 1 0 13:59 ? 00:00:00 [lockd]
> root 11940 1 0 13:59 ? 00:00:00 [lockd]
> root 11953 1 0 13:59 ? 00:00:00 [lockd]
> root 11963 1 0 13:59 ? 00:00:00 [lockd]
> root 11964 1 0 13:59 ? 00:00:00 [lockd]
> root 11965 1 0 13:59 ? 00:00:00 [lockd]
> root 11974 1 0 13:59 ? 00:00:00 [lockd]
> root 11978 1 0 13:59 ? 00:00:00 [lockd]
> root 11979 1 0 13:59 ? 00:00:00 [lockd]
> root 11988 1 0 13:59 ? 00:00:00 [lockd]
> root 11989 1 0 13:59 ? 00:00:00 [lockd]
> root 12264 1 0 13:59 ? 00:00:00 [lockd]
> root 12268 1 0 13:59 ? 00:00:00 [lockd]
> root 12488 1 0 13:59 ? 00:00:00 [lockd]
> root 12489 1 0 13:59 ? 00:00:00 [lockd]
> root 12490 1 0 13:59 ? 00:00:00 [lockd]
> root 12500 1 0 13:59 ? 00:00:00 [lockd]
> nfs 12905 10345 0 13:59 pts/1 00:00:00 grep lockd

There are no messages in /var/log/messages.

If I mount and unmount the partitions manually, without a loop, everything
works fine: I am able to mount and unmount them without any
problem. But whenever this is put in an infinite loop, it goes into an
inconsistent state.

Is this some kind of bug?

Has anyone faced the same problem, or can anyone help me work out
what is going wrong and where?

TIA

Mohit Katiyar

HCL Technologies Ltd.


2006-10-16 08:46:58

by Frank van Maarseveen

Subject: Re: NFS inconsistent behaviour

On Mon, Oct 16, 2006 at 04:13:00PM +0900, Mohit Katiyar wrote:
[...]
>
> [Machine1:] while :; do mount -a -F -t nfs ;umount -a -t nfs ; done
>

This will quickly run out of [privileged] TCP sockets unless mount and
nfs use UDP.

Try mounting with -o udp

--
Frank

2006-10-16 09:35:26

by Mohit Katiyar

Subject: Re: NFS inconsistent behaviour

Hi,
But I think unmounting will free the sockets, and I am also unmounting the
partitions in the loop. Also, both machines have the same configuration but
show different behaviour.

Thanks
Mohit

On 10/16/06, Frank van Maarseveen wrote:
> On Mon, Oct 16, 2006 at 04:13:00PM +0900, Mohit Katiyar wrote:
> [...]
> >
> > [Machine1:] while :; do mount -a -F -t nfs ;umount -a -t nfs ; done
> >
>
> This will quickly run out of [privileged] TCP sockets unless mount and
> nfs use UDP.
>
> Try mounting with -o udp
>
> --
> Frank
>

2006-10-16 09:39:08

by Frank van Maarseveen

Subject: Re: NFS inconsistent behaviour

On Mon, Oct 16, 2006 at 06:35:24PM +0900, Mohit Katiyar wrote:
> Hi,
> But I think unmounting will free the sockets.

Try "netstat -t" when the problem occurs. It will probably
show a lot of TCP connections in state TIME_WAIT.
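
A minimal sketch of how that check could be narrowed to the ports that matter here. The helper name count_time_wait_privileged is made up for illustration; it assumes the usual Linux "netstat -tn" column layout (local address in column 4, state in column 6) and counts only TIME_WAIT sockets whose local port is below 1024. The -n flag matters because the port has to be printed as a number rather than a service name.

```sh
# Count TIME_WAIT sockets that are holding a privileged local port.
# Reads "netstat -tn" style output on stdin and prints a single number.
# (Hypothetical helper; the column positions are an assumption about
# the netstat output format.)
count_time_wait_privileged() {
    awk '$6 == "TIME_WAIT" {
             n = split($4, part, ":")        # port is the last ":" field
             if (part[n] + 0 < 1024) count++
         }
         END { print count + 0 }'
}

# Usage, while the mount/umount loop is running:
#   netstat -tn | count_time_wait_privileged
```

If the number climbs towards the size of the reserved port range, the loop is probably running out of privileged ports rather than hitting a server-side problem.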

--
Frank

2006-10-18 01:22:46

by Mohit Katiyar

Subject: Re: NFS inconsistent behaviour

Sorry Frank, I couldn't check your response earlier because the machine was not available.
I checked today, and when I ran netstat -t I could see a lot
of TCP connections in the TIME_WAIT state.
Is this normal behaviour? So we cannot mount and umount indefinitely
with the tcp option? Why are there so many connections in the waiting state?
All these questions come up when something like this happens.
Any help would be great.

Thanks
Mohit

On 10/16/06, Frank van Maarseveen wrote:
> On Mon, Oct 16, 2006 at 06:35:24PM +0900, Mohit Katiyar wrote:
> > Hi,
> > But I think unmounting will free the sockets.
>
> Try "netstat -t", when the problem occurs. It will probably
> show a lot tcp connections in state TIME_WAIT.
>
> --
> Frank
>

2006-10-18 06:39:48

by Frank van Maarseveen

Subject: Re: NFS inconsistent behaviour

On Wed, Oct 18, 2006 at 10:22:44AM +0900, Mohit Katiyar wrote:
> I checked it today and when i issued the netstat -t ,I could see a lot
> of tcp connections in TIME_WAIT state.
> Is this a normal behaviour?

yes... but see below

> So we cannot mount and umount infinitely
> with tcp option? Why there are so many connections in waiting state?

I think it's called the 2MSL wait: there may be TCP segments on the
wire which (in theory) could disrupt new connections which reuse local
and remote port so the ports stay in use for a few minutes. This is
standard TCP behavior but only occurs when connections are improperly
shut down. Apparently this happens when umounting a tcp NFS mount but
also for a lot of other tcp based RPC (showmount, rpcinfo). I'm not
sure who's to blame but it might be the rpc functions inside glibc.
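
As a rough worked example of the arithmetic: assuming each new connection needs its own local port, and every closed connection keeps that port in TIME_WAIT for a fixed time, the sustainable rate is the size of the port range divided by that time. The sketch below uses the 60 second TIME_WAIT that Linux applies and a hypothetical reserved range of 600-1023; the range is an illustration, not a value taken from the machines in this thread, and sustainable_rate is a made-up helper name.

```sh
# Highest connection rate a port range can sustain when every closed
# connection parks its local port in TIME_WAIT for a fixed time.
# $1 = lowest port, $2 = highest port, $3 = TIME_WAIT duration in seconds
sustainable_rate() {
    ports=$(( $2 - $1 + 1 ))
    # Integer division, so the rate is rounded down.
    echo "$ports ports, about $(( ports / $3 )) new connections per second"
}

# Hypothetical range 600-1023 with a 60 second TIME_WAIT
sustainable_rate 600 1023 60
```

A loop that mounts and unmounts three file systems per pass, with several RPC connections per mount, can plausibly exceed a rate like this within a few iterations.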

I'd switch to NFS over udp if this is a problem.

(cc'ed to nfs mailing list)

--
Frank

2006-10-18 17:57:28

by Trond Myklebust

Subject: Re: NFS inconsistent behaviour

On Wed, 2006-10-18 at 08:39 +0200, Frank van Maarseveen wrote:
> On Wed, Oct 18, 2006 at 10:22:44AM +0900, Mohit Katiyar wrote:
> > I checked it today and when i issued the netstat -t ,I could see a lot
> > of tcp connections in TIME_WAIT state.
> > Is this a normal behaviour?
>
> yes... but see below
>
> > So we cannot mount and umount infinitely
> > with tcp option? Why there are so many connections in waiting state?
>
> I think it's called the 2MSL wait: there may be TCP segments on the
> wire which (in theory) could disrupt new connections which reuse local
> and remote port so the ports stay in use for a few minutes. This is
> standard TCP behavior but only occurs when connections are improperly
> shutdown. Apparently this happens when umounting a tcp NFS mount but
> also for a lot of other tcp based RPC (showmount, rpcinfo). I'm not
> sure who's to blame but it might be the rpc functions inside glibc.
>
> I'd switch to NFS over udp if this is problem.

Just out of interest. Why does anyone actually _want_ to keep
mount/umounting to the point where they run out of ports? That is going
to kill performance in all sorts of unhealthy ways, not least by
completely screwing over any caching.

Note also that you _can_ change the range of ports used by the NFS
client itself at least. Just edit /proc/sys/sunrpc/{min,max}_resvport.
On the server side, you can use the 'insecure' option in order to allow
mounts that originate from non-privileged ports (i.e. port >= 1024).
If you are using strong authentication (for instance RPCSEC_GSS/krb5)
then that actually makes a lot of sense, since the only reason for the
privileged port requirement was to disallow unprivileged NFS clients.
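
A minimal sketch of inspecting and changing that range from a shell. The two file names are the ones given above; the helper names and the SUNRPC_DIR variable are made up for illustration, and the variable exists only so the functions can be tried against a scratch directory first. Writing to the real files needs root and the sunrpc module to be loaded, and the order in which the two values may be written could depend on the kernel version.

```sh
# Directory holding the sunrpc tunables; override it for a dry run.
SUNRPC_DIR=${SUNRPC_DIR:-/proc/sys/sunrpc}

# Print the current reserved port range as "min-max".
show_resvport_range() {
    printf '%s-%s\n' "$(cat "$SUNRPC_DIR/min_resvport")" \
                     "$(cat "$SUNRPC_DIR/max_resvport")"
}

# Set a new range: $1 = new minimum, $2 = new maximum.
set_resvport_range() {
    [ "$1" -le "$2" ] || { echo "min must not exceed max" >&2; return 1; }
    echo "$1" > "$SUNRPC_DIR/min_resvport" &&
    echo "$2" > "$SUNRPC_DIR/max_resvport"
}

# Usage (as root):
#   show_resvport_range
#   set_resvport_range 300 1023    # example values only
```

Widening the range only buys more headroom; ports left in TIME_WAIT are still unavailable until the timer expires.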

Cheers,
Trond

2006-10-18 18:37:28

by Trond Myklebust

Subject: Re: NFS inconsistent behaviour

On Wed, 2006-10-18 at 13:57 -0400, Trond Myklebust wrote:
> Note also that you _can_ change the range of ports used by the NFS
> client itself at least. Just edit /proc/sys/sunrpc/{min,max}_resvport.
> On the server side, you can use the 'insecure' option in order to allow
> mounts that originate from non-privileged ports (i.e. port >= 1024).
> If you are using strong authentication (for instance RPCSEC_GSS/krb5)
> then that actually makes a lot of sense, since the only reason for the
> privileged port requirement was to disallow unprivileged NFS clients.

Oops... Something got lost there. That last sentence should read

...since the only reason for the privileged port requirement was
to disallow unprivileged NFS clients that could be used to spoof
other user identities via the weak AUTH_SYS authentication.

Cheers,
Trond

2006-10-18 18:38:10

by Frank van Maarseveen

Subject: Re: NFS inconsistent behaviour

On Wed, Oct 18, 2006 at 01:57:09PM -0400, Trond Myklebust wrote:
> On Wed, 2006-10-18 at 08:39 +0200, Frank van Maarseveen wrote:
> > On Wed, Oct 18, 2006 at 10:22:44AM +0900, Mohit Katiyar wrote:
> > > I checked it today and when i issued the netstat -t ,I could see a lot
> > > of tcp connections in TIME_WAIT state.
> > > Is this a normal behaviour?
> >
> > yes... but see below
> >
> > > So we cannot mount and umount infinitely
> > > with tcp option? Why there are so many connections in waiting state?
> >
> > I think it's called the 2MSL wait: there may be TCP segments on the
> > wire which (in theory) could disrupt new connections which reuse local
> > and remote port so the ports stay in use for a few minutes. This is
> > standard TCP behavior but only occurs when connections are improperly
> > shutdown. Apparently this happens when umounting a tcp NFS mount but
> > also for a lot of other tcp based RPC (showmount, rpcinfo). I'm not
> > sure who's to blame but it might be the rpc functions inside glibc.
> >
> > I'd switch to NFS over udp if this is problem.
>
> Just out of interest. Why does anyone actually _want_ to keep
> mount/umounting to the point where they run out of ports? That is going
> to kill performance in all sorts of unhealthy ways, not least by
> completely screwing over any caching.

I ran out of privileged ports due to treemounting on /net from about 50
servers. The autofs program map for this uses the "showmount" command and
that one apparently uses privileged ports too (buried inside RPC client
libs part of glibc IIRC). The combination broke autofs and a number of
other services because there were no privileged ports left anymore.

So it can happen in practice.

> Note also that you _can_ change the range of ports used by the NFS
> client itself at least. Just edit /proc/sys/sunrpc/{min,max}_resvport.
> On the server side, you can use the 'insecure' option in order to allow
> mounts that originate from non-privileged ports (i.e. port >= 1024).

Increasing the privileged port range in the kernel might be doable in
some cases. It might be useful to extend it to include port 2049 too.

--
Frank

2006-10-18 19:26:35

by Trond Myklebust

Subject: Re: NFS inconsistent behaviour

On Wed, 2006-10-18 at 20:38 +0200, Frank van Maarseveen wrote:
> I ran out of privileged ports due to treemounting on /net from about 50
> servers. The autofs program map for this uses the "showmount" command and
> that one apparently uses privileged ports too (buried inside RPC client
> libs part of glibc IIRC). The combination broke autofs and a number of
> other services because there were no privileged ports left anymore.

Yeah. The RPC library appears to always try to grab a privileged port if
it can. One solution would be to have the autofs scripts drop all
privileges before calling showmount.

I suppose we could also change the showmount program to create a socket
that is bound to an unprivileged port, then use
clnttcp_create()/clntudp_create().

We could probably do the same in the "mount" program when doing things
like interrogating the portmapper, probing for rpc ports etc. The only
case where mount might actually need to use a privileged port is when
talking to mountd. Even then, it could be trained to first try using an
unprivileged port.

Cheers,
Trond

2006-10-18 20:09:43

by Frank van Maarseveen

Subject: Re: [NFS] NFS inconsistent behaviour

On Wed, Oct 18, 2006 at 03:26:20PM -0400, Trond Myklebust wrote:
> On Wed, 2006-10-18 at 20:38 +0200, Frank van Maarseveen wrote:
> > I ran out of privileged ports due to treemounting on /net from about 50
> > servers. The autofs program map for this uses the "showmount" command and
> > that one apparently uses privileged ports too (buried inside RPC client
> > libs part of glibc IIRC). The combination broke autofs and a number of
> > other services because there were no privileged ports left anymore.
>
> Yeah. The RPC library appears to always try to grab a privileged port if
> it can. One solution would be to have the autofs scripts drop all
> privileges before calling showmount.
>
> I suppose we could also change the showmount program to create a socket
> that is bound to an unprivileged port, then use
> clnttcp_create()/clntudp_create().
>
> We could probably do the same in the "mount" program when doing things
> like interrogating the portmapper, probing for rpc ports etc. The only
> case where mount might actually need to use a privileged port is when
> talking to mountd. Even then, it could be trained to first try using an
> unprivileged port.

If we could fix why there are that many connections in state TIME_WAIT
then using privileged ports would not be a problem either.

--
Frank

2006-10-18 20:17:48

by Chuck Lever

Subject: Re: [NFS] NFS inconsistent behaviour

On 10/18/06, Frank van Maarseveen wrote:
> On Wed, Oct 18, 2006 at 03:26:20PM -0400, Trond Myklebust wrote:
> > On Wed, 2006-10-18 at 20:38 +0200, Frank van Maarseveen wrote:
> > > I ran out of privileged ports due to treemounting on /net from about 50
> > > servers. The autofs program map for this uses the "showmount" command and
> > > that one apparently uses privileged ports too (buried inside RPC client
> > > libs part of glibc IIRC). The combination broke autofs and a number of
> > > other services because there were no privileged ports left anymore.
> >
> > Yeah. The RPC library appears to always try to grab a privileged port if
> > it can. One solution would be to have the autofs scripts drop all
> > privileges before calling showmount.
> >
> > I suppose we could also change the showmount program to create a socket
> > that is bound to an unprivileged port, then use
> > clnttcp_create()/clntudp_create().
> >
> > We could probably do the same in the "mount" program when doing things
> > like interrogating the portmapper, probing for rpc ports etc. The only
> > case where mount might actually need to use a privileged port is when
> > talking to mountd. Even then, it could be trained to first try using an
> > unprivileged port.
>
> If we could fix why there are that many connections in state TIME_WAIT
> then using privileged ports would not be a problem either.

Some discussion on both FreeBSD and Linux mailing lists suggests that
ignoring TIME_WAIT has some risk to it, so that may not be an
advisable path to take. However, there are probably some cases where
it is safe, such as idle timeouts, where the client is certain there
is no traffic in flight.

Both client implementations (kernel and glibc) should re-use port
numbers or connections aggressively. To that end, the kernel RPC
client is already doing this. I know Red Hat has suggested using a
connection manager for user-level RPC applications to share. In
addition the kernel NFS client is sharing connections to a server
between all mount points going to that server.

--
"We who cut mere stones must always be envisioning cathedrals"
-- Quarry worker's creed

2006-10-18 20:44:53

by Trond Myklebust

Subject: Re: [NFS] NFS inconsistent behaviour

On Wed, 2006-10-18 at 16:17 -0400, Chuck Lever wrote:
> Both client implementations (kernel and glibc) should re-use port
> numbers or connections aggressively. To that end, the kernel RPC
> client is already doing this. I know Red Hat has suggested using a
> connection manager for user-level RPC applications to share. In
> addition the kernel NFS client is sharing connections to a server
> between all mount points going to that server.

IIRC, Mike Waychison did some work a couple of years ago on a userspace
daemon that managed RPC connections.

Cheers,
Trond

2006-10-19 01:53:45

by Mohit Katiyar

Subject: Re: NFS inconsistent behaviour

Yes, I do not want to mount and unmount indefinitely; I was just checking
out of curiosity. But mounting/unmounting in an endless loop works completely
fine on SLES 9, which uses the 2.6.5 kernel. I was just wondering what has
changed so that it no longer works.

On 10/19/06, Trond Myklebust wrote:
> On Wed, 2006-10-18 at 08:39 +0200, Frank van Maarseveen wrote:
> > On Wed, Oct 18, 2006 at 10:22:44AM +0900, Mohit Katiyar wrote:
> > > I checked it today and when i issued the netstat -t ,I could see a lot
> > > of tcp connections in TIME_WAIT state.
> > > Is this a normal behaviour?
> >
> > yes... but see below
> >
> > > So we cannot mount and umount infinitely
> > > with tcp option? Why there are so many connections in waiting state?
> >
> > I think it's called the 2MSL wait: there may be TCP segments on the
> > wire which (in theory) could disrupt new connections which reuse local
> > and remote port so the ports stay in use for a few minutes. This is
> > standard TCP behavior but only occurs when connections are improperly
> > shutdown. Apparently this happens when umounting a tcp NFS mount but
> > also for a lot of other tcp based RPC (showmount, rpcinfo). I'm not
> > sure who's to blame but it might be the rpc functions inside glibc.
> >
> > I'd switch to NFS over udp if this is problem.
>
> Just out of interest. Why does anyone actually _want_ to keep
> mount/umounting to the point where they run out of ports? That is going
> to kill performance in all sorts of unhealthy ways, not least by
> completely screwing over any caching.
>
> Note also that you _can_ change the range of ports used by the NFS
> client itself at least. Just edit /proc/sys/sunrpc/{min,max}_resvport.
> On the server side, you can use the 'insecure' option in order to allow
> mounts that originate from non-privileged ports (i.e. port >= 1024).
> If you are using strong authentication (for instance RPCSEC_GSS/krb5)
> then that actually makes a lot of sense, since the only reason for the
> privileged port requirement was to disallow unprivileged NFS clients.
>
> Cheers,
> Trond
>
>

2006-10-19 12:08:50

by Alan

Subject: Re: [NFS] NFS inconsistent behaviour

On Wed, 2006-10-18 at 16:17 -0400, Chuck Lever wrote:
> Some discussion on both FreeBSD and Linux mailing lists suggests that
> ignoring TIME_WAIT has some risk to it, so that may not be an

Ignoring time wait leads to corrupted sessions and can lead to tcp food
fights. It exists for a reason although the protocol itself actually
does still have flaws in this area (which are kept in the locked
cupboard full of skeletons at the IETF 8) )

Alan