2007-03-08 12:42:37

by Bernhard Busch

[permalink] [raw]
Subject: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

Hi



I tried to mount 2000 filesystems from a
Sun Solaris file server with the ZFS filesystem
via NFS on Linux clients.

Reason: Sun Solaris ZFS is a hierarchical filesystem,
with a separate filesystem per user
with quota support. We have approximately
2000 user and pool folders.


It is possible to export these 2000 filesystems on the server (Sun Solaris)
and to mount these 2000 NFS filesystems on SGI and Solaris
clients without any problems.


On Linux clients (SLES10, SUSE 10.2) I get
error messages like the following ones:

nfs bindresvport: Address already in use
nfs bindresvport: Address already in use
nfs bindresvport: Address already in use
mount: solaris10-02:/fs/DISK/disk1998: can't read superblock
mount: solaris10-02:/fs/DISK/disk1999: can't read superblock
mount: solaris10-02:/fs/DISK/disk2000: can't read superblock



Any help?


Bernhard

--
Dr. Bernhard Busch
Max-Planck-Institut für Biochemie
Rechenzentrum
Am Klopferspitz 18a
D-82152 Martinsried
Tel: +49(89)8578-2582
Fax: +49(89)8578-2479
Email [email protected]



-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2007-03-13 13:14:25

by Steve Dickson

[permalink] [raw]
Subject: Re: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

Olaf Kirch wrote:
> On Friday 09 March 2007 13:16, Bernhard Busch wrote:
>> But , if i remove the sleep command the
>>
>> nfs bindresvport: Address already in use
>
> This is a message from the mount command, and it's really
> a problem in the RPC library. At some point, mount would
> use 2 ports per mount (one when doing pmap_getport, the
> other when talking to the server's mountd). I think the getport
> call was fixed a while ago, as it doesn't really need a privport
> at all. But for many NFS servers, a privileged port is a must
> when talking to mountd.
True... there was a bug in the glibc code that was causing
pmap port queries to use privileged ports, and the entire
privileged port range was not being used... both were fixed
a while back...

>
> I think one reasonable fix for this would be to make mount
> (or the rpc library) issue a setsockopt(SOL_SOCKET, SO_REUSEADDR)
> *after* it's done with the request, and before closing the socket. That way,
> we can immediately rebind to this port, without risking confusion by having
> two mount commands bind to the same port at the same time.
I looked into doing this and it got messy really quickly... Remember,
SO_REUSEADDR is basically a server option used for listening
sockets... so when you use this option on the client, it works
but puts the socket in a very weird state... something just
looked wrong...

I'm of the opinion the true answer is: stop using privileged ports
altogether. The notion that using privileged ports gives any
type of security is a bit absurd... imho... Especially now that we
have true security with the -o sec=<whatever> option...

So I'm thinking we should start allowing the actual NFS mount/traffic
to exist on any port and only keep the privileged port silliness
for mountd queries... something that could actually be done over
UDP (assuming there are no firewall issues)....
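For illustration of that point, a GSS-secured mount requests real per-request security and does not depend on the client binding a low source port (the Kerberos realm/keytab setup and the server's export configuration are assumed and not shown; the paths are the ones from this thread):

```shell
# Kerberos-authenticated NFS mount: authentication and integrity come
# from RPCSEC_GSS (sec=krb5), not from binding a port below 1024.
mount -t nfs -o sec=krb5,tcp solaris10-02:/fs/DISK/disk1 /mnt/disk1
```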

steved.



2007-03-08 13:08:10

by Talpey, Thomas

[permalink] [raw]
Subject: Re: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

At 07:42 AM 3/8/2007, Bernhard Busch wrote:
>It is possible to export these 2000 filesystems on the server (sun solaris)
>and to mount these 2000 nfs filesystem on sgi and solaris
>clients without any problems.
>
>
>On Linux Clients (SLES10, Suse10.2) i get
>error messages like the following ones:
>
>nfs bindresvport: Address already in use
>nfs bindresvport: Address already in use
>nfs bindresvport: Address already in use
>mount: solaris10-02:/fs/DISK/disk1998: can't read superblock
>mount: solaris10-02:/fs/DISK/disk1999: can't read superblock
>mount: solaris10-02:/fs/DISK/disk2000: can't read superblock

You need to increase the number of ports available on the Linux NFS
client.

echo 65535 >/proc/sys/sunrpc/max_resvport

This will raise it to the maximum. You could use smaller values, but
because you are creating so many mounts, they would quite possibly
start to collide with other reserved ports.

In fact, you might want to set /proc/sys/sunrpc/min_resvport to
something large (e.g. 32768), also in order to avoid collisions. However,
that in turn might require reconfiguring some of your NFS servers to
accept "nonprivileged" ports (on a Linux server, the export option
"insecure"; for others, see the documentation).
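Applying both tunables before starting the mounts might look like this (values as suggested above; requires root):

```shell
# Widen the RPC client's reserved-port range: many more ports for the
# concurrent mounts, and a floor well above the classic <1024 range to
# avoid colliding with other reserved-port users. Note the server may
# then need to accept "nonprivileged" source ports.
echo 32768 >/proc/sys/sunrpc/min_resvport
echo 65535 >/proc/sys/sunrpc/max_resvport
```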

Tom.



2007-03-09 12:16:24

by Bernhard Busch

[permalink] [raw]
Subject: Re: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

Talpey, Thomas wrote:
Hello Tom


Thank you very much for your help.
I was able to mount the 2000 NFS filesystems after your modifications
via:

for i in `seq 1 2000`
do
  mount -t nfs -o intr,hard,tcp solaris10-02:/fs/DISK/disk$i /fs/solaris10-02/DISK/disk$i
  sleep 1
done


But if I remove the sleep command, the

nfs bindresvport: Address already in use

error appears again.

On Solaris and SGI clients the above command works correctly without the
sleep command.

So the machine takes about 1 hour to mount all NFS filesystems.

Any idea?

Best wishes

Bernhard




> At 07:42 AM 3/8/2007, Bernhard Busch wrote:
>
>> It is possible to export these 2000 filesystems on the server (sun solaris)
>> and to mount these 2000 nfs filesystem on sgi and solaris
>> clients without any problems.
>>
>>
>> On Linux Clients (SLES10, Suse10.2) i get
>> error messages like the following ones:
>>
>> nfs bindresvport: Address already in use
>> nfs bindresvport: Address already in use
>> nfs bindresvport: Address already in use
>> mount: solaris10-02:/fs/DISK/disk1998: can't read superblock
>> mount: solaris10-02:/fs/DISK/disk1999: can't read superblock
>> mount: solaris10-02:/fs/DISK/disk2000: can't read superblock
>>
>
> You need to increase the number of ports available on the Linux NFS
> client.
>
> echo 65535 >/proc/sys/sunrpc/max_resvport
>
> This will raise it to the maximum, you could use smaller values but
> because you are creating so many mounts, that would quite possibly
> start to collide with other reserved ports.
>
> In fact, you might want to set /proc/sys/sunrpc/min_resvport to
> something large (32768), also in order to avoid collisions. However
> that in turn might require reconfiguring some of your NFS servers to
> accept "nonprivileged" ports (Linux server export option "insecure",
> others see documentation).
>
> Tom.
>
>
>


--
Dr. Bernhard Busch
Max-Planck-Institut für Biochemie
Rechenzentrum
Am Klopferspitz 18a
D-82152 Martinsried
Tel: +49(89)8578-2582
Fax: +49(89)8578-2479
Email [email protected]




2007-03-09 13:09:27

by Talpey, Thomas

[permalink] [raw]
Subject: Re: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

Well, I know what's happening but I'm having a little difficulty coming
up with a clean workaround.

The mount command creates and binds a socket in userspace, then
passes it to the kernel for connecting to the server. Newer kernels
don't use this socket, however, and it is closed when the mount
completes. But since it's a TCP socket bound to a privileged
port, the socket isn't destroyed immediately; it sticks around for
a short time.

The problem is that you are creating two thousand of these, and
the mounts are completing faster than these sockets are destroyed.
It's the mount command which is failing now, not the actual mount.

One solution may be to lower the maximum number of system TCP
orphans, which would prevent too many of these from clogging up
the TCP port space:

echo 100 >/proc/sys/net/ipv4/tcp_max_orphans

This is a really crude fix, though: the default is 131072, and it is
not good for TCP correctness to destroy endpoints so readily.
So I'd try saving and restoring this value around the mounts.
Maybe someone here can come up with a better workaround.
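A sketch of that save-and-restore idea, wrapped around the mount loop from earlier in the thread (run as root; the mount targets are the ones from the original script):

```shell
# Temporarily clamp the TCP orphan limit so closed mount sockets are
# reaped quickly, then restore the original value afterwards.
saved=$(cat /proc/sys/net/ipv4/tcp_max_orphans)
echo 100 >/proc/sys/net/ipv4/tcp_max_orphans

for i in `seq 1 2000`
do
  mount -t nfs -o intr,hard,tcp solaris10-02:/fs/DISK/disk$i /fs/solaris10-02/DISK/disk$i
done

echo $saved >/proc/sys/net/ipv4/tcp_max_orphans
```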

I think the better solution for you would be to think through how
you could use fewer mounts, maybe demand-mounting using an
automounter, or some other scheme. 2000+ mounts per client is
really way over the top for normal use in most installations. Imagine
the traffic at boot time, or after a network outage, just to mount them
all. And if you have more than one client, the server sees Nx2000
connections!
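For the automounter route, a wildcard autofs map could cover all 2000 exports with a single entry, mounting each filesystem on first access and expiring it when idle (the map file name and timeout here are illustrative, not from the thread):

```
# /etc/auto.master (illustrative): hand the DISK directory to autofs
/fs/solaris10-02/DISK  /etc/auto.disk  --timeout=600

# /etc/auto.disk: "&" is replaced by the key looked up (disk1 .. disk2000)
*  -intr,hard,tcp  solaris10-02:/fs/DISK/&
```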

I hope this helps.

Tom.

At 07:16 AM 3/9/2007, Bernhard Busch wrote:
>Talpey, Thomas wrote:
>
>
>
>
>Hello Tom
>
>
>Thank you very much for your help.
>I was able to mount the 2000 NFS files after your modifications
>via:
>
>for i in `seq 1 2000`
>do
>  mount -t nfs -o intr,hard,tcp solaris10-02:/fs/DISK/disk$i /fs/solaris10-02/DISK/disk$i
>  sleep 1
>done
>
>
>But , if i remove the sleep command the
>
>nfs bindresvport: Address already in use
>
>error appears again.
>
>On solaris and sgi clients the above command works correctly without the
>sleep command.
>
>So the machine takes about 1 hour to mount all nfs filesystems.
>
>Any idea?
>
>Best wishes
>
>Bernhard
>
>
>
>
>> At 07:42 AM 3/8/2007, Bernhard Busch wrote:
>>
>>> It is possible to export these 2000 filesystems on the server (sun solaris)
>>> and to mount these 2000 nfs filesystem on sgi and solaris
>>> clients without any problems.
>>>
>>>
>>> On Linux Clients (SLES10, Suse10.2) i get
>>> error messages like the following ones:
>>>
>>> nfs bindresvport: Address already in use
>>> nfs bindresvport: Address already in use
>>> nfs bindresvport: Address already in use
>>> mount: solaris10-02:/fs/DISK/disk1998: can't read superblock
>>> mount: solaris10-02:/fs/DISK/disk1999: can't read superblock
>>> mount: solaris10-02:/fs/DISK/disk2000: can't read superblock
>>>
>>
>> You need to increase the number of ports available on the Linux NFS
>> client.
>>
>> echo 65535 >/proc/sys/sunrpc/max_resvport
>>
>> This will raise it to the maximum, you could use smaller values but
>> because you are creating so many mounts, that would quite possibly
>> start to collide with other reserved ports.
>>
>> In fact, you might want to set /proc/sys/sunrpc/min_resvport to
>> something large (32768), also in order to avoid collisions. However
>> that in turn might require reconfiguring some of your NFS servers to
>> accept "nonprivileged" ports (Linux server export option "insecure",
>> others see documentation).
>>
>> Tom.
>>
>>
>>
>>
>
>
>--
>Dr. Bernhard Busch
>Max-Planck-Institut für Biochemie
>Rechenzentrum
>Am Klopferspitz 18a
>D-82152 Martinsried
>Tel: +49(89)8578-2582
>Fax: +49(89)8578-2479
>Email [email protected]
>
>
>



2007-03-09 14:28:40

by Olaf Kirch

[permalink] [raw]
Subject: Re: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

On Friday 09 March 2007 13:16, Bernhard Busch wrote:
> But , if i remove the sleep command the
>
> nfs bindresvport: Address already in use

This is a message from the mount command, and it's really
a problem in the RPC library. At some point, mount would
use 2 ports per mount (one when doing pmap_getport, the
other when talking to the server's mountd). I think the getport
call was fixed a while ago, as it doesn't really need a privport
at all. But for many NFS servers, a privileged port is a must
when talking to mountd.

I think one reasonable fix for this would be to make mount
(or the rpc library) issue a setsockopt(SOL_SOCKET, SO_REUSEADDR)
*after* it's done with the request, and before closing the socket. That way,
we can immediately rebind to this port, without risking confusion by having
two mount commands bind to the same port at the same time.

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax


2007-03-09 14:48:04

by Olaf Kirch

[permalink] [raw]
Subject: Re: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

On Friday 09 March 2007 14:09, Talpey, Thomas wrote:
> The mount command creates and binds a socket in userspace, then
> passes it to the kernel for connecting to the server. Newer kernels
> don't use this socket however, and it is closed when the mount
> completes. However, it's a TCP socket and is bound to a privileged
> port, the socket isn't destroyed immediately, it sticks around for
> a short time.

Why do we keep creating those at all?

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax


2007-03-09 15:05:33

by Talpey, Thomas

[permalink] [raw]
Subject: Re: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

At 09:46 AM 3/9/2007, Olaf Kirch wrote:
>On Friday 09 March 2007 14:09, Talpey, Thomas wrote:
>> The mount command creates and binds a socket in userspace, then
>> passes it to the kernel for connecting to the server. Newer kernels
>> don't use this socket however, and it is closed when the mount
>> completes. However, it's a TCP socket and is bound to a privileged
>> port, the socket isn't destroyed immediately, it sticks around for
>> a short time.
>
>Why do we keep creating those at all?

Good question! There's already some checking for kernel version in that
code to handle ancient-kernel compat; it could certainly be extended
to cover this, or tossed altogether. It certainly makes no sense for IPv6 or
RDMA or ...

Tom.



2007-03-12 12:35:52

by Talpey, Thomas

[permalink] [raw]
Subject: Re: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

At 11:04 AM 3/9/2007, Talpey, Thomas wrote:
>At 09:46 AM 3/9/2007, Olaf Kirch wrote:
>>On Friday 09 March 2007 14:09, Talpey, Thomas wrote:
>>> The mount command creates and binds a socket in userspace, then
>>> passes it to the kernel for connecting to the server. Newer kernels
>>> don't use this socket however, and it is closed when the mount
>>> completes. However, it's a TCP socket and is bound to a privileged
>>> port, the socket isn't destroyed immediately, it sticks around for
>>> a short time.
>>
>>Why do we keep creating those at all?
>
>Good question! There's already some checking for kernel version in that
>code to handle ancient-kernel compat, it certainly could be extended
>to this. Or tossed altogether. It certainly makes no sense for IPv6 or
>RDMA or ...

I was going to volunteer a patch for this, but when I went and dug up
the last kernel to use the nfs_mount_data->fd, I discovered it was
2.1.31 (two dot *one* dot thirty-one), released on April 3, 1997.

For its ten-year anniversary, is the better approach to delete this
support from nfs-utils altogether? It means that nfs-utils >1.0.12
won't work on early-to-mid 2.1 kernels, which does not strike me
as a likely issue.

Or, would everyone prefer that the compatibility be retained?

I'll send the latter, compatible patch if nobody has an opinion.

Tom.



2007-03-12 16:57:44

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFS mount problem (2000 NFS filesystems) of linux clients to a solaris server

On Mon, 2007-03-12 at 08:35 -0400, Talpey, Thomas wrote:
> For its ten-year anniversary, is the better approach to delete this
> support from nfs-utils altogether? It means that nfs-utils >1.0.12
> won't work on early-to-mid 2.1 kernels. Which does not strike me
> as a likely issue.

The mount command _knows_ which kernel it is working on (and uses that
information to determine what revision of the mount structure to fill
in).
Why not just leave the fd argument blank for all kernels >= 2.2.0?

Cheers,
Trond

