2004-12-08 08:58:56

by Olaf Kirch

Subject: xprt_bindresvport

Hi,

the current xprt_bindresvport implementation will search for a privileged
port by counting down from 800 to 0. I think this is a bug, because it
will potentially interfere with services trying to bind to low ports as
well. The bindresvport implementation in glibc picks from the 600-1023
range.

I also think it would be good to start at a "random" port. Otherwise,
when you reboot, the server may still have a TCB for the old connection
and send you an ACK probe when you try to connect (if all goes well), and
the client's TCP stack will RST and fail the connect. If things go
not-so-well you have a packet filter somewhere in between that eats the
ACK probe because its connection tracking engine thinks the connection
is half-open and shouldn't see any SYN-less ACKs yet.
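
A rough sketch of the proposal above, for illustration only: walk the
600-1023 range once, starting from a random offset and wrapping around.
The names candidate_ports and bind_reserved are made up for this example
and are not the kernel's xprt_bindresvport code; the 600 lower bound is
taken from the glibc behaviour described above.

```python
import random

RESVPORT_MIN = 600   # assumed lower bound, following the glibc range above
RESVPORT_MAX = 1023  # highest privileged port


def candidate_ports(low=RESVPORT_MIN, high=RESVPORT_MAX, rng=random):
    """Yield every port in [low, high] exactly once, from a random start."""
    span = high - low + 1
    start = rng.randrange(span)
    for i in range(span):
        yield low + (start + i) % span


def bind_reserved(sock, host="", **kwargs):
    """Try each candidate until bind() succeeds; return the port used."""
    for port in candidate_ports(**kwargs):
        try:
            sock.bind((host, port))
            return port
        except OSError:
            continue  # port in use or not permitted: try the next one
    raise OSError("no free port in the requested range")
```

Note that the 600-1023 range holds 424 ports, against roughly 800 for the
current count-down-from-800 scan.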

Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
[email protected] | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-12-08 14:33:25

by Lever, Charles

Subject: RE: xprt_bindresvport

> the current xprt_bindresvport implementation will search for a privileged
> port by counting down from 800 to 0. I think this is a bug, because it
> will potentially interfere with services trying to bind to low ports as
> well.

is this idle speculation, or do you actually have a test case that
fails? :^)

> The bindresvport implementation in glibc picks from the 600-1023
> range.

we should review what other RPC implementations do (namely the reference
implementation, Solaris).

but also notice this cuts the usable port range in half (from ~800 to
~420). we need some form of mitigation to ensure we aren't limiting the
number of NFS mounts a client can have.

> I also think it would be good to start at a "random" port. Otherwise,
> when you reboot, the server may still have a TCB for the old connection
> and send you an ACK probe when you try to connect (if all goes well), and
> the client's TCP stack will RST and fail the connect. If things go
> not-so-well you have a packet filter somewhere in between that eats the
> ACK probe because its connection tracking engine thinks the connection
> is half-open and shouldn't see any SYN-less ACKs yet.

i don't agree. you're just making the bad behavior more rare by
choosing ports at random; you are not actually addressing the root
problem you describe. is it possible to fix the RPC client to retry the
connection in this case? and/or fix the server to recognize and remove
the context for the old connection?

introducing randomness here will make reproducing problems in this area
very difficult.



2004-12-08 18:18:42

by Mike Waychison

Subject: [PATCH] xprt sharing (was Re: xprt_bindresvport)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lever, Charles wrote:
>>the current xprt_bindresvport implementation will search for a privileged
>>port by counting down from 800 to 0. I think this is a bug, because it
>>will potentially interfere with services trying to bind to low ports as
>>well.
>
>
> is this idle speculation, or do you actually have a test case that
> fails? :^)
>

Well, I haven't seen this 'interfere' with services yet, but I can easily
imagine a service wanting to grab some port at a later time only to find
it in use by nfs.

>
>>The bindresvport implementation in glibc picks from the 600-1023
>>range.
>
>
> we should review what other RPC implementations do (namely the reference
> implementation, Solaris).
>
> but also notice this cuts the usable port range in half (from ~800 to
> ~420). we need some form of mitigation to ensure we aren't limiting the
> number of NFS mounts a client can have.


This has been bugging me for a while: the fact that we are limiting
ourselves to a single nfs mount per port. From what I can tell, Solaris
shares the transports between nfs mounts from the same server and saves
itself a lot of trouble with running out of port numbers in doing so.

The attached patch does the same for Linux against 2.6.9. We share
xprts from existing connections, effectively removing any limit on the
number of nfs mounts we have in the system.
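
To illustrate the idea (this is not the attached patch, whose contents are
not shown here), a minimal sketch of sharing: mounts look up a transport by
server and take a reference, so only the first mount from a given server
consumes a port. Transport and TransportRegistry are hypothetical names
invented for this example.

```python
class Transport:
    """Stand-in for an RPC transport: one socket, one reserved port."""

    def __init__(self, server):
        self.server = server
        self.refcount = 0


class TransportRegistry:
    """Hand out one shared transport per server, reference counted."""

    def __init__(self):
        self._by_server = {}

    def get(self, server):
        xprt = self._by_server.get(server)
        if xprt is None:
            # First mount from this server: this is where a port is used.
            xprt = self._by_server[server] = Transport(server)
        xprt.refcount += 1
        return xprt

    def put(self, xprt):
        xprt.refcount -= 1
        if xprt.refcount == 0:
            # Last mount gone: the transport (and its port) can be released.
            del self._by_server[xprt.server]
```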

The only thing to worry about now is any talking to the portmapper or
mountd from userspace using tcp, which will put the reserved ports in
TIME_WAIT state. This can limit the 'speed' at which we mount many mounts.

- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBt0VQdQs4kOxk3/MRAjo8AKCJjoqPxk1R3ev+o+UyquE0kiw0LQCfebi8
6mfhMSHLidFslVyt6jFeNis=
=0uCI
-----END PGP SIGNATURE-----


Attachments:
xprt_sharing.diff (5.19 kB)

2004-12-08 19:08:31

by Lever, Charles

Subject: RE: [PATCH] xprt sharing (was Re: xprt_bindresvport)

> > but also notice this cuts the usable port range in half (from ~800 to
> > ~420). we need some form of mitigation to ensure we aren't limiting the
> > number of NFS mounts a client can have.
>
> This has been bugging me for a while. The fact that we are limiting
> ourselves to a single nfs mount per port. From what I can tell, Solaris
> shares the transports between nfs mounts from the same server and saves
> themselves a lot of trouble with running out of port numbers in doing so.
>
> The attached patch does the same for Linux against 2.6.9. We share
> xprts from existing connections, effectively removing any limit on the
> number of nfs mounts we have in the system.
>
> The only thing to worry about now is any talking to the portmapper or
> mountd from userspace using tcp, which will put the reserved ports in
> TIME_WAIT state. This can limit the 'speed' at which we mount many mounts.

we're looking at a similar solution. we want to make sure we don't
limit the scalability of everyone's mount point by making them all
funnel through a single slot table.



2004-12-08 22:00:35

by Lever, Charles

Subject: RE: [PATCH] xprt sharing (was Re: xprt_bindresvport)

i don't have a patch like this, but trond may have something.

> -----Original Message-----
> From: Mike Waychison [mailto:[email protected]]
> Sent: Wednesday, December 08, 2004 4:58 PM
> To: Lever, Charles
> Cc: Olaf Kirch; [email protected]
> Subject: Re: [PATCH] xprt sharing (was Re: [NFS] xprt_bindresvport)
>
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Lever, Charles wrote:
> >>>but also notice this cuts the usable port range in half (from ~800 to
> >>>~420). we need some form of mitigation to ensure we aren't limiting the
> >>>number of NFS mounts a client can have.
> >>
> >>This has been bugging me for a while. The fact that we are limiting
> >>ourselves to a single nfs mount per port. From what I can tell, Solaris
> >>shares the transports between nfs mounts from the same server and saves
> >>themselves a lot of trouble with running out of port numbers in doing so.
> >>
> >>The attached patch does the same for Linux against 2.6.9. We share
> >>xprts from existing connections, effectively removing any limit on the
> >>number of nfs mounts we have in the system.
> >>
> >>The only thing to worry about now is any talking to the portmapper or
> >>mountd from userspace using tcp, which will put the reserved ports in
> >>TIME_WAIT state. This can limit the 'speed' at which we mount many mounts.
> >
> >
> > we're looking at a similar solution. we want to make sure we don't
> > limit the scalability of everyone's mount point by making them all
> > funnel through a single slot table.
> >
>
> Can you post any work in progress for this? The xprt patch I posted
> was written a while ago, and I just realized this afternoon that it
> doesn't seem to do the right thing for tcp sockets that are autoclosed.
>
> If you have a similar patch that works, it would save me the trouble ;)
>
>
> - --
> Mike Waychison
> Sun Microsystems, Inc.
> 1 (650) 352-5299 voice
> 1 (416) 202-8336 voice
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NOTICE: The opinions expressed in this email are held by me,
> and may not represent the views of Sun Microsystems, Inc.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.5 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQFBt3jodQs4kOxk3/MRArXNAKCaahtv7uNfhX2n2yaz/N3D18t0vgCfSOLa
> rOARY+qtJrFfWOtb0m18cSk=
> =HlXX
> -----END PGP SIGNATURE-----
>



2004-12-08 21:58:11

by Mike Waychison

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Lever, Charles wrote:
>>>but also notice this cuts the usable port range in half (from ~800 to
>>>~420). we need some form of mitigation to ensure we aren't limiting the
>>>number of NFS mounts a client can have.
>>
>>This has been bugging me for a while. The fact that we are limiting
>>ourselves to a single nfs mount per port. From what I can tell, Solaris
>>shares the transports between nfs mounts from the same server and saves
>>themselves a lot of trouble with running out of port numbers in doing so.
>>
>>The attached patch does the same for Linux against 2.6.9. We share
>>xprts from existing connections, effectively removing any limit on the
>>number of nfs mounts we have in the system.
>>
>>The only thing to worry about now is any talking to the portmapper or
>>mountd from userspace using tcp, which will put the reserved ports in
>>TIME_WAIT state. This can limit the 'speed' at which we mount many mounts.
>
>
> we're looking at a similar solution. we want to make sure we don't
> limit the scalability of everyone's mount point by making them all
> funnel through a single slot table.
>

Can you post any work in progress for this? The xprt patch I posted
was written a while ago, and I just realized this afternoon that it
doesn't seem to do the right thing for tcp sockets that are autoclosed.

If you have a similar patch that works, it would save me the trouble ;)


- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBt3jodQs4kOxk3/MRArXNAKCaahtv7uNfhX2n2yaz/N3D18t0vgCfSOLa
rOARY+qtJrFfWOtb0m18cSk=
=HlXX
-----END PGP SIGNATURE-----



2004-12-09 08:55:09

by Peter Åstrand

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Wed, 8 Dec 2004, Mike Waychison wrote:

> This has been bugging me for a while. The fact that we are limiting
> ourselves to a single nfs mount per port. From what I can tell, Solaris
> shares the transports between nfs mounts from the same server and saves
> themselves a lot of trouble with running out of port numbers in doing so.
>
> The attached patch does the same for Linux against 2.6.9. We share
> xprts from existing connections, effectively removing any limit on the
> number of nfs mounts we have in the system.

Good work!


> The only thing to worry about now is any talking to the portmapper or
> mountd from userspace using tcp, which will put the reserved ports in
> TIME_WAIT state. This can limit the 'speed' at which we mount many
> mounts.

How about using SO_REUSEADDR?

--
Peter Åstrand Chief Developer
Cendio http://www.thinlinc.com
Teknikringen 3 http://www.cendio.se
583 30 Linköping Phone: +46-13-21 46 00





2004-12-09 11:01:32

by Olaf Kirch

Subject: Re: xprt_bindresvport

On Wed, Dec 08, 2004 at 06:33:15AM -0800, Lever, Charles wrote:
> > port by counting down from 800 to 0. I think this is a bug, because it
> > will potentially interfere with services trying to bind to low ports as
> > well.
>
> is this idle speculation, or do you actually have a test case that
> fails? :^)

I've seen many cases of glibc's bindresvport (which allocates from
600-1023) snatching ports used by other services, e.g. cups. The way
we solve this in user space is by having a file in /etc with a blacklist
of ports bindresvport isn't allowed to touch.
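
As a sketch of how such a blacklist could be applied: the file format below
(one port number per line, '#' starting a comment) is an assumption made
for this example, and parse_blacklist and allowed_ports are hypothetical
helper names, not glibc functions.

```python
def parse_blacklist(text):
    """Return the set of port numbers listed in the blacklist text."""
    ports = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.isdigit():
            ports.add(int(line))
    return ports


def allowed_ports(low, high, blacklist):
    """Ports in [low, high] that the allocator may still hand out."""
    return [p for p in range(low, high + 1) if p not in blacklist]
```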

I admit that I haven't seen NFS step on someone else's toes yet. But it
all depends on the number of mounts you have, and on coincidence.

At any rate, it really looks like a bug to me. Why should our scan include
ports in the low range (which is a no-no) but refuse to touch ports in the
800-1023 range? It's just an inverted test, IMO.

> i don't agree. you're just making the bad behavior more rare by
> choosing ports at random; you are not actually addressing the root
> problem you describe. is it possible to fix the RPC client to retry the
> connection in this case? and/or fix the server to recognize and remove
> the context for the old connection?

You cannot fix the RPC client. It will not see any packets from the
server, so it simply sits there until it times out and retries the
connection. This can cause system boot to take quite a long time.

You can fix the firewall; this is currently being discussed on the
netfilter list. Unfortunately not all routers out there run the latest and
greatest netfilter code, so we don't really have much control over that.

> introducing randomness here will make reproducing problems in this area
> very difficult.

I agree. And I would only enable randomization if we change the port
range to 600-up or something like this.

Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
[email protected] | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"



2004-12-09 11:14:17

by Olaf Kirch

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Thu, Dec 09, 2004 at 09:54:58AM +0100, Peter Åstrand wrote:
> > The only thing to worry about now is any talking to the portmapper or
> > mountd from userspace using tcp, which will put the reserved ports in
> > TIME_WAIT state. This can limit the 'speed' at which we mount many
> > mounts.
>
> How about using SO_REUSEADDR?

You cannot use it safely on active (i.e. client) sockets. Consider this:

Application A:
set SO_REUSEADDR
bind to port 1234
connect to server foo, port 2049

Application B:
set SO_REUSEADDR
bind to port 1234 (succeeds because of REUSEADDR)
connect to server foo, port 2049: fails with EADDRNOTAVAIL,
because there already is a connection from
client:1234 -> foo:2049
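
The scenario above can be tried on the loopback interface with a small
script. This is a sketch under assumptions: it uses a throwaway local
listener instead of a real NFS server, and lets the kernel pick the shared
local port rather than 1234. On Linux the second connect() is expected to
fail as described; other TCP stacks may refuse earlier, at bind(). Either
way the script only reports which error it saw.

```python
import errno
import socket


def reuseaddr_client(local_port=0):
    """Create a TCP socket with SO_REUSEADDR set, bound to local_port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", local_port))
    except OSError:
        s.close()
        raise
    return s


def demo():
    """Return the errno name seen by application B, or "connected"."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(5)
    target = server.getsockname()

    app_a = reuseaddr_client()          # application A
    app_a.connect(target)
    port = app_a.getsockname()[1]       # local port A ended up with

    app_b = None
    try:
        app_b = reuseaddr_client(port)  # application B, same local port
        app_b.connect(target)           # same 4-tuple as A's connection
        return "connected"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        for s in (app_a, app_b, server):
            if s is not None:
                s.close()


if __name__ == "__main__":
    print(demo())
```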

Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
[email protected] | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"



2004-12-09 11:22:37

by Olaf Kirch

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Wed, Dec 08, 2004 at 11:08:18AM -0800, Lever, Charles wrote:
> we're looking at a similar solution. we want to make sure we don't
> limit the scalability of everyone's mount point by making them all
> funnel through a single slot table.

I think you will hit all sorts of limits on the way if you do this,
not just the sunrpc locks. Using a single TCP connection means you
will have to serialize all sendmsg() calls across all clients, because
otherwise you'll mess up the RPC record framing.
You may also run into the max send/recv buffer sizes of a socket.

I cannot see how this can scale very well.
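
For readers unfamiliar with the framing issue: RPC over TCP prefixes each
record fragment with a 4-byte big-endian header whose top bit marks the
last fragment and whose low 31 bits give the fragment length (the record
marking standard in RFC 1831). If two senders interleave their writes, the
receiver can no longer find the record boundaries. The sketch below assumes
single-fragment records; SharedStream and the helper names are invented for
illustration.

```python
import struct
import threading

LAST_FRAGMENT = 0x80000000  # top bit of the record mark


def frame_record(payload: bytes) -> bytes:
    """Prefix payload with a record mark for one final fragment."""
    return struct.pack(">I", LAST_FRAGMENT | len(payload)) + payload


class SharedStream:
    """One TCP stream shared by several RPC clients."""

    def __init__(self, sock):
        self._sock = sock
        self._send_lock = threading.Lock()

    def send_rpc(self, payload: bytes) -> None:
        # Header and body must reach the stream as one unit, so the
        # whole send is serialized across all users of the socket.
        with self._send_lock:
            self._sock.sendall(frame_record(payload))


def split_records(stream: bytes):
    """Split a byte stream of single-fragment records into payloads."""
    records, offset = [], 0
    while offset < len(stream):
        (header,) = struct.unpack_from(">I", stream, offset)
        length = header & 0x7FFFFFFF
        offset += 4
        records.append(stream[offset:offset + length])
        offset += length
    return records
```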

Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
[email protected] | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"



2004-12-09 11:31:12

by Olaf Kirch

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Wed, Dec 08, 2004 at 01:17:53PM -0500, Mike Waychison wrote:
> This has been bugging me for a while. The fact that we are limiting
> ourselves to a single nfs mount per port. From what I can tell, Solaris
> shares the transports between nfs mounts from the same server and saves
> themselves a lot of trouble with running out of port numbers in doing so.

Shouldn't we allow NFS mounts to use non-privileged ports? Many
environments don't really care about the "security" provided by privileged
ports, but would be more than happy if they can run with a few hundred
NFS mounts.

Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
[email protected] | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"



2004-12-09 13:34:15

by Trond Myklebust

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Thu, 09.12.2004 at 12:22 (+0100), Olaf Kirch wrote:

> I think you will hit all sorts of limits on the way if you do this,
> not just the sunrpc locks. Using a single TCP connection means you
> will have to serialize all sendmsg() calls across all clients, because
> otherwise you'll mess up the RPC record framing.
> You may also run into the max send/recv buffer sizes of a socket.
>
> I cannot see how this can scale very well.

As long as it scales better than 1 privileged port per mountpoint. ;-)

Seriously, though: we *already* have this serialization problem with the
single client per transport case, and so there is nothing that needs to
be added to the locking in order to deal with multiple clients per
transport. IOW contention today is at the per-request level and it would
have to remain so for the shared transport case.

Note also that we could also create pools of several transport sockets
per server: the current locking scheme allows for that too. That would
improve per-request scalability at the same time as it allows us to
limit the privileged port usage. There be a couple of small dragons
there (unless you are running on NFSv4.1 w/ sessions, you cannot replay
a request on a different port for instance) but such a scheme does not
have to be too sophisticated to be useful.
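
A toy sketch of such a pool, to make the "small dragon" concrete: new
requests are spread round-robin over a fixed number of transports, but a
request that has already been sent stays pinned to its transport so that a
replay goes out on the same port. TransportPool and its methods are
hypothetical names, and transports are represented by plain strings here.

```python
import itertools


class TransportPool:
    """A fixed-size pool of transports for one server."""

    def __init__(self, server, size):
        self.transports = [f"{server}#{i}" for i in range(size)]
        self._next = itertools.cycle(range(size))
        self._pinned = {}  # xid -> index of the transport it was sent on

    def transport_for(self, xid):
        """Pick a transport for request xid; replays reuse the same one."""
        idx = self._pinned.get(xid)
        if idx is None:
            idx = self._pinned[xid] = next(self._next)
        return self.transports[idx]

    def complete(self, xid):
        """Forget the pinning once the reply for xid has arrived."""
        self._pinned.pop(xid, None)
```

With a pool of, say, four sockets per server, port usage is bounded by the
number of servers times four, regardless of how many mounts exist.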

Cheers,
Trond

--
Trond Myklebust <[email protected]>




2004-12-09 13:37:14

by Trond Myklebust

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Thu, 09.12.2004 at 12:31 (+0100), Olaf Kirch wrote:

> Shouldn't we allow NFS mounts to use non-privileged ports? Many
> environments don't really care about the "security" provided by privileged
> ports, but would be more than happy if they can run with a few hundred
> NFS mounts

Most AUTH_SYS based models still require it; however, I agree that use of
strong security makes the privileged port totally redundant.

Cheers,
Trond
--
Trond Myklebust <[email protected]>




2004-12-09 13:42:04

by Olaf Kirch

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Thu, Dec 09, 2004 at 08:33:32AM -0500, Trond Myklebust wrote:
> Seriously, though: we *already* have this serialization problem with the
> single client per transport case, and so there is nothing that needs to
> be added to the locking in order to deal with multiple clients per
> transport. IOW contention today is at the per-request level and it would
> have to remain so for the shared transport case.

But two separate mounts with separate sockets do not serialize (at least
they shouldn't). And contention doesn't happen on the client only.
The server needs to serialize sending over TCP as well; the more
sockets you have the less likely it will step on its own toes.

> Note also that we could also create pools of several transport sockets
> per server: the current locking scheme allows for that too. That would
> improve per-request scalability at the same time as it allows us to
> limit the privileged port usage.

Yes, that would help scalability a lot.

Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
[email protected] | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"



2004-12-09 13:44:46

by Olaf Kirch

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Thu, Dec 09, 2004 at 08:36:55AM -0500, Trond Myklebust wrote:
> > Shouldn't we allow NFS mounts to use non-privileged ports? Many
> > environments don't really care about the "security" provided by privileged
> > ports, but would be more than happy if they can run with a few hundred
> > NFS mounts
>
> Most AUTH_SYS based models still require it, however I agree that use of
> strong security makes the privileged port totally redundant.

The Linux auth_sys works fine with unprivileged ports if you allow it to;
so why shouldn't we make that configurable on the client too? It'd sure
help some installations that for some reason or other have an excessive
number of exported file systems (see .sig below :).

Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
[email protected] | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"



2004-12-09 14:03:22

by Lever, Charles

Subject: RE: [PATCH] xprt sharing (was Re: xprt_bindresvport)

> But two separate mounts with separate sockets do not
> serialize (at least they shouldn't).

there's so much spin locking and BKL activity in both the RPC client and
the NFS client that effectively, there is significant serialization
today.

> > Note also that we could also create pools of several transport sockets
> > per server: the current locking scheme allows for that too. That would
> > improve per-request scalability at the same time as it allows us to
> > limit the privileged port usage.
>
> Yes, that would help scalability a lot.

that's one direction we would like to take after the transport switch
API is integrated into 2.6.



2004-12-09 16:36:48

by Trond Myklebust

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Thu, 09.12.2004 at 14:44 (+0100), Olaf Kirch wrote:

> The Linux auth_sys works fine with unprivileged ports if you allow it to;
> so why shouldn't we make that configurable on the client too? It'd sure
> help some installations who for some reason or other have an excessive
> number of exported file systems (see .sig below :).

_Another_ mount option? Urgh... 8-)

Seriously: if you have 8000 NFS mount points, you do not want 8000
different superblocks, 8000 different RPC client structs, and 8000
different sockets.
Apart from being a ridiculous waste of resources, that does little to
pop the contention bubble. It just ends up pushing it down to the
(shared) device layer.

So while I am open to the idea of making use of unprivileged ports, I do
not accept it as a substitute for socket sharing.

Cheers,
Trond
--
Trond Myklebust <[email protected]>




2004-12-09 19:34:50

by Dan Stromberg

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Thu, 2004-12-09 at 12:31 +0100, Olaf Kirch wrote:
> On Wed, Dec 08, 2004 at 01:17:53PM -0500, Mike Waychison wrote:
> > This has been bugging me for a while. The fact that we are limiting
> > ourselves to a single nfs mount per port. From what I can tell, Solaris
> > shares the transports between nfs mounts from the same server and saves
> > themselves a lot of trouble with running out of port numbers in doing so.
>
> Shouldn't we allow NFS mounts to use non-privileged ports? Many
> environments don't really care about the "security" provided by privileged
> ports, but would be more than happy if they can run with a few hundred
> NFS mounts

IMO, this is a good time to apply the principle: "Be liberal in what you
accept, and conservative in what you send".

Last I heard, Windows didn't even have a concept of a reserved port.

When I wrote a BSD-compatible printsystem in python, I made it accept
connections from any port, but generate connections only from reserved
ports.

It'd probably be worthwhile to have options to make NFS (and my
printsystem) generate any port (not just reserved ones), and accept only
reserved ports - but the default probably should be to accept any port,
and send only reserved ports - not because reserved ports are effective
at all, but because it'll avoid never ending questions about why NFS
isn't working.




2004-12-09 21:34:12

by Trond Myklebust

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Thu, 09.12.2004 at 11:34 (-0800), Dan Stromberg wrote:

> It'd probably be worthwhile to have options to make NFS (and my
> printsystem) generate any port (not just reserved ones), and accept only
> reserved ports - but the default probably should be to accept any port,
> and send only reserved ports - not because reserved ports are effective
> at all, but because it'll avoid never ending questions about why NFS
> isn't working.

Sure. The questions will no longer read "why isn't NFS working". They'll
read "why can any Tom, Dick and Harry suddenly read my private email
directly from the NFS server?". 8-)

The standard AUTH_SYS/AUTH_UNIX authentication scheme only checks the
source IP address, and trusts the client 100% when it comes to supplying
the correct uid/gid/.... (see RFC1831). By placing the additional
requirement that the source must be a privileged port, one is at least
able to prevent ordinary users on an authorized client from being able
to spoof NFS requests.
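
The check described above amounts to something like the following sketch.
It is an illustration of the policy, not server code: accept_auth_sys, the
allowed_hosts set and the insecure flag are names chosen for this example.

```python
PROT_SOCK = 1024  # ports below this can only be bound by root on Unix


def accept_auth_sys(source_ip, source_port, allowed_hosts, insecure=False):
    """Decide whether to trust the uid/gid in an AUTH_SYS request."""
    if source_ip not in allowed_hosts:
        return False  # AUTH_SYS only checks where the request came from
    if insecure:
        return True   # administrator chose to accept any source port
    # An ordinary user on an authorized client cannot bind a port < 1024,
    # so a request from such a port presumably came from the kernel or root.
    return 0 < source_port < PROT_SOCK
```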

Cheers,
Trond

--
Trond Myklebust <[email protected]>




2004-12-09 22:29:57

by Dan Stromberg

Subject: Re: [PATCH] xprt sharing (was Re: xprt_bindresvport)

On Thu, 2004-12-09 at 16:33 -0500, Trond Myklebust wrote:
> On Thu, 09.12.2004 at 11:34 (-0800), Dan Stromberg wrote:
>
> > It'd probably be worthwhile to have options to make NFS (and my
> > printsystem) generate any port (not just reserved ones), and accept only
> > reserved ports - but the default probably should be to accept any port,
> > and send only reserved ports - not because reserved ports are effective
> > at all, but because it'll avoid never ending questions about why NFS
> > isn't working.
>
> Sure. The questions will no longer read "why isn't NFS working". They'll
> read "why can any Tom, Dick and Harry suddenly read my private email
> directly from the NFS server?". 8-)
>
> The standard AUTH_SYS/AUTH_UNIX authentication scheme only checks the
> source IP address, and trusts the client 100% when it comes to supplying
> the correct uid/gid/.... (see RFC1831). By placing the additional
> requirement that the source must be a privileged port, one is at least
> able to prevent ordinary users on an authorized client from being able
> to spoof NFS requests.
>
> Cheers,
> Trond

I bow before your superior geekiness. :)


