2007-05-12 00:01:52

by Mark Glines

Subject: [patch] ip_local_port_range sysctl has annoying default

On a powerpc machine (kurobox) I have here with 128M of RAM, the default
value of /proc/sys/net/ipv4/ip_local_port_range is:
2048 4999
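For anyone who wants to check this value from a program rather than with cat, here is a minimal sketch in C. The helper names parse_port_range() and read_local_port_range() are made up for illustration; the only thing taken from this report is the procfs path and the "low high" format shown above.

```c
#include <stdio.h>

/* Hypothetical helper: parse the two whitespace-separated integers that
 * /proc/sys/net/ipv4/ip_local_port_range contains, e.g. "2048\t4999\n".
 * Returns 0 on success, -1 if the text is malformed or the range is
 * not a sane port range. */
int parse_port_range(const char *text, int *lo, int *hi)
{
	if (sscanf(text, "%d %d", lo, hi) != 2)
		return -1;
	if (*lo < 1 || *hi > 65535 || *lo > *hi)
		return -1;
	return 0;
}

/* Hypothetical helper: read the current range from procfs. */
int read_local_port_range(int *lo, int *hi)
{
	char buf[64];
	FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_port_range(buf, lo, hi);
}
```

On the box described here, read_local_port_range() would be expected to fill in 2048 and 4999.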

This setting determines which port the kernel assigns when an
application doesn't specify one, for instance, for an outgoing
connection. It affects both TCP and UDP. The default
values for this sysctl vary depending on the size of the tcp bind hash,
which in turn, varies depending on the size of the system RAM (I think).
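To make "doesn't specify a port" concrete, the sketch below binds a UDP socket to port 0 and asks the kernel which port it handed out. The function name bind_unspecified_udp_port() is hypothetical; socket(), bind() and getsockname() are the standard sockets calls. It only illustrates the mechanism and is not what klive or twistd literally do.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a UDP socket without naming a port and
 * return the port the kernel picked, or -1 on error. On success the
 * open socket is stored in *fd_out; the caller closes it. */
int bind_unspecified_udp_port(int *fd_out)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(0);	/* 0 means "kernel, pick one for me" */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
		close(fd);
		return -1;
	}

	*fd_out = fd;
	return ntohs(addr.sin_port);
}
```

The returned port should fall inside whatever ip_local_port_range currently says, which is how a 2048-4999 default can end up handing out 2049.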

By a one-in-a-million coincidence, this machine has a default port
range starting with 2048, and this breaks things for me. I'm trying to
run both klive and nfs on this box, but klive starts first (probably
because of the filename sort order), and claims UDP port 2049 for its
own purposes, causing the nfs server to fail to start.

If the bind hash size is over a certain threshold, the range
32768-61000 is used. If it is under a certain threshold, a range
like (1024|2048|3072)-4999 is used, depending on exactly how small it
is. This box happened to get the 2048-4999 range, which broke nfs.
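As I read it, the selection boils down to the sketch below. The struct and function names are made up for illustration; the constants come from the lines the patch touches plus the static { 1024, 4999 } initializer in inet_connection_sock.c, and 'order' is the size order tcp_init() computes for the bind hash table.

```c
struct port_range {
	int low;
	int high;
};

/* Sketch of the current (pre-patch) defaults as a pure function. */
struct port_range default_port_range(int order)
{
	struct port_range r = { 1024, 4999 };	/* static default */

	if (order >= 4) {
		/* large bind hash: high range */
		r.low = 32768;
		r.high = 61000;
	} else if (order < 3) {
		/* small bind hash: only the start of the range moves */
		r.low = 1024 * (3 - order);
	}
	/* order == 3 keeps the static default untouched */
	return r;
}
```

With this, order 1 gives 2048-4999, which matches what this box reports.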

A comment just above the code that does this says, "Try to be a bit
smarter and adjust defaults depending on available memory." "smarter"?
Maybe, maybe not. Either way, it's unexpected.

Following the principle of least astonishment, I think it seems better
to use high, out-of-the-way port numbers regardless of how much RAM the
system has. So, the following patch changes this behavior slightly.
The system still picks a dynamic range depending on the bind hash size,
but now, all ranges start with 32768. I suppose another reasonable way
to do this would be to end all ranges with 61000, or something like
that.

It also seems funny to me that this would be in tcp_init(), when it
affects both TCP and UDP. But hey, it is where it is.

Signed-off-by: Mark Glines <[email protected]>

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bd4c295..4431b87 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2464,14 +2464,14 @@ void __init tcp_init(void)
 			(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket));
 			order++)
 		;
+	sysctl_local_port_range[0] = 32768;
 	if (order >= 4) {
-		sysctl_local_port_range[0] = 32768;
 		sysctl_local_port_range[1] = 61000;
 		tcp_death_row.sysctl_max_tw_buckets = 180000;
 		sysctl_tcp_max_orphans = 4096 << (order - 4);
 		sysctl_max_syn_backlog = 1024;
 	} else if (order < 3) {
-		sysctl_local_port_range[0] = 1024 * (3 - order);
+		sysctl_local_port_range[1] = 32768 + (1024 * order);
 		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
 		sysctl_tcp_max_orphans >>= (3 - order);
 		sysctl_max_syn_backlog = 128;


2007-05-12 00:06:41

by David Miller

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

From: Mark Glines <[email protected]>
Date: Fri, 11 May 2007 17:01:35 -0700

> Following the principle of least astonishment, I think it seems better
> to use high, out-of-the-way port numbers regardless of how much RAM the
> system has. So, the following patch changes this behavior slightly.
> The system still picks a dynamic range depending on the bind hash size,
> but now, all ranges start with 32768. I suppose another reasonable way
> to do this would be to end all ranges with 61000, or something like
> that.

All ports above and including 1024 are non-privileged and available to
anyone.

Applications which have some requirements in this area need to work
those things out themselves.

2007-05-12 01:04:17

by Mark Glines

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

On Sat, 12 May 2007 00:06:45 UTC
David Miller <[email protected]> wrote:
> All ports above and including 1024 are non-privileged and available to
> anyone.
>
> Applications which have some requirements in this area need to work
> those things out themselves.

Hi David,

I agree completely. My issue is that an application which doesn't care
which port it binds to (twistd, on klive's behalf) stomped on the port
of an application which cares very much about which port it binds to
(nfs). I will gladly accept *any* solution to this problem.

I agree that it would be preferable to change the port NFS decides to
bind to. If you have a patch to do this, I will happily apply it and
go on my merry way.

However, the world we live in does have port numbers exceeding 1024
listed in /etc/services. What I'd like to know is why, for applications
which don't care what port they get, the kernel assigns values of
32768 and above on some machines but not on others (based on their bind
hash size). Starting from 32768 seems like very sane behavior to me,
because it minimizes the chances of a collision, and (as far as I know)
doesn't cost anything. A configuration which stomps on a
not-entirely-unknown application like nfs *by default* isn't
necessarily a bug, but it is a worst case scenario, from the
perspective of a lowly user like me, who wants things to Just Work. :)

Is there a compelling reason not to assign random ports starting from
32768 everywhere regardless of their bind hash size, like my patch
attempts to do? Does it consume any extra resources to do so?

Thanks,

Mark

2007-05-12 02:12:27

by H. Peter Anvin

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

Mark Glines wrote:
>
> By a one-in-a-million coincidence, this machine has a default port
> range starting with 2048, and this breaks things for me. I'm trying to
> run both klive and nfs on this box, but klive starts first (probably
> because of the filename sort order), and claims UDP port 2049 for its
> own purposes, causing the nfs server to fail to start.
>
> If the bind hash size is over a certain threshold, the range
> 32768-61000 is used. If it is under a certain threshold, a range
> like (1024|2048|3072)-4999 is used, depending on exactly how small it
> is. This box happened to get the 2048-4999 range, which broke nfs.
>
> A comment just above the code that does this says, "Try to be a bit
> smarter and adjust defaults depending on available memory." "smarter"?
> Maybe, maybe not. Either way, it's unexpected.
>
> Following the principle of least astonishment, I think it seems better
> to use high, out-of-the-way port numbers regardless of how much RAM the
> system has. So, the following patch changes this behavior slightly.
> The system still picks a dynamic range depending on the bind hash size,
> but now, all ranges start with 32768. I suppose another reasonable way
> to do this would be to end all ranges with 61000, or something like
> that.
>

Yes, that would be better. The IANA recommended port range for dynamic
ports is 49152-65535; Linux extends this to 32768 and chops off some of
the really high ports, but keeping them in the high range is thus the
right thing to do.

-hpa

2007-05-12 02:14:46

by H. Peter Anvin

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

David Miller wrote:
>
> All ports above and including 1024 are non-privileged and available to
> anyone.
>
> Applications which have some requirements in this area need to work
> those things out themselves.

However, there are a large number of applications which have registered
ports in this range.

-hpa

2007-05-12 03:18:19

by Bernd Eckenfels

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

In article <[email protected]> you wrote:
> However, there are a large number of applications which have registered
> ports in this range.

And some applications that request random listening ports actually query the
/etc/services file to ensure the port they got is an "unnamed" one.
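A rough sketch of such a check, assuming the application goes through the standard getservbyport() lookup (which normally consults /etc/services, depending on how the system's name service is configured). The helper name port_is_named() is made up for illustration.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the services database names 'port'
 * for protocol 'proto' ("tcp" or "udp"), 0 otherwise.
 * getservbyport() expects the port in network byte order. */
int port_is_named(int port, const char *proto)
{
	if (port < 1 || port > 65535)
		return 0;
	return getservbyport(htons((unsigned short)port), proto) != NULL;
}
```

An application that was handed a port for which port_is_named() returns 1 could close the socket and ask again. Whether 2049/udp is reported as named depends on the local services file.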

Gruss
Bernd

2007-05-12 19:10:27

by Mark Glines

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

On Fri, 11 May 2007 19:12:15 -0700
"H. Peter Anvin" <[email protected]> wrote:
> > Following the principle of least astonishment, I think it seems
> > better to use high, out-of-the-way port numbers regardless of how
> > much RAM the system has. So, the following patch changes this
> > behavior slightly. The system still picks a dynamic range depending
> > on the bind hash size, but now, all ranges start with 32768. I
> > suppose another reasonable way to do this would be to end all
> > ranges with 61000, or something like that.
> >
>
> Yes, that would be better. The IANA recommended port range for
> dynamic ports is 49152-65535; Linux extends this to 32768 and chops
> off some of the really high ports, but keeping them in the high range
> is thus the right thing to do.

Well, in that case, is there anything wrong with just using the
range IANA recommends, in all cases?

Please consider this patch instead of my previous one.

Signed-off-by: Mark Glines <[email protected]>

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 43fb160..b04b167 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -34,7 +34,7 @@ EXPORT_SYMBOL(inet_csk_timer_bug_msg);
  * For high-usage systems, use sysctl to change this to
  * 32768-61000
  */
-int sysctl_local_port_range[2] = { 1024, 4999 };
+int sysctl_local_port_range[2] = { 49152, 65535 };

 int inet_csk_bind_conflict(const struct sock *sk,
 			   const struct inet_bind_bucket *tb)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bd4c295..33ef0e7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2465,13 +2465,10 @@ void __init tcp_init(void)
 			order++)
 		;
 	if (order >= 4) {
-		sysctl_local_port_range[0] = 32768;
-		sysctl_local_port_range[1] = 61000;
 		tcp_death_row.sysctl_max_tw_buckets = 180000;
 		sysctl_tcp_max_orphans = 4096 << (order - 4);
 		sysctl_max_syn_backlog = 1024;
 	} else if (order < 3) {
-		sysctl_local_port_range[0] = 1024 * (3 - order);
 		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
 		sysctl_tcp_max_orphans >>= (3 - order);
 		sysctl_max_syn_backlog = 128;

2007-05-12 19:12:49

by H. Peter Anvin

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

Mark Glines wrote:
>
> Well, in that case, is there anything wrong with just using the
> range IANA recommends, in all cases?
>

I think the IANA range is considered too small in most cases; I suspect
there is also a feeling that "there be dragons" near the very top.
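For a sense of scale, the sizes of the ranges mentioned in this thread can be worked out with a trivial helper (port_range_size() is just an illustrative name; both ends are inclusive):

```c
/* Number of ports in the inclusive range [low, high]. */
int port_range_size(int low, int high)
{
	return high - low + 1;
}
```

That gives 16384 ports for the IANA range 49152-65535, 28233 for 32768-61000, and 2952 for the 2048-4999 range from the original report.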

-hpa

2007-05-12 19:15:37

by Alan

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

> Well, in that case, is there anything wrong with just using the
> range IANA recommends, in all cases?
>
> Please consider this patch instead of my previous one.

Please send this patch to the netdev list and CC the relevant networking
maintainer.

Alan

2007-05-12 19:31:12

by Mark Glines

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

On Sat, 12 May 2007 12:12:38 -0700
"H. Peter Anvin" <[email protected]> wrote:

> Mark Glines wrote:
> >
> > Well, in that case, is there anything wrong with just using the
> > range IANA recommends, in all cases?
> >
>
> I think the IANA range is considered too small in most cases; I
> suspect there is also a feeling that "there be dragons" near the very
> top.

Ok, thanks for the explanation. Sounds like we're using high port
numbers in the "spirit" of the IANA recommendation, without using
their actual numbers.

I still haven't gotten an answer to this: is there a performance
issue (or memory usage or security or something) with using the same
port range in all cases, even on memory-constrained systems? And if
there is, can't we *still* use big numbers, even if the range isn't
as wide?

If there's no reason not to (security, resource consumption,
whatever), I think it would be an improvement to use high, out-of-the-way
port numbers in all cases. (Especially since the kernel already
does this on most of my machines, anyway.)

There was a comment in there about how 32768-61000 should be used on
high-use systems; is there a drawback to just using this range
*everywhere*? (It's already the default in non-memory-constrained
cases, because of what tcp_init() was doing.)

Thanks,

Signed-off-by: Mark Glines <[email protected]>

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 43fb160..12d9ddc 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -29,12 +29,7 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n";
 EXPORT_SYMBOL(inet_csk_timer_bug_msg);
 #endif

-/*
- * This array holds the first and last local port number.
- * For high-usage systems, use sysctl to change this to
- * 32768-61000
- */
-int sysctl_local_port_range[2] = { 1024, 4999 };
+int sysctl_local_port_range[2] = { 32768, 61000 };

 int inet_csk_bind_conflict(const struct sock *sk,
 			   const struct inet_bind_bucket *tb)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bd4c295..33ef0e7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2465,13 +2465,10 @@ void __init tcp_init(void)
 			order++)
 		;
 	if (order >= 4) {
-		sysctl_local_port_range[0] = 32768;
-		sysctl_local_port_range[1] = 61000;
 		tcp_death_row.sysctl_max_tw_buckets = 180000;
 		sysctl_tcp_max_orphans = 4096 << (order - 4);
 		sysctl_max_syn_backlog = 1024;
 	} else if (order < 3) {
-		sysctl_local_port_range[0] = 1024 * (3 - order);
 		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
 		sysctl_tcp_max_orphans >>= (3 - order);
 		sysctl_max_syn_backlog = 128;

2007-05-12 20:04:16

by Alan

Subject: Re: [patch] ip_local_port_range sysctl has annoying default

> > I think the IANA range is considered too small in most cases; I
> > suspect there is also a feeling that "there be dragons" near the very
> > top.
>
> Ok, thanks for the explanation. Sounds like we're using high port
> numbers in the "spirit" of the IANA recommendation, without using
> their actual numbers.

The top space is reserved when using masquerading and used for the
masquerading ports normally in that situation. Clipping them off avoids
differing behaviour with masquerading on/off.

Alan

2007-05-14 20:22:20

by Jan Engelhardt

Subject: Re: [patch] ip_local_port_range sysctl has annoying default


On May 11 2007 19:14, H. Peter Anvin wrote:
>David Miller wrote:
>>
>> All ports above and including 1024 are non-privileged and available to
>> anyone.
>>
>> Applications which have some requirements in this area need to work
>> those things out themselves.
>
>However, there are a large number of applications which have registered
>ports in this range.

For more reference material, see http://lkml.org/lkml/2007/1/24/258;
it is/was basically the same issue (multiple privileged programs fighting
over ports in the 'lower half', 512-1023).


Jan
--