Applications in different network namespaces might require a different maximum number
of orphaned TCP sockets, independently of the host.
Signed-off-by: Haishuang Yan <[email protected]>
---
include/net/netns/ipv4.h | 1 +
include/net/tcp.h | 5 +++--
net/ipv4/sysctl_net_ipv4.c | 14 +++++++-------
net/ipv4/tcp.c | 3 ---
net/ipv4/tcp_input.c | 1 -
net/ipv4/tcp_ipv4.c | 1 +
6 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20d061c..305e031 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -127,6 +127,7 @@ struct netns_ipv4 {
int sysctl_tcp_timestamps;
struct inet_timewait_death_row tcp_death_row;
int sysctl_max_syn_backlog;
+ int sysctl_tcp_max_orphans;
#ifdef CONFIG_NET_L3_MASTER_DEV
int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b510f28..ac2d998 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -320,10 +320,11 @@ static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
{
struct percpu_counter *ocp = sk->sk_prot->orphan_count;
int orphans = percpu_counter_read_positive(ocp);
+ int tcp_max_orphans = sock_net(sk)->ipv4.sysctl_tcp_max_orphans;
- if (orphans << shift > sysctl_tcp_max_orphans) {
+ if (orphans << shift > tcp_max_orphans) {
orphans = percpu_counter_sum_positive(ocp);
- if (orphans << shift > sysctl_tcp_max_orphans)
+ if (orphans << shift > tcp_max_orphans)
return true;
}
return false;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0d3c038..4f26c8d3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -394,13 +394,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
.proc_handler = proc_dointvec
},
{
- .procname = "tcp_max_orphans",
- .data = &sysctl_tcp_max_orphans,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec
- },
- {
.procname = "tcp_fastopen",
.data = &sysctl_tcp_fastopen,
.maxlen = sizeof(int),
@@ -1085,6 +1078,13 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "tcp_max_orphans",
+ .data = &init_net.ipv4.sysctl_tcp_max_orphans,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
#ifdef CONFIG_IP_ROUTE_MULTIPATH
{
.procname = "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5091402..39187ac 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3522,9 +3522,6 @@ void __init tcp_init(void)
}
- cnt = tcp_hashinfo.ehash_mask + 1;
- sysctl_tcp_max_orphans = cnt / 2;
-
tcp_init_mem();
/* Set per-socket limits to no more than 1/128 the pressure threshold */
limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656..0230509 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -88,7 +88,6 @@
int sysctl_tcp_stdurg __read_mostly;
int sysctl_tcp_rfc1337 __read_mostly;
-int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
int sysctl_tcp_frto __read_mostly = 2;
int sysctl_tcp_min_rtt_wlen __read_mostly = 300;
int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a63486a..4b17a91 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2468,6 +2468,7 @@ static int __net_init tcp_sk_init(struct net *net)
net->ipv4.tcp_death_row.hashinfo = &tcp_hashinfo;
net->ipv4.sysctl_max_syn_backlog = max(128, cnt / 256);
+ net->ipv4.sysctl_tcp_max_orphans = cnt / 2;
net->ipv4.sysctl_tcp_sack = 1;
net->ipv4.sysctl_tcp_window_scaling = 1;
net->ipv4.sysctl_tcp_timestamps = 1;
--
1.8.3.1
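The patched tcp_too_many_orphans() keeps the kernel's two-stage test. It first does a cheap, possibly stale read of the per-CPU orphan counter, and computes the exact sum only when that cheap read already exceeds the limit. Below is a minimal user-space sketch of that logic. It rests on stated assumptions: sketch_percpu_counter, SKETCH_NR_CPUS and the sketch_* helpers are hypothetical stand-ins, not kernel APIs, and the per-namespace limit is passed in as max_orphans instead of being read from sock_net(sk)->ipv4.sysctl_tcp_max_orphans.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4 /* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for the kernel's struct percpu_counter: a shared
 * value plus per-CPU deltas that have not been folded into it yet. */
struct sketch_percpu_counter {
	long count;                 /* cheap to read, possibly stale */
	long delta[SKETCH_NR_CPUS]; /* per-CPU updates not yet folded in */
};

/* Cheap, approximate read (the role percpu_counter_read_positive() plays). */
static long sketch_read_positive(const struct sketch_percpu_counter *c)
{
	return c->count > 0 ? c->count : 0;
}

/* Exact read that folds in every per-CPU delta
 * (the role percpu_counter_sum_positive() plays). */
static long sketch_sum_positive(const struct sketch_percpu_counter *c)
{
	long sum = c->count;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += c->delta[cpu];
	return sum > 0 ? sum : 0;
}

/* Same shape as tcp_too_many_orphans() after the patch, except that the
 * hypothetical caller passes the namespace's limit in directly. */
static bool sketch_too_many_orphans(const struct sketch_percpu_counter *ocp,
				    int shift, long max_orphans)
{
	long orphans = sketch_read_positive(ocp);

	assert(shift >= 0 && shift < 16); /* keep the left shift well defined */

	if ((orphans << shift) > max_orphans) {
		/* The estimate is over the limit: pay for the exact sum
		 * before deciding. An estimate that is too low skips this
		 * branch, so the check errs toward allowing orphans. */
		orphans = sketch_sum_positive(ocp);
		if ((orphans << shift) > max_orphans)
			return true;
	}
	return false;
}
```

For example, with count = 120 and a delta of -10 on each of four CPUs, the cheap read (120) exceeds a limit of 100 but the exact sum (80) does not, so the check returns false. A positive shift makes the test stricter, roughly dividing the limit by 2^shift: with shift = 1 the exact value becomes 160 and the check returns true.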
On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
<[email protected]> wrote:
> Applications in different network namespaces might require a different maximum number
> of orphaned TCP sockets, independently of the host.
So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
in the whole system, right? This just makes OOM easier to trigger.
> On Sep 9, 2017, at 6:13 AM, Cong Wang <[email protected]> wrote:
>
> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
> <[email protected]> wrote:
>> Applications in different network namespaces might require a different maximum number
>> of orphaned TCP sockets, independently of the host.
>
> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
> in the whole system, right? This just makes OOM easier to trigger.
>
From my understanding, before the patch we had N * net->ipv4.sysctl_tcp_max_orphans,
and after the patch we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
+ ns3.sysctl_tcp_max_orphans. Is that right? Thanks for reviewing.
On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <[email protected]> wrote:
>
> From my understanding, before the patch we had N * net->ipv4.sysctl_tcp_max_orphans,
> and after the patch we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
> + ns3.sysctl_tcp_max_orphans. Is that right? Thanks for reviewing.
Nope, by N I mean the number of containers. Before your patch, the limit
is global; after your patch, it is per container.
> On Sep 9, 2017, at 12:35 PM, Cong Wang <[email protected]> wrote:
>
> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <[email protected]> wrote:
>
> Nope, by N I mean the number of containers. Before your patch, the limit
> is global; after your patch, it is per container.
>
Yeah. For example, if there are N containers, what I mean is that before the patch the limit is:
N * net->ipv4.sysctl_tcp_max_orphans
After the patch, the limit is:
ns1's net->ipv4.sysctl_tcp_max_orphans + ns2's net->ipv4.sysctl_tcp_max_orphans + …
From: 严海双 <[email protected]>
Date: Sat, 9 Sep 2017 13:09:57 +0800
>
> Yeah. For example, if there are N containers, what I mean is that before the patch the limit is:
>
> N * net->ipv4.sysctl_tcp_max_orphans
>
> After the patch, the limit is:
>
> ns1's net->ipv4.sysctl_tcp_max_orphans + ns2's net->ipv4.sysctl_tcp_max_orphans + …
Not true.
Please remove "N" from your equation of the current situation.
"sysctl_tcp_max_orphans" applies to the entire system; it is a global limit,
comparing one limit against all orphans in the system. There is no N.
> On Sep 9, 2017, at 1:16 PM, David Miller <[email protected]> wrote:
>
> From: 严海双 <[email protected]>
> Date: Sat, 9 Sep 2017 13:09:57 +0800
>
> Not true.
>
> Please remove "N" from your equation of the current situation.
>
> "sysctl_tcp_max_orphans" applies to the entire system; it is a global limit,
> comparing one limit against all orphans in the system. There is no N.
Yes, that's right. I browsed the source code and found that it's a global limit;
sorry for my mistake.
Thanks, David and Cong.