2019-07-25 20:43:41

by Brodie Greenfield

Subject: [PATCH 1/2] ipmr: Make cache queue length configurable

We want to be able to keep more spaces available in our queue for
processing incoming multicast traffic (adding (S,G) entries) - this lets
us learn more groups faster, rather than dropping them at this stage.

Signed-off-by: Brodie Greenfield <[email protected]>
---
 Documentation/networking/ip-sysctl.txt | 8 ++++++++
 include/net/netns/ipv4.h               | 1 +
 net/ipv4/af_inet.c                     | 1 +
 net/ipv4/ipmr.c                        | 4 +++-
 net/ipv4/sysctl_net_ipv4.c             | 7 +++++++
5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index acdfb5d2bcaa..02f77e932adf 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -887,6 +887,14 @@ ip_local_reserved_ports - list of comma separated ranges
 
 	Default: Empty
 
+ip_mr_cache_queue_length - INTEGER
+	Limit the number of multicast packets we can have in the queue to be
+	resolved.
+	Bear in mind that when an unresolved multicast packet is received,
+	there is an O(n) traversal of the queue. This should be considered
+	if increasing.
+	Default: 10
+
 ip_unprivileged_port_start - INTEGER
 	This is a per-namespace sysctl. It defines the first
 	unprivileged port in the network namespace. Privileged ports
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 104a6669e344..3411d3f18d51 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -187,6 +187,7 @@ struct netns_ipv4 {
 	int sysctl_igmp_max_msf;
 	int sysctl_igmp_llm_reports;
 	int sysctl_igmp_qrv;
+	unsigned int sysctl_ip_mr_cache_queue_length;
 
 	struct ping_group_range ping_group_range;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0dfb72c46671..8e25538bdb1e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1827,6 +1827,7 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.sysctl_igmp_llm_reports = 1;
 	net->ipv4.sysctl_igmp_qrv = 2;
 
+	net->ipv4.sysctl_ip_mr_cache_queue_length = 10;
 	return 0;
 }

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ddbf8c9a1abb..c6a6c3e453a9 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1127,6 +1127,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 				 struct sk_buff *skb, struct net_device *dev)
 {
 	const struct iphdr *iph = ip_hdr(skb);
+	struct net *net = dev_net(dev);
 	struct mfc_cache *c;
 	bool found = false;
 	int err;
@@ -1142,7 +1143,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 
 	if (!found) {
 		/* Create a new entry if allowable */
-		if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
+		if (atomic_read(&mrt->cache_resolve_queue_len) >=
+		    net->ipv4.sysctl_ip_mr_cache_queue_length ||
 		    (c = ipmr_cache_alloc_unres()) == NULL) {
 			spin_unlock_bh(&mfc_unres_lock);

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ba0fc4b18465..78ae86e8c6cb 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -784,6 +784,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler = proc_dointvec
 	},
 #ifdef CONFIG_IP_MULTICAST
+	{
+		.procname = "ip_mr_cache_queue_length",
+		.data = &init_net.ipv4.sysctl_ip_mr_cache_queue_length,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec
+	},
 	{
 		.procname = "igmp_qrv",
 		.data = &init_net.ipv4.sysctl_igmp_qrv,
--
2.21.0
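
For completeness, a minimal userspace check of the new knob, assuming the patch above is applied. The /proc path follows from the "ip_mr_cache_queue_length" entry registered in ipv4_net_table; the rest of the program is purely illustrative.

#include <stdio.h>

/* Read the per-namespace unresolved-queue limit added by this patch.
 * The path is derived from the "ip_mr_cache_queue_length" procname
 * registered under net/ipv4.
 */
int main(void)
{
	FILE *f = fopen("/proc/sys/net/ipv4/ip_mr_cache_queue_length", "r");
	unsigned int len;

	if (!f || fscanf(f, "%u", &len) != 1) {
		perror("ip_mr_cache_queue_length");
		return 1;
	}
	fclose(f);
	printf("unresolved multicast cache queue limit: %u\n", len);
	return 0;
}

Writing a new value works the same way through that /proc file (or via sysctl -w net.ipv4.ip_mr_cache_queue_length=<n>).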



2019-07-26 10:11:26

by Stephen Suryaputra

Subject: Re: [PATCH 1/2] ipmr: Make cache queue length configurable

On Fri, Jul 26, 2019 at 08:42:29AM +1200, Brodie Greenfield wrote:
> We want to be able to keep more spaces available in our queue for
> processing incoming multicast traffic (adding (S,G) entries) - this lets
> us learn more groups faster, rather than dropping them at this stage.
>
> Signed-off-by: Brodie Greenfield <[email protected]>

Our system can use this. The patch applied cleanly to my net-next
sandbox. Thank you.

Reviewed-by: Stephen Suryaputra <[email protected]>

2019-07-26 11:17:12

by Nikolay Aleksandrov

Subject: Re: [PATCH 1/2] ipmr: Make cache queue length configurable

On 26/07/2019 14:05, Nikolay Aleksandrov wrote:
> On 25/07/2019 23:42, Brodie Greenfield wrote:
>> We want to be able to keep more spaces available in our queue for
>> processing incoming multicast traffic (adding (S,G) entries) - this lets
>> us learn more groups faster, rather than dropping them at this stage.
>>
>> Signed-off-by: Brodie Greenfield <[email protected]>
>> ---
>> Documentation/networking/ip-sysctl.txt | 8 ++++++++
>> include/net/netns/ipv4.h | 1 +
>> net/ipv4/af_inet.c | 1 +
>> net/ipv4/ipmr.c | 4 +++-
>> net/ipv4/sysctl_net_ipv4.c | 7 +++++++
>> 5 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>> index acdfb5d2bcaa..02f77e932adf 100644
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -887,6 +887,14 @@ ip_local_reserved_ports - list of comma separated ranges
>>
>> Default: Empty
>>
>> +ip_mr_cache_queue_length - INTEGER
>> + Limit the number of multicast packets we can have in the queue to be
>> + resolved.
>> + Bear in mind that when an unresolved multicast packet is received,
>> + there is an O(n) traversal of the queue. This should be considered
>> + if increasing.
>> + Default: 10
>> +
>
> Hi,
> You've said it yourself - it has linear traversal time, but doesn't this patch allow any netns on the
> system to increase its limit to any value, thus possibly affecting others?
> Though the socket limit will kick in at some point. I think that's where David
> was going with his suggestion back in 2018:
> https://www.spinics.net/lists/netdev/msg514543.html
>
> If we add this sysctl now, we'll be stuck with it. I'd prefer David's suggestion
> so we can rely only on the receive queue limit, which is already configurable.
> We still need to be careful with the defaults though: the NOCACHE entry is 128 bytes,
> and with the skb overhead my setup currently ends up with a default limit of about 277 entries.

I mean that people might be surprised if they increased that limit by default; that's the
only problem I'm not sure how to handle. Maybe we need some hard limit anyway.
Have you done any tests on what value works for your setup?

In the end we might have to go with this patch, but perhaps limit the per-netns sysctl
to the init_ns value as a maximum (similar to what we did for frags), or not make it per-netns
at all.
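
A rough sketch of the clamping idea above, assuming a frags-style approach: every namespace other than init_net gets its maximum pinned to the init_net value. The function and table names here are made up for illustration; only the sysctl field itself comes from the posted patch.

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/sysctl.h>
#include <net/net_namespace.h>

static struct ctl_table ipmr_queue_len_table[] = {
	{
		.procname	= "ip_mr_cache_queue_length",
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_douintvec_minmax,
		/* .data and .extra2 are filled in per namespace below */
	},
	{ }
};

static int __net_init ipmr_queue_len_sysctl_init(struct net *net)
{
	struct ctl_table *table = ipmr_queue_len_table;

	if (!net_eq(net, &init_net)) {
		table = kmemdup(table, sizeof(ipmr_queue_len_table), GFP_KERNEL);
		if (!table)
			return -ENOMEM;
		/* Child namespaces may only go up to init_net's current limit. */
		table[0].extra2 = &init_net.ipv4.sysctl_ip_mr_cache_queue_length;
	}
	table[0].data = &net->ipv4.sysctl_ip_mr_cache_queue_length;

	/* Registration via register_net_sysctl(net, "net/ipv4", table) and
	 * the matching cleanup are omitted from this sketch.
	 */
	return 0;
}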

>
> Cheers,
> Nik
>
>> ip_unprivileged_port_start - INTEGER
>> This is a per-namespace sysctl. It defines the first
>> unprivileged port in the network namespace. Privileged ports
>> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
>> index 104a6669e344..3411d3f18d51 100644
>> --- a/include/net/netns/ipv4.h
>> +++ b/include/net/netns/ipv4.h
>> @@ -187,6 +187,7 @@ struct netns_ipv4 {
>> int sysctl_igmp_max_msf;
>> int sysctl_igmp_llm_reports;
>> int sysctl_igmp_qrv;
>> + unsigned int sysctl_ip_mr_cache_queue_length;
>>
>> struct ping_group_range ping_group_range;
>>
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index 0dfb72c46671..8e25538bdb1e 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -1827,6 +1827,7 @@ static __net_init int inet_init_net(struct net *net)
>> net->ipv4.sysctl_igmp_llm_reports = 1;
>> net->ipv4.sysctl_igmp_qrv = 2;
>>
>> + net->ipv4.sysctl_ip_mr_cache_queue_length = 10;
>> return 0;
>> }
>>
>> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
>> index ddbf8c9a1abb..c6a6c3e453a9 100644
>> --- a/net/ipv4/ipmr.c
>> +++ b/net/ipv4/ipmr.c
>> @@ -1127,6 +1127,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
>> struct sk_buff *skb, struct net_device *dev)
>> {
>> const struct iphdr *iph = ip_hdr(skb);
>> + struct net *net = dev_net(dev);
>> struct mfc_cache *c;
>> bool found = false;
>> int err;
>> @@ -1142,7 +1143,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
>>
>> if (!found) {
>> /* Create a new entry if allowable */
>> - if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
>> + if (atomic_read(&mrt->cache_resolve_queue_len) >=
>> + net->ipv4.sysctl_ip_mr_cache_queue_length ||
>> (c = ipmr_cache_alloc_unres()) == NULL) {
>> spin_unlock_bh(&mfc_unres_lock);
>>
>> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
>> index ba0fc4b18465..78ae86e8c6cb 100644
>> --- a/net/ipv4/sysctl_net_ipv4.c
>> +++ b/net/ipv4/sysctl_net_ipv4.c
>> @@ -784,6 +784,13 @@ static struct ctl_table ipv4_net_table[] = {
>> .proc_handler = proc_dointvec
>> },
>> #ifdef CONFIG_IP_MULTICAST
>> + {
>> + .procname = "ip_mr_cache_queue_length",
>> + .data = &init_net.ipv4.sysctl_ip_mr_cache_queue_length,
>> + .maxlen = sizeof(int),
>> + .mode = 0644,
>> + .proc_handler = proc_dointvec
>> + },
>> {
>> .procname = "igmp_qrv",
>> .data = &init_net.ipv4.sysctl_igmp_qrv,
>>
>


2019-07-26 11:29:39

by Nikolay Aleksandrov

Subject: Re: [PATCH 1/2] ipmr: Make cache queue length configurable

On 25/07/2019 23:42, Brodie Greenfield wrote:
> We want to be able to keep more spaces available in our queue for
> processing incoming multicast traffic (adding (S,G) entries) - this lets
> us learn more groups faster, rather than dropping them at this stage.
>
> Signed-off-by: Brodie Greenfield <[email protected]>
> ---
> Documentation/networking/ip-sysctl.txt | 8 ++++++++
> include/net/netns/ipv4.h | 1 +
> net/ipv4/af_inet.c | 1 +
> net/ipv4/ipmr.c | 4 +++-
> net/ipv4/sysctl_net_ipv4.c | 7 +++++++
> 5 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index acdfb5d2bcaa..02f77e932adf 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -887,6 +887,14 @@ ip_local_reserved_ports - list of comma separated ranges
>
> Default: Empty
>
> +ip_mr_cache_queue_length - INTEGER
> + Limit the number of multicast packets we can have in the queue to be
> + resolved.
> + Bear in mind that when an unresolved multicast packet is received,
> + there is an O(n) traversal of the queue. This should be considered
> + if increasing.
> + Default: 10
> +

Hi,
You've said it yourself - it has linear traversal time, but doesn't this patch allow any netns on the
system to increase its limit to any value, thus possibly affecting others?
Though the socket limit will kick in at some point. I think that's where David
was going with his suggestion back in 2018:
https://www.spinics.net/lists/netdev/msg514543.html
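
To make the linear-traversal concern concrete, a toy userspace model of the lookup that every unresolved packet performs against the pending queue (the struct and helper here are illustrative, not the kernel's own):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model of the unresolved (S,G) queue: each arriving packet with no
 * matching entry scans the whole list, so the per-packet cost grows with
 * the configured limit.
 */
struct unres_entry {
	uint32_t origin;		/* source address (S) */
	uint32_t mcastgrp;		/* group address (G)  */
	struct unres_entry *next;
};

static bool unres_queued(const struct unres_entry *head,
			 uint32_t origin, uint32_t mcastgrp)
{
	for (const struct unres_entry *e = head; e; e = e->next)
		if (e->origin == origin && e->mcastgrp == mcastgrp)
			return true;
	return false;
}

int main(void)
{
	struct unres_entry b = { 0x0a000002, 0xe0000001, NULL };
	struct unres_entry a = { 0x0a000001, 0xe0000001, &b };

	/* Prints "1 0": first (S,G) is queued, second is not. */
	printf("%d %d\n", unres_queued(&a, 0x0a000002, 0xe0000001),
	       unres_queued(&a, 0x0a000003, 0xe0000001));
	return 0;
}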

If we add this sysctl now, we'll be stuck with it. I'd prefer David's suggestion
so we can rely only on the receive queue limit, which is already configurable.
We still need to be careful with the defaults though: the NOCACHE entry is 128 bytes,
and with the skb overhead my setup currently ends up with a default limit of about 277 entries.

Cheers,
Nik
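
For reference, one way to arrive at a figure in that ballpark; the accounting and the per-entry overhead below are assumptions chosen for illustration, not numbers taken from Nik's setup:

#include <stdio.h>

/* Back-of-the-envelope reconstruction of the "about 277 entries" figure.
 * All three numbers are assumptions: the common net.core.rmem_default of
 * 212992 bytes, the 128-byte NOCACHE cache entry mentioned above, and
 * roughly 640 bytes of additional skb overhead per queued entry.
 */
int main(void)
{
	unsigned int rmem_default = 212992;
	unsigned int nocache_entry = 128;
	unsigned int skb_overhead = 640;

	printf("~%u entries\n",
	       rmem_default / (nocache_entry + skb_overhead));
	return 0;
}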

> ip_unprivileged_port_start - INTEGER
> This is a per-namespace sysctl. It defines the first
> unprivileged port in the network namespace. Privileged ports
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index 104a6669e344..3411d3f18d51 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -187,6 +187,7 @@ struct netns_ipv4 {
> int sysctl_igmp_max_msf;
> int sysctl_igmp_llm_reports;
> int sysctl_igmp_qrv;
> + unsigned int sysctl_ip_mr_cache_queue_length;
>
> struct ping_group_range ping_group_range;
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 0dfb72c46671..8e25538bdb1e 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1827,6 +1827,7 @@ static __net_init int inet_init_net(struct net *net)
> net->ipv4.sysctl_igmp_llm_reports = 1;
> net->ipv4.sysctl_igmp_qrv = 2;
>
> + net->ipv4.sysctl_ip_mr_cache_queue_length = 10;
> return 0;
> }
>
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index ddbf8c9a1abb..c6a6c3e453a9 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -1127,6 +1127,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
> struct sk_buff *skb, struct net_device *dev)
> {
> const struct iphdr *iph = ip_hdr(skb);
> + struct net *net = dev_net(dev);
> struct mfc_cache *c;
> bool found = false;
> int err;
> @@ -1142,7 +1143,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
>
> if (!found) {
> /* Create a new entry if allowable */
> - if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
> + if (atomic_read(&mrt->cache_resolve_queue_len) >=
> + net->ipv4.sysctl_ip_mr_cache_queue_length ||
> (c = ipmr_cache_alloc_unres()) == NULL) {
> spin_unlock_bh(&mfc_unres_lock);
>
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index ba0fc4b18465..78ae86e8c6cb 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -784,6 +784,13 @@ static struct ctl_table ipv4_net_table[] = {
> .proc_handler = proc_dointvec
> },
> #ifdef CONFIG_IP_MULTICAST
> + {
> + .procname = "ip_mr_cache_queue_length",
> + .data = &init_net.ipv4.sysctl_ip_mr_cache_queue_length,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec
> + },
> {
> .procname = "igmp_qrv",
> .data = &init_net.ipv4.sysctl_igmp_qrv,
>


2019-07-27 17:04:34

by Stephen Suryaputra

Subject: Re: [PATCH 1/2] ipmr: Make cache queue length configurable

On Fri, Jul 26, 2019 at 7:18 AM Nikolay Aleksandrov
<[email protected]> wrote:

> > You've said it yourself - it has linear traversal time, but doesn't this patch allow any netns on the
> > system to increase its limit to any value, thus possibly affecting others?
> > Though the socket limit will kick in at some point. I think that's where David
> > was going with his suggestion back in 2018:
> > https://www.spinics.net/lists/netdev/msg514543.html
> >
> > If we add this sysctl now, we'll be stuck with it. I'd prefer David's suggestion
> > so we can rely only on the receive queue limit, which is already configurable.
> > We still need to be careful with the defaults though: the NOCACHE entry is 128 bytes,
> > and with the skb overhead my setup currently ends up with a default limit of about 277 entries.
>
> I mean that people might be surprised if they increased that limit by default; that's the
> only problem I'm not sure how to handle. Maybe we need some hard limit anyway.
> Have you done any tests on what value works for your setup?

FYI: for ours, it is 2048.