2019-03-06 21:16:02

by Brodie Greenfield

[permalink] [raw]
Subject: [PATCH 0/2] Make ipmr queue length configurable

We want to have some more space in our queue for processing incoming
multicast packets, so we can process more of them without dropping
them prematurely. It is useful to be able to increase this limit on
higher-spec platforms that can handle more items.

For the particular use case here at Allied Telesis, we have linux
running on our switches and routers, with support for the number of
multicast groups being increased. Basically, this queue length affects
the time taken to fully learn all of the multicast streams.

Changes in v2:
- Tidy up a few unnecessary bits of code.
- Put the sysctl inside ip multicast ifdef.
- Included the IPv6 version.

Brodie Greenfield (2):
ipmr: Make cache queue length configurable
ip6mr: Make cache queue length configurable

Documentation/networking/ip-sysctl.txt | 16 ++++++++++++++++
include/net/netns/ipv4.h | 1 +
include/net/netns/ipv6.h | 1 +
net/ipv4/af_inet.c | 1 +
net/ipv4/ipmr.c | 4 +++-
net/ipv4/sysctl_net_ipv4.c | 7 +++++++
net/ipv6/af_inet6.c | 1 +
net/ipv6/ip6mr.c | 4 +++-
net/ipv6/sysctl_net_ipv6.c | 7 +++++++
9 files changed, 40 insertions(+), 2 deletions(-)

--
2.21.0



2019-03-06 21:16:07

by Brodie Greenfield

[permalink] [raw]
Subject: [PATCH 2/2] ip6mr: Make cache queue length configurable

We want to be able to keep more spaces available in our queue for
processing incoming IPv6 multicast traffic (adding (S,G) entries) - this
lets us learn more groups faster, rather than dropping them at this stage.

Signed-off-by: Brodie Greenfield <[email protected]>
---
Documentation/networking/ip-sysctl.txt | 8 ++++++++
include/net/netns/ipv6.h | 1 +
net/ipv6/af_inet6.c | 1 +
net/ipv6/ip6mr.c | 4 +++-
net/ipv6/sysctl_net_ipv6.c | 7 +++++++
5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 02f77e932adf..68eada3ca915 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1481,6 +1481,14 @@ skip_notify_on_dev_down - BOOLEAN
on userspace caches to track link events and evict routes.
Default: false (generate message)

+ip_mr_cache_queue_length - INTEGER
+ Limit the number of multicast packets we can have in the queue to be
+ resolved.
+ Bear in mind that when an unresolved multicast packet is received,
+ there is an O(n) traversal of the queue. This should be considered
+ if increasing.
+ Default: 10
+
IPv6 Fragmentation:

ip6frag_high_thresh - INTEGER
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index ef1ed529f33c..84b58424c799 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -46,6 +46,7 @@ struct netns_sysctl_ipv6 {
int max_hbh_opts_len;
int seg6_flowlabel;
bool skip_notify_on_dev_down;
+ unsigned int ip6_mr_cache_queue_length;
};

struct netns_ipv6 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index d99753b5e39b..6551bb63e5a2 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -856,6 +856,7 @@ static int __net_init inet6_net_init(struct net *net)
net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT;
net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN;
net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN;
+ net->ipv6.sysctl.ip6_mr_cache_queue_length = 10;
atomic_set(&net->ipv6.fib6_sernum, 1);

err = ipv6_init_mibs(net);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index cc01aa3f2b5e..bb445871437e 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1135,6 +1135,7 @@ static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
struct sk_buff *skb, struct net_device *dev)
{
+ struct net *net = dev_net(dev);
struct mfc6_cache *c;
bool found = false;
int err;
@@ -1153,7 +1154,8 @@ static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
* Create a new entry if allowable
*/

- if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
+ if (atomic_read(&mrt->cache_resolve_queue_len) >=
+ net->ipv6.sysctl.ip6_mr_cache_queue_length ||
(c = ip6mr_cache_alloc_unres()) == NULL) {
spin_unlock_bh(&mfc_unres_lock);

diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index e15cd37024fd..a27299d4cc34 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "ip6_mr_cache_queue_length",
+ .data = &init_net.ipv6.sysctl.ip6_mr_cache_queue_length,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
{ }
};

--
2.21.0


2019-03-06 21:16:12

by Brodie Greenfield

[permalink] [raw]
Subject: [PATCH 1/2] ipmr: Make cache queue length configurable

We want to be able to keep more spaces available in our queue for
processing incoming multicast traffic (adding (S,G) entries) - this lets
us learn more groups faster, rather than dropping them at this stage.

Signed-off-by: Brodie Greenfield <[email protected]>
---
Documentation/networking/ip-sysctl.txt | 8 ++++++++
include/net/netns/ipv4.h | 1 +
net/ipv4/af_inet.c | 1 +
net/ipv4/ipmr.c | 4 +++-
net/ipv4/sysctl_net_ipv4.c | 7 +++++++
5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index acdfb5d2bcaa..02f77e932adf 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -887,6 +887,14 @@ ip_local_reserved_ports - list of comma separated ranges

Default: Empty

+ip_mr_cache_queue_length - INTEGER
+ Limit the number of multicast packets we can have in the queue to be
+ resolved.
+ Bear in mind that when an unresolved multicast packet is received,
+ there is an O(n) traversal of the queue. This should be considered
+ if increasing.
+ Default: 10
+
ip_unprivileged_port_start - INTEGER
This is a per-namespace sysctl. It defines the first
unprivileged port in the network namespace. Privileged ports
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 104a6669e344..3411d3f18d51 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -187,6 +187,7 @@ struct netns_ipv4 {
int sysctl_igmp_max_msf;
int sysctl_igmp_llm_reports;
int sysctl_igmp_qrv;
+ unsigned int sysctl_ip_mr_cache_queue_length;

struct ping_group_range ping_group_range;

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0dfb72c46671..8e25538bdb1e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1827,6 +1827,7 @@ static __net_init int inet_init_net(struct net *net)
net->ipv4.sysctl_igmp_llm_reports = 1;
net->ipv4.sysctl_igmp_qrv = 2;

+ net->ipv4.sysctl_ip_mr_cache_queue_length = 10;
return 0;
}

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ddbf8c9a1abb..c6a6c3e453a9 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1127,6 +1127,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
struct sk_buff *skb, struct net_device *dev)
{
const struct iphdr *iph = ip_hdr(skb);
+ struct net *net = dev_net(dev);
struct mfc_cache *c;
bool found = false;
int err;
@@ -1142,7 +1143,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,

if (!found) {
/* Create a new entry if allowable */
- if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
+ if (atomic_read(&mrt->cache_resolve_queue_len) >=
+ net->ipv4.sysctl_ip_mr_cache_queue_length ||
(c = ipmr_cache_alloc_unres()) == NULL) {
spin_unlock_bh(&mfc_unres_lock);

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ba0fc4b18465..78ae86e8c6cb 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -784,6 +784,13 @@ static struct ctl_table ipv4_net_table[] = {
.proc_handler = proc_dointvec
},
#ifdef CONFIG_IP_MULTICAST
+ {
+ .procname = "ip_mr_cache_queue_length",
+ .data = &init_net.ipv4.sysctl_ip_mr_cache_queue_length,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
{
.procname = "igmp_qrv",
.data = &init_net.ipv4.sysctl_igmp_qrv,
--
2.21.0


2019-03-06 21:16:17

by Chris Packham

[permalink] [raw]
Subject: Re: [PATCH 2/2] ip6mr: Make cache queue length configurable

On 7/03/19 9:20 AM, Brodie Greenfield wrote:
> We want to be able to keep more spaces available in our queue for
> processing incoming IPv6 multicast traffic (adding (S,G) entries) - this
> lets us learn more groups faster, rather than dropping them at this stage.
>
> Signed-off-by: Brodie Greenfield <[email protected]>
> ---
> Documentation/networking/ip-sysctl.txt | 8 ++++++++
> include/net/netns/ipv6.h | 1 +
> net/ipv6/af_inet6.c | 1 +
> net/ipv6/ip6mr.c | 4 +++-
> net/ipv6/sysctl_net_ipv6.c | 7 +++++++
> 5 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index 02f77e932adf..68eada3ca915 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -1481,6 +1481,14 @@ skip_notify_on_dev_down - BOOLEAN
> on userspace caches to track link events and evict routes.
> Default: false (generate message)
>
> +ip_mr_cache_queue_length - INTEGER

Should be "ip6_mr_cache_queue_length" for this patch.

> + Limit the number of multicast packets we can have in the queue to be
> + resolved.
> + Bear in mind that when an unresolved multicast packet is received,
> + there is an O(n) traversal of the queue. This should be considered
> + if increasing.
> + Default: 10
> +
> IPv6 Fragmentation:
>
> ip6frag_high_thresh - INTEGER
> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> index ef1ed529f33c..84b58424c799 100644
> --- a/include/net/netns/ipv6.h
> +++ b/include/net/netns/ipv6.h
> @@ -46,6 +46,7 @@ struct netns_sysctl_ipv6 {
> int max_hbh_opts_len;
> int seg6_flowlabel;
> bool skip_notify_on_dev_down;
> + unsigned int ip6_mr_cache_queue_length;
> };
>
> struct netns_ipv6 {
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index d99753b5e39b..6551bb63e5a2 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -856,6 +856,7 @@ static int __net_init inet6_net_init(struct net *net)
> net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT;
> net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN;
> net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN;
> + net->ipv6.sysctl.ip6_mr_cache_queue_length = 10;
> atomic_set(&net->ipv6.fib6_sernum, 1);
>
> err = ipv6_init_mibs(net);
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index cc01aa3f2b5e..bb445871437e 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -1135,6 +1135,7 @@ static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
> static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
> struct sk_buff *skb, struct net_device *dev)
> {
> + struct net *net = dev_net(dev);
> struct mfc6_cache *c;
> bool found = false;
> int err;
> @@ -1153,7 +1154,8 @@ static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
> * Create a new entry if allowable
> */
>
> - if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
> + if (atomic_read(&mrt->cache_resolve_queue_len) >=
> + net->ipv6.sysctl.ip6_mr_cache_queue_length ||
> (c = ip6mr_cache_alloc_unres()) == NULL) {
> spin_unlock_bh(&mfc_unres_lock);
>
> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> index e15cd37024fd..a27299d4cc34 100644
> --- a/net/ipv6/sysctl_net_ipv6.c
> +++ b/net/ipv6/sysctl_net_ipv6.c
> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec
> },
> + {
> + .procname = "ip6_mr_cache_queue_length",
> + .data = &init_net.ipv6.sysctl.ip6_mr_cache_queue_length,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec
> + },
> { }
> };
>
>


2019-03-07 15:42:31

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH 1/2] ipmr: Make cache queue length configurable

On Thu, 7 Mar 2019 09:19:55 +1300
Brodie Greenfield <[email protected]> wrote:

> +ip_mr_cache_queue_length - INTEGER
> + Limit the number of multicast packets we can have in the queue to be
> + resolved.
> + Bear in mind that when an unresolved multicast packet is received,
> + there is an O(n) traversal of the queue. This should be considered
> + if increasing.
> +

Why not make it to a unsigned value? A negative value doesn't make
much sense here.

Although other sysctl values date back to a time when Linux was
sloppy about allowing negative values, it would be good to use unsigned
now.