2019-06-06 12:38:41

by Christian Brauner

Subject: [PATCH RESEND net-next 0/2] br_netfilter: enable in non-initial netns

Hey everyone,

This is another resend of the same patch series. I have received so many
requests, pings, and questions that I would really like to push for this
again.

Over time I have seen multiple reports from users who want to run
applications (e.g. Kubernetes via [1]) that require the br_netfilter module
in non-initial network namespaces. There are *a lot* of issues open for
this. A shortlist including ChromeOS and other big users is found below
under [2]! Even non-devs have already tried to get more traction on this by
commenting on the patchset (cf. [3]).

Currently, the /proc/sys/net/bridge folder is only created in the
initial network namespace. This patch series ensures that the
/proc/sys/net/bridge folder is available in each network namespace if
the module is loaded and disappears from all network namespaces when the
module is unloaded.
The patch series also makes the sysctls:

bridge-nf-call-arptables
bridge-nf-call-ip6tables
bridge-nf-call-iptables
bridge-nf-filter-pppoe-tagged
bridge-nf-filter-vlan-tagged
bridge-nf-pass-vlan-input-dev

apply per network namespace. This unblocks some use-cases where users
would like to e.g. not do bridge filtering for bridges in a specific
network namespace while doing so for bridges located in another network
namespace.
The netfilter rules are afaict already per network namespace so it
should be safe for users to specify whether a bridge device inside their
network namespace is supposed to go through iptables et al. or not.
Also, this can already be done by setting an option for each individual
bridge via Netlink. It should also be possible to do this for all
bridges in a network namespace via sysctls.

Thanks!
Christian

[1]: https://github.com/zimmertr/Bootstrap-Kubernetes-with-Ansible
[2]: https://bugs.chromium.org/p/chromium/issues/detail?id=878034
     https://github.com/lxc/lxd/issues/5193
     https://discuss.linuxcontainers.org/t/bridge-nf-call-iptables-and-swap-error-on-lxd-with-kubeadm/2204
     https://github.com/lxc/lxd/issues/3306
     https://gitlab.com/gitlab-org/gitlab-runner/issues/3705
     https://ubuntuforums.org/showthread.php?t=2415032
     https://medium.com/@thomaszimmerman93/hi-im-unable-to-get-kubeadm-init-to-run-due-to-br-netfilter-not-being-loaded-within-the-5642a4ccfece
[3]: https://lkml.org/lkml/2019/3/7/365

Christian Brauner (2):
  br_netfilter: add struct netns_brnf
  br_netfilter: namespace bridge netfilter sysctls

 include/net/net_namespace.h          |   3 +
 include/net/netfilter/br_netfilter.h |   3 +-
 include/net/netns/netfilter.h        |  16 +++
 net/bridge/br_netfilter_hooks.c      | 166 ++++++++++++++++++---------
 net/bridge/br_netfilter_ipv6.c       |   2 +-
 5 files changed, 134 insertions(+), 56 deletions(-)

--
2.21.0
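
A minimal userspace sketch of the behaviour the cover letter describes,
assuming the decision stays the one visible in the br_nf_pre_routing() hunk
of patch 2/2: bridged IPv4 traffic bypasses iptables only when both the
namespace-wide sysctl and the per-bridge option are off. The type and
function names below (netns_settings, bridge_model,
bridge_traverses_iptables) are hypothetical stand-ins, not kernel
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace sysctl value. */
struct netns_settings {
	int call_iptables;	/* models bridge-nf-call-iptables */
};

/* Hypothetical stand-in for the per-bridge option set via Netlink. */
struct bridge_model {
	bool nf_call_iptables;
};

/*
 * Returns true when bridged IPv4 traffic should traverse iptables.
 * Either the namespace-wide sysctl or the per-bridge option enables it.
 */
bool bridge_traverses_iptables(const struct netns_settings *ns,
			       const struct bridge_model *br)
{
	assert(ns != NULL && br != NULL);

	return ns->call_iptables || br->nf_call_iptables;
}
```

With call_iptables set to 0 in one namespace and 1 in another, the same
bridge configuration gives a different answer per namespace, which is the
use-case the series aims to unblock.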


2019-06-06 12:39:03

by Christian Brauner

Subject: [PATCH RESEND net-next 2/2] br_netfilter: namespace bridge netfilter sysctls

Currently, the /proc/sys/net/bridge folder is only created in the initial
network namespace. This patch ensures that the /proc/sys/net/bridge folder
is available in each network namespace if the module is loaded and
disappears from all network namespaces when the module is unloaded.

In doing so the patch makes the sysctls:

bridge-nf-call-arptables
bridge-nf-call-ip6tables
bridge-nf-call-iptables
bridge-nf-filter-pppoe-tagged
bridge-nf-filter-vlan-tagged
bridge-nf-pass-vlan-input-dev

apply per network namespace. This unblocks some use-cases where users would
like to e.g. not do bridge filtering for bridges in a specific network
namespace while doing so for bridges located in another network namespace.

The netfilter rules are afaict already per network namespace so it should
be safe for users to specify whether bridge devices inside a network
namespace are supposed to go through iptables et al. or not. Also, this can
already be done per-bridge by setting an option for each individual bridge
via Netlink. It should also be possible to do this for all bridges in a
network namespace via sysctls.

Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
---
 include/net/netfilter/br_netfilter.h |   3 +-
 net/bridge/br_netfilter_hooks.c      | 116 ++++++++++++++++++++-------
 net/bridge/br_netfilter_ipv6.c       |   2 +-
 3 files changed, 91 insertions(+), 30 deletions(-)

diff --git a/include/net/netfilter/br_netfilter.h b/include/net/netfilter/br_netfilter.h
index 89808ce293c4..302fcd3aade2 100644
--- a/include/net/netfilter/br_netfilter.h
+++ b/include/net/netfilter/br_netfilter.h
@@ -42,7 +42,8 @@ static inline struct rtable *bridge_parent_rtable(const struct net_device *dev)
return port ? &port->br->fake_rtable : NULL;
}

-struct net_device *setup_pre_routing(struct sk_buff *skb);
+struct net_device *setup_pre_routing(struct sk_buff *skb,
+ const struct net *net);

#if IS_ENABLED(CONFIG_IPV6)
int br_validate_ipv6(struct net *net, struct sk_buff *skb);
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index b51c6b49fc6f..02960259e51b 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -68,17 +68,17 @@ static inline __be16 vlan_proto(const struct sk_buff *skb)
return 0;
}

-#define IS_VLAN_IP(skb) \
+#define IS_VLAN_IP(skb, net) \
(vlan_proto(skb) == htons(ETH_P_IP) && \
- init_net.brnf.filter_vlan_tagged)
+ net->brnf.filter_vlan_tagged)

-#define IS_VLAN_IPV6(skb) \
+#define IS_VLAN_IPV6(skb, net) \
(vlan_proto(skb) == htons(ETH_P_IPV6) && \
- init_net.brnf.filter_vlan_tagged)
+ net->brnf.filter_vlan_tagged)

-#define IS_VLAN_ARP(skb) \
+#define IS_VLAN_ARP(skb, net) \
(vlan_proto(skb) == htons(ETH_P_ARP) && \
- init_net.brnf.filter_vlan_tagged)
+ net->brnf.filter_vlan_tagged)

static inline __be16 pppoe_proto(const struct sk_buff *skb)
{
@@ -86,15 +86,15 @@ static inline __be16 pppoe_proto(const struct sk_buff *skb)
sizeof(struct pppoe_hdr)));
}

-#define IS_PPPOE_IP(skb) \
+#define IS_PPPOE_IP(skb, net) \
(skb->protocol == htons(ETH_P_PPP_SES) && \
pppoe_proto(skb) == htons(PPP_IP) && \
- init_net.brnf.filter_pppoe_tagged)
+ net->brnf.filter_pppoe_tagged)

-#define IS_PPPOE_IPV6(skb) \
+#define IS_PPPOE_IPV6(skb, net) \
(skb->protocol == htons(ETH_P_PPP_SES) && \
pppoe_proto(skb) == htons(PPP_IPV6) && \
- init_net.brnf.filter_pppoe_tagged)
+ net->brnf.filter_pppoe_tagged)

/* largest possible L2 header, see br_nf_dev_queue_xmit() */
#define NF_BRIDGE_MAX_MAC_HEADER_LENGTH (PPPOE_SES_HLEN + ETH_HLEN)
@@ -391,12 +391,14 @@ static int br_nf_pre_routing_finish(struct net *net, struct sock *sk, struct sk_
return 0;
}

-static struct net_device *brnf_get_logical_dev(struct sk_buff *skb, const struct net_device *dev)
+static struct net_device *brnf_get_logical_dev(struct sk_buff *skb,
+ const struct net_device *dev,
+ const struct net *net)
{
struct net_device *vlan, *br;

br = bridge_parent(dev);
- if (init_net.brnf.pass_vlan_indev == 0 || !skb_vlan_tag_present(skb))
+ if (net->brnf.pass_vlan_indev == 0 || !skb_vlan_tag_present(skb))
return br;

vlan = __vlan_find_dev_deep_rcu(br, skb->vlan_proto,
@@ -406,7 +408,7 @@ static struct net_device *brnf_get_logical_dev(struct sk_buff *skb, const struct
}

/* Some common code for IPv4/IPv6 */
-struct net_device *setup_pre_routing(struct sk_buff *skb)
+struct net_device *setup_pre_routing(struct sk_buff *skb, const struct net *net)
{
struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);

@@ -417,7 +419,7 @@ struct net_device *setup_pre_routing(struct sk_buff *skb)

nf_bridge->in_prerouting = 1;
nf_bridge->physindev = skb->dev;
- skb->dev = brnf_get_logical_dev(skb, skb->dev);
+ skb->dev = brnf_get_logical_dev(skb, skb->dev, net);

if (skb->protocol == htons(ETH_P_8021Q))
nf_bridge->orig_proto = BRNF_PROTO_8021Q;
@@ -452,8 +454,9 @@ static unsigned int br_nf_pre_routing(void *priv,
return NF_DROP;
br = p->br;

- if (IS_IPV6(skb) || IS_VLAN_IPV6(skb) || IS_PPPOE_IPV6(skb)) {
- if (!init_net.brnf.call_ip6tables &&
+ if (IS_IPV6(skb) || IS_VLAN_IPV6(skb, state->net) ||
+ IS_PPPOE_IPV6(skb, state->net)) {
+ if (!state->net->brnf.call_ip6tables &&
!br_opt_get(br, BROPT_NF_CALL_IP6TABLES))
return NF_ACCEPT;

@@ -461,11 +464,12 @@ static unsigned int br_nf_pre_routing(void *priv,
return br_nf_pre_routing_ipv6(priv, skb, state);
}

- if (!init_net.brnf.call_iptables &&
+ if (!state->net->brnf.call_iptables &&
!br_opt_get(br, BROPT_NF_CALL_IPTABLES))
return NF_ACCEPT;

- if (!IS_IP(skb) && !IS_VLAN_IP(skb) && !IS_PPPOE_IP(skb))
+ if (!IS_IP(skb) && !IS_VLAN_IP(skb, state->net) &&
+ !IS_PPPOE_IP(skb, state->net))
return NF_ACCEPT;

nf_bridge_pull_encap_header_rcsum(skb);
@@ -475,7 +479,7 @@ static unsigned int br_nf_pre_routing(void *priv,

if (!nf_bridge_alloc(skb))
return NF_DROP;
- if (!setup_pre_routing(skb))
+ if (!setup_pre_routing(skb, state->net))
return NF_DROP;

nf_bridge = nf_bridge_info_get(skb);
@@ -498,7 +502,7 @@ static int br_nf_forward_finish(struct net *net, struct sock *sk, struct sk_buff
struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
struct net_device *in;

- if (!IS_ARP(skb) && !IS_VLAN_ARP(skb)) {
+ if (!IS_ARP(skb) && !IS_VLAN_ARP(skb, net)) {

if (skb->protocol == htons(ETH_P_IP))
nf_bridge->frag_max_size = IPCB(skb)->frag_max_size;
@@ -553,9 +557,11 @@ static unsigned int br_nf_forward_ip(void *priv,
if (!parent)
return NF_DROP;

- if (IS_IP(skb) || IS_VLAN_IP(skb) || IS_PPPOE_IP(skb))
+ if (IS_IP(skb) || IS_VLAN_IP(skb, state->net) ||
+ IS_PPPOE_IP(skb, state->net))
pf = NFPROTO_IPV4;
- else if (IS_IPV6(skb) || IS_VLAN_IPV6(skb) || IS_PPPOE_IPV6(skb))
+ else if (IS_IPV6(skb) || IS_VLAN_IPV6(skb, state->net) ||
+ IS_PPPOE_IPV6(skb, state->net))
pf = NFPROTO_IPV6;
else
return NF_ACCEPT;
@@ -586,7 +592,7 @@ static unsigned int br_nf_forward_ip(void *priv,
skb->protocol = htons(ETH_P_IPV6);

NF_HOOK(pf, NF_INET_FORWARD, state->net, NULL, skb,
- brnf_get_logical_dev(skb, state->in),
+ brnf_get_logical_dev(skb, state->in, state->net),
parent, br_nf_forward_finish);

return NF_STOLEN;
@@ -605,18 +611,18 @@ static unsigned int br_nf_forward_arp(void *priv,
return NF_ACCEPT;
br = p->br;

- if (!init_net.brnf.call_arptables &&
+ if (!state->net->brnf.call_arptables &&
!br_opt_get(br, BROPT_NF_CALL_ARPTABLES))
return NF_ACCEPT;

if (!IS_ARP(skb)) {
- if (!IS_VLAN_ARP(skb))
+ if (!IS_VLAN_ARP(skb, state->net))
return NF_ACCEPT;
nf_bridge_pull_encap_header(skb);
}

if (arp_hdr(skb)->ar_pln != 4) {
- if (IS_VLAN_ARP(skb))
+ if (IS_VLAN_ARP(skb, state->net))
nf_bridge_push_encap_header(skb);
return NF_ACCEPT;
}
@@ -776,9 +782,11 @@ static unsigned int br_nf_post_routing(void *priv,
if (!realoutdev)
return NF_DROP;

- if (IS_IP(skb) || IS_VLAN_IP(skb) || IS_PPPOE_IP(skb))
+ if (IS_IP(skb) || IS_VLAN_IP(skb, state->net) ||
+ IS_PPPOE_IP(skb, state->net))
pf = NFPROTO_IPV4;
- else if (IS_IPV6(skb) || IS_VLAN_IPV6(skb) || IS_PPPOE_IPV6(skb))
+ else if (IS_IPV6(skb) || IS_VLAN_IPV6(skb, state->net) ||
+ IS_PPPOE_IPV6(skb, state->net))
pf = NFPROTO_IPV6;
else
return NF_ACCEPT;
@@ -1060,6 +1068,49 @@ static inline void br_netfilter_sysctl_default(struct netns_brnf *brnf)
brnf->pass_vlan_indev = 0;
}

+static __net_init int br_netfilter_sysctl_init_net(struct net *net)
+{
+ struct ctl_table *table = brnf_table;
+
+ if (net_eq(net, &init_net))
+ return 0;
+
+ table = kmemdup(table, sizeof(brnf_table), GFP_KERNEL);
+ if (!table)
+ return -ENOMEM;
+
+ table[0].data = &net->brnf.call_arptables;
+ table[1].data = &net->brnf.call_iptables;
+ table[2].data = &net->brnf.call_ip6tables;
+ table[3].data = &net->brnf.filter_vlan_tagged;
+ table[4].data = &net->brnf.filter_pppoe_tagged;
+ table[5].data = &net->brnf.pass_vlan_indev;
+
+ net->brnf.ctl_hdr = register_net_sysctl(net, "net/bridge", table);
+ if (!net->brnf.ctl_hdr) {
+ kfree(table);
+ return -ENOMEM;
+ }
+
+ br_netfilter_sysctl_default(&net->brnf);
+
+ return 0;
+}
+
+static __net_exit void br_netfilter_sysctl_exit_net(struct net *net)
+{
+ if (net_eq(net, &init_net))
+ return;
+
+ unregister_net_sysctl_table(net->brnf.ctl_hdr);
+ kfree(net->brnf.ctl_hdr->ctl_table_arg);
+}
+
+static struct pernet_operations br_netfilter_sysctl_ops = {
+ .init = br_netfilter_sysctl_init_net,
+ .exit = br_netfilter_sysctl_exit_net,
+};
+
static int __init br_netfilter_init(void)
{
int ret;
@@ -1086,6 +1137,14 @@ static int __init br_netfilter_init(void)
unregister_pernet_subsys(&brnf_net_ops);
return -ENOMEM;
}
+
+ ret = register_pernet_subsys(&br_netfilter_sysctl_ops);
+ if (ret < 0) {
+ unregister_netdevice_notifier(&brnf_notifier);
+ unregister_pernet_subsys(&brnf_net_ops);
+ unregister_net_sysctl_table(init_net.brnf.ctl_hdr);
+ return ret;
+ }
#endif
RCU_INIT_POINTER(nf_br_ops, &br_ops);
printk(KERN_NOTICE "Bridge firewalling registered\n");
@@ -1099,6 +1158,7 @@ static void __exit br_netfilter_fini(void)
unregister_pernet_subsys(&brnf_net_ops);
#ifdef CONFIG_SYSCTL
unregister_net_sysctl_table(init_net.brnf.ctl_hdr);
+ unregister_pernet_subsys(&br_netfilter_sysctl_ops);
#endif
}

diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index 0e63e5dc5ac4..e4e0c836c3f5 100644
--- a/net/bridge/br_netfilter_ipv6.c
+++ b/net/bridge/br_netfilter_ipv6.c
@@ -224,7 +224,7 @@ unsigned int br_nf_pre_routing_ipv6(void *priv,
nf_bridge = nf_bridge_alloc(skb);
if (!nf_bridge)
return NF_DROP;
- if (!setup_pre_routing(skb))
+ if (!setup_pre_routing(skb, state->net))
return NF_DROP;

nf_bridge = nf_bridge_info_get(skb);
--
2.21.0
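
The per-namespace registration in br_netfilter_sysctl_init_net() follows a
copy-and-repoint pattern: duplicate the template table, then aim each
entry's data pointer at the namespace's own fields. The following is a
userspace sketch of that pattern only. malloc() and memcpy() stand in for
kmemdup(), and struct header is a hypothetical stand-in for the
ctl_table_header. In this sketch the teardown reads the table pointer
before releasing the header, so it never dereferences the header after it
has been freed; that ordering is a choice made here, not something taken
from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_ENTRIES 6

/* Hypothetical stand-in for the per-namespace flags. */
struct settings {
	int call_arptables;
	int call_iptables;
	int call_ip6tables;
	int filter_vlan_tagged;
	int filter_pppoe_tagged;
	int pass_vlan_indev;
};

/* Hypothetical stand-in for one table entry. */
struct entry {
	const char *procname;
	int *data;
};

/* Hypothetical stand-in for the header returned by registration. */
struct header {
	struct entry *table;
};

/* Settings of the "initial namespace"; the template points at these. */
struct settings init_settings;

const struct entry template_table[NR_ENTRIES] = {
	{ "bridge-nf-call-arptables",      &init_settings.call_arptables },
	{ "bridge-nf-call-iptables",       &init_settings.call_iptables },
	{ "bridge-nf-call-ip6tables",      &init_settings.call_ip6tables },
	{ "bridge-nf-filter-vlan-tagged",  &init_settings.filter_vlan_tagged },
	{ "bridge-nf-filter-pppoe-tagged", &init_settings.filter_pppoe_tagged },
	{ "bridge-nf-pass-vlan-input-dev", &init_settings.pass_vlan_indev },
};

/* Copy the template and repoint every entry at the caller's settings. */
struct header *settings_register(struct settings *s)
{
	struct header *hdr;
	struct entry *table;

	assert(s != NULL);

	table = malloc(sizeof(template_table));
	if (!table)
		return NULL;
	memcpy(table, template_table, sizeof(template_table));

	table[0].data = &s->call_arptables;
	table[1].data = &s->call_iptables;
	table[2].data = &s->call_ip6tables;
	table[3].data = &s->filter_vlan_tagged;
	table[4].data = &s->filter_pppoe_tagged;
	table[5].data = &s->pass_vlan_indev;

	hdr = malloc(sizeof(*hdr));
	if (!hdr) {
		free(table);
		return NULL;
	}
	hdr->table = table;
	return hdr;
}

/* Release the header and its table; the table pointer is saved first. */
void settings_unregister(struct header *hdr)
{
	struct entry *table;

	assert(hdr != NULL);

	table = hdr->table;
	free(hdr);
	free(table);
}
```

A write through the copied table only changes the namespace it was
registered for; init_settings is left untouched.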

2019-06-06 12:39:03

by Christian Brauner

Subject: [PATCH RESEND net-next 1/2] br_netfilter: add struct netns_brnf

This adds struct netns_brnf in preparation for per-network-namespace
br_netfilter settings. The individual br_netfilter sysctl options are moved
into a central place in struct net. The struct is only included when
the CONFIG_BRIDGE_NETFILTER kconfig option is enabled in the kernel.

Signed-off-by: Christian Brauner <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
---
 include/net/net_namespace.h     |  3 ++
 include/net/netns/netfilter.h   | 16 ++++++++
 net/bridge/br_netfilter_hooks.c | 68 ++++++++++++++++-----------------
 3 files changed, 52 insertions(+), 35 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 12689ddfc24c..a958d09dc14d 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -127,6 +127,9 @@ struct net {
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
struct netns_ct ct;
#endif
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+ struct netns_brnf brnf;
+#endif
#if defined(CONFIG_NF_TABLES) || defined(CONFIG_NF_TABLES_MODULE)
struct netns_nftables nft;
#endif
diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
index ca043342c0eb..eedbd1ac940e 100644
--- a/include/net/netns/netfilter.h
+++ b/include/net/netns/netfilter.h
@@ -35,4 +35,20 @@ struct netns_nf {
bool defrag_ipv6;
#endif
};
+
+struct netns_brnf {
+#ifdef CONFIG_SYSCTL
+ struct ctl_table_header *ctl_hdr;
+#endif
+
+ /* default value is 1 */
+ int call_iptables;
+ int call_ip6tables;
+ int call_arptables;
+
+ /* default value is 0 */
+ int filter_vlan_tagged;
+ int filter_pppoe_tagged;
+ int pass_vlan_indev;
+};
#endif
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 34fa72c72ad8..b51c6b49fc6f 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -49,23 +49,6 @@ struct brnf_net {
bool enabled;
};

-#ifdef CONFIG_SYSCTL
-static struct ctl_table_header *brnf_sysctl_header;
-static int brnf_call_iptables __read_mostly = 1;
-static int brnf_call_ip6tables __read_mostly = 1;
-static int brnf_call_arptables __read_mostly = 1;
-static int brnf_filter_vlan_tagged __read_mostly;
-static int brnf_filter_pppoe_tagged __read_mostly;
-static int brnf_pass_vlan_indev __read_mostly;
-#else
-#define brnf_call_iptables 1
-#define brnf_call_ip6tables 1
-#define brnf_call_arptables 1
-#define brnf_filter_vlan_tagged 0
-#define brnf_filter_pppoe_tagged 0
-#define brnf_pass_vlan_indev 0
-#endif
-
#define IS_IP(skb) \
(!skb_vlan_tag_present(skb) && skb->protocol == htons(ETH_P_IP))

@@ -87,15 +70,15 @@ static inline __be16 vlan_proto(const struct sk_buff *skb)

#define IS_VLAN_IP(skb) \
(vlan_proto(skb) == htons(ETH_P_IP) && \
- brnf_filter_vlan_tagged)
+ init_net.brnf.filter_vlan_tagged)

#define IS_VLAN_IPV6(skb) \
(vlan_proto(skb) == htons(ETH_P_IPV6) && \
- brnf_filter_vlan_tagged)
+ init_net.brnf.filter_vlan_tagged)

#define IS_VLAN_ARP(skb) \
(vlan_proto(skb) == htons(ETH_P_ARP) && \
- brnf_filter_vlan_tagged)
+ init_net.brnf.filter_vlan_tagged)

static inline __be16 pppoe_proto(const struct sk_buff *skb)
{
@@ -106,12 +89,12 @@ static inline __be16 pppoe_proto(const struct sk_buff *skb)
#define IS_PPPOE_IP(skb) \
(skb->protocol == htons(ETH_P_PPP_SES) && \
pppoe_proto(skb) == htons(PPP_IP) && \
- brnf_filter_pppoe_tagged)
+ init_net.brnf.filter_pppoe_tagged)

#define IS_PPPOE_IPV6(skb) \
(skb->protocol == htons(ETH_P_PPP_SES) && \
pppoe_proto(skb) == htons(PPP_IPV6) && \
- brnf_filter_pppoe_tagged)
+ init_net.brnf.filter_pppoe_tagged)

/* largest possible L2 header, see br_nf_dev_queue_xmit() */
#define NF_BRIDGE_MAX_MAC_HEADER_LENGTH (PPPOE_SES_HLEN + ETH_HLEN)
@@ -413,7 +396,7 @@ static struct net_device *brnf_get_logical_dev(struct sk_buff *skb, const struct
struct net_device *vlan, *br;

br = bridge_parent(dev);
- if (brnf_pass_vlan_indev == 0 || !skb_vlan_tag_present(skb))
+ if (init_net.brnf.pass_vlan_indev == 0 || !skb_vlan_tag_present(skb))
return br;

vlan = __vlan_find_dev_deep_rcu(br, skb->vlan_proto,
@@ -470,7 +453,7 @@ static unsigned int br_nf_pre_routing(void *priv,
br = p->br;

if (IS_IPV6(skb) || IS_VLAN_IPV6(skb) || IS_PPPOE_IPV6(skb)) {
- if (!brnf_call_ip6tables &&
+ if (!init_net.brnf.call_ip6tables &&
!br_opt_get(br, BROPT_NF_CALL_IP6TABLES))
return NF_ACCEPT;

@@ -478,7 +461,8 @@ static unsigned int br_nf_pre_routing(void *priv,
return br_nf_pre_routing_ipv6(priv, skb, state);
}

- if (!brnf_call_iptables && !br_opt_get(br, BROPT_NF_CALL_IPTABLES))
+ if (!init_net.brnf.call_iptables &&
+ !br_opt_get(br, BROPT_NF_CALL_IPTABLES))
return NF_ACCEPT;

if (!IS_IP(skb) && !IS_VLAN_IP(skb) && !IS_PPPOE_IP(skb))
@@ -621,7 +605,8 @@ static unsigned int br_nf_forward_arp(void *priv,
return NF_ACCEPT;
br = p->br;

- if (!brnf_call_arptables && !br_opt_get(br, BROPT_NF_CALL_ARPTABLES))
+ if (!init_net.brnf.call_arptables &&
+ !br_opt_get(br, BROPT_NF_CALL_ARPTABLES))
return NF_ACCEPT;

if (!IS_ARP(skb)) {
@@ -1021,42 +1006,42 @@ int brnf_sysctl_call_tables(struct ctl_table *ctl, int write,
static struct ctl_table brnf_table[] = {
{
.procname = "bridge-nf-call-arptables",
- .data = &brnf_call_arptables,
+ .data = &init_net.brnf.call_arptables,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = brnf_sysctl_call_tables,
},
{
.procname = "bridge-nf-call-iptables",
- .data = &brnf_call_iptables,
+ .data = &init_net.brnf.call_iptables,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = brnf_sysctl_call_tables,
},
{
.procname = "bridge-nf-call-ip6tables",
- .data = &brnf_call_ip6tables,
+ .data = &init_net.brnf.call_ip6tables,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = brnf_sysctl_call_tables,
},
{
.procname = "bridge-nf-filter-vlan-tagged",
- .data = &brnf_filter_vlan_tagged,
+ .data = &init_net.brnf.filter_vlan_tagged,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = brnf_sysctl_call_tables,
},
{
.procname = "bridge-nf-filter-pppoe-tagged",
- .data = &brnf_filter_pppoe_tagged,
+ .data = &init_net.brnf.filter_pppoe_tagged,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = brnf_sysctl_call_tables,
},
{
.procname = "bridge-nf-pass-vlan-input-dev",
- .data = &brnf_pass_vlan_indev,
+ .data = &init_net.brnf.pass_vlan_indev,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = brnf_sysctl_call_tables,
@@ -1065,6 +1050,16 @@ static struct ctl_table brnf_table[] = {
};
#endif

+static inline void br_netfilter_sysctl_default(struct netns_brnf *brnf)
+{
+ brnf->call_iptables = 1;
+ brnf->call_ip6tables = 1;
+ brnf->call_arptables = 1;
+ brnf->filter_vlan_tagged = 0;
+ brnf->filter_pppoe_tagged = 0;
+ brnf->pass_vlan_indev = 0;
+}
+
static int __init br_netfilter_init(void)
{
int ret;
@@ -1079,9 +1074,12 @@ static int __init br_netfilter_init(void)
return ret;
}

+ /* Always set default values. Even if CONFIG_SYSCTL is not set. */
+ br_netfilter_sysctl_default(&init_net.brnf);
+
#ifdef CONFIG_SYSCTL
- brnf_sysctl_header = register_net_sysctl(&init_net, "net/bridge", brnf_table);
- if (brnf_sysctl_header == NULL) {
+ init_net.brnf.ctl_hdr = register_net_sysctl(&init_net, "net/bridge", brnf_table);
+ if (!init_net.brnf.ctl_hdr) {
printk(KERN_WARNING
"br_netfilter: can't register to sysctl.\n");
unregister_netdevice_notifier(&brnf_notifier);
@@ -1100,7 +1098,7 @@ static void __exit br_netfilter_fini(void)
unregister_netdevice_notifier(&brnf_notifier);
unregister_pernet_subsys(&brnf_net_ops);
#ifdef CONFIG_SYSCTL
- unregister_net_sysctl_table(brnf_sysctl_header);
+ unregister_net_sysctl_table(init_net.brnf.ctl_hdr);
#endif
}

--
2.21.0

2019-06-06 17:55:36

by Christian Brauner

Subject: Re: [PATCH RESEND net-next 1/2] br_netfilter: add struct netns_brnf

On Thu, Jun 06, 2019 at 08:14:40AM -0700, Stephen Hemminger wrote:
> On Thu, 6 Jun 2019 13:41:41 +0200
> Christian Brauner <[email protected]> wrote:
>
> > +struct netns_brnf {
> > +#ifdef CONFIG_SYSCTL
> > + struct ctl_table_header *ctl_hdr;
> > +#endif
> > +
> > + /* default value is 1 */
> > + int call_iptables;
> > + int call_ip6tables;
> > + int call_arptables;
> > +
> > + /* default value is 0 */
> > + int filter_vlan_tagged;
> > + int filter_pppoe_tagged;
> > + int pass_vlan_indev;
> > +};
>
> Do you really need to waste four bytes for each
> flag value. If you use a u8 that would work just as well.

I think we had discussed something like this, but the reason we can't do
this stems from how the sysctl table handling is implemented.
I distinctly remember that it couldn't be done with a flag because of that.

Christian
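
One plausible reading of the constraint mentioned above, offered as an
assumption rather than a statement about the sysctl core: the table entries
in the patch declare .maxlen = sizeof(int), and a handler that stores a
full int cannot safely target a one-byte field. The userspace sketch below
models that with a hypothetical store_int() helper which refuses slots that
are not int-sized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a data pointer plus the width of the slot. */
struct entry {
	void *data;
	size_t maxlen;
};

/*
 * Integer-style handler: always stores a full int.
 * Returns 0 on success, -1 if the slot is not exactly int-sized.
 */
int store_int(const struct entry *e, int value)
{
	assert(e != NULL && e->data != NULL);

	if (e->maxlen != sizeof(int))
		return -1;

	memcpy(e->data, &value, sizeof(int));
	return 0;
}
```

Under this model a u8 field would need either a different handler or an
int-sized staging variable between the handler and the field.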

2019-06-06 18:08:41

by Stephen Hemminger

Subject: Re: [PATCH RESEND net-next 1/2] br_netfilter: add struct netns_brnf

On Thu, 6 Jun 2019 13:41:41 +0200
Christian Brauner <[email protected]> wrote:

> +struct netns_brnf {
> +#ifdef CONFIG_SYSCTL
> + struct ctl_table_header *ctl_hdr;
> +#endif
> +
> + /* default value is 1 */
> + int call_iptables;
> + int call_ip6tables;
> + int call_arptables;
> +
> + /* default value is 0 */
> + int filter_vlan_tagged;
> + int filter_pppoe_tagged;
> + int pass_vlan_indev;
> +};

Do you really need to waste four bytes for each
flag value? A u8 would work just as well.

Bool would also work but the kernel developers frown on bool
in structures.
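
To put a number on the saving under discussion, here is a small userspace
comparison of the two layouts. The struct names are hypothetical, and
uint8_t stands in for the kernel's u8. The ctl_hdr pointer is left out
because it is the same in both variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Six flags stored as int, as in the patch. */
struct brnf_flags_int {
	int call_iptables;
	int call_ip6tables;
	int call_arptables;
	int filter_vlan_tagged;
	int filter_pppoe_tagged;
	int pass_vlan_indev;
};

/* The same six flags stored as one byte each. */
struct brnf_flags_u8 {
	uint8_t call_iptables;
	uint8_t call_ip6tables;
	uint8_t call_arptables;
	uint8_t filter_vlan_tagged;
	uint8_t filter_pppoe_tagged;
	uint8_t pass_vlan_indev;
};

/* Bytes saved per network namespace by the one-byte layout. */
size_t brnf_flags_saving(void)
{
	assert(sizeof(struct brnf_flags_int) >= sizeof(struct brnf_flags_u8));

	return sizeof(struct brnf_flags_int) - sizeof(struct brnf_flags_u8);
}
```

On a typical platform with a 4-byte int this comes to 24 - 6 = 18 bytes per
network namespace, before any padding introduced by neighbouring fields.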

2019-06-06 18:32:18

by Pablo Neira Ayuso

Subject: Re: [PATCH RESEND net-next 1/2] br_netfilter: add struct netns_brnf

On Thu, Jun 06, 2019 at 05:19:39PM +0200, Christian Brauner wrote:
> On Thu, Jun 06, 2019 at 08:14:40AM -0700, Stephen Hemminger wrote:
> > On Thu, 6 Jun 2019 13:41:41 +0200
> > Christian Brauner <[email protected]> wrote:
> >
> > > +struct netns_brnf {
> > > +#ifdef CONFIG_SYSCTL
> > > + struct ctl_table_header *ctl_hdr;
> > > +#endif
> > > +
> > > + /* default value is 1 */
> > > + int call_iptables;
> > > + int call_ip6tables;
> > > + int call_arptables;
> > > +
> > > + /* default value is 0 */
> > > + int filter_vlan_tagged;
> > > + int filter_pppoe_tagged;
> > > + int pass_vlan_indev;
> > > +};
> >
> > Do you really need to waste four bytes for each
> > flag value. If you use a u8 that would work just as well.
>
> I think we had discussed something like this but the problem why we
> can't do this stems from how the sysctl-table stuff is implemented.
> I distinctly remember that it couldn't be done with a flag due to that.

Could you define a pernet_operations object? I mean, define the id and size
fields, then pass it to register_pernet_subsys() for registration.
This is similar to what we do in net/ipv4/netfilter/ipt_CLUSTERIP.c; see
clusterip_net_ops and clusterip_pernet() for instance.
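
A rough userspace model of the id/size approach suggested above, under the
assumption that the core allocates size bytes per namespace and hands them
back through an id-based lookup. All names here (pernet_ops_model,
net_model, brnf_model, register_ops, net_attach, net_lookup, net_detach)
are hypothetical; in particular, the real pernet_operations stores a
pointer to the id rather than the id itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SUBSYS 8

/* Hypothetical, simplified model of an ops object with id and size. */
struct pernet_ops_model {
	unsigned int id;	/* assigned at registration */
	size_t size;		/* bytes of private data per namespace */
};

/* Hypothetical namespace: one private-data slot per registered id. */
struct net_model {
	void *slots[MAX_SUBSYS];
};

/* Hypothetical private area holding a per-namespace flag. */
struct brnf_model {
	int call_iptables;
};

static unsigned int next_id;

/* Assign the next free id to the ops object. */
int register_ops(struct pernet_ops_model *ops)
{
	assert(ops != NULL && ops->size > 0);

	if (next_id >= MAX_SUBSYS)
		return -1;
	ops->id = next_id++;
	return 0;
}

/* Allocate a zeroed private area for this namespace. */
int net_attach(struct net_model *net, const struct pernet_ops_model *ops)
{
	void *area = calloc(1, ops->size);

	if (!area)
		return -1;
	net->slots[ops->id] = area;
	return 0;
}

/* Look up the private area of this namespace by id. */
void *net_lookup(const struct net_model *net,
		 const struct pernet_ops_model *ops)
{
	return net->slots[ops->id];
}

/* Free the private area of this namespace. */
void net_detach(struct net_model *net, const struct pernet_ops_model *ops)
{
	free(net->slots[ops->id]);
	net->slots[ops->id] = NULL;
}
```

Each namespace gets its own zero-initialised area, so a flag set in one
namespace is not visible in another.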

2019-06-07 14:30:55

by Pablo Neira Ayuso

Subject: Re: [PATCH RESEND net-next 1/2] br_netfilter: add struct netns_brnf

On Fri, Jun 07, 2019 at 03:25:16PM +0200, Christian Brauner wrote:
> On Thu, Jun 06, 2019 at 06:30:35PM +0200, Pablo Neira Ayuso wrote:
> > On Thu, Jun 06, 2019 at 05:19:39PM +0200, Christian Brauner wrote:
> > > On Thu, Jun 06, 2019 at 08:14:40AM -0700, Stephen Hemminger wrote:
> > > > On Thu, 6 Jun 2019 13:41:41 +0200
> > > > Christian Brauner <[email protected]> wrote:
> > > >
> > > > > +struct netns_brnf {
> > > > > +#ifdef CONFIG_SYSCTL
> > > > > + struct ctl_table_header *ctl_hdr;
> > > > > +#endif
> > > > > +
> > > > > + /* default value is 1 */
> > > > > + int call_iptables;
> > > > > + int call_ip6tables;
> > > > > + int call_arptables;
> > > > > +
> > > > > + /* default value is 0 */
> > > > > + int filter_vlan_tagged;
> > > > > + int filter_pppoe_tagged;
> > > > > + int pass_vlan_indev;
> > > > > +};
> > > >
> > > > Do you really need to waste four bytes for each
> > > > flag value. If you use a u8 that would work just as well.
> > >
> > > I think we had discussed something like this but the problem why we
> > > can't do this stems from how the sysctl-table stuff is implemented.
> > > I distinctly remember that it couldn't be done with a flag due to that.
> >
> > Could you define a pernet_operations object? I mean, define the id and size
> > fields, then pass it to register_pernet_subsys() for registration.
> > Similar to what we do in net/ipv4/netfilter/ipt_CLUSTER.c, see
> > clusterip_net_ops and clusterip_pernet() for instance.
>
> Hm, I don't think that would work. The sysctls for br_netfilter are
> located in /proc/sys/net/bridge under /proc/sys/net which is tightly
> integrated with the sysctls infrastructure for all of net/ and all the
> folder underneath it including "core", "ipv4" and "ipv6".
> I don't think creating and managing files manually in /proc/sys/net is
> going to fly. It also doesn't seem very wise from a consistency and
> complexity pov. I'm also not sure if this would work at all wrt to file
> creation and reference counting if there are two different ways of
> managing them in the same subfolder...
> (clusterip creates files manually underneath /proc/net which probably is
> the reason why it gets away with it.)

br_netfilter is now a module, and br_netfilter_hooks.c is part of it.
IIRC, this file registers these sysctl entries from the module __init
path.

It would be a matter of adding a new .init callback to the existing
brnf_net_ops object in br_netfilter_hooks.c. Then, call
register_net_sysctl() from this .init callback to register the sysctl
entries per netns.

There is already a brnf_net area that you can reuse for this purpose,
to place these pernetns flags...

struct brnf_net {
bool enabled;
};

which is going to be glad to have more fields (under the #ifdef
CONFIG_SYSCTL) there.
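
The suggestion above amounts to doing the registration from a
per-namespace .init callback instead of only once from module init. A
minimal userspace sketch of that control flow follows, assuming
registration runs .init for every existing namespace and unwinds on
failure. The names (ns_model, pernet_cb, brnf_init_ns, brnf_exit_ns,
register_for_each) are hypothetical, and the sysctl registration itself is
reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical namespace state. */
struct ns_model {
	int call_iptables;
	int sysctl_registered;
	int fail_init;		/* test knob: make init fail for this ns */
};

/* Hypothetical pair of per-namespace callbacks. */
struct pernet_cb {
	int (*init)(struct ns_model *ns);
	void (*exit)(struct ns_model *ns);
};

/* Per-namespace setup: apply the default and "register" the sysctls. */
int brnf_init_ns(struct ns_model *ns)
{
	if (ns->fail_init)
		return -1;
	ns->call_iptables = 1;	/* default value used by the patch */
	ns->sysctl_registered = 1;
	return 0;
}

/* Per-namespace teardown. */
void brnf_exit_ns(struct ns_model *ns)
{
	ns->sysctl_registered = 0;
}

/* Run init for every namespace; undo the completed ones on failure. */
int register_for_each(const struct pernet_cb *cb,
		      struct ns_model *nets, size_t n)
{
	size_t i;

	assert(cb != NULL && cb->init != NULL && cb->exit != NULL);

	for (i = 0; i < n; i++) {
		if (cb->init(&nets[i]) < 0) {
			while (i-- > 0)
				cb->exit(&nets[i]);
			return -1;
		}
	}
	return 0;
}
```

Keeping setup and teardown in one callback pair means every namespace that
was successfully initialised is also the one that gets cleaned up.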

2019-06-07 14:36:41

by Christian Brauner

Subject: Re: [PATCH RESEND net-next 1/2] br_netfilter: add struct netns_brnf

On Thu, Jun 06, 2019 at 06:30:35PM +0200, Pablo Neira Ayuso wrote:
> On Thu, Jun 06, 2019 at 05:19:39PM +0200, Christian Brauner wrote:
> > On Thu, Jun 06, 2019 at 08:14:40AM -0700, Stephen Hemminger wrote:
> > > On Thu, 6 Jun 2019 13:41:41 +0200
> > > Christian Brauner <[email protected]> wrote:
> > >
> > > > +struct netns_brnf {
> > > > +#ifdef CONFIG_SYSCTL
> > > > + struct ctl_table_header *ctl_hdr;
> > > > +#endif
> > > > +
> > > > + /* default value is 1 */
> > > > + int call_iptables;
> > > > + int call_ip6tables;
> > > > + int call_arptables;
> > > > +
> > > > + /* default value is 0 */
> > > > + int filter_vlan_tagged;
> > > > + int filter_pppoe_tagged;
> > > > + int pass_vlan_indev;
> > > > +};
> > >
> > > Do you really need to waste four bytes for each
> > > flag value. If you use a u8 that would work just as well.
> >
> > I think we had discussed something like this but the problem why we
> > can't do this stems from how the sysctl-table stuff is implemented.
> > I distinctly remember that it couldn't be done with a flag due to that.
>
> Could you define a pernet_operations object? I mean, define the id and size
> fields, then pass it to register_pernet_subsys() for registration.
> Similar to what we do in net/ipv4/netfilter/ipt_CLUSTER.c, see
> clusterip_net_ops and clusterip_pernet() for instance.

Hm, I don't think that would work. The sysctls for br_netfilter are
located in /proc/sys/net/bridge under /proc/sys/net, which is tightly
integrated with the sysctl infrastructure for all of net/ and all the
folders underneath it, including "core", "ipv4" and "ipv6".
I don't think creating and managing files manually in /proc/sys/net is
going to fly. It also doesn't seem very wise from a consistency and
complexity pov. I'm also not sure if this would work at all wrt file
creation and reference counting if there are two different ways of
managing them in the same subfolder...
(clusterip creates files manually underneath /proc/net, which is probably
the reason why it gets away with it.)

Christian

2019-06-07 14:45:18

by Pablo Neira Ayuso

Subject: Re: [PATCH RESEND net-next 1/2] br_netfilter: add struct netns_brnf

On Fri, Jun 07, 2019 at 04:28:58PM +0200, Pablo Neira Ayuso wrote:
> On Fri, Jun 07, 2019 at 03:25:16PM +0200, Christian Brauner wrote:
> > On Thu, Jun 06, 2019 at 06:30:35PM +0200, Pablo Neira Ayuso wrote:
> > > On Thu, Jun 06, 2019 at 05:19:39PM +0200, Christian Brauner wrote:
> > > > On Thu, Jun 06, 2019 at 08:14:40AM -0700, Stephen Hemminger wrote:
> > > > > On Thu, 6 Jun 2019 13:41:41 +0200
> > > > > Christian Brauner <[email protected]> wrote:
> > > > >
> > > > > > +struct netns_brnf {
> > > > > > +#ifdef CONFIG_SYSCTL
> > > > > > + struct ctl_table_header *ctl_hdr;
> > > > > > +#endif
> > > > > > +
> > > > > > + /* default value is 1 */
> > > > > > + int call_iptables;
> > > > > > + int call_ip6tables;
> > > > > > + int call_arptables;
> > > > > > +
> > > > > > + /* default value is 0 */
> > > > > > + int filter_vlan_tagged;
> > > > > > + int filter_pppoe_tagged;
> > > > > > + int pass_vlan_indev;
> > > > > > +};
> > > > >
> > > > > Do you really need to waste four bytes for each
> > > > > flag value. If you use a u8 that would work just as well.
> > > >
> > > > I think we had discussed something like this but the problem why we
> > > > can't do this stems from how the sysctl-table stuff is implemented.
> > > > I distinctly remember that it couldn't be done with a flag due to that.
> > >
> > > Could you define a pernet_operations object? I mean, define the id and size
> > > fields, then pass it to register_pernet_subsys() for registration.
> > > Similar to what we do in net/ipv4/netfilter/ipt_CLUSTER.c, see
> > > clusterip_net_ops and clusterip_pernet() for instance.
> >
> > Hm, I don't think that would work. The sysctls for br_netfilter are
> > located in /proc/sys/net/bridge under /proc/sys/net which is tightly
> > integrated with the sysctls infrastructure for all of net/ and all the
> > folder underneath it including "core", "ipv4" and "ipv6".
> > I don't think creating and managing files manually in /proc/sys/net is
> > going to fly. It also doesn't seem very wise from a consistency and
> > complexity pov. I'm also not sure if this would work at all wrt to file
> > creation and reference counting if there are two different ways of
> > managing them in the same subfolder...
> > (clusterip creates files manually underneath /proc/net which probably is
> > the reason why it gets away with it.)
>
> br_netfilter is now a module, and br_netfilter_hooks.c is part of it
> IIRC, this file registers these sysctl entries from the module __init
> path.
>
> It would be a matter of adding a new .init callback to the existing
> brnf_net_ops object in br_netfilter_hooks.c. Then, call
> register_net_sysctl() from this .init callback to register the sysctl
> entries per netns.

Actually, this is what your patch is doing...

> There is already a brnf_net area that you can reuse for this purpose,
> to place these pernetns flags...
>
> struct brnf_net {
> bool enabled;
> };
>
> which is going to be glad to have more fields (under the #ifdef
> CONFIG_SYSCTL) there.

... except that struct brnf_net is not used to store the ctl_table.

So what I'm proposing should result in a small update to your patch 2/2.
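
To make the suggestion above concrete, here is a minimal userspace sketch of
the idea: a per-namespace private area sized by the ops object, with an .init
callback that fills in the flag defaults (1 for the call_* flags, 0 for the
rest, as in the quoted struct). This is a model only, not kernel code. The
types struct net and struct pernet_ops, the helpers pernet_setup() and
pernet_teardown(), and the extended layout of struct brnf_net are all
assumptions made for illustration; the real kernel interfaces
(register_pernet_subsys(), net_generic(), register_net_sysctl()) are not
used here.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only: none of these types are the kernel's own. */
#define MAX_PERNET_IDS 4
enum { BRNF_NET_ID = 0 };          /* hypothetical slot for br_netfilter */

struct net {
	void *gen[MAX_PERNET_IDS];     /* one private area per subsystem */
};

struct pernet_ops {
	int (*init)(struct net *net);  /* run once per namespace */
	void (*exit)(struct net *net);
	int id;                        /* slot index in net->gen */
	size_t size;                   /* bytes allocated per namespace */
};

/* Hypothetical layout: brnf_net extended with the per-netns flags. */
struct brnf_net {
	int enabled;
	int call_iptables;
	int call_ip6tables;
	int call_arptables;
	int filter_vlan_tagged;
	int filter_pppoe_tagged;
	int pass_vlan_indev;
};

struct brnf_net *brnf_pernet(struct net *net)
{
	return net->gen[BRNF_NET_ID];
}

/* Per-namespace init: set the defaults for this namespace only. */
int brnf_init_net(struct net *net)
{
	struct brnf_net *b = brnf_pernet(net);

	b->call_iptables = 1;
	b->call_ip6tables = 1;
	b->call_arptables = 1;
	/* The remaining flags stay 0 from the zeroed allocation. */
	return 0;
}

struct pernet_ops brnf_ops = {
	.init = brnf_init_net,
	.id   = BRNF_NET_ID,
	.size = sizeof(struct brnf_net),
};

/* Model of per-namespace setup: allocate a zeroed area, then call .init. */
int pernet_setup(const struct pernet_ops *ops, struct net *net)
{
	assert(ops->id >= 0 && ops->id < MAX_PERNET_IDS);
	net->gen[ops->id] = calloc(1, ops->size);
	if (!net->gen[ops->id])
		return -1;
	if (ops->init && ops->init(net) != 0) {
		free(net->gen[ops->id]);
		net->gen[ops->id] = NULL;
		return -1;
	}
	return 0;
}

/* Model of per-namespace teardown: call .exit, then free the area. */
void pernet_teardown(const struct pernet_ops *ops, struct net *net)
{
	if (ops->exit)
		ops->exit(net);
	free(net->gen[ops->id]);
	net->gen[ops->id] = NULL;
}
```

In this model, calling pernet_setup(&brnf_ops, &some_net) for two different
struct net instances gives each its own copy of the flags, so clearing
call_iptables in one namespace leaves the other untouched. In the kernel, the
.init callback would additionally be the place to register the sysctl table
for that namespace.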

2019-06-09 15:57:58

by Christian Brauner

Subject: Re: [PATCH RESEND net-next 1/2] br_netfilter: add struct netns_brnf

On Fri, Jun 07, 2019 at 04:43:43PM +0200, Pablo Neira Ayuso wrote:
> On Fri, Jun 07, 2019 at 04:28:58PM +0200, Pablo Neira Ayuso wrote:
> > On Fri, Jun 07, 2019 at 03:25:16PM +0200, Christian Brauner wrote:
> > > On Thu, Jun 06, 2019 at 06:30:35PM +0200, Pablo Neira Ayuso wrote:
> > > > On Thu, Jun 06, 2019 at 05:19:39PM +0200, Christian Brauner wrote:
> > > > > On Thu, Jun 06, 2019 at 08:14:40AM -0700, Stephen Hemminger wrote:
> > > > > > On Thu, 6 Jun 2019 13:41:41 +0200
> > > > > > Christian Brauner <[email protected]> wrote:
> > > > > >
> > > > > > > +struct netns_brnf {
> > > > > > > +#ifdef CONFIG_SYSCTL
> > > > > > > + struct ctl_table_header *ctl_hdr;
> > > > > > > +#endif
> > > > > > > +
> > > > > > > + /* default value is 1 */
> > > > > > > + int call_iptables;
> > > > > > > + int call_ip6tables;
> > > > > > > + int call_arptables;
> > > > > > > +
> > > > > > > + /* default value is 0 */
> > > > > > > + int filter_vlan_tagged;
> > > > > > > + int filter_pppoe_tagged;
> > > > > > > + int pass_vlan_indev;
> > > > > > > +};
> > > > > >
> > > > > > > Do you really need to waste four bytes for each
> > > > > > > flag value? If you use a u8 that would work just as well.
> > > > >
> > > > > I think we had discussed something like this but the problem why we
> > > > > can't do this stems from how the sysctl-table stuff is implemented.
> > > > > I distinctly remember that it couldn't be done with a flag due to that.
> > > >
> > > > Could you define a pernet_operations object? I mean, define the id and size
> > > > fields, then pass it to register_pernet_subsys() for registration.
> > > > Similar to what we do in net/ipv4/netfilter/ipt_CLUSTER.c, see
> > > > clusterip_net_ops and clusterip_pernet() for instance.
> > >
> > > Hm, I don't think that would work. The sysctls for br_netfilter are
> > > located in /proc/sys/net/bridge under /proc/sys/net which is tightly
> > > integrated with the sysctls infrastructure for all of net/ and all the
> > > folders underneath it including "core", "ipv4" and "ipv6".
> > > I don't think creating and managing files manually in /proc/sys/net is
> > > going to fly. It also doesn't seem very wise from a consistency and
> > > complexity pov. I'm also not sure if this would work at all wrt file
> > > creation and reference counting if there are two different ways of
> > > managing them in the same subfolder...
> > > (clusterip creates files manually underneath /proc/net which probably is
> > > the reason why it gets away with it.)
> >
> > br_netfilter is now a module, and br_netfilter_hooks.c is part of it.
> > IIRC, this file registers these sysctl entries from the module __init
> > path.
> >
> > It would be a matter of adding a new .init callback to the existing
> > brnf_net_ops object in br_netfilter_hooks.c. Then, call
> > register_net_sysctl() from this .init callback to register the sysctl
> > entries per netns.
>
> Actually, this is what your patch is doing...
>
> > There is already a brnf_net area that you can reuse for this purpose,
> > to place these pernetns flags...
> >
> > struct brnf_net {
> > bool enabled;
> > };
> >
> > which is going to be glad to have more fields (under the #ifdef
> > CONFIG_SYSCTL) there.
>
> ... except that struct brnf_net is not used to store the ctl_table.
>
> So what I'm proposing should result in a small update to your patch 2/2.

Actually not, I think. I had to rework it substantially but I think the
outcome is quite nice. :) I'll send a new version now/today. :)

Thanks!
Christian