2018-09-03 04:40:00

by Christian Brauner

[permalink] [raw]
Subject: [PATCH net-next v1 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR

Hey,

# v1 introduction:
The only functional change is the export of rtnl_get_net_ns_capable()
which is needed in case ipv6 is built as a module.

Note, I did not change the property name to IFA_TARGET_NSID as there was
no clear agreement what would be preferred. My personal preference is to
keep the IFA_IF_NETNSID name because it aligns naturally with the
IFLA_IF_NETNSID property for RTM_*LINK requests. Jiri seems to prefer
this name too.
However, if there is agreement that another property name makes more
sense I'm happy to send a v2 that changes this.

## Performance:
To test this patchset I performed 1 million getifaddrs() requests
against a network namespace containing 5 interfaces (lo, eth{0-4}). The
first test used a network namespace aware getifaddrs() implementation I
wrote and the second test used the traditional setns() + getifaddrs()
method. The results show that this patchsets allows userspace to cut
retrieval time in half:
1. netns_getifaddrs(): 82 microseconds
2. setns() + getifaddrs(): 162 microseconds

# v0 introduction:
A while back we introduced and enabled IFLA_IF_NETNSID in
RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
to signficant performance increases since it allows userspace to avoid
taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
interfaces from the netns associated with the netns_fd. Especially when a
lot of network namespaces are in use, using setns() becomes increasingly
problematic when performance matters.
Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
getifaddrs() style functions and friends). But currently, RTM_GETADDR
requests do not support a similar property like IFLA_IF_NETNSID for
RTM_*LINK requests.
This is problematic since userspace can retrieve interfaces from another
network namespace by sending a IFLA_IF_NETNSID property along but
RTM_GETLINK request but is still forced to use the legacy setns() style of
retrieving interfaces in RTM_GETADDR requests.

The goal of this series is to make it possible to perform RTM_GETADDR
requests on different network namespaces. To this end a new IFA_IF_NETNSID
property for RTM_*ADDR requests is introduced. It can be used to send a
network namespace identifier along in RTM_*ADDR requests. The network
namespace identifier will be used to retrieve the target network namespace
in which the request is supposed to be fulfilled. This aligns the behavior
of RTM_*ADDR requests with the behavior of RTM_*LINK requests.

## Security:
- The caller must have assigned a valid network namespace identifier for
the target network namespace.
- The caller must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.

Thanks!
Christian

[1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
[2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
[3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
[4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
[5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")

Christian Brauner (5):
rtnetlink: add rtnl_get_net_ns_capable()
if_addr: add IFA_IF_NETNSID
ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
rtnetlink: move type calculation out of loop

include/net/rtnetlink.h | 1 +
include/uapi/linux/if_addr.h | 1 +
net/core/rtnetlink.c | 19 +++++++---
net/ipv4/devinet.c | 38 +++++++++++++++-----
net/ipv6/addrconf.c | 70 ++++++++++++++++++++++++++++--------
5 files changed, 101 insertions(+), 28 deletions(-)

--
2.17.1



2018-09-03 04:40:28

by Christian Brauner

[permalink] [raw]
Subject: [PATCH net-next v1 1/5] rtnetlink: add rtnl_get_net_ns_capable()

get_target_net() will be used in follow-up patches in ipv{4,6} codepaths to
retrieve network namespaces based on network namespace identifiers. So
remove the static declaration and export in the rtnetlink header. Also,
rename it to rtnl_get_net_ns_capable() to make it obvious what this
function is doing.

Signed-off-by: Christian Brauner <[email protected]>
---
v0->v1:
- export rtnl_get_net_ns_capable().
Kbuild reported a build failure when ipv6 is built as a module. This was
caused by rtnl_get_net_ns_capable() not being exported. Fix this by
exporting it.
---
include/net/rtnetlink.h | 1 +
net/core/rtnetlink.c | 17 +++++++++++++----
2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
index 0bbaa5488423..cf26e5aacac4 100644
--- a/include/net/rtnetlink.h
+++ b/include/net/rtnetlink.h
@@ -165,6 +165,7 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm);

int rtnl_nla_parse_ifla(struct nlattr **tb, const struct nlattr *head, int len,
struct netlink_ext_ack *exterr);
+struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid);

#define MODULE_ALIAS_RTNL_LINK(kind) MODULE_ALIAS("rtnl-link-" kind)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 24431e578310..30645d9a9801 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1841,7 +1841,15 @@ static bool link_dump_filtered(struct net_device *dev,
return false;
}

-static struct net *get_target_net(struct sock *sk, int netnsid)
+/**
+ * rtnl_get_net_ns_capable - Get netns if sufficiently privileged.
+ * @sk: netlink socket
+ * @netnsid: network namespace identifier
+ *
+ * Returns the network namespace identified by netnsid on success or an error
+ * pointer on failure.
+ */
+struct net *rtnl_get_net_ns_capable(struct sock *sk, int netnsid)
{
struct net *net;

@@ -1858,6 +1866,7 @@ static struct net *get_target_net(struct sock *sk, int netnsid)
}
return net;
}
+EXPORT_SYMBOL_GPL(rtnl_get_net_ns_capable);

static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
{
@@ -1893,7 +1902,7 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
ifla_policy, NULL) >= 0) {
if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
- tgt_net = get_target_net(skb->sk, netnsid);
+ tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
if (IS_ERR(tgt_net)) {
tgt_net = net;
netnsid = -1;
@@ -2761,7 +2770,7 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,

if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
- tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid);
+ tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
if (IS_ERR(tgt_net))
return PTR_ERR(tgt_net);
}
@@ -3171,7 +3180,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr *nlh,

if (tb[IFLA_IF_NETNSID]) {
netnsid = nla_get_s32(tb[IFLA_IF_NETNSID]);
- tgt_net = get_target_net(NETLINK_CB(skb).sk, netnsid);
+ tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(skb).sk, netnsid);
if (IS_ERR(tgt_net))
return PTR_ERR(tgt_net);
}
--
2.17.1


2018-09-03 04:41:00

by Christian Brauner

[permalink] [raw]
Subject: [PATCH net-next v1 2/5] if_addr: add IFA_IF_NETNSID

This adds a new IFA_IF_NETNSID property to be used by address families such
as PF_INET and PF_INET6.
The IFA_IF_NETNSID property can be used to send a network namespace
identifier as part of a request. If a IFA_IF_NETNSID property is identified
it will be used to retrieve the target network namespace in which the
request is to be made.

Signed-off-by: Christian Brauner <[email protected]>
Cc: Jiri Benc <[email protected]>
Cc: Nicolas Dichtel <[email protected]>
---
v0->v1:
- unchanged
Note, I did not change the property name to IFA_TARGET_NSID as there
was no clear agreement what would be preferred. My personal preference
is to keep the IFA_IF_NETNSID name because it aligns naturally with
the IFLA_IF_NETNSID property for RTM_*LINK requests. Jiri seems to
prefer this name too.
However, if there is agreement that another property name makes more
sense I'm happy to send a v2 that changes this.
---
include/uapi/linux/if_addr.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_addr.h b/include/uapi/linux/if_addr.h
index ebaf5701c9db..0e0cd588cac0 100644
--- a/include/uapi/linux/if_addr.h
+++ b/include/uapi/linux/if_addr.h
@@ -34,6 +34,7 @@ enum {
IFA_MULTICAST,
IFA_FLAGS,
IFA_RT_PRIORITY, /* u32, priority/metric for prefix route */
+ IFA_IF_NETNSID,
__IFA_MAX,
};

--
2.17.1


2018-09-03 04:41:20

by Christian Brauner

[permalink] [raw]
Subject: [PATCH net-next v1 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR

- Backwards Compatibility:
If userspace wants to determine whether ipv4 RTM_GETADDR requests support
the new IFA_IF_NETNSID property they should verify that the reply after
sending a request includes the IFA_IF_NETNSID property. If it does not
userspace should assume that IFA_IF_NETNSID is not supported for ipv4
RTM_GETADDR requests on this kernel.
- From what I gather from current userspace tools that make use of
RTM_GETADDR requests some of them pass down struct ifinfomsg when they
should actually pass down struct ifaddrmsg. To not break existing
tools that pass down the wrong struct we will do the same as for
RTM_GETLINK | NLM_F_DUMP requests and not error out when the
nlmsg_parse() fails.

- Security:
Callers must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.

Signed-off-by: Christian Brauner <[email protected]>
---
v0->v1:
- unchanged
---
net/ipv4/devinet.c | 38 ++++++++++++++++++++++++++++++--------
1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index ea4bd8a52422..c52271309a1f 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -100,6 +100,7 @@ static const struct nla_policy ifa_ipv4_policy[IFA_MAX+1] = {
[IFA_CACHEINFO] = { .len = sizeof(struct ifa_cacheinfo) },
[IFA_FLAGS] = { .type = NLA_U32 },
[IFA_RT_PRIORITY] = { .type = NLA_U32 },
+ [IFA_IF_NETNSID] = { .type = NLA_S32 },
};

#define IN4_ADDR_HSIZE_SHIFT 8
@@ -1584,7 +1585,8 @@ static int put_cacheinfo(struct sk_buff *skb, unsigned long cstamp,
}

static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
- u32 portid, u32 seq, int event, unsigned int flags)
+ u32 portid, u32 seq, int event, unsigned int flags,
+ int netnsid)
{
struct ifaddrmsg *ifm;
struct nlmsghdr *nlh;
@@ -1601,6 +1603,9 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
ifm->ifa_scope = ifa->ifa_scope;
ifm->ifa_index = ifa->ifa_dev->dev->ifindex;

+ if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
+ goto nla_put_failure;
+
if (!(ifm->ifa_flags & IFA_F_PERMANENT)) {
preferred = ifa->ifa_preferred_lft;
valid = ifa->ifa_valid_lft;
@@ -1648,6 +1653,9 @@ static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
{
struct net *net = sock_net(skb->sk);
+ struct nlattr *tb[IFA_MAX+1];
+ struct net *tgt_net = net;
+ int netnsid = -1;
int h, s_h;
int idx, s_idx;
int ip_idx, s_ip_idx;
@@ -1660,12 +1668,23 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
s_idx = idx = cb->args[1];
s_ip_idx = ip_idx = cb->args[2];

+ if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
+ ifa_ipv4_policy, NULL) >= 0) {
+ if (tb[IFA_IF_NETNSID]) {
+ netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
+
+ tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
+ if (IS_ERR(tgt_net))
+ return PTR_ERR(tgt_net);
+ }
+ }
+
for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
idx = 0;
- head = &net->dev_index_head[h];
+ head = &tgt_net->dev_index_head[h];
rcu_read_lock();
- cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^
- net->dev_base_seq;
+ cb->seq = atomic_read(&tgt_net->ipv4.dev_addr_genid) ^
+ tgt_net->dev_base_seq;
hlist_for_each_entry_rcu(dev, head, index_hlist) {
if (idx < s_idx)
goto cont;
@@ -1680,9 +1699,10 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
if (ip_idx < s_ip_idx)
continue;
if (inet_fill_ifaddr(skb, ifa,
- NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq,
- RTM_NEWADDR, NLM_F_MULTI) < 0) {
+ NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq,
+ RTM_NEWADDR, NLM_F_MULTI,
+ netnsid) < 0) {
rcu_read_unlock();
goto done;
}
@@ -1698,6 +1718,8 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
cb->args[0] = h;
cb->args[1] = idx;
cb->args[2] = ip_idx;
+ if (netnsid >= 0)
+ put_net(tgt_net);

return skb->len;
}
@@ -1715,7 +1737,7 @@ static void rtmsg_ifa(int event, struct in_ifaddr *ifa, struct nlmsghdr *nlh,
if (!skb)
goto errout;

- err = inet_fill_ifaddr(skb, ifa, portid, seq, event, 0);
+ err = inet_fill_ifaddr(skb, ifa, portid, seq, event, 0, -1);
if (err < 0) {
/* -EMSGSIZE implies BUG in inet_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
--
2.17.1


2018-09-03 04:41:38

by Christian Brauner

[permalink] [raw]
Subject: [PATCH net-next v1 4/5] ipv6: enable IFA_IF_NETNSID for RTM_GETADDR

- Backwards Compatibility:
If userspace wants to determine whether ipv6 RTM_GETADDR requests support
the new IFA_IF_NETNSID property they should verify that the reply after
sending a request includes the IFA_IF_NETNSID property. If it does not
userspace should assume that IFA_IF_NETNSID is not supported for ipv6
RTM_GETADDR requests on this kernel.
- From what I gather from current userspace tools that make use of
RTM_GETADDR requests some of them pass down struct ifinfomsg when they
should actually pass down struct ifaddrmsg. To not break existing
tools that pass down the wrong struct we will do the same as for
RTM_GETLINK | NLM_F_DUMP requests and not error out when the
nlmsg_parse() fails.

- Security:
Callers must have CAP_NET_ADMIN in the owning user namespace of the
target network namespace.

Signed-off-by: Christian Brauner <[email protected]>
---
v0->v1:
- unchanged
---
net/ipv6/addrconf.c | 70 +++++++++++++++++++++++++++++++++++----------
1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d51a8c0b3372..00f1af374150 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4491,6 +4491,7 @@ static const struct nla_policy ifa_ipv6_policy[IFA_MAX+1] = {
[IFA_CACHEINFO] = { .len = sizeof(struct ifa_cacheinfo) },
[IFA_FLAGS] = { .len = sizeof(u32) },
[IFA_RT_PRIORITY] = { .len = sizeof(u32) },
+ [IFA_IF_NETNSID] = { .type = NLA_S32 },
};

static int
@@ -4794,7 +4795,8 @@ static inline int inet6_ifaddr_msgsize(void)
}

static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
- u32 portid, u32 seq, int event, unsigned int flags)
+ u32 portid, u32 seq, int event, unsigned int flags,
+ int netnsid)
{
struct nlmsghdr *nlh;
u32 preferred, valid;
@@ -4806,6 +4808,9 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
put_ifaddrmsg(nlh, ifa->prefix_len, ifa->flags, rt_scope(ifa->scope),
ifa->idev->dev->ifindex);

+ if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
+ goto error;
+
if (!((ifa->flags&IFA_F_PERMANENT) &&
(ifa->prefered_lft == INFINITY_LIFE_TIME))) {
preferred = ifa->prefered_lft;
@@ -4855,7 +4860,8 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
}

static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
- u32 portid, u32 seq, int event, u16 flags)
+ u32 portid, u32 seq, int event, u16 flags,
+ int netnsid)
{
struct nlmsghdr *nlh;
u8 scope = RT_SCOPE_UNIVERSE;
@@ -4868,6 +4874,9 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
if (!nlh)
return -EMSGSIZE;

+ if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
+ return -EMSGSIZE;
+
put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
if (nla_put_in6_addr(skb, IFA_MULTICAST, &ifmca->mca_addr) < 0 ||
put_cacheinfo(skb, ifmca->mca_cstamp, ifmca->mca_tstamp,
@@ -4881,7 +4890,8 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
}

static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
- u32 portid, u32 seq, int event, unsigned int flags)
+ u32 portid, u32 seq, int event,
+ unsigned int flags, int netnsid)
{
struct net_device *dev = fib6_info_nh_dev(ifaca->aca_rt);
int ifindex = dev ? dev->ifindex : 1;
@@ -4895,6 +4905,9 @@ static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
if (!nlh)
return -EMSGSIZE;

+ if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
+ return -EMSGSIZE;
+
put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
if (nla_put_in6_addr(skb, IFA_ANYCAST, &ifaca->aca_addr) < 0 ||
put_cacheinfo(skb, ifaca->aca_cstamp, ifaca->aca_tstamp,
@@ -4916,7 +4929,7 @@ enum addr_type_t {
/* called with rcu_read_lock() */
static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
struct netlink_callback *cb, enum addr_type_t type,
- int s_ip_idx, int *p_ip_idx)
+ int s_ip_idx, int *p_ip_idx, int netnsid)
{
struct ifmcaddr6 *ifmca;
struct ifacaddr6 *ifaca;
@@ -4936,7 +4949,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
RTM_NEWADDR,
- NLM_F_MULTI);
+ NLM_F_MULTI, netnsid);
if (err < 0)
break;
nl_dump_check_consistent(cb, nlmsg_hdr(skb));
@@ -4953,7 +4966,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
RTM_GETMULTICAST,
- NLM_F_MULTI);
+ NLM_F_MULTI, netnsid);
if (err < 0)
break;
}
@@ -4968,7 +4981,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
RTM_GETANYCAST,
- NLM_F_MULTI);
+ NLM_F_MULTI, netnsid);
if (err < 0)
break;
}
@@ -4985,6 +4998,9 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
enum addr_type_t type)
{
struct net *net = sock_net(skb->sk);
+ struct nlattr *tb[IFA_MAX+1];
+ struct net *tgt_net = net;
+ int netnsid = -1;
int h, s_h;
int idx, ip_idx;
int s_idx, s_ip_idx;
@@ -4996,11 +5012,22 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
s_idx = idx = cb->args[1];
s_ip_idx = ip_idx = cb->args[2];

+ if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
+ ifa_ipv6_policy, NULL) >= 0) {
+ if (tb[IFA_IF_NETNSID]) {
+ netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
+
+ tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
+ if (IS_ERR(tgt_net))
+ return PTR_ERR(tgt_net);
+ }
+ }
+
rcu_read_lock();
- cb->seq = atomic_read(&net->ipv6.dev_addr_genid) ^ net->dev_base_seq;
+ cb->seq = atomic_read(&tgt_net->ipv6.dev_addr_genid) ^ tgt_net->dev_base_seq;
for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
idx = 0;
- head = &net->dev_index_head[h];
+ head = &tgt_net->dev_index_head[h];
hlist_for_each_entry_rcu(dev, head, index_hlist) {
if (idx < s_idx)
goto cont;
@@ -5012,7 +5039,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
goto cont;

if (in6_dump_addrs(idev, skb, cb, type,
- s_ip_idx, &ip_idx) < 0)
+ s_ip_idx, &ip_idx, netnsid) < 0)
goto done;
cont:
idx++;
@@ -5023,6 +5050,8 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
cb->args[0] = h;
cb->args[1] = idx;
cb->args[2] = ip_idx;
+ if (netnsid >= 0)
+ put_net(tgt_net);

return skb->len;
}
@@ -5053,12 +5082,14 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
struct netlink_ext_ack *extack)
{
struct net *net = sock_net(in_skb->sk);
+ struct net *tgt_net = net;
struct ifaddrmsg *ifm;
struct nlattr *tb[IFA_MAX+1];
struct in6_addr *addr = NULL, *peer;
struct net_device *dev = NULL;
struct inet6_ifaddr *ifa;
struct sk_buff *skb;
+ int netnsid = -1;
int err;

err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy,
@@ -5066,15 +5097,24 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
if (err < 0)
return err;

+ if (tb[IFA_IF_NETNSID]) {
+ netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
+
+ tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(in_skb).sk,
+ netnsid);
+ if (IS_ERR(tgt_net))
+ return PTR_ERR(tgt_net);
+ }
+
addr = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer);
if (!addr)
return -EINVAL;

ifm = nlmsg_data(nlh);
if (ifm->ifa_index)
- dev = dev_get_by_index(net, ifm->ifa_index);
+ dev = dev_get_by_index(tgt_net, ifm->ifa_index);

- ifa = ipv6_get_ifaddr(net, addr, dev, 1);
+ ifa = ipv6_get_ifaddr(tgt_net, addr, dev, 1);
if (!ifa) {
err = -EADDRNOTAVAIL;
goto errout;
@@ -5087,14 +5127,14 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
}

err = inet6_fill_ifaddr(skb, ifa, NETLINK_CB(in_skb).portid,
- nlh->nlmsg_seq, RTM_NEWADDR, 0);
+ nlh->nlmsg_seq, RTM_NEWADDR, 0, netnsid);
if (err < 0) {
/* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */
WARN_ON(err == -EMSGSIZE);
kfree_skb(skb);
goto errout_ifa;
}
- err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
+ err = rtnl_unicast(skb, tgt_net, NETLINK_CB(in_skb).portid);
errout_ifa:
in6_ifa_put(ifa);
errout:
@@ -5113,7 +5153,7 @@ static void inet6_ifa_notify(int event, struct inet6_ifaddr *ifa)
if (!skb)
goto errout;

- err = inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0);
+ err = inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0, -1);
if (err < 0) {
/* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */
WARN_ON(err == -EMSGSIZE);
--
2.17.1


2018-09-03 04:42:18

by Christian Brauner

[permalink] [raw]
Subject: [PATCH net-next v1 5/5] rtnetlink: move type calculation out of loop

I don't see how the type - which is one of
RTM_{GETADDR,GETROUTE,GETNETCONF} - can change. So do the message type
calculation once before entering the for loop.

Signed-off-by: Christian Brauner <[email protected]>
---
v0->v1:
- unchanged
---
net/core/rtnetlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 30645d9a9801..b36dab7507a0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3265,13 +3265,13 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
{
int idx;
int s_idx = cb->family;
+ int type = cb->nlh->nlmsg_type - RTM_BASE;

if (s_idx == 0)
s_idx = 1;

for (idx = 1; idx <= RTNL_FAMILY_MAX; idx++) {
struct rtnl_link **tab;
- int type = cb->nlh->nlmsg_type-RTM_BASE;
struct rtnl_link *link;
rtnl_dumpit_func dumpit;

--
2.17.1


2018-09-03 14:58:50

by Kirill Tkhai

[permalink] [raw]
Subject: Re: [PATCH net-next v1 4/5] ipv6: enable IFA_IF_NETNSID for RTM_GETADDR

On 03.09.2018 07:37, Christian Brauner wrote:
> - Backwards Compatibility:
> If userspace wants to determine whether ipv6 RTM_GETADDR requests support
> the new IFA_IF_NETNSID property they should verify that the reply after
> sending a request includes the IFA_IF_NETNSID property. If it does not
> userspace should assume that IFA_IF_NETNSID is not supported for ipv6
> RTM_GETADDR requests on this kernel.
> - From what I gather from current userspace tools that make use of
> RTM_GETADDR requests some of them pass down struct ifinfomsg when they
> should actually pass down struct ifaddrmsg. To not break existing
> tools that pass down the wrong struct we will do the same as for
> RTM_GETLINK | NLM_F_DUMP requests and not error out when the
> nlmsg_parse() fails.
>
> - Security:
> Callers must have CAP_NET_ADMIN in the owning user namespace of the
> target network namespace.
>
> Signed-off-by: Christian Brauner <[email protected]>
> ---
> v0->v1:
> - unchanged
> ---
> net/ipv6/addrconf.c | 70 +++++++++++++++++++++++++++++++++++----------
> 1 file changed, 55 insertions(+), 15 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index d51a8c0b3372..00f1af374150 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -4491,6 +4491,7 @@ static const struct nla_policy ifa_ipv6_policy[IFA_MAX+1] = {
> [IFA_CACHEINFO] = { .len = sizeof(struct ifa_cacheinfo) },
> [IFA_FLAGS] = { .len = sizeof(u32) },
> [IFA_RT_PRIORITY] = { .len = sizeof(u32) },
> + [IFA_IF_NETNSID] = { .type = NLA_S32 },
> };
>
> static int
> @@ -4794,7 +4795,8 @@ static inline int inet6_ifaddr_msgsize(void)
> }
>
> static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
> - u32 portid, u32 seq, int event, unsigned int flags)
> + u32 portid, u32 seq, int event, unsigned int flags,
> + int netnsid)
> {
> struct nlmsghdr *nlh;
> u32 preferred, valid;
> @@ -4806,6 +4808,9 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
> put_ifaddrmsg(nlh, ifa->prefix_len, ifa->flags, rt_scope(ifa->scope),
> ifa->idev->dev->ifindex);
>
> + if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
> + goto error;
> +
> if (!((ifa->flags&IFA_F_PERMANENT) &&
> (ifa->prefered_lft == INFINITY_LIFE_TIME))) {
> preferred = ifa->prefered_lft;
> @@ -4855,7 +4860,8 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
> }
>
> static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
> - u32 portid, u32 seq, int event, u16 flags)
> + u32 portid, u32 seq, int event, u16 flags,
> + int netnsid)
> {
> struct nlmsghdr *nlh;
> u8 scope = RT_SCOPE_UNIVERSE;
> @@ -4868,6 +4874,9 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
> if (!nlh)
> return -EMSGSIZE;
>
> + if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
> + return -EMSGSIZE;
> +
> put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
> if (nla_put_in6_addr(skb, IFA_MULTICAST, &ifmca->mca_addr) < 0 ||
> put_cacheinfo(skb, ifmca->mca_cstamp, ifmca->mca_tstamp,
> @@ -4881,7 +4890,8 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
> }
>
> static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
> - u32 portid, u32 seq, int event, unsigned int flags)
> + u32 portid, u32 seq, int event,
> + unsigned int flags, int netnsid)

Here we will have 7 arguments, while we have only 6 x86 registers. Can we do something
with this?

> {
> struct net_device *dev = fib6_info_nh_dev(ifaca->aca_rt);
> int ifindex = dev ? dev->ifindex : 1;
> @@ -4895,6 +4905,9 @@ static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
> if (!nlh)
> return -EMSGSIZE;
>
> + if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
> + return -EMSGSIZE;
> +
> put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
> if (nla_put_in6_addr(skb, IFA_ANYCAST, &ifaca->aca_addr) < 0 ||
> put_cacheinfo(skb, ifaca->aca_cstamp, ifaca->aca_tstamp,
> @@ -4916,7 +4929,7 @@ enum addr_type_t {
> /* called with rcu_read_lock() */
> static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
> struct netlink_callback *cb, enum addr_type_t type,
> - int s_ip_idx, int *p_ip_idx)
> + int s_ip_idx, int *p_ip_idx, int netnsid)
> {
> struct ifmcaddr6 *ifmca;
> struct ifacaddr6 *ifaca;
> @@ -4936,7 +4949,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
> NETLINK_CB(cb->skb).portid,
> cb->nlh->nlmsg_seq,
> RTM_NEWADDR,
> - NLM_F_MULTI);
> + NLM_F_MULTI, netnsid);
> if (err < 0)
> break;
> nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> @@ -4953,7 +4966,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
> NETLINK_CB(cb->skb).portid,
> cb->nlh->nlmsg_seq,
> RTM_GETMULTICAST,
> - NLM_F_MULTI);
> + NLM_F_MULTI, netnsid);
> if (err < 0)
> break;
> }
> @@ -4968,7 +4981,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
> NETLINK_CB(cb->skb).portid,
> cb->nlh->nlmsg_seq,
> RTM_GETANYCAST,
> - NLM_F_MULTI);
> + NLM_F_MULTI, netnsid);
> if (err < 0)
> break;
> }
> @@ -4985,6 +4998,9 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
> enum addr_type_t type)
> {
> struct net *net = sock_net(skb->sk);
> + struct nlattr *tb[IFA_MAX+1];
> + struct net *tgt_net = net;
> + int netnsid = -1;

This function already has 10 local variables, while this patch adds 3 more.
Documentation/process/coding-style.rst says:

Another measure of the function is the number of local variables. They
shouldn't exceed 5-10, or you're doing something wrong. Re-think the
function, and split it into smaller pieces. A human brain can
generally easily keep track of about 7 different things, anything more
and it gets confused.

Can we do something with this?

> int h, s_h;
> int idx, ip_idx;
> int s_idx, s_ip_idx;
> @@ -4996,11 +5012,22 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
> s_idx = idx = cb->args[1];
> s_ip_idx = ip_idx = cb->args[2];
>
> + if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
> + ifa_ipv6_policy, NULL) >= 0) {
> + if (tb[IFA_IF_NETNSID]) {
> + netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
> +
> + tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
> + if (IS_ERR(tgt_net))
> + return PTR_ERR(tgt_net);
> + }
> + }
> +
> rcu_read_lock();
> - cb->seq = atomic_read(&net->ipv6.dev_addr_genid) ^ net->dev_base_seq;
> + cb->seq = atomic_read(&tgt_net->ipv6.dev_addr_genid) ^ tgt_net->dev_base_seq;
> for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
> idx = 0;
> - head = &net->dev_index_head[h];
> + head = &tgt_net->dev_index_head[h];
> hlist_for_each_entry_rcu(dev, head, index_hlist) {
> if (idx < s_idx)
> goto cont;
> @@ -5012,7 +5039,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
> goto cont;
>
> if (in6_dump_addrs(idev, skb, cb, type,
> - s_ip_idx, &ip_idx) < 0)
> + s_ip_idx, &ip_idx, netnsid) < 0)
> goto done;
> cont:
> idx++;
> @@ -5023,6 +5050,8 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
> cb->args[0] = h;
> cb->args[1] = idx;
> cb->args[2] = ip_idx;
> + if (netnsid >= 0)
> + put_net(tgt_net);
>
> return skb->len;
> }
> @@ -5053,12 +5082,14 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
> struct netlink_ext_ack *extack)
> {
> struct net *net = sock_net(in_skb->sk);
> + struct net *tgt_net = net;
> struct ifaddrmsg *ifm;
> struct nlattr *tb[IFA_MAX+1];
> struct in6_addr *addr = NULL, *peer;
> struct net_device *dev = NULL;
> struct inet6_ifaddr *ifa;
> struct sk_buff *skb;
> + int netnsid = -1;
> int err;
>
> err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy,
> @@ -5066,15 +5097,24 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
> if (err < 0)
> return err;
>
> + if (tb[IFA_IF_NETNSID]) {
> + netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
> +
> + tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(in_skb).sk,
> + netnsid);
> + if (IS_ERR(tgt_net))
> + return PTR_ERR(tgt_net);
> + }
> +
> addr = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer);
> if (!addr)
> return -EINVAL;
>
> ifm = nlmsg_data(nlh);
> if (ifm->ifa_index)
> - dev = dev_get_by_index(net, ifm->ifa_index);
> + dev = dev_get_by_index(tgt_net, ifm->ifa_index);
>
> - ifa = ipv6_get_ifaddr(net, addr, dev, 1);
> + ifa = ipv6_get_ifaddr(tgt_net, addr, dev, 1);
> if (!ifa) {
> err = -EADDRNOTAVAIL;
> goto errout;
> @@ -5087,14 +5127,14 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
> }
>
> err = inet6_fill_ifaddr(skb, ifa, NETLINK_CB(in_skb).portid,
> - nlh->nlmsg_seq, RTM_NEWADDR, 0);
> + nlh->nlmsg_seq, RTM_NEWADDR, 0, netnsid);
> if (err < 0) {
> /* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */
> WARN_ON(err == -EMSGSIZE);
> kfree_skb(skb);
> goto errout_ifa;
> }
> - err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
> + err = rtnl_unicast(skb, tgt_net, NETLINK_CB(in_skb).portid);
> errout_ifa:
> in6_ifa_put(ifa);
> errout:
> @@ -5113,7 +5153,7 @@ static void inet6_ifa_notify(int event, struct inet6_ifaddr *ifa)
> if (!skb)
> goto errout;
>
> - err = inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0);
> + err = inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0, -1);
> if (err < 0) {
> /* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */
> WARN_ON(err == -EMSGSIZE);

Thanks,
Kirill

2018-09-04 03:13:58

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH net-next v1 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR

On 9/2/18 10:37 PM, Christian Brauner wrote:
> - Backwards Compatibility:
> If userspace wants to determine whether ipv4 RTM_GETADDR requests support
> the new IFA_IF_NETNSID property they should verify that the reply after
> sending a request includes the IFA_IF_NETNSID property. If it does not
> userspace should assume that IFA_IF_NETNSID is not supported for ipv4
> RTM_GETADDR requests on this kernel.

Can only use it once per message type, but NLM_F_DUMP_FILTERED is a flag
that can be set to explicitly say the request is filtered as requested.
See 21fdd092acc7e. I would like to see other filters added for addresses
in the same release this gets used. The only one that comes to mind for
addresses is to only return addresses for devices with master device
index N (same intent as 21fdd092acc7e for neighbors).

2018-09-04 06:51:35

by Jiri Benc

[permalink] [raw]
Subject: Re: [PATCH net-next v1 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR

On Mon, 3 Sep 2018 21:11:30 -0600, David Ahern wrote:
> Can only use it once per message type, but NLM_F_DUMP_FILTERED is a flag
> that can be set to explicitly say the request is filtered as requested.

The problem is that NLM_F_DUMP_FILTERED is too coarse. There's no way
to determine whether the netnsid was honored or whether it was not but
other filtering took effect.

This is a general problem with netlink: unknown attributes are ignored.
We need a way to detect that certain attribute was understood by the
kernel or was not. And it needs to work retroactively, i.e. the
application has to be able to determine the currently running kernel
does not support the feature (because it's too old).

That's why we return back the attribute in responses to a request with
IFLA_IF_NETNSID present and why we should do the same for
IFA_IF_NETNSID.

> See 21fdd092acc7e. I would like to see other filters added for addresses
> in the same release this gets used. The only one that comes to mind for
> addresses is to only return addresses for devices with master device
> index N (same intent as 21fdd092acc7e for neighbors).

I also question the statement that IFA_F_NETNSID is a filter: my
understanding of "filter" is something that limits the output to a
certain subset. I.e., unfiltered results always contain everything that
is in a filtered result. While with IFA_F_NETNSID, we get a completely
different set of data. Does that really constitute a filter? Note that
we can still filter in the target netns.

Thanks,

Jiri

2018-09-04 07:22:06

by Nicolas Dichtel

[permalink] [raw]
Subject: Re: [PATCH net-next v1 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR


Le 04/09/2018 à 08:50, Jiri Benc a écrit :
> On Mon, 3 Sep 2018 21:11:30 -0600, David Ahern wrote:
>> Can only use it once per message type, but NLM_F_DUMP_FILTERED is a flag
>> that can be set to explicitly say the request is filtered as requested.
>
> The problem is that NLM_F_DUMP_FILTERED is too coarse. There's no way
> to determine whether the netnsid was honored or whether it was not but
> other filtering took effect.
>
> This is a general problem with netlink: unknown attributes are ignored.
> We need a way to detect that certain attribute was understood by the
> kernel or was not. And it needs to work retroactively, i.e. the
> application has to be able to determine the currently running kernel
> does not support the feature (because it's too old).
>
> That's why we return back the attribute in responses to a request with
> IFLA_IF_NETNSID present and why we should do the same for
> IFA_IF_NETNSID.
+1

>
>> See 21fdd092acc7e. I would like to see other filters added for addresses
>> in the same release this gets used. The only one that comes to mind for
>> addresses is to only return addresses for devices with master device
>> index N (same intent as 21fdd092acc7e for neighbors).
>
> I also question the statement that IFA_F_NETNSID is a filter: my
> understanding of "filter" is something that limits the output to a
> certain subset. I.e., unfiltered results always contain everything that
> is in a filtered result. While with IFA_F_NETNSID, we get a completely
> different set of data. Does that really constitute a filter? Note that
> we can still filter in the target netns.
+1


Regards,
Nicolas

2018-09-04 16:29:21

by David Ahern

[permalink] [raw]
Subject: Re: [PATCH net-next v1 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR

On 9/4/18 12:50 AM, Jiri Benc wrote:
>
> This is a general problem with netlink: unknown attributes are ignored.
> We need a way to detect that certain attribute was understood by the
> kernel or was not. And it needs to work retroactively, i.e. the
> application has to be able to determine the currently running kernel
> does not support the feature (because it's too old).

sure, and that has been discussed before.

>
> That's why we return back the attribute in responses to a request with
> IFLA_IF_NETNSID present and why we should do the same for
> IFA_IF_NETNSID.
>
>> See 21fdd092acc7e. I would like to see other filters added for addresses
>> in the same release this gets used. The only one that comes to mind for
>> addresses is to only return addresses for devices with master device
>> index N (same intent as 21fdd092acc7e for neighbors).
>
> I also question the statement that IFA_F_NETNSID is a filter: my
> understanding of "filter" is something that limits the output to a
> certain subset. I.e., unfiltered results always contain everything that
> is in a filtered result. While with IFA_F_NETNSID, we get a completely
> different set of data. Does that really constitute a filter? Note that
> we can still filter in the target netns.
>

I'll buy that argument over the 'too coarse' one. Looking at the link
version of this flag, the NLM_F_DUMP_FILTERED flag is not used there so
for consistency the address one should follow suit.