2018-04-27 10:26:58

by Christian Brauner

[permalink] [raw]
Subject: [PATCH net-next 0/2] netns: uevent filtering

Hey everyone,

This is the new approach to uevent filtering as discussed (see the
threads in [1], [2], and [3]).

This series deals with with fixing up uevent filtering logic:
- uevent filtering logic is simplified
- locking time on uevent_sock_list is minimized
- tagged and untagged kobjects are handled in separate codepaths
- permissions for userspace are fixed for network device uevents in
network namespaces owned by non-initial user namespaces
Udev is now able to see those events correctly which it wasn't before.
For example, moving a physical device into a network namespace not
owned by the initial user namespaces before gave:

root@xen1:~# udevadm --debug monitor -k
calling: monitor
monitor will print the received events for:
KERNEL - the kernel uevent

sender uid=65534, message ignored
sender uid=65534, message ignored
sender uid=65534, message ignored
sender uid=65534, message ignored
sender uid=65534, message ignored

and now after the discussion and solution in [3] correctly gives:

root@xen1:~# udevadm --debug monitor -k
calling: monitor
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[625.301042] add /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
KERNEL[625.301109] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
KERNEL[625.301138] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)
KERNEL[655.333272] remove /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)

Thanks!
Christian

[1]: https://lkml.org/lkml/2018/4/4/739
[2]: https://lkml.org/lkml/2018/4/26/767
[3]: https://lkml.org/lkml/2018/4/26/738

Christian Brauner (2):
uevent: add alloc_uevent_skb() helper
netns: restrict uevents

lib/kobject_uevent.c | 175 ++++++++++++++++++++++++++++++-------------
1 file changed, 123 insertions(+), 52 deletions(-)

--
2.17.0



2018-04-27 10:26:58

by Christian Brauner

[permalink] [raw]
Subject: [PATCH net-next 2/2 v3] netns: restrict uevents

commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")

enabled sending hotplug events into all network namespaces back in 2010.
Over time the set of uevents that get sent into all network namespaces has
shrunk. We have now reached the point where hotplug events for all devices
that carry a namespace tag are filtered according to that namespace.
Specifically, they are filtered whenever the namespace tag of the kobject
does not match the namespace tag of the netlink socket.
Currently, only network devices carry namespace tags (i.e. network
namespace tags). Hence, uevents for network devices only show up in the
network namespace such devices are created in or moved to.

However, any uevent for a kobject that does not have a namespace tag
associated with it will not be filtered and we will broadcast it into all
network namespaces. This behavior stopped making sense when user namespaces
were introduced.

This patch simplifies and fixes couple of things:
- Split codepath for sending uevents by kobject namespace tags:
1. Untagged kobjects - uevent_net_broadcast_untagged():
Untagged kobjects will be broadcast into all uevent sockets recorded
in uevent_sock_list, i.e. into all network namespacs owned by the
intial user namespace.
2. Tagged kobjects - uevent_net_broadcast_tagged():
Tagged kobjects will only be broadcast into the network namespace they
were tagged with.
Handling of tagged kobjects in 2. does not cause any semantic changes.
This is just splitting out the filtering logic that was handled by
kobj_bcast_filter() before.
Handling of untagged kobjects in 1. will cause a semantic change. The
reasons why this is needed and ok have been discussed in [1]. Here is a
short summary:
- Userspace ignores uevents from network namespaces that are not owned by
the intial user namespace:
Uevents are filtered by userspace in a user namespace because the
received uid != 0. Instead the uid associated with the event will be
65534 == "nobody" because the global root uid is not mapped.
This means we can safely and without introducing regressions modify the
kernel to not send uevents into all network namespaces whose owning
user namespace is not the initial user namespace because we know that
userspace will ignore the message because of the uid anyway.
I have a) verified that is is true for every udev implementation out
there b) that this behavior has been present in all udev
implementations from the very beginning.
- Thundering herd:
Broadcasting uevents into all network namespaces introduces significant
overhead.
All processes that listen to uevents running in non-initial user
namespaces will end up responding to uevents that will be meaningless
to them. Mainly, because non-initial user namespaces cannot easily
manage devices unless they have a privileged host-process helping them
out. This means that there will be a thundering herd of activity when
there shouldn't be any.
- Removing needless overhead/Increasing performance:
Currently, the uevent socket for each network namespace is added to the
global variable uevent_sock_list. The list itself needs to be protected
by a mutex. So everytime a uevent is generated the mutex is taken on
the list. The mutex is held *from the creation of the uevent (memory
allocation, string creation etc. until all uevent sockets have been
handled*. This is aggravated by the fact that for each uevent socket
that has listeners the mc_list must be walked as well which means we're
talking O(n^2) here. Given that a standard Linux workload usually has
quite a lot of network namespaces and - in the face of containers - a
lot of user namespaces this quickly becomes a performance problem (see
"Thundering herd" above). By just recording uevent sockets of network
namespaces that are owned by the initial user namespace we
significantly increase performance in this codepath.
- Injecting uevents:
There's a valid argument that containers might be interested in
receiving device events especially if they are delegated to them by a
privileged userspace process. One prime example are SR-IOV enabled
devices that are explicitly designed to be handed of to other users
such as VMs or containers.
This use-case can now be correctly handled since
commit 692ec06d7c92 ("netns: send uevent messages"). This commit
introduced the ability to send uevents from userspace. As such we can
let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
namespace of the network namespace of the netlink socket) userspace
process make a decision what uevents should be sent. This removes the
need to blindly broadcast uevents into all user namespaces and provides
a performant and safe solution to this problem.
- Filtering logic:
This patch filters by *owning user namespace of the network namespace a
given task resides in* and not by user namespace of the task per se.
This means if the user namespace of a given task is unshared but the
network namespace is kept and is owned by the initial user namespace a
listener that is opening the uevent socket in that network namespace
can still listen to uevents.
- Fix permission for tagged kobjects:
Network devices that are created or moved into a network namespace that
is owned by a non-initial user namespace currently are send with
INVALID_{G,U}ID in their credentials. This means that all current udev
implementations in userspace will ignore the uevent they receive for
them. This has lead to weird bugs whereby new devices showing up in such
network namespaces were not recognized and did not get IPs assigned etc.
This patch adjusts the permission to the appropriate {g,u}id in the
respective user namespace. This way udevd is able to correctly handle
such devices.
- Simplify filtering logic:
do_one_broadcast() already ensures that only listeners in mc_list receive
uevents that have the same network namespace as the uevent socket itself.
So the filtering logic in kobj_bcast_filter is not needed (see [3]). This
patch therefore removes kobj_bcast_filter() and replaces
netlink_broadcast_filtered() with the simpler netlink_broadcast()
everywhere.

[1]: https://lkml.org/lkml/2018/4/4/739
[2]: https://lkml.org/lkml/2018/4/26/767
[3]: https://lkml.org/lkml/2018/4/26/738
Signed-off-by: Christian Brauner <[email protected]>
---
lib/kobject_uevent.c | 140 ++++++++++++++++++++++++++++++-------------
1 file changed, 99 insertions(+), 41 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index c3cb110f663b..d8ce5e6d83af 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -22,6 +22,7 @@
#include <linux/socket.h>
#include <linux/skbuff.h>
#include <linux/netlink.h>
+#include <linux/uidgid.h>
#include <linux/uuid.h>
#include <linux/ctype.h>
#include <net/sock.h>
@@ -231,30 +232,6 @@ int kobject_synth_uevent(struct kobject *kobj, const char *buf, size_t count)
return r;
}

-#ifdef CONFIG_NET
-static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
-{
- struct kobject *kobj = data, *ksobj;
- const struct kobj_ns_type_operations *ops;
-
- ops = kobj_ns_ops(kobj);
- if (!ops && kobj->kset) {
- ksobj = &kobj->kset->kobj;
- if (ksobj->parent != NULL)
- ops = kobj_ns_ops(ksobj->parent);
- }
-
- if (ops && ops->netlink_ns && kobj->ktype->namespace) {
- const void *sock_ns, *ns;
- ns = kobj->ktype->namespace(kobj);
- sock_ns = ops->netlink_ns(dsk);
- return sock_ns != ns;
- }
-
- return 0;
-}
-#endif
-
#ifdef CONFIG_UEVENT_HELPER
static int kobj_usermode_filter(struct kobject *kobj)
{
@@ -296,6 +273,7 @@ static void cleanup_uevent_env(struct subprocess_info *info)
}
#endif

+#ifdef CONFIG_NET
static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
const char *action_string,
const char *devpath)
@@ -321,15 +299,13 @@ static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
return skb;
}

-static int kobject_uevent_net_broadcast(struct kobject *kobj,
- struct kobj_uevent_env *env,
- const char *action_string,
- const char *devpath)
+static int uevent_net_broadcast_untagged(struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
{
- int retval = 0;
-#if defined(CONFIG_NET)
struct sk_buff *skb = NULL;
struct uevent_sock *ue_sk;
+ int retval = 0;

/* send netlink message */
list_for_each_entry(ue_sk, &uevent_sock_list, list) {
@@ -345,19 +321,95 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
continue;
}

- retval = netlink_broadcast_filtered(uevent_sock, skb_get(skb),
- 0, 1, GFP_KERNEL,
- kobj_bcast_filter,
- kobj);
+ retval = netlink_broadcast(uevent_sock, skb_get(skb), 0, 1,
+ GFP_KERNEL);
/* ENOBUFS should be handled in userspace */
if (retval == -ENOBUFS || retval == -ESRCH)
retval = 0;
}
consume_skb(skb);
-#endif
+
return retval;
}

+static int uevent_net_broadcast_tagged(struct sock *usk,
+ struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
+{
+ struct user_namespace *owning_user_ns = sock_net(usk)->user_ns;
+ struct sk_buff *skb = NULL;
+ int ret;
+
+ skb = alloc_uevent_skb(env, action_string, devpath);
+ if (!skb)
+ return -ENOMEM;
+
+ /* fix credentials */
+ if (owning_user_ns != &init_user_ns) {
+ struct netlink_skb_parms *parms = &NETLINK_CB(skb);
+ kuid_t root_uid;
+ kgid_t root_gid;
+
+ /* fix uid */
+ root_uid = make_kuid(owning_user_ns, 0);
+ if (!uid_valid(root_uid))
+ root_uid = GLOBAL_ROOT_UID;
+ parms->creds.uid = root_uid;
+
+ /* fix gid */
+ root_gid = make_kgid(owning_user_ns, 0);
+ if (!gid_valid(root_gid))
+ root_gid = GLOBAL_ROOT_GID;
+ parms->creds.gid = root_gid;
+ }
+
+ ret = netlink_broadcast(usk, skb, 0, 1, GFP_KERNEL);
+ /* ENOBUFS should be handled in userspace */
+ if (ret == -ENOBUFS || ret == -ESRCH)
+ ret = 0;
+
+ return ret;
+}
+#endif
+
+static int kobject_uevent_net_broadcast(struct kobject *kobj,
+ struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
+{
+ int ret = 0;
+
+#ifdef CONFIG_NET
+ const struct kobj_ns_type_operations *ops;
+ const struct net *net = NULL;
+
+ ops = kobj_ns_ops(kobj);
+ if (!ops && kobj->kset) {
+ struct kobject *ksobj = &kobj->kset->kobj;
+ if (ksobj->parent != NULL)
+ ops = kobj_ns_ops(ksobj->parent);
+ }
+
+ /* kobjects currently only carry network namespace tags and they
+ * are the only tag relevant here since we want to decide which
+ * network namespaces to broadcast the uevent into.
+ */
+ if (ops && ops->netlink_ns && kobj->ktype->namespace)
+ if (ops->type == KOBJ_NS_TYPE_NET)
+ net = kobj->ktype->namespace(kobj);
+
+ if (!net)
+ ret = uevent_net_broadcast_untagged(env, action_string,
+ devpath);
+ else
+ ret = uevent_net_broadcast_tagged(net->uevent_sock->sk, env,
+ action_string, devpath);
+#endif
+
+ return ret;
+}
+
static void zap_modalias_env(struct kobj_uevent_env *env)
{
static const char modalias_prefix[] = "MODALIAS=";
@@ -716,9 +768,13 @@ static int uevent_net_init(struct net *net)

net->uevent_sock = ue_sk;

- mutex_lock(&uevent_sock_mutex);
- list_add_tail(&ue_sk->list, &uevent_sock_list);
- mutex_unlock(&uevent_sock_mutex);
+ /* Restrict uevents to initial user namespace. */
+ if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
+ mutex_lock(&uevent_sock_mutex);
+ list_add_tail(&ue_sk->list, &uevent_sock_list);
+ mutex_unlock(&uevent_sock_mutex);
+ }
+
return 0;
}

@@ -726,9 +782,11 @@ static void uevent_net_exit(struct net *net)
{
struct uevent_sock *ue_sk = net->uevent_sock;

- mutex_lock(&uevent_sock_mutex);
- list_del(&ue_sk->list);
- mutex_unlock(&uevent_sock_mutex);
+ if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
+ mutex_lock(&uevent_sock_mutex);
+ list_del(&ue_sk->list);
+ mutex_unlock(&uevent_sock_mutex);
+ }

netlink_kernel_release(ue_sk->sk);
kfree(ue_sk);
--
2.17.0


2018-04-27 10:27:58

by Christian Brauner

[permalink] [raw]
Subject: [PATCH net-next 1/2 v3] uevent: add alloc_uevent_skb() helper

This patch adds alloc_uevent_skb() in preparation for follow up patches.

Signed-off-by: Christian Brauner <[email protected]>
---
lib/kobject_uevent.c | 39 ++++++++++++++++++++++++++-------------
1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 15ea216a67ce..c3cb110f663b 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -296,6 +296,31 @@ static void cleanup_uevent_env(struct subprocess_info *info)
}
#endif

+static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
+{
+ struct sk_buff *skb = NULL;
+ char *scratch;
+ size_t len;
+
+ /* allocate message with maximum possible size */
+ len = strlen(action_string) + strlen(devpath) + 2;
+ skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+ if (!skb)
+ return NULL;
+
+ /* add header */
+ scratch = skb_put(skb, len);
+ sprintf(scratch, "%s@%s", action_string, devpath);
+
+ skb_put_data(skb, env->buf, env->buflen);
+
+ NETLINK_CB(skb).dst_group = 1;
+
+ return skb;
+}
+
static int kobject_uevent_net_broadcast(struct kobject *kobj,
struct kobj_uevent_env *env,
const char *action_string,
@@ -314,22 +339,10 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
continue;

if (!skb) {
- /* allocate message with the maximum possible size */
- size_t len = strlen(action_string) + strlen(devpath) + 2;
- char *scratch;
-
retval = -ENOMEM;
- skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+ skb = alloc_uevent_skb(env, action_string, devpath);
if (!skb)
continue;
-
- /* add header */
- scratch = skb_put(skb, len);
- sprintf(scratch, "%s@%s", action_string, devpath);
-
- skb_put_data(skb, env->buf, env->buflen);
-
- NETLINK_CB(skb).dst_group = 1;
}

retval = netlink_broadcast_filtered(uevent_sock, skb_get(skb),
--
2.17.0


2018-04-27 16:29:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH net-next 0/2] netns: uevent filtering

Christian Brauner <[email protected]> writes:

> Hey everyone,
>
> This is the new approach to uevent filtering as discussed (see the
> threads in [1], [2], and [3]).
>
> This series deals with with fixing up uevent filtering logic:
> - uevent filtering logic is simplified
> - locking time on uevent_sock_list is minimized
> - tagged and untagged kobjects are handled in separate codepaths
> - permissions for userspace are fixed for network device uevents in
> network namespaces owned by non-initial user namespaces
> Udev is now able to see those events correctly which it wasn't before.
> For example, moving a physical device into a network namespace not
> owned by the initial user namespaces before gave:
>
> root@xen1:~# udevadm --debug monitor -k
> calling: monitor
> monitor will print the received events for:
> KERNEL - the kernel uevent
>
> sender uid=65534, message ignored
> sender uid=65534, message ignored
> sender uid=65534, message ignored
> sender uid=65534, message ignored
> sender uid=65534, message ignored
>
> and now after the discussion and solution in [3] correctly gives:
>
> root@xen1:~# udevadm --debug monitor -k
> calling: monitor
> monitor will print the received events for:
> KERNEL - the kernel uevent
>
> KERNEL[625.301042] add /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
> KERNEL[625.301109] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
> KERNEL[625.301138] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)
> KERNEL[655.333272] remove /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)

Acked-by: "Eric W. Biederman" <[email protected]>

Overall this change looks good and I would nave not problems
if it was merged as it. I have one or two nits. But they are not
particularly important.

Eric


> Thanks!
> Christian
>
> [1]: https://lkml.org/lkml/2018/4/4/739
> [2]: https://lkml.org/lkml/2018/4/26/767
> [3]: https://lkml.org/lkml/2018/4/26/738
>
> Christian Brauner (2):
> uevent: add alloc_uevent_skb() helper
> netns: restrict uevents
>
> lib/kobject_uevent.c | 175 ++++++++++++++++++++++++++++++-------------
> 1 file changed, 123 insertions(+), 52 deletions(-)

2018-04-27 16:32:18

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH net-next 2/2 v3] netns: restrict uevents

Christian Brauner <[email protected]> writes:
> ---
> lib/kobject_uevent.c | 140 ++++++++++++++++++++++++++++++-------------
> 1 file changed, 99 insertions(+), 41 deletions(-)
>
> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> index c3cb110f663b..d8ce5e6d83af 100644
> --- a/lib/kobject_uevent.c
> +++ b/lib/kobject_uevent.c
>
> +static int uevent_net_broadcast_tagged(struct sock *usk,
> + struct kobj_uevent_env *env,
> + const char *action_string,
> + const char *devpath)
> +{
> + struct user_namespace *owning_user_ns = sock_net(usk)->user_ns;
> + struct sk_buff *skb = NULL;
> + int ret;
> +
> + skb = alloc_uevent_skb(env, action_string, devpath);
> + if (!skb)
> + return -ENOMEM;
> +
> + /* fix credentials */
> + if (owning_user_ns != &init_user_ns) {

Nit: This test is just a performance optimization as such is not
necessary. That is we can safely unconditionally set the
credentials this way.

> + struct netlink_skb_parms *parms = &NETLINK_CB(skb);
> + kuid_t root_uid;
> + kgid_t root_gid;
> +
> + /* fix uid */
> + root_uid = make_kuid(owning_user_ns, 0);
> + if (!uid_valid(root_uid))
> + root_uid = GLOBAL_ROOT_UID;
> + parms->creds.uid = root_uid;
> +
> + /* fix gid */
> + root_gid = make_kgid(owning_user_ns, 0);
> + if (!gid_valid(root_gid))
> + root_gid = GLOBAL_ROOT_GID;
> + parms->creds.gid = root_gid;
> + }
> +
> + ret = netlink_broadcast(usk, skb, 0, 1, GFP_KERNEL);
> + /* ENOBUFS should be handled in userspace */
> + if (ret == -ENOBUFS || ret == -ESRCH)
> + ret = 0;
> +
> + return ret;
> +}
> +#endif

2018-04-27 16:41:16

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH net-next 1/2 v3] uevent: add alloc_uevent_skb() helper

Christian Brauner <[email protected]> writes:

> This patch adds alloc_uevent_skb() in preparation for follow up patches.
>
> Signed-off-by: Christian Brauner <[email protected]>
> ---
> lib/kobject_uevent.c | 39 ++++++++++++++++++++++++++-------------
> 1 file changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> index 15ea216a67ce..c3cb110f663b 100644
> --- a/lib/kobject_uevent.c
> +++ b/lib/kobject_uevent.c
> @@ -296,6 +296,31 @@ static void cleanup_uevent_env(struct subprocess_info *info)
> }
> #endif
>
> +static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
> + const char *action_string,
> + const char *devpath)
> +{
> + struct sk_buff *skb = NULL;
> + char *scratch;
> + size_t len;
> +
> + /* allocate message with maximum possible size */
> + len = strlen(action_string) + strlen(devpath) + 2;
> + skb = alloc_skb(len + env->buflen, GFP_KERNEL);
> + if (!skb)
> + return NULL;
> +
> + /* add header */
> + scratch = skb_put(skb, len);
> + sprintf(scratch, "%s@%s", action_string, devpath);
> +
> + skb_put_data(skb, env->buf, env->buflen);
> +
> + NETLINK_CB(skb).dst_group = 1;

nit:
We might want to explicitly set NETLINK_CB(skb).portid to 0 and
NETLINK_CB(skb).creds.uid to GLOBAL_ROOT_UID and
NETLINK_CB(skb).creds.gid to GLOBAL_ROOT_GID here
just to make it clear this is happening.

It is not a problem because they __alloc_skb memsets to 0 the
fields of struct sk_buff that it does not initialize. And these
are the zero values.

Still it would be nice to be able to look at the code and quickly
see these are the values being set.

Eric

2018-04-28 19:11:30

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH net-next 1/2 v3] uevent: add alloc_uevent_skb() helper

On Fri, Apr 27, 2018 at 11:39:44AM -0500, Eric W. Biederman wrote:
> Christian Brauner <[email protected]> writes:
>
> > This patch adds alloc_uevent_skb() in preparation for follow up patches.
> >
> > Signed-off-by: Christian Brauner <[email protected]>
> > ---
> > lib/kobject_uevent.c | 39 ++++++++++++++++++++++++++-------------
> > 1 file changed, 26 insertions(+), 13 deletions(-)
> >
> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> > index 15ea216a67ce..c3cb110f663b 100644
> > --- a/lib/kobject_uevent.c
> > +++ b/lib/kobject_uevent.c
> > @@ -296,6 +296,31 @@ static void cleanup_uevent_env(struct subprocess_info *info)
> > }
> > #endif
> >
> > +static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
> > + const char *action_string,
> > + const char *devpath)
> > +{
> > + struct sk_buff *skb = NULL;
> > + char *scratch;
> > + size_t len;
> > +
> > + /* allocate message with maximum possible size */
> > + len = strlen(action_string) + strlen(devpath) + 2;
> > + skb = alloc_skb(len + env->buflen, GFP_KERNEL);
> > + if (!skb)
> > + return NULL;
> > +
> > + /* add header */
> > + scratch = skb_put(skb, len);
> > + sprintf(scratch, "%s@%s", action_string, devpath);
> > +
> > + skb_put_data(skb, env->buf, env->buflen);
> > +
> > + NETLINK_CB(skb).dst_group = 1;
>
> nit:
> We might want to explicitly set NETLINK_CB(skb).portid to 0 and
> NETLINK_CB(skb).creds.uid to GLOBAL_ROOT_UID and
> NETLINK_CB(skb).creds.gid to GLOBAL_ROOT_GID here
> just to make it clear this is happening.
>
> It is not a problem because they __alloc_skb memsets to 0 the
> fields of struct sk_buff that it does not initialize. And these
> are the zero values.
>
> Still it would be nice to be able to look at the code and quickly
> see these are the values being set.

Don't really mind adding it. Ok, non-functional changes added to the new
version. But then let's set "portid" too:

parms = &NETLINK_CB(skb);
parms->creds.uid = GLOBAL_ROOT_UID;
parms->creds.gid = GLOBAL_ROOT_GID;
parms->dst_group = 1;
parms->portid = 0;

Christian

2018-04-28 19:17:31

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH net-next 2/2 v3] netns: restrict uevents

On Fri, Apr 27, 2018 at 11:30:26AM -0500, Eric W. Biederman wrote:
> Christian Brauner <[email protected]> writes:
> > ---
> > lib/kobject_uevent.c | 140 ++++++++++++++++++++++++++++++-------------
> > 1 file changed, 99 insertions(+), 41 deletions(-)
> >
> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> > index c3cb110f663b..d8ce5e6d83af 100644
> > --- a/lib/kobject_uevent.c
> > +++ b/lib/kobject_uevent.c
> >
> > +static int uevent_net_broadcast_tagged(struct sock *usk,
> > + struct kobj_uevent_env *env,
> > + const char *action_string,
> > + const char *devpath)
> > +{
> > + struct user_namespace *owning_user_ns = sock_net(usk)->user_ns;
> > + struct sk_buff *skb = NULL;
> > + int ret;
> > +
> > + skb = alloc_uevent_skb(env, action_string, devpath);
> > + if (!skb)
> > + return -ENOMEM;
> > +
> > + /* fix credentials */
> > + if (owning_user_ns != &init_user_ns) {
>
> Nit: This test is just a performance optimization as such is not
> necessary. That is we can safely unconditionally set the
> credentials this way.

alloc_uevent_skb() will now set

parms = &NETLINK_CB(skb);
parms->creds.uid = GLOBAL_ROOT_UID;
parms->creds.gid = GLOBAL_ROOT_GID;
parms->dst_group = 1;
parms->portid = 0;

explicitly. So repeating that initialization unconditionally here does
not make sense to me. Also, this hits map_uid_down() in user_namespace.c
which is a known-hotpath (Remember the extensive testing we did back for
uidmap limit bumping from 5 to 340.). And even though it might not
matter much in this case there's no need to hit this code. The condition
also make it obvious that only non-initial user namespace uevent sockets
need fixing.

Christian

>
> > + struct netlink_skb_parms *parms = &NETLINK_CB(skb);
> > + kuid_t root_uid;
> > + kgid_t root_gid;
> > +
> > + /* fix uid */
> > + root_uid = make_kuid(owning_user_ns, 0);
> > + if (!uid_valid(root_uid))
> > + root_uid = GLOBAL_ROOT_UID;
> > + parms->creds.uid = root_uid;
> > +
> > + /* fix gid */
> > + root_gid = make_kgid(owning_user_ns, 0);
> > + if (!gid_valid(root_gid))
> > + root_gid = GLOBAL_ROOT_GID;
> > + parms->creds.gid = root_gid;
> > + }
> > +
> > + ret = netlink_broadcast(usk, skb, 0, 1, GFP_KERNEL);
> > + /* ENOBUFS should be handled in userspace */
> > + if (ret == -ENOBUFS || ret == -ESRCH)
> > + ret = 0;
> > +
> > + return ret;
> > +}
> > +#endif