Received: by 10.213.65.68 with SMTP id h4csp275110imn; Sat, 17 Mar 2018 03:32:38 -0700 (PDT) X-Google-Smtp-Source: AG47ELvr6o1os7FoSnoQKP/dEZ+Xo6Pzb8pmlI02i05rMMfNZ0XP3WHerE0SesZ9/qMblvjdrjZX X-Received: by 2002:a17:902:ac96:: with SMTP id h22-v6mr5161076plr.93.1521282758447; Sat, 17 Mar 2018 03:32:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1521282758; cv=none; d=google.com; s=arc-20160816; b=gCNZWuOibyh7dHp6GmHAReEwke/5+0GAmRb73TtdDNoeQmqk/WR2uZMGB1Pds1ZvQB FR5bFXKDc1gPSMOOOQqQUVocL/gPatj78FSEyHzRz2hFFpF5GHch9qZqm3d74AhC/gOZ Y+eNH8IuAtCNQx2xWXzDcXJuf6Bc62BH2OsMNumgUYj0XGpJi9VlXPOnROvJjniCEldI ObDZHXIFzcHR1tAepei8hAAY3NQma845B1C9AWl1wJdI7PEjMPa4ENRLBFqgpsBqDMWX miYeGVwnV2cRJWv6leaPd1KnHYJWgN0ZzABR+NjzB+jQpbiMNDB2obYrBUbBW8LIS8U/ n+kA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=6zZRz6UN7m7xXUfR/x6dJuUA5UDNTlcNdfHOlxMjrqU=; b=etmnlDcMhuJez5sRerhwiBrpfbmdTcFnTnoma6To+n8agSh2REzki7Bg/o79FyFWsD LenJo2wravT1dliqQwVUqvy+0PX4oBnfXvG7VBZVwpHRfmiECKQbatG/u2GYChqWHKZf 83hZskXweLDI1bqrRIUbh+eN6m2t/4mDFM9aiEWESa8LBuHtOGPE9EGpgL2IBOcSSIkb QhbXUyKuGqazaIAhUccHOjhz45qhGDI9wcYP9sKoN2/YqHssx09EE9RlpyAF/Cmj+4vI OAlZjNT7gUZVzFgLP472iidHU7DjDUNkSd2w/Xp20BpzyKVmqKGh438g9Vv9eChnrFET p9wQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o32-v6si7888540pld.414.2018.03.17.03.32.24; Sat, 17 Mar 2018 03:32:38 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752383AbeCQKba (ORCPT + 99 others); Sat, 17 Mar 2018 06:31:30 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:59438 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751677AbeCQKb2 (ORCPT ); Sat, 17 Mar 2018 06:31:28 -0400 Received: from mail-wr0-f197.google.com ([209.85.128.197]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1ex97X-00063T-9U for linux-kernel@vger.kernel.org; Sat, 17 Mar 2018 10:31:27 +0000 Received: by mail-wr0-f197.google.com with SMTP id r29so6875141wra.13 for ; Sat, 17 Mar 2018 03:31:27 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=6zZRz6UN7m7xXUfR/x6dJuUA5UDNTlcNdfHOlxMjrqU=; b=ScSkmaRuD+/+faCr5C17rpjqWuvFpyodDVpYTOtmKwUFyM3pqEGwJBTwitUPt2z0Ms EM8jGugeU2hPLH0JIWImd04J+QWWdUPlACl04iXHPaE6Bnfw7elXMYv6ye+Tegif2Dhk Kl7hBqYJB0+lXyL87O1wOZDMD/3jGIgYG1GDXeDi4HAYKk+8XWMwlCXfnM8eUruXcX69 d0mPrp5hP46SjhYKbtXZH3urSrKajB13u0CiBTWi6XyuzYLPTNcvcN2TUOH4culcRIKG CINvp6Al+PuvRv2Vhb31HO54h6KN2ETukZNPZj1Yx5mAMRo/1WuCTCmSQv8SqM9cMuKa hrgQ== X-Gm-Message-State: AElRT7EVZgwTiqSmdAy2dDyslXgePFlr13F/27b4g6gmb+J7i6lAxa7x Sf/b/EdBDzJLBAV29QUrejoCXITkPZXdJS5uHwRu5CdCEu4pvRd/XOua6hPkhV6epgl8Cm/yelR Fj2348dOTRsdtNvh4JPvitX2D/RwOWoydennNPC8+Fg== X-Received: by 10.28.154.5 with SMTP id c5mr3666312wme.122.1521282686826; Sat, 17 Mar 2018 03:31:26 -0700 (PDT) X-Received: by 10.28.154.5 with SMTP id c5mr3666296wme.122.1521282686484; Sat, 17 Mar 2018 03:31:26 -0700 (PDT) Received: from gmail.com ([2a02:8070:8895:9700:4c9d:d656:8952:c1b2]) by smtp.gmail.com with ESMTPSA id g7sm7961644wrb.78.2018.03.17.03.31.25 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sat, 17 Mar 2018 03:31:25 -0700 (PDT) Date: Sat, 17 Mar 2018 11:31:24 +0100 From: Christian Brauner To: Kirill Tkhai Cc: Christian Brauner , ebiederm@xmission.com, gregkh@linuxfoundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, serge@hallyn.com, avagin@virtuozzo.com Subject: Re: [PATCH v2] netns: send uevent messages Message-ID: <20180317103123.GB4301@gmail.com> References: <20180316125030.23624-1-christian.brauner@ubuntu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 16, 2018 at 11:14:31PM +0300, Kirill Tkhai wrote: > On 16.03.2018 15:50, Christian Brauner wrote: > > This patch adds a receive method to NETLINK_KOBJECT_UEVENT netlink sockets > > to allow sending uevent messages into the network namespace the socket > > belongs to. > > > > Currently non-initial network namespaces are already isolated and don't > > receive uevents. There are a number of cases where it is beneficial for a > > sufficiently privileged userspace process to send a uevent into a network > > namespace. > > > > One such use case would be debugging and fuzzing of a piece of software > > which listens and reacts to uevents. By running a copy of that software > > inside a network namespace, specific uevents could then be presented to it. > > More concretely, this would allow for easy testing of udevd/ueventd. > > > > This will also allow some piece of software to run components inside a > > separate network namespace and then effectively filter what that software > > can receive. Some examples of software that do directly listen to uevents > > and that we have in the past attempted to run inside a network namespace > > are rbd (CEPH client) or the X server. > > > > Implementation: > > The implementation has been kept as simple as possible from the kernel's > > perspective. Specifically, a simple input method uevent_net_rcv() is added > > to NETLINK_KOBJECT_UEVENT sockets which completely reuses existing > > af_netlink infrastructure and does neither add an additional netlink family > > nor requires any user-visible changes. > > > > For example, by using netlink_rcv_skb() we can make use of existing netlink > > infrastructure to report back informative error messages to userspace. > > > > Furthermore, this implementation does not introduce any overhead for > > existing uevent generating codepaths. The struct netns gets a new uevent > > socket member that records the uevent socket associated with that network > > namespace including its position in the uevent socket list. Since we record > > the uevent socket for each network namespace in struct net we don't have to > > walk the whole uevent socket list. Instead we can directly retrieve the > > relevant uevent socket and send the message. At exit time we can now also > > trivially remove the uevent socket from the uevent socket list. This keeps > > the codepath very performant without introducing needless overhead and even > > makes older codepaths faster. > > > > Uevent sequence numbers are kept global. When a uevent message is sent to > > another network namespace the implementation will simply increment the > > global uevent sequence number and append it to the received uevent. This > > has the advantage that the kernel will never need to parse the received > > uevent message to replace any existing uevent sequence numbers. Instead it > > is up to the userspace process to remove any existing uevent sequence > > numbers in case the uevent message to be sent contains any. > > > > Security: > > In order for a caller to send uevent messages to a target network namespace > > the caller must have CAP_SYS_ADMIN in the owning user namespace of the > > target network namespace. Additionally, any received uevent message is > > verified to not exceed size UEVENT_BUFFER_SIZE. This includes the space > > needed to append the uevent sequence number. > > > > Testing: > > This patch has been tested and verified to work with the following udev > > implementations: > > 1. CentOS 6 with udevd version 147 > > 2. Debian Sid with systemd-udevd version 237 > > 3. Android 7.1.1 with ueventd > > > > Signed-off-by: Christian Brauner > > --- > > Changelog v1->v2: > > * Add the whole struct uevent_sock to struct net not just the socket > > member. Since struct uevent_sock records the position of the uevent > > socket in the uevent socket list we can trivially remove it from the > > uevent socket list during cleanup. This speeds up the old removal > > codepath. list_del() will hitl __list_del_entry_valid() in its call chain > > which will validate that the element is a member of the list. If it isn't > > it will take care that the list is not modified. > > Changelog v0->v1: > > * Hold mutex_lock() until uevent is sent to preserve uevent message > > ordering. See udev and commit for reference: > > > > commit 7b60a18da393ed70db043a777fd9e6d5363077c4 > > Author: Andrew Vagin > > Date: Wed Mar 7 14:49:56 2012 +0400 > > > > uevent: send events in correct order according to seqnum (v3) > > > > The queue handling in the udev daemon assumes that the events are > > ordered. > > --- > > include/linux/kobject.h | 6 +++ > > include/net/net_namespace.h | 4 +- > > lib/kobject_uevent.c | 98 ++++++++++++++++++++++++++++++++++++++------- > > 3 files changed, 93 insertions(+), 15 deletions(-) > > > > diff --git a/include/linux/kobject.h b/include/linux/kobject.h > > index 7f6f93c3df9c..c572c7abc609 100644 > > --- a/include/linux/kobject.h > > +++ b/include/linux/kobject.h > > @@ -39,6 +39,12 @@ extern char uevent_helper[]; > > /* counter to tag the uevent, read only except for the kobject core */ > > extern u64 uevent_seqnum; > > > > +/* uevent socket */ > > +struct uevent_sock { > > + struct list_head list; > > + struct sock *sk; > > +}; > > I missed, why we do this external? Mostly because of transparency but I've kept it directly in kobject_uevent.c in the new version for now. > > > + > > /* > > * The actions here must match the index to the string array > > * in lib/kobject_uevent.c > > diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h > > index f306b2aa15a4..abd7d91bffac 100644 > > --- a/include/net/net_namespace.h > > +++ b/include/net/net_namespace.h > > @@ -40,7 +40,7 @@ struct net_device; > > struct sock; > > struct ctl_table_header; > > struct net_generic; > > -struct sock; > > +struct uevent_sock; > > struct netns_ipvs; > > > > > > @@ -79,6 +79,8 @@ struct net { > > struct sock *rtnl; /* rtnetlink socket */ > > struct sock *genl_sock; > > > > + struct uevent_sock *uevent_sock; /* uevent socket */ > > Since there will be one more version, could you please to move all preparation > related to the above change to a separate patch? Then we have series of two patches > with two logical changes. Yes, I split adding struct uevent_sock to struct net and related improvements in kobject_uevent.c into a separate patch in v3. Thanks! Christian > > > + > > struct list_head dev_base_head; > > struct hlist_head *dev_name_head; > > struct hlist_head *dev_index_head; > > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c > > index 9fe6ec8fda28..53e9123474c0 100644 > > --- a/lib/kobject_uevent.c > > +++ b/lib/kobject_uevent.c > > @@ -25,6 +25,7 @@ > > #include > > #include > > #include > > +#include > > #include > > > > > > @@ -33,10 +34,6 @@ u64 uevent_seqnum; > > char uevent_helper[UEVENT_HELPER_PATH_LEN] = CONFIG_UEVENT_HELPER_PATH; > > #endif > > #ifdef CONFIG_NET > > -struct uevent_sock { > > - struct list_head list; > > - struct sock *sk; > > -}; > > static LIST_HEAD(uevent_sock_list); > > #endif > > > > @@ -602,12 +599,88 @@ int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...) > > EXPORT_SYMBOL_GPL(add_uevent_var); > > > > #if defined(CONFIG_NET) > > +static int uevent_net_broadcast(struct sock *usk, struct sk_buff *skb, > > + struct netlink_ext_ack *extack) > > +{ > > + int ret; > > + /* u64 to chars: 2^64 - 1 = 21 chars */ > > + char buf[sizeof("SEQNUM=") + 21]; > > + struct sk_buff *skbc; > > + > > + /* bump and prepare sequence number */ > > + ret = snprintf(buf, sizeof(buf), "SEQNUM=%llu", ++uevent_seqnum); > > + if (ret < 0 || (size_t)ret >= sizeof(buf)) > > + return -ENOMEM; > > + ret++; > > + > > + /* verify message does not overflow */ > > + if ((skb->len + ret) > UEVENT_BUFFER_SIZE) { > > + NL_SET_ERR_MSG(extack, "uevent message too big"); > > + return -EINVAL; > > + } > > + > > + /* copy skb and extend to accommodate sequence number */ > > + skbc = skb_copy_expand(skb, 0, ret, GFP_KERNEL); > > + if (!skbc) > > + return -ENOMEM; > > + > > + /* append sequence number */ > > + skb_put_data(skbc, buf, ret); > > + > > + /* remove msg header */ > > + skb_pull(skbc, NLMSG_HDRLEN); > > + > > + /* set portid 0 to inform userspace message comes from kernel */ > > + NETLINK_CB(skbc).portid = 0; > > + NETLINK_CB(skbc).dst_group = 1; > > + > > + ret = netlink_broadcast(usk, skbc, 0, 1, GFP_KERNEL); > > + /* ENOBUFS should be handled in userspace */ > > + if (ret == -ENOBUFS || ret == -ESRCH) > > + ret = 0; > > + > > + return ret; > > +} > > + > > +static int uevent_net_rcv_skb(struct sk_buff *skb, struct nlmsghdr *nlh, > > + struct netlink_ext_ack *extack) > > +{ > > + int ret; > > + struct net *net; > > + > > + if (!nlmsg_data(nlh)) > > + return -EINVAL; > > + > > + /* > > + * Verify that we are allowed to send messages to the target > > + * network namespace. The caller must have CAP_SYS_ADMIN in the > > + * owning user namespace of the target network namespace. > > + */ > > + net = sock_net(NETLINK_CB(skb).sk); > > + if (!netlink_ns_capable(skb, net->user_ns, CAP_SYS_ADMIN)) { > > + NL_SET_ERR_MSG(extack, "missing CAP_SYS_ADMIN capability"); > > + return -EPERM; > > + } > > + > > + mutex_lock(&uevent_sock_mutex); > > + ret = uevent_net_broadcast(net->uevent_sock->sk, skb, extack); > > + mutex_unlock(&uevent_sock_mutex); > > + > > + return ret; > > +} > > + > > +static void uevent_net_rcv(struct sk_buff *skb) > > +{ > > + netlink_rcv_skb(skb, &uevent_net_rcv_skb); > > +} > > + > > static int uevent_net_init(struct net *net) > > { > > struct uevent_sock *ue_sk; > > struct netlink_kernel_cfg cfg = { > > .groups = 1, > > - .flags = NL_CFG_F_NONROOT_RECV, > > + .input = uevent_net_rcv, > > + .flags = NL_CFG_F_NONROOT_RECV > > }; > > > > ue_sk = kzalloc(sizeof(*ue_sk), GFP_KERNEL); > > @@ -621,6 +694,9 @@ static int uevent_net_init(struct net *net) > > kfree(ue_sk); > > return -ENODEV; > > } > > + > > + net->uevent_sock = ue_sk; > > + > > mutex_lock(&uevent_sock_mutex); > > list_add_tail(&ue_sk->list, &uevent_sock_list); > > mutex_unlock(&uevent_sock_mutex); > > @@ -629,22 +705,16 @@ static int uevent_net_init(struct net *net) > > > > static void uevent_net_exit(struct net *net) > > { > > - struct uevent_sock *ue_sk; > > + struct uevent_sock *ue_sk = net->uevent_sock; > > > > mutex_lock(&uevent_sock_mutex); > > - list_for_each_entry(ue_sk, &uevent_sock_list, list) { > > - if (sock_net(ue_sk->sk) == net) > > - goto found; > > - } > > - mutex_unlock(&uevent_sock_mutex); > > - return; > > - > > -found: > > list_del(&ue_sk->list); > > mutex_unlock(&uevent_sock_mutex); > > > > netlink_kernel_release(ue_sk->sk); > > kfree(ue_sk); > > + > > + return; > > } > > > > static struct pernet_operations uevent_net_ops = { > > Thanks, > Kirill