From: Christian Brauner
Date: Wed, 4 Apr 2018 22:30:49 +0200
To: ebiederm@xmission.com, davem@davemloft.net, gregkh@linuxfoundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: avagin@virtuozzo.com, ktkhai@virtuozzo.com, serge@hallyn.com
Subject: Re: [PATCH net] netns: filter uevents correctly
Message-ID: <20180404203048.GA21118@gmail.com>
References: <20180404194857.29375-1-christian.brauner@ubuntu.com>
In-Reply-To: <20180404194857.29375-1-christian.brauner@ubuntu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 04, 2018 at 09:48:57PM +0200, Christian Brauner wrote:
> commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
>
> enabled sending hotplug events into all network namespaces back in 2010.
> Over time the set of uevents that get sent into all network namespaces has
> shrunk. We have now reached the point where hotplug events for all devices
> that carry a namespace tag are filtered according to that namespace.
>
> Specifically, they are filtered whenever the namespace tag of the kobject
> does not match the namespace tag of the netlink socket. One example is
> network devices.
> Uevents for network devices only show up in the network
> namespaces these devices are moved to or created in.
>
> However, any uevent for a kobject that does not have a namespace tag
> associated with it will not be filtered and we will *try* to broadcast it
> into all network namespaces.
>
> The original patchset was written in 2010 before user namespaces were a
> thing. With the introduction of user namespaces, sending out uevents became
> partially isolated as they were filtered by user namespaces:
>
> net/netlink/af_netlink.c:do_one_broadcast()
>
> if (!net_eq(sock_net(sk), p->net)) {
>         if (!(nlk->flags & NETLINK_F_LISTEN_ALL_NSID))
>                 return;
>
>         if (!peernet_has_id(sock_net(sk), p->net))
>                 return;
>
>         if (!file_ns_capable(sk->sk_socket->file, p->net->user_ns,
>                              CAP_NET_BROADCAST))
>                 return;
> }
>
> The file_ns_capable() call checks whether the caller had
> CAP_NET_BROADCAST at the time of opening the netlink socket in the user
> namespace of interest. This check is fine in general but seems insufficient
> to me when paired with uevents. The reason is that devices always belong to
> the initial user namespace so uevents for kobjects that do not carry a
> namespace tag should never be sent into another user namespace. This has
> been the intention all along. But there's one case where this breaks,
> namely if a new user namespace is created by root on the host and an
> identity mapping is established between root on the host and root in the
> new user namespace. Here's a reproducer:
>
> sudo unshare -U --map-root
> udevadm monitor -k
> # Now change to initial user namespace and e.g. do
> modprobe kvm
> # or
> rmmod kvm
>
> This will allow the non-initial user namespace to retrieve all uevents from
> the host. This seems very anecdotal given that in the general case user
> namespaces do not see any uevents and also can't really do anything useful
> with them.
>
> Additionally, it is now possible to send uevents from userspace.
> As such we can let a sufficiently privileged (CAP_SYS_ADMIN in the owning
> user namespace of the network namespace of the netlink socket) userspace
> process decide which uevents should be sent.
>
> This makes me think that we should simply ensure that uevents for kobjects
> that do not carry a namespace tag are *always* filtered by user namespace
> in kobj_bcast_filter(). Specifically:
> - If the owning user namespace of the uevent socket is not init_user_ns the
>   event will always be filtered.
> - If the network namespace the uevent socket belongs to was created in the
>   initial user namespace but the socket was opened from a non-initial user
>   namespace the event will be filtered as well.
> Put another way, uevents for kobjects not carrying a namespace tag are now
> always only sent to the initial user namespace. The regression potential
> for this is close to non-existent since user namespaces can't really do
> anything with interesting devices.
>
> Signed-off-by: Christian Brauner

That was supposed to be [PATCH net], not [PATCH net-next], which is obviously
closed. Sorry about that.

Christian

> ---
>  lib/kobject_uevent.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> index 15ea216a67ce..cb98cddb6e3b 100644
> --- a/lib/kobject_uevent.c
> +++ b/lib/kobject_uevent.c
> @@ -251,7 +251,15 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
>  			return sock_ns != ns;
>  	}
>  
> -	return 0;
> +	/*
> +	 * The kobject does not carry a namespace tag so filter by user
> +	 * namespace below.
> +	 */
> +	if (sock_net(dsk)->user_ns != &init_user_ns)
> +		return 1;
> +
> +	/* Check if socket was opened from non-initial user namespace. */
> +	return sk_user_ns(dsk) != &init_user_ns;
>  }
>  #endif
>  
> -- 
> 2.15.1
>
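
For anyone who wants to reason about the hunk above without booting a kernel,
the resulting decision can be modelled in plain userspace C. This is only a
sketch under stated assumptions: struct user_ns_model, init_user_ns_model,
struct uevent_sock_model and model_bcast_filter() are hypothetical stand-ins
invented for illustration, not kernel definitions, and the tagged-kobject
branch is reduced to a single pointer comparison.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct user_namespace. */
struct user_ns_model {
	int id;
};

/* Models init_user_ns; only its address is compared. */
static const struct user_ns_model init_user_ns_model = { 0 };

/* Hypothetical view of a uevent netlink socket (not a kernel struct). */
struct uevent_sock_model {
	const void *net_tag;                   /* netns tag of the socket */
	const struct user_ns_model *net_owner; /* models sock_net(dsk)->user_ns */
	const struct user_ns_model *opener;    /* models sk_user_ns(dsk) */
};

/*
 * Returns true when the uevent must NOT be delivered to the socket.
 * kobj_tag is NULL for a kobject that carries no namespace tag.
 */
static bool model_bcast_filter(const struct uevent_sock_model *sk,
			       const void *kobj_tag)
{
	/* Tagged kobject: deliver only when the tags match. */
	if (kobj_tag != NULL)
		return sk->net_tag != kobj_tag;

	/* Untagged kobject: the netns must be owned by the initial user ns. */
	if (sk->net_owner != &init_user_ns_model)
		return true;

	/* The socket must also have been opened from the initial user ns. */
	return sk->opener != &init_user_ns_model;
}
```

In this model the reproducer from the commit message corresponds to a socket
whose net_owner is the initial user namespace but whose opener is not, so the
final comparison is the one that filters it.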